Add crates.io enrichment option for rust audit binary, JSON schema and SPDX license updates #3554
base: main
Conversation
Hi @jimmystewpot, thanks for this enhancement; overall this looks great, and it's much appreciated that you've followed conventions really well. I left a few specific comments, but the biggest takeaways, I think, are:
- "duplicate" the configuration struct (it won't be completely duplicated -- for the multilevel configuration you'll use `*bool`, whereas the `rust.CatalogerConfig` would have a `bool`, for example)
- we probably don't want to choose between one metadata type or the other, but rather add a way to keep both (though the suggestions I have are only suggestions and I'd like to run these by the team when we start to introduce new patterns for things). Maybe the best thing is just to add the fields to the existing structs and not worry about having to support multiple metadata types just yet.
- you'll need to sign off your commit(s); see contributing.md
```diff
 // NewCargoLockCataloger returns a new Rust Cargo lock file cataloger object.
-func NewCargoLockCataloger() pkg.Cataloger {
+func NewCargoLockCataloger(opts CatalogerConfig) pkg.Cataloger {
 	return generic.NewCataloger("rust-cargo-lock-cataloger").
```
it looks like this `opts` is not used?
That is correct, and I wanted to get this pull request up for review before I invested the time in making it available to both catalogers. Once the details are refined, do you want me to add them both to the same pull request, or can I submit a second pull request with the second cataloger?
```go
type CatalogerConfig struct {
	InsecureSkipTLSVerify bool   `yaml:"insecure-skip-tls-verify" json:"insecure-skip-tls-verify" mapstructure:"insecure-skip-tls-verify"`
	UseCratesEnrichment   bool   `json:"use-crates-enrichment" yaml:"use-crates-enrichment" mapstructure:"use-crates-enrichment"`
	Proxy                 string `yaml:"proxy,omitempty" json:"proxy,omitempty" mapstructure:"proxy"`
}
```
Is this equivalent to the Go `http_proxy` environment variable? I don't think we would need a special config for this; rather, just advise users to use the environment variable such that it's used for all HTTP calls instead of needing to configure each individually. If there's really some reason that we need configuration other than the environment variable, we should figure out how to set this globally for all HTTP requests.
I've touched on this in the larger comment regarding a single HTTP helper or keeping it package/cataloger local.
```go
func newCratesResolver(name string, opts CatalogerConfig) *rustCratesResolver {
	base, err := url.Parse(opts.CratesBaseURL)
	if err != nil {
		panic(err)
```
We should never panic, but instead use error returns.
I couldn't find any examples of the cataloger configuration handling errors. Is there an example that I could use to better handle this?
```go
if r := recover(); r != nil {
	fmt.Fprintf(os.Stderr, "recovered from panic while resolving license at: \n%s", string(debug.Stack()))
}
```
there shouldn't be a need for panic recovery here -- what is panicking?
My thinking was that if the crates.io service is unavailable, the HTTP request could panic; the recover allows processing to continue without the additional metadata from the remote service. This might be legacy behaviour in Go -- traditionally a failed request might result in a panic, so I've been in the habit of putting a recover in place. If that's no longer the case, or it's handled elsewhere in the code, I can remove it.
```go
		p = newPackageFromAudit(&dep, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
		continue
	}
	p = newPackageWithEnrichment(&dep, cratesEnrichment, location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.PrimaryEvidenceAnnotation))
```
This is probably the thing that we need to discuss a bit as a team how best to handle: in the case enrichment is enabled, no `pkg.RustBinaryAuditEntry` metadata is created; instead a richer, but different, metadata struct type is populated. I've long been a proponent of allowing multiple metadata types, but we don't really have a standard way of doing this yet. I don't think we should have less data when enriching, but we would end up in a situation where something potentially checking for the `pkg.RustBinaryAuditEntry` type doesn't find it in this case.
I've talked with @wagoodman about this, but I don't think we came to a concrete solution, although since metadata types are arbitrary we could easily add a `[]any` or something similar, and maybe have a helper function to find and return metadata. I don't know if we need this yet, but it definitely looks like some of the fields are being read when outputting different formats from the new enriched data.
If it were me, and the restrictions we have today exist, I might think of adding a helper function in the `syft/pkg` package, something like:
```go
func GetMetadata[T any](p *Package) *T {
	if t, ok := p.Metadata.(T); ok {
		return &t
	}
	if t, ok := p.Metadata.(*T); ok {
		return t
	}
	if metadatas, ok := p.Metadata.([]any); ok {
		for _, m := range metadatas {
			if t, ok := m.(T); ok {
				return &t
			}
			if t, ok := m.(*T); ok {
				return t
			}
		}
	}
	return nil
}
```
... or something of the sort, which would let us use it fairly simply where we need it, like:
```go
if m := pkg.GetMetadata[pkg.RustBinaryAuditEntry](p); m != nil {
	// do something with the metadata
}
```
... and we then could set metadata to `[]any{ RustBinaryAuditEntry{...}, RustCargoMetadata{...} }`. And, though it's not directly applicable here, if we migrated usage of the metadata types to this function instead of the direct type assertions we have, we could then also support merging packages more completely without losing certain metadata, etc.
Sorry for the long-winded comment here, just noting this for discussion along with some background.
I opted for the new metadata type because I wasn't comfortable with how teams integrate syft as a library and if it would break those integrations. If teams expect a particular struct to be passed back and it now has new fields, depending on how that is handled, it could break.
That said, I directly contradict myself by making the existing cataloger structs uniform/consistent, which would itself be a potentially breaking change, so this concern could be a complete outlier. I'll consider your recommendations and see if I can develop a plan.
I'll add more to this in a comment on the main thread rather than in the code so that it's easier to follow.
Force-pushed from 09e9ea0 to b5b08ad
@kzantow Thanks for your prompt response; I was on leave for a week and didn't have time to get back to you with updates until today. Several pieces would be better highlighted in a response on the main thread rather than in the code review.

Several other package catalogers also have enrichment opportunities from remote sources, a pattern I have added in this pull request for early feedback. Once we arrive at something everyone is happy with, I'll continue contributing to them. The following is a list of responses and comments to your feedback on this pull request (and thank you so much for being so detailed; I appreciate it and know that it takes time and is often a very thankless task).

One of the challenges with this type of feature addition is that it's hard to know how different teams/companies/integrations use syft, and anticipating them is difficult for me. The use case I am working to solve is external enrichment for a mono repo containing five primary languages/packaging types. This introduces the need to be a good internet citizen: caching is part of that; the other part is having a client rate limiter so that local mirrors or external sites will not be flooded with requests.

This raises a few questions that arise from your comments, and it seemed better to bring them up in a more direct conversation. Should the HTTP client be an internal helper library that can be reused between different catalogers? I think so; however, I want to work within the architecture you and your team desire. The benefit of this approach is that if teams have internal artifactory mirrors (as an example), the rate limiter can be reused across the different package catalogers. It also means common proxies, user agents, and other configuration can be set globally. Adding a user agent allows more traceability in who or what requests the remote service, aiding fast diagnostics. I've removed the user agent from this pull request; however, having a standard or settable pattern is a worthy goal.

Caching in this scenario becomes more interesting: if the remote service used for enrichment supports more than a single package type, then a single cache for all package types should be possible. However, many organisations wouldn't mirror all their packages and instead rely on remote third-party sites, usually independently. In this scenario, would you imagine having a single cache or one per remote site/package type? Are there any other areas that align with adding remote enrichment that I should consider?
Signed-off-by: James Lamb <[email protected]>
Force-pushed from dc11cd4 to b3aad03
Description
This pull request supports remotely enriching Rust auditable binaries using crates.io. It adds the license, supplier, originator, description, and other fields (optionally if enabled) to the manifest.
This information is unavailable in the Cargo lock file and the binary itself; if approved, I will add this capability to the other Rust cataloger.
Type of change
I've also updated the SPDX license list, as that was failing `make test`, and updated the JSON schema version to support the new crates-enriched metadata. There are still some missing unit tests, specifically the mocks for the crates.io lookup and caching functionality. I wanted to submit a PR early, seek guidance, and ensure this would benefit the community before investing more time in standardising it across the Rust catalogers.

The new feature adds a `rust` key to the configuration that allows the feature to be turned on/off and some settings tuned for site-specific needs.

Checklist: