fixed uri regex issue #3815

Open
wants to merge 6 commits into
base: main

kashifkhan0771
Contributor

kashifkhan0771 commented Dec 23, 2024

Description:

This PR fixes GitHub issue #3686.
Screenshot from 2024-12-23 19-05-15

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

kashifkhan0771 requested a review from a team as a code owner on December 23, 2024 14:13
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil)
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil)

var (
-	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
+	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.]{3,50})@[-.%\w\/:]+\b`)
Contributor Author

\S matches any non-whitespace character, which is very broad. Instead, we are now using \w, which matches [A-Za-z0-9_], and extending it by adding a few special characters to suit our needs.
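
To make the difference concrete, here is a minimal sketch that runs both versions of keyPat from the diff above against two comma-separated URIs. The sample input, the variable names oldPat and newPat, and the passwords helper are made up for illustration and are not part of the detector.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern before this PR: user and password are any non-whitespace run.
	oldPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
	// First revision in this PR: user and password limited to \w, '-' and '.'.
	newPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\w-\.]{3,50}:([\w-\.]{3,50})@[-.%\w\/:]+\b`)
)

// passwords returns capture group 1 (the password) of every match of pat in s.
// Hypothetical helper, used only for this illustration.
func passwords(pat *regexp.Regexp, s string) []string {
	var out []string
	for _, m := range pat.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	// Two URIs separated by a comma, with no whitespace between them.
	input := "https://alice:secret1@a.example.com,https://bob:secret2@b.example.com"

	fmt.Println(passwords(oldPat, input)) // [secret2]
	fmt.Println(passwords(newPat, input)) // [secret1 secret2]
}
```

With \S the username part swallows the first URI and the comma, so only one match comes back. The narrower class stops at the first ':' and both credentials are reported.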

pkg/detectors/uri/uri.go (2 review threads, outdated and resolved)
pkg/detectors/uri/uri_test.go (1 review thread, outdated and resolved)
kashifkhan0771 self-assigned this on Jan 10, 2025
@@ -23,7 +23,7 @@ var _ detectors.Detector = (*Scanner)(nil)
var _ detectors.CustomFalsePositiveChecker = (*Scanner)(nil)

var (
-	keyPat = regexp.MustCompile(`\b(?:https?:)?\/\/[\S]{3,50}:([\S]{3,50})@[-.%\w\/:]+\b`)
+	keyPat = regexp.MustCompile(`\b(?:https?:\/\/)?[\w-\.$~!]{3,50}:([\w-\.%$^&#]{3,50})@[-.\w]+\b`)
Contributor

This is missing a large number of valid characters for usernames and passwords. The host pattern is also still fairly permissive and would match things that could never be valid, e.g. @----__-2as-2.
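
Both points can be reproduced with a small sketch against the revised keyPat from the diff above. The two sample strings are invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyPat is the revised pattern quoted in the diff above.
var keyPat = regexp.MustCompile(`\b(?:https?:\/\/)?[\w-\.$~!]{3,50}:([\w-\.%$^&#]{3,50})@[-.\w]+\b`)

func main() {
	samples := []string{
		// The host class [-.\w]+ accepts a "host" that could never be valid: matches.
		"https://user:pass@----__-2as-2",
		// '!' is allowed in the username class but not in the password class: no match.
		"https://user:pa!ss@example.com",
	}
	for _, s := range samples {
		fmt.Printf("%-34s matched=%t\n", s, keyPat.MatchString(s))
	}
}
```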

Contributor Author

How about something like:

\b(?:https?:\/\/)?[\w-\.$~!&'()*+,;=:%-]{3,50}:([\w-\.%$^#&'()*+,;=:%-]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b

Added additional valid characters and fixed the host pattern too.
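
A minimal sketch for trying the proposed pattern locally. The variable name proposedPat and both sample inputs are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// proposedPat is the pattern suggested in the comment above, unchanged.
var proposedPat = regexp.MustCompile(`\b(?:https?:\/\/)?[\w-\.$~!&'()*+,;=:%-]{3,50}:([\w-\.%$^#&'()*+,;=:%-]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b`)

func main() {
	// Percent-encoded characters in the password are now accepted.
	m := proposedPat.FindStringSubmatch("https://user:p%40ss@example.com")
	fmt.Println(m[1]) // p%40ss

	// The scheme is still optional, so plain key:value@word text matches too.
	fmt.Println(proposedPat.MatchString("foo:bar@baz")) // true
}
```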

Contributor

That's looking a bit better. Some notes:

  1. The scheme prefix shouldn't be optional
  2. : isn't a valid username character
  3. Username isn't always required (example)
  4. The username and password patterns are both missing special characters. It would be pragmatic to add all applicable special characters from [[:graph:]], and remove them later if they're causing issues. e.g.,
\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b
  5. The pattern needs to be able to detect port as well as path
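
The notes above can be sketched as follows. basePat is the pattern from the comment, unchanged. extPat appends an optional port and path and is purely a hypothetical illustration of the last note, not a pattern proposed in this thread; its path character class in particular is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// reviewerExpr is the pattern suggested in the review comment above.
const reviewerExpr = `\bhttps?:\/\/[\w!#$%&()*+,\-./;<=>?@[\\\]^_{|}~]{0,50}:([\w!#$%&()*+,\-./:;<=>?[\\\]^_{|}~]{3,50})@[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})?\b`

var (
	basePat = regexp.MustCompile(reviewerExpr)
	// Hypothetical extension: optional ":port" and optional "/path".
	extPat = regexp.MustCompile(reviewerExpr + `(?::\d{1,5})?(?:\/[\w\-./%]*)?`)
)

func main() {
	uri := "https://user:pass@example.com:8443/api/v1"

	// The suggested pattern stops at the host, so port and path are dropped.
	fmt.Println(basePat.FindString(uri)) // https://user:pass@example.com
	// The extended sketch keeps them.
	fmt.Println(extPat.FindString(uri)) // https://user:pass@example.com:8443/api/v1

	// The scheme is required, so scheme-less text no longer matches.
	fmt.Println(basePat.MatchString("user:pass@example.com")) // false

	// The username may be empty ({0,50}); the password is still captured.
	fmt.Println(basePat.FindStringSubmatch("https://:s3cr3t@example.com")[1]) // s3cr3t
}
```

Keeping the path class narrow rather than using \S avoids reintroducing the greedy overlap with neighbouring URIs.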

Development

Successfully merging this pull request may close these issues.

URI detector: regex matches invalid characters, greedily overlaps with other URIs