Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x-] is not correctly handled #2374

Closed
8 tasks done
danny0838 opened this issue Nov 19, 2022 · 24 comments
Closed
8 tasks done

[x-] is not correctly handled #2374

danny0838 opened this issue Nov 19, 2022 · 24 comments
Labels
bug Something isn't working fixed issue has been addressed

Comments

@danny0838
Copy link

danny0838 commented Nov 19, 2022

Prerequisites

  • I verified that this is not a filter list issue. Report any issues with filter lists or broken website functionality in the uAssets issue tracker.
  • This is not a support issue or a question. For any support, questions or help, visit /r/uBlockOrigin.
  • I performed a cursory search of the issue tracker to avoid opening a duplicate issue.
  • The issue is not present after disabling uBO in the browser.
  • I checked the documentation to understand that the issue I am reporting is not normal behavior.

I tried to reproduce the issue when...

  • uBO is the only extension.
  • uBO uses default lists and settings.
  • using a new, unmodified browser profile.

Description

The revised regex analyzer does not correctly handle [x-], which is treated as [\-] and thus x is not calculated during token extraction.

I have raised an issue to the library developer: foo123/RegexAnalyzer#5. Before the library is fixed, we can preprocess the regex string to escape problem-making "-"s in a character group before sending it to the analyzer:

const REGEX_RE_STR_FIXER = /\\[\s\S]|\[([^\\\]]*(?:\\[\S\s][^\\\]]*)*)\]/g;
const REGEX_RE_STR_FIXER_FUNC = (m, cg) => (cg ? '[' + cg.replace(REGEX_RE_STR_FIXER_CG, REGEX_RE_STR_FIXER_CG_FUNC) + ']' : m);
const REGEX_RE_STR_FIXER_CG = /^\^|(?:[^\\\]]|\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[^dDsSwW]))-(?:[^\\\]]|\\(?:x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}|[^dDsSwW]))|\\[\s\S]|(-)/g;
const REGEX_RE_STR_FIXER_CG_FUNC = (m, h) => (h ? '\\-' : m);

// fix analyzer bug for e.g. [x-], [-x], [\d-x], [x-\d]
// https://github.com/foo123/RegexAnalyzer/issues/5
reStr = reStr.replace(REGEX_RE_STR_FIXER, REGEX_RE_STR_FIXER_FUNC);

This is somehow ugly and imperformant, but it may be the only way if we're not going to implement a full fix of the library.

Also note that this does not fix the bug in 5th case yet. Fortunately the case should not exist in a real-world URL filter.

A specific URL where the issue occurs.

https://example.com/xfoo

Steps to Reproduce

  1. Add a user rule /^https?://(?:example\.com|ex\.com)/[x-]foo\b/$doc
  2. Visit the URL

Expected behavior

The page should be blocked.

Actual behavior

The page is not blocked.

uBO version

1.45.3b4

Browser name and version

Independant

Operating System and version

Independant

@gwarser
Copy link

gwarser commented Nov 19, 2022

Maybe this is why regex was reported as invalid when - was last in the group?

@danny0838
Copy link
Author

@gwarser Where is it reported?

ECMAscript spec seems to allow such use case.

@gwarser
Copy link

gwarser commented Nov 19, 2022

@danny0838
Copy link
Author

danny0838 commented Nov 19, 2022

@gwarser I have read the thread but I don't think there is a consensus that trailing - is invalid. I think it is the app/library that has to support it as browsers and the spec support it.

@gwarser
Copy link

gwarser commented Nov 19, 2022

I mean, it's this library limitation.

gorhill added a commit to gorhill/uBlock that referenced this issue Nov 19, 2022
@gorhill
Copy link
Member

gorhill commented Nov 19, 2022

Any more of these cases with regex tokenizer, please add here instead of opening new issue.

@danny0838
Copy link
Author

The developer of Regex.js is fixing the issues and it seems that this issue is mostly resolved by the latest release (with a bug that may happen in a very rare case). Maybe we can simply wait for a while for his working.

@danny0838
Copy link
Author

Issues in foo123/RegexAnalyzer#5 seems resolved in 1.2.0

gorhill added a commit to gorhill/uBlock that referenced this issue Jan 1, 2023
gorhill added a commit to gorhill/uBlock that referenced this issue Jan 1, 2023
@gorhill
Copy link
Member

gorhill commented Jan 8, 2023

Version of 1.2.0 was imported.

@gorhill gorhill closed this as completed Jan 8, 2023
@gorhill gorhill added fixed issue has been addressed bug Something isn't working labels Jan 8, 2023
@micsthepick

This comment was marked as off-topic.

@garry-ut99

This comment was marked as abuse.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@garry-ut99

This comment was marked as abuse.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick

This comment was marked as off-topic.

@micsthepick
Copy link

micsthepick commented Sep 2, 2024

Syntax highlighting is confusing:

/ex-ample\.com/ has no coloration
/example\.com/ has coloration
and yet no message to explain why.

@garry-ut99

This comment was marked as abuse.

@gorhill
Copy link
Member

gorhill commented Sep 2, 2024

/ex-ample\.com/ has no "error"
/example\.com/ has 1 "error"

Orange is not an error, it just means no token can be extracted from the regex, and as such the filter will be less efficient.

A token can be extracted from the first one, ample. No token can be extracted from the second one.

@garry-ut99

This comment was marked as abuse.

@gorhill
Copy link
Member

gorhill commented Sep 2, 2024

The ultimate revision is the one based on a regex analyzer, introduced in gorhill/uBlock@426395a.

@garry-ut99

This comment was marked as abuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixed issue has been addressed
Projects
None yet
Development

No branches or pull requests

5 participants