Releases: cmccomb/human_regex

human_regex v0.3.0

12 Feb 17:11

Full Changelog: v0.2.0...v0.3.0

human_regex v0.2.0

02 Nov 19:21

Major restructuring as well as lots of new functionality!

Single Character (3 of 7)

| Implemented? | Expression | Description |
|---|---|---|
| any() | . | any character except new line (includes new line with s flag) |
| digit() | \d | digit (\p{Nd}) |
| non_digit() | \D | not digit |
| | \pN | One-letter name Unicode character class |
| | \p{Greek} | Unicode character class (general category or script) |
| | \PN | Negated one-letter name Unicode character class |
| | \P{Greek} | negated Unicode character class (general category or script) |

Character Classes (4 of 11)

| Implemented? | Expression | Description |
|---|---|---|
| or(&['x', 'y', 'z']) | [xyz] | A character class matching either x, y or z (union). |
| | [^xyz] | A character class matching any character except x, y and z. |
| | [a-z] | A character class matching any character in range a-z. |
| See below | [[:alpha:]] | ASCII character class ([A-Za-z]) |
| | [[:^alpha:]] | Negated ASCII character class ([^A-Za-z]) |
| or() | [x[^xyz]] | Nested/grouping character class (matching any character except y and z) |
| | [a-y&&xyz] | Intersection (matching x or y) |
| | [0-9&&[^4]] | Subtraction using intersection and negation (matching 0-9 except 4) |
| | [0-9--4] | Direct subtraction (matching 0-9 except 4) |
| | [a-g~~b-h] | Symmetric difference (matching a and h only) |
| | [\[\]] | Escaping in character classes (matching [ or ]) |

Perl Character Classes

| Implemented? | Expression | Description |
|---|---|---|
| digit() | \d | digit (\p{Nd}) |
| non_digit() | \D | not digit |
| whitespace() | \s | whitespace (\p{White_Space}) |
| non_whitespace() | \S | not whitespace |
| word() | \w | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
| non_word() | \W | not word character |

ASCII Character Classes

| Implemented? | Expression | Description |
|---|---|---|
| alphanumeric() | [[:alnum:]] | alphanumeric ([0-9A-Za-z]) |
| alphabetic() | [[:alpha:]] | alphabetic ([A-Za-z]) |
| ascii() | [[:ascii:]] | ASCII ([\x00-\x7F]) |
| blank() | [[:blank:]] | blank ([\t ]) |
| control() | [[:cntrl:]] | control ([\x00-\x1F\x7F]) |
| digit() | [[:digit:]] | digits ([0-9]) |
| graphical() | [[:graph:]] | graphical ([!-~]) |
| lowercase() | [[:lower:]] | lower case ([a-z]) |
| printable() | [[:print:]] | printable ([ -~]) |
| punctuation() | [[:punct:]] | punctuation ([!-/:-@[-`{-~]) |
| whitespace() | [[:space:]] | whitespace ([\t\n\v\f\r ]) |
| uppercase() | [[:upper:]] | upper case ([A-Z]) |
| word() | [[:word:]] | word characters ([0-9A-Za-z_]) |
| hexdigit() | [[:xdigit:]] | hex digit ([0-9A-Fa-f]) |
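
As a rough illustration of what each ASCII class in the table covers, the sketch below re-expresses the bracketed ranges as predicates built on Rust's standard `char` methods. The function `in_ascii_class` is hypothetical and is not part of human_regex; it only assumes the class names and ranges listed in the Description column.

```rust
/// Hypothetical helper (not a human_regex API): reports whether `c` falls in
/// the named ASCII class, or `None` if the class name is not recognised.
fn in_ascii_class(name: &str, c: char) -> Option<bool> {
    Some(match name {
        "alnum" => c.is_ascii_alphanumeric(),
        "alpha" => c.is_ascii_alphabetic(),
        "ascii" => c.is_ascii(),
        "blank" => c == ' ' || c == '\t',
        "cntrl" => c.is_ascii_control(),
        "digit" => c.is_ascii_digit(),
        "graph" => c.is_ascii_graphic(),
        "lower" => c.is_ascii_lowercase(),
        // [ -~] is the graphic range plus the space character.
        "print" => c.is_ascii_graphic() || c == ' ',
        "punct" => c.is_ascii_punctuation(),
        // Spelled out by hand: `char::is_ascii_whitespace` excludes \v (0x0B),
        // which the [[:space:]] range in the table includes.
        "space" => matches!(c, '\t' | '\n' | '\x0B' | '\x0C' | '\r' | ' '),
        "upper" => c.is_ascii_uppercase(),
        "word" => c.is_ascii_alphanumeric() || c == '_',
        "xdigit" => c.is_ascii_hexdigit(),
        _ => return None,
    })
}

fn main() {
    for name in ["lower", "upper", "xdigit"] {
        println!("{} contains 'f': {:?}", name, in_ascii_class(name, 'f'));
    }
}
```

The explicit `space` arm is the one place where the standard library predicate and the listed range differ, so it is written out rather than delegated.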

Repetitions

| Implemented? | Expression | Description |
|---|---|---|
| zero_or_more(x) | x* | zero or more of x (greedy) |
| one_or_more(x) | x+ | one or more of x (greedy) |
| zero_or_one(x) | x? | zero or one of x (greedy) |
| zero_or_more(x).lazy() | x*? | zero or more of x (ungreedy/lazy) |
| one_or_more(x).lazy() | x+? | one or more of x (ungreedy/lazy) |
| zero_or_one(x).lazy() | x?? | zero or one of x (ungreedy/lazy) |
| between(n, m, x) | x{n,m} | at least n x and at most m x (greedy) |
| at_least(n, x) | x{n,} | at least n x (greedy) |
| exactly(n, x) | x{n} | exactly n x |
| between(n, m, x).lazy() | x{n,m}? | at least n x and at most m x (ungreedy/lazy) |
| at_least(n, x).lazy() | x{n,}? | at least n x (ungreedy/lazy) |
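
The mapping in the table can be sketched with a few string-building functions. This is a minimal stand-in using only the standard library, not the crate's actual implementation: the `Pattern` type and the `quantify` helper are hypothetical, and wrapping the operand in `(?:...)` is an assumption made so that the quantifier applies to the whole operand.

```rust
/// Hypothetical stand-in for a pattern type; it only stores regex text.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

impl Pattern {
    /// Append `?` so the preceding quantifier becomes lazy (ungreedy).
    fn lazy(self) -> Pattern {
        Pattern(format!("{}?", self.0))
    }
}

/// Wrap the operand in a non-capturing group, then attach the quantifier.
fn quantify(x: &str, suffix: &str) -> Pattern {
    Pattern(format!("(?:{}){}", x, suffix))
}

fn zero_or_more(x: &str) -> Pattern { quantify(x, "*") }
fn one_or_more(x: &str) -> Pattern { quantify(x, "+") }
fn zero_or_one(x: &str) -> Pattern { quantify(x, "?") }
fn exactly(n: u8, x: &str) -> Pattern { quantify(x, &format!("{{{}}}", n)) }
fn at_least(n: u8, x: &str) -> Pattern { quantify(x, &format!("{{{},}}", n)) }
fn between(n: u8, m: u8, x: &str) -> Pattern {
    quantify(x, &format!("{{{},{}}}", n, m))
}

fn main() {
    let examples = [
        zero_or_more("ab"),
        one_or_more("ab").lazy(),
        zero_or_one("ab").lazy(),
        exactly(3, "ab"),
        at_least(2, "ab").lazy(),
        between(2, 4, "ab"),
    ];
    for p in examples.iter() {
        println!("{}", p.0);
    }
}
```

Modelling laziness as a method on the result keeps each greedy quantifier and its lazy variant as one function plus one modifier, which is the pattern the `.lazy()` rows in the table suggest.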

Composites

| Implemented? | Expression | Description |
|---|---|---|
| + | xy | concatenation (x followed by y) |
| or() | x\|y | alternation (x or y, prefer x) |
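
One way concatenation through `+` and alternation through `or()` could work is sketched below with the standard `std::ops::Add` trait. The `Pattern` type and the `text` helper are hypothetical and do no escaping; this is not the crate's own code, only an illustration of the two rows above.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a pattern type; it only stores regex text.
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

/// `x + y` yields the regex `xy` (x followed by y).
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(format!("{}{}", self.0, rhs.0))
    }
}

/// Hypothetical helper: wraps raw regex text without escaping it.
fn text(s: &str) -> Pattern {
    Pattern(s.to_string())
}

/// Join the options with `|`; earlier options are preferred when several match.
/// The non-capturing group keeps the alternation from absorbing its neighbours.
fn or(options: &[Pattern]) -> Pattern {
    let joined = options
        .iter()
        .map(|p| p.0.as_str())
        .collect::<Vec<_>>()
        .join("|");
    Pattern(format!("(?:{})", joined))
}

fn main() {
    let p = text("a") + or(&[text("x"), text("y")]);
    println!("{}", p.0);
}
```

Without the surrounding group, `a` followed by `x|y` would read as "ax, or y", so grouping inside `or` is the safer default for a composable builder.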

Empty matches

| Implemented? | Expression | Description |
|---|---|---|
| beginning() | ^ | the beginning of text (or start-of-line with multi-line mode) |
| end() | $ | the end of text (or end-of-line with multi-line mode) |
| beginning_of_text() | \A | only the beginning of text (even with multi-line mode enabled) |
| end_of_text() | \z | only the end of text (even with multi-line mode enabled) |
| word_boundary() | \b | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
| non_word_boundary() | \B | not a Unicode word boundary |

Groupings (3 of 5)

| Implemented? | Expression | Description |
|---|---|---|
| capture(exp) | (exp) | numbered capture group (indexed by opening parenthesis) |
| named_capture(exp, name) | (?P<name>exp) | named (also numbered) capture group |
| Handled implicitly through functional composition | (?:exp) | non-capturing group |
| | (?flags) | set flags within current group |
| | (?flags:exp) | set flags for exp (non-capturing) |
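
The three implemented groupings can be sketched as plain string wrappers. The functions below operate on `&str` for brevity and are hypothetical stand-ins, not the crate's signatures; `non_capturing` in particular is an assumed name for the wrapping that the table describes as happening implicitly.

```rust
/// Numbered capture group: (exp)
fn capture(exp: &str) -> String {
    format!("({})", exp)
}

/// Named (and also numbered) capture group: (?P<name>exp)
fn named_capture(exp: &str, name: &str) -> String {
    format!("(?P<{}>{})", name, exp)
}

/// Non-capturing group: (?:exp). Assumed here to be what a composing
/// function would add around its operand behind the scenes.
fn non_capturing(exp: &str) -> String {
    format!("(?:{})", exp)
}

fn main() {
    // A date-like pattern with one named and one numbered group.
    let year = named_capture(r"\d{4}", "year");
    let month = capture(r"\d{2}");
    println!("{}-{}", year, month);
    println!("{}", non_capturing("ab"));
}
```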

Flags (0 of 6)

| Implemented? | Expression | Description |
|---|---|---|
| | i | case-insensitive: letters match both upper and lower case |
| | m | multi-line mode: ^ and $ match begin/end of line |
| | s | allow . to match \n |
| | U | swap the meaning of x* and x*? |
| | u | Unicode support (enabled by default) |
| | x | ignore whitespace and allow line comments (starting with #) |

human_regex v0.1.3

31 Oct 20:39

Character Classes

Single Character

| Implemented? | Expression | Description |
|---|---|---|
| any() | . | any character except new line (includes new line with s flag) |
| digit() | \d | digit (\p{Nd}) |
| non_digit() | \D | not digit |
| | \pN | One-letter name Unicode character class |
| | \p{Greek} | Unicode character class (general category or script) |
| | \PN | Negated one-letter name Unicode character class |
| | \P{Greek} | negated Unicode character class (general category or script) |

Perl Character Classes

| Implemented? | Expression | Description |
|---|---|---|
| digit() | \d | digit (\p{Nd}) |
| non_digit() | \D | not digit |
| whitespace() | \s | whitespace (\p{White_Space}) |
| non_whitespace() | \S | not whitespace |
| word() | \w | word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
| non_word() | \W | not word character |

ASCII Character Classes

| Implemented? | Expression | Description |
|---|---|---|
| | [[:alnum:]] | alphanumeric ([0-9A-Za-z]) |
| | [[:alpha:]] | alphabetic ([A-Za-z]) |
| | [[:ascii:]] | ASCII ([\x00-\x7F]) |
| | [[:blank:]] | blank ([\t ]) |
| | [[:cntrl:]] | control ([\x00-\x1F\x7F]) |
| digit() | [[:digit:]] | digits ([0-9]) |
| | [[:graph:]] | graphical ([!-~]) |
| | [[:lower:]] | lower case ([a-z]) |
| | [[:print:]] | printable ([ -~]) |
| | [[:punct:]] | punctuation ([!-/:-@[-`{-~]) |
| | [[:space:]] | whitespace ([\t\n\v\f\r ]) |
| | [[:upper:]] | upper case ([A-Z]) |
| word() | [[:word:]] | word characters ([0-9A-Za-z_]) |
| | [[:xdigit:]] | hex digit ([0-9A-Fa-f]) |

Repetitions

| Implemented? | Expression | Description |
|---|---|---|
| zero_or_more(x) | x* | zero or more of x (greedy) |
| one_or_more(x) | x+ | one or more of x (greedy) |
| zero_or_one(x) | x? | zero or one of x (greedy) |
| zero_or_more(x).lazy() | x*? | zero or more of x (ungreedy/lazy) |
| one_or_more(x).lazy() | x+? | one or more of x (ungreedy/lazy) |
| zero_or_one(x).lazy() | x?? | zero or one of x (ungreedy/lazy) |
| between(n, m, x) | x{n,m} | at least n x and at most m x (greedy) |
| at_least(n, x) | x{n,} | at least n x (greedy) |
| exactly(n, x) | x{n} | exactly n x |
| between(n, m, x).lazy() | x{n,m}? | at least n x and at most m x (ungreedy/lazy) |
| at_least(n, x).lazy() | x{n,}? | at least n x (ungreedy/lazy) |

Composites

| Implemented? | Expression | Description |
|---|---|---|
| + | xy | concatenation (x followed by y) |
| or() | x\|y | alternation (x or y, prefer x) |

Empty matches

| Implemented? | Expression | Description |
|---|---|---|
| begin() | ^ | the beginning of text (or start-of-line with multi-line mode) |
| end() | $ | the end of text (or end-of-line with multi-line mode) |
| | \A | only the beginning of text (even with multi-line mode enabled) |
| | \z | only the end of text (even with multi-line mode enabled) |
| word_boundary() | \b | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
| non_word_boundary() | \B | not a Unicode word boundary |

Groupings and Flags

| Implemented? | Expression | Description |
|---|---|---|
| | (exp) | numbered capture group (indexed by opening parenthesis) |
| | (?P<name>exp) | named (also numbered) capture group |
| Handled implicitly through functional composition | (?:exp) | non-capturing group |
| | (?flags) | set flags within current group |
| | (?flags:exp) | set flags for exp (non-capturing) |

| Implemented? | Expression | Description |
|---|---|---|
| | i | case-insensitive: letters match both upper and lower case |
| | m | multi-line mode: ^ and $ match begin/end of line |
| | s | allow . to match \n |
| | U | swap the meaning of x* and x*? |
| | u | Unicode support (enabled by default) |
| | x | ignore whitespace and allow line comments (starting with #) |

human_regex v0.1.2

31 Oct 01:44

This release includes some significant additional functionality! There are now functions for the following special and escaped characters:

  • Any
  • Digit
  • Non-Digit
  • Word
  • Non-Word
  • Whitespace
  • Non-Whitespace
  • Beginning
  • End

There are also functions for the following repetition relationships:

  • At least n
  • At most n
  • At least n and at most m
  • Exactly n
  • One or more
  • Zero or one
  • Zero or more

human_regex v0.1.1

30 Oct 21:59

Some initial functionality! There are a few character classes and an exactly-n relationship.