Releases: cmccomb/human_regex
human_regex v0.2.0
Major restructuring as well as lots of new functionality!

Single Character (3 of 7)

| Implemented? | Expression | Description |
| --- | --- | --- |
| `any()` | `.` | any character except new line (includes new line with `s` flag) |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| | `\pN` | One-letter name Unicode character class |
| | `\p{Greek}` | Unicode character class (general category or script) |
| | `\PN` | Negated one-letter name Unicode character class |
| | `\P{Greek}` | negated Unicode character class (general category or script) |

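
For illustration, the implemented rows can be read as functions that emit the listed syntax. The sketch below is std-only and assumes each builder simply returns a `String`; `concat` is a hypothetical helper, and none of this is the crate's actual implementation.

```rust
// Hypothetical stand-ins for the implemented single-character builders.
// Each returns the raw regex syntax listed in the table.
pub fn any() -> String {
    ".".to_string() // any character except newline (unless the s flag is set)
}

pub fn digit() -> String {
    r"\d".to_string() // Unicode decimal digit, \p{Nd}
}

pub fn non_digit() -> String {
    r"\D".to_string() // anything that is not matched by \d
}

// Hypothetical helper: join pieces left to right into one pattern string.
pub fn concat(parts: &[String]) -> String {
    parts.concat()
}
```

For example, `concat(&[digit(), any(), non_digit()])` yields the pattern `\d.\D`.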

Character Classes (4 of 11)

| Implemented? | Expression | Description |
| --- | --- | --- |
| `or(&['x', 'y', 'z'])` | `[xyz]` | A character class matching either x, y or z (union). |
| | `[^xyz]` | A character class matching any character except x, y and z. |
| | `[a-z]` | A character class matching any character in range a-z. |
| See below | `[[:alpha:]]` | ASCII character class (`[A-Za-z]`) |
| | `[[:^alpha:]]` | Negated ASCII character class (`[^A-Za-z]`) |
| `or()` | `[x[^xyz]]` | Nested/grouping character class (matching any character except y and z) |
| | `[a-y&&xyz]` | Intersection (matching x or y) |
| | `[0-9&&[^4]]` | Subtraction using intersection and negation (matching 0-9 except 4) |
| | `[0-9--4]` | Direct subtraction (matching 0-9 except 4) |
| | `[a-g~~b-h]` | Symmetric difference (matching a and h only) |
| | `[\[\]]` | Escaping in character classes (matching `[` or `]`) |

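
A minimal sketch of how a union builder like `or(&['x', 'y', 'z'])` could produce a bracket expression, assuming it returns a plain `String` and backslash-escapes characters that are special inside a class. The escaping rule is an assumption for illustration, not confirmed behavior of human_regex.

```rust
// Hypothetical sketch of a character-class union builder.
// Returns a bracket expression matching any one of the given characters.
pub fn or(chars: &[char]) -> String {
    let mut class = String::from("[");
    for &c in chars {
        // Escape characters that could otherwise form a range (-), a negation (^),
        // a nested class ([ ]), a set operation (&& or ~~), or an escape (\).
        if matches!(c, ']' | '\\' | '^' | '-' | '[' | '&' | '~') {
            class.push('\\');
        }
        class.push(c);
    }
    class.push(']');
    class
}
```

With this rule, `or(&['[', ']'])` produces the `[\[\]]` form from the last row, and `or(&['a', '-', 'z'])` produces `[a\-z]` (three literal characters) rather than the range `[a-z]`.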

Perl Character Classes

| Implemented? | Expression | Description |
| --- | --- | --- |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| `whitespace()` | `\s` | whitespace (`\p{White_Space}`) |
| `non_whitespace()` | `\S` | not whitespace |
| `word()` | `\w` | word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`) |
| `non_word()` | `\W` | not word character |


ASCII Character Classes

| Implemented? | Expression | Description |
| --- | --- | --- |
| `alphanumeric()` | `[[:alnum:]]` | alphanumeric (`[0-9A-Za-z]`) |
| `alphabetic()` | `[[:alpha:]]` | alphabetic (`[A-Za-z]`) |
| `ascii()` | `[[:ascii:]]` | ASCII (`[\x00-\x7F]`) |
| `blank()` | `[[:blank:]]` | blank (`[\t ]`) |
| `control()` | `[[:cntrl:]]` | control (`[\x00-\x1F\x7F]`) |
| `digit()` | `[[:digit:]]` | digits (`[0-9]`) |
| `graphical()` | `[[:graph:]]` | graphical (`[!-~]`) |
| `lowercase()` | `[[:lower:]]` | lower case (`[a-z]`) |
| `printable()` | `[[:print:]]` | printable (`[ -~]`) |
| `punctuation()` | `[[:punct:]]` | punctuation (``[!-/:-@[-`{-~]``) |
| `whitespace()` | `[[:space:]]` | whitespace (`[\t\n\v\f\r ]`) |
| `uppercase()` | `[[:upper:]]` | upper case (`[A-Z]`) |
| `word()` | `[[:word:]]` | word characters (`[0-9A-Za-z_]`) |
| `hexdigit()` | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`) |

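
As a sketch of the ASCII mapping, assuming each builder just wraps a POSIX class name in the `[[:name:]]` bracket syntax and returns a `String`. `posix_class` is a hypothetical helper, not part of the crate.

```rust
// Hypothetical helper: wrap a POSIX class name, e.g. "alpha" -> "[[:alpha:]]".
fn posix_class(name: &str) -> String {
    format!("[[:{}:]]", name)
}

// Builder names follow the table; each maps to exactly one POSIX class name.
pub fn alphabetic() -> String {
    posix_class("alpha")
}

pub fn lowercase() -> String {
    posix_class("lower") // [a-z]
}

pub fn uppercase() -> String {
    posix_class("upper") // [A-Z]
}

pub fn hexdigit() -> String {
    posix_class("xdigit") // [0-9A-Fa-f]
}
```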

Repetitions

| Implemented? | Expression | Description |
| --- | --- | --- |
| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
| `one_or_more(x)` | `x+` | one or more of x (greedy) |
| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
| `zero_or_more(x).lazy()` | `x*?` | zero or more of x (ungreedy/lazy) |
| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
| `zero_or_one(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
| `between(n, m, x)` | `x{n,m}` | at least n and at most m of x (greedy) |
| `at_least(n, x)` | `x{n,}` | at least n of x (greedy) |
| `exactly(n, x)` | `x{n}` | exactly n of x |
| `between(n, m, x).lazy()` | `x{n,m}?` | at least n and at most m of x (ungreedy/lazy) |
| `at_least(n, x).lazy()` | `x{n,}?` | at least n of x (ungreedy/lazy) |

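
The repetition builders can be sketched as wrappers that append a quantifier, with `.lazy()` appending the trailing `?`. This assumes a hypothetical `Pattern` newtype and wraps the operand in `(?:...)` so the quantifier covers all of it; the crate's real types and signatures may differ.

```rust
/// Hypothetical pattern wrapper; not the human_regex crate's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

impl Pattern {
    /// Append `?` to make the preceding quantifier lazy (x*?, x+?, x??, x{n,m}?, x{n,}?).
    pub fn lazy(self) -> Pattern {
        Pattern(format!("{}?", self.0))
    }
}

// Assumption: wrap the operand in a non-capturing group so the quantifier applies to all of it.
fn group(x: &str) -> String {
    format!("(?:{})", x)
}

pub fn zero_or_more(x: &str) -> Pattern {
    Pattern(format!("{}*", group(x)))
}

pub fn one_or_more(x: &str) -> Pattern {
    Pattern(format!("{}+", group(x)))
}

pub fn zero_or_one(x: &str) -> Pattern {
    Pattern(format!("{}?", group(x)))
}

pub fn exactly(n: u32, x: &str) -> Pattern {
    Pattern(format!("{}{{{}}}", group(x), n)) // x{n}
}

pub fn at_least(n: u32, x: &str) -> Pattern {
    Pattern(format!("{}{{{},}}", group(x), n)) // x{n,}
}

pub fn between(n: u32, m: u32, x: &str) -> Pattern {
    Pattern(format!("{}{{{},{}}}", group(x), n, m)) // x{n,m}
}
```

For example, `between(2, 5, "ab")` yields `(?:ab){2,5}`, and `at_least(3, "x").lazy()` yields `(?:x){3,}?`.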

Composites

| Implemented? | Expression | Description |
| --- | --- | --- |
| `+` | `xy` | concatenation (x followed by y) |
| `or()` | `x\|y` | alternation (x or y, prefer x) |

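
Concatenation with `+` maps naturally onto Rust's `Add` trait. In this sketch, `Pattern`, `text`, and the grouping inside `or` are assumptions made for illustration; it shows the idea, not the crate's implementation.

```rust
use std::ops::Add;

/// Hypothetical pattern wrapper (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct Pattern(pub String);

/// Concatenation: `a + b` yields the pattern `ab`.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(self.0 + &rhs.0)
    }
}

/// Hypothetical literal builder: escapes characters that are special outside a class.
pub fn text(s: &str) -> Pattern {
    let mut out = String::new();
    for c in s.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    Pattern(out)
}

/// Alternation: options are tried left to right, so earlier ones are preferred.
/// Wrapped in a non-capturing group so it composes safely with `+`.
pub fn or(options: &[Pattern]) -> Pattern {
    let parts: Vec<&str> = options.iter().map(|p| p.0.as_str()).collect();
    Pattern(format!("(?:{})", parts.join("|")))
}
```

The grouping matters: `text("1.5") + or(&[text("kg"), text("lb")])` yields `1\.5(?:kg|lb)`, whereas an ungrouped `1\.5kg|lb` would match either `1.5kg` or a bare `lb`.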

Empty matches

| Implemented? | Expression | Description |
| --- | --- | --- |
| `beginning()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
| `beginning_of_text()` | `\A` | only the beginning of text (even with multi-line mode enabled) |
| `end_of_text()` | `\z` | only the end of text (even with multi-line mode enabled) |
| `word_boundary()` | `\b` | a Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other) |
| `non_word_boundary()` | `\B` | not a Unicode word boundary |

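
The anchors are zero-width, so builders for them can simply emit the escape. A sketch, assuming `String`-returning builders; `whole_word` and `entire_text` are hypothetical helpers. `entire_text` uses `\A` and `\z` so that, unlike `^` and `$`, the pattern stays pinned to the whole input even with multi-line mode on.

```rust
// Hypothetical builders for the zero-width assertions in the table.
pub fn beginning() -> String {
    "^".to_string()
}

pub fn end() -> String {
    "$".to_string()
}

pub fn beginning_of_text() -> String {
    r"\A".to_string()
}

pub fn end_of_text() -> String {
    r"\z".to_string()
}

pub fn word_boundary() -> String {
    r"\b".to_string()
}

/// Hypothetical helper: match `inner` only as a whole word.
pub fn whole_word(inner: &str) -> String {
    format!("{}{}{}", word_boundary(), inner, word_boundary())
}

/// Hypothetical helper: require `inner` to span the entire input.
pub fn entire_text(inner: &str) -> String {
    format!("{}(?:{}){}", beginning_of_text(), inner, end_of_text())
}
```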

Groupings (3 of 5)

| Implemented? | Expression | Description |
| --- | --- | --- |
| `capture(exp)` | `(exp)` | numbered capture group (indexed by opening parenthesis) |
| `named_capture(exp, name)` | `(?P<name>exp)` | named (also numbered) capture group |
| Handled implicitly through functional composition | `(?:exp)` | non-capturing group |
| | `(?flags)` | set flags within current group |
| | `(?flags:exp)` | set flags for exp (non-capturing) |

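
Capture groups can likewise be sketched as string wrappers, assuming `String`-returning builders. The argument order of `named_capture(exp, name)` follows the table; `group` is a hypothetical helper for the implicit `(?:exp)` form.

```rust
/// Hypothetical stand-in for capture(exp): a numbered group `(exp)`.
pub fn capture(exp: &str) -> String {
    format!("({})", exp)
}

/// Hypothetical stand-in for named_capture(exp, name): `(?P<name>exp)`.
/// The group is still numbered by the position of its opening parenthesis.
pub fn named_capture(exp: &str, name: &str) -> String {
    format!("(?P<{}>{})", name, exp)
}

/// Hypothetical non-capturing wrapper, the `(?:exp)` form handled implicitly by composition.
pub fn group(exp: &str) -> String {
    format!("(?:{})", exp)
}
```

For example, `format!("{}-{}", named_capture(r"\d{4}", "year"), capture(r"\d{2}"))` yields `(?P<year>\d{4})-(\d{2})`; `year` is group 1 and the unnamed group is group 2, because numbering follows opening parentheses.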

Flags (0 of 6)

| Implemented? | Expression | Description |
| --- | --- | --- |
| | `i` | case-insensitive: letters match both upper and lower case |
| | `m` | multi-line mode: `^` and `$` match begin/end of line |
| | `s` | allow `.` to match `\n` |
| | `U` | swap the meaning of `x*` and `x*?` |
| | `u` | Unicode support (enabled by default) |
| | `x` | ignore whitespace and allow line comments (starting with `#`) |

human_regex v0.1.3

Character Classes
Single Character

| Implemented? | Expression | Description |
| --- | --- | --- |
| `any()` | `.` | any character except new line (includes new line with `s` flag) |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| | `\pN` | One-letter name Unicode character class |
| | `\p{Greek}` | Unicode character class (general category or script) |
| | `\PN` | Negated one-letter name Unicode character class |
| | `\P{Greek}` | negated Unicode character class (general category or script) |


Perl Character Classes

| Implemented? | Expression | Description |
| --- | --- | --- |
| `digit()` | `\d` | digit (`\p{Nd}`) |
| `non_digit()` | `\D` | not digit |
| `whitespace()` | `\s` | whitespace (`\p{White_Space}`) |
| `non_whitespace()` | `\S` | not whitespace |
| `word()` | `\w` | word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`) |
| `non_word()` | `\W` | not word character |


ASCII Character Classes

| Implemented? | Expression | Description |
| --- | --- | --- |
| | `[[:alnum:]]` | alphanumeric (`[0-9A-Za-z]`) |
| | `[[:alpha:]]` | alphabetic (`[A-Za-z]`) |
| | `[[:ascii:]]` | ASCII (`[\x00-\x7F]`) |
| | `[[:blank:]]` | blank (`[\t ]`) |
| | `[[:cntrl:]]` | control (`[\x00-\x1F\x7F]`) |
| `digit()` | `[[:digit:]]` | digits (`[0-9]`) |
| | `[[:graph:]]` | graphical (`[!-~]`) |
| | `[[:lower:]]` | lower case (`[a-z]`) |
| | `[[:print:]]` | printable (`[ -~]`) |
| | `[[:punct:]]` | punctuation (``[!-/:-@[-`{-~]``) |
| | `[[:space:]]` | whitespace (`[\t\n\v\f\r ]`) |
| | `[[:upper:]]` | upper case (`[A-Z]`) |
| `word()` | `[[:word:]]` | word characters (`[0-9A-Za-z_]`) |
| | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`) |


Repetitions

| Implemented? | Expression | Description |
| --- | --- | --- |
| `zero_or_more(x)` | `x*` | zero or more of x (greedy) |
| `one_or_more(x)` | `x+` | one or more of x (greedy) |
| `zero_or_one(x)` | `x?` | zero or one of x (greedy) |
| `zero_or_more(x).lazy()` | `x*?` | zero or more of x (ungreedy/lazy) |
| `one_or_more(x).lazy()` | `x+?` | one or more of x (ungreedy/lazy) |
| `zero_or_one(x).lazy()` | `x??` | zero or one of x (ungreedy/lazy) |
| `between(n, m, x)` | `x{n,m}` | at least n and at most m of x (greedy) |
| `at_least(n, x)` | `x{n,}` | at least n of x (greedy) |
| `exactly(n, x)` | `x{n}` | exactly n of x |
| `between(n, m, x).lazy()` | `x{n,m}?` | at least n and at most m of x (ungreedy/lazy) |
| `at_least(n, x).lazy()` | `x{n,}?` | at least n of x (ungreedy/lazy) |


Composites

| Implemented? | Expression | Description |
| --- | --- | --- |
| `+` | `xy` | concatenation (x followed by y) |
| `or()` | `x\|y` | alternation (x or y, prefer x) |


Empty matches

| Implemented? | Expression | Description |
| --- | --- | --- |
| `begin()` | `^` | the beginning of text (or start-of-line with multi-line mode) |
| `end()` | `$` | the end of text (or end-of-line with multi-line mode) |
| | `\A` | only the beginning of text (even with multi-line mode enabled) |
| | `\z` | only the end of text (even with multi-line mode enabled) |
| `word_boundary()` | `\b` | a Unicode word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other) |
| `non_word_boundary()` | `\B` | not a Unicode word boundary |


Groupings and Flags

| Implemented? | Expression | Description |
| --- | --- | --- |
| | `(exp)` | numbered capture group (indexed by opening parenthesis) |
| | `(?P<name>exp)` | named (also numbered) capture group |
| Handled implicitly through functional composition | `(?:exp)` | non-capturing group |
| | `(?flags)` | set flags within current group |
| | `(?flags:exp)` | set flags for exp (non-capturing) |

| Implemented? | Expression | Description |
| --- | --- | --- |
| | `i` | case-insensitive: letters match both upper and lower case |
| | `m` | multi-line mode: `^` and `$` match begin/end of line |
| | `s` | allow `.` to match `\n` |
| | `U` | swap the meaning of `x*` and `x*?` |
| | `u` | Unicode support (enabled by default) |
| | `x` | ignore whitespace and allow line comments (starting with `#`) |

human_regex v0.1.2
This release includes some significant additional functionality! There are now functions for the following special and escaped characters:
- Any
- Digit
- Non-Digit
- Word
- Non-Word
- Whitespace
- Non-Whitespace
- Beginning
- End
There are also functions for the following repetition relationships:
- At least n
- At most n
- At least n and at most m
- Exactly n
- One or more
- Zero or one
- Zero or more
human_regex v0.1.1
Some initial functionality! There are a few character classes and an exactly-n relationship.