20241208; version 4.063:
* Adjusted the pattern compiler to consider surrogate pairs when
generating SIMD data for UTF-16.
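As background for the entry above: in UTF-16, a code point above U+FFFF occupies two 16-bit code units, so any scan that works on individual code units sees two values for one character. The following is only a sketch of the standard encoding rule, not SRELL's code; the function name encode_utf16 is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Code points above U+FFFF need a surrogate pair (two code units).
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        const std::uint32_t v = cp - 0x10000;
        // High (lead) surrogate carries the upper 10 bits.
        units.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        // Low (trail) surrogate carries the lower 10 bits.
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

For example, U+1F600 becomes the two units 0xD83D and 0xDE00, while U+0041 stays a single unit.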
20241208; version 4.062:
* Simplified debugging macro checks.
* Minor improvements and cleanup.
20241204; version 4.061:
* Introduced a simple SIMD acceleration (x86/x64 only).
20241101; version 4.060:
* Corrected several offset values that had been off by 1 in
srell_updata3.h since the format change of data tables in 4.030.
Because of this problem, the pattern compiler had failed to look up
a Unicode property value being last in code point order, e.g.,
\p{sc=Zzzz}, \p{space}. (Thanks to Eugene Levelev for the bug
report).
* Updated unicode/updataout3.cpp to output a data file in which the
problem above is fixed.
* Cancelled the code change in 4.050 that had caused a problem which
had been fixed in 4.059, in favour of code simplification.
20241016; version 4.059:
* Fixed a problem that caused match_results::length(), position(), and
str() not to compile when an argument was omitted since version
4.050 (Thanks to Winfried Schenke for the bug report).
20241004; version 4.058:
* Now modifiers (?ims-ims:) are enabled by default.
20240922; version 4.057:
* Fixed an issue that caused the pattern compiler not to return an
error even when an invalid Unicode property name or value is
specified in \p{}/\P{} since version 4.054.
* Removed replace() and split() from basic_regex.
* Removed SRELL_CPP* macros.
* Updated misc/conftest.cpp. The wrong mask value bug fix (same fix in
srell.hpp 4.052) and changes for the removal of SRELL_CPP* macros.
* Updated unicode/updataout3.cpp.
20240911; version 4.056:
* Updated ucfdata2.h and updata3.h to support Unicode 16.0.0.
* Other minor modifications.
20240904; version 4.055:
* Adjusted the prefilter for UTF-8/UTF-16 so that a search against a
string containing many characters encoded in multiple code units
(0080..10FFFF in UTF-8, 10000..10FFFF in UTF-16) is not slowed down
excessively.
* Lowered the default value of limit_counter from 1 << 24 to 1 << 21.
* Removed all member functions from regex_traits as unused.
20240831; version 4.054:
* Code size reduction. Unified parsers that had been separate for
u-mode and v-mode.
20240824; version 4.053:
* Simplified the creation of the predefined character classes.
* Improved internal UTF-8 iterators. Reduced the number of conditional
jumps.
20240818; version 4.052:
* Fixed the wrong mask value in utf16_traits.
* Minor improvements.
20240816; version 4.051:
* Reimplemented the optimisation introduced in 4.050 in a different
way to minimise memory usage.
* Minor improvements.
20240810; version 4.050:
* Added a new optimisation for C{n,m} where C is a character or
character class and n < m != infinity.
* Minor improvements.
20240720; version 4.049:
* Added a new flag, "sticky", to syntax_option_type.
* Added split_aptrange() to regex_iterator2.
* Removed the feature of generating a data file in an old format from
ucfdataout2.cpp and updataout3.cpp.
20240714; version 4.048:
* Removed two types of internal iterators, which read a code point
value at the current position or the previous position while keeping
their position unchanged.
* Modified internal UTF-8 iterators not to accept non-shortest forms.
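A non-shortest (overlong) form encodes a code point in more bytes than necessary, for example 0xC0 0x80 for U+0000. The sketch below shows one common way to reject such sequences; it is an illustration under stated assumptions, not SRELL's iterator. The name decode_utf8_strict and the 0xFFFFFFFF error value are hypothetical, and the surrogate range check is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes the UTF-8 sequence starting at s[0], with n
// bytes available. Returns the code point, or 0xFFFFFFFF if the sequence is
// truncated, malformed, or a non-shortest form.
std::uint32_t decode_utf8_strict(const unsigned char* s, std::size_t n)
{
    const std::uint32_t invalid = 0xFFFFFFFFu;
    if (n == 0) return invalid;

    const unsigned char b0 = s[0];
    if (b0 < 0x80) return b0;  // single-byte (ASCII) sequence

    std::size_t len;
    std::uint32_t cp;
    std::uint32_t min;  // smallest code point that needs `len` bytes
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return invalid;          // stray trailing byte or invalid lead byte

    if (n < len) return invalid;
    for (std::size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80) return invalid;  // not a trailing byte
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min) return invalid;       // non-shortest form
    if (cp > 0x10FFFF) return invalid;  // beyond the Unicode range
    return cp;
}
```

With this check, 0xC3 0xA9 decodes to U+00E9, whereas 0xC0 0x80 and 0xE0 0x80 0xAF are rejected.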
20240707; version 4.047:
* Performance improvement in searching with srell::regex (only if
CHAR_BIT is 8), srell::u8cregex, or srell::u8regex.
20240613; version 4.046:
* Code size reduction. SRELL no longer gives priority to finding a
literal sequence.
* Various minor improvements and fixes.
20240608; version 4.045:
* Implemented the regex modifiers feature. But until the proposal is
merged into the draft specification of ECMAScript, this feature is
disabled and available only when SRELL_ENABLE_MODIFIERS is defined.
* Added a missing check to see whether a backreference number exceeds
the max number of capturing groups or not, which should have been
added with the modification for the duplicate named capturing groups
support in version 4.043.
20240602; version 4.044:
* Added several missing #if ~ #endif directives for
SRELL_NO_NAMEDCAPTURE.
* Retired the older state insertion function in favour of the newer
one.
20240526; version 4.043:
* Implemented the duplicate named capturing groups feature.
20240524; version 4.042:
* Expanded the scope of the optimisation for * and + also to support
C{n,} where C is a character or character class and n >= 2.
* Introduced the unified stack, which is used when either of the
following conditions is met:
1) The iterator passed to the matching function is a pointer, or
2) std::is_trivially_copyable is supported by the compiler and for
the type I of the passed iterator,
std::is_trivially_copyable<I>::value is true.
Otherwise separate stacks that have been present from early versions
are used.
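The two conditions in the 4.042 entry can be expressed as a compile-time predicate. This is a minimal sketch assuming a C++11 compiler that provides std::is_trivially_copyable; the names can_use_unified_stack and copying_iterator are hypothetical and do not come from SRELL.

```cpp
#include <type_traits>

// Hypothetical trait: true when the iterator is a pointer or is trivially
// copyable, i.e. when the conditions listed for the unified stack hold.
template <typename Iterator>
struct can_use_unified_stack
    : std::integral_constant<bool,
          std::is_pointer<Iterator>::value ||
          std::is_trivially_copyable<Iterator>::value>
{
};

// Hypothetical iterator-like type with a user-provided copy constructor,
// which makes it not trivially copyable.
struct copying_iterator
{
    const char* p;
    explicit copying_iterator(const char* q) : p(q) {}
    copying_iterator(const copying_iterator& other) : p(other.p) {}
};
```

Here can_use_unified_stack<const char*>::value is true, and can_use_unified_stack<copying_iterator>::value is false, so the latter would fall back to the separate stacks.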
20240519; version 4.041:
* Completed the temporary fix in 4.040.
* Removed unused functions.
* Fixed a potential issue on systems where memory of more than 64 GB
can be allocated.
20240131; version 4.040:
* Restored one more line for ?? (non-greedy {0,1}) not to cause an
optimisation bug.
20240127; version 4.039:
* Restored some code that had been removed mistakenly in 4.037.
20240124; version 4.038:
* Fixed a bug that caused /(?:ab)+|cd/ to match "ababcd".
Condition: Both sides of | begin with different characters, and the
left side character is contained in (?:)+.
* Minor improvements.
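The expected behaviour for the pattern in the 4.038 entry can be checked with a few lines. The sketch below uses std::regex as a stand-in so that it is self-contained, and it assumes that "match" in the entry refers to a whole-string match (regex_match); with SRELL one would presumably use srell::regex and srell::regex_match in the same way. The function name matches_whole is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match against the pattern from the
// 4.038 entry, using std::regex (ECMAScript grammar) as a stand-in.
bool matches_whole(const std::string& subject)
{
    static const std::regex re("(?:ab)+|cd");
    return std::regex_match(subject, re);
}
```

A correct implementation accepts "abab" and "cd" but rejects "ababcd", because neither alternative covers the whole string.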
20240122; version 4.037:
* Fixed an optimisation bug that caused /(?:a|ab|abc)$/ to match "ac"
since version 4.021.
Condition: (?:A|B|C) where A is a prefix of B, and B is a prefix of
C.
-> A path from the end of A to a suffix of C occurred through the
wrong optimisation. This path was usually hidden, but could be
used when backtracking is performed.
* Other various improvements and fixes.
20240114; version 4.036:
* Improvement and bugfix of lookaround (lookahead and lookbehind):
1. Removed unnecessary stack operations.
2. Restored the state type that had been removed in version 3.003 so
that the value of the first capturing group in /(?:(?=(\w))|b)c$/
against "abc" will be undefined, not "b".
Condition: 1. A lookaround assertion contains a capturing group,
2. After the lookaround assertion is successful, matching with
succeeding expressions fails, 3. Another subpattern separated by
'|' is tried and a match with total expressions is found.
-> A subsequence captured by the group in the lookaround remained
without being reverted to "undefined".
* Replaced misc/sample01.cpp with conftest.cpp.
* Tagged each kind of epsilon.
20231229; version 4.035:
* Improved case folding of character classes. (Compilation of \p{Any}
was a bit slow when the icase flag was set).
* Several preparations for (?i:) support.
* Updated updataout3.cpp. It could not be compiled because an internal
namespace was changed in the previous version.
20231209; version 4.034:
* Modified to use std::contiguous_iterator when it is available, to
check if the iterator passed to the matching function is a
contiguous_iterator.
* Modified match_results::operator[]() not to throw error_backref when
a group name not existing in the regular expression is passed as
an argument, but to return a reference to a sub_match object
representing an unmatched sub-expression.
In accordance with this change, now
match_results::operator[](size_type n) also returns the same object
when n >= size(). (Behaviour accordant to std::regex. Until the
previous version, this object was returned only when
SRELL_STRICT_IMPL is defined).
* Implemented the no throw/exception mode.
* For the no throw mode, added basic_regex::ecode() that returns
error_type that should have been thrown during the previous pattern
compilation.
* For the no throw mode, added match_results::ecode() that returns
error_type that should have been thrown during the previous pattern
matching/searching.
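The entry above describes the out-of-range behaviour of match_results::operator[] as following std::regex. As a sketch of that behaviour, here is the std::regex side only, since it can be shown without the SRELL header; the function name group_matched is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: reports whether group n took part in the first match
// of re in s. For n >= m.size(), std::match_results::operator[] returns a
// sub_match representing an unmatched sub-expression instead of throwing.
bool group_matched(const std::string& s, const std::regex& re, std::size_t n)
{
    std::smatch m;
    if (!std::regex_search(s, m, re)) return false;
    return m[n].matched;
}
```

For the pattern (\d+)-(\d+) and the subject "2023-12", groups 1 and 2 report true, while index 7 simply reports false.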
20230926; version 4.033:
* Fixed a bug that could cause a crash on 64-bit systems since version
4.020 (Thanks to Yuriy Skvortsov for the bug report).
Condition: 3 or more Alternatives begin with the same character such
as /ab|ac|ad/.
* Removed an unused member function from utf_traits.
* Some clean-ups.
20230916; version 4.032:
* Added directives for decoders of UTF-8/UTF-16 to be always inlined.
* Improved several internal functions that call the automaton.
20230913; version 4.031:
* Updated ucfdata2.h and updata3.h to support Unicode 15.1.0.
* Updated updataout3.cpp so that the "Unknown" value can be used as a
value for Script/Script_Extensions of Unicode property escapes.
Although this value is mentioned in Scripts.txt, SRELL did not
support it because it was not included in the table in the
ECMAScript specification which shows what script names must be
supported. However, as the table was removed from the specification
and the rationale to exclude it has disappeared, SRELL has begun to
support this value, following V8.
20230909; version 4.030:
* Modified the pattern compiler not to create a rewinder only for ^,
$, or \b/\B.
* Introduced the binary search to look up names and values for Unicode
properties.
* In accordance with the change above, updated unicode/updataout2.cpp
to updataout3.cpp.
Furthermore, since the ECMAScript specification ceased to list the
script names that must be supported, modified to read them from
Scripts.txt and PropertyValueAliases.txt.
* Changed the suffix of Unicode data files from *.hpp to *.h.
* Updated unicode/ucfdataout2.cpp to follow the suffix change above.
20230903; version 4.029:
* Updated unicode/updataout2.cpp to fix an issue that caused a
compilation error because internal integer types were unified in
version 4.023.
* Recreated srell_updata2.hpp (Data for the two scripts that were
newly added to Unicode 15 were missing from the previous version.
Apparently, it was output by the previous version of
updataout2.cpp).
20230831; version 4.028:
* Improved matching so that the automaton is not called for Unicode
code point values that cannot be held by a single char/wchar_t when
the regex or wregex type is used.
20230821; version 4.027:
* Fixed a bug that caused the first capturing group to be empty in the
match result of /(?:(\d+-)?)+(\d{1,2})-(\d{1,2})/ against
"2023-8-21" (This bug had existed since an early version).
* Fixed a bug that caused the entire match to be only "23-8-21" in the
same match result (This bug was introduced in 4.019 and was not
covered by the fix in 4.026).
20230820; version 4.026:
* Fixed a bug that caused a search for /(\d+-)?\d{1,2}-\d{1,2}/ in
"2023-8-20" to match only "23-8-20" since version 4.019.
20230819; version 4.025:
* To avoid movzx, changed the type to hold a flag in the internal
representation from bool to an integer type.
* Replaced names of member variables in structs that are used
frequently in the automaton with shorter names.
20230817; version 4.024:
* Commented out the code for an optimisation that had become unused
since 4.019.
* Various minor improvements and fixes.
20230804; version 4.023:
* Unified two internal integer types to one type.
* Simplified several optimisations that had become less effective
because of the entry state selector introduced in 4.019.
* Improved the entry state selector.
* Corrected misnamed variable names.
20230730; version 4.022:
* Refinement of source code and various minor fixes.
20230727; version 4.021:
* Improved the branch optimisation so that SRELL can optimise
Alternatives without inserting new additional internal states.
20230724; version 4.020:
* Simplified the method of converting properties of strings to
internal representations.
* Other minor improvements.
* [4.000-4.019, v flag mode] Because there was a mistake in the setting
for compiling my own source files to a release version, the v flag
mode was not correctly implemented in SRELL 4.000-4.019. Adding the
following line at the end of the optimise_pos() function fixes the
problem:
insert_btbranch(piece, ins_bt);
The bug originated from the fact that this function was not called
from anywhere.
20230114; version 4.019:
* Implemented a new entry state selector.
20230109; version 4.018:
* Reverted the merging of automata that was done in version 4.016,
because once I began to modify the pattern compiler for i-modifier
support, an icase search performance degradation that had not
surfaced in preliminary examinations appeared.
20230107; version 4.017:
* Fixed a bug that caused compilation to fail since version 4.006 when
bidirectional iterators were passed to the matching function.
20230106; version 4.016/3.018 (@ only):
* Merged four automata into two (preparation for i-modifier support).
@ Fixed the pattern compiler not to treat /a{0,0}/ as an error.
@ Other minor fixes.
20221227; version 4.015:
* Fixed a minor issue in regex_iterator2 that was treated as an error
by VC when _ITERATOR_DEBUG_LEVEL >= 1.
* Other improvements.
20221220; version 4.014:
* Supplemented some member functions that were missing accidentally
from match_results in the previous release.
* Simplified regex_token_iterator.
20221220; version 4.013:
* Fixed a minor issue in split(). When splitting "abc" by /$/, split()
had returned {"abc", ""} instead of the correct {"abc"}.
* Reduced the number of overload functions of replace(). Now when the
lambda expression is used, the type of match_results that will be
passed to a callback function needs specifying explicitly as the
template argument.
* Added regex_iterator2.
20221216; version 4.012:
* Fixed replace(). VC2005 could not compile it.
20221214; version 4.011/3.017 (@ only):
@ [LWG Issue 3204] Added swap() to sub_match.
* Modified replace() so that it can replace any container type that
looks like std::basic_string.
* Added srell::str_clip.
* Added overload functions to split() that support a pair of iterators
and a pointer.
20221212; version 4.010:
* Adjusted the behaviour of split() in accordance with the document.
While the document says sub_match is pushed to a list container,
in the code basic_string was pushed to the list container.
* Added overloads to the sub_match class that support implicit and
explicit conversion to std::basic_string instantiated with a custom
traits/allocator.
20221210; version 4.009/3.016 (@ only):
@ Fixed a problem so that regex_iterator.prefix().matched can be set
to true in incrementing after it matched an empty sequence.
@ Fixed the core of the matching functions not to cause a compilation
error when an object of match_results instantiated with a custom
allocator is provided.
* Added new member functions to basic_regex as API extensions.
20221130; version 4.008:
* Priority was given back to the BMH matcher from the finder
introduced in 4.006.
* Minor improvements of ^ and $ in the multiline mode and \b, \B.
20221124; version 4.007:
* Added support for the un-bounded flag modifiers ((?ims-ims)), which
are available only at the beginning of a regular expression (the
same as Python 3.11).
Note: This feature is not defined in the ECMAScript specification
nor compatible with the regexp-modifiers proposal. This feature can
be disabled by defining SRELL_NO_UBMOD.
20221123; version 4.006:
* Added a new finder for expressions whose first matching character is
a single character.
20221030; version 4.005/3.015 (@ only):
@ Fixed a problem that caused undefined behaviour in conditions where
sizeof (int) != sizeof (long), e.g. LP64 (4/8/8). (Thanks to Travers
Ching for the report).
* Updated unicode/ucfdataout2.cpp and updataout2.cpp. Now they can
compile even without srell_ucfdata2.hpp and srell_updata2.hpp.
* Some clean-ups.
20221022; version 4.004/3.014:
* Updated srell_ucfdata2.hpp and srell_updata2.hpp to support Unicode
15.0.0.
* Updated unicode/updataout2.cpp to support Unicode 15. (Support in
advance new script names that are expected to be available in RegExp
of ECMAScript 2023).
* Removed some code that had become unused or meaningless as a result
of the previous backreference bug fixes.
20221012; version 4.003/3.013:
* Re-refixed the backreference bug. Incidentally, this bug was
introduced along with the addition of the variable-length lookbehind
feature. Therefore, SRELL versions 2.000- have this bug.
(It originated from the fact that in a variable length lookbehind
assertion, it is possible that the parser encounters a backreference
prior to the corresponding bracket pair, such as /(?<=\1\s+(\d+))/,
and the parser cannot know immediately whether the corresponding
capturing bracket pair really exists in the expression).
20221012; version 4.002/3.012:
* Refixed the backreference bug in a different way because the fix of
20221011 did not cover the problem caused by such an expression as
/(?:\1+)*()/. Fixed also an infinite loop caused by an expression
like /()(?:\1+)*/.
20221011; version 4.001/3.011 (@ only):
@ Fixed a bug that caused dereferencing a null pointer or infinite
loop when a backreference is followed by * or +, and the
backreference appears prior to the close bracket of the
corresponding pair of capturing brackets, such as /\1*()/, /(\1+)/.
(Thanks to @datadiode, the author of srellcom, for finding the bug).
* In accordance with the ECMAScript specification, restricted
positions where '-' can be written without escaping in character
classes. Now '-' following a predefined character class such as \d,
\s causes an error, unless it is the final character in a character
class. ([\s-0] causes an error, [\s-] is accepted).
* Adjusted internal UTF-8 iterators.
20220618; version 4.000:
* Added support for the v-flag mode that is expected to be added to
a future version of ECMAScript.
* Changed the format of srell_updata.hpp and renamed to
srell_updata2.hpp.
* In accordance with the change above, unicode/updataout.cpp was
updated and renamed to updataout2.cpp.
* Fixed an issue of a struct layout that clang-tidy warns as
"excessive padding" on 64-bit systems (Thanks for the report).
* Updated unicode/ucfdataout2.cpp.
20220529; version 3.010:
* Reduced the amount of memory used to hold a character class that
contains Unicode property escapes.
* Changed the value of error_type thrown when an invalid name or value
is specified in curly brackets of \p or \P, from
regex_constants::error_escape to newly-introduced
regex_constants::error_property.
* Other minor improvements.
20220511; version 3.009:
* Fixed an optimisation bug that caused /abcd|ab/ not to match "abc".
20220504; version 3.008:
* Fixed the behaviour of [^\P{...}] when the icase flag is set, as it
behaved similarly to the one in v-mode that has been proposed in
TC39.
20220429; version 3.007:
* Further modification to the counter mechanism.
20220428; version 3.006:
* Modified the mechanism of the counter used for repetition.
* Re-removed the implementation of linear search for small character
classes.
20220424; version 3.005:
* Fixed a bug that caused /(?<=$.*)/ not to match the end of "a" when
the multiline flag is set.
* Preparations for \A, \z, (?m:) that have been proposed in TC39.
20220420; version 3.004:
* Added a new optimisation for /A*B/ and /A+B/ where a character class
A overlaps a character or character class B, such as /[A-Za-z]+ing/,
/".*"/.
20220416; version 3.003:
* Combined two optimisation functions into one.
* Reduced the amount of code for lookaround (lookahead and lookbehind)
assertions.
20220416; version 3.002:
* Fixed a bug that caused regex_match or regex_search with the
match_continuous flag being set to fail when the entry state
selector introduced in version 3.000 was used internally.
20211025; version 3.001:
* Removed the code for splitting the counter as it seemed to have no
effect or to make performance a bit worse.
* Fixed potential bugs.
* Minor improvements.
20211023; version 3.000:
* Updated srell_ucfdata2.hpp and srell_updata.hpp to support Unicode
14.0.0.
* Updated unicode/updataout.cpp to support Unicode 14. (Support in
advance new script names that are expected to be available in RegExp
of ECMAScript 2022).
* Changed the type used to store a Unicode value when char32_t is not
available, from an "unsigned integer type with width of at least 21
bits" to "one of at least 32 bits".
* Changed the type used to store a repetition count or character class
number when char32_t is not available, from "unsigned int" to
"unsigned integer type of at least 32-bit width".
* Added overflow check in the function that translates digits into a
numeric value. For example, while up to the previous version
/a{0,4294967297}/ was treated as /a{0,1}/ because of overflow when
the unsigned int type is 32-bit width, SRELL now throws error_brace
in cases like this.
* Fixed a bug that caused /[^;]*^;?/ not to match the beginning of an
input string when the multiline flag is not set.
* Implemented a very simple and limited entry state selector.
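The overflow check mentioned in the 3.000 entry can be sketched as follows. With 32-bit arithmetic, 4294967297 wraps around to 1, which is why /a{0,4294967297}/ used to behave like /a{0,1}/. This is an illustrative sketch rather than SRELL's parser; the name parse_count is hypothetical, and where SRELL throws error_brace the sketch just returns false.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: translates decimal digits into a 32-bit value.
// Returns false for an empty or non-digit string and when the value would
// not fit in 32 bits.
bool parse_count(const std::string& digits, std::uint32_t& out)
{
    if (digits.empty()) return false;

    const std::uint32_t max = 0xFFFFFFFFu;
    std::uint32_t value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') return false;
        const std::uint32_t d = static_cast<std::uint32_t>(c - '0');
        // value * 10 + d <= max  is equivalent to  value <= (max - d) / 10
        if (value > (max - d) / 10) return false;
        value = value * 10 + d;
    }
    out = value;
    return true;
}
```

The largest accepted input is "4294967295"; "4294967297" is reported as an overflow instead of silently becoming 1.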
20211004; version 2.930:
* Added new typedefs whose prefix is u1632w- and support UTF-16 or
UTF-32 depending on the value of WCHAR_MAX. (When 0xFFFF <=
WCHAR_MAX < 0x10FFFF, u1632w- types are aliases of u16w- types.
When 0x10FFFF <= WCHAR_MAX, u1632w- types are aliases of u32w-
types).
* Reduced the amount of memory used for Eytzinger layout search.
* Various improvements. (Some of them are based on suggestions to NIRE
by Marko Njezic).
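The WCHAR_MAX rule for the u1632w- typedefs in the 2.930 entry amounts to a compile-time selection. The sketch below only models the decision itself; the enum wide_encoding and both function names are hypothetical, and the real typedefs are not reproduced here.

```cpp
#include <cwchar>  // WCHAR_MAX

// Hypothetical tag for the encoding selected for wchar_t-based types.
enum class wide_encoding { utf16, utf32, unsupported };

// Applies the rule from the entry to an arbitrary maximum value:
// 0xFFFF <= max < 0x10FFFF selects UTF-16, 0x10FFFF <= max selects UTF-32.
constexpr wide_encoding encoding_for(unsigned long long wchar_max)
{
    return wchar_max >= 0x10FFFF ? wide_encoding::utf32
         : wchar_max >= 0xFFFF   ? wide_encoding::utf16
                                 : wide_encoding::unsupported;
}

// The selection for the platform this code is compiled on.
constexpr wide_encoding wchar_encoding()
{
    return encoding_for(static_cast<unsigned long long>(WCHAR_MAX));
}
```

On a platform with a 16-bit wchar_t this yields utf16, and with a 32-bit wchar_t it yields utf32.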
20210624; version 2.920:
* Added a new optimisation for the quantifier '?' (I.e., {0,1}).
* Changed the version number of the ECMAScript specification
referenced in misc/sample01.cpp to 2021.
20210429; version 2.912:
* Fixed another bug in the optimisation introduced in version 2.900,
which caused /aa|a|aa/ not to match "a" (Thanks to Jan Schrötter for
the report).
Incidentally, this optimisation can be disabled by defining
SRELLDBG_NO_BRANCH_OPT2 prior to including srell.hpp.
20210424; version 2.911:
* Fixed a bug in the optimisation introduced in version 2.900, which
caused /abc|ab|ac/ not to match "ac". (Thanks for the bug report [As
my email to the reporter was rejected by the email server and
returned, it is unclear whether mentioning the name here is okay
with the reporter. So, I refrain]).
20210407; version 2.910:
* Fixed a potential memory leak in move assignment operators used by
the pattern compiler since 2.900. (Thanks to Michal Švec for the
report).
20210214; version 2.901:
* Removed redundant template specialisations.
20210214; version 2.900:
* Added a new optimisation for the alternative expression that
consists of string literals, such as /abc|abd|acde/.
* Fixed the problem that caused u(8|16)[cs]regex_(token_)?iterator
(i.e., regex (token) iterators specialised for char8_t or char16_t)
to fail to compile.
* Minor improvements.
20210131; version 2.810:
* Improved internal UTF-8 iterators.
20200724; version 2.800:
* Introduced the Eytzinger layout for binary search in the character
class.
* Reimplemented linear search for small character classes.
* Modified handling of the property data used for parsing the name for
a named capturing group. Now they are loaded only when needed
instead of being loaded into an instance of basic_regex always.
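The Eytzinger layout stores a sorted sequence in the order of a breadth-first traversal of an implicit binary search tree, so that a lookup walks from index k to 2k or 2k+1. The following is a generic sketch over single values, assuming nothing about how SRELL stores its character class ranges; the names fill_eytzinger, make_eytzinger and eytzinger_contains are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills tree[k] and its subtrees by an in-order walk over the implicit tree,
// consuming the sorted input from left to right.
void fill_eytzinger(const std::vector<std::uint32_t>& sorted,
                    std::vector<std::uint32_t>& tree,
                    std::size_t& next, std::size_t k)
{
    if (k < tree.size()) {
        fill_eytzinger(sorted, tree, next, 2 * k);      // left subtree
        tree[k] = sorted[next++];
        fill_eytzinger(sorted, tree, next, 2 * k + 1);  // right subtree
    }
}

// Builds the 1-based Eytzinger array; slot 0 is unused.
std::vector<std::uint32_t> make_eytzinger(const std::vector<std::uint32_t>& sorted)
{
    std::vector<std::uint32_t> tree(sorted.size() + 1);
    std::size_t next = 0;
    fill_eytzinger(sorted, tree, next, 1);
    return tree;
}

// Membership test: descend to child 2k (smaller) or 2k + 1 (larger).
bool eytzinger_contains(const std::vector<std::uint32_t>& tree, std::uint32_t x)
{
    std::size_t k = 1;
    while (k < tree.size()) {
        if (tree[k] == x) return true;
        k = 2 * k + (tree[k] < x ? 1 : 0);
    }
    return false;
}
```

For the sorted input 1, 3, 5, 7, 9, 11, 13 the root tree[1] is 7, with 3 and 11 as its children. The usual motivation for this layout is that the nodes visited first sit close together in memory.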
20200714; version 2.730:
* Added code to prevent redundant save and restore operations when
nested capturing round brackets are processed.
* Improved regex_iterator.
20200703; version 2.720:
* Improved case-insensitive (icase) search using the
Boyer-Moore-Horspool algorithm for UTF-8 string that includes
non-ASCII characters or UTF-16 string that includes non-BMP
characters.
* Fixed a bug that caused regex_iterator->prefix().first to point to
the beginning of the subject string instead of the end of the
previous match (regression introduced in version 2.650, when
three-iterators overloads were added to regex_search()).
* In accordance with the fix above, when a three-iterators version of
regex_search() is called, now match_results.position() returns a
distance from the position passed as the lookbehind limit (3rd
param of regex_search) and match_results.prefix().first points to
the position passed as the beginning of the subject string (1st
param of regex_search).
* Fixed a bug that could cause a valid UTF-8 sequence being adjacent
to an invalid UTF-8 sequence to be skipped when the BMH algorithm
was used (regression introduced in version 2.630, when UTF-8
handling was modified).
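For readers unfamiliar with the Boyer-Moore-Horspool (BMH) algorithm referred to in these entries, here is a minimal byte-oriented sketch. It is the textbook case-sensitive form, not SRELL's implementation, and it does nothing about the UTF-8/UTF-16 boundary issues discussed above; the name bmh_find is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first occurrence of pattern
// in text, or std::string::npos if there is none.
std::size_t bmh_find(const std::string& text, const std::string& pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    // Shift table: distance from the last occurrence of each byte in the
    // pattern (excluding its final position) to the end of the pattern.
    std::vector<std::size_t> shift(256, m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos <= n - m) {
        // Compare from the end of the window towards its beginning.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) return pos;
        // Skip according to the byte aligned with the end of the pattern.
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

Because the table is indexed by single bytes, a search over UTF-8 text operates on code units, which is the setting in which the alignment problems described in the entries above can arise.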
20200701; version 2.710:
* Minor modifications to Boyer-Moore-Horspool search.
20200630; version 2.700:
* Optimisation adjustments.
20200620; version 2.651:
* Moved the group name validity check to after parsing the \u escape.
* Updated misc/sample01.cpp to version 1.103. Changed the version
number of the ECMAScript specification referenced to 2020 (ES11).
20200618; version 2.650:
* Added overloads to the element access functions in match_results
for specifying the group name by a pointer.
* When a three-iterators version of regex_search() is used, SRELL now
sets match_results::prefix::first to the position passed as the
lookbehind limit (third param) instead of the position passed as
the beginning of the subject (first param).
* Removed some operations that seem to be redundant.
20200601; version 2.643:
* Added "inline" to operators in syntax_option_type and
match_flag_type types, based on a report that it is needed not to
cause the multiple definition error.
* Minor improvements.
20200530; version 2.642:
* Reduced the size of memory allocated by the basic_regex instance.
20200528; version 2.641:
* The fix in 2.640 was incomplete. Fixed the optimisation bug 1 again.
* Optimisation adjustments.
20200516; version 2.640:
* Fixed an optimisation bug 1: It was possible for regex_match to pass
the end of a subject string under certain conditions.
* Fixed an optimisation bug 2: ^ and $ were not given a chance to
match an appropriate position in some cases when the multiline flag
is set to true.
* Updated srell_ucfdata2.hpp and srell_updata.hpp.
20200509; version 2.630:
* SRELL's pattern compiler no longer permits invalid UTF-8 sequences
in regular expressions. It throws regex_utf8. (Invalid UTF-8
sequences in the subject string are not treated as an error.)
* Fixed BMH search functions not to include extra (invalid) UTF-8
trailing bytes following the real matched substring, in a returned
result.
* Fixed minor issues: 1) basic_regex.flags() did not return the
correct value in some cases, 2) match_results.format() did not
replace $<NAME> with an empty string when any capturing group whose
name is NAME did not exist.
20200502; version 2.620:
* Removed methods used for match_continuous and regex_match in the
class for the Boyer-Moore-Horspool algorithm. Now SRELL always uses
the automaton like earlier versions when they are processed.
* Some clean-ups.
20200428; version 2.611:
* Fixed a bug that caused /\d*/ not to match the head of "abc" but to
match the end of it. (regression introduced in version 2.210.)
20200426; version 2.610:
* Fixed a bug that caused case-insensitive (icase) BMH search to skip
a matched sequence at the beginning of the entire text, when 1)
search is done against UTF-8 or UTF-16 text, and 2) the searched
pattern ends with a character that consists of multiple code units
in that encoding.
* Now SRELL parses a capturing group name according to the ECMA
specification and strictly checks its validity. Group names like
/(?<,>...)/ cause regex_error.
20200418; version 2.600:
* Added three-iterators versions of regex_search(), so that the limit
of the sequence up to which the automaton can look behind can be
passed to regex_search() directly.
* [Breaking Change] Removed the match_lblim_avail flag from
match_flag_type and the lookbehind_limit member from match_results
which were added in version 2.300.
* Updated srell_ucfdata2.hpp and srell_updata.hpp to support Unicode
13.0.0.
* Updated unicode/updataout.cpp to support Unicode 13. (Support in
advance new script names that will be available in RegExp of
ECMAScript 2020).
20191118; version 2.500:
* Modified basic_regex to hold precomputed tables for icase matching,
instead of creating them from case folding data when its instance is
first created.
* In accordance with the change above, srell_ucfdata.hpp and
ucfdataout.cpp were replaced with srell_ucfdata2.hpp and
ucfdataout2.cpp, accordingly.
* Changed the method of character class matching from linear search to
binary search.
* Changed the timing of optimisation of a character class from "when a
closing bracket ']' is found" to "every time a character or
character range is pushed to its character class array".
* Removed all asserts.
* Modified the pattern compiler to interpret sequential \uHHHH escapes
as a Unicode code point value if they represent a valid surrogate
pair. (By this change, incompatibilities with the ECMAScript
specification disappeared.)
* Fixed the position of an endif directive that caused a compiler
error when -DSRELL_NO_NAMEDCAPTURE is specified.
* Updated updataout.cpp to version 1.101.
* Added a standalone version of SRELL in the single-header directory.
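The change from linear search to binary search for character class
matching in version 2.500 can be illustrated with the following
minimal sketch. It assumes that a character class is kept as an array
of code point ranges that is sorted and free of overlaps. The names
CodePointRange, class_contains, and alnum_class are hypothetical and
are not taken from SRELL's source.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not SRELL's internal representation.
struct CodePointRange {
    std::uint32_t first;  // inclusive lower bound
    std::uint32_t last;   // inclusive upper bound
};

// Returns true if `c` lies inside one of the ranges.
// Precondition (assumed): `ranges` is sorted by `first` and has no overlaps.
bool class_contains(const std::vector<CodePointRange>& ranges, std::uint32_t c) {
    // Binary search for the first range whose lower bound is greater than c.
    auto it = std::upper_bound(
        ranges.begin(), ranges.end(), c,
        [](std::uint32_t value, const CodePointRange& r) { return value < r.first; });
    if (it == ranges.begin()) return false;  // c is below every range
    --it;                                    // the last range starting at or before c
    return c <= it->last;
}

// The class [0-9A-Za-z] expressed as sorted ranges.
std::vector<CodePointRange> alnum_class() {
    return {{0x30, 0x39}, {0x41, 0x5A}, {0x61, 0x7A}};
}
```

With this layout a lookup costs O(log n) comparisons in the number of
ranges, provided the array stays sorted after every insertion.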
20190914; version 2.401:
* Reduced the size of basic_regex. (It was bloated by my carelessness
when support for Unicode property escapes was added).
* Improved basic_regex::swap().
20190907; version 2.400:
* Improved the performance of character class matching.
* Modified the pattern compiler to interpret the \u escape sequence in
the group name in accordance with the ECMAScript specification.
* Updated ucfdataout.cpp to version 1.200. A new member has been added
to the unicode_casefolding class in srell_ucfdata.hpp that
ucfdataout.cpp generates.
Because SRELL 2.400 and later need this added member, they cannot be
used with srell_ucfdata.hpp output by ucfdataout.cpp version 1.101
or earlier. (No problem in using an older version of SRELL with a
newer version of srell_ucfdata.hpp).
* Some clean-ups and improvements.
20190902; version 2.304:
* Fixed regex_iterator that had been broken by the code clean-up in
version 2.303.
20190810; version 2.303:
* Refixed the problem that was fixed in version 2.302 as the fix was
incomplete.
* Cleaned up code.
20190809; version 2.302:
* Bug fix: When (?:...) has a quantifier, strings captured by round
brackets inside it were not cleared in each repetition but carried
over to the next loop. For example,
/(?:(ab)|(cd))+/.exec("abcd") returned ["abcd", "ab", "cd"], instead
of ["abcd", undefined, "cd"]. (The latter is correct).
* Updated misc/sample01.cpp to version 1.102. Rewrote the chapter
numbers in accordance with ECMAScript 2019 (ES10).
20190724; version 2.301:
* In accordance with the ECMAScript spec, restricted the characters
which can be escaped by '\', to the following fifteen characters:
^$\.*+?()[]{}|/
Only in the character class, i.e., inside [], '-' also becomes a
member of the group.
20190717; version 2.300:
* Added a feature for specifying the limit up to which the automaton
can look behind, separately from the beginning of a target sequence.
(Added the match_lblim_avail flag to match_flag_type and the
lookbehind_limit member to match_results).
In addition, the lookbehind_limit of the match_results object that
regex_iterator holds privately and uses internally is now also set in
the iterator's constructor.
* Removed order restriction of capturing parentheses and
backreferences, in accordance with the ECMAScript spec. Now /\1(.)/,
/(?<=(.)\1)/, and /\k<a>(?<a>.)/ are all okay.
* Updated misc/sample01.cpp to version 1.101. Added one compliance
test from misc.js.
20190714; version 2.230:
* Improved the performance of searching when regular expressions begin
with a character or character class followed by a '*' or '+'. (E.g.
/[A-Za-z]+ing/).
20190707; version 2.221:
* Changed the feature test macro used for checking availability of
std::u8string, from __cpp_char8_t to __cpp_lib_char8_t.
* When icase is specified, the pattern compiler now converts a
character class to a character literal if all characters in the class
become the same character as a result of case-folding (e.g.
/r[Ss\u017F]t/i -> /rst/i).
* Fixed a minor issue.
20190617; version 2.220:
* Changed the internal representation of repetition so that the
counter is not used in cases where omitting it is more compact.
* Fixed an optimisation bug that caused searching for /a{1,2}?b/
against "aab" to return "ab" instead of "aab". (Condition: a
character or character class with a non-greedy quantifier is
followed by a character or character class that is mutually
exclusive with it).
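The expected result for /a{1,2}?b/ against "aab" can be checked with
the short sketch below. It uses std::regex from the C++ standard
library as a stand-in, on the assumption that its default ECMAScript
grammar handles this pattern in the same way. It does not exercise
SRELL itself, and first_match is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`,
// or an empty string if nothing matches.
// Uses std::regex (default ECMAScript grammar) as a stand-in for SRELL.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

The search starts at the leftmost position, so the non-greedy
quantifier first tries a single 'a', fails to find 'b' after it, and
then extends to two, giving "aab" rather than "ab".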
20190613; version 2.210:
* Improved the method of matching for expressions like /ab|cd|ef/
(where the string literals separated by '|' begin with characters
that are mutually exclusive).
20190603; version 2.202:
* Fixed a bug that caused regex_match to behave like regex_search in
the situation where the BMH algorithm is used.
20190531; version 2.200:
* For searching with an ordinary (non-regex) string, added an
implementation based on the Boyer-Moore-Horspool algorithm.
* Improved UTF-8 iterators.
* Fixed behaviours of \b and \B when icase is specified, to match
/.\B./i against "s\u017F".
* Fixed minor issues.
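For reference, the Boyer-Moore-Horspool algorithm mentioned in version
2.200 can be sketched as follows. This is a minimal byte-oriented
illustration of the general technique, not SRELL's implementation. The
function name bmh_find is hypothetical, and icase and multi-code-unit
handling are left out.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Returns the index of the first occurrence of `needle` in `haystack`,
// or std::string::npos if there is none. Works on raw bytes only.
std::size_t bmh_find(const std::string& haystack, const std::string& needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    // Shift table: how far the window may advance when a given byte is
    // aligned with the last position of the pattern.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the current window with the pattern from right to left.
        std::size_t j = m;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) --j;
        if (j == 0) return pos;  // every byte matched
        // Advance according to the byte under the pattern's last position.
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string::npos;
}
```

Because the shift depends only on the last byte of the current window,
a pattern whose final character spans several code units needs extra
care, which is the kind of situation the fix in version 2.610 concerns.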
20190508; version 2.100:
* Fixed a bug that caused failure of capturing when 1) a pair of
capturing brackets exists in a lookbehind assertion, and 2) variable
length expressions exist in both the left side of and the inside of
the pair of brackets. E.g. given "1053" =~ /(?<=(\d+)(\d+))$/, no
appropriate string was set for $2.
* Updated srell_ucfdata.hpp and srell_updata.hpp to support Unicode
12.1.0.
* Updated unicode/updataout.cpp to support Unicode 12. (Support in
advance a new binary property and new script names that will be
available in RegExp of ECMAScript 2019 and new script names that are
anticipated to be available in RegExp of ECMAScript 2020).
* Changed the newline character in srell.hpp from CR+LF to LF.
* Modified unicode/*.cpp to output LF as a newline instead of CR+LF.
* Updated misc/sample01.cpp to version 1.100:
1. Rewrote the chapter numbers in subtitles of compliance tests, in
accordance with ECMAScript 2018 Language Specification (ES9).
(The old chapter numbers were based on ECMAScript specifications
up to version 5.1).
2. Added one compliance test from ECMAScript 2018 Language
Specification 21.2.2.3, NOTE.
* Modified the macros for detecting C++11 features.
* Changed the method of the character class.
* For all the constructors and assign functions of basic_regex to have
a default argument for flag_type, reimplemented syntax_option_type
and match_flag_type (missed changes between TR1 -> C++11).
* Experimental support for the char8_t type. If a compiler supports
char8_t (detected by the __cpp_char8_t macro), classes whose names
have the "u8-" prefix accept a sequence of char8_t and handle it as
a UTF-8 string. If char8_t is not supported, the classes handle a
sequence of char as a UTF-8 string, as before.
* As classes that always handle a sequence of char as a UTF-8 string,
new classes whose names have the "u8c-" prefix were added. They
correspond to the classes having the "u8-" prefix in their names up
to version 2.002:
* u8cregex; u8ccmatch, u8csmatch; u8ccsub_match, u8cssub_match;
u8ccregex_iterator, u8csregex_iterator; u8ccregex_token_iterator,
u8csregex_token_iterator.
20180717; version 2.002:
* Changed the maximum number of hexdigits in \u{h...} from six to
'unlimited' in accordance with the ECMAScript specification. ("one
to six hexadecimal digits" of the old implementation was based on
the proposal document).
* Updated updataout.cpp to version 1.001. Encountering unknown
(newly-encoded) script names is no longer treated as an error.
* Updated srell_ucfdata.hpp and srell_updata.hpp to support Unicode
11.0.0.
20180204; version 2.001:
* When icase is specified, [\W] (a character class containing \W) no
longer matches any of [KkSs\u017F\u212A] (ecma262 issue #512).
20180127; version 2.000:
* Added the following features that are to be included into RegExp of
ECMAScript 2018:
* New syntax option flag for '.' to match every code point, dotall,
was added to srell::regex_constants as a value of
syntax_option_type and to srell::basic_regex as a value of
flag_type.
* New expressions to support the Unicode property, \p{...} and
\P{...}.
* Named capture groups (?<NAME>...) and the new expression for
backreference to a named capture group, \k<NAME>.
* The behaviour of lookbehind assertions changed. Now both (?<=...)
and (?<!...) support variable-length lookbehind.
20180125; version 1.401:
* Limited the numbers recognised as backreferences in
match_results.format() to a maximum of 99, in accordance with the
ECMAScript specification. (I.e., restricted to $1..$9 and $01..$99).
* Removed an unused macro and its related code.
20180101; version 1.400:
* Changed the behaviour of the pattern compiler so that an empty
non-capturing group can have a quantifier, for example, /(?:)*/. It
is a meaningless expression, but changed just for compatibility with
RegExp of ECMAScript.
* Fixed a hang bug: This occurred when 1) a non-capturing group has a
quantifier, 2) the group itself can be zero-width, and 3) a
backreference that can be zero-width is included in the group
somewhere other than at the end, such as /(.*)(?:\1.*)*/.
20171216; version 1.300:
* Fixed an important bug: /^(;[^;]*)*$/ did not match ";;;;" because
of a bug in optimisation. This problem occurred when a sequence of
regular expressions ended like /(A...B*)*$/ where the character or
character set that A represents and the one that B represents are
mutually exclusive.
20170621; version 1.200:
* Updated srell_ucfdata.hpp to support Unicode 10.0.0.
* Improved u8regex_traits to handle corrupt UTF-8 sequences more
safely.
20150618; version 1.141:
* Updated srell_ucfdata.hpp to support Unicode 8.0.0.
20150517; version 1.140:
* Modified the method for regex_match() to determine whether a
sequence of regular expressions is matched against a sequence of
characters. (Issue raised at #2273 in C++ Standard Library Issues
List).
* Restricted the accepted range of X in the expression "\cX" to
[A-Za-z] in accordance with the ECMAScript specification.
* Fixed the problem that caused parens in a lookaround assertion not
to capture a sequence correctly in some circumstances because the
bug fix done in version 1.111 was imperfect.
20150503; version 1.130:
* Improved case-folding functions.
* Updated unicode/ucfdataout.cpp to version 1.100.
* Fixed a typo in #if directives for u(16|32)[cs]match.
20150425; version 1.120:
* Fixed the bug that caused characters in U+010000-U+10FFFF in UTF-8
(i.e., four-octet characters) not to be recognised.
* Updated misc/sample01.cpp to version 1.010.
20150402; version 1.111:
* Fixed the problem that caused $2 of "aaa" =~ /((.*)*)/ to be empty
instead of "aaa" because of a bug in optimisation.
20141101; version 1.110:
* Several fixes based on a bug report:
1. Added "this->" to compile() in basic_regex::assign().
2. Implemented operator=() functions explicitly instead of using
default ones generated automatically.
* unicode/ucfdataout.cpp revised and updated to version 1.001.
20140622; version 1.101:
* Updated srell_ucfdata.hpp to support Unicode 7.0.0.
20121118; version 1.100:
* The first released version.