Showing 3 changed files with 22 additions and 80 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,12 +28,12 @@ Don't hesitate to contact me: [email protected] | |
## TODO | ||
|
||
- [X] Fixme | ||
- [ ] More tests, goal 1000, current 321 | ||
- [ ] More tests, goal 1000, current 338 | ||
- [ ] Optimization | ||
- [X] Parser size | ||
- [ ] Benchmark | ||
- [X] Math ident | ||
- [ ] Maybe extras? | ||
- [X] Extras | ||
|
||
- [X] ~Use the unicode database to implement a test based on binary search to find math identifier.~ | ||
|
||
|
@@ -126,34 +126,6 @@ If you downloaded the already generated grammar, the `tree-sitter generate` step | |
|
||
Failing tests are found in [`corpus/fixme.scm`](https://github.com/uben0/tree-sitter-typst/blob/master/corpus/fixme.scm). | ||
|
||
### Optimization with extras | ||
|
||
When searching for ways to optimize the parser and simplify the grammar, I thought about using the *extras* feature for spaces and comments. I don't know if it will significantly reduce parser size, but I want to try it and see. The only problem arises with function calls and, in inline code, field access: they must be directly joined (no space or comment in between). The *immediate* feature won't solve the problem, as it only takes into account inline regexes (which would be fine for spaces but not for comments, as they have to appear in the output tree). | |
|
||
The solution is to rely on the external scanner when parsing spaces or comments. Let's call a "pre-immediate" token a token that may be followed by an immediate token. When a pre-immediate token is parsed, it sets a flag to `true`, and when a space or comment is parsed, it resets the flag to `false` (this flag is stored in the scanner's state as a boolean). | |
|
||
This way, when a token has to be immediate, an external token can be required that will only match if the flag is `true`. It means any pre-immediate token has to be preceded by a token that sets the flag to `true`. | |
|
||
- [X] `string` | ||
- [X] `number` | ||
- [X] `ident` | ||
- [X] `']'` | ||
- [X] `'}'` | ||
- [X] `')'` | ||
- [X] math shorthand | ||
- [X] math ident | ||
- [X] math letter | ||
|
||
The immediate token has to be parsed by the external scanner because the use of `immediate_get` is impossible. | |
|
||
Spaces and comments must have precedence over the marker token (called `_is_immediate`). | ||
|
||
- [X] Space and comments as externals | ||
- [ ] Detection of non-immediate tokens | ||
- [ ] `require` and `reset` token | ||
- [ ] Enable extras | ||
- [ ] Remove explicit extras | ||
|
||
### Inlined `return` | ||
|
||
An inlined `return` statement, for some obscure reason, is allowed to be followed by text and markup on the same line. So, the following is valid Typst code: `#return a + b Hello World` | ||
|
@@ -163,3 +135,11 @@ To have it correctly recognized by the grammar, the termination token of a state | |
At the moment, I chose performance over correctness, since a return statement is very unlikely to be followed by text or markup. Finding a solution that offers both performance and correctness would be truly awesome. | ||
|
||
I opened a thread on Typst's GitHub discussions, [#2103](https://github.com/typst/typst/discussions/2103), and an issue, [#2104](https://github.com/typst/typst/issues/2104). | ||
|
||
### Optimization with extras | ||
|
||
When searching for ways to optimize the parser and simplify the grammar, I thought about using the *extras* feature for spaces and comments (and line breaks as well). In the end, it significantly reduced the parser size. The only problem arises with function calls and, in inline code, field access: they must be directly joined (no space or comment in between). The *immediate* feature won't solve the problem, as it only takes into account inline regexes (which would be fine for spaces but not for comments, as they have to appear in the output tree). | |
|
||
The solution is to rely on the external scanner when parsing spaces or comments. Let's call a "pre-immediate" token a token that may be followed by an immediate token. When a pre-immediate token is parsed, it sets a flag to `true`, and when a space or comment is parsed, it resets the flag to `false` (this flag is stored in the scanner's state as a boolean). | |
|
||
This way, when a token has to be immediate, an external token can be required that will only match if the flag is `true`. It means any pre-immediate token has to be preceded by a token that sets the flag to `true`. |
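
The flag mechanism described above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the scanner code from this repository: the names `TokenKind`, `Scanner`, `scanner_observe`, `scanner_is_immediate` and `can_join_after` are hypothetical, and the model does not use tree-sitter's lexer API at all. It only shows how a single boolean tells "directly joined" apart from "separated by a space or comment".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum {
    TOK_IDENT,       /* pre-immediate: may be followed by an immediate token */
    TOK_CLOSE_PAREN, /* pre-immediate: may be followed by an immediate token */
    TOK_SPACE,       /* resets the flag */
    TOK_COMMENT      /* resets the flag */
} TokenKind;

/* Scanner state: a single boolean, as in the description above. */
typedef struct {
    bool immediate;
} Scanner;

/* Update the flag after a token of the given kind has been parsed. */
static void scanner_observe(Scanner *s, TokenKind kind) {
    switch (kind) {
    case TOK_IDENT:
    case TOK_CLOSE_PAREN:
        s->immediate = true;  /* a pre-immediate token sets the flag */
        break;
    case TOK_SPACE:
    case TOK_COMMENT:
        s->immediate = false; /* a space or comment resets it */
        break;
    }
}

/* Zero-width marker check: matches only while the flag is still set. */
static bool scanner_is_immediate(const Scanner *s) {
    return s->immediate;
}

/* Replay a token sequence and report whether the next token
 * could be directly joined to it (e.g. the `(` of a call). */
static bool can_join_after(const TokenKind *tokens, size_t count) {
    Scanner s = { .immediate = false };
    for (size_t i = 0; i < count; i++) {
        scanner_observe(&s, tokens[i]);
    }
    return scanner_is_immediate(&s);
}
```

In this model, an identifier directly followed by `(` would be accepted as a call, while an identifier followed by a space or a comment would not, because the flag has been reset in between. In a real tree-sitter external scanner the boolean would additionally have to be saved and restored through the scanner's serialize and deserialize callbacks.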
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
8350504 src/parser.c | ||
6533481 src/parser.c |