Add anonymous token support. #42

Draft
wants to merge 1 commit into base: master

pragmaticpandy

Work in progress; haven't fixed all the tests yet. Threw this together and wanted to confirm you are interested in the patch and get any other feedback before I spend more time on it.

See the changes to the readme and OrTest for a quick idea of how this changes the public API.

#36

@h0tk3y
Owner

h0tk3y commented May 2, 2021

@pragmaticpandy Thanks a lot for looking into it! I like the idea overall, but what actually bothers me is that anonymous tokens (or tokens collected from the grammar's parsers in general) are ordered implicitly. When the set of tokens is ambiguous, their order matters, and the easiest way to order tokens in the way you want is to declare them explicitly so that the precedence comes from the order of declaration.
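To illustrate why order matters for an ambiguous token set, here is a minimal, self-contained sketch of a first-match-wins tokenizer. `TokenDef`, `Lexeme`, and `tokenize` are hypothetical names invented for this illustration and are not the library's own classes; the sketch only assumes that tokens are tried in declaration order.

```kotlin
// Hypothetical stand-ins for illustration; these are not better-parse's own classes.
data class TokenDef(val name: String, val regex: Regex)
data class Lexeme(val tokenName: String, val text: String)

// First-match-wins tokenizer: at each position the definitions are tried in
// list order, and the first one that matches a non-empty prefix is taken.
fun tokenize(input: String, tokens: List<TokenDef>): List<Lexeme> {
    val result = mutableListOf<Lexeme>()
    var pos = 0
    while (pos < input.length) {
        val hit = tokens.firstNotNullOfOrNull { def ->
            def.regex.find(input, pos)
                ?.takeIf { it.range.first == pos && it.value.isNotEmpty() }
                ?.let { Lexeme(def.name, it.value) }
        } ?: throw IllegalArgumentException("No token matches at position $pos")
        result += hit
        pos += hit.text.length
    }
    return result
}
```

With a keyword definition `Regex("if")` and an identifier definition `Regex("[a-z]+")`, the input `if` is tokenized as a keyword when the keyword is listed first and as an identifier when the identifier is listed first. That is the ambiguity an implicit ordering would have to resolve.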

What do you think about sorting the anonymous (parser-provided) tokens by the order of their enclosing parsers? I mean, if an anonymous token is declared and used in a parser that comes before another parser in the grammar, then that token takes precedence over the other parser's anonymous tokens. I'm not sure this kind of ordering is intuitive enough, but I'd say it's a bit more intuitive than ordering the tokens by their occurrences in the nested, complex parser structure.
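A minimal sketch of that ordering rule, under stated assumptions: `ParserDecl` and `orderAnonymousTokens` are hypothetical names, tokens are represented only by their pattern strings, and tokens inside a single parser keep the order in which they were listed (the proposal does not specify this).

```kotlin
// Hypothetical model: a parser declared in a grammar, together with the
// patterns of the anonymous tokens it uses. Not part of better-parse.
data class ParserDecl(val name: String, val anonymousTokens: List<String>)

// Orders anonymous tokens by the declaration order of their enclosing parsers.
// A token shared by several parsers keeps the position of its first use,
// because distinct() retains the first occurrence.
fun orderAnonymousTokens(parsersInDeclarationOrder: List<ParserDecl>): List<String> =
    parsersInDeclarationOrder.flatMap { it.anonymousTokens }.distinct()
```

For example, if a `number` parser using `\d+` is declared before a `word` parser using `[a-z]+` and `\d+`, the resulting precedence list is `\d+` followed by `[a-z]+`.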

I'm also thinking about experimenting with token-less parsing, where a Parser<T> can directly match characters of the input string. If I manage to get it to work fast enough, then the problem of explicit token declaration goes away too, because there will be no need to tokenize the input sequence in the first place.
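As a rough sketch of what token-less parsing could look like, the following parser matches characters of the input string directly, with no tokenizer in front of it. Everything here (`CharParser`, `ParseOutcome`, `Success`, `Failure`, `regex`, `map`) is a hypothetical illustration and not the library's `Parser<T>` interface or any planned design.

```kotlin
// Hypothetical character-level parser types; not better-parse's API.
sealed class ParseOutcome<out T>
data class Success<T>(val value: T, val nextPos: Int) : ParseOutcome<T>()
data class Failure(val pos: Int, val message: String) : ParseOutcome<Nothing>()

fun interface CharParser<out T> {
    // Tries to match at `pos` and reports the value plus the next position.
    fun parse(input: String, pos: Int): ParseOutcome<T>
}

// Matches a regex only if the match starts exactly at the current position.
fun regex(pattern: Regex): CharParser<String> = CharParser { input, pos ->
    val m = pattern.find(input, pos)
    if (m != null && m.range.first == pos) Success(m.value, pos + m.value.length)
    else Failure(pos, "expected /${pattern.pattern}/")
}

// Transforms the value of a successful parse; failures pass through unchanged.
fun <A, B> CharParser<A>.map(f: (A) -> B): CharParser<B> = CharParser { input, pos ->
    when (val r = parse(input, pos)) {
        is Success -> Success(f(r.value), r.nextPos)
        is Failure -> r
    }
}
```

With these definitions, `regex(Regex("\\d+")).map { it.toInt() }` parses an integer straight from the string, so no separate token declaration is involved.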

@BenjaminHolland

@h0tk3y re: Tokenless parsers
I believe this is what Superpower (C#) does. Out of the box it provides string-based tokenless parsing, with an optional tokenization layer on top of it.

@pragmaticpandy
Author

Apologies for abandoning this PR for so long.

Totally agree on explicit declaration being most clear.

I was burned by this issue again today because I tried to use something like the following:
val positiveInt by regexToken("\\d+") use { text.toInt() }
Being so simple, I expected it to just work; I typed it alongside my other tokens and didn't give it a second thought.

Then, I was confused for a while when I got a NoMatchingToken error.

What I really want is to not be confused by this situation again; it has happened at least twice now. My original draft PR suggests dealing with this by supporting anonymous tokens, but perhaps a better solution is simply to detect any anonymous tokens and throw an exception that explains the situation and prompts the user to add explicit token declarations.
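A minimal sketch of that detection idea, under stated assumptions: `TokenSpec`, `UndeclaredTokenException`, and `requireDeclared` are hypothetical names, and the sketch assumes the grammar can enumerate both its declared tokens and the tokens its parsers refer to. How the library would actually collect those two lists is not shown here.

```kotlin
// Hypothetical model of a token, identified by its pattern. Not better-parse's API.
data class TokenSpec(val pattern: String)

// Explains which tokens are anonymous and how to fix the grammar.
class UndeclaredTokenException(val patterns: List<String>) : IllegalStateException(
    "Tokens used inside parsers but never declared in the grammar: " +
        patterns.joinToString { "/$it/" } +
        ". Declare each one as its own token property and reference that property in the parser."
)

// Fails fast if any token referenced by a parser is missing from the
// grammar's explicit token declarations.
fun requireDeclared(declared: List<TokenSpec>, usedByParsers: List<TokenSpec>) {
    val missing = usedByParsers.filter { it !in declared }.distinct()
    if (missing.isNotEmpty()) throw UndeclaredTokenException(missing.map { it.pattern })
}
```

Run once when the grammar is set up, a check like this would surface the problem with an explanatory message before any input is tokenized, rather than as a NoMatchingToken error at parse time.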
