Overview over the QLever codebase
This page provides an overview of the large QLever codebase, as a starting point for developers joining the project. It is currently still a stub, but it's a start.
QLever's current Turtle parser is hand-written, unlike the SPARQL parser, which is generated with the parser generator ANTLR. The main code is in src/parser/RdfParser.{h,cpp}. It uses functions like bool TurtleParser::iriref(), each of which tries to parse one piece of the grammar (an IRIREF in this case) from the current input. Such a function returns true if it succeeds (in which case lastParseResult_ is updated and the parsed part is removed from the input) and false otherwise. The classes are:
template <class Tokenizer> class TurtleParser : public RdfParserBase
template <class Tokenizer> class RdfMultifileParser : public RdfParserBase
template <class Tokenizer> class NQuadParser : public TurtleParser<Tokenizer>
template <typename Parser> class RdfStringParser : public Parser
template <typename Parser> class RdfStreamParser : public Parser
template <typename Parser> class RdfParallelParser : public Parser
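The declarations above show a layering pattern: the grammar classes are templated on a tokenizer, and the input-handling classes inherit from whichever parser they are given as a template argument. The following is a minimal sketch of that shape only. All names in it (DummyTokenizer, ParserBase, TurtleLikeParser, NQuadLikeParser, StringInputParser, layeredParserDemo) are hypothetical stand-ins, and what each wrapper adds (here: a way to set the input string) is an assumption, not taken from the QLever sources.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a tokenizer type; not QLever's real tokenizer.
struct DummyTokenizer {};

// Stand-in for RdfParserBase: the common, non-template interface.
class ParserBase {
 public:
  virtual ~ParserBase() = default;
  virtual std::string name() const = 0;
};

// Stand-in for TurtleParser<Tokenizer>: grammar logic, templated on the
// tokenizer.
template <class Tokenizer>
class TurtleLikeParser : public ParserBase {
 public:
  std::string name() const override { return "turtle"; }

 protected:
  std::string input_;  // The input that has not been parsed yet.
};

// Stand-in for NQuadParser<Tokenizer>: reuses and extends the Turtle grammar.
template <class Tokenizer>
class NQuadLikeParser : public TurtleLikeParser<Tokenizer> {
 public:
  std::string name() const override { return "nquad"; }
};

// Stand-in for RdfStringParser<Parser>: inherits from its template argument.
// That it adds a way to supply the input is an assumption for illustration.
template <typename Parser>
class StringInputParser : public Parser {
 public:
  void setInput(std::string s) { this->input_ = std::move(s); }
  const std::string& remainingInput() const { return this->input_; }
};

// Compose the layers: input handling on top of a grammar on top of a tokenizer.
inline std::string layeredParserDemo() {
  StringInputParser<NQuadLikeParser<DummyTokenizer>> parser;
  parser.setInput("<s> <p> <o> <g> .");
  assert(parser.name() == "nquad");
  return parser.name() + ":" + parser.remainingInput();
}
```

With this layering, the same input-handling wrapper can be combined with any grammar class, and the same grammar class with any tokenizer, without duplicating code.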
Note that TurtleParser sets UseRelaxedParsing to true if and only if Tokenizer == TokenizerCtre, which in turn is the case if and only if ascii-prefixes-only is set to true in the settings.json file.
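The two conventions described above (parse functions that return a bool and update the parser state on success, and a flag that depends on the tokenizer type) can be sketched as follows. This is a simplified illustration, not QLever's implementation: MiniParser, CtreLikeTokenizer, GeneralTokenizer, remainingInput and miniParserDemo are hypothetical names, and the iriref() here only looks for a '<' ... '>' pair instead of implementing the real IRIREF grammar rule.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical tokenizer tags standing in for QLever's tokenizer types.
struct CtreLikeTokenizer {};
struct GeneralTokenizer {};

template <class Tokenizer>
class MiniParser {
 public:
  // Compile-time flag derived from the tokenizer type, mirroring
  // "UseRelaxedParsing is true iff Tokenizer == TokenizerCtre".
  static constexpr bool UseRelaxedParsing =
      std::is_same_v<Tokenizer, CtreLikeTokenizer>;

  // The caller must keep the underlying input alive while parsing.
  explicit MiniParser(std::string_view input) : input_{input} {}

  // Try to parse `<...>` at the start of the input. On success, store the
  // match, remove it from the input, and return true. On failure, leave the
  // parser state untouched and return false.
  bool iriref() {
    if (input_.empty() || input_.front() != '<') return false;
    const auto end = input_.find('>');
    if (end == std::string_view::npos) return false;
    lastParseResult_ = std::string{input_.substr(0, end + 1)};
    input_.remove_prefix(end + 1);
    return true;
  }

  const std::string& lastParseResult() const { return lastParseResult_; }
  std::string_view remainingInput() const { return input_; }

 private:
  std::string_view input_;       // The input that has not been parsed yet.
  std::string lastParseResult_;  // Result of the last successful parse.
};

// The first call succeeds and consumes the IRI; the second call fails because
// the remaining input starts with a space.
inline bool miniParserDemo() {
  MiniParser<CtreLikeTokenizer> parser{"<http://example.org/s> rest"};
  const bool first = parser.iriref();
  assert(parser.lastParseResult() == "<http://example.org/s>");
  const bool second = parser.iriref();
  return first && !second;
}
```

Because a failed call leaves the input untouched, the caller can simply try the next alternative of the grammar rule.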
The code for tokenization is in src/parser/Tokenizer.{h,cpp}. The struct TurtleToken holds the regexes for the various tokens, e.g. Dot = grp("\\."), where grp puts (...) around its argument (and cls puts [...] around its argument).
Another example is PnLocal = grp(PnLocalString), where PnLocalString is:
([%BASE%_:0-9]|%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=/?#@%])(\.*([%BASE%_\-0-9\x{00B7}\x{0300}-\x{036F}\x{203F}-\x{2040}:]|%[0-9A-Fa-f]{2}|\\[_~.\-!$&'()*+,;=/?#@%]))*
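The grp and cls helpers described above can be illustrated with a small runtime sketch. It is an assumption-laden simplification: it builds the patterns as std::string and checks them with std::regex, whereas the real code in Tokenizer.h may build its patterns differently and use a different regex engine. Dot follows the example in the text; HexDigit, Percent, matchesWhole and tokenRegexDemo are hypothetical names, with Percent modeled on the %[0-9A-Fa-f]{2} fragment of PnLocalString.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Wrap the argument in parentheses, making it a regex group.
inline std::string grp(const std::string& s) { return "(" + s + ")"; }

// Wrap the argument in square brackets, making it a character class.
inline std::string cls(const std::string& s) { return "[" + s + "]"; }

// Token patterns composed from the helpers.
inline const std::string Dot = grp("\\.");              // (\.)
inline const std::string HexDigit = cls("0-9A-Fa-f");   // [0-9A-Fa-f]
inline const std::string Percent = grp("%" + HexDigit + "{2}");

// Return true if `text` as a whole matches `pattern`.
inline bool matchesWhole(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex{pattern});
}

// The composed patterns are ordinary regex strings.
inline void tokenRegexDemo() {
  assert(Dot == "(\\.)");
  assert(matchesWhole(Dot, "."));
  assert(matchesWhole(Percent, "%2F"));
}
```

Composing small named pieces this way keeps long patterns such as PnLocalString readable, since every group and character class is built by the same two helpers.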