GitHub - adityaviswanathan/parser-kit: Tools to build a parser wrapped as a service

Requirements/Setup

System requirements:

Python 3.x (brew install python3)
pip package manager for Python (the latest stable install script can be found on their website)

Managing dependencies

I recommend using a tool like virtualenvwrapper to manage dependencies from project to project, particularly because this project is written in Python3 while MacOS ships with Python2. You can install it via pip:

$ sudo pip install virtualenvwrapper
...
$ source /usr/local/bin/virtualenvwrapper.sh

Once that's installed, find the path of (newly installed) Python3 on your system with:

$ which python3
> PATH_TO_PY3

Use that path when you create your virtualenv for this project, and then install the Python prerequisites via pip:

$ mkvirtualenv --python=PATH_TO_PY3 parser-kit 
(parser-kit)$ pip install -r requirements.txt

Tools to build a parser

parser-kit provides tools to build a language model from the ground up. Start with:

Example text T
Custom Semantics S (entity types that have arcs to other entities in the dependency tree)

With these inputs, we can build a parser with the following workflow:

labeler.py is a CLI labeling tool of the form f(T, S); it emits labeled training data L (T annotated with S)

$ python labeler.py --inputfile=examples/examples.txt --outfile=training/labeled.out

model_generator.py is a tool of the form f(L); it emits a language model M

$ python model_generator.py --inputfile=training/labeled.out --outfile=models/model1

With a language model, suppose we want to parse a new blob of text B:

parser.py is a tool of the form f(M, B); it emits a parse that adheres to S

$ python parser.py --text "how many wood could a wood chuck chuck if a wood chuck could chuck wood" --inputfile=models/model1
...

HTTP interface (WIP)

parser-kit wraps a static language model with a HTTP server that handles inbound text blobs and returns their parses according to M. Parses will be serialized via protobuf so clients will need to import the parse proto defined in this repo.

Clients can serialize/deserialize messages to the server by using interfaces in semantics.proto.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
examples		examples
proto_tests		proto_tests
.gitignore		.gitignore
README.md		README.md
labeler.py		labeler.py
model_generator.py		model_generator.py
parser.py		parser.py
proto.sh		proto.sh
requirements.txt		requirements.txt
run.py		run.py
semantics.proto		semantics.proto
semantics_pb2.py		semantics_pb2.py
server.py		server.py
test_server.py		test_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Requirements/Setup

Managing dependencies

Tools to build a parser

HTTP interface (WIP)

About

Releases

Packages

Languages

adityaviswanathan/parser-kit

Folders and files

Latest commit

History

Repository files navigation

Requirements/Setup

Managing dependencies

Tools to build a parser

HTTP interface (WIP)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages