Overview

This package tests word sense disambiguation (WSD) for Polish using BERT. For now, the tests focus on the small plWordnet3-annotated corpus made for CoDeS. To disambiguate a token, we compare its BERT embedding with the embeddings of tokens of the same lemma whose sense is known, because they appear in the reference corpus or in Wordnet glosses.
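The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function names, the toy 3-dimensional vectors, the sense labels, and the choice of averaging the known-sense embeddings and scoring by cosine similarity are all assumptions made for the example. Real BERT embeddings have hundreds of dimensions, and the package may aggregate or compare them differently.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def predict_sense(target: Vector, sense_examples: Dict[str, List[Vector]]) -> str:
    """Pick the sense whose averaged example embedding is closest to the target.

    Assumption: each sense is represented by the mean of its known embeddings.
    """
    centroids = {
        sense: average(vectors)
        for sense, vectors in sense_examples.items()
        if vectors
    }
    return max(centroids, key=lambda sense: cosine_similarity(target, centroids[sense]))


# Hypothetical embeddings of the lemma "zamek" in contexts with a known sense.
known = {
    "zamek (castle)": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "zamek (lock)": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

# Hypothetical embedding of the occurrence we want to disambiguate.
print(predict_sense([0.2, 0.7, 0.1], known))
```

With these toy vectors the target lies much closer to the "lock" examples, so that label is chosen. In practice the known-sense embeddings would come from the annotated corpus and the Wordnet glosses mentioned above.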

This is a work in progress. It is also intended to eventually replace the gibber code, with better code quality, better models, etc.

Installation

Requirements

Python 3.7 or newer
pip
virtualenv
Docker

Resources needed

Slavic BERT files for PyTorch from DeepPavlov
Polish Wordnet (Słowosieć) XML file
the CoDeS small sense-annotated corpus of Polish
optionally NKJP1M (i.e., the 1-million-word subcorpus of NKJP)
the KRNNT tagger (installed below as a Docker image)

Installation process

docker pull djstrong/krnnt:1.0.1
virtualenv .
source bin/activate
pip3 install -r requirements.txt # this may be just pip on some platforms
deactivate

Running

In one terminal window:

docker run -p 9003:9003 -it djstrong/krnnt:1.0.1
# To stop it, press Ctrl+C
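Before switching to the second terminal, it may help to confirm that the tagger container is actually accepting connections. The sketch below only checks that something is listening on the port published above (9003). It does not send any text to KRNNT, and the helper name `tagger_is_reachable` is made up for this example, not part of the package.

```python
import socket


def tagger_is_reachable(host: str = "localhost", port: int = 9003,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("KRNNT reachable:", tagger_is_reachable())
```

If this prints `False`, check that the `docker run` command is still running and that port 9003 is not taken by another process.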

In another terminal window:

source bin/activate
# After you review local_settings.py, run this to see the options:
python3 run.py --help
# (this may be just python instead of python3 on your machine)
# Plain `python3 run.py` will just train and test an embedding dictionary from Wordnet and the train corpus.
# After you're done:
deactivate

To test:

source bin/activate
python3 test.py
# After you're done:
deactivate
