
Commit

Merge pull request #6 from tabbydoc/dev
A prototype of TabbyLD2 (modified version of CEA and CTA tasks)
Dev
LedZeppe1in authored Mar 3, 2023
2 parents a1677d6 + 112272b commit 9a31f69
Showing 7,171 changed files with 482,060 additions and 6,798 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
24 changes: 24 additions & 0 deletions .flake8
@@ -0,0 +1,24 @@
[flake8]

import-order-style = pycharm
max-line-length = 140
max-complexity = 15
ignore =
    E722,  # duplicates B001 from flake8-bugbear
    E731,  # we want use lambdas
    C408,  # dict(), list(), tuple() is ok
    W503,  # line breaks before binary operator is ok according to PEP8, flake8 error
    A003,  # python builtins as class attributes is ok
    F541   # f-string without placeholders is ok
exclude =
    .git,
    .idea,
    *.pyc,
    __pycache__,
    resources,
    etc,
    model,
    wlcoref,
    api_schema.py,
    .venv,
    venv
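
To check which codes an `ignore` list like the one above names, a minimal Python sketch follows. It is not part of the commit. The helper `ignored_codes` and the shortened `SAMPLE` text are hypothetical and only illustrate the idea. Python's `configparser` does not strip inline `#` comments by default, so the sketch removes them by hand before collecting the codes.

```python
import configparser
import re

# Shortened, hypothetical copy of the [flake8] section, for illustration only.
SAMPLE = """\
[flake8]
max-line-length = 140
ignore =
    E722,  # duplicates B001 from flake8-bugbear
    E731,  # we want use lambdas
    F541   # f-string without placeholders is ok
"""


def ignored_codes(config_text: str) -> list[str]:
    """Return the error codes listed under `ignore`, with inline comments dropped."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("flake8", "ignore", fallback="")
    codes = []
    for line in raw.splitlines():
        # Everything after '#' is a comment; keep only the part before it.
        code_part = line.split("#", 1)[0]
        codes.extend(re.findall(r"[A-Z]+[0-9]+", code_part))
    return codes


if __name__ == "__main__":
    print(ignored_codes(SAMPLE))
```

Splitting on `#` before matching keeps codes mentioned only inside a comment, such as `B001`, out of the result.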
6 changes: 4 additions & 2 deletions .gitignore
@@ -148,7 +148,9 @@ results/

# db
bd.xls
w2v_model

cnn/
predictions/
predictions/
/tabbyld2/table_annotation/colnet/in_out
/tabbyld2/table_annotation/w2v_model/
/tabbyld2/table_annotation/colnet/w2v_model/
35 changes: 27 additions & 8 deletions README.md
@@ -1,10 +1,10 @@
# TabbyLD2

A web-based application to annotate relational tables and generate knowledge graphs.
**TabbyLD2** is a web-based application for semantic annotation of relational tables and generation of facts from annotated tabular data to populate knowledge graphs.

## Version

0.3
0.4

## Preliminaries

@@ -90,13 +90,19 @@ def __str__(self):

* `datasets` contains datasets of source tables for experimental evaluation:
  * `T2Dv2` contains the [T2Dv2 Gold Standard](http://webdatacommons.org/webtables/goldstandardV2.html) dataset, where `col_class_checked_fg.csv` was produced by [SemAIDA](https://github.com/alan-turing-institute/SemAIDA/tree/master/AAAI19/T2Dv2) and provides the fine-grained ground-truth class for every column;
  * `Tough_Tables` contains the [Tough Tables (2T)](https://zenodo.org/record/4246370#.Yf5AO-pBw2w) dataset. **NOTE:** `CEA_2T_gt.zip` must be unzipped before running the experimental evaluation.
  * `Tough_Tables` contains the [Tough Tables (2T)](https://zenodo.org/record/4246370#.Yf5AO-pBw2w) dataset. **NOTE:** `CEA_2T_gt.zip` must be unzipped before running the experimental evaluation;
  * `GitTables_SemTab_2022` contains the [GitTables](https://gittables.github.io/) dataset that was used in the [SemTab-2022](https://sem-tab-challenge.github.io/2022/) competition for Column Type Annotation by DBpedia (GT-CTA-DBP).
* `examples` contains table examples in the CSV format for testing;
* `experimental_evaluation` contains scripts for running the experimental evaluation on the tables in the `datasets` directory;
* `results` contains processing results of tables (*this directory is created by default*);
* `source_tables` contains examples of source tables in the CSV format for testing;
* `tabbyld2` contains the TabbyLD2 software modules, including `main.py` for console mode and `app.py` for web mode, and also:
  * `colnet` contains the ColNet framework for annotating categorical columns (NE-columns).
  * `w2v_model` contains a pre-trained word2vec model. **NOTE:** this model is downloaded and placed separately.
* `source_tables` is the folder where you place the CSV files of source tables for processing (*contains two table files for testing by default*);
* `tabbyld2` contains the TabbyLD2 modules, including `main.py` for console mode and `app.py` for web mode, and also:
  * `datamodel` contains descriptions of the tabular data and knowledge graph models;
  * `helpers` contains various useful functions for working with files, data, etc.;
  * `preprocessing` contains the table preprocessing module, which includes data cleaning, atomic column classification, and subject column identification;
  * `table_annotation` contains the semantic table annotator for the CEA and CTA tasks. This module also contains:
    * `colnet` contains the ColNet framework for annotating categorical columns (NE-columns);
    * `w2v_model` contains a pre-trained word2vec model. **NOTE:** this model is downloaded and placed separately.
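
The note about `CEA_2T_gt.zip` can be handled with Python's standard `zipfile` module. The sketch below is illustrative and not part of the repository. The helper name `unzip_ground_truth` and the path `datasets/Tough_Tables/CEA_2T_gt.zip` are assumptions, so adjust the path to wherever the archive sits in your checkout.

```python
import zipfile
from pathlib import Path


def unzip_ground_truth(archive: Path, target_dir: Path) -> list[str]:
    """Extract every member of `archive` into `target_dir` and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Assumed location of the archive; not confirmed by the README.
    archive = Path("datasets/Tough_Tables/CEA_2T_gt.zip")
    if archive.exists():
        print(unzip_ground_truth(archive, archive.parent))
    else:
        print(f"{archive} not found")
```

Extracting next to the archive keeps the ground-truth files inside the `Tough_Tables` directory. Whether the evaluation scripts expect them there is also an assumption.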

## Usage

@@ -125,4 +131,17 @@ python app.py
## Authors

* [Nikita O. Dorodnykh](mailto:[email protected])
* [Daria A. Denisova](mailto:[email protected])
* [Aleksandr Yu. Yurin](mailto:[email protected])

## Developers

* [Nikita O. Dorodnykh](mailto:[email protected])
* [Daria A. Denisova](mailto:[email protected])
* [Vitaliy V. Biryuckov](mailto:[email protected])
* [Ilgar V. Amiraslanov](mailto:[email protected])

## References

* Dorodnykh N.O., Shigarov A.O., Yurin A.Yu. **Using the Semantic Annotation of Web Table Data for Knowledge Base Construction.** AICCC'21: Proceedings of the 4th Artificial Intelligence and Cloud Computing Conference, 2022, P. 122-129. DOI: 10.1145/3508259.3508277
* Dorodnykh N.O., Yurin A.Yu. **TabbyLD: A Tool for Semantic Interpretation of Spreadsheets Data.** Communications in Computer and Information Science. Modelling and Development of Intelligent Systems (MDIS 2020), 2021, Vol. 1341, P. 315-333. DOI: 10.1007/978-3-030-68527-0_20
* Dorodnykh N.O., Yurin A.Yu. **Towards a universal approach for semantic interpretation of spreadsheets data.** IDEAS'20: Proceedings of the 24th Symposium on International Database Engineering & Applications, 2020, No. 22, P. 1-9. DOI: 10.1145/3410566.3410609
1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
0.4