Table Understanding Research
This research project aims to develop methods and software for extracting entities and their relationships from tables represented in unstructured and semi-structured data formats.
This work was supported by the Russian Science Foundation (grant no. 18-71-10001). Our prior work was supported by the Russian Foundation for Basic Research (grants no. 12-07-31051 and no. 15-37-20042) and the Council for Grants of the President of the Russian Federation (Scholarship No. SP-3387.2013.5).
- tabbypdf: rule-based PDF table extraction
- tabbypdf2: deep-learning-based PDF table extraction
- tabbyxl: rule-based spreadsheet data extraction
- tabbyld: semantic table interpretation using open knowledge graphs
Shigarov A. (2022). Table understanding: Problem overview. WIREs Data Mining and Knowledge Discovery, 13(1), e1482.
Kostyleva O., Paramonov V., Shigarov A., Vetrova V. (2022). Towards comparison of table type taxonomies. 45th Jubilee Int. Conv. on Information, Communication and Electronic Technology (MIPRO), 1461-1465.
Dorodnykh N., Yurin A., Shigarov A., Turdakov D. (2021). Ontology engineering at the assertion level based on semantic annotation of tabular data. 2021 Ivannikov Memorial Workshop (IVMEM). 28-34.
Yurin A., Dorodnykh N., Shigarov A. (2021). Semi-automated formalization and representation of the engineering knowledge extracted from spreadsheet data. IEEE Access. 9, 157468-157481.
Paramonov V., Shigarov A., Vetrova V. (2021). Rule-driven spreadsheet data extraction from statistical tables: case study. Information and Software Technologies. ICIST 2021. CCIS 1486, 84-95.
Dorodnykh N., Yurin A. (2021). TabbyLD: a tool for semantic interpretation of spreadsheets data. Modelling and Development of Intelligent Systems. MDIS 2020. CCIS 1341, 315-333.
Dorodnykh N., Shigarov A., Yurin A. (2022). Using the semantic annotation of web table data for knowledge base construction. Proc. 4th Artificial Intelligence and Cloud Computing Conference. AICCC'21, 122-129.
Dorodnykh N., Yurin A. (2022). Extraction of facts from web-tables based on semantic interpretation tabular data. 2022 Ivannikov Memorial Workshop (IVMEM), 7-17.
Mikhailov A., Shigarov A. (2021). Page layout analysis for refining table extraction from PDF documents. 2021 Ivannikov ISPRAS Open Conference (ISPRAS), 114-119.
Mikhailov A., Shigarov A., Rozhkov E., Cherepanov I. (2020). On graph-based verification for PDF table detection. 2020 Ivannikov ISPRAS Open Conference (ISPRAS). 91-95.
Cherepanov I., Mikhailov A., Shigarov A., Paramonov V. (2020). On automated workflow for fine-tuning deep neural network models for table detection in document images. 2020 43rd International Convention on Information, Communication and Electronic Technology. 1130-1133.
Dorodnykh N. & Yurin A. (2020). Towards a universal approach for semantic interpretation of spreadsheets data. Proc. 24th Symposium on International Database Engineering & Applications. Article 22, 1-9.
Paramonov V., Shigarov A., Vetrova V. (2020). Table header correction algorithm based on heuristics for improving spreadsheet data extraction. Information and Software Technologies. 1283 CCIS, 147-158.
Yurin A. & Dorodnykh N. (2020). Experimental evaluation of a spreadsheets transformation in the context of domain model engineering. Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). 0388-0391.
Dorodnykh N., Yurin A., Shigarov A. (2020). Conceptual model engineering for industrial safety inspection based on spreadsheet data analysis. Modelling and Development of Intelligent Systems. 1126 CCIS, 51-65.
Paramonov, V., Shigarov, A., Vetrova, V., Mikhailov, A. (2020). Heuristic algorithm for recovering a physical structure of spreadsheet header. Information Systems Architecture and Technology. 1050 AISC, 140-149.
Yurin A. & Dorodnykh N. (2019). A reverse engineering process for inferring conceptual models from canonicalized tables. 2019 Int. Multi-Conf. on Engineering, Computer and Information Sciences (SIBIRCON). 0485-0490.
Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). TabbyXL: rule-based spreadsheet data extraction and transformation. Information and Software Technologies. 1078 CCIS, 59-75.
Shigarov, A., Khristyuk, V., Mikhailov, A. (2019). TabbyXL: software platform for rule-based spreadsheet data extraction and transformation. SoftwareX, 10.
Shigarov, A., Cherepanov, I., Cherkashin, E., Dorodnykh, N., Khristyuk, V., Mikhailov, A., Paramonov, V., Rozhkov, E., Yurin A. (2019). Towards end-to-end transformation of arbitrary tables from untagged portable documents (PDF) to linked data. CEUR-WS Proc. 2463, 1-12.
Shigarov, A., Khristyuk, V., Mikhailov, A., Paramonov, V. (2019). Software development for rule-based spreadsheet data extraction and transformation. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1132-1137.
Cherkashin, E., Shigarov, A., Paramonov, V., Mikhailov, A. (2019). Digital archives supporting document content inference. Proc. 42nd Int. Convention on Information and Communication Technology, Electronics and Microelectronics. 1037-1042.
Dorodnykh, N., Yurin, A. (2019). Towards ontology engineering based on transformation of conceptual models and spreadsheet data: a case study. Intelligent Systems Applications in Software Engineering. 1046 AISC, 233-247.
Paramonov, V., Shigarov, A., Ruzhnikov, G., Cherkashin, E. (2019). Phonetic string matching for languages with Cyrillic alphabet. Information Systems Architecture and Technology. 852 AISC, 301-311.
Shigarov, A., Altaev, A., Mikhailov, A., Paramonov, V., Cherkashin, E. (2018). TabbyPDF: web-based system for PDF table extraction. Information and Software Technologies. 920 CCIS, 257-269.
Yang, S., Wei, R., Shigarov, A. (2018). Semantic interoperability for electronic business through a novel cross-context semantic document exchange approach. Proc. 18th ACM Symposium on Document Engineering. 28:1-28:10.
Cherkashin, E., Kopaygorodsky, A., Kazi, L., Shigarov, A., Paramonov, V. (2018). Model driven architecture implementation using linked data. Information and Software Technologies. 920 CCIS, 412-423.
Shigarov, A., Mikhailov, A. (2017). Rule-based spreadsheet data transformation from arbitrary to relational tables. Information Systems. 71, 123-136.
Shigarov, A., Mikhailov, A., Altaev, A. (2016). Configurable table structure recognition in untagged PDF documents. Proc. 16th ACM Symposium on Document Engineering. 119-122.
Shigarov, A., Paramonov, V., Belykh, P., Bondarev, A. (2016). Rule-based canonicalization of arbitrary tables in spreadsheets. Information and Software Technologies. 639 CCIS, 78-91.
Paramonov, V., Shigarov, A., Ruzhnikov, G., Belykh, P. (2016). Polyphon: an algorithm for phonetic string matching in Russian language. Information and Software Technologies. 639 CCIS, 568-579.
Shigarov, A. (2016). Methodological and software support for transforming tabular data from arbitrary to relational form. Scientific section of the meeting of the Joint Scientific Council of SB RAS for Nanotechnologies and Information Technologies. (in Russian)
Shigarov, A. (2015). Table understanding using a rule engine. Expert Systems with Applications. 42(2), 929-937.
Shigarov, A. (2015). Rule-based table analysis and interpretation. Information and Software Technologies. 538 CCIS, 175-186.
Shigarov, A. O., Bychkov, I. V., Paramonov, V. V., Belykh, P. V. (2015). Analysis and interpretation of arbitrary tables based on the execution of CRL rules. Computational Technologies. 20(6), 87-112. (in Russian)
Shigarov, A., Paramonov, V. (2015). CRL: a rule language for analysis and interpretation of arbitrary tables. CEUR-WS Proc. 1536, 22-29.
Shigarov, A. O. (2014). Recovering the logical structure of tables from unstructured texts based on logical inference. Computational Technologies. 19(1), 87-99. (in Russian)
Shigarov, A. (2014). Automated table understanding using a rule engine. CEUR-WS Proc. 1297, 216-223.
Shigarov, A. O., Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Fedorov, R. K. (2013). A table transformation system. Information Technologies and Computing Systems. 3, 15-26. (in Russian)
Shigarov, A., Fedorov, R. (2011). Simple algorithm for page layout analysis. Pattern Recognition and Image Analysis. 21(2), 324-327.
Shigarov, A. O. (2009). Technology for extracting tabular information from electronic documents of different formats. Candidate of Technical Sciences (PhD) dissertation. (in Russian)
Shigarov, A., Bychkov, I., Hmelnov, A., Ruzhnikov, G. (2009). A method for table detection in metafiles. Pattern Recognition and Image Analysis. 19(4), 693-697.
Bychkov, I. V., Ruzhnikov, G. M., Hmelnov, A. E., Shigarov, A. O. (2009). A heuristic method for table detection in documents of different formats. Computational Technologies. 14(2), 58-73. (in Russian)
Hmelnov, A. E., Shigarov, A. O. (2008). A method for extracting tables from unformatted text. Computational Technologies. 13(1), 93-101. (in Russian)
Department of Information Technology and Systems, Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences
Office 222, Block EVM, Lermontov st. 134, Irkutsk, 664033, Russia
Alexey Shigarov