Issues when using a paragraph as input #4

mpedraza98 · 2024-05-01T04:08:57Z

I have recently been using the model. However, when I use a paragraph or another long string as input, the annotator fails with the following error:

---> 22         assert len(tokens) == len(pos_tags)
     23         assert len(tokens) == len(ner_tags)
     24         annotation = {}

AssertionError:
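The bare `assert` statements above give no hint about which list is short or where the mismatch starts. A minimal sketch of a more informative check, assuming the annotator builds its output from three parallel lists named `tokens`, `pos_tags` and `ner_tags` as the traceback suggests; `check_alignment` is a hypothetical helper, not part of the library:

```python
from typing import Sequence


def check_alignment(
    tokens: Sequence[str], pos_tags: Sequence[str], ner_tags: Sequence[str]
) -> None:
    """Raise a descriptive error if the three parallel lists differ in length.

    Hypothetical stand-in for the bare `assert` statements in the traceback.
    """
    if len(tokens) == len(pos_tags) == len(ner_tags):
        return
    # Point at the first token that has no POS tag, if the POS list is the short one.
    first_missing = tokens[len(pos_tags)] if len(pos_tags) < len(tokens) else None
    raise ValueError(
        f"Length mismatch: {len(tokens)} tokens, {len(pos_tags)} POS tags, "
        f"{len(ner_tags)} NER tags; first untagged token: {first_missing!r}"
    )
```

Knowing the first untagged token shows whether tagging stops at a fixed position (which would hint at a length limit) or drifts because the tagger splits words differently from the tokenizer.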

It seems that the number of tokens differs from the number of tags. This doesn't happen with shorter strings. When I call POSTagger.generate_tags() directly, I get a list of tags that is only about a third as long as the number of words in the paragraph. Is there a size limit on the input text? How can I work around this issue?

This is the text I was using as an example:

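През 1878 г., след почти век на културно и икономическо възраждане, неуспешни въстания и дипломатически борби България възстановява държавността си под формата на монархия и се освобождава от петвековното османско владичество с помощта на Руската империя в Руско-турската Освободителна война. Малко след това България започва да води редица войни със своите съседи и се съюзява с Германия по време на двете световни войни. На 15 септември 1946 г. монархията е заменена с народна република, от съветски тип и държавата се преименува на Народна република България, ръководена от Българската комунистическа партия. Социалистическият строй съществува до 1990 г., след което България поема по пътя на либералната демокрация и пазарната икономика. На 29 март 2004 г. страната се присъединява към НАТО, а на 1 януари 2007 г. – към Европейския съюз.

(English translation: In 1878, after almost a century of cultural and economic revival, failed uprisings and diplomatic struggles, Bulgaria restored its statehood in the form of a monarchy and freed itself from five centuries of Ottoman rule with the help of the Russian Empire in the Russo-Turkish War of Liberation. Shortly afterwards, Bulgaria began waging a series of wars with its neighbours and allied itself with Germany during both world wars. On 15 September 1946 the monarchy was replaced by a Soviet-type people's republic, and the state was renamed the People's Republic of Bulgaria, led by the Bulgarian Communist Party. The socialist system lasted until 1990, after which Bulgaria took the path of liberal democracy and a market economy. On 29 March 2004 the country joined NATO, and on 1 January 2007, the European Union.)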
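One possible workaround, assuming the mismatch is related to input length (the issue only establishes that short strings work), is to annotate a long paragraph one sentence at a time and check alignment per sentence. The exact tokenizer and tagger APIs are not shown in this issue, so the sketch below takes them as plain callables; `split_sentences`, `tag_by_sentence`, `tokenize` and `tag` are hypothetical names, and `POSTagger.generate_tags()` would presumably need a small wrapper to fit the `tag` slot:

```python
import re
from typing import Callable, List, Tuple

# Hypothetical signatures: text -> tokens, and text -> one tag per token.
Tokenizer = Callable[[str], List[str]]
Tagger = Callable[[str], List[str]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tag_by_sentence(text: str, tokenize: Tokenizer, tag: Tagger) -> List[Tuple[str, str]]:
    """Tag a long text sentence by sentence, verifying token/tag counts for each chunk."""
    pairs: List[Tuple[str, str]] = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        tags = tag(sentence)
        if len(tokens) != len(tags):
            raise ValueError(
                f"{len(tokens)} tokens but {len(tags)} tags in sentence: {sentence[:60]!r}"
            )
        pairs.extend(zip(tokens, tags))
    return pairs
```

The splitter is deliberately simple: it will also break after abbreviations such as "г." in "1946 г. монархията", which still yields aligned output because each chunk is tagged independently, but it changes the context the tagger sees. If every sentence passes the check while the full paragraph fails, that would suggest a length-related limit rather than a tokenization disagreement.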
