This repository has been archived by the owner on Oct 3, 2022. It is now read-only.
I would very much like to be able to correct OCR recognition errors in the hOCR files before converting them to djvused scripts and using them as the hidden text.
This could be a run-time option or a separate utility, hocr2djvused.
Actually, it would probably be better to replace the hOCR files with TSV (currently available only in 3.05-dev, in the master branch on GitHub).
A real-life example: in the OCR results of Linde's dictionary I have to replace, in particular, 8626 occurrences of Boss with Ross (an abbreviation for the Russian language). It seems easiest to do this at the hOCR level, since hOCR is used not only for the hidden text but also for the Poliqarp corpus.
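To illustrate the kind of correction meant here, below is a minimal Python sketch of a whole-word replacement applied to an hOCR file before conversion. It is not part of ocrodjvu; the function name `replace_word_in_hocr` and the sample markup are hypothetical, and the sketch assumes that attribute values in the hOCR contain no `<` or `>` characters, so that only text between tags is changed and bounding boxes stay intact.

```python
import re
from typing import Tuple


def replace_word_in_hocr(hocr: str, old: str, new: str) -> Tuple[str, int]:
    """Replace whole-word occurrences of `old` with `new` in text nodes only.

    Returns the corrected markup and the number of replacements made.
    """
    word = re.compile(r"\b" + re.escape(old) + r"\b")
    count = 0

    def fix_text(match: "re.Match[str]") -> str:
        nonlocal count
        fixed, n = word.subn(new, match.group(0))
        count += n
        return fixed

    # A text node is a run of characters between '>' and '<'.
    # Tag names and attributes (id, title with bbox) are never matched.
    result = re.sub(r"(?<=>)[^<>]+(?=<)", fix_text, hocr)
    return result, count


if __name__ == "__main__":
    # Hypothetical hOCR fragment; real files come from the OCR engine.
    sample = (
        "<span class='ocrx_word' id='word_1' "
        "title='bbox 10 20 60 40'>Boss.</span>"
    )
    fixed, n = replace_word_in_hocr(sample, "Boss", "Ross")
    print(n, fixed)
```

Because the pattern uses word boundaries, longer words that merely start with the same letters are left alone; the corrected file could then be passed to the existing hOCR-to-djvused conversion unchanged.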
Thanks for your work on ocrodjvu!
JSB
Issue reported by @jsbien.