This repository has been archived by the owner on Oct 3, 2022. It is now read-only.

Allow editing hOCR (or TSV) files #19

Open
jwilk opened this issue Jun 1, 2016 · 1 comment

Comments

@jwilk
Member

jwilk commented Jun 1, 2016

Issue reported by @jsbien:

I would very much like to be able to correct OCR recognition errors in the hOCR files before they are converted to djvused scripts and used as the hidden text.

It can be a run-time option or a separate utility hocr2djvused.

Actually, it would probably be better to replace hOCR files with TSV (currently available only in 3.05-dev, in the master branch on GitHub).

A real-life example: in the OCR results of Linde's dictionary I have to replace, in particular, 8626 occurrences of Boss with Ross (an abbreviation for the Russian language). It seems easiest to do this at the hOCR level, since hOCR is used not only for the hidden text but also for the Poliqarp corpus.

Thanks for your work on ocrodjvu!

JSB
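As a rough illustration of the Boss-to-Ross correction requested above, the following is a minimal sketch of a whole-word replacement applied only to the text between tags of an hOCR file, so that attributes such as `title='bbox ...'` stay untouched. The function name `replace_word_in_hocr` is hypothetical and not part of ocrodjvu; the regex-based tag splitting is an assumption that holds for simple hOCR output but would not cope with comments, CDATA sections, or attribute values containing `>`.

```python
import re

# Splits markup into tags (kept via the capture group) and the text between them.
TAG_RE = re.compile(r"(<[^>]*>)")


def replace_word_in_hocr(hocr, old, new):
    """Replace whole-word occurrences of `old` in text nodes only.

    Returns the corrected markup and the number of replacements made.
    Hypothetical helper for illustration; not an ocrodjvu function.
    """
    word_re = re.compile(r"\b" + re.escape(old) + r"\b")
    count = 0
    parts = TAG_RE.split(hocr)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            continue  # odd indices are the captured tags; leave them untouched
        parts[i], n = word_re.subn(new, part)
        count += n
    return "".join(parts), count


# Made-up sample markup in the style of hOCR word spans.
sample = (
    "<span class='ocrx_word' title='bbox 10 20 60 40'>Boss</span> "
    "<span class='ocrx_word' title='bbox 70 20 140 40'>Bossy</span>"
)
fixed, n = replace_word_in_hocr(sample, "Boss", "Ross")
print(n)
print(fixed)
```

Returning the replacement count makes it possible to compare the result against an expected figure (such as the 8626 occurrences mentioned above) before the corrected file is passed on for conversion.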

@jsbien

jsbien commented Aug 15, 2018

I think the issue can be closed now, as I moved its content to #27 and #28.
