Allow editing hOCR (or TSV) files #19

jwilk · 2016-06-01T06:21:10Z

Issue reported by @jsbien:

I would very much like to be able to correct OCR recognition errors in the hOCR files before converting them to djvused scripts and using them as the hidden text.

This could be a run-time option or a separate utility, hocr2djvused.

Actually, it would probably be better to replace hOCR files with TSV (currently available only in 3.05-dev, in the master branch on GitHub).

A real-life example: in the OCR results of Linde's dictionary I have to replace, in particular, 8626 occurrences of Boss with Ross (an abbreviation for the Russian language). It seems easiest to do this at the hOCR level, since hOCR is used not only for the hidden text but also for the Poliqarp corpus.

Thanks for your work on ocrodjvu!

JSB
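
For illustration only, the Boss-to-Ross correction described above could be sketched roughly as follows. This is not part of ocrodjvu; the function name `replace_word` is hypothetical, and the sketch assumes the hOCR file is well-formed markup in which no attribute value contains a `>` character. It replaces whole words in text content only, so that `id` and `title` attributes stay untouched.

```python
import re
from typing import Tuple

# Matches a single markup tag; the capture group makes re.split keep the tags.
TAG = re.compile(r'(<[^>]*>)')

def replace_word(hocr: str, old: str, new: str) -> Tuple[str, int]:
    """Replace whole-word occurrences of `old` in text nodes only.

    Returns the corrected markup and the number of replacements made.
    """
    word = re.compile(r'\b' + re.escape(old) + r'\b')
    parts = TAG.split(hocr)
    count = 0
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are the captured tags: leave attributes alone.
            continue
        parts[i], n = word.subn(new, part)
        count += n
    return ''.join(parts), count
```

The returned count can be compared with the expected number of occurrences as a sanity check before the corrected file is converted to a djvused script.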

jsbien · 2018-08-15T07:04:43Z

I think the issue can be closed now, as I moved its content to #27 and #28.

jwilk added the enhancement label Nov 22, 2016

This was referenced Aug 15, 2018

Conversion of hocr to djvused as a separate utility #27 (Closed)

TSV support (tsv2djvused) #28 (Open)
