error msg "No image suitable for OCR" is too vague #21

ghost · 2017-03-15T18:29:52Z

Every document I receive from a particular source is deemed "unsuitable" by ocrodjvu and results in a session that looks like this:

$ ocrodjvu --debug --engine=tesseract -l eng --in-place document.djvu
Processing 'document.djvu':

Page #1
No image suitable for OCR.
Intermediate files were left in the '/tmp/ocrodjvu.ErLquU' directory.

The same error results if cuneiform is the engine, so apparently the error is not coming from the engine. Is ocrodjvu enforcing a certain image property, such as DPI? I see no image requirements in the manpage, so it would be useful if the error message listed the requirements and, ideally, indicated the unmet ones.

jwilk · 2017-04-06T19:37:09Z

Thanks for the bug report.

Yes, the warning comes from ocrodjvu itself. I agree that the message is rather obscure.

By default, ocrodjvu passes only the page's mask to the OCR engine. (See the --render option in the manpage.)
The warning is emitted if there was no mask at all for this page.
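
A rough way to check this from the command line, offered as a sketch rather than as what ocrodjvu does internally: DjVuLibre's djvudump lists the chunks in a file, and a mask is normally stored in an Sjbz (JB2) or Smmr (MMR) chunk. The pick_render helper below is hypothetical, and the example assumes that --render accepts the value all, as described in the manpage. It looks at the whole file, not at individual pages, so a document with mixed pages would still report a mask.

```shell
#!/bin/sh
# pick_render is a hypothetical helper, not part of ocrodjvu.
# It reads `djvudump FILE` output on stdin and prints a --render value to try.
pick_render() {
    if grep -Eq 'Sjbz|Smmr'; then
        echo mask   # a mask chunk exists somewhere in the file; the default should work
    else
        echo all    # no mask chunk found; ask for all layers to be rendered
    fi
}

# Usage (assumes djvudump from DjVuLibre is installed):
#   render=$(djvudump document.djvu | pick_render)
#   ocrodjvu --render="$render" --engine=tesseract -l eng document.djvu
```

Rendering all layers hands the OCR engine a full-colour page image instead of a bilevel mask, so recognition quality may differ from the default.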

jwilk added the "bug" and "help wanted" labels · Feb 6, 2019
