This repository has been archived by the owner on Oct 3, 2022. It is now read-only.

OCR engine executable path should be configurable #36

Open
xelxebar opened this issue Apr 24, 2019 · 3 comments

@xelxebar

Overview

On Void Linux, the tesseract binary resides at /usr/bin/tesseract-ocr due to a naming conflict with the game Tesseract. It would be nice if the paths to the OCR engine could be explicitly specified, e.g. via a command line option, environment variable, or configuration file.

Version Information

$ ocrodjvu --version
ocrodjvu 0.11
+ Python 2.7.16
+ subprocess32
+ python-djvulibre 0.8.4
+ lxml 4.3.3

$ lsb_release --all
LSB Version:	1.0
Distributor ID:	VoidLinux
Description:	Void Linux
Release:	rolling
Codename:	void

Comments

For the moment, I am working around this issue by packaging ocrodjvu on my distro with the following patch:

--- a/lib/engines/tesseract.py
+++ b/lib/engines/tesseract.py
@@ -111,7 +111,7 @@
     image_format = image_io.TIFF
     needs_utf8_fix = True
 
-    executable = utils.property('tesseract')
+    executable = utils.property('tesseract-ocr')
     extra_args = utils.property([], shlex.split)
     use_hocr = utils.property(None, int)
     fix_html = utils.property(0, int)
@jwilk
Member

jwilk commented Apr 24, 2019

It's not documented at the moment, but you can specify the executable via command line with:

-X executable=tesseract-ocr
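
For packagers who want a wrapper instead of a patch, here is a minimal sketch of how that override could be passed automatically. It is a standalone Python 3 helper, not part of ocrodjvu; the function names `find_tesseract` and `ocrodjvu_argv` are hypothetical, and the only ocrodjvu option it relies on is the `-X executable=…` setting shown above. Any other options (input file, output mode, and so on) are left to the caller.

```python
import shutil

# Candidate binary names taken from this thread: Void Linux ships the
# binary as "tesseract-ocr", while the ocrodjvu default is "tesseract".
CANDIDATES = ("tesseract-ocr", "tesseract")


def find_tesseract(candidates=CANDIDATES, which=shutil.which):
    """Return the first candidate name that `which` finds on PATH, else None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def ocrodjvu_argv(other_args, executable):
    """Build an ocrodjvu argument list that carries the -X executable override.

    `other_args` is whatever the caller would normally pass to ocrodjvu;
    this helper only prepends the override and does not validate the rest.
    """
    return ["ocrodjvu", "-X", "executable=" + executable] + list(other_args)
```

The resulting list can be handed to `subprocess.run`. Passing `which` as a parameter keeps the lookup easy to stub out when no Tesseract binary is installed.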

@xelxebar
Author

Oh! Nice. Thanks for the quick feedback.
Are there any gotchas? If it's a reasonably stable option, would be nice to put it in the docs.

@jwilk
Member

jwilk commented May 1, 2019

I considered using the Tesseract API (maybe through tesserocr) instead of the CLI, which would render the executable setting meaningless.
But realistically, the switch to API is unlikely to happen in the foreseeable future.

Yes, -X executable=… (and other -X goodies) should be documented.
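
Until that documentation exists, the following sketch illustrates how `key=value` overrides of this kind could map onto typed settings. It is an assumption-laden illustration, not ocrodjvu's actual implementation: the setting names, defaults, and parsers are copied from the property declarations visible in the patch context earlier in this thread, while the `SETTINGS` table and `apply_overrides` function are hypothetical. Whether every one of these properties is really reachable through `-X` is not confirmed here.

```python
import shlex

# Name -> (default, parser). Defaults and parsers mirror the declarations
# in lib/engines/tesseract.py quoted in the patch; the table is hypothetical.
SETTINGS = {
    "executable": ("tesseract", str),
    "extra_args": ([], shlex.split),
    "use_hocr": (None, int),
    "fix_html": (0, int),
}


def apply_overrides(pairs, settings=SETTINGS):
    """Turn ["key=value", ...] strings into a dict of parsed settings."""
    values = {key: default for key, (default, _parser) in settings.items()}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or key not in settings:
            raise ValueError("unknown or malformed setting: %r" % pair)
        parser = settings[key][1]
        values[key] = parser(raw)  # e.g. int("1") or shlex.split("a b")
    return values
```

Rejecting unknown keys up front is a design choice of this sketch; it surfaces typos such as `executible=…` instead of silently ignoring them.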
