"stage output" -> mixed: picture zones layer #191

Open
Golddouble opened this issue Jul 4, 2022 · 1 comment

Comments

@Golddouble

Golddouble commented Jul 4, 2022

This is a question about the "stage output" -> mixed function.

I do not understand what advantage (or purpose) "mixed" mode has compared with "Colour / Gray scale" mode.

Question 1:
Is the purpose to save space? (I mean, to make the TIFF file smaller than in "Colour / Gray scale" mode?)

When I choose mixed mode, I can use the "picture zones" to automatically detect pictures and separate them from text.

One problem with tesseract-based OCR programs is that they cannot properly separate text from pictures. ScanTailor seems to do this better. And in tesseract-based OCR programs there is no way to manually mark or select "text areas" so that tesseract applies OCR only to areas that really are text.

So I wonder whether I can somehow use the picture/text zones that ScanTailor detects in mixed mode in my OCR program.

Question 2:
You mention "auto layers" that can be seen in the "picture zones" tab. Are these zones somehow saved in the resulting TIFF?
And if so, can my OCR program use these zones to decide whether or not to apply OCR to a given zone?

I would appreciate an answer.
Thank you.

@mara004

mara004 commented Jul 4, 2022

I think your guesses are largely correct.
The mixed mode is used to separate image and text areas. Since the results produced by free algorithms often aren't perfect, it makes sense to have means of manual adjustment in a GUI like ScanTailor, as you say.
The result may then be used for a variety of purposes, including the creation of mixed raster content DjVu or PDF files with particularly efficient compression, due to the use of different encoders for text and images. Or it could be passed to an OCR tool that needs this information as well. Whether that's possible with tesseract on the command-line I don't know, though.
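
As a rough illustration of the OCR idea, assuming the picture zones were available as plain rectangles (nothing in this thread confirms that ScanTailor exports them in that form), one could paint those regions white before handing the page to an OCR tool. The function name, the rectangle format and the list-of-rows page representation below are all hypothetical, chosen only to keep the sketch self-contained:

```python
from typing import List, Tuple

# Hypothetical zone format: (left, top, right, bottom), right/bottom exclusive.
Rect = Tuple[int, int, int, int]

WHITE = 255  # 8-bit grayscale value used to blank out a zone


def mask_picture_zones(page: List[List[int]], zones: List[Rect],
                       fill: int = WHITE) -> List[List[int]]:
    """Return a copy of `page` with every picture zone painted over.

    `page` is a grayscale image stored as a list of pixel rows.
    Zones that extend past the page border are clipped to the page.
    """
    height = len(page)
    width = len(page[0]) if page else 0
    masked = [row[:] for row in page]  # copy, so the input stays untouched
    for left, top, right, bottom in zones:
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                masked[y][x] = fill
    return masked
```

The masked page would then contain only what was classified as text, so the OCR tool has no picture content to misread. Whether the zones can actually be obtained from ScanTailor's output is the open question here.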
