Help: Code that fixes the issue of inflections on Kindle dictionaries #1
Hello, my solution is here. Note that you need to replace one function and one constant in pyglossary, as described at the top of the file. Ideally, this would be added to pyglossary directly.
Here is a project that converts a tabfile to a fixed Kindle dictionary: https://github.com/Vuizur/pyglossary-kindle-test/tree/master/pyglossary_kindle_test
I created a short Finnish-English txt file, separated by tabs (and |), and tested it. It works brilliantly, though some of the words show up twice in the dictionary; I suspect this is expected behavior. Thank you. The words I checked below showed up twice in the dictionary:
With great regret, anguish and disappointment, I must tell you that after creating a Finnish-English dictionary many times from a 143 250-line txt file, the look-ups for inflected forms failed almost entirely. I checked the txt file for formatting problems and also checked the xhtml files; the inflected forms were recorded inside infl tags. Look-ups for inflections only worked when the txt file was small: for example, I selected 35 words out of the 143 250, created a 35-line txt file, and the resulting dictionary worked. I also always get this message at the beginning after executing
The error message about [...] If I understand correctly, you tried to use these dictionaries on a Kindle? I think it might have problems with the huge number of inflections. I would try re-running the program as described in the README.md of this repo, with the option try_to_fix_failed_inflections set to False. Maybe the Kindle will work better with that one.
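To get a feel for how large "the huge number of inflections" actually is, one could count them before building. The following is only a minimal sketch: it assumes each line of the txt file has the form `headword|inflection1|inflection2<TAB>definition`, and `inflection_stats` is a hypothetical helper, not part of pyglossary or of this repo.

```python
def inflection_stats(lines):
    """Return (entry_count, total_inflections, (headword, count) of the largest entry).

    Assumed line layout: headword|inflection|...<TAB>definition
    """
    entries = 0
    total = 0
    largest = ("", 0)
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        forms, _definition = line.split("\t", 1)
        headword, *inflections = forms.split("|")
        entries += 1
        total += len(inflections)
        if len(inflections) > largest[1]:
            largest = (headword, len(inflections))
    return entries, total, largest
```

Passing an open file works as well as a list, for example `inflection_stats(open("fi-en.txt", encoding="utf-8"))`, where the file name is a placeholder. The numbers do not say where any Kindle limit lies; they only make it easier to compare a file that works with one that does not.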
Yes, I am sending the created dictionaries to my Kindle e-reader to test them. I already have the same 143 250-entry dictionary that I am trying to fix on my e-reader, and the device handles it with all inflections. I created it using mobigen (much faster than kindlegen); it is about 10 MB. But of course the inflections are messed up because of the Kindle algorithm, and headwords clash with inflections. I later created sample dictionaries with 100, 1000, and 10 000 entries using the pyglossary-kindle-test repository, and all seemed to work fine. I also created a Spanish-English dictionary using the En-Es.txt file that comes with that repo, and this one, too, worked fine. But not the Finnish dictionary with all entries included. A moment ago I tried the ebook_dictionary_creator repo to create a Finnish dictionary and I got this error:

```
Traceback (most recent call last):
  File "C:\Users\user\Desktop\Py Project\Project Dictio\trial\dictio.py", line 5, in <module>
    dict_creator.create_database()
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\ebook_dictionary_creator\e_dictionary_creator\dictionary_creator.py", line 64, in create_database
    create_database.create_database(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\ebook_dictionary_creator\database_creator\create_database.py", line 728, in create_database
    obj = json.loads(line)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 266 (char 265)
Traceback locals:
self = <json.decoder.JSONDecoder object at 0x0000020371ABFE50>
s = '{"pos": "noun", "head_templates": [{"name": "head", "args": {"1": "f...
len(s) = 277
idx = 0
```
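The traceback shows `json.loads(line)` failing on a 277-character line with an unterminated string. That would be consistent with a line that was cut off mid-record (for instance by an incomplete download), though this is only a guess. A minimal sketch for locating such lines in a JSON Lines file follows; `find_bad_json_lines` is a hypothetical helper and not part of ebook_dictionary_creator.

```python
import json


def find_bad_json_lines(lines):
    """Return (line_number, line_length, message) for every line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore empty lines
        try:
            json.loads(line)
        except json.JSONDecodeError as error:
            # error.msg is the short reason, without the position suffix
            bad.append((number, len(line), error.msg))
    return bad
```

Run against the data file the tool reads (its path is not shown in the traceback, so it has to be supplied by hand), this reports which lines are broken. If the only bad line is the last one, a truncated file is a likely cause.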
I think Finnish simply has too many inflections for that terrible kindlegen program. If you hit one of its completely arbitrary limits, it refuses to work correctly and gives you no hint on how to fix this or which entry is responsible for the failure. Very bad software, but unfortunately there is no solution. Hmm, I tried creating a Finnish dictionary on my Windows system, and here it worked well. So I would need your system/Python version info to try to replicate it.
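One way to gather the requested system and Python version info is a few standard-library calls. This is just a convenience sketch; `environment_report` is a made-up name.

```python
import platform
import sys


def environment_report():
    """Collect basic interpreter and operating-system details for a bug report."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the thread, together with the installed version of the package (for example from `pip show`), should be enough to attempt a reproduction.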
Hello Hannes,
I've been trying to create a decent Kindle dictionary based on Wiktionary [English-Finnish] for some time. When I tried to create the dictionary the way you did, I came across the issue of inflections clashing with headwords. I even created a thread on mobileread two days ago about this topic.
I see that you have a solution, an algorithm to fix this issue. Can you please explain how you solved it? I've read the document titled "The stupid kindle algorithm", where you mention that it can be fixed with three lines of code. Can you please share this code and explain how it can be used to fix the issue?
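To make "inflections clashing with headwords" concrete, here is a minimal sketch that only detects such clashes in a tab file; it is not the fix asked about above. It assumes lines of the form `headword|inflection1|inflection2<TAB>definition`, and `find_clashes` is a hypothetical helper.

```python
def find_clashes(lines):
    """Map each inflected form that is also a headword to the entries that list it."""
    headwords = set()
    owners = {}  # inflected form -> set of headwords that list it
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        forms, _definition = line.split("\t", 1)
        headword, *inflections = forms.split("|")
        headwords.add(headword)
        for form in inflections:
            if form != headword:
                owners.setdefault(form, set()).add(headword)
    return {form: sorted(heads) for form, heads in owners.items() if form in headwords}
```

For example, Finnish "kuusi" is both a headword (spruce) and a possessive form of "kuu" (moon), so a file containing both entries would report `{"kuusi": ["kuu"]}`.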