ted crashes/hangs on binary files [bug] #40
Comments
Maybe it's a problem with the UTF-8 encoding/decoding mechanism.
Try opening without UTF-8.
Sorry, closed by accident.
Is there a way to disable UTF-8?
OK, I'll try this ASAP.
OK, I tried it. With that part of the code commented out, lines are no longer misplaced, but now ted hangs at around line 450.
Maybe the problem lies in a call to utf8ToMultibyte from show_lines, but I am unsure how to handle that.
I have retried, and ted now reads successfully to the end of the binary file. However, it still misplaces the cursor sometimes, and past a certain point it becomes so clunky that I can't tell whether it has crashed or is just slow, so this seems like a bug to me.
I think we need a system to ignore characters that don't represent anything in text.
By replacing them with something like this (https://www.compart.com/en/unicode/U+16844)? I think it's a very good idea, and even production-grade editors do that.
The worst problem is that a malformed Unicode sequence could make us read arrays out of bounds or end up in other undefined-behavior (UB) situations.
Also I have successfully read over 5000 lines with ted, so this theory about memory is officially disproved.
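To make the out-of-bounds concern concrete, here is a minimal sketch of a decoder that never reads past the end of its buffer and substitutes U+FFFD for anything malformed. This is not ted's actual code: the name `decode_utf8_bounded`, its signature, and the choice of U+FFFD as the substitute are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu /* U+FFFD REPLACEMENT CHARACTER (assumed substitute) */

/*
 * Hypothetical helper: decode one code point from buf[0..len).
 * Never reads at or beyond buf + len. On malformed input it stores
 * U+FFFD in *out. Returns the number of bytes consumed, which is at
 * least 1 whenever len > 0, so a caller's loop always makes progress.
 */
size_t decode_utf8_bounded(const unsigned char *buf, size_t len, uint32_t *out)
{
    assert(out != NULL);
    if (len == 0) {
        *out = REPLACEMENT_CHAR;
        return 0;
    }

    unsigned char b0 = buf[0];
    size_t need = 0;     /* continuation bytes expected after the lead byte */
    uint32_t cp = 0;     /* code point being assembled */
    uint32_t min_cp = 0; /* smallest value allowed for this length (rejects overlongs) */

    if (b0 < 0x80) {
        *out = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        need = 1; cp = b0 & 0x1F; min_cp = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        need = 2; cp = b0 & 0x0F; min_cp = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        need = 3; cp = b0 & 0x07; min_cp = 0x10000;
    } else {
        /* stray continuation byte or invalid lead byte */
        *out = REPLACEMENT_CHAR;
        return 1;
    }

    for (size_t i = 1; i <= need; i++) {
        /* the bounds check comes first, so buf[i] is only read when i < len */
        if (i >= len || (buf[i] & 0xC0) != 0x80) {
            *out = REPLACEMENT_CHAR;
            return i; /* resynchronize at the offending byte */
        }
        cp = (cp << 6) | (uint32_t)(buf[i] & 0x3F);
    }

    /* reject overlong forms, values above U+10FFFF, and surrogates */
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = REPLACEMENT_CHAR;

    *out = cp;
    return need + 1;
}
```

A display loop would call this repeatedly, advancing by the returned count, so a truncated sequence at the end of a line costs one replacement character instead of an over-read.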
In theory, ted can read ~4 GB.
I'll look into this, quoting from RFC 3629
I have tried to add UTF-8 validation, but I am unsure whether it is actually working well. I would appreciate it if you could check here: https://github.com/bynect/Teditor/blob/utf8try/src/utf8.c (I have made only small changes). I tried to replace all invalid code points with this one
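For comparison while reviewing, here is a rough sketch of a whole-buffer validator that follows the byte ranges in the UTF-8 syntax of RFC 3629, section 4. It is independent of the code in the linked branch; the function name `utf8_is_valid` is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical validator: returns true if s[0..len) is well-formed UTF-8
 * according to the byte ranges in RFC 3629, section 4. Overlong forms,
 * surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected by
 * narrowing the range allowed for the first continuation byte.
 */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t tail = 0;                    /* continuation bytes that must follow */
        unsigned char lo = 0x80, hi = 0xBF; /* range for the first continuation byte */

        if (b <= 0x7F) {
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            tail = 1;
        } else if (b == 0xE0) {
            tail = 2; lo = 0xA0;            /* excludes overlong 3-byte forms */
        } else if (b == 0xED) {
            tail = 2; hi = 0x9F;            /* excludes surrogates */
        } else if (b >= 0xE1 && b <= 0xEF) {
            tail = 2;
        } else if (b == 0xF0) {
            tail = 3; lo = 0x90;            /* excludes overlong 4-byte forms */
        } else if (b >= 0xF1 && b <= 0xF3) {
            tail = 3;
        } else if (b == 0xF4) {
            tail = 3; hi = 0x8F;            /* excludes values above U+10FFFF */
        } else {
            return false;                   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }

        if (len - i <= tail)
            return false;                   /* sequence is cut off by the end of the buffer */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= tail; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        }
        i += tail + 1;
    }
    return true;
}
```

Because the length check happens before any continuation byte is read, the validator cannot index past the buffer even when the last sequence is truncated.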
I cut it out of the screenshot, but the file was detected as LF, and to my knowledge all files, binary or not, are LF-terminated on Linux.
If you download a file that was made on Windows, it will probably be CRLF.
Well yes, but in this case the file was produced on Linux, specifically by gcc from a small test C program.
I will take a look at it later; right now I am changing other things in the code.
Regarding this, I think I have just found the root cause of the misplaced lines and related problems, which is the way read_lines reads line breaks. I've also found out that binary files are CRLF-terminated even on Linux, and this problem is related to that.
Probably the problem is that ted is trying to detect the line break type, while the file does not have a consistent line break type.
Exactly. Furthermore, I found that in CRLF mode read_lines discards the character after a carriage return without checking that it is a line feed. I was also trying to implement a heuristic that would guess, in an acceptable manner, whether the file is valid UTF-8/non-binary.
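One possible shape for such a heuristic, sketched under assumptions: inspect only a leading sample of the file, treat any NUL byte as a sign of binary data, and otherwise look at the share of control bytes. The name `looks_binary`, the 8192-byte sample size, and the 10% threshold are all arbitrary choices for illustration, not values taken from ted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_SIZE 8192 /* hypothetical: number of leading bytes to inspect */

/*
 * Hypothetical heuristic: guess whether buf[0..len) is binary data.
 * A NUL byte in the sample counts as binary immediately; otherwise the
 * buffer is called binary when more than 10% of the sampled bytes are
 * control characters other than tab, LF and CR.
 */
bool looks_binary(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t n = len < SAMPLE_SIZE ? len : SAMPLE_SIZE;
    size_t suspicious = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char b = buf[i];
        if (b == 0x00)
            return true; /* NUL practically never appears in text files */
        if ((b < 0x20 && b != '\t' && b != '\n' && b != '\r') || b == 0x7F)
            suspicious++;
    }

    /* empty input is treated as text; the threshold is an arbitrary assumption */
    return suspicious * 10 > n;
}
```

A check like this could run once when the file is opened, with the result deciding whether to warn the user or to skip line break detection entirely.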
If we remove CR line break support, we can just ignore carriage returns when reading the file.
CR is not used as a line break in any modern operating system. There is no reason to support it.
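A small sketch of what ignoring carriage returns could look like once LF is the only line terminator. This variant is slightly more conservative than dropping every CR: it removes a CR only when an LF follows, so a lone CR inside binary data is kept as an ordinary byte. The function name `normalize_crlf` and the in-place interface are assumptions, not part of ted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrite buf[0..len) in place so that every CRLF
 * pair becomes a single LF. A CR that is not followed by LF is left
 * untouched. Returns the new length, which is never larger than len.
 */
size_t normalize_crlf(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t w = 0; /* write index, always <= read index */
    for (size_t r = 0; r < len; r++) {
        /* look ahead only after checking that r + 1 is still in bounds */
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue; /* drop the CR; the LF is copied on the next iteration */
        buf[w++] = buf[r];
    }
    return w;
}
```

After this pass the reader only has to split on LF, which avoids the case described above where a byte following a CR is discarded without being checked.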
Still, it seems like the editor isn't dealing well with corrupt Unicode.
Corrupted Unicode is not being displayed correctly, but it's good enough. No crashes should be happening and the screen is being displayed correctly.
I erroneously opened a binary file in ted and it behaved strangely (misplaced cursor and line numbers), and after a certain line (300 or so) it just hung. I have yet to reproduce this behavior on multiple files, but I will post updates on this bug soon.