-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SLiM fails inscrutibly on Windows when there are funny characters in the script at treeSeqOutput(..., includeModel=T) #488
Comments
Ah-ha! I got the file in question, and ran it locally, where it gives an informative error:
I think one of the meanings of the inscrutible Windows error code was "unhandled exception"; so that's probably what's going on. And, here's a script that triggers it: |
Hmm. I'm not sure what exactly the bug even is here. :-> JSON should support UTF-8, so I'm surprised the JSON library is throwing an exception. SLiM should catch that exception; I guess I didn't realize that the call in question could even throw, so it isn't inside any try block. But really this shouldn't happen at all. Ah, I'd check the character encoding of the input file. It is probably in some Windows-specific character encoding, whereas modern software (including SLiM) tends to assume UTF-8 encoding unless told otherwise (and there is no way to tell SLiM otherwise, since who but a crazy person would use anything but UTF-8 nowadays? :->). I don't think there's a reliable way for SLiM to detect that a different character encoding is being used. So. I'll add a try block around the JSON encoding call, so then you should get a normal error message from SLiM; but the test would still fail, just in a more transparent way. The way to fix the test is to use UTF-8 for input files, always; and then I think everything ought to work properly anyway. The JSON library is probably throwing the error because the é character in the Windows encoding is not valid UTF-8. Does that sound reasonable? |
Probably if that é was anywhere other than in a comment SLiM would've thrown an error itself. You can use accented characters or even emojis, in places like strings and even variable names in Eidos; but UTF-8 is assumed, so I would think/hope that SLiM would throw an error if it saw a character that wasn't valid UTF-8; there is code in EidosTokenizer to check for that, I think. But characters inside a comment just get zipped over without checking, so this invalid encoding character got past that line of defense. Ah, Windows. Always two steps behind the rest of the world. |
If you open test.txt in SLiMgui, the bad encoding character is immediately visible; it draws as "�". (Not sure what that will look like on your end, but it isn't an é. :->). Interestingly, it looks like Eidos is tolerant of this character, whatever it might be in UTF-8:
This produces output of 10 and 5, with no problems. Again, I'm not sure whether to regard that as a bug or not. :-> Probably it ought to be; if SLiM is going to assume UTF-8 (and I think it should; I don't want to get into supporting other character encodings, that's crazy-town), then probably it should error if it hits any character that violates UTF-8, even in a comment, but certainly outside a comment. That won't be 100% reliable, though; just a heuristic check. There are lots of character encodings with special characters that would pass a UTF-8 sanity check, but would render incorrectly when interpreted as UTF-8 versus when interpreted in whatever the original encoding is; there's no avoiding that. UTF-8 files are supposed to start with a special byte-order marker or something, though (which is also just an unreliable heuristic check, but is usually accurate); I guess it looks like SLiM isn't checking for that, and probably it should. So that's probably a bug I ought to fix, too, and would've caught this problem even if the � character is in fact valid in UTF-8. [EDIT: The UTF-8 BOM marker is actually not recommended (https://stackoverflow.com/a/2223926/2752221). So SLiM really shouldn't check for it or require it. So never mind, on that front.] Character encodings are one of the huge anachronistic tar pits of programming, along with endianness, and signed vs. unsigned integers. Things computer science ought to have handled differently, in hindsight; and endless sources of bugs... |
So I guess it sounds like there's a character-encoding bug in Python on Windows, too, maybe, as a part of this big ball of ick? You could report that, I suppose, if you want to. But really most software in the world contains character-encoding bugs if you poke at it. They're extremely hard to find, debug, and fix, so they are more or less omnipresent. (But it sounds like this one in Python is maybe particularly bad, if indeed you're sending proper UTF-8 into Python, and it is spitting back out non-UTF-8 text, downgrading from UTF-8 to some Windows encoding? That is a Very Bad thing to do, since it manifests as a bug even in a modern all-UTF-8 world.) |
Also a bug, of course, that the Windows error code is so ludicrously uninformative and apparently undocumented (at least through a Google search). :-> No end of bugs here! |
Ah, right. The problem is that in Windows the default writing mode is ISO-8859. Here's a script that produces the bad slim script, even on other platforms:
I think that if there is anything to do here, it is to catch the |
No, there is definitely at least one bug here for me to fix, maybe more. Thanks for the script, that will be useful. :-> |
This just took me way too long to figure out. A stdpopsim test was failing on windows because, turns out, we were calling
slim
with a script containing this line in the header (in a block comment):... but the script that was actually getting passed through to SLiM, as produced by python, is printed out in the Windows bash shell as
... I'm working on getting the actual character there. The result is:
slim test.slim
then it exits at thetreeSeqOutput( )
line with code 127 and no error message; run via python with subprocess.Popen theproc.returncode
is 3221226505, which seems to mean "something bad happened"includeModel=F
then nothing bad happensNote that running the following script in Windows works fine; the issue is what é is turned into by python, in windows, somehow.
The text was updated successfully, but these errors were encountered: