Read bytes more efficiently #2137
Conversation
And as a note, the measurements are done with #2127 in place, so every texture is read at most once.
Thank you for investigating how to improve performance. For the
if (dataLength == stride) {
    byte[] buffer = new byte[end - index];
    stream.read(buffer, 0, buffer.length);
Note: read() may read fewer bytes than buffer.length, and it's bad practice to ignore its return value. I'd suggest using readFully() instead, but JME's LittleEndien.readFully() is broken in a similar way. This is a really dangerous way to read because failures will look really strange and not necessarily point to this read() code.
We should fix LittleEndien [sic] readFully() and use readFully() in all of these cases where read(array)'s return value is being ignored. The length passed to read(array, pos, length) is a maximum length to read. The method is free to return fewer bytes. I'm surprised this hasn't caused problems, actually; we must have been pretty lucky. For example, last I checked, a BufferedInputStream asked to return 100 bytes when it only has 32 bytes left in its current buffer will just read 32 bytes. This leaves the rest of the array unchanged and leaves the stream in an unpredictable position. Actually, who knows how many random issues this may have been causing that we never tracked down because something else magically fixed it.
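For illustration, the quoted snippet could be written along these lines (a sketch only; it assumes stream is a plain java.io.InputStream, and the variable names follow the snippet above):

```java
// Wrap the stream once and use readFully(), which either fills the whole
// array or throws EOFException -- it never silently returns fewer bytes.
java.io.DataInputStream din = new java.io.DataInputStream(stream);
byte[] buffer = new byte[end - index];
din.readFully(buffer, 0, buffer.length);
```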
Yes, of course. Copy-paste brain fart. This was changed. Also added a small assert on the data reading part.
How does it look now? Everything ok?
I'm not sure you understood the nature of my complaint. Our use of read(buf, pos, len) is WRONG. Not in a "we should detect this possible case" kind of way... but in a "we are abusing the method and will have random weird consequences someday" kind of way. If we want to use read(buf, pos, len) then we need to put it in a loop that reads until pos == len. See the JDK's implementation of DataInputStream.readFully() for an example of such a loop. Put another way: it is perfectly normal for read(buf, 0, 100000) to read only 2 bytes and return 2. In the current form, you treat this like an error (it's not). BTW: "assertReadLength" confused me at first because no actual assert calls were made.
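The kind of loop meant here looks roughly like this (a sketch modeled on the JDK's DataInputStream.readFully(), not JME's actual code; imports from java.io assumed):

```java
// Keep calling read() until the requested range is filled.
// Only end-of-stream (-1) is an error; a short read is not.
static void readFully(InputStream in, byte[] b, int off, int len) throws IOException {
    int n = 0;
    while (n < len) {
        int count = in.read(b, off + n, len - n);
        if (count < 0) {
            throw new EOFException("Stream ended after " + n + " of " + len + " bytes");
        }
        n += count;
    }
}
```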
As far as I understand, not getting enough bytes from the read would be a file structure error, at least for me. I was told to read a certain amount of bytes and they were not found, so the file is corrupted. This is how I treat data structures.
I'm trying to stay consistent with the existing code file. There is already a similar assert, so is it fine if I just remove the assert stuff? Fixing LittleEndien I consider to be a little out of scope for this.
@tonihele I think you're missing Paul's point. He isn't asking you to use
sgold's got it. If you ask read() to read 10 bytes, it might only read 2. That's not because there are only 2 bytes; that's just because it decided to return 2 right now. The other 8 are still there waiting for you to finish reading them. Read again and you will get some more bytes. The current code is 100% totally buggy. It's frankly amazing that it works at all.
Ok, I think I understand now. I was under the impression that streams have always reached EOF if they don't return the requested amount of bytes, and that any further call will just throw an IOException. Is it correct now?
Code looks better now, I think. read() will return -1 when there is no more data. There can be a variety of reasons that read() might return less data than requested. Apparently, it will always read at least one byte (if available). https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#read-byte:A-int-int- ...it's up to the stream how to proceed from there. For example, a BufferedInputStream may return just what's left in its current buffer rather than reading an entire new buffer. That's why I'm surprised this hasn't come up before. We may have just been really lucky. Someone should go back and fix LittleEndien (and perhaps fix the spelling someday) to implement a correct readFully().
I think BufferedInputStream always returns the requested amount if it's available in the stream. The buffer is just its internal workings. The only time I have encountered this is when reading straight from a network/the Internet. But yeah, this is how the interface is supposed to work then.
I think the code says otherwise. From BufferedInputStream.java:
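(A paraphrased sketch of the single-shot helper, roughly what JDK 8's read1(byte[], int, int) does; not the verbatim source.)

```java
// read1: reads from the underlying stream at most once.
int avail = count - pos;                 // bytes left in the internal buffer
if (avail <= 0) {
    // Big requests with no mark in effect bypass the buffer entirely.
    if (len >= buf.length && markpos < 0) {
        return in.read(b, off, len);
    }
    fill();                              // refill only when the buffer is empty
    avail = count - pos;
    if (avail <= 0) {
        return -1;
    }
}
int cnt = (avail < len) ? avail : len;   // can be less than the requested len
System.arraycopy(buf, pos, b, off, cnt);
pos += cnt;
return cnt;
```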
It won't call fill() if there is already data available and will just return up to whatever is left. It only calls fill() when it runs out of data. The code is fully available to anyone who wants to check the claims I make about Java code... 90% of the time I already looked up the code before making the claim.
But this is called in a loop by the interface method
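Something like this (a paraphrase of JDK 8's BufferedInputStream.read(byte[], int, int), not the verbatim source):

```java
// The public read(byte[], int, int) keeps calling the single-shot helper
// until the request is satisfied...
int n = 0;
for (;;) {
    int nread = read1(b, off + n, len - n);
    if (nread <= 0) {
        return (n == 0) ? nread : n;
    }
    n += nread;
    if (n >= len) {
        return n;
    }
    // ...but it returns early if the underlying stream reports nothing available.
    InputStream input = in;
    if (input != null && input.available() <= 0) {
        return n;
    }
}
```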
Yeah, I guess you are right. I have also done this a lot. (A lot a lot.) And have 100% encountered this at various times in the past 18 years or so. I'd have to look back through Java's version history to see if BufferedInputStream's behavior has changed in this regard. But all it takes is one stream in the stack to return 0 from available() and your read will be cut short. That's why there is a readFully() method to begin with.
I do 100% agree with you that this is how the reading should be done (the while loop), regardless of what the implementation is. Maybe they have changed it.
Is there anything else regarding this PR? :)
If there's no objection, I plan to integrate this PR in about 48 hours.
Optimized version of reading bytes. Instead of reading one byte at a time... read multiple. With GLB, the whole file is in memory, and with GLTF the reading is done from BufferedReader (meaning that there is never this catastrophic case where we would read a disk file one byte at a time). In the latter case this would increase the reading speed even more than analyzed here. I just don't know whether GLTF stores anything in byte arrays. GLB seems to store the textures this way, and those are potentially huge. So even when reading from memory, calling this readByte billions of times causes significant overhead.
The same problem exists with all the populate*Array methods, but fixing those is potentially messier and would yield smaller gains, as the units are already bigger. That being said, I only focus on this biggest offender.
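As a rough before/after sketch of the idea (hypothetical variable names, not the actual JME loader code):

```java
// Old approach: one stream call per byte.
byte[] slow = new byte[length];
for (int i = 0; i < length; i++) {
    slow[i] = stream.readByte();
}

// New approach: one bulk read for the whole range.
byte[] fast = new byte[length];
stream.readFully(fast);   // assumes a DataInput-style stream such as DataInputStream
```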
I used https://developer.nvidia.com/orca/amazon-lumberyard-bistro (BistroExterior.glb) as a test case. Model loading took on average:
31 605ms (optimized)
47 824ms (old)
Resolves #2125