Read bytes more efficiently #2137

tonihele · 2023-10-29T10:19:32Z

Optimized version of reading bytes. Instead of reading one byte at the time... read multiple. With GLB, the whole file is in memory and with GLTF the reading is done from BufferedReader (meaning that there is never this catastrophic case where we would read a disk file a byte at a time). In latter case this would increase the reading speed even more than analyzed here. I just don't know does GLTF store anything in byte arrays. GLB seems to store the textures this way, which are potentially huge. So even reading from memory, calling this readByte billions of times causes significant overhead.

The same problem is with all the populate*Array but fixing those others is potentially more messy and yield in less gains as the units are already bigger. That being said, I only focus on this biggest offender.

I used https://developer.nvidia.com/orca/amazon-lumberyard-bistro (BistroExterior.glb) as a test case. Model loading took on average:
31 605ms (optimized)
47 824ms (old)

Resolves #2125

tonihele · 2023-10-29T10:23:45Z

And as a note, the measurements are done with #2127 in place. So every texture is read max 1 time.

stephengold · 2023-10-29T19:03:54Z

Thank you for investigating how to improve performance.

For the dataLength == stride case, why not stream.read() directly into the array? That would eliminate the arrayCopy().

pspeed42 · 2023-10-29T19:37:08Z

jme3-plugins/src/gltf/java/com/jme3/scene/plugins/gltf/GltfUtils.java

+
+        if (dataLength == stride) {
+            byte[] buffer = new byte[end - index];
+            stream.read(buffer, 0, buffer.length);


Note: read() may read fewer bytes than buffer.length and it's bad practice to ignore its return value. I'd suggest using readFully() instead but JME's LittleEndien.readFully() is broken in a similar way. This is a really dangerous way to read because failures will look really strange and not necessarily point to this read() code.

pspeed42 · 2023-10-29T19:43:29Z

We should fix LittleEndien [sic] readFully() and use readFully() in all of these cases where read(array)'s return value is being ignored.

The length passed to read(array, pos, length) is a maximum length to read. The method is free to return fewer bytes. I'm surprised this hasn't caused problems actually and we must have been pretty lucky.

For example, last I checked, a BufferedInputStream when asked to return 100 bytes but it only has 32 bytes left in its current buffer will just read 32 bytes. This leaves the rest of the array unchanged and leaves the stream in an unpredictable position. Actually, who knows how many random issues this may have been causing that we never tracked down because something else magically fixed it.

tonihele · 2023-11-01T18:43:21Z

For the dataLength == stride case, why not stream.read() directly into the array? That would eliminate the arrayCopy().

Yes, of course. Copy paste brain fart. This was changed.

Also added a small assert on the data reading part.

tonihele · 2023-11-04T09:43:37Z

How does it look now? Everything ok?

pspeed42 · 2023-11-04T15:30:59Z

I'm not sure you understood the nature of my complaint.

Our use of read(buf, pos, len) is WRONG. Not in a "we should detect this possible case" kind of way... but in a "we are abusing the method and will have random weird consequences someday kind of way".

If we wants to use read(buf, pos, len) then we need to put it in a loop that reads until pos == len. See the JDK's implementation of DataInputStream.readFully() for an example of such a loop.

Put another way: it is perfectly normal for read(buf, 0, 100000) to return only read 2 bytes and return 2. In the current form, you treat this like an error (it's not).

BTW: "assertReadLength" confused me at first because no actual assert calls were made.

tonihele · 2023-11-04T18:08:15Z

I'm not sure you understood the nature of my complaint.

Our use of read(buf, pos, len) is WRONG. Not in a "we should detect this possible case" kind of way... but in a "we are abusing the method and will have random weird consequences someday kind of way".

If we wants to use read(buf, pos, len) then we need to put it in a loop that reads until pos == len. See the JDK's implementation of DataInputStream.readFully() for an example of such a loop.

Put another way: it is perfectly normal for read(buf, 0, 100000) to return only read 2 bytes and return 2. In the current form, you treat this like an error (it's not).

As far as I understand, populateXArray method and the likes, get a stream and they calculate how much they are supposed to read from it. Depending on the variables they are given. That is why I wouldn't dare to use readFully. That might advance the stream too much. This method has no way of telling that the given stream is actually just provided fully for consumption of it.

Not getting enough bytes from the read would be a file structure error for me at least. I was told to read certain amount of bytes and they were not found, file is corrupted. This is how I treat data structures.

BTW: "assertReadLength" confused me at first because no actual assert calls were made.

I'm trying to consolidate to the existing code file. There is already a similar assert, assertNotNull method. I agree that it is not the best. But I think that it is better not to change the style unnecessarily.

So is it fine if I just remove the assert stuff? Fixing LittleEndien I consider to be a little out of scope for this.

stephengold · 2023-11-04T18:16:04Z

@tonihele I think you're missing Paul's point. He isn't asking you to use readFully; he's asking to check the return value of read() and iterate until enough bytes are read.

pspeed42 · 2023-11-04T18:28:46Z

sgold's got it.

If you ask read to read 10 bytes. It might only read 2. That's not because there are only 2 bytes. That's just because it decided to return 2 right now. The other 8 are still there waiting for you to finish reading them. Read again and you will get some more bytes.

The current code is 100% totally buggy. It's frankly amazing that it works at all.

…sted amount of bytes

tonihele · 2023-11-04T18:38:18Z

Ok, I think understand now. I was just in the believe that streams have always reached EOF if they don't return the requested amount of bytes and any further call will just get IOException.

Is it now correct?

…sted amount of bytes

pspeed42 · 2023-11-04T19:28:42Z

Code looks better now, I think.

read will return -1 when there is no more data.

There can be a variety of reasons that read() might return less data then requested. Apparently, it will always read at least one byte (if available).

https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#read-byte:A-int-int-

...it's up to the stream how to proceed from there. For example, a BufferedInputStream may return just what's left in its current buffer rather than reading an entire new buffer. That's why I'm surprised this hasn't come up before. We may have just been really lucky.

Someone should go back and fix LittleEndien (and perhaps fix the spelling someday) to implement a correct readFully().

tonihele · 2023-11-04T19:37:11Z

...it's up to the stream how to proceed from there. For example, a BufferedInputStream may return just what's left in its current buffer rather than reading an entire new buffer. That's why I'm surprised this hasn't come up before. We may have just been really lucky.

I think BufferedInputStream returns the requested amount always if available in stream. The buffer is just its internal workings. Only time when I have encountered this is reading straight from a network/Internet. But yeah, this is how the interface is supposed to work then.

pspeed42 · 2023-11-04T19:54:52Z

I think the code says otherwise. From BufferedInputStream.java:

    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

It won't call fill() if there is already data available and will just return up to whatever is left. It only calls fill() when it runs out of data.

The code is fully available to anyone that wants to check the claims I make about Java code... 90% of the time I already looked up the code before making the claim.

tonihele · 2023-11-04T19:58:23Z

I think the code says otherwise. From BufferedInputStream.java:

    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

It won't call fill() if there is already data available and will just return up to whatever is left. It only calls fill() when it runs out of data.

The code is fully available to anyone that wants to check the claims I make about Java code... 90% of the time I already looked up the code before making the claim.

But this is called in a loop by the interface method BufferedInputStream.java:public synchronized int read(byte b[], int off, int len). So it should work right? I have never encountered this behavior... And I do this a lot.

pspeed42 · 2023-11-04T20:06:12Z

Yeah, I guess you are right.

I have also done this a lot. (A lot a lot.) And have 100% encountered this at various times in the past 18 years or so. I'd have to look back through Java's version history to see if BufferedInputStream's behavior has changed in this regard.

But all it takes is one stream in the stack to return 0 from available() and your read will be cut short.

That's why there is a readFully() method to begin with.

tonihele · 2023-11-04T20:17:11Z

Yeah, I guess you are right.

I have also done this a lot. (A lot a lot.) And have 100% encountered this at various times in the past 18 years or so. I'd have to look back through Java's version history to see if BufferedInputStream's behavior has changed in this regard.

But all it takes is one stream in the stack to return 0 from available() and your read will be cut short.

That's why there is a readFully() method to begin with.

I do 100% agree with you that this is how the reading should be done (while--), regardless what the implementation of BufferedInputStream is. The interface Javadoc is quite clear on this. But now we have double checked that jMEs existing code is working(ish) and no need to panic in this regard.

Maybe they have changed BuffereInputStream because developers have made this mistake so many times :) It is so verbose this looping... Of which Java gets some criticism. Not from me, I'm used to it. Old guy.

tonihele · 2023-11-09T19:25:24Z

Is there anything else regarding this PR? :)

stephengold · 2023-11-15T05:49:35Z

If there's no objection, I plan to integrate this PR in about 48 hours.

Read bytes more efficiently

4fe0b4b

stephengold added the enhancement label Oct 29, 2023

pspeed42 reviewed Oct 29, 2023

View reviewed changes

tonihele added 3 commits November 1, 2023 20:25

Read straight to the buffer

9fe71f9

Epsilon as final

e4b9049

Simple assert for reading malformed stream

3c4af6f

Stream might not have reached the end even if it doesn't return reque…

28c2421

…sted amount of bytes

tonihele added 2 commits November 4, 2023 20:44

Stream might not have reached the end even if it doesn't return reque…

eafb19a

…sted amount of bytes

Refactor to method

f8f884f

stephengold added this to the Future Release milestone Nov 9, 2023

stephengold merged commit 20ecf68 into jMonkeyEngine:master Nov 17, 2023
14 checks passed

tonihele deleted the feature/issue-2125 branch November 17, 2023 17:17

stephengold modified the milestones: Future Release, v3.7.0 Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read bytes more efficiently #2137

Read bytes more efficiently #2137

tonihele commented Oct 29, 2023

tonihele commented Oct 29, 2023

stephengold commented Oct 29, 2023

pspeed42 Oct 29, 2023 •

edited

Loading

pspeed42 commented Oct 29, 2023

tonihele commented Nov 1, 2023

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

stephengold commented Nov 4, 2023

pspeed42 commented Nov 4, 2023 •

edited

Loading

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023 •

edited

Loading

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

tonihele commented Nov 9, 2023

stephengold commented Nov 15, 2023

Read bytes more efficiently #2137

Read bytes more efficiently #2137

Conversation

tonihele commented Oct 29, 2023

tonihele commented Oct 29, 2023

stephengold commented Oct 29, 2023

pspeed42 Oct 29, 2023 • edited Loading

Choose a reason for hiding this comment

pspeed42 commented Oct 29, 2023

tonihele commented Nov 1, 2023

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

stephengold commented Nov 4, 2023

pspeed42 commented Nov 4, 2023 • edited Loading

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023 • edited Loading

pspeed42 commented Nov 4, 2023

tonihele commented Nov 4, 2023

tonihele commented Nov 9, 2023

stephengold commented Nov 15, 2023

pspeed42 Oct 29, 2023 •

edited

Loading

pspeed42 commented Nov 4, 2023 •

edited

Loading

tonihele commented Nov 4, 2023 •

edited

Loading