Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read bytes more efficiently #2137

Merged
merged 7 commits into from
Nov 17, 2023

Conversation

tonihele
Copy link
Contributor

Optimized version of reading bytes. Instead of reading one byte at the time... read multiple. With GLB, the whole file is in memory and with GLTF the reading is done from BufferedReader (meaning that there is never this catastrophic case where we would read a disk file a byte at a time). In latter case this would increase the reading speed even more than analyzed here. I just don't know does GLTF store anything in byte arrays. GLB seems to store the textures this way, which are potentially huge. So even reading from memory, calling this readByte billions of times causes significant overhead.

The same problem is with all the populate*Array but fixing those others is potentially more messy and yield in less gains as the units are already bigger. That being said, I only focus on this biggest offender.

I used https://developer.nvidia.com/orca/amazon-lumberyard-bistro (BistroExterior.glb) as a test case. Model loading took on average:
31 605ms (optimized)
47 824ms (old)

Resolves #2125

@tonihele
Copy link
Contributor Author

And as a note, the measurements are done with #2127 in place. So every texture is read max 1 time.

@stephengold
Copy link
Member

Thank you for investigating how to improve performance.

For the dataLength == stride case, why not stream.read() directly into the array? That would eliminate the arrayCopy().


if (dataLength == stride) {
byte[] buffer = new byte[end - index];
stream.read(buffer, 0, buffer.length);
Copy link
Contributor

@pspeed42 pspeed42 Oct 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: read() may read fewer bytes than buffer.length and it's bad practice to ignore its return value. I'd suggest using readFully() instead but JME's LittleEndien.readFully() is broken in a similar way. This is a really dangerous way to read because failures will look really strange and not necessarily point to this read() code.

@pspeed42
Copy link
Contributor

We should fix LittleEndien [sic] readFully() and use readFully() in all of these cases where read(array)'s return value is being ignored.

The length passed to read(array, pos, length) is a maximum length to read. The method is free to return fewer bytes. I'm surprised this hasn't caused problems actually and we must have been pretty lucky.

For example, last I checked, a BufferedInputStream when asked to return 100 bytes but it only has 32 bytes left in its current buffer will just read 32 bytes. This leaves the rest of the array unchanged and leaves the stream in an unpredictable position. Actually, who knows how many random issues this may have been causing that we never tracked down because something else magically fixed it.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 1, 2023

For the dataLength == stride case, why not stream.read() directly into the array? That would eliminate the arrayCopy().

Yes, of course. Copy paste brain fart. This was changed.

Also added a small assert on the data reading part.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

How does it look now? Everything ok?

@pspeed42
Copy link
Contributor

pspeed42 commented Nov 4, 2023

I'm not sure you understood the nature of my complaint.

Our use of read(buf, pos, len) is WRONG. Not in a "we should detect this possible case" kind of way... but in a "we are abusing the method and will have random weird consequences someday kind of way".

If we wants to use read(buf, pos, len) then we need to put it in a loop that reads until pos == len. See the JDK's implementation of DataInputStream.readFully() for an example of such a loop.

Put another way: it is perfectly normal for read(buf, 0, 100000) to return only read 2 bytes and return 2. In the current form, you treat this like an error (it's not).

BTW: "assertReadLength" confused me at first because no actual assert calls were made.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

I'm not sure you understood the nature of my complaint.

Our use of read(buf, pos, len) is WRONG. Not in a "we should detect this possible case" kind of way... but in a "we are abusing the method and will have random weird consequences someday kind of way".

If we wants to use read(buf, pos, len) then we need to put it in a loop that reads until pos == len. See the JDK's implementation of DataInputStream.readFully() for an example of such a loop.

Put another way: it is perfectly normal for read(buf, 0, 100000) to return only read 2 bytes and return 2. In the current form, you treat this like an error (it's not).

As far as I understand, populateXArray method and the likes, get a stream and they calculate how much they are supposed to read from it. Depending on the variables they are given. That is why I wouldn't dare to use readFully. That might advance the stream too much. This method has no way of telling that the given stream is actually just provided fully for consumption of it.

Not getting enough bytes from the read would be a file structure error for me at least. I was told to read certain amount of bytes and they were not found, file is corrupted. This is how I treat data structures.

BTW: "assertReadLength" confused me at first because no actual assert calls were made.

I'm trying to consolidate to the existing code file. There is already a similar assert, assertNotNull method. I agree that it is not the best. But I think that it is better not to change the style unnecessarily.

So is it fine if I just remove the assert stuff? Fixing LittleEndien I consider to be a little out of scope for this.

@stephengold
Copy link
Member

@tonihele I think you're missing Paul's point. He isn't asking you to use readFully; he's asking to check the return value of read() and iterate until enough bytes are read.

@pspeed42
Copy link
Contributor

pspeed42 commented Nov 4, 2023

sgold's got it.

If you ask read to read 10 bytes. It might only read 2. That's not because there are only 2 bytes. That's just because it decided to return 2 right now. The other 8 are still there waiting for you to finish reading them. Read again and you will get some more bytes.

The current code is 100% totally buggy. It's frankly amazing that it works at all.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

Ok, I think understand now. I was just in the believe that streams have always reached EOF if they don't return the requested amount of bytes and any further call will just get IOException.

Is it now correct?

@pspeed42
Copy link
Contributor

pspeed42 commented Nov 4, 2023

Code looks better now, I think.

read will return -1 when there is no more data.

There can be a variety of reasons that read() might return less data then requested. Apparently, it will always read at least one byte (if available).

https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#read-byte:A-int-int-

...it's up to the stream how to proceed from there. For example, a BufferedInputStream may return just what's left in its current buffer rather than reading an entire new buffer. That's why I'm surprised this hasn't come up before. We may have just been really lucky.

Someone should go back and fix LittleEndien (and perhaps fix the spelling someday) to implement a correct readFully().

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

...it's up to the stream how to proceed from there. For example, a BufferedInputStream may return just what's left in its current buffer rather than reading an entire new buffer. That's why I'm surprised this hasn't come up before. We may have just been really lucky.

I think BufferedInputStream returns the requested amount always if available in stream. The buffer is just its internal workings. Only time when I have encountered this is reading straight from a network/Internet. But yeah, this is how the interface is supposed to work then.

@pspeed42
Copy link
Contributor

pspeed42 commented Nov 4, 2023

I think the code says otherwise. From BufferedInputStream.java:

    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

It won't call fill() if there is already data available and will just return up to whatever is left. It only calls fill() when it runs out of data.

The code is fully available to anyone that wants to check the claims I make about Java code... 90% of the time I already looked up the code before making the claim.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

I think the code says otherwise. From BufferedInputStream.java:

    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

It won't call fill() if there is already data available and will just return up to whatever is left. It only calls fill() when it runs out of data.

The code is fully available to anyone that wants to check the claims I make about Java code... 90% of the time I already looked up the code before making the claim.

But this is called in a loop by the interface method BufferedInputStream.java:public synchronized int read(byte b[], int off, int len). So it should work right? I have never encountered this behavior... And I do this a lot.

@pspeed42
Copy link
Contributor

pspeed42 commented Nov 4, 2023

Yeah, I guess you are right.

I have also done this a lot. (A lot a lot.) And have 100% encountered this at various times in the past 18 years or so. I'd have to look back through Java's version history to see if BufferedInputStream's behavior has changed in this regard.

But all it takes is one stream in the stack to return 0 from available() and your read will be cut short.

That's why there is a readFully() method to begin with.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 4, 2023

Yeah, I guess you are right.

I have also done this a lot. (A lot a lot.) And have 100% encountered this at various times in the past 18 years or so. I'd have to look back through Java's version history to see if BufferedInputStream's behavior has changed in this regard.

But all it takes is one stream in the stack to return 0 from available() and your read will be cut short.

That's why there is a readFully() method to begin with.

I do 100% agree with you that this is how the reading should be done (while--), regardless what the implementation of BufferedInputStream is. The interface Javadoc is quite clear on this. But now we have double checked that jMEs existing code is working(ish) and no need to panic in this regard.

Maybe they have changed BuffereInputStream because developers have made this mistake so many times :) It is so verbose this looping... Of which Java gets some criticism. Not from me, I'm used to it. Old guy.

@tonihele
Copy link
Contributor Author

tonihele commented Nov 9, 2023

Is there anything else regarding this PR? :)

@stephengold stephengold added this to the Future Release milestone Nov 9, 2023
@stephengold
Copy link
Member

If there's no objection, I plan to integrate this PR in about 48 hours.

@stephengold stephengold merged commit 20ecf68 into jMonkeyEngine:master Nov 17, 2023
14 checks passed
@tonihele tonihele deleted the feature/issue-2125 branch November 17, 2023 17:17
@stephengold stephengold modified the milestones: Future Release, v3.7.0 Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GltfLoader is slow populating buffers
3 participants