Efficient way to integrate lossyless into a PyTorch Dataset subclass #40
Hey Lennart,

The relevant code is at Line 187 in 6b604dd and Line 238 in 6b604dd.
The only changes I see that should be made are:

1. Using a batch size of 1 when compressing (that will make it slower, but it is a simple way to ensure that you can perform permutations), i.e., here: Line 155 in 6b604dd
2. Saving (and loading) each compressed image separately rather than all at once, i.e., the following lines should go inside the for loop: Line 191 in 6b604dd

Neither of those points is very complex if you want to give it a try. I'm quite busy right now, but if you haven't done it by then, I might be able to do it in the next 2-3 weeks :)
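The second change (one file per compressed image) can be sketched as follows. This is only an illustration of the file layout, not the actual lossyless code: `zlib` stands in for the model's entropy coder, and the `{index}.bin` naming scheme is an assumption.

```python
import zlib
from pathlib import Path


def compress_one(img_bytes):
    # Stand-in for the lossyless entropy coder; the real code would call the
    # model's compress() and return the resulting byte string for one image.
    return zlib.compress(img_bytes)


def save_per_image(images, out_dir):
    # Write each compressed image to its own file so that __getitem__(index)
    # can later load exactly one sample instead of the whole dataset.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, img in enumerate(images):
        (out_dir / f"{i}.bin").write_bytes(compress_one(img))
```

The same loop structure works when compression itself is batched: compress a batch, then iterate over the batch to write one file per image.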
Hi Yann,

Thank you for your reply and for pointing out the relevant LOCs. I'll see how I can best integrate them. I already tried something similar with a few models from
Yes, unfortunately removing batch compression will make the compressor very slow. But actually it's not really needed: you can compress in batches and still save images separately.

Concerning the decompressor: right now compressai doesn't allow batch decompression, which is why decompressing the entire dataset at once is so slow, i.e., it doesn't actually take advantage of batches. I don't think this is an issue with lossyless though, but simply with the compressai implementation. In theory, lossyless could even be quicker at decompression than standard codecs, as it doesn't require reconstructing the image. The simplest way to make the decoder quicker for now is to at least parallelise decompression of each image.
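Parallelising per-image decompression can be sketched with a process pool. This is a minimal sketch, not the lossyless implementation: `zlib` stands in for compressai's single-image entropy decoder, which currently has no batched path.

```python
import zlib
from multiprocessing import Pool


def _decompress_one(payload):
    # Stand-in for compressai's per-image decompress(); each call is
    # independent, so images can be decoded in parallel processes.
    return zlib.decompress(payload)


def decompress_parallel(payloads, workers=4):
    # Fan the per-image byte strings out across worker processes.
    with Pool(workers) as pool:
        return pool.map(_decompress_one, payloads)
```

With the real model, each worker would need its own copy of the decoder (or the decoder passed via an initializer), since model state is not shared across processes.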
I see. Maybe converting a lossyless model into a TensorRT engine could also help improve decompression speed. I haven't worked with TensorRT before though, so I'm not sure if there is a straightforward way to integrate a TensorRT engine into another training pipeline. I'll be quite busy throughout the next week, but afterwards I'll see if there is anything I can do to achieve decent/better decompression speed without having to mess with the entire underlying software stack.
Hi Yann,

In the last days, I had a chance to work on this topic again and experimented with a few ways to achieve good encoding/decoding performance. I couldn't test my prototypes on server-grade hardware yet, but I'd like to share some of the insights I got from running tests on my laptop today. I ran my tests on a small ImageNet subsample with 900 images and 2 MB disk size.

Encoding the dataset with lossyless (beta 0.1) takes about 40 seconds with my MX330 laptop GPU. Using WebP and Python multiprocessing, my CPU processes the same dataset in less than a second. Do these numbers sound reasonable to you? I can post a snippet of my code if you think that lossyless encoding should not be that far off WebP.

Decoding the dataset using a PyTorch data loader with multiple workers takes about 25 s on my laptop; WebP decoding takes less than 0.1 s, on the other hand. One problem with the PyTorch data loader (afaik) is that the separate worker processes do not use multi-threading due to the Python GIL. That makes executing the model rather slow.
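The WebP-plus-multiprocessing baseline described above could look roughly like the following. This is a hedged sketch of the benchmark setup, not the commenter's actual code; it assumes Pillow with WebP support and uses a process pool with one image per task.

```python
import io
from concurrent.futures import ProcessPoolExecutor

from PIL import Image


def encode_webp(img):
    # Encode one PIL image to WebP in memory; quality=80 is an arbitrary
    # choice here, and WebP support depends on the Pillow build.
    buf = io.BytesIO()
    img.save(buf, format="WEBP", quality=80)
    return buf.getvalue()


def encode_dataset(images, workers=4):
    # Fan encoding out across CPU processes, one image per task, mirroring
    # the multiprocessing setup used for the timing comparison.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_webp, images))
```

Timing this with `time.perf_counter()` around the `encode_dataset` call would reproduce the kind of CPU-side numbers quoted above.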
Hey @YannDubs,

I recently discovered your paper and find the idea very interesting. Therefore, I would like to integrate `lossyless` into a project I am currently working on. However, there are two requirements/presuppositions in my project that your compressor on PyTorch Hub does not cover, as far as I understand it.

Basically, I would like to integrate `lossyless` into a subclass of PyTorch's `Dataset` that implements the `__getitem__(index)` interface. Before I start experimenting on my own and potentially overlook something that you already thought about, I wanted to ask if you have already considered approaches for integrating your idea into a PyTorch `Dataset`.

Looking forward to a discussion!
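A map-style dataset over per-image compressed files could be sketched as follows. This is a minimal illustration under stated assumptions: files are named `{index}.bin` (a hypothetical layout), and `zlib` stands in for the lossyless decoder. A plain class is shown because `torch.utils.data.DataLoader` only requires `__getitem__` and `__len__`; in a real project one would subclass `torch.utils.data.Dataset` and return decoded tensors.

```python
import zlib
from pathlib import Path


class CompressedDataset:
    """Map-style dataset that loads one compressed sample per index.

    zlib stands in for the lossyless decoder; __getitem__ touches only the
    file for the requested index, which is why per-image files matter.
    """

    def __init__(self, root):
        # Sort numerically so index i maps to file "i.bin".
        self.paths = sorted(Path(root).glob("*.bin"), key=lambda p: int(p.stem))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        payload = self.paths[index].read_bytes()
        return zlib.decompress(payload)  # decode only the requested sample
```

Because each `__getitem__` call is independent, this composes naturally with `DataLoader(num_workers=N)` for parallel decoding.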