Enable chunked uploads #150

Merged
merged 5 commits into from
Sep 4, 2024

Conversation

isinyaaa
Contributor

TL;DR: Rebases #137. Also fixes out-of-date _state_ parameter on session_url, which caused a 404 when resuming/completing uploads.

We wanted to use oras for larger uploads (say, ML model files) at containers/omlmd, but I wasn't able to make them work with standard uploads. I noticed #137, rebased it, and addressed some of your comments (I'm not sure how to address all of them, though). Even so, I couldn't make it work with https://hub.docker.com/_/registry. While debugging, I found that a _state_ parameter is passed around in the session URL, and I assume it must be updated on the https://distribution.github.io/distribution/spec/api/#completed-upload PUT request to reflect the last state reported by the server. I tested this with files of roughly 20GB.
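
To illustrate the fix described above, here is a minimal sketch of a chunked upload loop that re-reads the session URL (including its _state_ query parameter) from each response before sending the next request. It is not the code from this PR: the function names are hypothetical, `session` is assumed to be any object with `requests`-style `patch` and `put` methods, and the header layout follows my reading of the distribution spec (`Content-Range` as an inclusive `start-end` pair, final `PUT` carrying the `digest`).

```python
from urllib.parse import urljoin


def next_session_url(current_url, response):
    """Return the upload URL to use for the next request (hypothetical helper)."""
    location = response.headers.get("Location")
    # Location may be relative; keep the current URL if the header is absent.
    return urljoin(current_url, location) if location else current_url


def append_digest(url, digest):
    """Add the digest query parameter without clobbering an existing _state."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}digest={digest}"


def upload_chunked(session, session_url, data, digest, chunk_size):
    """Send `data` in chunks, refreshing the session URL after every PATCH."""
    start = 0
    while start < len(data):
        chunk = data[start:start + chunk_size]
        end = start + len(chunk) - 1  # inclusive end offset
        headers = {
            "Content-Range": f"{start}-{end}",
            "Content-Length": str(len(chunk)),
            "Content-Type": "application/octet-stream",
        }
        response = session.patch(session_url, data=chunk, headers=headers)
        # The stale URL is what led to the 404: always use the latest state.
        session_url = next_session_url(session_url, response)
        start = end + 1
    # Complete the upload against the most recent session URL.
    return session.put(append_digest(session_url, digest),
                       headers={"Content-Length": "0"})
```

In real use `data` would be read from disk chunk by chunk rather than held in memory; bytes are used here only to keep the sketch short.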

I wonder if this could help with bringing back chunked file support by default, although I'm not really sure in which aspect that kind of support is "flaky" or (as I experienced) poorly documented.

Contributor

@vsoch vsoch left a comment

This looks good for testing - could you please add / re-enable a test for chunked? You'll also need to figure out how to sign the DCO, which is a requirement for CNCF projects.

oras/provider.py (review thread resolved)
@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch 2 times, most recently from adb4c55 to 123e39d Compare August 20, 2024 15:17
Signed-off-by: Brian Cook <[email protected]>
Signed-off-by: Isabella do Amaral <[email protected]>
@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from 123e39d to 9ab5537 Compare August 20, 2024 16:22
@isinyaaa isinyaaa requested a review from vsoch August 20, 2024 16:24
@vsoch
Contributor

vsoch commented Aug 20, 2024

@isinyaaa please see my previous review comment - we need explicit tests for the chunked upload.

Signed-off-by: Isabella do Amaral <[email protected]>
@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from 9ab5537 to 804ccb1 Compare August 21, 2024 13:40
@isinyaaa
Contributor Author

> @isinyaaa please see my previous review comment - we need explicit tests for the chunked upload.

@vsoch sorry I missed that. Updated now, wdyt?

@vsoch
Contributor

vsoch commented Aug 21, 2024

Nice! Let's run these tests now.

@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from 804ccb1 to bb59445 Compare August 22, 2024 12:11
@isinyaaa
Contributor Author

Oops, I apologize for the dumb mistake, updated now @vsoch .

@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from bb59445 to 038a509 Compare August 26, 2024 14:08
@isinyaaa
Contributor Author

isinyaaa commented Aug 26, 2024

@vsoch can you reapprove tests? I ran the linter locally to make sure it's working, sorry for the trouble

@isinyaaa
Contributor Author

@vsoch not sure what went wrong with those tests... maybe some issue with the generated file size? I can't spot any problems in the raw logs, and I can't reproduce the failure locally either.

@vsoch
Contributor

vsoch commented Aug 28, 2024

Here is what I see:


/bin/bash scripts/test.sh
ORAS_PORT: 5000
ORAS_HOST: localhost
ORAS_REGISTRY: localhost:5000
ORAS_AUTH: 
============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /home/runner/work/oras-py/oras-py
configfile: pyproject.toml
collected 22 items
oras/tests/test_oci.py .
oras/tests/test_oras.py .sSuccessfully pushed localhost:5000/dinosaur/artifact:v1
Successfully pushed localhost:5000/dinosaur/artifact:v1
.Successfully pushed localhost:5000/dinosaur/artifact:v1
.Successfully pushed localhost:5000/dinosaur/artifact:v1
..Successfully pushed localhost:5000/dinosaur/directory:v1
.s
oras/tests/test_provider.py Successfully pushed localhost:5000/dinosaur/artifact:v1
Successfully pushed localhost:5000/dinosaur/artifact:v1
0+0 records in
0+0 records out
0 bytes copied, 3.8763e-05 s, 0.0 kB/s

I would try to reproduce locally.

@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from 038a509 to ce6e216 Compare August 30, 2024 12:53
@isinyaaa isinyaaa force-pushed the tuneable-chunk-sizing branch from ce6e216 to 0956242 Compare August 30, 2024 13:01
@isinyaaa
Contributor Author

isinyaaa commented Aug 30, 2024

@vsoch as expected, while testing on my fork I found that the problem comes from working with those very large files on GHA (workflow run on my modified main). It worked when I reduced the test file size to a few times the default chunk size, wdyt?

@vsoch
Contributor

vsoch commented Aug 30, 2024

Should the chunk size perhaps be smaller then?

@isinyaaa
Contributor Author

isinyaaa commented Aug 30, 2024

I don't really think that's a problem, up to you. The issue was in creating a 15GB test file in github actions. I think the worker didn't have this much space to spare or something. I reduced the test file to be 4x the default chunk size (4x 16MB), that's all.
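
For reference, the sizing mentioned above can be sketched as follows. This is an illustrative helper, not the test from this PR: the names are hypothetical, and 16MB is assumed to mean 16 MiB. A file of 4x the chunk size is 64 MiB and should be read back as exactly four full chunks.

```python
DEFAULT_CHUNK_SIZE = 16 * 1024 * 1024  # assumption: "16MB" read as 16 MiB


def make_test_file(path, size, block=1024 * 1024):
    """Write exactly `size` zero bytes to `path`, one block at a time."""
    remaining = size
    with open(path, "wb") as handle:
        while remaining > 0:
            count = min(block, remaining)
            handle.write(b"\0" * count)
            remaining -= count
    return path


def read_chunks(path, chunk_size=DEFAULT_CHUNK_SIZE):
    """Yield the file contents in pieces of at most `chunk_size` bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Calling `make_test_file(path, 4 * DEFAULT_CHUNK_SIZE)` and then iterating `read_chunks(path)` exercises the multi-chunk path without needing gigabytes of disk.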

@vsoch
Contributor

vsoch commented Aug 30, 2024

Ah ok. Please keep the tests in GitHub the same as what you are doing (and what is working) locally and let's try making more space on the builder - these first three lines before the tests to cleanup and add space should be sufficient:

https://github.com/converged-computing/fluxnetes/blob/0d577aa3155e68aff457f390f8c926e9f57a13d3/.github/workflows/e2e-test.yaml#L129-L133

But you can add more as needed.

Signed-off-by: Isabella do Amaral <[email protected]>
@isinyaaa
Contributor Author

isinyaaa commented Sep 2, 2024

@vsoch updated! thanks for the pointer, I had no idea the default images could be so huge! Though the first three lines didn't suffice as we actually need 15GB for the file + 15GB for the upload.
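
The space requirement above (one copy for the generated file plus one for the uploaded blob) could be checked before the test runs. A minimal sketch using only the standard library; the helper names are hypothetical and gigabytes are assumed to be decimal:

```python
import shutil

GB = 1000 ** 3  # assumption: decimal gigabytes


def required_bytes(file_size, copies=2):
    """Space needed: the test file itself plus the registry's stored copy."""
    return file_size * copies


def has_room(path, file_size, copies=2):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(file_size, copies)
```

For a 15GB file, `required_bytes(15 * GB)` gives 30GB, so a test could skip early when `has_room(".", 15 * GB)` is false. This assumes the file and the registry storage live on the same filesystem.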

@isinyaaa
Contributor Author

isinyaaa commented Sep 4, 2024

Hey, @vsoch, can we merge this yet?

@vsoch
Contributor

vsoch commented Sep 4, 2024

Yes - we are close! Can you please bump the version in oras/version.py and make a note about the change in CHANGELOG.md? That should be the final bit we need.

Signed-off-by: Isabella do Amaral <[email protected]>
@isinyaaa
Contributor Author

isinyaaa commented Sep 4, 2024

updated, wdyt?

@vsoch vsoch merged commit dfc2415 into oras-project:main Sep 4, 2024
5 checks passed
@isinyaaa isinyaaa deleted the tuneable-chunk-sizing branch September 4, 2024 20:11
3 participants