How to use additional checksum algorithms when using put_object() #611

Open
tkwilos opened this issue Apr 27, 2023 · 16 comments
Labels
enhancement 💡 New feature or request

Comments

@tkwilos

tkwilos commented Apr 27, 2023

Using paws, I am looking to store additional checksum data when using s3$put_object(), specifically sha1 values. However, I'm having issues successfully setting this when executing put_object().

Is this something I can currently do with paws? Below is some basic sample code and the associated error message.

Sample code:

s3$put_object(
  Bucket = "test-bucket-name",
  Key = "test-file-key",
  Body = "test-file",
  ChecksumAlgorithm = 'sha1'
)

Associated error:

Running the example above produces the following error message:

Error: InvalidRequest (HTTP 400). x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found.

Looking through the paws documentation, I am not sure how to set the "headers" referenced in the error message above.
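The error above says S3 expects an `x-amz-checksum-*` header carrying the checksum value itself. A minimal sketch of one way to supply it, assuming the installed paws version exposes the `ChecksumSHA1` argument of the S3 PutObject API and forwards it as `x-amz-checksum-sha1` (not confirmed in this thread), and assuming the `digest` and `base64enc` packages are available. `sha1_base64()` and `put_object_with_sha1()` are hypothetical helper names used only for illustration:

```r
# Assumption: the 'digest' and 'base64enc' packages are installed.
library(digest)
library(base64enc)

# Return the base64-encoded SHA-1 digest of a raw vector, which is the
# encoding S3 documents for the x-amz-checksum-sha1 header.
sha1_base64 <- function(body) {
  stopifnot(is.raw(body))
  hash <- digest(body, algo = "sha1", serialize = FALSE, raw = TRUE)
  base64encode(hash)
}

# Hypothetical helper: upload 'body' together with its precomputed checksum.
# 's3' is assumed to be a client created with paws::s3().
put_object_with_sha1 <- function(s3, bucket, key, body) {
  s3$put_object(
    Bucket = bucket,
    Key = key,
    Body = body,
    ChecksumSHA1 = sha1_base64(body)
  )
}

sha1_base64(charToRaw("abc"))
# "qZk+NkcGgWq6PiVxeFDCbJzQ2J0="
```

Whether this avoids the 400 response depends on how the installed paws version builds the request, so treat it as something to try rather than a confirmed fix.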

What I'm ultimately hoping for is to store the sha1 value for the object I am adding to S3 when using s3$put_object(), and then to retrieve that value later using something like s3$list_objects().
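For the retrieval half, a hedged sketch: as far as I can tell from the S3 API reference, the list operations report only which checksum algorithm was used, while HeadObject returns the stored value when called with checksum mode enabled. This assumes the paws `head_object()` wrapper accepts `ChecksumMode` and returns a `ChecksumSHA1` field; `get_object_sha1()` is a hypothetical helper name:

```r
# Hypothetical helper: fetch the stored SHA-1 checksum for a single object.
# 's3' is assumed to be a client created with paws::s3().
get_object_sha1 <- function(s3, bucket, key) {
  resp <- s3$head_object(
    Bucket = bucket,
    Key = key,
    ChecksumMode = "ENABLED"  # ask S3 to include checksum fields in the response
  )
  resp$ChecksumSHA1
}
```

The value only comes back if a SHA-1 checksum was stored when the object was uploaded.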

Thanks in advance.

@tkwilos tkwilos changed the title How to use additional checksum algorithms when using put How to use additional checksum algorithms when using put_object() Apr 27, 2023
@DyfanJones
Member

Can you share the logs, please? You can enable them with options(paws.log_level=3). They will help debug the issue :)

@tkwilos
Author

tkwilos commented Apr 27, 2023

Absolutely, thanks @DyfanJones!

Below are the logs you requested, along with some additional background information, in case it's helpful:

Background Info:

What I am looking to do/take advantage of is covered here. However, rather than utilizing the additional checksums feature via, say, the S3 console, I'm hoping to do so programmatically via paws.

Requested Logs retrieved via options(paws.log_level=3)

INFO [2023-04-27 18:04:11.417]: -> PUT /test_data_1.csv HTTP/1.1
-> Host: exp-derived-data-dev.s3.us-east-2.amazonaws.com
-> Accept-Encoding: deflate, gzip, br
-> Accept: application/json, text/xml, application/xml, */*
-> User-Agent: paws/0.5.5 (R4.2.2; linux-gnu; x86_64)
-> x-amz-acl: 2
-> x-amz-sdk-checksum-algorithm: sha1
-> Content-Md5: BGGVlI07DfjTJ+gVmdqhMA==
-> Content-Length: 78
-> X-Amz-Security-Token: <REDACTED>
-> X-Amz-Date: 20230427T180411Z
-> X-Amz-Content-Sha256: 8f767239b0bfe5e2b03c04638784e96eef2cca51827685b087f430ee3dc1353d
-> Authorization: AWS4-HMAC-SHA256 Credential=ASIAVVZOODW4YOCBGRNS/20230427/us-east-2/s3/aws4_request, SignedHeaders=content-length;content-md5;host;x-amz-acl;x-amz-content-sha256;x-amz-date;x-amz-sdk-checksum-algorithm;x-amz-security-token, Signature=0a6fc8d80861e67b66c094831ce21b40e003cdd0792cdb84991a589f4f0ff688
-> 
INFO [2023-04-27 18:04:11.417]: >> variable_name,value_1,value_2,value_3
>> apple,1,2,3
>> banana,4,2,4
>> orange,2,2,2

INFO [2023-04-27 18:04:11.433]: <- HTTP/1.1 400 Bad Request
INFO [2023-04-27 18:04:11.437]: <- x-amz-request-id: 6ZT7AVPQ2B48RKT7
INFO [2023-04-27 18:04:11.437]: <- x-amz-id-2: bxz2Z94/DGCj0ED6gqro63ilNL/g+tpAcE8kG2mFth50zbeW9dBcT3H+lglV194EYfA2huec1e4=
INFO [2023-04-27 18:04:11.437]: <- Content-Type: application/xml
INFO [2023-04-27 18:04:11.437]: <- Transfer-Encoding: chunked
INFO [2023-04-27 18:04:11.437]: <- Date: Thu, 27 Apr 2023 18:04:11 GMT
INFO [2023-04-27 18:04:11.438]: <- Server: AmazonS3
INFO [2023-04-27 18:04:11.438]: <- Connection: close
INFO [2023-04-27 18:04:11.438]: <- 
Error: InvalidRequest (HTTP 400). x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found.

@DyfanJones
Member

Thanks, will have a look at the backend to see why the headers aren't being attached 🤔

@DyfanJones
Member

From checking over the code, I believe we don't currently support this functionality. We will need to investigate how the other SDKs implement this so that we can bring it over to paws.

@DyfanJones DyfanJones added the enhancement 💡 New feature or request label Apr 28, 2023
@DyfanJones
Member

botocore: https://github.com/boto/botocore/blob/develop/botocore/httpchecksum.py
aws sdk go v2: https://github.com/aws/aws-sdk-go-v2/blob/main/service/dynamodb/internal/customizations/checksum.go

Note: it looks like aws sdk go v1 only has md5 for its checksum algorithms

@DyfanJones
Member

Not 100% sure how to implement the crc32c algorithm. It looks like digest doesn't support it yet. Will raise a ticket to see if they are happy to implement it.

@DyfanJones
Member

Raised a ticket with the digest package: eddelbuettel/digest#183

@DyfanJones
Member

For the time being will focus on the other checksum algorithms. After they have been completed we can loop back to crc32c.

@tkwilos
Author

tkwilos commented May 2, 2023

Thanks for all the investigating/work thus far @DyfanJones! Please let me know if there's anything else I can provide.

@DyfanJones
Member

No worries, I am on holiday for the next 2 weeks. I will start work on this when I get back. In the meantime please feel free to raise any PRs, more than happy to review them.

@tkwilos
Author

tkwilos commented May 31, 2023

hi @DyfanJones! hope you had a good holiday...wanted to check in to see if there were any updates here.

@DyfanJones
Member

Hi @tkwilos, we have some fantastic news: @eddelbuettel has implemented the crc32c algorithm https://github.com/eddelbuettel/crc32c. This means we can proceed with implementing the new checksum algorithms, possibly by following the botocore implementation https://github.com/boto/botocore/blob/develop/botocore/httpchecksum.py.

This feature will take a little time as I am fairly busy with a newborn. I will keep you updated on the progress of this feature.

@DyfanJones
Member

Please feel free to raise a PR if you are able to get to this before me :)

@eddelbuettel

eddelbuettel commented May 31, 2023

Yep meant to circle back too. It's all there but not yet fully wired up in the digest version on CRAN. However, crc32c is there and can be used and relied upon. We should circle back 'time permitting' to make better use of it in digest too.

@DyfanJones
Member

AWS SDK GO V2 checksum implementation

@tkwilos
Author

tkwilos commented Jun 5, 2023

Thanks so much for the updates and efforts @DyfanJones & @eddelbuettel!

3 participants