Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a flag to preserve timestamps of files #534

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Ahuge
Copy link

@Ahuge Ahuge commented Nov 7, 2022

This PR is in response to issue #532

We are implementing a flag that when activated will record and store ctime, mtime, and atime in S3 as metadata on the object.
When pulling files down, if that data exists, we will set it on the files as well.

Linux & Darwin don't appear to allow setting of ctime, however atime and mtime are preserved.
Windows does allow us to set ctime, mtime, and atime, however they are actually referred to as CreationTime, LastAccessTime, and LastWriteTime.

Please let me know if you have feedback on this PR.
Thanks!

Ahuge added 2 commits November 2, 2022 11:50
The `--preserve-timestamp` flag will set metadata on the s3 objects and
then update the files on disk during a download.

If the `--preserve-timestamp` flag is passed during an upload the following metadata keys will be set:
  - x-amz-meta-file-ctime
  - x-amz-meta-file-mtime
  - x-amz-meta-file-atime

If the `--preserve-timesamp` flag is passed during a download, s5cmd
will attempt to modify the created files on disk.
  **There doesn't appear to be a way to change the created date on Linux
@Ahuge Ahuge requested a review from a team as a code owner November 7, 2022 16:25
@Ahuge Ahuge requested review from igungor and seruman and removed request for a team November 7, 2022 16:25
@strophy
Copy link

strophy commented Sep 9, 2023

I'd love to see this merged - @Ahuge could you rebase it and maybe ping a maintainer to try again for a review?

@seruman
Copy link
Member

seruman commented Sep 9, 2023

Hi @Ahuge, thanks for the PR.

With the addition of #621 and the request #350, I think these two could be handled in a similar fashion as s5cmd specific file metadata. I have a PoC which is not ready for a review yet. I would like to get your thoughts on it once it is ready.

I kindly request for you to wait for it to get your thoughts if it would cover yours and @strophy's use cases.

@strophy
Copy link

strophy commented Nov 6, 2023

Hi @seruman just checking on progress and if there is anything I can do to help move this issue along? Having this feature would allow us to switch back to s5cmd from aws-cli, with big performance benefit.

@k0ste
Copy link

k0ste commented May 15, 2024

Seems needs rebase, again

@rmanus
Copy link

rmanus commented Jul 3, 2024

The current code does not work as it passes the --preserve-timestamp flag to the "rm" command during sync and "rm" command does not support it.

@rmanus
Copy link

rmanus commented Jul 9, 2024

I've slightly modified the original code from @Ahuge to use the last_modified date if present in the metadata when comparing objects with same name and same size during "sync" operation.

diff --git a/command/sync.go b/command/sync.go
index 3090b9a..db912cc 100644
--- a/command/sync.go
+++ b/command/sync.go
@@ -209,7 +209,7 @@ func (s Sync) Run(c *cli.Context) error {
                isBatch = obj != nil && obj.Type.IsDir()
        }
 
-       onlySource, onlyDest, commonObjects := compareObjects(sourceObjects, destObjects)
+       onlySource, onlyDest, commonObjects := compareObjects(ctx, s.storageOpts, s.preserveTimestamp, sourceObjects, destObjects)
 
        sourceObjects = nil
        destObjects = nil
@@ -248,7 +248,7 @@ func (s Sync) Run(c *cli.Context) error {
 // sourceObjects and destObjects channels are already sorted in ascending order.
 // Returns objects those in only source, only destination
 // and both.
-func compareObjects(sourceObjects, destObjects chan *storage.Object) (chan *url.URL, chan *url.URL, chan *ObjectPair) {
+func compareObjects(ctx context.Context, storageOps storage.Options, preserveTimestamp bool, sourceObjects chan *storage.Object, destObjects chan *storage.Object) (chan *url.URL, chan *url.URL, chan *ObjectPair) {
        var (
                srcOnly   = make(chan *url.URL, extsortChannelBufferSize)
                dstOnly   = make(chan *url.URL, extsortChannelBufferSize)
@@ -278,6 +278,21 @@ func compareObjects(sourceObjects, destObjects chan *storage.Object) (chan *url.
                                        srcOnly <- src.URL
                                        src, srcOk = <-sourceObjects
                                } else if srcName == dstName { // if there is a match.
+
+                                       // use last_modified date from metadata if available
+                                       if preserveTimestamp && src.URL.IsRemote() {
+                                               client, _ := storage.NewClient(ctx, src.URL, storageOps)
+                                               newsrc, _ := client.Stat(ctx, src.URL)
+                                               src.ModTime = newsrc.ModTime
+                                       }
+
+                                       // use last_modified date from metadata if available
+                                       if preserveTimestamp && dst.URL.IsRemote() {
+                                               client, _ := storage.NewClient(ctx, dst.URL, storageOps)
+                                               newdst, _ := client.Stat(ctx, dst.URL)
+                                               dst.ModTime = newdst.ModTime
+                                       }
+
                                        commonObj <- &ObjectPair{src: src, dst: dst}
                                        src, srcOk = <-sourceObjects
                                        dst, dstOk = <-destObjects

@VladStarr
Copy link

Having a strange bug with sync command. Build docker image from 09f4b59
Tried with and without --preserve-timestamp flag

Will be happy to help if any other debug data needed from my side, I would love to see this feature merged

DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /xxx/upload/./2287894.sst HTTP/1.1
Host: xxx.storage.googleapis.com
User-Agent: aws-sdk-go/1.44.256 (go1.20.14; linux; amd64) S3Manager
Content-Length: 2167218
Authorization: AWS4-HMAC-SHA256 Credential=GOOG1EQXXXXXXXXX/20240828/us-east-1/s3/aws4_request, SignedHeaders=content-length;content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=xxxxxxxxxxxxx
Content-Md5: Mi0QnzZI9wBAPD/AefR+Gw==
Content-Type: application/octet-stream
Expect: 100-continue
X-Amz-Content-Sha256: d76c74a4f72655a355624a078d59f130126bae6f6b4a56ff6628843155e0f8eb
X-Amz-Date: 20240828T174007Z
Accept-Encoding: gzip

-----------------------------------------------------
DEBUG: Response s3/PutObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 400 Bad Request
Content-Length: 287
Cache-Control: private, max-age=0
Content-Type: application/xml; charset=UTF-8
Date: Wed, 28 Aug 2024 17:40:07 GMT
Expires: Wed, 28 Aug 2024 17:40:07 GMT
Server: UploadServer
X-Guploader-Uploadid: AHxI1nPCx3Z2g7SO8xIDln2CGdwVKHVu8Z5utRDFLXlzL0foQZFKKuwQaeUdnu_1FdBcQo3b9No


-----------------------------------------------------
ERROR "cp /data/base-mainnet/geth/chaindata/2287894.sst s3://xxx/xxx/upload/./2287894.sst": MalformedSecurityHeader: Invalid argument. status code: 400, request id: , host id:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants