S3: medusa getting stuck indefinitely while fetching uploaded object #697
Comments
Hi @sandeepmallik, it seems like you're not running a recent version of Medusa. We have had loads of issues with previous versions that used a combination of libcloud and awscli to handle communications with S3.
Hi @adejanovski. We will update to version 0.16.3 and see if it works.
@adejanovski It didn't work. We are using python3.9/pip3.9. After upgrading Medusa to 0.16.3 on the new nodes, I had to install these packages per the Medusa requirements (the old nodes are on 0.15.0):

pip3.9 install click-aliases==1.0.1

Medusa is having an issue with fetching the IAM role.

[2023-12-18 23:00:03,944] INFO: Registered backup id s3_prod_12_18_2023_23

$ sudo medusa -vvv list-backups
$ aws s3 ls s3://ec4-backup-prod/index/backup_index/
Hi @sandeepmallik, we have a contributed fix for the role issue which you can track here.
@adejanovski I have tested the fix on my cluster and it didn't work. I couldn't contribute at the code level as I am not familiar with Python.
Thanks @sandeepmallik, we're having a conversation about using a slightly different fix for this, which would build the S3 client without passing the credentials.
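For reference, a minimal sketch of that approach (hypothetical, not Medusa's actual code): constructing the boto3 client without explicit keys lets botocore's default credential chain resolve credentials on its own, which includes the instance's IAM role.

```python
import boto3
from botocore.config import Config

# Hypothetical sketch: no aws_access_key_id / aws_secret_access_key are passed,
# so botocore's default credential chain (env vars, shared config files, then
# the EC2 instance metadata service, i.e. the IAM role) resolves credentials.
def make_s3_client(region="us-east-1"):
    return boto3.client(
        "s3",
        region_name=region,
        config=Config(retries={"max_attempts": 5, "mode": "standard"}),
    )

s3 = make_s3_client()
# Example call against the bucket from this issue:
s3.list_objects_v2(Bucket="ec4-backup-prod", Prefix="index/backup_index/")
```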
Thanks @adejanovski. As a temporary fix, I will comment out these lines in s3_base_storage.py, as we only use an IAM role:

aws_access_key_id=credentials.access_key,
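As a side note, one way to sanity-check that the default chain picks up the node's role after removing the explicit keys could be an STS call (a hypothetical check, not part of Medusa):

```python
import boto3

# Hypothetical sanity check: with no static keys configured, the ARN printed
# here should be the node's IAM role (an assumed-role ARN), not an IAM user.
sts = boto3.client("sts")
print(sts.get_caller_identity()["Arn"])
```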
Sure, let us know how this works.
@adejanovski It didn't work. After the snapshot was created, it got stuck indefinitely. Nothing in the logs, even with verbose enabled.
@adejanovski Tested the backup on ONE node with the new code (https://github.com/thelastpickle/cassandra-medusa/blob/082375b52e6a8f0e376586984444852cb7371c72/medusa/storage/s3_base_storage.py). I upgraded Medusa to 0.17.0 and replaced the above file. The backup worked. Will test this procedure on the remaining nodes; if it works, then we are good to merge the change.

[2024-01-01 23:06:12,870] INFO: Creating snapshot
@adejanovski Fix #691 worked. Tested it on 6 nodes. |
While uploading a backup, Medusa gets stuck indefinitely while fetching the uploaded object. Not sure what is causing this. We have a cluster of 96 nodes. We added 6 nodes recently, and the backup is failing on 4 of them with the same issue. On the other, older nodes, backups work fine. Nothing has changed at the infrastructure level.
The object exists in S3, but Medusa is unable to fetch/recognise it. Is it something to do with a checksum after uploading the object? Is there an option to retry getting the object?
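On the retry question: a minimal retry wrapper around the HEAD call could look like the following sketch (hypothetical; the bucket and key are taken from the logs below, and Medusa's actual internals may differ):

```python
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def head_with_retry(bucket, key, attempts=5, delay=2.0):
    """HEAD a freshly uploaded object, retrying a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return s3.head_object(Bucket=bucket, Key=key)
        except ClientError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff

meta = head_with_retry(
    "ec4-backup-prod",
    "152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db",
)
print(meta["ContentLength"])
```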
[2023-12-18 08:15:17,023] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-TOC.txt (92.000B)
[2023-12-18 08:15:17,024] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-TOC.txt as single part
[2023-12-18 08:15:17,121] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Statistics.db (11.435KiB)
[2023-12-18 08:15:17,121] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Statistics.db as single part
[2023-12-18 08:15:17,126] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-TOC.txt HTTP/1.1" 200 0
[2023-12-18 08:15:17,129] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Summary.db (264.101KiB)
[2023-12-18 08:15:17,129] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Summary.db as single part
[2023-12-18 08:15:17,131] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-CompressionInfo.db (50.417KiB)
[2023-12-18 08:15:17,131] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-CompressionInfo.db as single part
[2023-12-18 08:15:17,147] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Data.db (141.154MiB)
[2023-12-18 08:15:17,147] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db as multi part
[2023-12-18 08:15:17,147] DEBUG: aws s3 cp /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Data.db s3://ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db
[2023-12-18 08:15:17,163] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Index.db (30.050MiB)
[2023-12-18 08:15:17,163] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Index.db as single part
[2023-12-18 08:15:17,543] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Summary.db HTTP/1.1" 200 0
[2023-12-18 08:15:17,544] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-CompressionInfo.db HTTP/1.1" 200 0
[2023-12-18 08:15:17,545] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Statistics.db HTTP/1.1" 200 0
[2023-12-18 08:15:17,548] INFO: Uploading /var/lib/cassandra/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/snapshots/medusa-s3_prod_12_17_2023_23/mc-6212-big-Digest.crc32 (9.000B)
[2023-12-18 08:15:17,548] DEBUG: Uploading 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Digest.crc32 as single part
[2023-12-18 08:15:18,561] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Digest.crc32 HTTP/1.1" 200 0
[2023-12-18 08:15:18,654] DEBUG: https://s3.amazonaws.com:443 "PUT /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Index.db HTTP/1.1" 200 0
[2023-12-18 08:15:19,810] DEBUG: [Storage] Getting object 152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db
[2023-12-18 08:15:19,835] DEBUG: https://s3.amazonaws.com:443 "HEAD /ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db HTTP/1.1" 200 0
[root@ip-172-21-181-249 ~]# aws s3 ls s3://ec4-backup-prod/152.2.81.127/data/keyspace/table-21c2779222113427b87bcf21d527e6c5/mc-6212-big-Data.db
2023-12-18 08:15:18 148010978 mc-6212-big-Data.db