Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3 uploads #543 #659

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open

Conversation

metame
Copy link
Contributor

@metame metame commented Sep 5, 2023

PR Checklist:

Open issues:

  1. The data browser preview is currently not working for s3 files. This is likely because the download endpoint does a 302 temporary redirect to a S3 signed URL. I'd be happy to open a new issue for this for someone to fix that is familiar with the client side code.
  2. Currently the config only requires S3 bucket to be specified. This is in line with opendal's requirements. The main callout here is that we are depending on opendal's default for region, which is "us-east-1". I could specify this directly in atomic's source code or we can make this a required config value.

@joepio
Copy link
Member

joepio commented Sep 5, 2023

Awesome, great job! I'll take a closer look tomorrow :)

Copy link
Member

@joepio joepio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looking good mostly :)

left some comments, curious to hear your thougts.

server/src/files.rs Outdated Show resolved Hide resolved
server/src/files.rs Outdated Show resolved Hide resolved
server/src/files.rs Outdated Show resolved Hide resolved
server/src/handlers/upload.rs Outdated Show resolved Hide resolved
@metame
Copy link
Contributor Author

metame commented Sep 10, 2023

@joepio This should be ready for another look.

Copy link
Member

@joepio joepio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! One more general point maybe: Could you add some documentation and tests? I'd accept a PR without them if you had enough of this.

Especially one happy flow for FileStore in tests would be nice.

server/src/files.rs Outdated Show resolved Hide resolved
server/src/files.rs Show resolved Hide resolved
server/src/handlers/upload.rs Outdated Show resolved Hide resolved
@metame
Copy link
Contributor Author

metame commented Sep 13, 2023

Looking good! One more general point maybe: Could you add some documentation and tests? I'd accept a PR without them if you had enough of this.

Especially one happy flow for FileStore in tests would be nice.

Just tests and documentation left. Should be able to finish this up on Friday.

@metame
Copy link
Contributor Author

metame commented Sep 18, 2023

Alright @joepio this is ready for another look. Tests are passing and functionality tested with small and large files.

@joepio
Copy link
Member

joepio commented Sep 19, 2023

Great job! Only thing I still have left is authentication.

@AlexMikhalev
Copy link
Collaborator

AlexMikhalev commented Oct 3, 2023

You can init OpenDal via both struct+builder or hashmap, passing region and endpoint:

fn init_operator_via_map() -> Result<Operator> {
    // setting up the credentials
    let access_key_id =
        env::var("AWS_ACCESS_KEY_ID").expect("AWS_ACCESS_KEY_ID is set and a valid String");
    let secret_access_key =
        env::var("AWS_SECRET_ACCESS_KEY").expect("AWS_ACCESS_KEY_ID is set and a valid String");

    let mut map = HashMap::default();
    map.insert("bucket".to_string(), "test".to_string());
    map.insert("region".to_string(), "us-east-1".to_string());
    map.insert("endpoint".to_string(), "http://rpi4node3:8333".to_string());
    map.insert("access_key_id".to_string(), access_key_id);
    map.insert("secret_access_key".to_string(), secret_access_key);

    let op = Operator::via_map(Scheme::S3, map)?;
    Ok(op)
}

But honestly, I don't see anything wrong with the current auth, @joepio. What's your take on it? I believe Keypair allow to work with both private and public.

@AlexMikhalev
Copy link
Collaborator

Late to the party, but personally, I would prefer to remove this parameter:

    /// CAUTION: Skip authentication checks, making all data publicly readable. Improves performance.
    #[clap(long, env = "ATOMIC_PUBLIC_MODE")]
    pub public_mode: bool,

It's the job of the atomic server to decide if the file or any other resource is public or visible to the group. Storage is always private with a key/secret pair; if people want to share their s3 bucket as a public website, they have another way of doing it.

@joepio
Copy link
Member

joepio commented Oct 3, 2023

Late to the party, but personally, I would prefer to remove this parameter:

    /// CAUTION: Skip authentication checks, making all data publicly readable. Improves performance.
    #[clap(long, env = "ATOMIC_PUBLIC_MODE")]
    pub public_mode: bool,

It's the job of the atomic server to decide if the file or any other resource is public or visible to the group. Storage is always private with a key/secret pair; if people want to share their s3 bucket as a public website, they have another way of doing it.

This flag is unrelated to this PR / S3, it's for regular authorization checks.

@joepio joepio force-pushed the 543-s3-uploads branch 2 times, most recently from e83a734 to c1ade8c Compare January 18, 2024 15:00
metame and others added 2 commits January 18, 2024 16:36
added s3 config vars

s3 uploads working

prefix file-id with store type

cargo format

download building

download working

possibly fix issue with :

refactor files logic into files module

static str lovelies

clean up log

changelog and readme updates

use tokio streams in upload

check bucket exists before unwrap

Store config in FileStore

cargo fmt

consolidate download fs path logic

move file stores to appstate

no unwrap

refactor

finish refactor

tests passing

back to nicer interface

documentation

move test to tests.rs

readme and dep fix
@joepio
Copy link
Member

joepio commented Jan 19, 2024

For some reason when I refresh a file, it gives me a 404. When I press retry, it works. I'll try to find out what's going on.

@joepio
Copy link
Member

joepio commented Jan 25, 2024

I think my 404 issue has to do with the URL encoding. The first part in s3:fileID is URL encoded, so it becomes s3%3AfileID (: becomes %3A). When we refresh the page, though, the server un-encodes it I think so we get the : instead, which 404s.

@metame do you remember what the encoding was needed for? Sorry I only find out about this so long after you opened this PR!

@metame
Copy link
Contributor Author

metame commented Apr 19, 2024

I think my 404 issue has to do with the URL encoding. The first part in s3:fileID is URL encoded, so it becomes s3%3AfileID (: becomes %3A). When we refresh the page, though, the server un-encodes it I think so we get the : instead, which 404s.

@metame do you remember what the encoding was needed for? Sorry I only find out about this so long after you opened this PR!

So the main reason we implemented the file prefix was to support backwards compatibility among file systems. That is, if an atomic server instance had files uploaded to the server directly and then turned on s3 support, the existing files would still work.

I've found the source of this bug, but the fix has some drawbacks. Specifically, it requires url-encoding the subject path for every resource that goes through handler::get_resource like here. Perhaps, this is acceptable. If not, we could drop the behavior that supports backwards compatibility and server admins would have to deal with the migration themselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants