Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use default aws credential provider #5

Open
wants to merge 53 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
4e73ec5
Update README.md
oshyshko May 31, 2019
8b7125a
Merge branch 'develop' of github.com:oshyshko/uio into develop
artstwolf May 20, 2021
902add9
aws creds
artstwolf May 20, 2021
db53b3b
use `"hadoop.security.authentication" "simple"`
changliang1007 Aug 10, 2021
b7b8ac7
update link
changliang1007 Aug 10, 2021
803168a
Update README.md
oshyshko May 31, 2019
f175c49
rebase to develop
Nov 22, 2021
767662a
correct changes to impl
Nov 22, 2021
718ebc2
remove space
Nov 22, 2021
9792700
Merge branch 'master' into develop
Nov 22, 2021
ca21266
fix tests
Nov 22, 2021
b3064cf
rm extra line
Nov 22, 2021
8417698
change back s3 access for hdfs
Nov 23, 2021
0a89b5f
Merge pull request #2 from Factual/use-crendential-chain
irisxingfu Nov 23, 2021
6713bc4
Merge branch 'develop' into feature/aws-creds
Nov 24, 2021
004a959
Merge branch 'feature/aws-creds' into feature/simple-auth-hdfs
Nov 24, 2021
6fa71fe
UIO should not control hdfs auth
Nov 24, 2021
b24c70b
Dont override s3 impls
Nov 24, 2021
1170803
Merge pull request #3 from Factual/feature/aws-creds
irisxingfu Nov 29, 2021
13bc4ca
Update project.clj
Nov 29, 2021
6a658d2
Merge pull request #4 from Factual/feature/simple-auth-hdfs
Dec 6, 2021
a20d389
release 1.2
Dec 6, 2021
c646853
bump version to 1.3-SNAPSHOT
Dec 6, 2021
d5edfb8
ALlow s3 to access bucket in any region
Dec 7, 2021
7101f35
Check for configured cred overrides in s3
Dec 17, 2021
cba0bab
Merge pull request #6 from Factual/s3-url-creds
Dec 17, 2021
7512ee4
UIO V1.2.1
Dec 18, 2021
86b63b3
Snapshot
Aug 29, 2022
aea61ce
Make connection timeout configurable
Sep 15, 2022
f2c8ac0
Remove log statement
Sep 15, 2022
1b72eb0
Apply suggestions from code review
Sep 16, 2022
b9ab164
Merge pull request #7 from Factual/sftp-connection-timeout
Sep 16, 2022
6ad7d3c
Upgrade aws sdk
Sep 16, 2022
65b990d
Upgrade jsch
Sep 16, 2022
8ba0de0
uio 1.2.2
Sep 16, 2022
5345750
snapshot
Sep 16, 2022
5b6cc59
rm `Etags don't match`
changliang1007 Jan 5, 2023
ef0273c
Merge branch 'develop' into DEL-2151-s3-etag-not-matching-md5-after-K…
changliang1007 Jan 5, 2023
32b0acc
rm useless assignment
changliang1007 Jan 5, 2023
e210915
Update CHANGELOG.md
changliang1007 Jan 5, 2023
6a398fa
Update CHANGELOG.md
changliang1007 Jan 5, 2023
dc2baf6
rm `partDigest`
changliang1007 Jan 9, 2023
fe322e6
Merge pull request #8 from Factual/DEL-2151-s3-etag-not-matching-md5-…
changliang1007 Jan 9, 2023
b43b451
release 1.2.3
changliang1007 Jan 9, 2023
01a8dc7
bump to 1.2.4-SNAPSHOT
changliang1007 Jan 9, 2023
6add7fc
Support listing in res fs
Jan 23, 2023
6f6e247
Merge pull request #9 from Factual/res-ls-support
Jan 25, 2023
620fe3f
Release V1.2.4
Jan 25, 2023
c799bdc
Snapshot
Jan 25, 2023
0bbe9f8
Remove keytab+principle from hdfs tests
Jan 30, 2023
eba657d
Add size to res ls
Jan 30, 2023
e59ac06
Release V1.2.5
Jan 30, 2023
cfcdbab
Snapshot
Jan 30, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 27 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,32 @@
# Changelog

## [1.2] - unreleased
## Unreleased

## [1.2.5] - 2023-01-30
### Added
- Size field when listing in resource filesystem.

## [1.2.4] - 2023-01-25
### Added
- Support for listing in resource filesystem.

## [1.2.3] - 2023-01-09
### Fixed
- S3OutputStream no longer validates a Multipart Upload's ETag against its MD5 digest, according to [this](https://stackoverflow.com/a/53886736). [DEL-2151](https://foursquare.atlassian.net/browse/DEL-2151)

## [1.2.2] - 2022-9-16
### Added
- SFTP fs now allows configuration of the connection timeout: `(uio.fs.sftp/with-sftp-configs {:connection-timeout 10000} #(uio/to* to-url))`
### Changed
- Upgrade AWS SDK to 1.12.300
- Upgrade jsch to 0.1.55

## [1.2.1] - 2021-12-17
### Changed
- Force ability to use bucket in any region.
- S3 fs checks for configured access and secret overrides again.

## [1.2] - 2021-12-06
### Fixed
- proper escaping of ` `, `+` and `%` in `file://`, `hdfs://`, `s3://` and `sftp://`
- don't lookup credentials from env (compatibility with v1.0) when there is a matching url in configuration
Expand Down
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ Features:
|[hdfs](#hdfs) | • | • | • | • | • | • | • | • |`hdfs://[host]/path/to/file.txt` |
|[http(s)](#https) | • | |:cat:| :cat: | | | | |`http[s]://host[:port]/path/to/file.txt`|
|[mem](#mem) | • | • | • | • | • | • | • | • |`mem:///path/to/file.txt` |
|[res](#res) | • | | | • | | | | |`res:///com/mypackage/file.txt` |
|[res](#res) | • | | | • | | | | |`res:///com/mypackage/file.txt` |
|[s3](#s3) | • | • | • | • | • |:dog:| • | • |`s3://bucket/key/with/slashes.txt` |
|[sftp](#sftp) | • |:bug:| • | • |:pig: | • | • | • |`sftp://host[:port]/path/to/file.txt` |

Expand Down Expand Up @@ -434,6 +434,7 @@ This will also work for S3 buckets/paths and SSH hosts/ports/paths.
You can override multiple URL prefixes, the rule of thumb is: the longest URL prefix that matches your URL wins.

## License
Copyright © Oleksandr Shyshko. All rights reserved.

The use and distribution terms for this software are covered by the
Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
Expand Down
27 changes: 18 additions & 9 deletions project.clj
Original file line number Diff line number Diff line change
@@ -1,18 +1,26 @@
(defproject uio/uio "1.2-SNAPSHOT"
(defproject uio/uio "1.2.6-SNAPSHOT"
:description "uio is a Clojure library and a command line tool for accessing HDFS, S3, SFTP and other file systems."

:repositories {"cloudera" "https://repository.cloudera.com/content/groups/cdh-releases-rcs"}
:repositories {"cloudera" "https://repository.cloudera.com/content/groups/cdh-releases-rcs"
"foursquare" {:url "https://foursquaredev.jfrog.io/foursquaredev/fsnexus"
:username :env/MVN_USERNAME :password :env/MVN_PASSWORD}}

:deploy-repositories [["clojars" {:url "https://clojars.org/repo/"
:sign-releases false}]]
:deploy-repositories {"snapshots" {:id "foursquare"
:url "https://foursquaredev.jfrog.io/foursquaredev/fsfactual-snapshots-local"
:username :env/MVN_USERNAME :password :env/MVN_PASSWORD
:sign-releases false}
"releases" {:id "foursquare"
:url "https://foursquaredev.jfrog.io/foursquaredev/fsfactual-releases-local"
:username :env/MVN_USERNAME :password :env/MVN_PASSWORD
:sign-releases false}}

:dependencies [[org.clojure/clojure "1.9.0"]

[com.amazonaws/aws-java-sdk-s3 "1.11.417"] ; s3
[com.amazonaws/aws-java-sdk-sts "1.11.417"] ; s3 with roles
[org.apache.httpcomponents/httpclient "4.5.6"] ; (needed by `aws-java-sdk-s3`)
[com.amazonaws/aws-java-sdk-s3 "1.12.300"] ; s3
[com.amazonaws/aws-java-sdk-sts "1.12.300"] ; s3 with roles
[org.apache.httpcomponents/httpclient "4.5.13"] ; (needed by `aws-java-sdk-s3`)

[com.jcraft/jsch "0.1.54"] ; sftp
[com.jcraft/jsch "0.1.55"] ; sftp
[com.jcraft/jzlib "1.1.3"] ; (needed by `jsch`)

[org.apache.hadoop/hadoop-common "2.8.1" ; hdfs (API) note: 3.1.1 is available, but it can't find HDFS impl
Expand All @@ -32,6 +40,7 @@
"-target" "1.8"
"-Xlint:deprecation"
"-Xlint:unchecked"]
:resource-paths ["resources"]

:profiles {:dev {:dependencies [[midje "1.9.2"]]
:plugins [[lein-midje "3.2.1"]]}}
Expand All @@ -47,4 +56,4 @@
; A trick to prevent IntelliJ from resetting compiler/module version to "1.5"
:pom-plugins [[org.apache.maven.plugins/maven-compiler-plugin "3.6.1"
[:configuration ([:source "1.8"]
[:target "1.8"])]]])
[:target "1.8"])]]])
1 change: 1 addition & 0 deletions resources/test/test.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
gdbg
23 changes: 1 addition & 22 deletions src/uio/fs/S3.java
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@ public static class S3OutputStream extends OutputStream {

private final MessageDigest inDigest = MessageDigest.getInstance("MD5");
private final MessageDigest outDigest = MessageDigest.getInstance("MD5");
private final MessageDigest partDigest = MessageDigest.getInstance("MD5");

private final File partTempFile;
private Streams.StatsableOutputStream partOutputStream;
Expand Down Expand Up @@ -65,7 +64,6 @@ public void write(byte[] bs, int offset, int length) throws IOException {
// append to buffer
partOutputStream.write(bs, offset, bytesToCopy);

partDigest.update(bs, offset, bytesToCopy);
outDigest.update(bs, offset, bytesToCopy);

offset += bytesToCopy;
Expand All @@ -81,7 +79,6 @@ private void assertOpen() throws IOException {
private void _flush(boolean isLastPart) throws IOException {
partOutputStream.close();

String localPartEtag = hex(partDigest.digest());
try {
UploadPartRequest upr = new UploadPartRequest()
.withBucketName(init.getBucketName())
Expand All @@ -95,13 +92,7 @@ private void _flush(boolean isLastPart) throws IOException {
PartETag remotePartEtag = c.uploadPart(upr).getPartETag();
tags.add(remotePartEtag);

if (!remotePartEtag.getETag().equals(localPartEtag)) {
throw new RuntimeException("Part ETags don't match:\n" +
" - local : " + localPartEtag + "\n" +
" - remote: " + remotePartEtag.getETag());
}

partDigest.reset();
partOutputStream = new Streams.StatsableOutputStream(new FileOutputStream(partTempFile));
partIndex++;
} catch (Exception e) {
Expand All @@ -124,20 +115,8 @@ public void close() throws IOException {
" - read : " + read + "\n" +
" - written: " + written);

String remoteEtag = c.completeMultipartUpload(
new CompleteMultipartUploadRequest(init.getBucketName(), init.getKey(), init.getUploadId(), tags)
).getETag();
c.completeMultipartUpload(new CompleteMultipartUploadRequest(init.getBucketName(), init.getKey(), init.getUploadId(), tags));

partDigest.reset();
for (PartETag tag : tags) {
partDigest.update(unhex(tag.getETag()));
}
String localEtag = hex(partDigest.digest()) + "-" + partIndex;

if (!localEtag.equals(remoteEtag))
throw new RuntimeException("Etags don't match:\n" +
" - local : " + localEtag + "\n" +
" - remote: " + remoteEtag);
} catch (Exception e) {
abort(); // TODO delete remote file if exception happened after `c.completeMultipartUpload(...)`
throw e;
Expand Down
17 changes: 1 addition & 16 deletions src/uio/fs/hdfs.clj
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
; :access (optional) S3 access
; :secret (optional) S3 secret
;
; NOTE: to use `kinit` isntead of keytab file, pass empty creds (`{}` or all nil values)
; NOTE: to use `kinit` instead of keytab file, pass empty creds (`{}` or all nil values)
;
(ns uio.fs.hdfs
(:require [clojure.string :as str]
Expand All @@ -29,41 +29,26 @@
(let [c (Configuration.)
creds (url->creds url)

principal (:principal creds)
keytab-path (some-> (:keytab creds) path)
aws-access (:access creds)
aws-secret (:secret creds)]

(when (and aws-access aws-secret)
(.set c "fs.s3a.impl" "org.apache.hadoop.fs.s3a.S3AFileSystem")
(.set c "fs.s3a.access.key" aws-access)
(.set c "fs.s3a.secret.key" aws-secret)

(.set c "fs.s3n.impl" "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
(.set c "fs.s3n.awsAccessKeyId" aws-access)
(.set c "fs.s3n.awsSecretAccessKey" aws-secret)

(.set c "fs.s3.impl" "org.apache.hadoop.fs.s3.S3FileSystem")
(.set c "fs.s3.awsAccessKeyId" aws-access)
(.set c "fs.s3.awsSecretAccessKey" aws-secret))

(doseq [url ["file:///etc/hadoop/conf/core-site.xml"
"file:///etc/hadoop/conf/hdfs-site.xml"]]
(if (exists? url)
(.addResource c (URL. url))))

(.set c "hadoop.security.authentication" "kerberos")

(UserGroupInformation/setConfiguration c)

; only use keytab creds if either user or keytab path was specified, otherwise rely on default auth (e.g. if ran from kinit/Yarn)
(when (or principal keytab-path)
(UserGroupInformation/loginUserFromKeytab principal keytab-path)

; TODO is there a way to provide more information about the failure?
(if-not (UserGroupInformation/isLoginKeytabBased)
(die "Could not authenticate. Wrong or missing keytab?")))

c))

(defn ->fs [^String url]
Expand Down
18 changes: 16 additions & 2 deletions src/uio/fs/res.clj
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,10 @@
; ^^^ triple slash
;
(ns uio.fs.res
(:require [uio.impl :refer :all])
(:import (clojure.java.api Clojure)))
(:require
[uio.impl :refer :all])
(:import (clojure.java.api Clojure)
(java.io File)))

(defn assert-res-url [url]
(if (host url)
Expand All @@ -20,3 +22,15 @@
(defmethod exists? :res [url & args] (if (.getResource Clojure (path (assert-res-url url)))
true
false))

(defmethod ls :res [url & args]
(->>
(.substring (path (normalize url)) 1) ; get path and remove leading slash
(.getResources (.getClassLoader Clojure)) ; Multiple resources can have the same name
(enumeration-seq)
(map #(File. (.getPath %)))
(map #(if (.isFile %)
%
(seq (.listFiles %))))
(flatten) ; If it's a directory, flatten the list of files.
(map #(do {:url (str "file://" %) :size (.length %)})))) ; expected format is a list of maps
31 changes: 15 additions & 16 deletions src/uio/fs/s3.clj
Original file line number Diff line number Diff line change
Expand Up @@ -9,31 +9,30 @@
(ns uio.fs.s3
(:require [uio.impl :refer :all]
[clojure.string :as str])
(:import [com.amazonaws.auth BasicAWSCredentials STSAssumeRoleSessionCredentialsProvider AWSCredentialsProvider]
[com.amazonaws.internal StaticCredentialsProvider]
[com.amazonaws.services.s3 AmazonS3Client]
(:import [com.amazonaws.services.s3 AmazonS3ClientBuilder]
[com.amazonaws.services.s3.model ListObjectsRequest ObjectListing S3ObjectSummary GetObjectRequest CannedAccessControlList AmazonS3Exception]
[uio.fs S3$S3OutputStream]
[java.nio.file NoSuchFileException]))
[java.nio.file NoSuchFileException]
(com.amazonaws.auth BasicAWSCredentials AWSStaticCredentialsProvider)))

(defn bucket-key->url [b k]
(str "s3://" b default-delimiter (escape-path k)))

(defn url->key [^String url]
(subs (or (path url) "_") 1))

(defn ^AWSCredentialsProvider ->creds-provider [url]
(let [{:keys [access secret role-arn] :as creds} (url->creds url)
_ (if-not access (die-creds-key-not-found :access url creds))
_ (if-not secret (die-creds-key-not-found :secret url creds))
bawsc (BasicAWSCredentials. access secret)]
(if role-arn
(STSAssumeRoleSessionCredentialsProvider. bawsc ^String role-arn "uio-s3-session")
(StaticCredentialsProvider. bawsc))))
(defn client-for-url [^String url]
(let [client-builder (AmazonS3ClientBuilder/standard)
{:keys [access secret]} (url->creds url)]
(when (and access secret)
(.withCredentials client-builder (AWSStaticCredentialsProvider. (BasicAWSCredentials. access secret))))
(.withForceGlobalBucketAccessEnabled client-builder true)
(.build client-builder)))


(defn with-client-bucket-key [url c-b-k->x]
(try-with url
#(AmazonS3Client. (->creds-provider url))
#(client-for-url url)
#(c-b-k->x % (host url) (url->key url))
#(.shutdown %)))

Expand All @@ -51,7 +50,7 @@
(+ start
(:length opts))
(dec (Long/MAX_VALUE)))]
(wrap-is #(AmazonS3Client. (->creds-provider url))
(wrap-is #(client-for-url url)
#(.getObjectContent
(.getObject %
(.withRange
Expand All @@ -60,7 +59,7 @@
end)))
#(.shutdown %))))

(defmethod to :s3 [url & [opts]] (wrap-os #(AmazonS3Client. (->creds-provider url))
(defmethod to :s3 [url & [opts]] (wrap-os #(client-for-url url)
#(S3$S3OutputStream. % (host url) (url->key url) (some-> opts :acl acl->enum))
#(.shutdown %)))

Expand Down Expand Up @@ -163,7 +162,7 @@
(defmethod ls :s3 [url & args] (single-file-or
url
(let [opts (get-opts default-opts-ls url args)
c (AmazonS3Client. (->creds-provider url))
c (client-for-url url)
b (host url)
k (url->key (ensure-ends-with-delimiter url))]
(cond->> (close-when-realized-or-finalized
Expand Down
16 changes: 13 additions & 3 deletions src/uio/fs/sftp.clj
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,20 @@
(:import [com.jcraft.jsch JSch ChannelSftp Session SftpException SftpATTRS Channel]
[java.io ByteArrayInputStream]
[java.util.zip GZIPOutputStream GZIPInputStream]
[java.util Date]))
[java.util Date]
(clojure.lang IPersistentMap)))

(def default-timeout-ms 10000)

(def ^:dynamic *sftp-connection-config* {:connection-timeout default-timeout-ms})

(defn with-sftp-configs [config f]
(if-not (instance? IPersistentMap config)
(die (str "Argument `config` expected to be a map, but was " (.getName (class config)))))

(binding [*sftp-connection-config* (merge *sftp-connection-config* config)]
(f)))

; JSch expects a private key with new-line characters as described in RFC-4716.
; However, it's useful to pass private keys around as a single-line string where new-lines are replaced with space.
; This fn will convert a single-line private key back to multi-line format and make JSch happy.
Expand Down Expand Up @@ -71,7 +81,7 @@
(.getBytes (or identity-pass ""))))

s (.getSession j user (host url) (or (port url) 22)) ; ^Session
_ (.setTimeout s default-timeout-ms)
_ (.setTimeout s (:connection-timeout *sftp-connection-config*))
_ (.setConfig s "StrictHostKeyChecking" (if known-hosts "yes" "now"))
_ (.setPassword s pass)
_ (.connect s)
Expand Down Expand Up @@ -140,7 +150,7 @@
(defmethod copy :sftp [from-url to-url & args] (try-with to-url
#(->session+channel to-url)
(fn [[_ c]]
(with-open [is (from from-url)]
(with-open [is (from from-url args)]
(.put c is (path to-url))))
(fn [[s c]]
(.disconnect c)
Expand Down
Loading