Skip to content
This repository has been archived by the owner on Aug 4, 2022. It is now read-only.

[REEF-2036] Add IFileSystem api to check if uri is a directory #1473

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

dwaijam
Copy link
Contributor

@dwaijam dwaijam commented Jun 9, 2018

[REEF-2036] Add IFileSystem api to check if uri is a directory

Currently, there is no way to check if a location specified by a uri is a directory or not. The only way to check it is by calling fs.GetChildren and checking for an exception. There should be a more graceful way of doing this check by adding an api. This change addresses this.

JIRA:
REEF-2036

Pull request:
This closes 1473

public bool IsDirectory(Uri uri)
{
var path = uri.AbsolutePath.Trim('/');
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, 1, null, null, null).Results;
Copy link
Contributor

@tyclintw tyclintw Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 [](start = 97, length = 1)

1 [](start = 97, length = 1)

super-nit: Make a static variable. It helps self document the code. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating a local variable


In reply to: 194554180 [](ancestors = 194554180)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const static?


In reply to: 194580367 [](ancestors = 194580367,194554180)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static wont apply to local variable. Not sure if you wanted to make this a class variable. I have used const only


In reply to: 195218006 [](ancestors = 195218006,194580367,194554180)

/// <returns>true if uri is for a directory else false</returns>
public bool IsDirectory(Uri uri)
{
return Exists(uri) && _adlsClient.GetDirectoryEntry(uri.AbsolutePath).Type == DirectoryEntryType.DIRECTORY;
Copy link
Contributor

@tyclintw tyclintw Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exists(uri) && [](start = 19, length = 14)

Exists(uri) && [](start = 19, length = 14)

Call Exists doubles the number of network calls. Can we just rely on the GetDirectoryEntry alone? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


In reply to: 194554633 [](ancestors = 194554633)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the double network call but the catch all could hide simple exceptions like a network error.


In reply to: 194580374 [](ancestors = 194580374,194554633)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change reverted


In reply to: 195218945 [](ancestors = 195218945,194580374,194554633)

Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the change reverted? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_adlsClient.GetDirectoryEntry(uri.AbsolutePath) throws exception if uri not present


In reply to: 195886246 [](ancestors = 195886246)

Copy link
Contributor

@singlis singlis Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you look at the source for adls client, https://github.com/Azure/azure-data-lake-store-net/blob/master/AdlsDotNetSDK/ADLSClient.cs

CheckExists internally call GetDirectoryEntry, so you are calling it twice. Can you model what the CheckExists code does internally? That may help with handling the specific case.


In reply to: 196918926 [](ancestors = 196918926,195886246)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 197270432 [](ancestors = 197270432,196918926,195886246)

public bool IsDirectory(Uri uri)
{
var stdOut = _commandRunner.Run("dfs -count " + uri).StdOut;
return stdOut.Any() && stdOut.First().StartsWith("1");
Copy link
Contributor

@tyclintw tyclintw Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.StartsWith("1"); [](start = 49, length = 17)

Elaborate this check. What if it the result is "11"? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 194555273 [](ancestors = 194555273)

@@ -295,6 +290,20 @@ public void TestDeleteDirectorySecondLevelE2E()
Assert.True(CheckContainerExists(_container));
}

private List<CloudBlockBlob> SetupBlobs(string directory)
Copy link
Contributor

@tyclintw tyclintw Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetupBlobs [](start = 37, length = 10)

nit: SetupBlobs doesn't really let me know what this function is doing. I suggest renaming something more explanitory and also exposing the blob count as well. maybe something like CreateTempBlobs(string directory, int blobCount=3) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 194555964 [](ancestors = 194555964)

string dirName = $"/{_defaultFolderName}";
string fakeDirName = $"/fakeDir";
_adlsClient.CreateDirectory(dirName);
string fileName = UploadFromString(ContentsText);
Copy link
Contributor

@tyclintw tyclintw Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supernit: try to add newlines to break out the logic. Giant block of code are difficult to follow. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 194556435 [](ancestors = 194556435)

Assert.False(_fileSystem.IsDirectory(remoteFileUri));
_fileSystem.DeleteDirectory(remoteDirUri);
_fileSystem.Delete(remoteFileUri);
File.Delete(localFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cleanup should be handled in the test cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing tests follow this same pattern


In reply to: 194556624 [](ancestors = 194556624)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but please avoid in the future if possible. Please break up this code block for readability as well.


In reply to: 194580410 [](ancestors = 194580410,194556624)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 195218460 [](ancestors = 195218460,194580410,194556624)

Copy link
Contributor

@tyclintw tyclintw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@@ -228,6 +229,21 @@ public void TestCopyFromLocalE2E()
container.DeleteIfExistsAsync().Wait();
}

[Fact(Skip = SkipMessage)]
Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is following the existing pattern -- but instead of having to remove all the skips and then set credentials, can we change this to dynamically skip the test if the credentials are not set? It looks like xunit does support a SkipTestException: https://github.com/xunit/samples.xunit/blob/master/DynamicSkipExample/SkipTestException.cs

Not sure if this is supported in our version of xunit or not. #Resolved

Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also put the check in a single function and call it before the test. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SkippableFact is not supported in our version of xunit.
Yopu dont need to change each SkipMessage.Just set SkipMessage to null at top and all tests will run. These tests can only be run locally and a valid connectionstring is required to be inserted. I dont see how making this dynamic would be any help since the conn string validation will be run everytime in a build pipeline and it will always be expected to fail


In reply to: 195885915 [](ancestors = 195885915)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value can only be a const string.
You dont need to change each SkipMessage.Just set SkipMessage to null at top and all tests will run. These tests can only be run locally and a valid connectionstring is required to be inserted. I dont see how making this dynamic would be any help since the conn string validation will be run everytime in a build pipeline and it will always be expected to fail


In reply to: 195886005 [](ancestors = 195886005)

Copy link
Contributor

@singlis singlis Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sounds good, it maybe worth making that a bit more clear above ( there is a comment off to the side that says use null to run tests. It maybe more clear to have another line under SkipMessage with SkipMessage = null commented out and a comment above to uncomment to run skipped tests.


In reply to: 196930627 [](ancestors = 196930627,195885915)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this as it is as discussed offline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate? We might need to refer to this PR in a future when we have all forgotten what was discussed offline :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott said it is fine to have the current line.
private const string SkipMessage = "Fill in credentials before running test"; // Use null to run tests

Copy link
Contributor

@singlis singlis Jun 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did -- so setting SkipMessage=null has the same effect as the SkipTestException since our version of XUnit does not support it. But I think the comment of "// Use null to run tests" could be a bit more explicit. This is more what I was thinking:

SkipMessage="....";
//Uncomment SkipMessage=null to run tests
//SkipMessage=null
``` #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

/// <returns>true if uri is for a directory else false</returns>
public bool IsDirectory(Uri uri)
{
var path = uri.AbsolutePath.Trim('/');
Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the uri need to trim the '/'? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ListBlobsSegmented does not work is there is a '/' at the beginining of the path. I have changed to AbsolutePath.TrimStart('/')


In reply to: 195886086 [](ancestors = 195886086)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thanks


In reply to: 196934429 [](ancestors = 196934429,195886086)

/// <returns>true if uri is for a directory else false</returns>
public bool IsDirectory(Uri uri)
{
return Exists(uri) && _adlsClient.GetDirectoryEntry(uri.AbsolutePath).Type == DirectoryEntryType.DIRECTORY;
Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the change reverted? #Resolved

@@ -142,6 +142,18 @@ public void CreateDirectory(Uri directoryUri)
_commandRunner.Run("dfs -mkdir " + directoryUri);
}

public bool IsDirectory(Uri uri)
{
var stdOut = _commandRunner.Run("dfs -count " + uri).StdOut;
Copy link
Contributor

@singlis singlis Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to do this? Please see here:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#test
hdfs dfs -test -d will check to see if the path is a directory. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried to use this initially but it does not return any value. It sets the value in $? but I am not sure how to run that command using the commandrunner


In reply to: 195886573 [](ancestors = 195886573)

Copy link
Contributor

@singlis singlis Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So $? is a linux variable that contains the value of the last run command. This will be 0 for success or 1 for failure. Talking with you offline, it looks like %errorlevel% in dos would work, however we need to account for linux as well. I think the right way to do this is to echo the %errorlevel% for now and file a JIRA to handle this correctly on linux. Parsing an integer from stdout should be a lot simpler and less error prone.


In reply to: 196918217 [](ancestors = 196918217,195886573)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 197282264 [](ancestors = 197282264,196918217,195886573)

continuationToken,
blobRequestOptions,
operationContext);
task.Wait();
Copy link
Contributor

@singlis singlis Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task.Wait can be removed and just have return task.Result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been taken care of in a previous PR but I have changed it here too

{
// TODO[JIRA REEF - 2039]: HadoopFileSystem .IsDirectory() check needs to work on linux machines.
var stdOut = _commandRunner.Run("dfs -test -d " + uri + "& call echo %^errorlevel%").StdOut;
if (stdOut.Any())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit pick, this if statement could be reduced to return stdOut.Any() && stdOut.First() == '0';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @dwaijam

Copy link
Contributor

@tyclintw tyclintw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@tyclintw tyclintw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Assert.False(fs.IsDirectory(fakeDirectoryUri));
Assert.False(fs.IsDirectory(fileUri));
fs.Delete(fileUri);
fs.DeleteDirectory(directoryUri);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cleanup block to make more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up


In reply to: 198553473 [](ancestors = 198553473)

Copy link
Contributor

@tyclintw tyclintw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@motus motus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few nit picks

/// <returns>true if uri is for a directory else false</returns>
public bool IsDirectory(Uri uri)
{
var path = uri.AbsolutePath.TrimStart('/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use UrlPathSeparator constant

Copy link
Contributor Author

@dwaijam dwaijam Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198668431 [](ancestors = 198668431)

{
var path = uri.AbsolutePath.TrimStart('/');
const int maxBlobResults = 1;
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, maxBlobResults, null, null, null).Results;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we need maxBlobResults constant here. for readability, we can just write maxResults: 1, e.g.

var blobItems = _client.ListBlobsSegmented(
    path,
    useFlatListing: false,
    BlobListingDetails.Metadata,
    maxResults: 1,
    continuationToken: null,
    blobRequestOptions: null,
    operationContext: null).Results;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198669314 [](ancestors = 198669314)

var path = uri.AbsolutePath.TrimStart('/');
const int maxBlobResults = 1;
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, maxBlobResults, null, null, null).Results;
return blobItems.Any() && blobItems.First() is CloudBlobDirectory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe blobItems.OfType<CloudBlobDirectory>().Any()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198670124 [](ancestors = 198670124)

@@ -69,6 +68,26 @@ public ICloudBlockBlob GetBlockBlobReference(Uri uri)
}

public BlobResultSegment ListBlobsSegmented(
string prefix,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter name prefix differs from the base name path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198671019 [](ancestors = 198671019)

continuationToken,
blobRequestOptions,
operationContext);
return task.Result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use task.Result in this method, but .GetAwaiter().GetResult() in ListDirectoryBlobsSegmented() implementation. is there are reason for it? if not, we should probably use same calls everywhere for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing all task.Result to task.GetAwaiter().GetResult() based on this discussion https://stackoverflow.com/questions/17284517/is-task-result-the-same-as-getawaiter-getresult which says task.Result hides actual exceptions


In reply to: 198671715 [](ancestors = 198671715)

@@ -336,6 +333,20 @@ public void TestDeleteDirectorySecondLevelE2E()
Assert.True(CheckContainerExists(_container));
}

private List<CloudBlockBlob> CreateTempBlobs(string directory, int fileCount = 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use more general return type, e.g. IList or even IEnumerable

blockBlobs.Add(blockBlob);
}
return blockBlobs;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use Linq? e.g.

private IEnumerable<CloudBlockBlob> CreateTempBlobs(string directory, int fileCount = 3)
{
    return Enumerable.Range(0, fileCount).Select(i =>
    {
        var filePath = directory + '/' + i;
        var blockBlob = _container.GetBlockBlobReference(filePath);
        UploadFromString(blockBlob, "hello");
        Assert.True(CheckBlobExists(blockBlob), "Blob does not exist: " + filePath);
        return blockBlob;
    });
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


In reply to: 198681302 [](ancestors = 198681302)


Assert.True(_fileSystem.IsDirectory(PathToFile(dirName)));
Assert.False(_fileSystem.IsDirectory(PathToFile(fakeDirName)));
Assert.False(_fileSystem.IsDirectory(PathToFile(fileName)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test for three different conditions here. Maybe split it into three unit tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


In reply to: 198681751 [](ancestors = 198681751)

@@ -269,6 +272,21 @@ public void TestCopyFromLocalE2E()
container.DeleteIfExistsAsync().Wait();
}

[Fact(Skip = SkipMessage)]
public void TestIsDirectory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test for several different conditions here. Maybe split this test into several independent tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198682001 [](ancestors = 198682001)

/// Tests whether .IsDirectory() works.
/// </summary>
[Fact(Skip = SkipMessage)]
public void TestIsDirectory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split that test into three independent tests - one for each condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 198682763 [](ancestors = 198682763)

dwaijam added 4 commits June 29, 2018 16:27
SKIP e2e tests

address cr comments

handle path not found expception

cr comments fix

address cr comments

fix cr comments

cr comment fixes

fix validate children test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants