Code question #119

Open
ryan-williams opened this issue Oct 13, 2016 · 2 comments

Comments

@ryan-williams
Contributor

I was just looking at this bit of BAMInputFormat; I assumed its goal was to set splitsEnd to the index of the first split after i whose path differs from that of the split at i, yet it seems like that loop will set splitsEnd to the last such split.

Am I missing something? Thanks!
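To make the first-versus-last distinction concrete, here is a minimal standalone sketch. It is not the Hadoop-BAM source: paths are modeled as plain strings, and the class and method names (`SplitsEndSketch`, `firstDifferentPath`, `lastDifferentPath`) are hypothetical. It only shows how a scan without an early exit ends up at the last mismatching index rather than the first.

```java
import java.util.List;

public class SplitsEndSketch {
    // Index of the first split after i whose path differs from the path at i,
    // or paths.size() if there is none. The break stops at the first mismatch.
    static int firstDifferentPath(List<String> paths, int i) {
        int end = paths.size();
        for (int j = i + 1; j < paths.size(); j++) {
            if (!paths.get(j).equals(paths.get(i))) {
                end = j;
                break;
            }
        }
        return end;
    }

    // Same scan without the break: every later mismatch overwrites end,
    // so the result is the last split whose path differs.
    static int lastDifferentPath(List<String> paths, int i) {
        int end = paths.size();
        for (int j = i + 1; j < paths.size(); j++) {
            if (!paths.get(j).equals(paths.get(i))) {
                end = j;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        // Hypothetical input: three files, splits grouped by path.
        List<String> paths = List.of("a.bam", "a.bam", "b.bam", "b.bam", "c.bam");
        System.out.println(firstDifferentPath(paths, 0)); // 2
        System.out.println(lastDifferentPath(paths, 0));  // 4
    }
}
```

With splits from more than one file, the two variants disagree, which is the case I'm worried about.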

@ryan-williams
Contributor Author

Looking at this more, I feel like this is definitely a bug in split-generation in the presence of a .splitting-bai, and possibly a severe one; I'll try to come up with a repro case.

Can anyone comment on how widely that functionality is used? I've not personally seen it in the wild.

@ryan-williams
Contributor Author

Never mind, seems like it probably works OK, albeit due to what looks like luck.

The first call to addIndexedSplits, with i == 0, will add FileVirtualSplits corresponding to all FileSplits but the last one (i.e. indices [0, splits.size() - 1)) and return its splitsEnd (splits.size() - 1).

A second call to addIndexedSplits, with i == splits.size() - 1, will then add a FileVirtualSplit for the last FileSplit.

So each FileSplit should get processed exactly once, though the relationship between getSplits and addIndexedSplits is not what it should be / what is documented.
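A rough model of that call sequence, to check the "exactly once" claim. Everything here is an assumption for illustration: `processChunk` and `run` are made-up names, and the chunk boundary is hard-coded to match the behavior described above rather than derived from the actual addIndexedSplits logic.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCoverageSketch {
    // Hypothetical stand-in for the observed behavior: starting at i, handle
    // indices [i, end) and return end. Any call that starts before the last
    // index stops one short of n; a call starting at the last index covers it.
    static int processChunk(int i, int n, List<Integer> processed) {
        int end = (i < n - 1) ? n - 1 : n;
        for (int j = i; j < end; j++) {
            processed.add(j);
        }
        return end;
    }

    // Driver: resume from the returned index until every index is consumed.
    static List<Integer> run(int n) {
        List<Integer> processed = new ArrayList<>();
        for (int i = 0; i < n; ) {
            i = processChunk(i, n, processed);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // [0, 1, 2, 3, 4]
    }
}
```

Under this model the two calls cover [0, n - 1) and then [n - 1, n), so no index is skipped or repeated, even though the split point is not where the documentation suggests.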

On that note, however, I don't see any reason why calls to addIndexedSplits should be chunked per-underlying-FileSplit.getPath in the first place, as that comment implies they are…

🤔
