I was just looking at this bit of `BAMInputFormat`. I assumed its goal was to set `splitsEnd` to the first split after `i` that has a different path than the split at `i`, yet it seems like that loop will set `splitsEnd` to the last such split.
Am I missing something? Thanks!
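
To make the distinction concrete, here is a minimal sketch of the two behaviors. It is not the actual `BAMInputFormat` code: the class `SplitsEndSketch`, both method names, and the use of plain strings in place of `FileSplit` paths are hypothetical, chosen only to illustrate "first differing split" versus "last differing split".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; paths are plain strings, not FileSplits.
class SplitsEndSketch {
    // Scans every index after i and overwrites splitsEnd on each mismatch,
    // so it ends up at the LAST index whose path differs from paths.get(i).
    static int lastDifferent(List<String> paths, int i) {
        int splitsEnd = paths.size();
        for (int j = i + 1; j < paths.size(); ++j) {
            if (!paths.get(j).equals(paths.get(i))) {
                splitsEnd = j; // no early exit, so later mismatches win
            }
        }
        return splitsEnd;
    }

    // Stops at the FIRST index after i whose path differs from paths.get(i),
    // or returns paths.size() if every remaining path matches.
    static int firstDifferent(List<String> paths, int i) {
        for (int j = i + 1; j < paths.size(); ++j) {
            if (!paths.get(j).equals(paths.get(i))) {
                return j;
            }
        }
        return paths.size();
    }

    public static void main(String[] args) {
        List<String> paths =
            Arrays.asList("a.bam", "a.bam", "b.bam", "b.bam", "c.bam");
        System.out.println("last differing index:  " + lastDifferent(paths, 0));
        System.out.println("first differing index: " + firstDifferent(paths, 0));
    }
}
```

With three distinct paths, as in `main`, the two methods disagree; when at most one path change follows `i`, they return the same index.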
Looking at this more, I feel like this is definitely a bug in split-generation in the presence of a .splitting-bai, and possibly a severe one; I'll try to come up with a repro case.
Can anyone comment on how widely that functionality is used? I've not personally seen it in the wild.
Never mind, it seems like it probably works OK, albeit due to what looks like luck.
The first call to `addIndexedSplits`, with `i == 0`, will add `FileVirtualSplit`s corresponding to all `FileSplit`s but the last one (i.e. indices `[0, splits.size() - 1)`) and return its `splitsEnd` (`splits.size() - 1`).
A second call to `addIndexedSplits`, with `i == splits.size() - 1`, will then add a `FileVirtualSplit` for the last `FileSplit`.
So each FileSplit should get processed exactly once, though the relationship between getSplits and addIndexedSplits is not what it should be / what is documented.
On that note, however, I don't see any reason why calls to addIndexedSplits should be chunked per-underlying-FileSplit.getPath in the first place, as that comment implies they are…