[LibriSpeech] Fix dev split local_extracted_archive for 'all' config #4904
We define the keys for the `_DL_URLS` of the dev split as `dev.clean` and `dev.other`:

datasets/datasets/librispeech_asr/librispeech_asr.py
Lines 60 to 61 in 2e7142a

These keys get forwarded to the `dl_manager` and thus the `local_extracted_archive`.

However, when calling `SplitGenerator` for the dev sets, we query the `local_extracted_archive` keys `validation.clean` and `validation.other`:

datasets/datasets/librispeech_asr/librispeech_asr.py
Line 212 in 2e7142a

datasets/datasets/librispeech_asr/librispeech_asr.py
Line 219 in 2e7142a
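
The mismatch can be reproduced with a plain dictionary. This is a minimal sketch, not the loader's code: the extraction paths are made-up placeholders, and only the key names are taken from the description above.

```python
# Stand-in for the mapping produced after extraction.
# The paths are hypothetical placeholders, not real cache locations.
local_extracted_archive = {
    "dev.clean": "/tmp/extracted/dev-clean",
    "dev.other": "/tmp/extracted/dev-other",
}

# Keys queried for the dev split generators before the fix.
queried = ["validation.clean", "validation.other"]
before_fix = {key: local_extracted_archive.get(key) for key in queried}

# Keys that are actually defined for the dev split.
defined = ["dev.clean", "dev.other"]
after_fix = {key: local_extracted_archive.get(key) for key in defined}

print(before_fix)  # every value is None: the keys are missing, and .get() does not raise
print(after_fix)   # every value is the extracted path
```

Because `dict.get` returns `None` for a missing key instead of raising `KeyError`, the wrong keys fail silently.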

The consequence is that the `local_extracted_archive` arg passed to `_generate_examples` is always `None`, as the keys `validation.clean` and `validation.other` do not exist in the `local_extracted_archive`.

When defining the `audio_file` in `_generate_examples`, since `local_extracted_archive` is always `None`, we always omit the `local_extracted_archive` path from the `audio_file` path, even in non-streaming mode:

datasets/datasets/librispeech_asr/librispeech_asr.py
Lines 259 to 263 in 2e7142a
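
The path selection described here can be sketched as follows. `build_audio_path` is a hypothetical helper written for illustration; it assumes the behavior stated in this description (prefix the extracted directory when one is available, otherwise keep the bare file name) and is not copied from the loading script.

```python
import os
from typing import Optional


def build_audio_path(audio_file: str, local_extracted_archive: Optional[str]) -> str:
    """Hypothetical helper: return the path to use for one audio file."""
    if local_extracted_archive:
        # Non-streaming mode: the archive was extracted locally, so prefix its directory.
        return os.path.join(local_extracted_archive, audio_file)
    # Streaming mode, or a failed key lookup: only the bare file name is left.
    return audio_file


# The file name and directory below are illustrative placeholders.
streaming_path = build_audio_path("example.flac", None)
local_path = build_audio_path("example.flac", "/tmp/extracted/dev-clean")
```

With `local_extracted_archive` set to `None`, the helper returns the bare file name in both modes, which is the behavior this PR addresses.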

Thus, `audio_file` will only ever be the streaming path (`audio_file`, not `os.path.join(local_extracted_archive, audio_file)`).

This PR fixes the `.get()` keys for the `local_extracted_archive` for the dev splits.
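
A small guard against this kind of silent mismatch might look like the sketch below. `missing_keys` is a hypothetical helper, not part of the `datasets` library or of this PR; it only compares the keys a loader queries against the keys it defines.

```python
from typing import Iterable, List


def missing_keys(defined_keys: Iterable[str], queried_keys: Iterable[str]) -> List[str]:
    """Hypothetical check: return queried keys that were never defined, sorted."""
    return sorted(set(queried_keys) - set(defined_keys))


defined = ["dev.clean", "dev.other"]

# Before the fix: both queried keys are absent from the defined keys.
stale = missing_keys(defined, ["validation.clean", "validation.other"])

# After the fix: the queried keys match the defined keys, so nothing is missing.
fixed = missing_keys(defined, ["dev.clean", "dev.other"])
```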