diff --git a/data/README.md b/data/README.md
index 68684a8..be31645 100644
--- a/data/README.md
+++ b/data/README.md
@@ -2,7 +2,7 @@
 
 The document collection is MS MARCO Passages and has to be stored in `collection.tsv`.
 Furthermore, for the `doc2query` based approaches, descriptive queries for each document in the collection must be stored in `doc2query.tsv`.
-This file can be automatically generated using [this script](scripts/doc2query-t5.py). :warning: May take several days.
+This file can be automatically generated using [this script](../scripts/doc2query-t5.py). :warning: May take several days.
 
 A MS MARCO document collection has been provided [here](https://gustav1.ux.uis.no/dat640/msmarco-passage.tar.gz).
 A pre-generated `doc2query.tsv` file has been made available [here](https://drive.google.com/file/d/1vGGGu0eprxG_iUm9Z5xkbsKEwjJoAf_A/view?usp=drive_link).