The code doesn't work #11
Comments
Thanks for letting us know about all of this! (Though the opening strikes me as needlessly aggressive.) Could you let us know what kind of experiments/models you're planning to run? If you're just trying to evaluate a system on GLUE, you should just use the jiant codebase (as we say in big letters in the readme). That's where our ongoing, supported work on this project lives. This codebase only exists as an archive to allow people to reproduce our exact baseline numbers if they need to (it's basically an old internal draft of jiant). We will try to fix the clashes and broken links, though. |
Hello Prof. Bowman, I apologize if my comment came across as aggressive; it was more a deep sigh of resignation at my attempt to reproduce the baseline than an accusation. I understand how it could appear aggressive, though, and apologize on behalf of the sleep-deprived me who wrote it at 5 in the morning.
I am just planning to reproduce the baseline experiments. My experiments involve evaluating some word embeddings, and the diagnostic test proposed by GLUE looked well suited to this task. I read the comment about using the jiant repo, but all I was trying to do was swap out GloVe for something else, and I thought that running this repo might be simpler than running code from the jiant repo.
I also understand that it is difficult to maintain a repo, especially in an academic setting where we don't have an army of engineers to support the effort, but my comments stand. I would request that you kindly edit the GLUE Benchmark site to point to the jiant repo for running the baseline, and as the main repo for running the benchmark, since the support effort is directed there. Otherwise, at least a few people will give up on using this extremely useful benchmark because they cannot reproduce the experiments.
Also, I would suggest adding a deprecation warning to the repo, like https://github.com/knowitall/openie, so that it is clear that we ought to use the jiant repo directly. The current README indicates that if you don't plan to substantially change the code/models in the baseline, this repo should suffice. This is not the experience I had with the repo. Finally, I also add a couple of issues I missed in my last comment.
Once again, apologies for my comment coming across as aggressive. If you like, I can help fix some or most of these issues via a pull request after the ACL deadline. Regards, |
Thanks for the note! If you're far enough along that you're definitely going to try to use this code, PRs are welcome. We may beat you to it, but with the ACL deadline, it's not that likely. Otherwise, though, jiant is still a work in progress, but it supports all the use cases that this repo does, and it's better documented and maintained. (CoVe is a conditional import there, IIRC, and the download script should be up to date.) |
@kushalarora thank you for the information about how to fix the URLs. Works perfectly now :) For anyone else, if you get this error message:
... then open up download_glue_data.py, meander down to lines 45 and 46, and update them as per @kushalarora's URLs in his first post. Then you will see a healthier
:) |
@kushalarora Thank you! |
@kushalarora thanks for your sharing, helps a lot |
The code in this repository is broken with multiple issues.
First, the code has hard-coded paths; this is unprofessional, and I expected better from such a reputed lab, especially with institutions like NYU, DeepMind and UW involved.
The path for downloading the MRPC dataset from SentEval is broken. They seem to have moved their data to different URIs, namely:
MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'
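For anyone applying this by hand, here is a minimal sketch of how the two URLs above might be used in a download step. The `fetch_mrpc` helper, its `retrieve` parameter, and the `SENTEVAL_BASE` constant are hypothetical names introduced for illustration; they are not taken from download_glue_data.py, whose actual structure may differ.

```python
import os
import urllib.request

# URLs as quoted in this issue; the base/filename split is only for readability.
SENTEVAL_BASE = 'https://dl.fbaipublicfiles.com/senteval/senteval_data'
MRPC_TRAIN = SENTEVAL_BASE + '/msr_paraphrase_train.txt'
MRPC_TEST = SENTEVAL_BASE + '/msr_paraphrase_test.txt'


def fetch_mrpc(mrpc_dir, retrieve=urllib.request.urlretrieve):
    """Download both MRPC files into mrpc_dir and return their local paths.

    Hypothetical helper: `retrieve` is injectable so the function can be
    exercised without network access.
    """
    os.makedirs(mrpc_dir, exist_ok=True)
    paths = []
    for url in (MRPC_TRAIN, MRPC_TEST):
        # Name the local file after the last path component of the URL.
        local_path = os.path.join(mrpc_dir, url.rsplit('/', 1)[-1])
        retrieve(url, local_path)
        paths.append(local_path)
    return paths
```

Calling `fetch_mrpc('glue_data/MRPC')` would attempt the real download; the directory name here is only an example.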
The command to run the baseline is broken: it needs --eval_tasks to be passed, otherwise the code breaks because an empty string is passed to the task definition and a check there doesn't find the empty string among the supported tasks.
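To illustrate the kind of guard that would avoid this failure, here is a rough sketch of parsing a comma-separated task argument while ignoring empty entries. The function name `parse_task_list` and its arguments are hypothetical; I have not checked how the repository actually parses `--eval_tasks`, so treat this as an assumption about the shape of the fix rather than the repo's code.

```python
def parse_task_list(arg, supported_tasks):
    """Split a comma-separated task string, skipping empty entries.

    Hypothetical helper: an empty or whitespace-only argument yields an
    empty list instead of being looked up as a task name.
    """
    names = [name.strip() for name in arg.split(',') if name.strip()]
    unknown = [name for name in names if name not in supported_tasks]
    if unknown:
        raise ValueError('Unsupported task(s): ' + ', '.join(unknown))
    return names
```

With a guard like this, an omitted flag that defaults to the empty string produces no evaluation tasks rather than a failed lookup.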
Then, half the code has been migrated to QNLIV2, but the dataset download part still downloads QNLI (V1?), hence the code breaks there.
Once I got past this error, I encountered the following error:
tr_generator = iterator(task.train_data, num_epochs=None, cuda_device=self._cuda_device)
Finally, the following error broke my spirit, and I decided not to use the GLUE benchmark for my experiments. Despite importing the conda env with the package and spending 3-4 hours trying to get the basic command from the README to run, I gave up, as I am now a bit skeptical about the hidden traps I might encounter while fixing the code to get the GLUE benchmark to run.
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
If there is a commit or version that I can run out of the box, please let me know. It would be a big help.