DataCite timeout: Datasets repeatedly lock on publication for >30min and process fails, second try works #10992
FWIW: DataCite is reporting Degraded Service at the moment, and there have also been times when I've seen timeouts while their status was all green. So I suspect the issue is real in your case and is a problem on DataCite's side. We have discussed at times whether Dataverse should respond differently. It seems unwise to publish without being sure the PIDs are actually findable, with only the failures reported in the log, so we've so far stayed with having the publication fail so you can retry later. Work could probably be done to track the failures and retry them automatically, warn in the UI that not all PIDs are findable yet, etc. So far the problem has been rare enough that no one has prioritized anything like that, though.
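The "track failures and retry" idea could be sketched roughly like this — a minimal illustration in Python with hypothetical names (Dataverse's actual PID code is Java and is structured differently):

```python
import time

def register_with_retry(register, doi, attempts=3, base_delay=1.0):
    """Attempt PID registration with exponential backoff.

    `register` stands in for the actual DataCite API call and is
    expected to raise on timeout or failure. Returns True on success;
    returns False once all attempts are exhausted, so the caller could
    queue the DOI for a later background retry (and warn in the UI
    that the PID is not findable yet) instead of failing the whole
    publication.
    """
    for attempt in range(attempts):
        try:
            register(doi)
            return True
        except Exception:
            if attempt < attempts - 1:
                # back off: 1s, 2s, 4s, ... before the next attempt
                time.sleep(base_delay * 2 ** attempt)
    return False
```

The design trade-off discussed above is in the `return False` branch: instead of aborting publication, the failed DOI could be recorded for a background job to pick up later.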
I'm glad that we're not the only ones who have noticed this (even with services other than Dataverse that use DataCite). Unfortunately, we often run into this timeout (after the checksums of the individual files have been validated, which takes a very long time). We have now tried to publish the dataset in question again. That usually goes very quickly; this time, however, we ran into the timeout again.
status.datacite.org says: "All Systems Operational"
Just to be on the safe side: how long the validation of the file checksums takes has no influence on the timeout during DOI registration (which presumably only starts afterwards), right? In this case we have a dataset with just under 100 GB of files, and that takes some time.
@lmaylein yes, I would think these are unrelated.
Same problem:
And then a few seconds later:
In significantly more than 50 percent of our publications, it only works on the second attempt.
Unfortunately, this problem also seems to affect the creation of new datasets. I suspect that in this case DOIs are being reserved at DataCite. The timeouts here very frequently result in datasets being created twice, because the first attempt apparently doesn't go through and the frontend no longer responds :-(
Hmm, when creating datasets too, huh? It sounds like we need more robust error handling of DataCite outages. |
I'm just wondering whether this might be a local problem on our side. Given how often it occurs, I would have assumed that more users would be complaining; I assume quite a few Dataverse installations use DataCite. The strange thing is that publishing or dataset creation is very fast again at other times.
I haven't heard of problems like this elsewhere (except in rare cases). It's definitely worth asking DataCite whether they can see anything. You might also be able to make the same DataCite API calls Dataverse makes using curl (perhaps against their test server, or just reserving/deleting rather than making the DOI findable/permanent) to see if those also time out and/or to get more information about how long the calls take. (I looked in the code - if you
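To check whether the slowness is between your server and DataCite, independently of Dataverse, you could time a plain HTTP request from the same machine. A minimal sketch (the DataCite endpoint shown is an assumption — substitute whichever API URL your installation actually uses, e.g. the test server at api.test.datacite.org):

```python
import time
import urllib.request

def time_request(url, timeout=30):
    """Return (HTTP status, elapsed seconds) for a single GET.

    Raises on timeout or connection failure, so a hung DataCite
    endpoint shows up as an exception after `timeout` seconds
    rather than a silent stall.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return resp.status, time.monotonic() - start

# Example (run from the Dataverse host; endpoint is an assumption):
# print(time_request("https://api.test.datacite.org/heartbeat"))
```

Running this a few times during a stuck publication, and again when publishing is fast, would show whether the latency tracks DataCite or something local (DNS, proxy, firewall).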
Hey all. I just found this GitHub issue and wanted to mention that I and support staff at Harvard Dataverse sometimes have to unlock datasets that have been locked for a long time, although we haven't looked into why these datasets are locked for so long. In https://github.com/IQSS/dataverse-HDV-Curation/issues/402 I've been reporting the number of these unlocks.
@jggautier Somehow, I get "error 404" when I try to open the link. I checked for leading/trailing dots in the URL etc., but that doesn't seem to be the issue.
Ah, sorry about that @kbrueckmann. I didn't realize that the repository that issue lives in is private. In the issue I've mostly just been reporting the number of times repository staff need to remove long publication locks. I mentioned it here just to say that these long locks happen regularly for a relatively small number of Harvard Dataverse users, who used to report their stuck datasets more often before I started checking for locks daily.
FWIW: I was hoping that #10794 in 6.5 would help - closing the one known way of causing a stuck lock. Have you seen a reduction since 6.5? If not/if it is still occurring, it would be useful to track circumstances so we can identify and fix the cause. |
Heya @qqmyers, did you mean a reduction in Harvard Dataverse since 6.5 or in a repository where @kbrueckmann has seen stuck datasets, or both? In case you meant for Harvard Dataverse, I think v6.5 was applied to that repository in early January, so it's too soon to say if there's been a reduction compared to months when it was using earlier versions of Dataverse. I agree about tracking circumstances, although I think there hasn't been the bandwidth for this. |
Harvard w.r.t. stuck locks. I may be misunderstanding, but it doesn't sound like the problem in this issue actually leaves a stuck lock - the log ends in |
What happens?
When trying to publish datasets, we repeatedly experience that they are first locked for an extraordinarily long time (up to 40 minutes) before the publication process fails. On the second try, however, the dataset is published within a few seconds and without any issues. The error message after the first try says that the dataset "could not be published due to a failure to register, or update the Global Identifier for the dataset or one of the files in it."
Here is a server log entry from an attempt earlier today:
It seems like it might be connected to a timeout of the DataCite API. Outages as reported on https://status.datacite.org/ don't seem to explain it, as they occur far less often than our problems with dataset publication.
What steps does it take to reproduce the issue?
We are unsure whether others have experienced the same problems and whether the described behavior is reproducible. On our side, we simply try to publish a dataset via the publish button. The second try can take place immediately after the first one failed, or later on.
When does this issue occur?
On hitting the publish button on any dataset for the first time. It happens for new datasets as well as for new versions of existing datasets. The datasets involved can have very few small files (e.g. doi:10.11588/data/NCQJJR -> 5 files of at most 90 KB each, without per-file PID registration) or be larger; we see similar behaviour either way.
To whom does it occur (all users, curators, superusers)?
Superusers, curators.
What did you expect to happen?
Ideally, publication succeeding on the first try, without delays of more than a few minutes.
Which version of Dataverse are you using?
v. 6.3 build v6.3+10797_dls
Any related open or closed issues to this bug report?
Possibly #7393