Passing negative-1 `MaxWaitTime` hangs `DataMovementStatusRequest` indefinitely #190

mcfadden8 · 2024-08-01T20:27:48Z

The documentation says: "", but the data movement status request call never returns.

2024-08-01 13:19:49:780 AXL rzadams1075: @ nnfdm_start:177 nnfdm::CreateRequest(src=/mnt/nnf/3c1bc64d-4355-48fa-898f-4af6c60d04b1-0/martymcf/scr.defjobid/scr.dataset.1/xxxx-0000-00000.silo, dst=/p/lustre1/martymcf/BDH/lustre-scr_sync_storage/1/xxxx00000/xxxx-0000-00000.silo)
2024-08-01 13:19:49:804 AXL rzadams1075: @ nnfdm_start:177 nnfdm::CreateRequest(src=/mnt/nnf/3c1bc64d-4355-48fa-898f-4af6c60d04b1-0/martymcf/scr.defjobid/scr.dataset.1/xxxx00000.root, dst=/p/lustre1/martymcf/BDH/lustre-scr_sync_storage/1/xxxx00000.root)
2024-08-01 13:19:49:820 AXL rzadams1075: @ nnfdm_wait:352 0
2024-08-01 13:19:49:820 AXL rzadams1075: @ nnfdm_stat:65 /mnt/nnf/3c1bc64d-4355-48fa-898f-4af6c60d04b1-0/martymcf/scr.defjobid/scr.dataset.1/xxxx-0000-00000.silo

The same call will work if I pass 1 second and continue to poll for between 5 and 10 seconds.


bdevcich · 2024-08-02T18:22:39Z

It hangs even when the NnfDataMovement resource in kubernetes shows that it's finished? Can you check that once you make it hang?

This part of the API has always bothered me because I think a good API should always respond as quickly as possible to the client to minimize wait time and also confirm that nothing is wrong. It's like asking someone a question and they never respond.

Is this something that you use a lot?

mcfadden8 · 2024-08-05T15:07:11Z

How do I check that? Do you happen to have a test for this? Under what circumstances does it work?

I was only attempting to use it because the documentation said that I could. I reverted back to polling with a one-second timer. But we have use cases where users just want to wait until the copy is done before proceeding.

bdevcich · 2024-08-05T15:27:27Z

> How do I check that? Do you happen to have a test for this? Under what circumstances does it work?

As it's running (and presumably hanging), you can query the NnfDataMovement resource in k8s. You won't be able to do this in your application unless the compute nodes have k8s access, but you could do it from somewhere that does. This is basically what the DataMovementStatusRequest is doing for you:

kubectl get -n <rabbit-hostname> nnfdatamovements <request UID>

So if compute-node-1 was attached to rabbit-node-1 and the DataMovementCreateRequest returned a UID of nnf-dm-node-5vghx, you can do this to query it:

$ kubectl get nnfdatamovement -n rabbit-node-1 nnf-dm-node-5vghx
NAME                STATE      STATUS    ERROR   AGE
nnf-dm-node-5vghx   Finished   Success           4m54s

With a MaxWaitTime of -1, the request is not going to respond until that nnfdatamovement is done. So if it's a large request, it's going to appear to hang, since the response won't come until it's finished. I'm hoping that's what's happening here. If the nnfdatamovement resource is showing Finished and the request still isn't responding, then we have an issue.
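If it helps to script that check from a machine with k8s access, here is a minimal Python sketch. It assumes the default table output shown above (columns NAME, STATE, STATUS, ERROR, AGE) and simply shells out to the same `kubectl get` command. The helper names `parse_kubectl_table`, `get_data_movement_row`, and `data_movement_finished` are made up for illustration and are not part of any nnf-dm API.

```python
import re
import subprocess


def parse_kubectl_table(text):
    """Parse kubectl's column-aligned table output into a list of dicts.

    Columns are sliced by the header's character offsets, so an empty
    cell (such as ERROR in the example above) comes back as "".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # (column name, start offset) for each header word
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = ln[start:end].strip()
        rows.append(row)
    return rows


def get_data_movement_row(namespace, uid):
    """Run the kubectl query from this thread and return the single row.

    namespace is the rabbit hostname; uid is the UID returned by the
    create request (for example "nnf-dm-node-5vghx").
    """
    out = subprocess.run(
        ["kubectl", "get", "nnfdatamovement", "-n", namespace, uid],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_kubectl_table(out)[0]


def data_movement_finished(row):
    # "Finished" is the STATE value shown in the example output above.
    return row.get("STATE") == "Finished"
```

If `data_movement_finished(get_data_movement_row("rabbit-node-1", "nnf-dm-node-5vghx"))` is true while the status request is still blocked, that would point at the second case described above.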

> I reverted back to polling with a one-second timer. But we have use cases where users just want to wait until the copy is done before proceeding.

I think this is the best way to do this. It ensures that the server is responding and isn't hung.
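As a rough illustration of that polling pattern, here is a minimal Python sketch of a "wait until done" helper built on short status requests. Everything in it is an assumption for illustration: `get_state` is a hypothetical callable standing in for one status request with a short MaxWaitTime that returns the state as a string, and `wait_for_completion` is not part of the nnf-dm client. Only the "Finished" state name comes from the kubectl output earlier in this thread.

```python
import time


def wait_for_completion(get_state, poll_interval=1.0, timeout=None,
                        is_done=lambda state: state == "Finished",
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until is_done(state) is true, then return the state.

    get_state     -- hypothetical callable; one short status request per call
    poll_interval -- seconds to sleep between requests
    timeout       -- overall limit in seconds, or None to wait indefinitely
    sleep, clock  -- injectable for testing

    Raises TimeoutError if the overall timeout passes first.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        state = get_state()
        if is_done(state):
            return state
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"data movement not done, last state: {state!r}")
        sleep(poll_interval)
```

Each iteration gets a prompt answer from the server, so a hung server shows up as a failed or slow individual request rather than one call that never returns, while the caller still gets "block until the copy is done" behavior.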

github-project-automation bot added this to Issues Dashboard Aug 1, 2024

github-project-automation bot moved this to 📋 Open in Issues Dashboard Aug 1, 2024
