Better error handling during `wait*()` and `Run.__init__()` #118

kgodlewski · 2025-01-10T15:28:21Z

This PR addresses a problem where it's impossible to react gracefully and consistently to errors that occur during `Run.__init__()`, that is, during run creation.

With our current design, the user (which includes our own CLI, mind you) cannot handle even trivial errors, such as an invalid API key passed to `Run`, if a custom error handler is registered.

This change affects both functionality and UX.
What it does:

- Return `bool` from `_wait()` and friends. `True` means "everything was processed"; `False` means "timeout, or the run is now being closed".
- Honor the `timeout` argument almost(*) properly.
- Make `Run.close()` and `Run.terminate()` block until the process actually terminates when called from a thread other than the one that initiated the close operation. This helps with race conditions and keeps stdout from being cluttered with logs in a meaningless order.

(*) Almost, because the timeout is applied to a single iteration of the wait loop. A future PR will address that.
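
To make the return contract concrete, here is a minimal sketch of a wait helper that returns `True` once everything is processed and `False` on timeout or when a close has started. `PendingOperations` and its methods are hypothetical names used only for illustration; they are not taken from `neptune_scale`. Note one deliberate difference: the sketch tracks a single deadline across loop iterations, which is the behavior the footnote defers to a future PR, not what this PR implements.

```python
import threading
import time
from typing import Optional


class PendingOperations:
    """Hypothetical stand-in for the state that a wait call observes."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._pending = 0
        self._closing = False

    def add(self, count: int = 1) -> None:
        with self._cond:
            self._pending += count

    def mark_processed(self, count: int = 1) -> None:
        with self._cond:
            self._pending = max(0, self._pending - count)
            self._cond.notify_all()

    def begin_close(self) -> None:
        with self._cond:
            self._closing = True
            self._cond.notify_all()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Return True if everything was processed, False on timeout or close."""
        # Assumption: one overall deadline, shared by all loop iterations.
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while self._pending > 0:
                if self._closing:
                    return False
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    return False
                self._cond.wait(remaining)
            return True
```

A caller can then branch on the result, for example printing a warning and exiting with a non-zero status when `wait()` returns `False`, instead of guessing whether the data was actually processed.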

src/neptune_scale/api/run.py

PatrykGala

Create a variable `starting`, initially `True`.
The default error callback checks, under a lock, whether it is `True` or `False`. If it is `True`, it performs a terminate; if it is `False`, it calls the callback.
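
The suggestion above could look roughly like the following sketch. Everything here is an assumption made for illustration: the class name `StartupAwareErrorHandler`, the `finish_startup()` method that flips the flag, and the injected `terminate` and `user_callback` callables are hypothetical, not the code that was merged.

```python
import threading
from typing import Callable


class StartupAwareErrorHandler:
    """Hypothetical default error callback guarded by a `starting` flag."""

    def __init__(
        self,
        terminate: Callable[[], None],
        user_callback: Callable[[BaseException], None],
    ) -> None:
        self._lock = threading.Lock()
        self._starting = True  # True while the run is still being created
        self._terminate = terminate
        self._user_callback = user_callback

    def finish_startup(self) -> None:
        # Assumed to be called once run creation has succeeded.
        with self._lock:
            self._starting = False

    def __call__(self, exc: BaseException) -> None:
        # Read the flag under the lock, but act outside of it so that
        # neither terminate() nor user code runs while the lock is held.
        with self._lock:
            starting = self._starting
        if starting:
            self._terminate()
        else:
            self._user_callback(exc)
```

With this shape, an error raised during startup always leads to termination, while errors after startup reach the registered callback.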

This applies to cases when we close a Run from a different thread. It's better to wait until the worker process terminates, as it makes it easier to print out a final error message in case of an error, instead of having it mixed together with logging from different threads.
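
One way to picture this behavior is the sketch below, in which the first thread to call `close()` performs the shutdown and any other thread blocks until that shutdown has finished. The `Closer` class and the injected `shutdown` callable are hypothetical and exist only to illustrate the idea; the actual `Run.close()` implementation may differ.

```python
import threading
from typing import Callable, Optional


class Closer:
    """Hypothetical sketch: the first caller shuts down, other threads wait."""

    def __init__(self, shutdown: Callable[[], None]) -> None:
        self._lock = threading.Lock()
        self._owner: Optional[int] = None  # ident of the initiating thread
        self._done = threading.Event()
        self._shutdown = shutdown

    def close(self, timeout: Optional[float] = None) -> bool:
        """Return True once shutdown has finished, False if the wait timed out."""
        me = threading.get_ident()
        with self._lock:
            if self._owner is None:
                self._owner = me
                is_initiator = True
            elif self._owner == me:
                # Re-entrant call from the initiating thread: never wait on ourselves.
                return self._done.is_set()
            else:
                is_initiator = False

        if is_initiator:
            try:
                self._shutdown()
            finally:
                self._done.set()
            return True

        # A different thread: block until the initiator has finished.
        return self._done.wait(timeout)
```

Because the waiting thread only resumes after the shutdown completes, any final error message it prints appears after the shutdown's own log output rather than interleaved with it.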

kgodlewski requested a review from PatrykGala January 10, 2025 15:28

PatrykGala reviewed Jan 13, 2025

PatrykGala force-pushed the kg/run-error-handling-fixes branch from b6e03cd to 46c08e5 January 14, 2025 10:27

PatrykGala requested changes Jan 14, 2025

kgodlewski added 7 commits January 14, 2025 12:42

Silence an exception due to a race condition

902ac13

It's hacky, but shouldn't be problematic for now

Properly return from Run._wait() on timeout

f7c6d79

Don't allow calling ProcessLink.join() from a callback

395b3f8

Always call Run.terminate() from a single thread

570364e

In case of an error callback from ProcessLink, push the message to the errors queue instead of calling `Run.terminate()` directly. This helps avoid deadlocks.
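
The pattern described in this commit message can be sketched as follows: callbacks only enqueue the error, and a single consumer thread decides what to do with it. `ErrorsMonitor`, `put()`, `stop()`, and the injected `on_error` callable are hypothetical names for illustration and are not claimed to match the library's internals.

```python
import queue
import threading
from typing import Callable, Optional


class ErrorsMonitor(threading.Thread):
    """Hypothetical single consumer that owns the decision to terminate."""

    def __init__(self, on_error: Callable[[BaseException], None]) -> None:
        super().__init__(daemon=True)
        self._queue: "queue.Queue[Optional[BaseException]]" = queue.Queue()
        self._on_error = on_error

    def put(self, exc: BaseException) -> None:
        # Safe to call from any callback: it only enqueues and returns,
        # so the calling thread never blocks on termination itself.
        self._queue.put(exc)

    def stop(self) -> None:
        # None is used as a sentinel that ends the consumer loop.
        self._queue.put(None)

    def run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                return
            # Only this thread ever invokes the handler (e.g. a terminate call).
            self._on_error(item)
```

Since the callback thread returns immediately after `put()`, it cannot end up waiting on a shutdown that in turn waits on the callback thread, which is the kind of cycle that produces a deadlock.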

Run.wait() will block if run is being closed while waiting

56f5a8f

Add types.RunCallback

f840365

kgodlewski force-pushed the kg/run-error-handling-fixes branch from 46c08e5 to f19ef1c January 14, 2025 11:42

kgodlewski added 3 commits January 14, 2025 12:45

Handle Run creation errors in __init__.

fe825b9

Also make `wait*` return `True` if all operations were processed, `False` on timeout

Run._wait() also returns during closing when called from the main thread

148c302

Add tests for run initialization and closure

de8d69d

kgodlewski force-pushed the kg/run-error-handling-fixes branch from f19ef1c to de8d69d January 14, 2025 11:45

Add missing license header

1b81b66

kgodlewski requested a review from PatrykGala January 14, 2025 11:48

PatrykGala approved these changes Jan 14, 2025

kgodlewski merged commit 879bc65 into main Jan 14, 2025
9 checks passed

kgodlewski deleted the kg/run-error-handling-fixes branch January 14, 2025 12:49
