
core: Use Blockbuster to detect blocking calls in asyncio during tests #29043

Open: wants to merge 4 commits into master


@cbornet (Collaborator) commented Jan 6, 2025

This PR uses the blockbuster library in langchain-core to detect blocking calls made in the asyncio event loop during unit tests.
Avoiding blocking calls is hard, as they can be deeply buried in the code or made in third-party libraries.
Blockbuster makes them easier to detect by raising an exception when a known blocking function (e.g., time.sleep) is called from the event loop.
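To illustrate the mechanism, here is a minimal, self-contained sketch of the idea (not blockbuster's actual implementation or API): it temporarily patches time.sleep so that a call made from a thread with a running event loop raises, while calls from plain sync code, or from worker threads started by asyncio.to_thread, still go through. detect_blocking_sleep and BlockingError are hypothetical names chosen for this example.

```python
import asyncio
import time
from contextlib import contextmanager
from typing import Iterator


class BlockingError(Exception):
    """Hypothetical error type for this sketch; not a library API."""


@contextmanager
def detect_blocking_sleep() -> Iterator[None]:
    """Patch time.sleep so it raises when called on a thread running an event loop."""
    original_sleep = time.sleep

    def guarded_sleep(seconds: float) -> None:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running in this thread, so blocking is allowed.
            original_sleep(seconds)
            return
        raise BlockingError(f"Blocking call to time.sleep({seconds!r}) inside the event loop")

    time.sleep = guarded_sleep
    try:
        yield
    finally:
        time.sleep = original_sleep  # always restore the real function


async def blocking() -> None:
    time.sleep(0.01)  # looked up at call time, so the patched version runs


async def offloaded() -> None:
    await asyncio.to_thread(time.sleep, 0.01)  # runs in a worker thread: allowed


if __name__ == "__main__":
    with detect_blocking_sleep():
        time.sleep(0)  # plain sync code, no running loop: allowed
        asyncio.run(offloaded())
        try:
            asyncio.run(blocking())
        except BlockingError as exc:
            print(exc)
```

Because this sketch replaces the time.sleep attribute, it only catches callers that look the function up through the time module at call time; code that ran `from time import sleep` before the patch keeps the original reference.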

Adding blockbuster revealed a blocking call in aconfig_with_context: it ends up calling get_function_nonlocals, which loads the function's source code.
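A minimal sketch of the usual fix, assuming the blocking part is inspect.getsource locating and reading the function's source file: the synchronous helper stays as-is, and the async variant offloads it with asyncio.to_thread. read_source and aread_source are hypothetical helpers, not langchain-core functions; json.dumps is used only as an example of a function whose source lives in a file on disk.

```python
import asyncio
import inspect
import json
from typing import Any, Callable


def read_source(func: Callable[..., Any]) -> str:
    # inspect.getsource finds the file that defines `func` and may read it
    # from disk (through linecache): file-system I/O, i.e. a blocking call.
    return inspect.getsource(func)


async def aread_source(func: Callable[..., Any]) -> str:
    # Run the blocking read in the default thread pool so the event loop stays free.
    return await asyncio.to_thread(read_source, func)


if __name__ == "__main__":
    source = asyncio.run(aread_source(json.dumps))
    print(source.splitlines()[0])
```

Making the wrapper `async` is what forces callers to await it, which is why the signature change discussed in the review is needed.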

Dependencies:

  • blockbuster (test)

Twitter handle: cbornet_

vercel bot commented Jan 6, 2025

1 Skipped Deployment
Name: langchain · Status: Ignored · Updated (UTC): Jan 10, 2025 5:27pm

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 6, 2025
@cbornet cbornet force-pushed the blockbuster branch 2 times, most recently from 0983c3d to 7ce4b18 on January 6, 2025 15:10

with tempfile.NamedTemporaryFile(delete=True, suffix=".jpg") as temp_file:

@cbornet (Collaborator Author): It's useless to use a real file here.

"""Test invoking nested runnable lambda."""
blockbuster.deactivate()

@cbornet (Collaborator Author): The code makes a sync call from async on purpose...

@eyurtsev (Collaborator) left a comment

Looks good overall. Should we enable/disable it for unit tests?

I like that it forces us to separate async tests from sync tests and avoid being lazy, but at the same time there are some changes that aren't technically necessary and seem to unnecessarily complicate the testing code (e.g., changing a file open into a non-blocking call).

@@ -121,7 +121,7 @@ def _config_with_context(
return patch_config(config, configurable=context_funcs)


-def aconfig_with_context(
+async def aconfig_with_context(
@eyurtsev (Collaborator):

Technically a breaking change, but probably okay: this looks a lot like a private function to me.

Would you mind adding a comment about why this needs to be async (i.e., is inspect.getsource making OS calls?)

@@ -90,9 +91,11 @@ async def test_inmemory_dump_load(tmp_path: Path) -> None:
output = await store.asimilarity_search("foo", k=1)

test_file = str(tmp_path / "test.json")
-store.dump(test_file)
+await asyncio.to_thread(store.dump, test_file)
@eyurtsev (Collaborator):

Feels like a false positive for the test? It doesn't really matter whether this uses blocking or non-blocking code here.

@cbornet (Collaborator Author):

Well, with the way blockbuster is configured here (activated before the test, deactivated after), the test code itself needs to be non-blocking.
Maybe we could add aload/adump methods to InMemoryVectorStore using aiofile/aiofiles?
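A minimal sketch of what such an aload/adump pair could look like, assuming the store's state is JSON-serializable. It swaps in asyncio.to_thread around the existing sync methods instead of aiofile/aiofiles, so it needs only the standard library. TinyVectorStore and roundtrip are hypothetical names, not InMemoryVectorStore's API.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Any


class TinyVectorStore:
    """Hypothetical stand-in for InMemoryVectorStore; only persistence is shown."""

    def __init__(self) -> None:
        self.store: dict[str, Any] = {}

    def dump(self, path: str) -> None:
        # Synchronous write: blocking file I/O.
        Path(path).write_text(json.dumps(self.store))

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        # Synchronous read: blocking file I/O.
        instance = cls()
        instance.store = json.loads(Path(path).read_text())
        return instance

    async def adump(self, path: str) -> None:
        # Async variant: run the blocking write in a worker thread.
        await asyncio.to_thread(self.dump, path)

    @classmethod
    async def aload(cls, path: str) -> "TinyVectorStore":
        # Async variant: run the blocking read in a worker thread.
        return await asyncio.to_thread(cls.load, path)


async def roundtrip(path: str) -> dict[str, Any]:
    store = TinyVectorStore()
    store.store = {"id-1": {"text": "foo"}}
    await store.adump(path)
    loaded = await TinyVectorStore.aload(path)
    return loaded.store


if __name__ == "__main__":
    # The temporary directory is created and removed outside the event loop.
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(asyncio.run(roundtrip(str(Path(tmp_dir) / "test.json"))))
```

Wrapping the sync methods keeps a single serialization code path; a native async file library would avoid the thread hop but adds a dependency.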

@@ -298,17 +299,17 @@ def parent(a: int) -> int:
# Now run the chain and check the resulting posts
cb = [tracer]
if method == "invoke":
-res: Any = parent.invoke(1, {"callbacks": cb}) # type: ignore
+res: Any = await asyncio.to_thread(parent.invoke, 1, {"callbacks": cb}) # type: ignore
@eyurtsev (Collaborator):

Is there any benefit to doing this? It feels like it's complicating the test code.

@cbornet (Collaborator Author):

I refactored this part to cleanly separate sync and async tests: see c2162c2.
With this, the asyncio.to_thread calls are not needed.
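A minimal sketch of the separation pattern in general (not the contents of c2162c2): the sync test calls the sync entry point directly with no event loop involved, and the async test awaits only the async entry point, so neither needs asyncio.to_thread. Doubler is a hypothetical stand-in for a runnable exposing invoke/ainvoke.

```python
import asyncio


class Doubler:
    """Hypothetical stand-in for a runnable with sync and async entry points."""

    def invoke(self, x: int) -> int:
        return x * 2

    async def ainvoke(self, x: int) -> int:
        return x * 2


def test_invoke() -> None:
    # Sync test: no event loop is running, so a blocking call is fine.
    assert Doubler().invoke(1) == 2


async def test_ainvoke() -> None:
    # Async test: only the async API is awaited, so nothing blocks the loop.
    assert await Doubler().ainvoke(1) == 2


if __name__ == "__main__":
    test_invoke()
    asyncio.run(test_ainvoke())
```

With pytest and an async plugin such as pytest-asyncio, the test runner would drive the event loop instead of the `__main__` block.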

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 9, 2025
@cbornet cbornet force-pushed the blockbuster branch 2 times, most recently from 427bc71 to 2498ab6 on January 10, 2025 12:40
asyncio.Event is not thread-safe, so it must be created in the asyncio thread
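A minimal sketch of the pattern that commit message describes, assuming a worker thread needs to signal a coroutine: the asyncio.Event is created inside the coroutine (on the thread running the loop), and the worker sets it through loop.call_soon_threadsafe rather than calling set() directly. wait_for_worker is a hypothetical name.

```python
import asyncio
import threading


async def wait_for_worker() -> str:
    # Created inside the coroutine, i.e. on the thread that runs the event loop.
    done = asyncio.Event()
    loop = asyncio.get_running_loop()
    results: list[str] = []

    def worker() -> None:
        results.append("done")
        # asyncio.Event is not thread-safe: ask the loop's thread to call set()
        # instead of calling done.set() from this worker thread.
        loop.call_soon_threadsafe(done.set)

    thread = threading.Thread(target=worker)
    thread.start()
    await done.wait()
    # Join the finished thread without blocking the event loop.
    await asyncio.to_thread(thread.join)
    return results[0]


if __name__ == "__main__":
    print(asyncio.run(wait_for_worker()))
```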
Labels: size:XL (This PR changes 500-999 lines, ignoring generated files.)
Project status: In review
2 participants