[Bug]: Infinite Loop and Timeout Issue in SWE-Bench Evaluation Due to Context Overflow Handling in OpenHands Framework #6357

Open
1 task done
yanrui27 opened this issue Jan 20, 2025 · 2 comments
Labels
bug Something isn't working

Comments

yanrui27 commented Jan 20, 2025

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

Description:

While using the OpenHands framework to evaluate SWE-Bench, I encountered an issue where the program enters an infinite loop and eventually times out when the conversation context exceeds the model’s maximum window size.

Problem:

This issue arises from an earlier fix that truncates the agent's history, roughly in half, when the context window limit is exceeded. That approach does not correctly manage the agent's internal state after truncation, so the agent enters an infinite loop that persists until the program times out.

Steps to Reproduce:

  1. Use the OpenHands framework to evaluate SWE-Bench.
  2. Provide a conversation history that exceeds the model’s context window limit.
  3. The framework attempts to truncate the history as per the proposed fix.
  4. After truncation, the agent enters an infinite loop due to improper state handling, eventually causing a timeout.

Expected Behavior:

  • When the context window is exceeded, the conversation history should be truncated as expected.
  • The agent’s state should be properly reset or adjusted after truncating the history to avoid entering an infinite loop.
  • The program should terminate correctly without triggering a timeout.
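
As a rough illustration of the behavior expected above, here is a minimal toy sketch, not OpenHands code. `ToyState`, `fake_llm`, `ContextWindowExceeded`, and `max_retries` are all hypothetical names invented for this example; only `history` and `start_id` echo attributes mentioned in this issue. It halves the history on overflow, records how many events were dropped, and gives up after a bounded number of attempts instead of looping until a timeout.

```python
from dataclasses import dataclass, field


class ContextWindowExceeded(Exception):
    """Hypothetical stand-in for a provider's context-overflow error."""


@dataclass
class ToyState:
    history: list[str] = field(default_factory=list)
    start_id: int = 0  # offset of the first kept event in the original history


def fake_llm(history: list[str], window: int) -> str:
    """Pretend model call: fails when the total prompt length exceeds the window."""
    if sum(len(event) for event in history) > window:
        raise ContextWindowExceeded
    return f"ok with {len(history)} events"


def step_with_truncation(state: ToyState, window: int, max_retries: int = 8) -> str:
    """Retry after halving the history; stop after max_retries attempts."""
    for _ in range(max_retries):
        try:
            return fake_llm(state.history, window)
        except ContextWindowExceeded:
            if len(state.history) <= 1:
                break  # nothing left to drop, so halving cannot help
            drop = len(state.history) // 2
            state.history = state.history[drop:]  # keep the most recent half
            state.start_id += drop
    raise RuntimeError("context still too large after truncation")
```

The explicit retry cap and the single-event check are design choices of this sketch: they guarantee termination even when truncation cannot bring the prompt under the limit.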

Additional Information:

  • This issue is specific to the SWE-Bench evaluation and does not appear in all use cases.
  • The previous fix (which truncates history) does not properly reset the agent's internal state, leading to the infinite loop.
  • A timeout occurs because the system continues looping endlessly without terminating.

Links to Relevant Issues:

Suggested Fix:

  • Ensure that after truncating the conversation history, the agent's internal state is properly reset to prevent the infinite loop.
  • Consider handling agent states more robustly after truncation, such as saving and restoring the state correctly, to avoid this issue.

OpenHands Installation

Development workflow

OpenHands Version

0.20.0

Operating System

Linux

Logs, Errors, Screenshots, and Additional Context

No response

Author

yanrui27 commented Jan 20, 2025

Proposed Solution

To address the infinite loops caused by context-window-exceeded errors, we suggest adding proper exception handling and state management when the exception is raised. Specifically, after handling the exception, the agent’s state must be updated so that the loop does not continue.

The following solution outlines the steps:

  1. Catch the exception: after catching it, call await self._react_to_exception(e) to handle it as currently implemented.

Code Example

                    # Save the ID of the first event in our truncated history for future reloading
                    if self.state.history:
                        self.state.start_id = self.state.history[0].id
                    # Don't add error event - let the agent retry with reduced context
                    return
                raise

                    # Save the ID of the first event in our truncated history for future reloading
                    if self.state.history:
                        self.state.start_id = self.state.history[0].id
                    # Don't add error event - let the agent retry with reduced context

                    await self._react_to_exception(e) # <-- Adding state handling here to prevent infinite loop
                    return
                raise

Collaborator

enyst commented Jan 20, 2025

Which part of the agent state do you think needs to be modified when a context window error happens? Which attributes?

I'm not sure what, in the agent state, could lead to an infinite loop. Maybe the missing information was relevant? But then resetting state wouldn't fix that.

The proposed solution, to deal with the exception, puts the agent in ERROR state (which is not necessary IMO, and I think would end the swe_bench run right there), and it calls reset, which would zero out the metrics, leading to inaccurate reporting at the end of the run.
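
To make the metrics concern concrete, here is a toy model only. `ToyAgent` and `ToyMetrics` are hypothetical and are not the OpenHands classes; the assumption that a reset replaces the metrics object is made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyMetrics:
    accumulated_cost: float = 0.0


class ToyAgent:
    """Hypothetical agent; not the OpenHands Agent class."""

    def __init__(self) -> None:
        self.metrics = ToyMetrics()

    def reset(self) -> None:
        # Assumption for illustration: reset discards the accumulated metrics.
        self.metrics = ToyMetrics()


agent = ToyAgent()
agent.metrics.accumulated_cost = 1.25  # spend recorded before the error
agent.reset()
cost_after_reset = agent.metrics.accumulated_cost  # earlier spend is gone
```

In a model like this, any cost recorded before the reset would be missing from the end-of-run report.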

Can you tell us exactly how you ran this to get the issue? The LLM, number of iterations, and instance_id would be useful for replicating it. The full run_infer command would be great. Any logs you may have would be useful too!
