[Bug]: SWE-Bench Eval does not work #2150
Comments
Could you provide the full traceback?
Could you please check out https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md#test-if-your-environment-works? I am pretty sure your config is wrong.
You need to set run_as_devin = false in config.toml.
Is this config loaded? If so, the cache dir would be
Do you have only one config file?
Yes, I have already pasted the config file.
Btw, that one doesn't go to
I now get:
You don't have Playwright installed? See #2140 (comment).
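A quick way to check the Playwright question is to ask Python whether the package can be found. This is a minimal sketch using the standard importlib.util.find_spec. It assumes you run it with the same Python environment that launches the evaluation. It only checks that the Python package is importable. The browser binaries are installed separately (for example with playwright install), and this check does not cover them.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(name) is not None


# The Playwright Python package is imported under the name "playwright".
print("playwright importable:", module_available("playwright"))
```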
I did the setup but still get the following error:
@SmartManoj, a slightly different question: during SWE-Bench evaluation I don't want the agent to browse the internet.
Use CodeActSWEAgent #2105
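To make the agent suggestion concrete, the sketch below assembles the run_infer.sh invocation quoted in the bug report, passing CodeActSWEAgent as the agent argument. It is a hypothetical Python wrapper, not an OpenDevin tool. The model-config name my_eval_llm is made up, so substitute the name defined in your own config. The argument order follows the [model_config] [agent] [eval_limit] usage from the report. The positive-integer check on eval_limit is an assumption, not a documented constraint.

```python
import subprocess

# Script path as quoted in the issue; it is run from the repository root.
RUN_INFER = "./evaluation/swe_bench/scripts/run_infer.sh"


def build_run_infer_cmd(model_config: str, agent: str, eval_limit: int) -> list[str]:
    """Build the argv for run_infer.sh: [model_config] [agent] [eval_limit]."""
    # Assumption: eval_limit is a positive count of instances to evaluate.
    if eval_limit <= 0:
        raise ValueError("eval_limit must be a positive integer")
    return [RUN_INFER, model_config, agent, str(eval_limit)]


def run_infer(model_config: str, agent: str, eval_limit: int) -> None:
    """Run the script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_run_infer_cmd(model_config, agent, eval_limit), check=True)


# "my_eval_llm" is a hypothetical model-config name.
print(" ".join(build_run_infer_cmd("my_eval_llm", "CodeActSWEAgent", 1)))
```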
Where do I set it, then? In which config file?
https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md
So there are two
@li-boxuan
Eventually we will refactor the whole
This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
@li-boxuan, can this issue be closed? It seems like it was a config issue.
Is there an existing issue for the same bug?
Describe the bug
I get the following error when I run the script
./evaluation/swe_bench/scripts/run_infer.sh [model_config] [agent] [eval_limit]
Current OpenDevin version
Installation and Configuration
Model and Agent
CodeAct agent
Operating System
Mac
Reproduction Steps
Run the ./evaluation/swe_bench/scripts/run_infer.sh script.
Logs, Errors, Screenshots, and Additional Context
No response