[Bug]: SWE-Bench Eval does not work #2150

Closed
2 tasks done
ramsey-coding opened this issue May 30, 2024 · 22 comments
Labels
bug (Something isn't working), severity:low (Minor issues or affecting single user), Stale (Inactive for 30 days)

Comments

@ramsey-coding

ramsey-coding commented May 30, 2024

Is there an existing issue for the same bug?

Describe the bug

I get the following error when I run the ./evaluation/swe_bench/scripts/run_infer.sh [model_config] [agent] [eval_limit] script:

Traceback (most recent call last):
  File "/Users/ramsey/conda/anaconda3/envs/opendevin/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb
AttributeError: 'NoneType' object has no attribute 'util'

Current OpenDevin version

latest

Installation and Configuration

./evaluation/swe_bench/scripts/run_infer.sh eval_bedrock CodeActAgent 5

Model and Agent

CodeAct agent

Operating System

Mac

Reproduction Steps

Run the ./evaluation/swe_bench/scripts/run_infer.sh script

Logs, Errors, Screenshots, and Additional Context

No response

ramsey-coding added the bug (Something isn't working) label May 30, 2024
@SmartManoj
Contributor

Could you provide the full traceback?

@li-boxuan
Collaborator

Could you please check out https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md#test-if-your-environment-works?

I am pretty sure your config is wrong.

@SmartManoj
Contributor

AssertionError: Failed to backup ~/.bashrc: mv: cannot stat '/home/opendevin/.bashrc': No such file or directory

@li-boxuan
Collaborator

You need to set run_as_devin = false in config.toml

@SmartManoj
Contributor

SmartManoj commented May 31, 2024

Is this config loaded? If so, the cache dir would be /root/.cache. Any error about the config in the log?

@SmartManoj
Contributor

Do you have only one config file?

@ramsey-coding
Author

Yes, I have already pasted the config file.

@SmartManoj
Contributor

Try export RUN_AS_DEVIN=false and check again.

@li-boxuan
Collaborator

> selected_ids = ['sphinx-doc__sphinx-8721', 'sympy__sympy-14774', 'scikit-learn__scikit-learn-10508']

Btw, that one doesn't go in the config.toml in the root folder. It goes in the swe-bench folder.

@ramsey-coding
Author

ramsey-coding commented May 31, 2024

I now get:

ERROR:root:<class 'opendevin.runtime.browser.browser_env.BrowserException'>: Failed to start browser environment.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:33<00:00, 33.10s/it]
Exception ignored in: <function _ExecutorManagerThread.__init__.<locals>.weakref_cb at 0x179e34c20>
Traceback (most recent call last):
  File "/Users/ramsey/conda/anaconda3/envs/opendevin/lib/python3.11/concurrent/futures/process.py", line 308, in weakref_cb
AttributeError: 'NoneType' object has no attribute 'util'

@SmartManoj
Contributor

SmartManoj commented May 31, 2024

Do you have Playwright installed? See #2140 (comment).

@ramsey-coding
Author

ramsey-coding commented May 31, 2024

I did the setup, but I still get the following error:

opendevin.runtime.browser.browser_env.BrowserException: Failed to start browser environment.
20:18:18 - opendevin:INFO: browser_env.py:53 - Browser env started.

@SmartManoj
Copy link
Contributor

See #1993 (comment).

@ramsey-coding
Copy link
Author

@SmartManoj Slightly different question: during SWE-bench evaluation, I don't want the agent to browse the internet at all.

@SmartManoj
Copy link
Contributor

SmartManoj commented May 31, 2024

Use CodeActSWEAgent #2105

SmartManoj added the severity:low (Minor issues or affecting single user) label May 31, 2024
@ramsey-coding
Copy link
Author

> selected_ids = ['sphinx-doc__sphinx-8721', 'sympy__sympy-14774', 'scikit-learn__scikit-learn-10508']
>
> Btw, that one doesn't go in the config.toml in the root folder. It goes in the swe-bench folder.

Where should I set it then? Which config file?

@li-boxuan
Copy link
Collaborator

https://github.com/OpenDevin/OpenDevin/blob/main/evaluation/swe_bench/README.md

@ramsey-coding
Author

So there are two config.toml files: one in the root folder and one in the ./evaluation/swe_bench/ folder.

  • I then need to create the second config.toml in the ./evaluation/swe_bench/ folder with only the following content:
selected_ids = ['sphinx-doc__sphinx-8721', 'sympy__sympy-14774', 'scikit-learn__scikit-learn-10508']

@SmartManoj
Contributor

SmartManoj commented May 31, 2024

@li-boxuan Would sub_config.toml be a better name?

@li-boxuan
Copy link
Collaborator

Eventually we will refactor the whole evaluation folder.

Contributor

github-actions bot commented Jul 1, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale (Inactive for 30 days) label Jul 1, 2024
@mamoodi
Collaborator

mamoodi commented Jul 6, 2024

@li-boxuan Can this issue be closed? It seems like it was a config issue.
