Add CodeActSWEAgent to remove browsing & github + improvements on agentskills #2105

xingyaoww · 2024-05-28T09:35:30Z

No description provided.

use minimal prompt for codeact;

yufansong

Mostly LGTM; I left a few questions.

agenthub/codeact_agent/codeact_agent.py

yufansong · 2024-05-28T17:42:28Z

agenthub/codeact_agent/codeact_agent.py

+def get_in_context_example() -> str:
+    return EXAMPLES
+
+
 class CodeActAgent(Agent):
    VERSION = '1.5'


Should we bump the version to 1.6? The prompts stay the same, but some of the logic changes.

agenthub/codeact_swe_agent/README.md

tests/unit/test_agent_skill.py

yufansong · 2024-05-28T19:08:15Z

agenthub/codeact_swe_agent/prompt.py

+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
+"""
+
+SWE_EXAMPLE = """


Are we using SWE_EXAMPLE anywhere? Judging from this example alone, it seems to spend a lot of tokens on editing files.

Yes. SWE_EXAMPLE is used in the CodeActSWEAgent, which does not use the same in-context example (ICL) as CodeActAgent.

evaluation/swe_bench/run_infer.py

Co-authored-by: Yufan Song <[email protected]>

enyst

This use of SWE agent is interesting and, IMHO, most welcome. We just deprecated the original; a replacement like this seems to me an even better solution for it.

You may also want to see #2103 CC: @li-boxuan

isavita · 2024-05-28T21:39:07Z

agenthub/codeact_swe_agent/prompt.py

+
+SYSTEM_SUFFIX = """The assistant's response should be concise.
+The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.


Is the <execute_browse> relevant?

Suggested change

-IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
+IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.

Good catch, it is not relevant. But for the same reason as above, I have already run a couple of experiments with this prompt, so I am hoping to keep it as is for now.

How about we start a separate PR to correct these typos, and at the same time bump the CodeActSWEAgent's version from v1.5 to v1.6 for better reproducibility?
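For the follow-up PR, a small check like the one below could flag prompt lines that mention action tags the agent cannot handle. This is only a sketch: `SUPPORTED_TAGS` and `unsupported_tags` are hypothetical names that do not come from the repository, and the supported set is assumed from the discussion above (no browsing in CodeActSWEAgent).

```python
import re

# Assumption: the only action tags CodeActSWEAgent is meant to emit.
SUPPORTED_TAGS = frozenset({"execute_ipython", "execute_bash"})


def unsupported_tags(prompt: str) -> set:
    """Return the execute_* tags mentioned in `prompt` that are not supported."""
    mentioned = set(re.findall(r"<(execute_\w+)>", prompt))
    return mentioned - SUPPORTED_TAGS


# The prompt line quoted in the review comment above.
line = (
    "IMPORTANT: Whenever possible, execute the code for the user using "
    "<execute_ipython> or <execute_bash> or <execute_browse> instead of providing it."
)
print(unsupported_tags(line))  # {'execute_browse'}
```

Run against the full prompt strings in a unit test, a helper of this kind would catch the leftover `<execute_browse>` mentions without changing the prompt used in the experiments already run.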

isavita · 2024-05-28T21:41:54Z

agenthub/codeact_swe_agent/prompt.py

+"""
+
+SYSTEM_SUFFIX = """The assistant's response should be concise.
+The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.


Suggested change

-The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
+The assistant should include ONLY ONE <execute_ipython> or <execute_bash> in every one of the responses, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.

isavita · 2024-05-28T21:45:49Z

agenthub/codeact_swe_agent/codeact_swe_agent.py

+        return f'{action.thought}\n<execute_bash>\n{action.command}\n</execute_bash>'
+    elif isinstance(action, IPythonRunCellAction):
+        return f'{action.thought}\n<execute_ipython>\n{action.code}\n</execute_ipython>'
+    elif isinstance(action, BrowseInteractiveAction):


If I understand correctly, the BrowseInteractiveAction branch should be removed. Is that the case, or did I simply misunderstand?

In my understanding, this agent, CodeAct-SWE, is supposed to know how to browse and use github. It takes over those responsibilities, 'freeing' CodeAct.

Yep. I think this should be removed. CodeActSWE does not have the ability to browse.

Good catch, it is not relevant. But for the same reason as above, I have already run a couple of experiments with this prompt, so I am hoping to keep it as is for now.

How about we start a separate PR to correct these typos, and at the same time bump the CodeActSWEAgent's version from v1.5 to v1.6 for better reproducibility?

^ For this reason, how about we start a separate PR once this is merged to correct all these issues and bump the version?
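As a rough picture of what the cleanup could look like, here is a sketch of the serializer with the browse branch dropped. The dataclasses are hypothetical stand-ins for the project's action classes (only the fields visible in the diff are modelled), the function name `action_to_str` is assumed, and raising on unknown actions is a choice made for this sketch rather than the behavior of the merged code.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the real action classes; only the attributes
# used in the quoted diff (thought, command, code) are included.
@dataclass
class CmdRunAction:
    command: str
    thought: str = ""


@dataclass
class IPythonRunCellAction:
    code: str
    thought: str = ""


def action_to_str(action) -> str:
    """Serialize an action into the tag format shown in the diff, without browsing."""
    if isinstance(action, CmdRunAction):
        return f"{action.thought}\n<execute_bash>\n{action.command}\n</execute_bash>"
    if isinstance(action, IPythonRunCellAction):
        return f"{action.thought}\n<execute_ipython>\n{action.code}\n</execute_ipython>"
    # Assumption: fail loudly on anything else, including browse actions.
    raise ValueError(f"Unsupported action type: {type(action).__name__}")


print(action_to_str(CmdRunAction(command="ls", thought="List the files.")))
```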

isavita · 2024-05-28T21:57:53Z

agenthub/codeact_swe_agent/codeact_swe_agent.py

+            len(message['content']) for message in messages
+        ) + len(action_str)
+
+        if finish_command := re.search(r'<finish>.*</finish>', action_str, re.DOTALL):


Is this finish block in use? I cannot see instructions for the model to use <finish>...</finish>.

Nope! Let's remove it
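For context on what the branch being removed does, the snippet below exercises the same regular expression in isolation. The sample strings and the helper name `has_finish_block` are made up for illustration; only the pattern and the `re.DOTALL` flag come from the diff.

```python
import re

# Same pattern and flag as in the quoted diff line.
FINISH_PATTERN = re.compile(r"<finish>.*</finish>", re.DOTALL)


def has_finish_block(action_str: str) -> bool:
    """Return True if the model output contains a <finish>...</finish> block."""
    return FINISH_PATTERN.search(action_str) is not None


# Illustrative model outputs (not taken from real runs).
multi_line = "Done.\n<finish>\nAll tests pass.\n</finish>"
bash_only = "<execute_bash>\nls\n</execute_bash>"

print(has_finish_block(multi_line))  # True
print(has_finish_block(bash_only))   # False
# Without re.DOTALL, "." does not match newlines, so the multi-line block is missed.
print(re.search(r"<finish>.*</finish>", multi_line) is None)  # True
```

Since the prompt never tells the model to emit `<finish>`, this pattern would not be expected to match in practice, which is consistent with the decision to drop the branch.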

li-boxuan · 2024-05-29T05:01:31Z

Integration test captures a regression for CodeActAgent:

UPDATE: I found the culprit. Looking into fixing it...
UPDATE2: Pushed a fix fdce0ce

neubig · 2024-05-29T21:59:16Z

@xingyaoww Looks like this is ready to go once conflicts are cleared.

agenthub/codeact_swe_agent/codeact_swe_agent.py

li-boxuan · 2024-05-30T04:22:27Z

I spoke to Xingyao and we decided to merge it.

xingyaoww added 21 commits May 27, 2024 23:44

update swe_bench prompt;

e9d7889

use minimal prompt for codeact;

upgrade agentskills and update testcases

8ec58d2

update infer prompt

1e58a12

fix cwd

80c0a33

add icl for swebench

4aeb002

also log in_context_example to run infer

4f853e7

remove extra print

2a1cc9a

change prompt to abs path

deef10b

update error message to include current file info

7783c10

change cwd for jupyter if needed

c2a284f

update edit error message

604c8d9

update prompt

851df73

improve git get patch

6e2736f

update hint string

a36f6f5

default to 50 turns

cb23bdb

revert changes from codeact agent and create new CodeActSWEAgent

fa97e57

revert changes to codeact

a98f15a

revert instructions for run infer

a9dc3ce

revert instructions for run infer

a27b0bb

update README

368f0b9

update max iter

e699f21

xingyaoww changed the title ~~In-context Example & Prompt specifically adapted to SWE Agent + improvements on agentskills~~ Add CodeActSWEAgent to remove browsing & github + improvements on agentskills May 28, 2024

xingyaoww marked this pull request as ready for review May 28, 2024 14:23

xingyaoww added 4 commits May 28, 2024 22:25

add codeact swe agent

3eaa6fb

fix issue for CodeActSWEAgent

832a828

allow specifying max iter in cmdline script

95eb048

stop printing

a4af937

yufansong reviewed May 28, 2024

enyst reviewed May 28, 2024

evaluation/swe_bench/run_infer.py (resolved)

Update agenthub/codeact_swe_agent/README.md

f6a6854

Co-authored-by: Yufan Song <[email protected]>

enyst approved these changes May 28, 2024

isavita reviewed May 28, 2024

Fix prompt regression in jupyter plugin

fdce0ce

li-boxuan approved these changes May 29, 2024

frankxu2004 mentioned this pull request May 29, 2024

Add LightCodeActAgent with simplified prompt and removed browsing #2062

Closed

3 tasks

isavita reviewed May 29, 2024

agenthub/codeact_swe_agent/codeact_swe_agent.py (resolved)

Merge remote-tracking branch 'upstream/main' into xw/swe-bench-prompt

9b499e5

li-boxuan merged commit 01ef902 into main May 30, 2024
2 checks passed

li-boxuan deleted the xw/swe-bench-prompt branch May 30, 2024 04:19

li-boxuan mentioned this pull request May 30, 2024

Persistent docker session #1998

Merged

SmartManoj mentioned this pull request May 31, 2024

[Bug]: SWE-Bench Eval does not work #2150

Closed

2 tasks
