Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CodeActSWEAgent to remove browsing & github + improvements on agentskills #2105

Merged
merged 28 commits into from
May 30, 2024

Conversation

xingyaoww
Copy link
Collaborator

No description provided.

@xingyaoww xingyaoww changed the title In-context Example & Prompt specifically adapted to SWE Agent + improvements on agentskills Add CodeActSWEAgent to remove browsing & github + improvements on agentskills May 28, 2024
@xingyaoww xingyaoww marked this pull request as ready for review May 28, 2024 14:23
Copy link
Collaborator

@yufansong yufansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most LGTM, left some question.

agenthub/codeact_agent/codeact_agent.py Show resolved Hide resolved
def get_in_context_example() -> str:
return EXAMPLES


class CodeActAgent(Agent):
VERSION = '1.5'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether should we bump the version 1.6? The prompts keep the same, but change some logic.

agenthub/codeact_swe_agent/README.md Outdated Show resolved Hide resolved
agenthub/codeact_swe_agent/README.md Show resolved Hide resolved
tests/unit/test_agent_skill.py Show resolved Hide resolved
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
"""

SWE_EXAMPLE = """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we use SWE_EXAMPLE any where? Just according to this case, it seems spend lots of token on edit file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. SWE_EXAMPLE is used in the CodeActSWEAgent, which does not use the same ICL for CodeActAgent.

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of SWE agent is interesting and, IMHO, most welcome. We just deprecated the original, a replacement like this seems to me like an even better solution for it.

You may also want to see #2103 CC: @li-boxuan


SYSTEM_SUFFIX = """The assistant's response should be concise.
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the <execute_browse> relevant?

Suggested change
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> or <execute_browse> instead of providing it.
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch... It is not relevant. But following the same reason above, I have already run a couple of experiments with this prompt - so hoping to keep it as is for now.

How about we start a separate PR to correct these typos, and at the same time bump the CodeActSWEAgent's version from v1.5 to v1.6 for better reproducibility.

"""

SYSTEM_SUFFIX = """The assistant's response should be concise.
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> or <execute_browse> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> in every one of the responses, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

return f'{action.thought}\n<execute_bash>\n{action.command}\n</execute_bash>'
elif isinstance(action, IPythonRunCellAction):
return f'{action.thought}\n<execute_ipython>\n{action.code}\n</execute_ipython>'
elif isinstance(action, BrowseInteractiveAction):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the BrowseInteractiveAction should be remove, is this the case or I simply misunderstood?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, this agent, CodeAct-SWE, is supposed to know how to browse and use github. It takes over those responsibilities, 'freeing' CodeAct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I think this should be removed. CodeActSWE does not have the ability to browse.

Good catch... It is not relevant. But following the same reason above, I have already run a couple of experiments with this prompt - so hoping to keep it as is for now.

How about we start a separate PR to correct these typos, and at the same time bump the CodeActSWEAgent's version from v1.5 to v1.6 for better reproducibility.

^ For this reason, how about we starts a separate PR once this is merged to correct all these issues and bump the version?

len(message['content']) for message in messages
) + len(action_str)

if finish_command := re.search(r'<finish>.*</finish>', action_str, re.DOTALL):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this finish block in use? I cannot see instructions for the model to use <finish>...</finish>.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Let's remove it

@li-boxuan
Copy link
Collaborator

li-boxuan commented May 29, 2024

Integration test captures a regression for CodeActAgent:

image

UPDATE: I found the culprit. Looking into fixing it...
UPDATE2: Pushed a fix fdce0ce

@neubig
Copy link
Contributor

neubig commented May 29, 2024

@xingyaoww Looks like this is ready to go once conflicts are cleared.

@li-boxuan li-boxuan merged commit 01ef902 into main May 30, 2024
2 checks passed
@li-boxuan li-boxuan deleted the xw/swe-bench-prompt branch May 30, 2024 04:19
@li-boxuan
Copy link
Collaborator

I spoke to Xingyao and we decided to merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants