Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comprehensive OpenHands glossary #6310

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions .openhands/microagents/glossary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
# OpenHands Glossary

### Agent
The core AI entity in OpenHands that can perform software development tasks by interacting with tools, browsing the web, and modifying code.

#### Agent Controller
A component that manages the agent's lifecycle, handles its state, and coordinates interactions between the agent and various tools.

#### Agent Delegation
The ability of an agent to hand off specific tasks to other specialized agents for better task completion.

#### Agent Hub
A central registry of different agent types and their capabilities, allowing for easy agent selection and instantiation.

#### Agent Skill
A specific capability or function that an agent can perform, such as file manipulation, web browsing, or code editing.

#### Agent State
The current context and status of an agent, including its memory, active tools, and ongoing tasks.

#### CodeAct Agent
A specialized agent type in OpenHands designed specifically for software development tasks and code manipulation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A specialized agent type in OpenHands designed specifically for software development tasks and code manipulation.
[A generalist agent in OpenHands](https://arxiv.org/abs/2407.16741) designed to perform tasks by editing and executing code.


### Browser
A system for web-based interactions and tasks.

#### Browser Environment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition isn't filled in yet, if it matters


#### Browser Gym
A testing and evaluation environment for browser-based agent interactions and tasks.

#### Web Browser Tool
A tool that enables agents to interact with web pages and perform web-based tasks.

### Commands
Terminal and execution related functionality.

#### Bash Parser
A component that processes and interprets bash commands and their outputs for agent interaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implies an interaction between Bash Parser, Bash Session, and the Runtime, but it isn't spelled out. Also Bash is one specific shell, we might consider calling this Shell or Console at some point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have tmux integrated, i don't think we have a explicit bash parser anymore, i.e., most things are handled by tmux automatically. Maybe we could remove this to reduce confusion?


#### Bash Session
A persistent terminal session that maintains state and history for bash command execution.

### Configuration
System-wide settings and options.

#### Agent Configuration
Settings that define an agent's behavior, capabilities, and limitations, including available tools and runtime settings.

#### Configuration Options
Settings that control various aspects of OpenHands behavior, including runtime, security, and agent settings.

#### LLM Config
Configuration settings for language models used by agents, including model selection and parameters.

#### LLM Draft Config
Settings for draft mode operations with language models, typically used for faster, lower-quality responses.

#### Runtime Configuration
Settings that define how the runtime environment should be set up and operated.

#### Security Options
Configuration settings that control security features and restrictions.

### Conversation
A sequence of interactions between a user and an agent, including messages, actions, and their results.

#### Conversation Info
Metadata about a conversation, including its status, participants, and timeline.

#### Conversation Manager
A component that handles the creation, storage, and retrieval of conversations.

#### Conversation Metadata
Additional information about conversations, such as tags, timestamps, and related resources.

#### Conversation Status
The current state of a conversation, including whether it's active, completed, or failed.

#### Conversation Store
A storage system for maintaining conversation history and related data.

### Events

#### Event
Every Conversation comprises a series of Events. Each Event is either an Action or an Observation.

#### Event Stream
A continuous flow of events that represents the ongoing activities and interactions in the system.

#### Action
A specific operation or command that an agent executes through available tools, such as running a command or editing a file.

#### Observation
The response or result returned by a tool after an agent's action, providing feedback about the action's outcome.

### Interface
Different ways to interact with OpenHands.

#### CLI Mode
A command-line interface mode for interacting with OpenHands agents without a graphical interface.

#### GUI Mode
A graphical user interface mode for interacting with OpenHands agents through a web interface.

#### Headless Mode
A mode of operation where OpenHands runs without a user interface, suitable for automation and scripting.

### Agent Memory
The system that decides which parts of the Event Stream (i.e. the conversation history) should be passed into each LLM prompt.

#### Memory Store
A storage system for maintaining agent memory and context across sessions.

#### Condenser
A component that processes and summarizes conversation history to maintain context while staying within token limits.

#### Truncation
A very simple Condenser strategy. Reduces conversation history or content to stay within token limits.

### Microagent
A specialized prompt that enhances OpenHands with domain-specific knowledge, repository-specific context, and task-specific workflows.

#### Microagent Registry
A central repository of available microagents and their configurations.

#### Public Microagent
A general-purpose microagent available to all OpenHands users, triggered by specific keywords.

#### Repository Microagent
A type of microagent that provides repository-specific context and guidelines, stored in the `.openhands/microagents/` directory.

### Prompt
Components for managing and processing prompts.

#### Prompt Caching
A system for caching and reusing common prompts to improve performance.

#### Prompt Manager
A component that handles the loading, processing, and management of prompts used by agents, including microagents.

#### Response Parsing
The process of interpreting and structuring responses from language models and tools.

### Runtime
The execution environment where agents perform their tasks, which can be local, remote, or containerized.

#### Action Execution Server
A REST API that receives agent actions (e.g. bash commands, python code, browsing actions), executes them in the runtime environment, and returns the results.

#### Action Execution Client
A component that handles the execution of actions in the runtime environment, managing the communication between the agent and the runtime.

#### Docker Runtime
A containerized runtime environment that provides isolation and reproducibility for agent operations.

#### E2B Runtime
A specialized runtime environment built on E2B for secure and isolated code execution.

#### Local Runtime
A runtime environment that executes on the local machine, suitable for development and testing.
#### Modal Runtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Not newlined like the others

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### Modal Runtime
#### Modal Runtime

A runtime environment built on Modal for scalable and distributed agent operations.

#### Remote Runtime
A sandboxed environment that executes code and commands remotely, providing isolation and security for agent operations.

#### Runtime Builder
A component that builds a Docker image for the Action Execution Server based on a user-specified base image.

### Security
Security-related components and features.

#### Security Analyzer
A component that checks agent actions for potential security risks.
Loading