HoneyHive Evaluation GitHub Action

This GitHub Action integrates with HoneyHive to run evaluations for LLM applications and aggregate metric results.

Overview

This action connects to the HoneyHive API, retrieves evaluation data, and sets output variables based on the success of the evaluation run. It supports aggregating evaluation metrics using various aggregation functions (e.g., average, min, max) and provides detailed outputs such as status, success, passed, failed, metrics, and datapoints.

Inputs

  • runId: The ID of the evaluation run (required).
  • project: The project associated with the evaluation (required).
  • apiKey: The API key for the HoneyHive API (required).
  • aggregateFunction: The aggregation function to be used (default: "average").
  • apiUrl: The base URL of the HoneyHive API (default: "https://api.honeyhive.ai").

Outputs

  • status: The status of the evaluation run (e.g., pending or completed).
  • success: Boolean indicating whether all datapoints passed.
  • passed: A list of passed datapoint_ids or session_ids.
  • failed: A list of failed datapoint_ids or session_ids.
  • metrics: Aggregated metrics and the detailed pass/fail status for each metric.
  • datapoints: Detailed datapoint-level results with associated session IDs and pass/fail statuses.
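Because these are exposed as step outputs, later steps can gate the workflow on them. As a sketch, assuming a previous step with id evaluate (as in the Example Usage section below), and noting that GitHub Actions outputs are strings, so success is compared against 'true':

- name: Fail the job if the evaluation did not pass
  if: steps.evaluate.outputs.success != 'true'
  run: |
    echo "Failed datapoints: ${{ steps.evaluate.outputs.failed }}"
    exit 1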

Initial Setup

After cloning this repository or using it as a template, follow the steps below to configure the action:

Note: Ensure you have Node.js (version 20.x or later) installed.

  1. Install dependencies:

     npm install

  2. Bundle the TypeScript for distribution:

     npm run bundle

  3. Run tests:

     npm test

Update the Action Metadata

The action.yml file defines the metadata for this action, including inputs and outputs. Ensure you update this file when modifying the action to reflect new inputs or outputs.
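As a minimal sketch of that structure (the descriptions and the runs entry point below are illustrative, not copied from the real file):

name: 'HoneyHive Evaluation'
description: 'Fetch and aggregate HoneyHive evaluation results'
inputs:
  aggregateFunction:
    description: 'Aggregation function applied to metric values'
    required: false
    default: 'average'
outputs:
  status:
    description: 'Status of the evaluation run'
runs:
  using: 'node20'
  main: 'dist/index.js'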

Update the Action Code

The src/ directory contains the action's core code. You can modify the behavior by editing src/main.ts. This action uses @actions/http-client to communicate with the HoneyHive API and fetch evaluation results.

  • Inputs are retrieved using core.getInput().
  • Outputs are set using core.setOutput() to make evaluation data accessible to later steps in a workflow.
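Putting those pieces together, the core flow in src/main.ts looks roughly like the sketch below. The endpoint path and response shape are assumptions for illustration, not the actual HoneyHive API contract:

import * as core from '@actions/core'
import { HttpClient } from '@actions/http-client'

async function run(): Promise<void> {
  try {
    // Read the inputs declared in action.yml.
    const runId = core.getInput('runId', { required: true })
    const apiKey = core.getInput('apiKey', { required: true })
    const apiUrl = core.getInput('apiUrl') || 'https://api.honeyhive.ai'

    // Fetch the evaluation run (hypothetical endpoint path).
    const http = new HttpClient('honeyhive-eval')
    const res = await http.getJson<{ status: string }>(
      `${apiUrl}/runs/${runId}`,
      { authorization: `Bearer ${apiKey}` }
    )

    // Expose results to later steps in the workflow.
    core.setOutput('status', res.result?.status ?? 'pending')
  } catch (error) {
    core.setFailed(error instanceof Error ? error.message : String(error))
  }
}

run()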

Example Usage

Here is an example of how you can use this action in a workflow to evaluate an LLM application run and access its results:

steps:
  - name: Checkout
    id: checkout
    uses: actions/checkout@v4

  - name: Run HoneyHive Evaluation
    id: evaluate
    uses: honeyhiveai/honeyhive-eval@main
    with:
      runId: 'your-run-id'
      project: 'your-project'
      aggregateFunction: 'average'
      apiUrl: 'https://api.honeyhive.ai'
      apiKey: ${{ secrets.HH_API_KEY }}

  - name: Display Evaluation Results
    run: |
      echo "Evaluation Status: ${{ steps.evaluate.outputs.status }}"
      echo "Success: ${{ steps.evaluate.outputs.success }}"
      echo "Passed Datapoints: ${{ steps.evaluate.outputs.passed }}"
      echo "Failed Datapoints: ${{ steps.evaluate.outputs.failed }}"
      echo "Metrics: ${{ steps.evaluate.outputs.metrics }}"
      echo "Datapoints: ${{ steps.evaluate.outputs.datapoints }}"

Publishing a New Release

To publish a new release, follow these steps:

  1. Update the code: Make necessary changes to your action.
  2. Run tests: Ensure everything works by running tests (npm test).
  3. Commit and push changes: After testing, commit your changes and push them to the repository.
  4. Tag a new release: Use GitHub’s tagging mechanism to create a new release, or use the provided helper script to automate the process.

For information about versioning your action, see Versioning. After pushing your commits, tags, and branches to the remote repository, create a new release in GitHub so users can easily reference the new tags in their workflows.