Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeking feedback: improved PR benchmark workflow #121

Open
mweberxyz opened this issue Feb 25, 2024 · 5 comments
Open

Seeking feedback: improved PR benchmark workflow #121

mweberxyz opened this issue Feb 25, 2024 · 5 comments

Comments

@mweberxyz
Copy link
Contributor

mweberxyz commented Feb 25, 2024

After having spent time looking at the implementation of the PR benchmark workflow and in-tree benchmarks in fastify/point-of-view, fastify/fastify, and fastify/workflows, I've managed to pull it all together. Hopefully the result is more usable with less code duplication, and provides better actionable information to PR reviewers.

I feel like things are in a stable place now, so I'm seeking feedback on if this is going to be valuable to reviewers, my approach, and the changes to each of the three repos needed. Open to any and all feedback before I spent any more time on it.

Demo

fastify

CleanShot 2024-02-25 at 15 38 45@2x

point-of-view

CleanShot 2024-02-25 at 14 32 26@2x

Required changes

fastify/workflows: mweberxyz@9cee011
fastify/fastify: mweberxyz/fastify@053330e
fastify/point-of-view: mweberxyz/point-of-view@07b17c8

Sample PRs

fastify/fastify: PR from fork

mweberxyz/fastify#3

Merges code from a fork into my fork, to demonstrate that the "base" benchmark are run against the target of the PR. Additionally, it shows warnings in the comment because the "head" aka PR branch does not run the parser.js correctly and all requests return 404s.

fastify/point-of-view: PR from same repo with performance degredation

mweberxyz/point-of-view#5

Merges code from a branch into the default branch of the same fork. It reverts a performance improvement, to demonstrate what it looks like when a PR really tanks performance.

Approach

  • Everything needed to run, parse, send comments, and remove the benchmark label is contained in the re-usable workflow
    • Re-usable workflows and custom actions each come with their pros and cons. In the end, I decided to keep the entirety of the logic in a re-usable workflow for ease of maintenance, though I admit the JS-in-YAML is a bit unwieldy.
    • Some benefits are we don't need to pass around GITHUB_TOKEN, we avoid a build step, and it fits in better with the rest of the workflows already defined in this repo
  • Each file in the input benchmarks-dir directory is executed (except any file specified in the input files-to-ignore), then autocannon is run 3 times* for 5 seconds* each against each file, taking the maximum result of mean req/s of the three runs
    • Following conclusion of benchmark runs, a table is sent to the PR as a comment
    • Any results that differ by 5% or more are bolded
  • Autocannon is loaded via npx
  • Autocannon's --on-port is used to spawn benchmark scripts
    • removes the need for the logic currently in fastify/point-of-view/benchmark.js
  • Selection of node versions is moved to the node-versions input
    • the current fastify/workflows workflow uses different versions of Node for benchmarks than is currently implemented in fastify/fastify
  • Static outputs removed, results moved to GHA artifacts
    • in part due to the previous, but also to keep history of these runs over time
  • If any benchmark needs to add autocannon arguments, they can be defined in a comment in the benchmark file itself
    • example: examples/benchmark/parser.js in fastify/fastify

Lessons

  • fix(plugins-benchmark-pr): run comparison benchmark against target #120 still has potentially incorrect logic
    • When commits are added to main after the PR is created, the github.event.pull_request.base.sha is not updated. That is to say, when running the base benchmarks, they always run against the main commit as-of-the-time the PR was created.
    • Fixed in POC by using the github.event.pull_request.base.ref instead

Future work

  • Test error and failure states more extensively
    • Add input-configurable timeout-minutes to the benchmark steps
    • Correctly handle when a PR adds or removes a benchmark
  • Experiment with self-hosted runners as a strategy to reduce run-to-run variance
  • Make benchmark run and benchmark duration input-configurable (see * above in Approach)
  • Factor out logic used in GHA to be usable by developers locally
    • Would be nice as a developer to run npm run benchmark and see the same type of output
@mcollina
Copy link
Member

Maybe let's give it a try in one of the repos!

@mweberxyz
Copy link
Contributor Author

mweberxyz commented Mar 2, 2024

Maybe let's give it a try in one of the repos!

👍 I'll keep pushing forward with a goal of getting a PR up in fastify/fastify, and keep the implementation confined to that repo. If people like the approach, making it generic and re-usable will be work for a later date.

The only remaining annoyance is that the "run workflow when tag is added" trigger within GitHub actions only fires once the workflow has been committed to the default branch, so I'll have to continue testing in and referring to my personal fork.

@Uzlopak
Copy link
Contributor

Uzlopak commented Mar 2, 2024

Just because we are not giving feedback, doesnt mean that we are uninterested.

Its actually more an issue of reproducability. When you think your workflow is perfect give us feedback. We can then check how your code is and that you are not doing malicious stuff 👿 and if it makes sense we will merge it.

@mweberxyz
Copy link
Contributor Author

mweberxyz commented Mar 5, 2024

Thanks for the encouragement! I am still working at various sampling strategies and to try to minimize the run-to-run variance.

@Fdawgs
Copy link
Member

Fdawgs commented Nov 2, 2024

How'd you get on with this @mweberxyz?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants