Releases · haesleinhuepf/human-eval-bia · GitHub

04 Jul 14:13

2024.07.04 Latest

What's Changed

add contributing guide and code of conduct by @haesleinhuepf in #45
Add gpt4o benchmarking results by @haesleinhuepf in #66
Add gemini 1.5 flash benchmarking results by @haesleinhuepf in #67
Add claude 3.5 sonnet benchmarking results by @haesleinhuepf in #69
Text update by @haesleinhuepf in #72

Full Changelog: 2024.04.25...2024.07.04

Contributors

haesleinhuepf

25 Apr 12:23

2024.04.25

What's Changed

Added samplers and samples from 9 recent open-source models by @jkh1 in #62
Figures and text modifications #63

New Contributors

@jkh1 made their first contribution in #62

Full Changelog: 2024.04.19...2024.04.25

Contributors

jkh1

19 Apr 15:14

2024.04.19

This version of the benchmark was submitted as a preprint. A link will be added to the README once it is available.

Most important changes

Document details of our modifications to the HumanEval framework by @haesleinhuepf in #19
copy example images to tempdir by @haesleinhuepf in #16
add documentation on how to add requirements by @haesleinhuepf in #41
add dependencies which made some tests fail by @haesleinhuepf in #43
add notebook for detecting missing requirements by @haesleinhuepf in #42
add notebook to summarize common failure reasons by @haesleinhuepf in #51
add notebook that summarizes which libraries were used in generated code by @haesleinhuepf in #54
Rerun benchmark by @haesleinhuepf in #56

Changes to list of tested models

adding gpt-4-turbo-2024-04-09 to tested models by @haesleinhuepf in #18
The Mistral models tested via the blablador infrastructure were temporarily removed from the list of tested models due to technical difficulties. See #55 for details.

New test-cases

Add read_zarr test, add zarr dependency, add zarr example data by @tischi in #33
Add test for radial intensity profile by @tischi in #40
Add fit_circle test by @tischi in #38
add test-case for tiled image processing by @haesleinhuepf in #27
add test-case binary_skeleton by @haesleinhuepf in #28
add simple image masking test case by @nscherf in #20
added test-case combine-columns by @haesleinhuepf in #30
add bland-altman test case by @haesleinhuepf in #31
add test to load a nifti image by @nscherf in #50
added test-case for using aicsimageio, example data, requirements by @haesleinhuepf in #48

Other changes

Sample canonical by @haesleinhuepf in #13
Better data visualization by @haesleinhuepf in #25
fix typos in test-case names by @haesleinhuepf in #29
rename read-... test case to open-... so that it fits better with the others by @haesleinhuepf in #49
Tex paper by @haesleinhuepf in #46
add seaborn plots by @nscherf in #57
revised main text by @haesleinhuepf in #58

New Contributors

@nscherf made their first contribution in #20
@tischi made their first contribution in #33

Full Changelog: 2024.04.07...2024.04.19

Contributors

tischi, nscherf, and haesleinhuepf

07 Apr 16:31

2024.04.07 Pre-release

What's Changed

Benchmark 47 test-cases, 10 samples, 6 LLMs by @haesleinhuepf in #10

Full Changelog: https://github.com/haesleinhuepf/human-eval-bia/commits/2024.04.07

Contributors

haesleinhuepf
