Release 2024.04.19 · haesleinhuepf/human-eval-bia

This version of the benchmark was submitted as a preprint. A link will be added to the README once it is out.

Most important changes

Document details of our modifications to the HumanEval framework by @haesleinhuepf in #19
copy example images to tempdir by @haesleinhuepf in #16
add documentation on how to add requirements by @haesleinhuepf in #41
add dependencies which made some tests fail by @haesleinhuepf in #43
add notebook for detecting missing requirements by @haesleinhuepf in #42
add notebook to summarize common failure reasons by @haesleinhuepf in #51
add notebook that summarizes which libraries were used in generated code by @haesleinhuepf in #54
Rerun benchmark by @haesleinhuepf in #56

adding gpt-4-turbo-2024-04-09 to tested models by @haesleinhuepf in #18
The Mistral models tested via the Blablador infrastructure were temporarily removed from the list of tested models due to technical difficulties. See #55 for details.

Add read_zarr test, add zarr dependency, add zarr example data by @tischi in #33
Add test for radial intensity profile by @tischi in #40
Add fit_circle test by @tischi in #38
add test-case for tiled image processing by @haesleinhuepf in #27
add test-case binary_skeleton by @haesleinhuepf in #28
add simple image masking test case by @nscherf in #20
added test-case combine-columns by @haesleinhuepf in #30
add bland-altman test case by @haesleinhuepf in #31
add test to load a nifti image by @nscherf in #50
added test-case for using aicsimageio, example data, requirements by @haesleinhuepf in #48

Sample canonical by @haesleinhuepf in #13
Better data visualization by @haesleinhuepf in #25
fix typos in test-case names by @haesleinhuepf in #29
rename read-... test case to open-... so that it fits better with the others by @haesleinhuepf in #49
Tex paper by @haesleinhuepf in #46
add seaborn plots by @nscherf in #57
revised main text by @haesleinhuepf in #58

Full Changelog: 2024.04.07...2024.04.19