Releases: haesleinhuepf/human-eval-bia
2024.07.04
What's Changed
- add contributing guide and code of conduct by @haesleinhuepf in #45
- Add gpt4o benchmarking results by @haesleinhuepf in #66
- Add gemini 1.5 flash benchmarking results by @haesleinhuepf in #67
- Add claude 3.5 sonnet benchmarking results by @haesleinhuepf in #69
- Text update by @haesleinhuepf in #72
Full Changelog: 2024.04.25...2024.07.04
2024.04.25
2024.04.19
This version of the benchmark was submitted as a preprint. A link will be added to the readme once it is out.
Most important changes
- Document details of our modifications to the HumanEval framework by @haesleinhuepf in #19
- copy example images to tempdir by @haesleinhuepf in #16
- add documentation how to add requirements by @haesleinhuepf in #41
- add dependencies which made some tests fail by @haesleinhuepf in #43
- add notebook for detecting missing requirements by @haesleinhuepf in #42
- add notebook to summarize common failure reasons by @haesleinhuepf in #51
- add notebook that summarizes which libraries were used in generated code by @haesleinhuepf in #54
- Rerun benchmark by @haesleinhuepf in #56
Changes to list of tested models
- adding gpt-4-turbo-2024-04-09 to tested models by @haesleinhuepf in #18
- The mistral models tested via the blablador infrastructure were temporarily removed from the list of tested models due to technical difficulties. See #55 for details.
New test-cases
- Add read_zarr test, add zarr dependency, add zarr example data by @tischi in #33
- Add test for radial intensity profile by @tischi in #40
- Add fit_circle test by @tischi in #38
- add test-case for tiled image processing by @haesleinhuepf in #27
- add test-case binary_skeleton by @haesleinhuepf in #28
- add simple image masking test case by @nscherf in #20
- added test-case combine-columns by @haesleinhuepf in #30
- add bland-altman test case by @haesleinhuepf in #31
- add test to load a nifti image by @nscherf in #50
- added test-case for using aicsimageio, example data, requirements by @haesleinhuepf in #48
Other changes
- Sample canonical by @haesleinhuepf in #13
- Better data visualization by @haesleinhuepf in #25
- fix typos in test-case names by @haesleinhuepf in #29
- rename read-... test case to open-... so that it fits better to others by @haesleinhuepf in #49
- Tex paper by @haesleinhuepf in #46
- add seaborn plots by @nscherf in #57
- revised main text by @haesleinhuepf in #58
Full Changelog: 2024.04.07...2024.04.19
2024.04.07
What's Changed
- Benchmark 47 test-cases, 10 samples, 6 LLMs by @haesleinhuepf in #10
Full Changelog: https://github.com/haesleinhuepf/human-eval-bia/commits/2024.04.07