WIP/RFC: Add detailed reporting for debugging generation #6

hgoldstein95 · 2024-11-20T16:31:06Z

This PR adds preliminary support for more detailed reporting of plausible testing success, via a tool called Tyche. Tyche is available as a VSCode extension or as a standalone application in the browser, and it allows developers to visualize the distribution of data used to test their code. Currently Tyche is supported in Haskell's QuickCheck, Python's Hypothesis, and Rocq's QuickChick, among other languages. You can read our paper about Tyche for more information.

You can try out this PR by downloading the Tyche extension in VSCode, adding "tyche.observationGlobs": ["**/.lean/observations/*.jsonl"] to your VSCode configuration, and then running the plausible test in the test/Tyche.lean file. You should see a new interface pop up in your sidebar, giving visual feedback about duplicate/given up tests.

Currently this PR adds the bare minimum, but I'd love some comments from more experienced Lean developers on how to improve the integration and add advanced features. In particular, I have a few changes I'd like to make:

Tyche supports plotting "features" of the distribution. These features are projections from generated data to numerical or categorical values that can be displayed in a bar chart. For example, if the user is testing a theorem quantified over a list, they may want to plot the lengths of those lists. In a perfect world, the user could write something like:
```
theorem list_reverse_reverse : ∀ (xs : List Nat), xs.reverse.reverse = xs := by
    plausible (config := { detailedReportingWithName := "list_reverse_reverse",
                           features := fun xs => xs.length })
```
but at the moment I don't see how to actually do that.
Right now I'm letting the user pass the name of the theorem into the system manually, but I'd like to just look up the name of the theorem. I'm hoping this is just possible through the tactic system, but I'm not sure how to do it.
Printing the Tyche report to a file every single time plausible runs produces a really large amount of data. Ideally I'd like users to be able to generate the report as-needed. Any thoughts on how to make that possible?

Let me know what you think! We've gotten really good feedback about how useful Tyche is for helping developers understand how confident they should be in their tests, and I think it'd be a great addition to plausible.

WIP: Add detailed reporting for debugging generation

8482ce7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP/RFC: Add detailed reporting for debugging generation #6

WIP/RFC: Add detailed reporting for debugging generation #6

hgoldstein95 commented Nov 20, 2024 •

edited

Loading

WIP/RFC: Add detailed reporting for debugging generation #6

Are you sure you want to change the base?

WIP/RFC: Add detailed reporting for debugging generation #6

Conversation

hgoldstein95 commented Nov 20, 2024 • edited Loading

hgoldstein95 commented Nov 20, 2024 •

edited

Loading