-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to mean time, when for allocations #340
base: main
Are you sure you want to change the base?
Conversation
Note, idea take from @chriselrod's PR:
|
In my opinion, when you're comparing codes with different amounts of allocations, If micro-optimzing a kernel, you should ideally have non-allocating code. But many of us use |
Yes, I accidentally used median, not mean. I meant to copy your idea as is. And I'm editing this blind. I've never made a macro or changed (or one that calls another macro...). At least if people want this change, with mean, it would be good to get it confirmed. Or knowing if not so I can stop looking into this. I've preferred the median in some cases (though usually the min), but now I'm thinking was I wrong, when would that be best? Is it only shown in |
I've encountered resistance to the suggestion before. Many argue minimum is the best, because all noise is positive, slowing you down from the ideal. This is especially the case when you have a GC, but it is even true for Ignoring this cost, which sporadically results in extra time required, is wrong. You probably don't want to measure things like page faults or (even worse) the OS deciding to randomly context switch you, which is why using the minimum is common advice. I had a form of that matrix exp benchmark about a year ago for an important work-load, where I optimized it for |
A short summary:
What we care about generally is the expected value of the time. The expected value is generally best estimated by the sample mean, although distributional assumptions can change that (in which case the estimates disagreeing would probably be a reflection of that assumption being bad). |
This works now, in case people want to merge this as is. I'm not sure how to increase to code coverage, I guess it would need to test the new (trivial) code path, and I sort of know how, but not exactly in the CI, in case anyone wants to help with that. |
Thank you for the contribution! |
I believe it's well-argued that mean should be shown (when and only when allocating), why I did the PR, but I'm also hesitant to show that only, since this is a different number then previously shown and expected by many why I also kept the old one. Then it's clear to users, and also when only that number (correctly(?!)) shown. I tend to also use Most often there are allocations or not, and when the same amount. I think different number of allocations (because of different code paths) can and should be handled in a separate PR if at all (it's not a very common case?). Should you maybe merge, to master, and see if people object? It's always possible to revert, even after in stable version... |
I just remembered where the other lengthy discussion about this was: |
@KristofferC was firmly against the mean in |
BenchmarkTools itself runs the GC explicitly (see If you want to get the mean, use |
I do support host adding
No idea how successful my solution is, but I do try to disable that: |
If we step back a bit, then the purpose of benchmarking, timing, isn't usually the timing as a number itself. It's "can I improve it?". Or have I reached the optimal code. Thus you may want to track it over time, but rather I would like a macro that compares two versions of my code and tells me which is faster.
I think that's actually worse. It may be helpful to Elrod or other users that think (or know) that the mean is better (those users can use I want the best info for If you have to show only one number I think we all agree the minimum is best, at least when not allocating. Do we think it would be good to show one more number, and possibly in same format as when not allocating?
The max is an order of magnitude slower always (ranges from 9x-14x), and at first I dismissed it as useless info (why even show it?), but maybe it isn't, showing speed from cold cache? Is that sometimes useful? At least it gets calculated into the mean. I'm not saying we should show it directly. The mean is seemingly mostly immune to the max as an outlier. The median even more, and I'm not sure we shouldn't be showing that rather than the mean. Do you like two numbers as timings, or the other as a fraction, as shown? Or do you insist one just one number? [One other possible PR I'm not going to make would be showing e.g. 0–20 allocations, when there's a range. It's for rather unusual situations anyway, but the logic in this PR assumes 0 or not...] |
No description provided.