Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to mean time, when for allocations #340

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

PallHaraldsson
Copy link

No description provided.

@PallHaraldsson
Copy link
Author

Note, idea take from @chriselrod's PR:

JuliaLang/julia#51812

# macros are too awkward to work with, so we use a function
# mean times are much better for benchmarking than minimum
# whenever you have a function that allocates
function bmean(f)
  b = @benchmark $f()
  BMean(b)
end

@PallHaraldsson PallHaraldsson marked this pull request as draft October 22, 2023 17:16
@chriselrod
Copy link

chriselrod commented Oct 22, 2023

In my opinion, when you're comparing codes with different amounts of allocations, mean is the most appropriate. GC produces a skewed distribution, and median misses those heavy tails.

If micro-optimzing a kernel, you should ideally have non-allocating code.

But many of us use BenchmarkTools for non-microbenchmarks.

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Oct 22, 2023

Yes, I accidentally used median, not mean. I meant to copy your idea as is. And I'm editing this blind. I've never made a macro or changed (or one that calls another macro...).

At least if people want this change, with mean, it would be good to get it confirmed. Or knowing if not so I can stop looking into this.

I've preferred the median in some cases (though usually the min), but now I'm thinking was I wrong, when would that be best? Is it only shown in @benchmark to notice it's different from the mean?

@PallHaraldsson PallHaraldsson changed the title Default to median time, when for allocations Default to mean time, when for allocations Oct 22, 2023
@PallHaraldsson PallHaraldsson marked this pull request as ready for review October 22, 2023 18:33
@chriselrod
Copy link

chriselrod commented Oct 22, 2023

Yes, I accidentally used median, not mean. I meant to copy your idea as is. And I'm editing this blind. I've never made a macro or changed (or one that calls another macro...).

At least if people want this change, with mean, it would be good to get it confirmed. Or knowing if not so I can stop looking into this.

I've preferred the median in some cases (though usually the min), but now I'm thinking was I wrong, when would that be best? Is it only shown in @benchmark to notice it's different from the mean?

I've encountered resistance to the suggestion before. Many argue minimum is the best, because all noise is positive, slowing you down from the ideal.
However, whenever heap allocations are involved, that's not really correct.

This is especially the case when you have a GC, but it is even true for malloc/free, but to a much smaller degree (free is normally extremely fast, but every now and then can trigger some work to make the freed memory actually available outside of fast reuse caches).

Ignoring this cost, which sporadically results in extra time required, is wrong.

You probably don't want to measure things like page faults or (even worse) the OS deciding to randomly context switch you, which is why using the minimum is common advice.

I had a form of that matrix exp benchmark about a year ago for an important work-load, where I optimized it for Array so that the minimum time matched the minimum time for SArray!
This was great because SArray took like 3 minutes longer to compile.
Unfortunately, the Array version in practice actually took several times longer to run, because while minimums matched, the full multithreaded application choked on GC...

@chriselrod
Copy link

A short summary:

  1. minimum assumes all noise is positive. Wrong when you incur costs that periodically show up as time, as then you can have negative noise as in "paid less of that cost than average".
  2. median minimizes absolute error. The centering parameter of a Laplace/double exponential distribution is the sample median (the logpdf of this distribution sums the absolute value of errors).
  3. mean minimizes square error. The centering parameter of a Gaussian distribution is the sample mean (the logpdf of this distribution sums the squares; this is often intuitive as it minimizes distance, while the central limit theorem explains why this one tends to work so well in practice).

What we care about generally is the expected value of the time.
If we run things a million times, the duration will be around a million times the expected value.

The expected value is generally best estimated by the sample mean, although distributional assumptions can change that (in which case the estimates disagreeing would probably be a reflection of that assumption being bad).
Some distributions don't have means, e.g. the Cauchy distribution is too variable and thus sample means have the same variability as individual samples. But I don't think we have that sort of extreme problem (but it can show up from things as benign as the ratio of two independent zero-mean gaussian variables).

@PallHaraldsson
Copy link
Author

This works now, in case people want to merge this as is. I'm not sure how to increase to code coverage, I guess it would need to test the new (trivial) code path, and I sort of know how, but not exactly in the CI, in case anyone wants to help with that.

@gdalle
Copy link
Collaborator

gdalle commented Oct 23, 2023

Thank you for the contribution!
I'm a maintainer of BT but I really don't have the expertise necessary to debate on benchmarking methods.
However, as far as benchmarking interfaces are concerned, I don't really like the idea of outputting different things depending on what the benchmarked code does. I'll try to see if this has popped up in issues before.

@gdalle
Copy link
Collaborator

gdalle commented Oct 23, 2023

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Oct 23, 2023

I believe it's well-argued that mean should be shown (when and only when allocating), why I did the PR, but I'm also hesitant to show that only, since this is a different number then previously shown and expected by many why I also kept the old one. Then it's clear to users, and also when only that number (correctly(?!)) shown.

I tend to also use @benchmark also but many users would not, or think only one number would do.

Most often there are allocations or not, and when the same amount. I think different number of allocations (because of different code paths) can and should be handled in a separate PR if at all (it's not a very common case?).

Should you maybe merge, to master, and see if people object? It's always possible to revert, even after in stable version...

@gdalle
Copy link
Collaborator

gdalle commented Oct 23, 2023

I just remembered where the other lengthy discussion about this was:

@gdalle
Copy link
Collaborator

gdalle commented Oct 23, 2023

@KristofferC was firmly against the mean in @btime, and therefore asking to name this new mean-focused macro differently.

@KristofferC
Copy link
Contributor

KristofferC commented Oct 24, 2023

BenchmarkTools itself runs the GC explicitly (see gcscrub) so inferring things about the GC time consumption in a run is unlikely to be very useful. It's kind of random if the GC time will be included in the measurement or not. Also, changing the statistics based on whether an allocation happens is surprising and easily missed.

If you want to get the mean, use @benchmark or introduce a @bmean or something.

@chriselrod
Copy link

I do support host adding @bmean.
No need to change how any existing stuff works.

BenchmarkTools itself runs the GC explicitly (see gcscrub)

No idea how successful my solution is, but I do try to disable that:
https://github.com/chriselrod/ElrodConfig.jl/blob/a3f4303bea52489f1162d2a9a443c06fc90c4547/src/ElrodConfig.jl#L76

@PallHaraldsson
Copy link
Author

PallHaraldsson commented Oct 25, 2023

If we step back a bit, then the purpose of benchmarking, timing, isn't usually the timing as a number itself. It's "can I improve it?". Or have I reached the optimal code. Thus you may want to track it over time, but rather I would like a macro that compares two versions of my code and tells me which is faster.

I do support host adding @bmean.
No need to change how any existing stuff works.

I think that's actually worse. It may be helpful to Elrod or other users that think (or know) that the mean is better (those users can use @benchmark), but most would be confused: should I use @btime or @bmean?

I want the best info for @btime, condensed into one line, whatever that may be.

If you have to show only one number I think we all agree the minimum is best, at least when not allocating.

Do we think it would be good to show one more number, and possibly in same format as when not allocating?

@btime sin(1)
  1.877 ns (mean is 5% slower; 0 allocations)

The max is an order of magnitude slower always (ranges from 9x-14x), and at first I dismissed it as useless info (why even show it?), but maybe it isn't, showing speed from cold cache? Is that sometimes useful? At least it gets calculated into the mean. I'm not saying we should show it directly. The mean is seemingly mostly immune to the max as an outlier. The median even more, and I'm not sure we shouldn't be showing that rather than the mean.

Do you like two numbers as timings, or the other as a fraction, as shown? Or do you insist one just one number?

[One other possible PR I'm not going to make would be showing e.g. 0–20 allocations, when there's a range. It's for rather unusual situations anyway, but the logic in this PR assumes 0 or not...]

@PallHaraldsson PallHaraldsson marked this pull request as draft October 25, 2023 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants