Progress bar rendering through indicatif is slow #10384

konstin · 2025-01-08T09:08:27Z

Currently, the uv resolver thread spends a considerable amount of time in on_progress -> indicatif::style::ProgressStyle::format_state. We should limit rendering progress bar updates to at most 1/60s, ideally directly in indicatif.

Profile for samply record --rate 20000 target/profiling/uv pip compile scripts/requirements/airflow.in:

The text was updated successfully, but these errors were encountered:

jaheba · 2025-01-08T10:48:34Z

To understand the issue: Currently, there is a call to set_message for each resolved package. If there are a lot of packages, we could end up calling set_message more than 60 times a second, slowing down the process. There is no benefit to display each resolved package, since the user can't process that information that quickly.

From what I understand, indicatif already is limited to 20 updates a second by default:

    /// Draw to a buffered stderr terminal at a max of 20 times a second.
    ///
    /// This is the default draw target for progress bars.  For more
    /// information see [`ProgressDrawTarget::term`].
    pub fn stderr() -> Self {
        Self::term(Term::buffered_stderr(), 20)
    }

However, it appears that the rate limiter only applies to progress bar ticks and not changing the text message. A call to .set_message() will always end in a call to .draw().

This would mean that ResolverReporter would need to implement a rate limiter itself.

jaheba · 2025-01-08T13:26:42Z

Actually, I think I was wrong.

There is a call to rate_limiter.allow(now) here: https://github.com/console-rs/indicatif/blob/main/src/draw_target.rs#L171

However, attempting to change the rate here did not show different profiles:

uv/crates/uv/src/printer.rs

Line 20 in 68adadf

Self::Default => ProgressDrawTarget::stderr(),

I've changed this to:

Self::Default => ProgressDrawTarget::stderr_with_hz(1),

and

Self::Default => ProgressDrawTarget::stderr_with_hz(1000),

@konstin Can you help me how to debug the issue? The profiles running your above command have quite some variance.

konstin · 2025-01-08T13:40:43Z

I haven't looked into it in depth so i can't really help with the code.

Usually, i try to optimize by e.g. counting how often the function was called myself or by inserting dbg!s. For making the profiles more stable, you can try setting a different cpu govenor, on linux taskset to pin the process to performance cores and adjusting the samply sampling rate.

jaheba · 2025-01-08T17:00:07Z

I've added a simple counter to invocations of set_message. There are 7384 invocations (packages that are reported), of which 24 are allowed to be sent to stderr. With a default of 20Hz that sounds about right.

If I set the frequency to the highest value (255), I get ~70 allowed events, however lower values also end up around 24 events.

In addition, there is a burst mechanism in indicatif, which I haven't looked at yet.

I think it's fair to say that there is not a lot of potential here by optimising the stderr_with_hz setting.

konstin · 2025-01-08T17:02:05Z

It looks like most of the time is spent in rendering the message, it may be possible to skip or delay the rendering until we actually (this is only a guess from the profile though)

jaheba · 2025-01-09T10:15:51Z

I've run some benchmarks using hyperfine on the file above:

	0.5.15	0.5.16	local	local (noop)
--show-output	318.6 ms	159.3 ms	161.7 ms	148.6 ms
/dev/null	310.3 ms	149.6 ms	153.8 ms	147.7 ms

What's most noticable is that there is a big jump from 0.5.15 to 0.5.16, and the latest release is just twice as fast.

The local-noop version does nothing on on_progress for the progress bar, so that should give us some indication what the overhead is. From this, it appears that there is about a 6ms overhead of using indicatif, even though the output is disabled. There is roughly similar penalty to actually render the output. (At least on my machine)

I've tried setting different values stderr_with_hz, and there is just no difference in the runtime. Higher refresh rates feel much snappier though.

I've also implemented a simple custom rate-limiter, that returns early in on_progress. It takes just shy of 150 ms using 50Hz, and ~151 using 100Hz. I guess we could shave off most of the overhead of indicatif.

konstin added the performance Potential performance improvement label Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Progress bar rendering through indicatif is slow #10384

Progress bar rendering through indicatif is slow #10384

konstin commented Jan 8, 2025

jaheba commented Jan 8, 2025

jaheba commented Jan 8, 2025

konstin commented Jan 8, 2025

jaheba commented Jan 8, 2025

konstin commented Jan 8, 2025

jaheba commented Jan 9, 2025

Progress bar rendering through indicatif is slow #10384

Progress bar rendering through indicatif is slow #10384

Comments

konstin commented Jan 8, 2025

jaheba commented Jan 8, 2025

jaheba commented Jan 8, 2025

konstin commented Jan 8, 2025

jaheba commented Jan 8, 2025

konstin commented Jan 8, 2025

jaheba commented Jan 9, 2025