What's the problem with performance, implementation or usage? #641

Open · TheLudlows opened this issue Mar 11, 2024 · 6 comments

@TheLudlows

https://github.com/bytedance/monoio/blob/master/docs/en/benchmark.md

As described in the linked benchmark.

[bench: attached benchmark screenshot]

@glommer
Collaborator

glommer commented Mar 11, 2024

Likely implementation.
For example, I made the explicit decision to keep a hash of the completion entries instead of just casting their addresses unsafely.

That's a decision I don't regret, since early io_uring code was full of issues that essentially led to wrong addresses being added there, completions disappearing, etc. I am sure it's less of an issue now, but I have never done the work to methodically chase down performance issues.
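For illustration, here is a minimal sketch of the two bookkeeping strategies being contrasted (names like `SafeTracker` and `InFlightOp` are made up; this is not glommio's or monoio's actual code). The id-plus-map indirection costs a lookup per completion, but a bogus `user_data` value becomes a failed lookup instead of a dereference of a wrong address:

```rust
use std::collections::HashMap;

/// Per-operation state we need back when the CQE arrives
/// (buffer, waker, result slot, ...).
struct InFlightOp {
    description: &'static str,
}

/// Approach described above: keep a table of in-flight ops and put a
/// plain integer id into `user_data`. A bad id coming out of the ring
/// is caught by the map lookup instead of being dereferenced.
struct SafeTracker {
    next_id: u64,
    in_flight: HashMap<u64, InFlightOp>,
}

impl SafeTracker {
    fn submit(&mut self, op: InFlightOp) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, op);
        id // this value goes into sqe.user_data
    }

    fn complete(&mut self, user_data: u64) -> Option<InFlightOp> {
        // A corrupted or unknown user_data simply returns None here.
        self.in_flight.remove(&user_data)
    }
}

/// The faster alternative: stash the raw pointer itself in `user_data`
/// and cast it back on completion. No hash lookup, but any wrong
/// address coming back from the ring is undefined behaviour.
fn submit_unchecked(op: InFlightOp) -> u64 {
    Box::into_raw(Box::new(op)) as u64 // goes into sqe.user_data
}

unsafe fn complete_unchecked(user_data: u64) -> Box<InFlightOp> {
    Box::from_raw(user_data as *mut InFlightOp)
}

fn main() {
    let mut tracker = SafeTracker { next_id: 0, in_flight: HashMap::new() };
    let id = tracker.submit(InFlightOp { description: "read" });
    assert!(tracker.complete(id).is_some());

    let ud = submit_unchecked(InFlightOp { description: "write" });
    let op = unsafe { complete_unchecked(ud) };
    assert_eq!(op.description, "write");
}
```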

@TheLudlows
Author

It looks like it's almost 30% worse. Is there any way to improve it?

@vlovich
Contributor

vlovich commented Mar 13, 2024

Discussed previously in #554. Probably just needs someone to run things under a profiler to figure it out. I know that monoio relies on nightly features (e.g. fast thread locals), and that could also be contributing to the performance difference (it no longer requires them, but the benchmarks are run against nightly with that feature enabled).
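For context, here is a rough sketch of the difference between the nightly `#[thread_local]` attribute and the stable `thread_local!` macro. How monoio actually wires this up may differ; this only illustrates why the nightly path can skip the per-access initialization check:

```rust
// Nightly-only: requires the `thread_local` feature gate. A
// `#[thread_local]` static compiles down to a direct TLS slot access,
// with no lazy-initialization check on the hot path.
#![feature(thread_local)]

use std::cell::Cell;

#[thread_local]
static FAST_COUNTER: Cell<u64> = Cell::new(0);

// Stable equivalent: the macro goes through LocalKey, which performs an
// initialization (and destructor-registration) check on access.
thread_local! {
    static SLOW_COUNTER: Cell<u64> = Cell::new(0);
}

fn bump_fast() -> u64 {
    FAST_COUNTER.set(FAST_COUNTER.get() + 1);
    FAST_COUNTER.get()
}

fn bump_slow() -> u64 {
    SLOW_COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    println!("fast: {}, slow: {}", bump_fast(), bump_slow());
}
```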

It would be interesting to hear from @ihciah whether he can give a high-level guess as to whether he intentionally did something differently with monoio to get higher perf.

@bryandmc
Collaborator

Their implementation also uses fast poll, which is much better than what we do (poll + read), if I am remembering all this correctly. Honestly, the way to resolve this now is probably to redo sockets with all the new io_uring features that have become available; among them I would include buffer select, buffer rings, fast poll, and zero-copy where possible. Also worth checking the Semaphore implementation, as they mention in #554.
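To make the poll+read vs. fast-poll contrast concrete, here is a sketch against the `io-uring` crate directly rather than glommio's internals. The crate version and exact builder names are assumptions to check against the crate docs, and buffer select/buffer rings and zero-copy are not shown:

```rust
// Cargo dependency (assumed, version approximate): io-uring = "0.6"
// Contrast of "poll then read" vs. a single direct recv. With
// IORING_FEAT_FAST_POLL the kernel polls internally when the socket is
// not yet readable, so the explicit PollAdd round-trip can be dropped.
use std::net::UdpSocket;
use std::os::fd::AsRawFd;

use io_uring::{opcode, types, IoUring};

const POLLIN: u32 = 0x001; // same value as libc::POLLIN on Linux

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;

    // Loopback socket with two datagrams already queued, so both
    // patterns below complete immediately.
    let sock = UdpSocket::bind("127.0.0.1:0")?;
    sock.connect(sock.local_addr()?)?;
    sock.send(b"one")?;
    sock.send(b"two")?;
    let raw = sock.as_raw_fd();

    let mut buf = [0u8; 64];

    // Pattern A: readiness first (PollAdd), then the actual Recv.
    // Two submissions and two completions per read.
    let poll = opcode::PollAdd::new(types::Fd(raw), POLLIN).build().user_data(1);
    unsafe { ring.submission().push(&poll).unwrap() };
    ring.submit_and_wait(1)?;
    let _ = ring.completion().next().expect("poll cqe");

    let recv = opcode::Recv::new(types::Fd(raw), buf.as_mut_ptr(), buf.len() as u32)
        .build()
        .user_data(2);
    unsafe { ring.submission().push(&recv).unwrap() };
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("recv cqe");
    println!("poll+recv got {} bytes", cqe.result());

    // Pattern B: submit the Recv directly and let the kernel's internal
    // fast poll handle readiness. One submission, one completion.
    let recv = opcode::Recv::new(types::Fd(raw), buf.as_mut_ptr(), buf.len() as u32)
        .build()
        .user_data(3);
    unsafe { ring.submission().push(&recv).unwrap() };
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("recv cqe");
    println!("direct recv got {} bytes", cqe.result());

    Ok(())
}
```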

@vlovich
Contributor

vlovich commented Mar 15, 2024

@bryandmc do you know if this impacts disk I/O performance at all or if this is just issues in the net stack?

@bryandmc
Collaborator

@vlovich from an io_uring perspective, we already do most (if not all?) of the things that ensure fast disk reads. Additional performance could be gained through profiling, etc., but unlike the net stack, I don't think there are features we have "left on the table" that we aren't currently using. Because of that, I would defer to @glommer's explanation, which is just that it hasn't been optimized at all. Probably some easy performance wins for anyone with a little time and a profiler.
