What's the problem with performance, implementation or usage? #641

Open · TheLudlows opened this issue Mar 11, 2024 · 6 comments

@TheLudlows

https://github.com/bytedance/monoio/blob/master/docs/en/benchmark.md

As described in the linked benchmark.

[bench: attached benchmark screenshot]

@glommer
Collaborator

glommer commented Mar 11, 2024

Likely implementation.
For example, I made the explicit decision to keep a hash of the completion entries instead of just casting their addresses unsafely.

That's a decision I don't regret, since early io_uring code was full of issues that essentially led to wrong addresses being added there, completions disappearing, etc. I am sure it's less of an issue now, but I have never done the work to methodically chase down performance issues.
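For illustration, here is a minimal sketch of the two bookkeeping strategies being contrasted (names like `SafeTracker` and `InFlightOp` are made up; this is not glommio's or monoio's actual code). The id-plus-map indirection costs a lookup per completion, but a bogus `user_data` value becomes a failed lookup instead of a dereference of a wrong address:

```rust
use std::collections::HashMap;

/// Per-operation state we need back when the CQE arrives
/// (buffer, waker, result slot, ...).
struct InFlightOp {
    description: &'static str,
}

/// Approach described above: keep a table of in-flight ops and put a
/// plain integer id into `user_data`. A bad id coming out of the ring
/// is caught by the map lookup instead of being dereferenced.
struct SafeTracker {
    next_id: u64,
    in_flight: HashMap<u64, InFlightOp>,
}

impl SafeTracker {
    fn submit(&mut self, op: InFlightOp) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.in_flight.insert(id, op);
        id // this value goes into sqe.user_data
    }

    fn complete(&mut self, user_data: u64) -> Option<InFlightOp> {
        // A corrupted or unknown user_data simply returns None here.
        self.in_flight.remove(&user_data)
    }
}

/// The faster alternative: stash the raw pointer itself in `user_data`
/// and cast it back on completion. No hash lookup, but any wrong
/// address coming back from the ring is undefined behaviour.
fn submit_unchecked(op: InFlightOp) -> u64 {
    Box::into_raw(Box::new(op)) as u64 // goes into sqe.user_data
}

unsafe fn complete_unchecked(user_data: u64) -> Box<InFlightOp> {
    Box::from_raw(user_data as *mut InFlightOp)
}

fn main() {
    let mut tracker = SafeTracker { next_id: 0, in_flight: HashMap::new() };
    let id = tracker.submit(InFlightOp { description: "read" });
    assert!(tracker.complete(id).is_some());

    let ud = submit_unchecked(InFlightOp { description: "write" });
    let op = unsafe { complete_unchecked(ud) };
    assert_eq!(op.description, "write");
}
```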

@TheLudlows
Author

It looks like it's almost 30% worse. Is there any way to improve it?

@vlovich
Contributor

vlovich commented Mar 13, 2024

Discussed previously in #554. Probably just needs someone to run things under a profiler to figure it out. I know that monoio relies on nightly features (e.g. fast thread locals), and that could also be contributing to the performance difference (it no longer requires them, but the benchmarks are run against nightly with that feature enabled).
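For context, here is a rough sketch of the difference between the nightly `#[thread_local]` attribute and the stable `thread_local!` macro. How monoio actually wires this up may differ; this only illustrates why the nightly path can skip the per-access initialization check:

```rust
// Nightly-only: requires the `thread_local` feature gate. A
// `#[thread_local]` static compiles down to a direct TLS slot access,
// with no lazy-initialization check on the hot path.
#![feature(thread_local)]

use std::cell::Cell;

#[thread_local]
static FAST_COUNTER: Cell<u64> = Cell::new(0);

// Stable equivalent: the macro goes through LocalKey, which performs an
// initialization (and destructor-registration) check on access.
thread_local! {
    static SLOW_COUNTER: Cell<u64> = Cell::new(0);
}

fn bump_fast() -> u64 {
    FAST_COUNTER.set(FAST_COUNTER.get() + 1);
    FAST_COUNTER.get()
}

fn bump_slow() -> u64 {
    SLOW_COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    println!("fast: {}, slow: {}", bump_fast(), bump_slow());
}
```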

It would be interesting to hear from @ihciah whether he can give a high-level guess as to whether he intentionally did something differently with monoio to get higher perf.

@bryandmc
Collaborator

Their implementation also uses fast poll, which is much better than what we do (poll + read), if I am remembering all this correctly. Honestly, the way to resolve this now is probably to redo sockets with all the new io_uring features that have become available; among them I would include buffer select, buffer rings, fast poll, and zero-copy where possible. Also worth checking the Semaphore implementation, as they mention in #554.
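To make the poll+read vs. fast-poll contrast concrete, here is a sketch against the `io-uring` crate directly rather than glommio's internals. The crate version and exact builder names are assumptions to check against the crate docs, and buffer select/buffer rings and zero-copy are not shown:

```rust
// Cargo dependency (assumed, version approximate): io-uring = "0.6"
// Contrast of "poll then read" vs. a single direct recv. With
// IORING_FEAT_FAST_POLL the kernel polls internally when the socket is
// not yet readable, so the explicit PollAdd round-trip can be dropped.
use std::net::UdpSocket;
use std::os::fd::AsRawFd;

use io_uring::{opcode, types, IoUring};

const POLLIN: u32 = 0x001; // same value as libc::POLLIN on Linux

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;

    // Loopback socket with two datagrams already queued, so both
    // patterns below complete immediately.
    let sock = UdpSocket::bind("127.0.0.1:0")?;
    sock.connect(sock.local_addr()?)?;
    sock.send(b"one")?;
    sock.send(b"two")?;
    let raw = sock.as_raw_fd();

    let mut buf = [0u8; 64];

    // Pattern A: readiness first (PollAdd), then the actual Recv.
    // Two submissions and two completions per read.
    let poll = opcode::PollAdd::new(types::Fd(raw), POLLIN).build().user_data(1);
    unsafe { ring.submission().push(&poll).unwrap() };
    ring.submit_and_wait(1)?;
    let _ = ring.completion().next().expect("poll cqe");

    let recv = opcode::Recv::new(types::Fd(raw), buf.as_mut_ptr(), buf.len() as u32)
        .build()
        .user_data(2);
    unsafe { ring.submission().push(&recv).unwrap() };
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("recv cqe");
    println!("poll+recv got {} bytes", cqe.result());

    // Pattern B: submit the Recv directly and let the kernel's internal
    // fast poll handle readiness. One submission, one completion.
    let recv = opcode::Recv::new(types::Fd(raw), buf.as_mut_ptr(), buf.len() as u32)
        .build()
        .user_data(3);
    unsafe { ring.submission().push(&recv).unwrap() };
    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("recv cqe");
    println!("direct recv got {} bytes", cqe.result());

    Ok(())
}
```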

@vlovich
Contributor

vlovich commented Mar 15, 2024

@bryandmc do you know if this impacts disk I/O performance at all or if this is just issues in the net stack?

@bryandmc
Collaborator

@vlovich from an io_uring perspective, we already do most (if not all?) of the things that ensure fast disk reads. Additional performance could be gained through profiling, etc., but unlike the net stack, I don't think there are features we have "left on the table" that we aren't currently using. Because of that, I would defer to @glommer's explanation, which is just that it hasn't been optimized at all. Probably some easy performance wins for anyone with a little time and a profiler.
