-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelism Compensation for CoroutineDispatcher and runBlocking #4132
base: master
Are you sure you want to change the base?
Parallelism Compensation for CoroutineDispatcher and runBlocking #4132
Conversation
…utineScheduler.Worker.currentTask
To return to the problem in #4084 (comment): it doesn't look like we're shielded from similar scenarios by this PR, correct? The specific problem of a thread holding a lock not being able to wake up won't happen, and val dispatcher = Dispatchers.IO.softLimitedParallelism(Runtime.getRuntime().availableProcessors()) // <-- changed
fun doTheThing() {
runBlocking(dispatcher) { // <-- changed
repeat(100) {
launch {
getService() // and do some stuff with it
}
}
}
}
class Service
suspend fun <T> initializeService(serviceClass: Class<T>): T = TODO() // does not interact with the monitor in any way
private val serviceHolder = lazy(mode = LazyThreadSafetyMode.SYNCHRONIZED) {
runBlocking(dispatcher) { // <-- changed
initializeService(Service::class.java)
}
}
fun getService(): Service = serviceHolder.value In general, it is possible that all threads available to us will get stuck in a deadlock because all the new threads that That said, the general idea of having soft limits in addition to hard limits is appealing. It's true that
That said, without a more clear rule as to which dispatchers can extend their limits, which can't, in what conditions it happens and why, this behavior is too irregular and "magical" to include. We have to carefully consider why exactly this works (and whether it can be improved to also cover more aspects of the general case of new threads being allocated to compensate for blocked ones but used for things that don't help resolve blocking), formulate clear and predictable rules, and then implement them. It doesn't mean that the rules have to be conceptually simple, or they won't be implemented, though. Just look at "prompt cancellation" for an example of a complex rule that has to be internalized (and is much tougher to grasp than the comfortable confines of linearizability and atomicity) that we still managed to introduce to the library because this was actually useful. The bottom line is, I think this version is too conceptually fuzzy to include as is, but I think we should give this approach a chance, carefully consider it, and maybe we can make it work. In any case, thanks for this nice idea! |
I agree with you, but would like to share my thoughts on this problem in general. I don't have any formal claims for the following, but it seems plausible to me. Regarding the problem with synchronized, if we set a goal to build an Ultimate Scheduler Without Starvation At All, then we won't be able to achieve it without the full control over the runtime, because we must be able to react to any call that blocks the thread. This way we'll probably end up reinventing Golang (or Loom, which still hasn't fixed the issue with pinning on object monitors as of today). Maybe Kotlin/Native may provide additional control here though. Otherwise, we can only hope to provide a "higher order approximations" of that ideal solution. In a sense, this particular patch widens the space of starvation-free Kotlin programs with coroutines by allowing to use But yes, this patch as it is doesn't look like something that should be in the library and the idea probably can be refined further. Thanks for taking a look on this! |
And I, in turn, agree with you, but want to highlight that, in a library, predictable behavior is as important as good behavior. Tiny changes to user code shouldn't lead to outsized and strange effects. |
This is a follow-up on #4084 and #3983 and an implementation of a different approach.
In #4084 the main idea was to try to preserve the parallelism by transferring the computational permits from
runBlocking
thread and then returning it back. It failed because there are scenarios when we should not let go the permit at all, otherwise we may introduce a deadlock.In this solution we implement a parallelism compensation mechanism: the main difference with the previous patch is that now
runBlocking
does not release the computational permit and instead increases the allowed parallelism limit of the associated coroutine dispatcher for the duration of the park() (i.e., while it has nothing to process). An important detail here is that while it is easy to increase the parallelism limit, it is difficult to decrease it back and at the same time make sure that the effective parallelism stays within the limit. In this solution we allow coroutine dispatchers to exceed the instantaneous parallelism limit with the property that they take care of eventually adjusting the effective parallelism to fit to the limit.runBlocking
basically can increase the parallelism limit, but it can only request to reduce the parallelism limit back.Parallelism compensation may potentially cause an unlimited inflation of the scheduler's thread pool, and in this case we are willing to trade thread starvation for OOMs.
It is easy to notice that parallelism compensation cannot be implemented for
LimitedDispatcher
, because it would break its contract on "hard" parallelism limit. So parallelism compensation is not a general property of coroutine dispatchers, but we can for sure implement it forDispatchers.Default
andDispatchers.IO
. As an alternative to.limitedParallelism
,.softLimitedParallelism
is introduced which supports parallelism compensation.Performance considerations:
park()
is that while the thread ofrunBlocking
is parked, it is not actually using its permit, meaning that its thread is not executing and is not taking up a CPU (of course, it is not exactly so, but should be close enough to reality). So with compensation, the "factual" parallelism is expected to be close to the target parallelism.CoroutineScheduler
works like this: the worker holding a CPU permit checks if it should let go the permit after it completes each CPU task. If there are pending requests for decompensation, it releases its permit.The "hot path" for CPU workers now includes checking the
cpuDecompensationRequests
atomic, which is expected to have non-zero value rarely. It is expected to change rather infrequently and thus should have negligible impact on performance in "good" case when there are norunBlocking
s running on dispatcher threads.runBlocking
-heavy" case: before increasing the parallelism limit it first checks if it can instead remove one decompensation request. If it succeeds, it means that some CPU worker would just continue to run instead of first releasing the permit and then reacquiring it back (not always exactly so, but it is a good example).It is not a goal though to make "
runBlocking
-heavy" case performant, if code for some reason ends up in such situation, it is kinda already can't be (the most) performant.One of the potential issues I see with this approach is that the theoretically unlimited inflation of the coroutine scheduler thread pool caused by
runBlocking
(even if it happens only occasionally) may hit performance of CPU workers due to longer searches in task stealing.I ran the existing benchmarks and didn't see anything suspicious in the results, but I'm not a jmh guru. Probably it requires some additional research.
The main development branch is here, this one I prepared specifically for MR. I don't expect this to be a real MR to be merged, rather just a reference implementation.