Optimal way to write sparse console output #16208

alabuzhev · 2023-10-21T14:33:08Z

TL;DR: an app needs to update various portions of the screen from time to time, typically due to some user actions. To avoid flickering and improve performance, the app uses double buffering, i.e. all writes go to a memory buffer, which is then flushed to the console periodically. The changed areas are not necessarily contiguous or adjacent, so either multiple host calls must be made or extra areas have to be updated, up to and including the whole screen. The question is about finding the right balance between the number of host calls and the number of screen cells to update.

Let's say we have this nice default state initially:

And need to draw this (red areas represent some generic colored text the user wants to see):

We can traverse the buffer line by line, find modified blocks and send each of them to the host via WriteConsoleOutput or WriteConsole with some VT seasoning:
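To make the "find modified blocks" step concrete, here is a minimal sketch in Python. It assumes the screen is modeled as a list of equal-length strings, one per row, and compares the previous frame with the new one. The name `dirty_runs` and the `(row, start, end)` tuple layout are made up for illustration and are not taken from any real console API.

```python
def dirty_runs(old, new):
    """Return (row, start, end) for every maximal run of changed cells.

    `old` and `new` are lists of equal-length strings (one per screen row).
    `end` is exclusive. This is an illustrative model, not a console API.
    """
    runs = []
    for y, (old_row, new_row) in enumerate(zip(old, new)):
        x = 0
        width = len(new_row)
        while x < width:
            if old_row[x] != new_row[x]:
                start = x
                # Extend the run while consecutive cells keep differing.
                while x < width and old_row[x] != new_row[x]:
                    x += 1
                runs.append((y, start, x))
            else:
                x += 1
    return runs


if __name__ == "__main__":
    before = ["aaaaa", "bbbbb"]
    after = ["aXXaa", "bbbbY"]
    print(dirty_runs(before, after))
```

Each resulting run would then map to one host call (or one rectangle), which is where the call count comes from.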

The number of such calls, however, will be quite high: 13 in this case and likely up to a hundred in real-world scenarios.
We can optimize it by merging the adjacent rectangles:

The number of calls is still relatively high, because perfectly aligned blocks are rare. To decrease it further, we can increase tolerance and include some unmodified cells to merge more, resulting in 5 larger rectangles and 5 host calls:
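The tolerance idea can be sketched as follows. This is a simplified, single-row version: it merges horizontal runs on the same row when the gap of unmodified cells between them is at most `max_gap`, and it does not attempt the vertical merging into rectangles described above. The `(row, start, end)` tuples (with exclusive `end`) and the name `merge_runs` are assumptions for the example.

```python
def merge_runs(runs, max_gap):
    """Merge same-row runs separated by at most `max_gap` unmodified cells.

    `runs` is an iterable of (row, start, end) tuples with exclusive `end`.
    With max_gap=0 only directly adjacent runs are merged; a larger value
    trades extra rewritten cells for fewer output calls.
    """
    merged = []
    for row, start, end in sorted(runs):
        if merged and merged[-1][0] == row and start - merged[-1][2] <= max_gap:
            prev_row, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_row, prev_start, max(prev_end, end))
        else:
            merged.append((row, start, end))
    return merged


if __name__ == "__main__":
    runs = [(0, 1, 3), (0, 5, 7), (0, 20, 22), (1, 0, 2)]
    print(merge_runs(runs, max_gap=2))
```

Setting `max_gap` to the row width degenerates into "update the whole dirty span of the line", so the same knob covers most of the spectrum between the options discussed here.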

Going further, we can say "to hell with it, if there are any changes we're updating the whole line", and get something like:

And, ultimately, we can redraw the whole screen in one call if there are any changes anywhere without all this nonsense:

The question is, which way is ultimately cheaper and preferable: "more calls, less data" or "fewer calls, more data"?

I tried to measure it of course, but the differences are not pronounced enough to draw a conclusion (except that redrawing the whole screen apparently isn't as bad as one might expect).

Perhaps you have some internal stats, metrics, observations or general guidelines?

P.S. I suppose these days all the initial blocks can be sent in just one WriteConsole call with multiple VT cursor positioning instructions, but for older OSes we still need to support the WriteConsoleOutput path, which only works with individual rectangles.

lhecker · 2023-10-30T15:39:54Z

Maintainer

big.txt has 6'488'666 characters. When I print it all at once with WriteFile on my Windows 11 machine (build 25983), conhost takes 0.18s (35MB/s). Printing it character-by-character takes ~90s (72kB/s). It should be in the same order of magnitude all the way down to Windows XP, even on worse HW. In any case, this means that the overhead per call is somewhere around 90s/6488666 = 14us and that the overhead per character is around 0.18s/6488666 = 0.028us. I haven't tested WriteConsoleOutput but I suspect that it performs about the same.

In other words, if you know you need to print at least once (~14us), then a second, separate print (~28us total) is only cheaper if it avoids rewriting at least 14us/0.028us = 500 unrelated characters. I don't think dirtying too many cells is a concern, because GDI honestly isn't that slow and realistically most people have a GPU.
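The arithmetic above can be written out as a small Python sketch. The constants are the measurements quoted in this comment; the helper `should_merge` and its threshold rule are only an illustration of the break-even reasoning, and the numbers will differ on other machines and hosts.

```python
# Measurements quoted above (one machine, conhost, WriteFile).
CHARS = 6_488_666
ALL_AT_ONCE_S = 0.18      # one call for the whole file
CHAR_BY_CHAR_S = 90.0     # one call per character

PER_CALL_US = CHAR_BY_CHAR_S / CHARS * 1e6   # roughly 14 us per call
PER_CHAR_US = ALL_AT_ONCE_S / CHARS * 1e6    # roughly 0.028 us per character
BREAK_EVEN_CHARS = PER_CALL_US / PER_CHAR_US  # roughly 500 characters


def should_merge(gap_cells):
    """Hypothetical rule: rewriting the gap is cheaper than a second call
    as long as the gap costs less than one call's overhead."""
    return gap_cells * PER_CHAR_US < PER_CALL_US


if __name__ == "__main__":
    print(f"per call: {PER_CALL_US:.2f} us, per char: {PER_CHAR_US:.4f} us")
    print(f"break-even gap: about {BREAK_EVEN_CHARS:.0f} characters")
```

Under these numbers, two dirty regions separated by fewer than about 500 unrelated cells are cheaper to write as one block than as two separate calls.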

If VT support is available however, I'd always use CUP sequences, because they're comparatively very cheap and can draw disjoint regions in a single call.
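As a rough illustration of the CUP approach, the sketch below builds a single string that positions the cursor before each disjoint run and then emits that run's text, so the whole update can go out in one write. CUP is `ESC [ row ; col H` with 1-based coordinates. The frame model (a list of strings) and the `(row, start, end)` run tuples are assumptions for the example, and attributes/colors are left out.

```python
ESC = "\x1b"


def build_frame(new, runs):
    """Concatenate CUP sequences and text for all dirty runs into one string.

    `new` is a list of strings (one per screen row); `runs` holds
    (row, start, end) tuples with 0-based coordinates and exclusive `end`.
    """
    parts = []
    for row, start, end in runs:
        # CUP uses 1-based row and column numbers.
        parts.append(f"{ESC}[{row + 1};{start + 1}H")
        parts.append(new[row][start:end])
    return "".join(parts)


if __name__ == "__main__":
    import sys

    screen = ["hello", "world"]
    payload = build_frame(screen, [(0, 1, 3), (1, 4, 5)])
    # One write for all regions; on Windows this needs VT processing
    # enabled on the output handle.
    sys.stdout.write(payload)
    sys.stdout.flush()
```

Each extra region costs only a handful of bytes for the cursor move, which is why this compares so favorably with a separate call per region.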
