
ChainDB: let the BlockFetch client add blocks asynchronously #2721

Closed · wants to merge 2 commits

Conversation


@mrBliss (Contributor) commented Nov 2, 2020

Fixes IntersectMBO/ouroboros-consensus#655.

Currently, the effective queue size when adding blocks to the ChainDB is 1 (for
why, see IntersectMBO/ouroboros-consensus#655). In this commit, we let the BlockFetch client add blocks fully
asynchronously to the ChainDB, which restores the effective queue size to the
configured value again, e.g., 10.

The BlockFetch client will no longer wait until the block has been written to
the VolatileDB (and thus also not until the block has been processed by chain
selection). The BlockFetch client can just hand over the block and continue
downloading with minimum delay. To make this possible, we change the behaviour
of `getIsFetched` and `getMaxSlotNo` to account for the blocks in the queue;
otherwise, the BlockFetch client might try to redownload already-fetched blocks.
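
As a rough sketch of the mechanism (the names `BlocksToAdd` and `addBlockAsync` here are illustrative, not necessarily the PR's exact API): the BlockFetch client enqueues the block into a bounded queue and returns immediately, while a single background thread drains the queue into the VolatileDB.

```haskell
import Control.Concurrent.STM
import Numeric.Natural (Natural)

-- | Bounded queue of blocks waiting to be written to the VolatileDB.
newtype BlocksToAdd blk = BlocksToAdd (TBQueue blk)

newBlocksToAdd :: Natural -> IO (BlocksToAdd blk)
newBlocksToAdd capacity = BlocksToAdd <$> newTBQueueIO capacity

-- | Called by the BlockFetch client: hand over the block and continue.
-- This only blocks when the queue is at capacity, so the effective queue
-- size is the configured capacity again.
addBlockAsync :: BlocksToAdd blk -> blk -> IO ()
addBlockAsync (BlocksToAdd q) blk = atomically $ writeTBQueue q blk
```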

This is an alternative to #2489, which let the BlockFetch client write blocks to
the VolatileDB synchronously. The problem with that approach is that multiple
threads are writing to the VolatileDB, instead of a single background thread. We
have relied on the latter to simplify the VolatileDB w.r.t. consistency after
incomplete writes.

@mrBliss added the `consensus` label (issues related to ouroboros-consensus) on Nov 2, 2020
@mrBliss (author) commented Nov 2, 2020

This is a draft because the impact on bulk sync speed still needs to be measured.

@mrBliss force-pushed the mrBliss/add-blocks-fully-async branch 2 times, most recently from 6e98685 to aa1413e on November 3, 2020 07:58
```haskell
-- A point counts as fetched if it is in the queue of blocks to add
-- or already in the VolatileDB.
return $ \pt ->
  case pointToWithOriginRealPoint pt of
    Origin        -> False
    NotOrigin pt' -> checkBlocksToAdd pt' || checkVolDb (realPointHash pt')
```
@mrBliss commented Nov 3, 2020

Race condition: when a block has been removed from the queue but not yet written to the VolatileDB, this will return False. Possible solution: in the background thread, peek the block, process it, and only then remove it from the queue. Add a comment that these two checks are not disjoint.

cc: @edsko.
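
A sketch of that possible solution, reusing the illustrative `BlocksToAdd` queue from above: because the block is only removed after the VolatileDB write completes, it stays visible via the queue or via the VolatileDB at all times (the two checks overlap rather than being disjoint).

```haskell
import Control.Concurrent.STM
import Control.Monad (forever)

backgroundThread :: BlocksToAdd blk -> (blk -> IO ()) -> IO ()
backgroundThread (BlocksToAdd q) writeToVolatileDB = forever $ do
    blk <- atomically $ peekTBQueue q  -- peek: block stays visible in the queue
    writeToVolatileDB blk              -- now also visible in the VolatileDB
    _   <- atomically $ readTBQueue q  -- remove only after the write completed
    pure ()
```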

A contributor replied:

Good thing someone is paying attention 🙄

@mrBliss force-pushed the mrBliss/add-blocks-fully-async branch from c5b3a28 to 2defb1f on November 3, 2020 09:47
@mrBliss force-pushed the mrBliss/add-blocks-fully-async branch from 2defb1f to 5299069 on November 3, 2020 09:49
@amesgen (Member) commented Oct 25, 2024

Routine cleanup of stale PRs targeting Consensus components (which nowadays live in https://github.com/IntersectMBO/ouroboros-consensus).

No work is lost, see https://stackoverflow.com/a/17954767.

@amesgen closed this on Oct 25, 2024
@amesgen deleted the mrBliss/add-blocks-fully-async branch on October 25, 2024 09:05
github-merge-queue bot pushed a commit to IntersectMBO/ouroboros-consensus that referenced this pull request on Jan 8, 2025:
Integrates a new implementation of the BulkSync mode, where blocks are
downloaded from alternative peers as soon as the node has no more blocks
to validate while there are longstanding requests in flight.

This PR depends on the new implementation of the BulkSync mode
(IntersectMBO/ouroboros-network#4919).
`cabal.project` is made to point to a back-port of the BulkSync
implementation on `ouroboros-network-0.16.1.1`.

### CSJ Changes

CSJ is involved because the new BulkSync mode requires changing the dynamo if
it is also serving blocks but is not sending them promptly enough. The dynamo
choice influences which blocks BlockFetch chooses to download.

To this end, b93c379 adds the ability to order the ChainSync clients so that
the dynamo role can be rotated among them whenever BlockFetch requests it.

b1c0bf8 provides the implementation of the rotation operation.
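
A minimal sketch of such a rotation (illustrative types; the real operation acts on the collection of ChainSync client handles):

```haskell
import Data.Sequence (Seq, ViewL (..), viewl, (|>))

-- | Move the current dynamo (the front of the order) to the back, so
-- that the next peer in the order becomes the new dynamo.
rotateDynamo :: Seq peer -> Seq peer
rotateDynamo peers = case viewl peers of
    EmptyL    -> peers
    p :< rest -> rest |> p
```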

### BlockFetch tests

c4bfa37 allows tests to specify the order in which peers are started, which
affects which peer is chosen as the initial dynamo.

c594c09 in turn adds a new BlockFetch test showing that syncing isn't slowed
down by peers that don't send blocks.

### Integration of BlockFetch changes

The collection of ChainSync client handles now needs to be passed
between BlockFetch and ChainSync so dynamo rotations can be requested by
BlockFetch.

The parameter `bfcMaxConcurrencyBulkSync` has been removed, since blocks are
no longer downloaded concurrently in a coordinated manner.

These changes are in 6926278.

### ChainSel changes

BlockFetch now requires the ability to detect whether ChainSel has run out of
blocks to validate. This motivates 73187ba, which implements a mechanism to
detect when ChainSel is starved (waiting for more blocks) and to measure for
how long.

The above change alone is not sufficient to measure starvation. The queue used
to send blocks for validation allowed only one block to sit in it at a time.
This interfered with measuring starvation, since BlockFetch would block
waiting for the queue to become empty, and the queue would become empty again
right after a single block was taken. For download delays to be amortized, a
larger queue capacity was needed. This is why a fix similar to
IntersectMBO/ouroboros-network#2721 is part of
0d3fc28.
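
A sketch of such a starvation measure, with hypothetical names: ChainSel records when it starts waiting for blocks, and BlockFetch can query whether and for how long it has been starved.

```haskell
import Control.Concurrent.STM
import Data.Time (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- | Whether ChainSel is currently waiting for more blocks, and since when.
data ChainSelStarvation
  = NotStarved
  | StarvedSince UTCTime

-- Called by ChainSel when its queue of blocks to validate runs dry.
reportStarvation :: TVar ChainSelStarvation -> IO ()
reportStarvation var = do
    now <- getCurrentTime
    atomically $ modifyTVar var $ \st -> case st of
      NotStarved     -> StarvedSince now
      StarvedSince t -> StarvedSince t  -- keep the original start time

-- Called by BlockFetch to ask whether and for how long ChainSel is starved.
starvationDuration :: TVar ChainSelStarvation -> IO (Maybe NominalDiffTime)
starvationDuration var = do
    now <- getCurrentTime
    st  <- readTVarIO var
    pure $ case st of
      NotStarved     -> Nothing
      StarvedSince t -> Just (now `diffUTCTime` t)
```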

### Miscellaneous fixes

#### CSJ jump size adjustment

When syncing from mainnet, we discovered that CSJ wouldn't sync the
blocks from the Byron era. This was because the jump size was set to the
length of the genesis window of the Shelley era, which is much larger
than Byron's. When the jump size is larger than the genesis window, the
dynamo will block on the forecast horizon before offering a jump that
allows the chain selection to advance. In this case, CSJ and chain
selection will deadlock.

For this reason, we set the default jump size to the size of Byron's genesis
window in 028883a. This didn't show an impact on syncing time in our
measurements. Future work (as part of deploying Genesis) might involve
allowing the jump size to vary between eras.

#### GDD rate limit

GDD evaluation showed an overhead of 10% when run after every header arrival
via ChainSync. Therefore, in b7fa122 we limited how often it can run, so that
multiple header arrivals can be handled by a single GDD
evaluation.
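
A sketch of the rate-limiting idea (hypothetical names): header arrivals set a trigger, and a single loop runs GDD at most once per interval, so a burst of headers is coalesced into one evaluation.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

-- | Run 'runGdd' only when at least one header has arrived since the
-- previous run, and at most once per 'intervalMicros' microseconds.
gddLoop :: TVar Bool -> IO () -> Int -> IO ()
gddLoop trigger runGdd intervalMicros = forever $ do
    atomically $ readTVar trigger >>= check  -- wait for a header arrival
    atomically $ writeTVar trigger False     -- consume the trigger
    runGdd                                   -- one evaluation covers the burst
    threadDelay intervalMicros               -- rate-limit subsequent runs
```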

#### Candidate fragment comparison in the ChainSync client

We stumbled upon a test case where the candidate fragments of the dynamo
and an objector were no longer than the current selection (both peers
were adversarial). This was problematic because BlockFetch would refuse
to download blocks from these candidates, and ChainSync in turn would
wait for the selection to advance in order to download more headers.

The fix in e27a73c is to have the ChainSync client disconnect a peer
which is about to block on the forecast horizon if its candidate isn't
better than the selection.
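
Schematically (with a plain `Ordering` standing in for the chain-order comparison between the candidate fragment and the current selection):

```haskell
data DisconnectReason = CandidateNoBetterThanSelection
  deriving Show

-- | Run just before the ChainSync client would block on the forecast
-- horizon: if the candidate does not improve on the selection, there is
-- nothing to gain from this peer, so disconnect.
checkBeforeBlocking
  :: Ordering  -- ^ candidate compared against the selection, by chain order
  -> Either DisconnectReason ()
checkBeforeBlocking cmp
  | cmp /= GT = Left CandidateNoBetterThanSelection
  | otherwise = Right ()
```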

#### Candidate fragment truncations

At the moment, it is possible for a candidate fragment to be truncated
by CSJ when a jumper jumps to a point that is not younger than the tip
of its current candidate fragment. We encountered tests where the jump
point could be so old that it would fall behind the immutable tip, and
GDD would ignore the peer when computing the Limit on Eagerness. This in
turn would cause the selection to advance into potentially adversarial
chains.

The fix in dc5f6f7 is to have GDD never drop candidates. When the
candidate does not intersect the current selection, the LoE is not
advanced. This is a situation guaranteed to be unblocked by the
ChainSync client since it will either disconnect the peer or bring the
candidate to intersect with the current selection.
@Niols (Collaborator) commented Jan 8, 2025

Also, no work is lost because IntersectMBO/ouroboros-consensus#1179 contains a variant of this fix.

Labels: consensus (issues related to ouroboros-consensus)

Successfully merging this pull request may close: Chain DB: addBlock queue ineffective

4 participants