
fix(ledger): fuzz testing #452

Merged
merged 28 commits into main on Jan 15, 2025

Conversation

@dadepo (Contributor) commented Dec 18, 2024

No description provided.

@dadepo (Contributor, Author) commented Dec 19, 2024

```console
$ zig build fuzz -- ledger_rocksdb 10 1_000_000 > output.txt 2>&1
$ grep -c 'Put actions' output.txt
890
$ grep -c 'Delete actions' output.txt
889
$ grep -c 'Delete Files in Range actions' output.txt
925
$ grep -c 'getBytes actions' output.txt
10
$ grep -c 'get actions' output.txt
11
$ grep -c 'count actions' output.txt
997
$ grep -c 'contains actions' output.txt
11
$ grep -c 'Batch actions' output.txt
904
```

get, getBytes, and contains are an order of magnitude slower. Need to investigate.

Update: Running only get and getBytes leads to:

```console
$ grep -c 'get actions' output.txt
988
$ grep -c 'getBytes actions' output.txt
978
```

Update: This is related to `try batch.deleteRange(cf1, start, end);`. It seems the deleteRange call negatively impacts the get calls. This will be investigated in a follow-up PR.

@dadepo dadepo marked this pull request as ready for review December 20, 2024 17:13
@dadepo dadepo force-pushed the dade/ledger-fuzz-test branch from bcf058d to 06644b4 Compare December 20, 2024 17:20
@dadepo dadepo requested a review from dnut December 20, 2024 17:40
@dnut dnut linked an issue Jan 2, 2025 that may be closed by this pull request
@0xNineteen (Contributor) left a comment

One thing that's great about this is that it performs many actions in parallel, and if something errors it will break and catch it. However, there's no consistency check that the values which are read are what they should be, which I think we care about. Since we have multiple database backends, I think this should be simple to do: perform the same operations, and after each 'step' ensure they are returning equivalent values. The accountsdb fuzzer does this, which could be a reference.
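The equivalence check described above could look something like the following Python sketch. This is a toy illustration only: the `DictStore` class and its `put`/`delete`/`get` names are hypothetical stand-ins for the real database backends, not the sig API.

```python
import random

class DictStore:
    """Toy in-memory backend standing in for one database implementation."""
    def __init__(self):
        self.data = {}
    def put(self, k, v):
        self.data[k] = v
    def delete(self, k):
        self.data.pop(k, None)
    def get(self, k):
        return self.data.get(k)

def differential_fuzz(backends, seed, steps=1000):
    """Apply the same random action to every backend; after each read,
    assert that all backends returned equivalent values."""
    rng = random.Random(seed)
    for _ in range(steps):
        action = rng.choice(["put", "delete", "get"])
        key = rng.randrange(16)
        if action == "put":
            value = rng.randrange(1000)
            for b in backends:
                b.put(key, value)
        elif action == "delete":
            for b in backends:
                b.delete(key)
        else:
            results = [b.get(key) for b in backends]
            assert all(r == results[0] for r in results), \
                f"backends diverged on key {key}: {results}"

backends = [DictStore(), DictStore()]
differential_fuzz(backends, seed=42)
```

Because every backend receives the identical operation stream, any divergence on a read points at a bug in one of the implementations rather than in the fuzzer itself.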

(6 review-comment threads on src/ledger/fuzz.zig, outdated and resolved)
@dnut (Contributor) left a comment

Ideally a fuzzer would be deterministic and reproducible. When a fuzz test fails you should be able to use the same seed to trigger the same error and debug what went wrong. But in this case, the multithreaded behavior is random and not reproducible by using the same seed. I do see the value of identifying issues with concurrency though, and I'm not sure how to test that in a reproducible way. Any thoughts?

Also I noticed stdout is flooded very fast with a ton of text when running these fuzz tests. It might be good to slow that down a bit. Normally, for fuzz tests, I would use stdout to print one line for each discrete test case that can be individually reproduced using a particular seed that is included on that line. Something on the order of 1 log message per second seems reasonable.

This test only deals with the database implementation which is a pretty small part of the ledger. In the future we should write some fuzz tests for the ShredInserter and BlockstoreReader, which is where I would expect to see more problems than RocksDB for example.

(9 review-comment threads on src/ledger/fuzz.zig, resolved)
@dadepo dadepo requested review from 0xNineteen and dnut January 10, 2025 14:48
(4 review-comment threads on src/ledger/fuzz.zig, outdated and resolved)
0xNineteen previously approved these changes Jan 13, 2025
@0xNineteen (Contributor) left a comment

lgtm :shipit:

@dadepo (Contributor, Author) commented Jan 14, 2025

> Ideally a fuzzer would be deterministic and reproducible. When a fuzz test fails you should be able to use the same seed to trigger the same error and debug what went wrong. But in this case, the multithreaded behavior is random and not reproducible by using the same seed. I do see the value of identifying issues with concurrency though, and I'm not sure how to test that in a reproducible way. Any thoughts?

I think randomizing the order in which the methods are called is a good middle ground. It achieves the same objective as the multithreading.
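A seeded shuffle is one way to randomize the call order while staying reproducible. A minimal Python sketch (the action names here are placeholders, not the real method set):

```python
import random

# Hypothetical action names; in the real fuzzer these would be the
# database methods being exercised (put, get, deleteRange, ...).
ACTIONS = ["put", "get", "getBytes", "delete", "deleteRange", "batch"]

def iteration_order(seed):
    """Shuffle the call order deterministically: the same seed always
    yields the same order, so a failing run can be replayed exactly."""
    rng = random.Random(seed)
    order = ACTIONS[:]
    rng.shuffle(order)
    return order
```

Unlike thread scheduling, the shuffle is fully determined by the seed, so the interleaving that triggered a failure can be recovered from the log.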

> Also I noticed stdout is flooded very fast with a ton of text when running these fuzz tests. It might be good to slow that down a bit. Normally, for fuzz tests, I would use stdout to print one line for each discrete test case that can be individually reproduced using a particular seed that is included on that line. Something on the order of 1 log message per second seems reasonable.

This should be addressed now. I got the idea of logging every 1000 iterations from another fuzzer in the codebase, but I've dropped it now.

> This test only deals with the database implementation which is a pretty small part of the ledger. In the future we should write some fuzz tests for the ShredInserter and BlockstoreReader, which is where I would expect to see more problems than RocksDB for example.

That is indeed true, and it was partially intentional, since the mis-compilation issue was reproducible with only the ledger implementation. But I do agree other parts of the ledger should be included, and as you suggested that can be done in follow-up PRs.

@dnut (Contributor) commented Jan 14, 2025

> Also I noticed stdout is flooded very fast with a ton of text when running these fuzz tests. It might be good to slow that down a bit. Normally, for fuzz tests, I would use stdout to print one line for each discrete test case that can be individually reproduced using a particular seed that is included on that line. Something on the order of 1 log message per second seems reasonable.
>
> This should be addressed now. I got the idea for logging every 1000 count from another fuzzer in the codebase. But I've dropped it now.

You've removed all printing. I was suggesting that there should be discrete test cases that are actually printed.

Here's the problem I'm trying to address. I just ran the fuzzer for a long time. Maybe 20 minutes. The fuzzer eventually crashed due to an error in a test. I'd like to reproduce this, but the only way to do that is by running the fuzzer again for 20+ minutes. Identifying the cause of this failure will not be easy.

If instead the fuzzer ran many distinct and isolated test cases, and it printed the seed for each case, it would be easy to rerun a single one of those cases when they fail. You can look at the allocator fuzzer to see what I mean.

@dadepo (Contributor, Author) commented Jan 15, 2025

> If instead the fuzzer ran many distinct and isolated test cases, and it printed the seed for each case, it would be easy to rerun a single one of those cases when they fail. You can look at the allocator fuzzer to see what I mean.

The allocator fuzzer seems to have a different structure. It allows fuzzing two different implementations (disk or batch), or both, and provides the ability to select which to run.

This is different from the structure of the ledger fuzzing, which fuzzes various methods on the same ledger implementation. The pattern used for it is similar to what is used for the accounts db and gossip fuzzers.

> Here's the problem I'm trying to address. I just ran the fuzzer for a long time. Maybe 20 minutes. The fuzzer eventually crashed due to an error in a test. I'd like to reproduce this, but the only way to do that is by running the fuzzer again for 20+ minutes. Identifying the cause of this failure will not be easy.

I am not sure it is possible to shortcut this, as it may require that much time for the random interaction of the fuzzed methods to trigger the issue.

@dadepo (Contributor, Author) commented Jan 15, 2025

> I got the idea for logging every 1000 count from another fuzzer in the codebase. But I've dropped it now.

Checked again. This was from gossip_fuzz_service.

@dnut (Contributor) commented Jan 15, 2025

> The allocator fuzzer seems to have a different structure. It allows fuzzing two different implementations (disk or batch) or both and provides ability to select which to run.

This is not the relevant part of the fuzzer that I'm talking about. You can say the same thing about the ledger fuzzer allowing multiple databases. It just happens to deal with that abstraction at comptime whereas the allocator fuzzer does it at runtime. But this is a separate concern.

> This is different from the structure for the ledger fuzzing as it fuzzes various methods on the same ledger implementation. And the pattern used for it is similar to what is used for the accounts db, and gossip fuzzer.

This is exactly the same pattern followed by the allocator fuzzer. But that's also not relevant. You can structure any fuzzing approach as a sequence of discrete test cases.

> Here's the problem I'm trying to address. I just ran the fuzzer for a long time. Maybe 20 minutes. The fuzzer eventually crashed due to an error in a test. I'd like to reproduce this, but the only way to do that is by running the fuzzer again for 20+ minutes. Identifying the cause of this failure will not be easy.
>
> I am not sure if it is possible to shortcut this, as it may be that it requires that much time for the random interaction of the methods being fuzzed to trigger the issue.

I agree it's not going to be possible to shortcut that specific scenario. The idea is to create different scenarios: scenarios that are easily reproducible.


Currently you have something like this:

```zig
const random = rng(seed).random();
while (true) {
    runIteration(random);
}
```

Instead, it could look something like this:

```zig
while (true) {
    seed += 1;
    print("using seed: {}", .{seed});
    const random = rng(seed).random();
    for (0..test_case_size) |_| {
        runIteration(random);
    }
}
```

@dadepo (Contributor, Author) commented Jan 15, 2025

> Instead, it could look something like this:
>
> ```zig
> while (true) {
>     seed += 1;
>     print("using seed: {}", .{seed});
>     const random = rng(seed).random();
>     for (0..test_case_size) |_| {
>         runIteration(random);
>     }
> }
> ```

I am not sure I follow the intended benefit here. Given that the number of runs depends on test_case_size, what this modification does is add 1 to the provided seed and use that to run the iterations.

If the intention is to get to the seed that caused the issue faster, I am also not sure the proposal solves the original problem statement. If an error is triggered at seed 10, that does not make it easier to reproduce using seed 10 directly. This is because the fuzzer is stateful, and the error case that appeared at seed 10 might be caused by the nine previous seed runs.

Also, to be sure I am on the same page with the proposed improvement: is the current ledger fuzzer implementation similar in structure to existing fuzzers like accounts_db and gossip? And is the proposed modification something that would also need to be applied to the existing fuzzers? If so, what about discussing the proposed improvement separately (as I am still not sure I follow how it solves the identified problem), and then applying it to all the other fuzzers if it improves things?

@dnut (Contributor) commented Jan 15, 2025

It solves the problem if you reset the state for each test case. I was just giving a basic pseudocode example to express the general idea. I wasn't trying to write the code that solves the problem in its entirety. At that point I think it's more efficient to just make the change in the code, so I committed it already.

Regarding gossip and accountsdb, I do think those fuzzers are flawed and need to be reworked, but that's not in scope right now. I just think we should not be adding more fuzzers that have the same flaw.
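The reset-per-case idea (a fresh state plus a seeded RNG for each discrete case) is what makes a single seed sufficient to replay a failure. A Python sketch, illustrative only and not the sig code:

```python
import random

def run_test_case(seed, steps=100):
    """One discrete test case: fresh state plus a seeded RNG, so the
    seed alone is enough to replay the exact same case later."""
    rng = random.Random(seed)
    state = {}  # reset per case; nothing carries over between seeds
    for _ in range(steps):
        key = rng.randrange(8)
        if rng.random() < 0.5:
            state[key] = rng.randrange(100)  # simulate a put
        else:
            state.pop(key, None)             # simulate a delete
    return state

# Replaying a seed reproduces the same final state:
assert run_test_case(1234) == run_test_case(1234)
```

Because nothing carries over between cases, a failure logged under seed N can be reproduced by running that one case alone, without replaying the N-1 cases before it.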

dnut previously approved these changes Jan 15, 2025
@dnut dnut dismissed their stale review January 15, 2025 15:35

didn't realize old feedback was still not addressed

@dnut dnut enabled auto-merge (squash) January 15, 2025 15:43
@dadepo (Contributor, Author) commented Jan 15, 2025

@dnut I think the recent changes might not be working as intended.

Running

```console
$ zig build fuzz -Dblockstore=rocksdb -- ledger 10 10 -freference-traced
```

continually prints

```
action_name reached max actions: 10
action_name reached max actions: 10
action_name reached max actions: 10
```

until I kill the process.

Also the action_name bit is not providing the intended information on which action is being executed.

@dnut dnut merged commit fbc554e into main Jan 15, 2025
8 checks passed
@dnut dnut deleted the dade/ledger-fuzz-test branch January 15, 2025 15:57
@dnut (Contributor) commented Jan 15, 2025

My mistake. I thought you added this to CI. I'll open a pr to add it to CI and fix the loop issue.

On the diff in src/ledger/fuzz.zig:

```zig
// the method calls being fuzzed return expected data.
var data_map = std.AutoHashMap(u32, Data).init(allocator);
defer data_map.deinit();
for (0..1_000) |_| {
```

@dadepo (Contributor, Author) commented:

@dnut The benefit of this change is that the seed used is not just the one passed in. The observation, though, is that this fixes the iteration count for each seed at 1000, which may seem arbitrary.

It would probably be an improvement to either make this configurable (which would differ from the other fuzzers), or to not change the seed value like this but have a random value generated by CI.


Successfully merging this pull request may close these issues.

test(ledger): add fuzz tests