P99 read error threshold for 150K step in read load predefined test with tablets #9852

juliayakovlev · 2025-01-19T17:48:38Z

In the vnode read load test, P99 read latency at the 150K step is under 0.5 ms in every run.

In the tablets test, however, latency at this step is close to 2 ms in most runs. Do we want to change the error threshold for this step in the tablets test?

Packages

Scylla version: 2024.3.0~dev-20250103.bff28d159209 with build-id dcfa9cda493b7ac449fa7a955ee7b110a3d10fb6
Kernel Version: 6.8.0-1021-aws

Installation details

Cluster size: 3 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

perf-regression-predefined-steps-ub-db-node-df9c990a-3 (100.24.106.188 | 10.12.1.136) (shards: 14)
perf-regression-predefined-steps-ub-db-node-df9c990a-2 (3.235.59.208 | 10.12.3.10) (shards: 14)
perf-regression-predefined-steps-ub-db-node-df9c990a-1 (3.234.249.91 | 10.12.1.215) (shards: 14)

OS / Image: ami-06098414206b3c86c (aws: undefined_region)

Test: scylla-enterprise-perf-regression-predefined-throughput-steps-tablets
Test id: df9c990a-8caf-41fc-8bff-d7a1bbdb468f
Test name: scylla-enterprise/perf-regression/scylla-enterprise-perf-regression-predefined-throughput-steps-tablets
Test method: performance_regression_gradual_grow_throughput.PerformanceRegressionPredefinedStepsTest.test_read_gradual_increase_load
Test config file(s):

perf-regression-predefined-throughput-steps.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor df9c990a-8caf-41fc-8bff-d7a1bbdb468f
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs df9c990a-8caf-41fc-8bff-d7a1bbdb468f

Logs:

db-cluster-df9c990a.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/df9c990a-8caf-41fc-8bff-d7a1bbdb468f/20250105_090336/db-cluster-df9c990a.tar.gz
sct-runner-events-df9c990a.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/df9c990a-8caf-41fc-8bff-d7a1bbdb468f/20250105_090336/sct-runner-events-df9c990a.tar.gz
sct-df9c990a.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/df9c990a-8caf-41fc-8bff-d7a1bbdb468f/20250105_090336/sct-df9c990a.log.tar.gz
loader-set-df9c990a.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/df9c990a-8caf-41fc-8bff-d7a1bbdb468f/20250105_090336/loader-set-df9c990a.tar.gz
monitor-set-df9c990a.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/df9c990a-8caf-41fc-8bff-d7a1bbdb468f/20250105_090336/monitor-set-df9c990a.tar.gz

Jenkins job URL
Argus

roydahan · 2025-01-20T15:02:56Z

What is the max throughput in both cases?
If 150K represents almost the same percentage of max throughput in both cases (e.g. 150K is ~40% of max throughput in each), we need to report an issue for the latency.
If the max throughput is significantly lower in the tablets case, we need to report an issue for that.

Please note that both the "initial tablets number" issue and using the latest c-s or the latest driver can affect the tablets case...
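
The comparison described above can be sketched roughly as follows. This is only an illustration: the function name `triage`, the 10% `tolerance` for what counts as "significantly lower", and the max-throughput figures in the usage lines are all hypothetical and are not taken from the test results.

```python
def triage(step_ops, max_vnodes, max_tablets, tolerance=0.10):
    """Suggest which kind of issue to report for a given throughput step.

    step_ops    -- throughput of the step under test (ops/s), e.g. 150_000
    max_vnodes  -- measured max throughput of the vnode run (ops/s)
    max_tablets -- measured max throughput of the tablets run (ops/s)
    tolerance   -- assumed cutoff for a "significant" max-throughput drop
    """
    # Share of max throughput that the step represents in each case.
    share_vnodes = step_ops / max_vnodes
    share_tablets = step_ops / max_tablets
    # Relative drop in max throughput going from vnodes to tablets.
    drop = (max_vnodes - max_tablets) / max_vnodes
    if drop > tolerance:
        # Tablets saturates much earlier: the throughput itself is the issue.
        return "throughput", share_vnodes, share_tablets
    # Similar share of max throughput in both cases: the latency is the issue.
    return "latency", share_vnodes, share_tablets


# Hypothetical numbers, for illustration only.
print(triage(150_000, 375_000, 370_000)[0])
print(triage(150_000, 375_000, 250_000)[0])
```

With the placeholder numbers, the first call falls into the latency case and the second into the throughput case; the real max throughput of both runs has to be filled in before drawing any conclusion.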

juliayakovlev · 2025-01-22T08:05:24Z

https://github.com/scylladb/scylla-enterprise/issues/5174

juliayakovlev assigned fruch and roydahan Jan 19, 2025

github-actions bot assigned juliayakovlev Jan 19, 2025
