P99 read error threshold for 150K step in read load predefined test with tablets #9852

Open
2 tasks
juliayakovlev opened this issue Jan 19, 2025 · 2 comments

@juliayakovlev

In the vnode read load test, the latency of the 150K step is under 0.5 ms in every run.

In the tablets test, however, the latency for this step is close to 2 ms in most runs. Do we want to change the error threshold for this step in the tablets test?

Packages

Scylla version: 2024.3.0~dev-20250103.bff28d159209 with build-id dcfa9cda493b7ac449fa7a955ee7b110a3d10fb6
Kernel Version: 6.8.0-1021-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Installation details

Cluster size: 3 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • perf-regression-predefined-steps-ub-db-node-df9c990a-3 (100.24.106.188 | 10.12.1.136) (shards: 14)
  • perf-regression-predefined-steps-ub-db-node-df9c990a-2 (3.235.59.208 | 10.12.3.10) (shards: 14)
  • perf-regression-predefined-steps-ub-db-node-df9c990a-1 (3.234.249.91 | 10.12.1.215) (shards: 14)

OS / Image: ami-06098414206b3c86c (aws: undefined_region)

Test: scylla-enterprise-perf-regression-predefined-throughput-steps-tablets
Test id: df9c990a-8caf-41fc-8bff-d7a1bbdb468f
Test name: scylla-enterprise/perf-regression/scylla-enterprise-perf-regression-predefined-throughput-steps-tablets
Test method: performance_regression_gradual_grow_throughput.PerformanceRegressionPredefinedStepsTest.test_read_gradual_increase_load
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor df9c990a-8caf-41fc-8bff-d7a1bbdb468f
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs df9c990a-8caf-41fc-8bff-d7a1bbdb468f

Logs:

Jenkins job URL
Argus

@roydahan

What is the max throughput in each case?
If 150K represents almost the same percentage of max throughput in both cases (e.g. ~40% of max throughput in each), we need to report an issue for the latency.
If the max throughput is significantly lower in the tablets case, we need to report an issue for that instead.

Please note that the "initial tablets number" issue, as well as using the latest c-s or the latest driver, can affect the tablets case.
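
To make the comparison above concrete, here is a rough sketch of the triage logic in Python. It is not part of the test framework: the function name `triage`, its parameters, and the `tolerance` cutoff for what counts as "significantly lower" are all assumptions made for illustration, and the real max throughput numbers still need to come from the actual runs.

```python
def triage(step_ops, vnode_max_ops, tablets_max_ops, tolerance=0.10):
    """Suggest which kind of issue to report for a given load step.

    step_ops        -- throughput of the step under discussion (ops/s)
    vnode_max_ops   -- measured max throughput of the vnode run (ops/s)
    tablets_max_ops -- measured max throughput of the tablets run (ops/s)
    tolerance       -- assumed cutoff: a relative drop in max throughput
                       above this value is treated as "significantly lower"
    """
    # Share of max throughput that the step represents in each case.
    vnode_share = step_ops / vnode_max_ops
    tablets_share = step_ops / tablets_max_ops

    # Relative drop of the tablets max throughput versus vnodes.
    drop = (vnode_max_ops - tablets_max_ops) / vnode_max_ops

    if drop > tolerance:
        verdict = "throughput"  # tablets max throughput is the problem
    else:
        verdict = "latency"     # same relative load, so latency is the problem
    return vnode_share, tablets_share, verdict


# Placeholder numbers only, NOT measurements from this test.
print(triage(150_000, 375_000, 375_000))
```

With equal max throughput the step is the same share of the maximum in both cases, so the sketch points at a latency issue; if the tablets maximum is well below the vnode maximum, it points at a throughput issue.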

@juliayakovlev