version: 4.6.4
os: Ubuntu 16
cluster nodes: 6
used as the foundation storage for Loki
How is the cluster's performance? Is there anything that could be improved in the cluster? Thanks in advance.
I don’t understand the problem. Both your read and write latencies look excellent; I don’t see anything wrong here.
Can you please expand on what it is you would like to improve?
I used nodetool to analyze the cluster and found that the local read count on the top 2 nodes is very high, as follows:
But the other 4 nodes are normal:
Table: roles
SSTable count: 1
SSTables in each level: [1]
Space used (live): 5588
Space used (total): 5588
Space used by snapshots (total): 0
Off heap memory used (total): 2500
SSTable Compression Ratio: 1.03947
Number of partitions (estimate): 1
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 16
Local read latency: 0.631 ms
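For reference, per-table statistics like the ones above can be pulled with nodetool; a minimal sketch, assuming the standard tablestats subcommand (called cfstats on older releases):

# Print SSTable count, local read count/latency, memtable stats, etc.
# for system_auth.roles on the node this is run against.
nodetool tablestats system_auth.roles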
Why is that?
WRITES Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(loki_ingress:loki_ingress_index_19502) 39:fake:d19502:logs:host_ip 271 182
(loki_ingress:loki_ingress_index_19502) 06:fake:d19502:logs 264 166
(loki_ingress:loki_ingress_index_19502) 24:fake:d19502:logs:pod_name 258 182
(loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:host_ip 254 186
(loki_ingress:loki_ingress_index_19502) 04:fake:d19502:logs:app 253 182
(loki_ingress:loki_ingress_index_19502) 35:fake:d19502:logs 251 186
(loki_ingress:loki_ingress_index_19502) 68:fake:d19502:logs:cluster 250 177
(loki_cluster:loki_cluster_index_19502) 36:fake:d19502:logs:project_name 250 177
(loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:region 250 182
(loki_cluster:loki_cluster_index_19502) 37:fake:d19502:logs:host_ip 250 186
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(system_auth:roles) loki_cluster 2514 3
(system_auth:role_permissions) loki_cluster 840 3
(loki_ingress:loki_ingress_index_19502) XJfwfiLs+ChyIots/lMsA1u3kBzqcOZhbXFRdKExZRo 13 12
(loki_ingress:loki_ingress_index_19502) DoddOMPdUq8qacga5dmT+Eizo+3nU7FS7GdpmNMVzxM 13 12
(loki_ingress:loki_ingress_index_19502) BHyQ/bLGQ+OHBjWn2d/+6XHecebF3YOQB7O1vHcf+QA 13 12
(loki_ingress:loki_ingress_index_19502) znaTja8BYdthKvleV8nXWmMd2WqSqSxCxCqfO1cRjbg 13 12
(loki_ingress:loki_ingress_index_19502) Pqz27XaXVywaGDZLRgsDqX3NFC+yxDt8CEci3wOW8Yg 13 12
(loki_ingress:loki_ingress_index_19502) 9p12QBPRrBoTt2QEVxFTl3IgNESw3q7+Gd36wvy1Piw 13 12
(loki_ingress:loki_ingress_index_19502) /wJ5x7ruQrVLUgO+ySEDw1ZLMnTDXy8741vJE4GfNXk 13 12
(loki_ingress:loki_ingress_index_19502) 1MgAXkgL4pxmMuFvOIe6WShU91w7E7NCxc1Gq8Plb+k 13 12
The imbalance is simply due to the low cardinality of the system_auth.* tables. These tables have only a handful of entries, and whichever shard owns one (or more) of those few entries will see all of the traffic aimed at that table. You can see this from the fact that at the coordinator level your requests are well balanced; the imbalance is entirely on the replica side.
I think there is nothing to worry about here. The only reason this imbalance is even visible is that your cluster is very lightly loaded. If the imbalance persists under higher load, or becomes more pronounced, that might indicate a problem.
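You can confirm the low cardinality yourself by listing the contents of the auth tables; a minimal sketch, assuming cqlsh access and the default system_auth schema:

# system_auth.roles holds roughly one row per role, so the whole
# table is concentrated on the few replicas that own those partitions.
cqlsh -e "SELECT role FROM system_auth.roles;"
cqlsh -e "SELECT role, resource FROM system_auth.role_permissions;"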
The toppartitions result is inaccurate (look at the +/- column). You can increase the accuracy with the -s (capacity) parameter, which defaults to 256. Increase it and try again until the error drops to a reasonable level.
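For example, a sketch assuming the Cassandra-style toppartitions syntax (the keyspace, table, and sampling duration below are illustrative):

# Sample loki_ingress.loki_ingress_index_19502 for 10000 ms with a
# sampler capacity of 2048 instead of the default 256; a larger
# capacity shrinks the +/- error column at the cost of more memory.
nodetool toppartitions -s 2048 loki_ingress loki_ingress_index_19502 10000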