version: 4.6.4
os: Ubuntu 16
cluster nodes: 6
used as the foundation storage for Loki
How is the cluster's performance? Is there anything that could be improved in the cluster? Thanks in advance.
I don’t understand the problem. Both your read and write latencies look excellent; I don’t see anything wrong here.
Can you please expand on what it is you would like to improve?
I used nodetool to analyze the cluster and found that the local read count on the top 2 nodes is very high, as follows:
But the other 4 nodes are normal:
Table: roles
SSTable count: 1
SSTables in each level: [1]
Space used (live): 5588
Space used (total): 5588
Space used by snapshots (total): 0
Off heap memory used (total): 2500
SSTable Compression Ratio: 1.03947
Number of partitions (estimate): 1
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 16
Local read latency: 0.631 ms
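For reference, per-table statistics like the ones above can be pulled with nodetool; a minimal sketch, assuming the standard tablestats subcommand (called cfstats on older releases):

# Print SSTable count, local read count/latency, memtable stats, etc.
# for system_auth.roles on the node this is run against.
nodetool tablestats system_auth.roles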
Why is that?
WRITES Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(loki_ingress:loki_ingress_index_19502) 39:fake:d19502:logs:host_ip 271 182
(loki_ingress:loki_ingress_index_19502) 06:fake:d19502:logs 264 166
(loki_ingress:loki_ingress_index_19502) 24:fake:d19502:logs:pod_name 258 182
(loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:host_ip 254 186
(loki_ingress:loki_ingress_index_19502) 04:fake:d19502:logs:app 253 182
(loki_ingress:loki_ingress_index_19502) 35:fake:d19502:logs 251 186
(loki_ingress:loki_ingress_index_19502) 68:fake:d19502:logs:cluster 250 177
(loki_cluster:loki_cluster_index_19502) 36:fake:d19502:logs:project_name 250 177
(loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:region 250 182
(loki_cluster:loki_cluster_index_19502) 37:fake:d19502:logs:host_ip 250 186
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(system_auth:roles) loki_cluster 2514 3
(system_auth:role_permissions) loki_cluster 840 3
(loki_ingress:loki_ingress_index_19502) XJfwfiLs+ChyIots/lMsA1u3kBzqcOZhbXFRdKExZRo 13 12
(loki_ingress:loki_ingress_index_19502) DoddOMPdUq8qacga5dmT+Eizo+3nU7FS7GdpmNMVzxM 13 12
(loki_ingress:loki_ingress_index_19502) BHyQ/bLGQ+OHBjWn2d/+6XHecebF3YOQB7O1vHcf+QA 13 12
(loki_ingress:loki_ingress_index_19502) znaTja8BYdthKvleV8nXWmMd2WqSqSxCxCqfO1cRjbg 13 12
(loki_ingress:loki_ingress_index_19502) Pqz27XaXVywaGDZLRgsDqX3NFC+yxDt8CEci3wOW8Yg 13 12
(loki_ingress:loki_ingress_index_19502) 9p12QBPRrBoTt2QEVxFTl3IgNESw3q7+Gd36wvy1Piw 13 12
(loki_ingress:loki_ingress_index_19502) /wJ5x7ruQrVLUgO+ySEDw1ZLMnTDXy8741vJE4GfNXk 13 12
(loki_ingress:loki_ingress_index_19502) 1MgAXkgL4pxmMuFvOIe6WShU91w7E7NCxc1Gq8Plb+k 13 12
The imbalance is simply due to the low cardinality of the system_auth.* tables. These tables have only a handful of entries, and whichever shard owns one (or more) of those few entries will see all of the traffic aimed at that table. You can see this from the fact that at the coordinator level your requests are well balanced; the imbalance is entirely on the replica side.
I think there is nothing to worry about here. The only reason this imbalance is even visible is that your cluster is very lightly loaded. If the imbalance persists under higher load, or becomes more pronounced, that might indicate a problem.
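You can confirm the low cardinality yourself by listing the contents of the auth tables; a minimal sketch, assuming cqlsh access and the default system_auth schema:

# system_auth.roles holds roughly one row per role, so the whole
# table is concentrated on the few replicas that own those partitions.
cqlsh -e "SELECT role FROM system_auth.roles;"
cqlsh -e "SELECT role, resource FROM system_auth.role_permissions;"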
The toppartitions result is inaccurate (look at the +/- column). You can increase the accuracy with the -s (capacity) parameter, which defaults to 256. Increase it and try again until the error drops to a reasonable level.
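For example, a sketch assuming the Cassandra-style toppartitions syntax (the keyspace, table, and sampling duration below are illustrative):

# Sample loki_ingress.loki_ingress_index_19502 for 10000 ms with a
# sampler capacity of 2048 instead of the default 256; a larger
# capacity shrinks the +/- error column at the cost of more memory.
nodetool toppartitions -s 2048 loki_ingress loki_ingress_index_19502 10000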