Performance issue? Monitoring dashboard

os: Ubuntu 16
cluster nodes: 6

used as the backend storage for Loki

How is the cluster performing? Is there anything I can improve? Thanks in advance.

I don’t understand the problem. Both your read and write latencies look excellent. I don’t see anything wrong here.

Can you please expand on what it is you would like to improve on?

Thanks.
The reads-per-instance metric looks unbalanced. How can I avoid that?

I used nodetool to analyze this and found that the local read count on the top 2 nodes is very high, while the other 4 nodes are normal:

                Table: roles
                SSTable count: 1
                SSTables in each level: [1]
                Space used (live): 5588
                Space used (total): 5588
                Space used by snapshots (total): 0
                Off heap memory used (total): 2500
                SSTable Compression Ratio: 1.03947
                Number of partitions (estimate): 1
                Memtable cell count: 0
                Memtable data size: 0
                Memtable off heap memory used: 0
                Memtable switch count: 0
                Local read count: 16
                Local read latency: 0.631 ms


nodetool toppartitions

WRITES Sampler:
  Cardinality: ~256 (256 capacity)
  Top 10 partitions:
        Partition                                                                    Count       +/-
        (loki_ingress:loki_ingress_index_19502) 39:fake:d19502:logs:host_ip            271       182
        (loki_ingress:loki_ingress_index_19502) 06:fake:d19502:logs                    264       166
        (loki_ingress:loki_ingress_index_19502) 24:fake:d19502:logs:pod_name           258       182
        (loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:host_ip            254       186
        (loki_ingress:loki_ingress_index_19502) 04:fake:d19502:logs:app                253       182
        (loki_ingress:loki_ingress_index_19502) 35:fake:d19502:logs                    251       186
        (loki_ingress:loki_ingress_index_19502) 68:fake:d19502:logs:cluster            250       177
        (loki_cluster:loki_cluster_index_19502) 36:fake:d19502:logs:project_name       250       177
        (loki_ingress:loki_ingress_index_19502) 37:fake:d19502:logs:region             250       182
        (loki_cluster:loki_cluster_index_19502) 37:fake:d19502:logs:host_ip            250       186

READS Sampler:
  Cardinality: ~256 (256 capacity)
  Top 10 partitions:
        Partition                                                                               Count       +/-
        (system_auth:roles) loki_cluster                                                         2514         3
        (system_auth:role_permissions) loki_cluster                                               840         3
        (loki_ingress:loki_ingress_index_19502) XJfwfiLs+ChyIots/lMsA1u3kBzqcOZhbXFRdKExZRo        13        12
        (loki_ingress:loki_ingress_index_19502) DoddOMPdUq8qacga5dmT+Eizo+3nU7FS7GdpmNMVzxM        13        12
        (loki_ingress:loki_ingress_index_19502) BHyQ/bLGQ+OHBjWn2d/+6XHecebF3YOQB7O1vHcf+QA        13        12
        (loki_ingress:loki_ingress_index_19502) znaTja8BYdthKvleV8nXWmMd2WqSqSxCxCqfO1cRjbg        13        12
        (loki_ingress:loki_ingress_index_19502) Pqz27XaXVywaGDZLRgsDqX3NFC+yxDt8CEci3wOW8Yg        13        12
        (loki_ingress:loki_ingress_index_19502) 9p12QBPRrBoTt2QEVxFTl3IgNESw3q7+Gd36wvy1Piw        13        12
        (loki_ingress:loki_ingress_index_19502) /wJ5x7ruQrVLUgO+ySEDw1ZLMnTDXy8741vJE4GfNXk        13        12
        (loki_ingress:loki_ingress_index_19502) 1MgAXkgL4pxmMuFvOIe6WShU91w7E7NCxc1Gq8Plb+k        13        12

The imbalance is simply due to the low cardinality of the system_auth.* tables. These tables have only a handful of entries, and whichever shard owns one (or more) of them will see all of the traffic aimed at that table. You can see this in the fact that your requests are well balanced at the coordinator level; the imbalance is entirely on the replica side.
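A rough sketch of why a low-cardinality table can never spread its reads evenly. This is purely illustrative: the role names are assumptions, and real token ownership uses the cluster's partitioner and replication factor, not Python's md5-based stand-in below.

```python
import hashlib

NODES = 6  # cluster size from the question

def owner(partition_key: str) -> int:
    # Stand-in for the partitioner: map a key deterministically to one node
    # (real clusters replicate each partition to several nodes, but the
    # single-owner simplification is enough to show the effect).
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES

# A system_auth-style table holds only a handful of entries, e.g. one per
# role (hypothetical role names):
roles = ["loki_cluster", "loki_ingress", "cassandra"]

reads_per_node = [0] * NODES
for _ in range(10_000):    # simulate many auth lookups
    for r in roles:        # every lookup lands on the role's fixed owner
        reads_per_node[owner(r)] += 1

print(reads_per_node)  # at most 3 nodes receive any of this traffic
```

However the few keys hash, at most three of the six nodes can ever serve these reads; a well-balanced coordinator layer cannot change that.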

I think there is nothing to worry about here. The only reason the imbalance is even visible is that your cluster is very lightly loaded. If it persists at higher loads, or becomes more pronounced, that might indicate a problem.

The toppartitions result is inaccurate (look at the +/- column). You can increase the accuracy with the capacity parameter (-s, default 256). Increase it and try again until the error drops to a reasonable level.
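To see why a larger capacity shrinks the +/- column: capacity-limited top-k sketches (Space-Saving is one such algorithm) may over-count an item by the value of the counter they evicted for it, and report that as the error bound. A toy illustration, with a synthetic skewed stream rather than your actual workload, and no claim that this is the exact sampler your server uses:

```python
import random

def space_saving(stream, capacity):
    """Toy Space-Saving sketch: key -> [count, error]."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the minimum counter; its count becomes the new item's
            # error bound, since the new item inherits a possibly inflated
            # count.
            victim = min(counters, key=lambda k: counters[k][0])
            cnt, _ = counters.pop(victim)
            counters[item] = [cnt + 1, cnt]
    return counters

random.seed(42)
# Skewed synthetic stream: a few hot keys plus a long tail of cold keys.
stream = [f"hot{i % 5}" for i in range(5_000)]
stream += [f"cold{random.randrange(10_000)}" for _ in range(5_000)]
random.shuffle(stream)

for capacity in (16, 256, 4096):
    sketch = space_saving(stream, capacity)
    max_err = max(err for _, err in sketch.values())
    print(capacity, max_err)  # larger capacity -> smaller worst-case error
```

With a small capacity the sketch evicts constantly and the error bounds rival the counts themselves, exactly like the READS sampler above where 13 +/- 12 tells you almost nothing; with enough capacity to track every distinct key, the error drops to zero.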