We are facing latency issues. While debugging with the rlatencyp95 metric in Grafana, I see that it is exposed at the instance, shard, cluster, and DC levels.
When the issue occurs, rlatencyp95 at the instance level still appears to be in microseconds, but at the cluster level it spikes to seconds.
What is the difference between the metrics at the instance level and at the cluster level? Below is the query I used to debug the latency issue. I wanted a breakdown at the instance level to identify whether a particular instance is causing the latency.
avg(rlatencyp99{by="cluster", instance, cluster=~"scyl-test", scheduling_group_name!="streaming"} > 0) by (cluster, instance)
But after applying this query, the latency shown at the instance level is still in microseconds, while the cluster-level metric still shows a latency of 30 seconds.
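A per-instance variant of the query might look like the sketch below. It assumes the recording rules also publish rlatencyp99 series labelled by="instance" (the label value is an assumption, inferred only from the dashboard exposing an instance level), and it drops the bare `instance` entry because a PromQL label matcher needs an operator and a value.

```promql
# Sketch only: by="instance" is an assumed label value, not confirmed here.
# Per-instance p99 read latency, excluding the streaming group and zero samples.
avg(rlatencyp99{by="instance", cluster=~"scyl-test", scheduling_group_name!="streaming"} > 0) by (cluster, instance)
```

If series with that label exist, each result row would carry its own instance label, whereas series selected with by="cluster" may have no instance label to group on.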
Is there any official documentation that explains the scope of these metrics?