I have been tasked with coming up with “some number” for how much memory it takes to store one record in Scylla (plus any related overhead). We had commands in Redis Cache that produced these metrics. Is that even possible in Scylla? I have scyllatop running, and also the monitoring stack.
One way to accomplish this is to create a million “average” records, run “nodetool flush” to make sure they get flushed to disk (the flush also merges them into the cache), then look at the cache metrics and divide the cache memory usage by the number of rows in cache.
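For example, here is a minimal sketch of the load step, assuming Python with the DataStax cassandra-driver (which also works against Scylla), a local node at 127.0.0.1, and a hypothetical “bench.records” table with a 1 KB blob standing in for your “average” record; adjust all of those to your workload:

```python
import os
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# SimpleStrategy keeps the sketch short; use NetworkTopologyStrategy
# for a real multi-DC cluster.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS bench
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS bench.records (id int PRIMARY KEY, payload blob)
""")

insert = session.prepare("INSERT INTO bench.records (id, payload) VALUES (?, ?)")
payload = os.urandom(1024)  # stand-in for an "average" record body

# Write with bounded concurrency; a plain loop of execute() also works,
# just slower.
execute_concurrent_with_args(
    session,
    insert,
    ((i, payload) for i in range(1_000_000)),
    concurrency=200,
)
cluster.shutdown()
```

Once the load finishes, “nodetool flush bench” on each node pushes the memtables to disk, after which the cache metrics can be read as described above.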
Thanks, a couple of follow-up questions. When you say “cache metrics,” do you mean the ones provided by scyllatop? I presume that is where I get the cache memory usage from? Also, would that work in a clustered environment? I have eight nodes spread over four DCs. I know the cache is spread across the cluster, but what happens if not all records are in the cache? How would I find out how many have made it into cache?
The metrics are available via scyllatop, though a nicer way to see them is with Prometheus and Grafana (see scylla-monitoring.git). The metrics include the number of partitions in cache, the number of rows in cache, and the memory used by the cache, so from those you can compute any statistic you want.
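As an illustration, a small scraping sketch in Python, assuming the default Scylla Prometheus endpoint on port 9180; the metric names used below are assumptions and vary between Scylla versions, so verify them against your node’s /metrics output first:

```python
import urllib.request

NODE = "127.0.0.1"  # repeat per node; each node caches only its own replicas
body = urllib.request.urlopen(f"http://{NODE}:9180/metrics").read().decode()

def metric_sum(name: str) -> float:
    """Sum a metric across shards (each shard reports its own sample line)."""
    total = 0.0
    for line in body.splitlines():
        if line.startswith(name + "{") or line.startswith(name + " "):
            total += float(line.split()[-1])
    return total

rows = metric_sum("scylla_cache_rows")              # assumed metric name
partitions = metric_sum("scylla_cache_partitions")  # assumed metric name
bytes_used = metric_sum("scylla_cache_bytes_used")  # assumed metric name

print(f"rows in cache:       {rows:,.0f}")
print(f"partitions in cache: {partitions:,.0f}")
if rows:
    print(f"bytes per row:       {bytes_used / rows:.1f}")
```

Since the cache is per node, in a cluster you would either run this against every node and sum the results, or let the Prometheus server from scylla-monitoring do the aggregation. The rows-in-cache figure also answers your question of how many records actually made it into cache.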
As for the rows not in cache: they have no cache footprint, so they don’t skew the result; just divide by the rows-in-cache metric rather than by the number of rows you wrote.