Originally from the User Slack
@Dylan_Piette: Hello to the ScyllaDB team!
I'm trying to understand why some of my queries are slow and how I can change that.
Basically, I created an index on a table, and then I query this index directly to get the distinct partition keys (scheduled_window) of the index:
SELECT DISTINCT scheduled_window FROM email_scheduled_messages_schedule_window_index USING TIMEOUT 60s;
It takes a long time even though there's no data in it.
I understand that it might be related to tombstones,
so I changed the gc_grace_seconds of the table and of the materialized view created by the index.
I then ran nodetool compact and expected all the tombstones to disappear and performance to improve, but I still get horrible performance for an empty table.
Here is a tracing session of the query.
Could someone help me understand what's going wrong?
For context, I have only one node, and it's on a big machine with an NVMe disk.
@Felipe_Cardeneti_Mendes: Shard 3 was picked as the query coordinator. You can see the source_elapsed
column increase considerably as you hit ranges with dead rows. The first one is here:
> Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 530275 clustering row(s) (0 live, 530275 dead) and 0 range tombstone(s) [shard 3] | 2024-05-02 15:26:24.799904 | 172.16.0.4 | 927112 | 172.16.0.4
The second one is:
> Page stats: 0 partition(s), 0 static row(s) (0 live, 0 dead), 355826 clustering row(s) (0 live, 355826 dead) and 0 range tombstone(s) [shard 3] | 2024-05-02 15:26:25.426583 | 172.16.0.4 | 1553790 | 172.16.0.4
And the time it took to fully run the query:
> Done processing - preparing a result [shard 3] | 2024-05-02 15:26:25.427603 | 172.16.0.4 | 1554810 | 172.16.0.4
So you're running a full scan. It takes a while to populate a page worth of data, eventually hits ranges that still contain tombstones, and processing time increases.
You may see whether you get better results with BYPASS CACHE,
but in general you'd want a restriction while querying, to avoid the full scan every time.
@Piette_Dylan: Thanks for your answer!
What do you mean by putting a restriction? I need to do a full scan if I want to get all the different partition keys, no?
And how do you explain that changing the gc_grace_seconds of the table and the view, then compacting them, still doesn't make the situation better?
@Felipe_Cardeneti_Mendes: If you frequently want to retrieve all distinct keys, then do an efficient token scan instead and append BYPASS CACHE
to it. This breaks the scan down into smaller queries, which are picked up by different shards and nodes, rather than a single query landing on an individual shard that has to carry out the entire work.
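To make the idea concrete, here is a minimal sketch, in Python, of how such a token-range scan could be assembled client-side. It assumes the Murmur3 partitioner's token range of [-2^63, 2^63 - 1] and reuses the table and column names from the query above; the splitting scheme is purely illustrative, not ScyllaDB's internal logic.

```python
# Split the full Murmur3 token range into N sub-ranges and build one CQL
# statement per range. Each statement can then be sent concurrently, so
# different shards/nodes pick up different slices of the scan.

MIN_TOKEN = -2**63       # Murmur3 partitioner lower bound
MAX_TOKEN = 2**63 - 1    # Murmur3 partitioner upper bound

def token_subqueries(n_ranges):
    """Yield one CQL statement per token sub-range."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_ranges
    bounds = [MIN_TOKEN + i * span for i in range(n_ranges)] + [MAX_TOKEN]
    for lo, hi in zip(bounds, bounds[1:]):
        # The last range is closed on MAX_TOKEN to cover rounding remainder.
        op = "<=" if hi == MAX_TOKEN else "<"
        yield (
            "SELECT DISTINCT scheduled_window "
            "FROM email_scheduled_messages_schedule_window_index "
            f"WHERE token(scheduled_window) >= {lo} "
            f"AND token(scheduled_window) {op} {hi} "
            "BYPASS CACHE;"
        )

queries = list(token_subqueries(8))
```

In practice you would pick the number of sub-ranges based on the cluster's shard count and run the statements in parallel through the driver.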
> And how do you explain that changing the gc_grace_seconds of the table and the view, then compacting them, still doesn't make the situation better?
I don't know how the situation was before, so I can't say whether it improved or not. One possibility is that you had data in memtables shadowing SSTables, which would be the case if you use USING TIMESTAMP
for inserts. Another possibility is simply that your gc_grace_seconds wasn't low enough for the tombstones to expire.
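The expiry rule alluded to here can be sketched in a few lines: a tombstone only becomes eligible for purging during compaction once gc_grace_seconds have elapsed since the deletion (ignoring other conditions, such as overlapping SSTables). Timestamps below are plain epoch seconds and the function name is illustrative.

```python
# Simplified tombstone-expiry rule: a tombstone outlives its grace period
# before compaction may purge it. (Real compaction also checks that no
# other SSTable could still contain data shadowed by the tombstone.)

def tombstone_purgeable(deletion_time, gc_grace_seconds, now):
    """True once gc_grace_seconds have passed since the deletion."""
    return now >= deletion_time + gc_grace_seconds
```

With the default gc_grace_seconds of 864000 (10 days), running nodetool compact an hour after a deletion keeps every tombstone; lowering gc_grace_seconds first is what makes the compaction actually drop them.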
In any case, your trace clearly shows that the scans all went to the cache. Maybe you've hit https://github.com/scylladb/scylladb/issues/6033 - a way to check is to just BYPASS CACHE
or restart the node to empty the cache.
@Piette_Dylan: It is indeed that: the tombstones are not evicted from the cache.
BYPASS CACHE works great, thanks!
If I understood correctly, that problem would be solved if I upgrade to 5.4?
@Felipe_Cardeneti_Mendes: Things should improve, yes - but maybe not the way you'd expect.
In particular, expired row and range tombstones are evicted on access from the cache. It means that if your cache accumulated many of them, an initial scan will still be penalized, but subsequent ones won’t.
Cell tombstones and partition tombstones aren’t addressed yet - the bottom of the issue explains the rationale around those. Most likely partition tombstones are just fine, but cell tombstones aren’t.
But all this may actually be orthogonal to your situation. The thing with full scans is that they require reading your entire data set, populating the cache and potentially evicting important rows, and so on. So for full scans specifically, it is generally better to simply BYPASS CACHE,
unless you know for certain that your cache can hold your entire data set, or that evictions won't bottleneck other, perhaps more important, queries against frequently accessed partitions.
There's also the more common scenario of non-expired tombstones. It is important to understand why so many accumulated in the first place, and perhaps tune compaction settings.
@avi: Note: ScyllaDB is particularly bad with empty or almost-empty tables, since it can’t amortize the large number of vnodes and shards over a large number of rows. This is expected to improve with tablets.