My table has the following primary key: PRIMARY KEY((tenant_id, group_id))
If I have to query data based on tenant_id, I have two options:
1. Use ALLOW FILTERING:
   select * from tablename where tenant_id='something' ALLOW FILTERING
   Pros:
   - No need to create an index (no extra materialized view is created)
   - Useful if this query is run infrequently
   Cons:
   - Filters the whole table
2. Create an index on tenant_id:
   Pros:
   - No need for ALLOW FILTERING
   Cons:
   - Every update to the table also requires updating the index
Questions:
1. In #1, if I use part of the partition key with ALLOW FILTERING, will it scan the whole table? I want to know how filtering works when only part of the partition key is specified. If you can share any related post, that will be useful.
2. Is there any other approach that can be used?
*Originally asked on ScyllaDB’s community Slack channel
Yes, despite using part of the partition key, this will result in a full scan and filtering. The reason is that Scylla hashes the partition key as a whole, and there is no way to infer all relevant hashes (tokens) from a partial key.
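To make the hashing point concrete, here is a minimal sketch in Python. It is not Scylla's actual code: Scylla's default partitioner is Murmur3-based, while this toy `token` function (a hypothetical name) uses SHA-256 from the standard library and a simplified serialization. The only point it illustrates is that the token is computed from all partition key components together, so knowing tenant_id alone does not tell you which tokens to look up.

```python
import hashlib

def token(*partition_key_parts: str) -> int:
    """Toy stand-in for a partitioner: hash all key components together.

    Assumption: SHA-256 and a NUL-joined serialization are used here only
    for illustration; the real partitioner uses a different hash and format.
    """
    serialized = "\x00".join(partition_key_parts).encode("utf-8")
    digest = hashlib.sha256(serialized).digest()
    # Take the first 8 bytes as a signed 64-bit value, like a token.
    return int.from_bytes(digest[:8], "big", signed=True)

# Two rows of the same tenant land on unrelated tokens.
full_a = token("tenant-1", "group-a")
full_b = token("tenant-1", "group-b")

# Hashing only the known part gives yet another, unrelated value.
partial = token("tenant-1")

print(full_a, full_b, partial)
```

Because the three values are unrelated, the only way to find every row for a tenant without an index is to visit all partitions and filter.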
As for other approaches: not that I can think of. The choice is a tradeoff, and which option is best depends on how often you run this query. If you run it very rarely, filtering is better. If you run it regularly, I recommend creating an index.
The tradeoff is this: pay all costs at query time or spread the cost over all writes, each paying a small portion of it.
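The tradeoff can be sketched with a toy in-memory model in Python. This is only an illustration under loud assumptions: `ToyTable`, `by_tenant`, and the counters are hypothetical names, and nothing here reflects how Scylla stores data or implements indexes. It only shows where the work is paid: at query time for filtering, at write time for an index.

```python
from collections import defaultdict

class ToyTable:
    """Hypothetical in-memory model; not how Scylla stores data."""

    def __init__(self):
        self.rows = {}                     # (tenant_id, group_id) -> value
        self.by_tenant = defaultdict(set)  # toy index: tenant_id -> group_ids
        self.index_writes = 0              # extra work paid on writes

    def insert(self, tenant_id, group_id, value):
        self.rows[(tenant_id, group_id)] = value
        # With an index, every write also pays a small maintenance cost.
        self.by_tenant[tenant_id].add(group_id)
        self.index_writes += 1

    def query_filtering(self, tenant_id):
        """Examine every row and keep matches: (matches, rows_examined)."""
        matches = [v for (t, _), v in self.rows.items() if t == tenant_id]
        return matches, len(self.rows)

    def query_indexed(self, tenant_id):
        """Look up only the indexed keys: (matches, rows_examined)."""
        groups = self.by_tenant.get(tenant_id, ())
        matches = [self.rows[(tenant_id, g)] for g in groups]
        return matches, len(matches)

table = ToyTable()
for t in range(100):
    for g in range(10):
        table.insert(f"tenant-{t}", f"group-{g}", (t, g))

_, scanned = table.query_filtering("tenant-7")
_, looked_up = table.query_indexed("tenant-7")
print(scanned, looked_up, table.index_writes)
```

With 100 tenants of 10 groups each, this prints `1000 10 1000`: the filtering query examines all 1000 rows, the indexed query examines 10, and the index cost was spread over the 1000 writes.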
Understood. If I create an index on “tenant_id” and the table has, let’s say, 1 lakh (100,000) records per tenant, will this work properly with paginated responses?
As explained in the post below, it uses an IN clause internally.