Originally from the User Slack
@Andres: I have an issue where subgroups of my data have vastly different volumes and this volume disparity isn’t something we can change.
A common query I need to do is “what is the latest state for all active ‘agents’ in a subgroup,” so I need to design my tables so that searching by just the subgroup is efficient, meaning it should be the partition key:
CREATE TABLE agents (
    subgroup text,
    agent_id uuid,
    /* snip: agent metadata columns */
    PRIMARY KEY (subgroup, agent_id)
);
However, this leads to hot/large partitions for the larger subgroups, which have orders of magnitude more agents than our smaller subgroups.
If I make agent_id the partition key, then I no longer have a hot partition (each agent writes at about the same rate), but I lose the ability to quickly query for all agents in a subgroup (queries for smaller subgroups still have to sift through the data of the larger subgroups).
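For context, a minimal sketch of that alternative layout; the table name agents_by_id is illustrative, not something from the thread:

/* Alternative layout: agent_id as the sole partition key. Writes spread
   evenly, but a per-subgroup query no longer has a partition key to route
   by and degenerates into a filtering scan. */
CREATE TABLE agents_by_id (
    subgroup text,
    agent_id uuid,
    /* snip: agent metadata columns */
    PRIMARY KEY (agent_id)
);

SELECT * FROM agents_by_id WHERE subgroup = 'some-subgroup' ALLOW FILTERING;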
I considered maybe making separate tables or keyspaces per subgroup (since I never need to search across subgroups), but it seems like having a dynamic number of tables is an anti-pattern (https://forum.scylladb.com/t/any-restriction-on-number-of-tables-in-a-keyspace/482). Do you all have any ideas or alternatives?
@Felipe_Cardeneti_Mendes: Thinking out loud: maybe hash the agent_id UUID (a 16-byte value) and map it to a corresponding bucket.
When you need to query a specific agent, you already know which bucket it lives in.
When you want to scan all agents, simply read from all buckets concurrently. Find the bucket count where ((subgroup, bucket), agent_id) gives the right balance.
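A minimal sketch of what that bucketed layout could look like, assuming 16 buckets and a client-side hash of the agent_id; the table name agents_bucketed and the bucket count are illustrative, not from the thread:

/* Bucketed layout: the bucket is computed client-side and stays fixed
   for the lifetime of the agent. */
CREATE TABLE agents_bucketed (
    subgroup text,
    bucket   int,       /* e.g. hash(agent_id) % 16, computed by the client */
    agent_id uuid,
    /* snip: agent metadata columns */
    PRIMARY KEY ((subgroup, bucket), agent_id)
);

/* Point lookup: the client recomputes the bucket from the agent_id,
   so it always knows which partition to read. */
SELECT * FROM agents_bucketed
 WHERE subgroup = 'some-subgroup' AND bucket = 7 AND agent_id = ?;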
@Andres: So keep a static set of bucket identifiers, so that if I need to query for all agents I iterate through all the buckets client-side, doing one query per bucket?
@Felipe_Cardeneti_Mendes: Yeap, this should spread the load accordingly. The problem is that you can't be too aggressive with the number of buckets, as you otherwise risk making the smaller subgroups expensive to read; but if you are too conservative, you won't resolve the existing problem.
Another option would be to apply this only to the outliers (in a corresponding table) and transition subgroups as needed.
Scanning should be fast since you query all buckets in parallel.
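To make the scan path concrete, this is roughly what the client-side fan-out looks like against the illustrative 16-bucket table above; each statement is issued concurrently and the results are merged application-side:

SELECT * FROM agents_bucketed WHERE subgroup = 'some-subgroup' AND bucket = 0;
SELECT * FROM agents_bucketed WHERE subgroup = 'some-subgroup' AND bucket = 1;
/* ... one query per bucket, all in flight at once ... */
SELECT * FROM agents_bucketed WHERE subgroup = 'some-subgroup' AND bucket = 15;

Sixteen small parallel reads replace one read against a single oversized partition.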
@Andres: It’s an interesting idea. It’s unfortunate that, unless I apply it only to outliers, even the small subgroups pay the “cost” of buckets, but it may be worth it. I’d have to think further about how painful, if at all, it would be to apply it only to outliers and then transition subgroups as needed.
I was also considering maintaining a Redis sorted set mapping each subgroup to its agent UUIDs. That way I could use the agent UUID as the partition key, because I would already know the UUIDs via the Redis cache.
@Felipe_Cardeneti_Mendes: Yeap, a reverse lookup table also works. Scanning an entire subgroup with too many agents may be slower, though: instead of scanning each bucket individually, you will now query each agent. You will need to decide what works and scales best for your use case.
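As a hedged sketch, the same reverse-lookup idea can also be kept inside ScyllaDB rather than Redis; the table name agents_by_subgroup is illustrative, and the second step assumes the agents_by_id table sketched earlier:

/* Reverse lookup: maps a subgroup to its agent ids only. Rows are tiny,
   so even large subgroups produce comparatively small partitions, though
   a very large subgroup still lands on a single partition. */
CREATE TABLE agents_by_subgroup (
    subgroup text,
    agent_id uuid,
    PRIMARY KEY (subgroup, agent_id)
);

/* Step 1: fetch the ids for a subgroup. */
SELECT agent_id FROM agents_by_subgroup WHERE subgroup = 'some-subgroup';

/* Step 2: fetch each agent's latest state from the table keyed by agent_id. */
SELECT * FROM agents_by_id WHERE agent_id = ?;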
@Andres: Since it will be a query using the IN operator instead of a query per agent, it shouldn’t be that much slower, right? I was imagining one query per few thousand agents, similar to paging.
@Felipe_Cardeneti_Mendes: > Since it will be a query using the IN operator instead of a query per agent, it shouldn’t be that much slower, right? I was imagining one query per few thousand agents, similar to paging.
It works, but it may create shard contention under some circumstances, so beware. A query (IN or not) is handled by a single coordinator shard. When you ask for many keys on potentially different shards, this coordinator needs to coordinate with the other replicas. Best to test and ensure it matches your P99 latency requirements.
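For illustration, the two query shapes being compared, again assuming the illustrative agents_by_id table:

/* One IN query: the whole list is handled by a single coordinator shard,
   which fans out to the replicas owning each key; with thousands of keys,
   that one shard can become the bottleneck. */
SELECT * FROM agents_by_id WHERE agent_id IN ?;

/* One small prepared query per agent, issued asynchronously: a shard-aware
   driver routes each request directly to the shard owning that key,
   spreading the coordination work. */
SELECT * FROM agents_by_id WHERE agent_id = ?;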