Using a clustering key, impact on performance, data distribution and partition size

Guy · June 24, 2024, 4:54am

Originally from the User Slack

@Bohdan_Smal: Hello everyone,
I have a somewhat general question. Could you please advise on the potential risks of not using a clustering key?
Currently, we have a table where the partition key is a combination of brand and client_id, and the clustering key is transaction_id. We have observed that we could achieve better performance and more even data distribution if we don’t use a clustering key. Instead, we would set the partition key to transaction_id, brand, and client_id.
This way, each transaction becomes a separate partition, ensuring no imbalance in partition distribution across nodes, even if some clients have more transactions. However, we are not fully aware of the potential risks associated with having a large number of partitions.
Can anyone explain the possible risks of this approach?
Thank you!
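
To make the distribution argument concrete, here is a rough Python sketch that compares the two key choices on a made-up workload with one very active client. Everything in it is an assumption for illustration: the node count, the row counts, and the `node_for` helper, which uses an MD5 hash modulo the node count as a simple stand-in for ScyllaDB's token-based placement.

```python
import hashlib
from collections import Counter

NUM_NODES = 3  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node (simplified stand-in for token placement)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def make_transactions():
    """Hypothetical workload: client 0 is very active, 99 others are not."""
    rows = []
    tx_id = 0
    for client_id in range(100):
        count = 10_000 if client_id == 0 else 10
        for _ in range(count):
            rows.append((tx_id, "brand_a", client_id))
            tx_id += 1
    return rows


def rows_per_node(rows, key_fn):
    """Count how many rows land on each node for a given partition key choice."""
    load = Counter()
    for row in rows:
        load[node_for(key_fn(row))] += 1
    return [load[n] for n in range(NUM_NODES)]


def largest_partition(rows, key_fn):
    """Return the row count of the biggest partition."""
    sizes = Counter(key_fn(row) for row in rows)
    return max(sizes.values())


def by_client(row):
    # Partition key (brand, client_id); transaction_id would be the clustering key.
    return (row[1], row[2])


def by_transaction(row):
    # Partition key (transaction_id, brand, client_id); no clustering key.
    return (row[0], row[1], row[2])


rows = make_transactions()
print("per-client rows per node:     ", rows_per_node(rows, by_client))
print("per-transaction rows per node:", rows_per_node(rows, by_transaction))
print("largest per-client partition: ", largest_partition(rows, by_client))
print("largest per-transaction part.:", largest_partition(rows, by_transaction))
```

With the per-client key, all rows of the active client sit in one partition and therefore on one node, while the per-transaction key spreads rows one partition at a time.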

@Karol_Baryła: I’m not sure what the performance implications of a large number of partitions are. What comes to my mind is usability. With such a schema you can’t, e.g.:
• Select all transactions for a given user without using ALLOW FILTERING and making the query much slower this way.
• Use LWT / Batches with LWT to atomically update several transactions for a given user.
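
The first point can be sketched in plain Python by modelling each table as a dictionary keyed by partition key. This is only an illustration, not how ScyllaDB stores data; the sample rows, the `amount` column, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (transaction_id, brand, client_id, amount)
transactions = [
    (1, "brand_a", 42, 9.99),
    (2, "brand_a", 42, 15.00),
    (3, "brand_a", 7, 3.50),
]

# Layout A: partition key (brand, client_id), rows keyed by transaction_id.
per_client = defaultdict(dict)
for tx_id, brand, client_id, amount in transactions:
    per_client[(brand, client_id)][tx_id] = amount

# Layout B: partition key (transaction_id, brand, client_id), one row each.
per_transaction = {
    (tx_id, brand, client_id): amount
    for tx_id, brand, client_id, amount in transactions
}


def client_transactions_a(brand, client_id):
    """Single partition lookup; rows come back ordered by transaction_id."""
    partition = per_client.get((brand, client_id), {})
    return sorted(partition.items())


def client_transactions_b(brand, client_id):
    """Has to visit every partition, roughly what a filtering scan implies."""
    scanned = 0
    matches = []
    for (tx_id, b, c), amount in per_transaction.items():
        scanned += 1
        if (b, c) == (brand, client_id):
            matches.append((tx_id, amount))
    return sorted(matches), scanned


print(client_transactions_a("brand_a", 42))
print(client_transactions_b("brand_a", 42))
```

In layout A the client's transactions are found with one key lookup, while layout B touches every partition in the table to answer the same question.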

@Bohdan_Smal: Got it, thank you for your response. Have a great day!

@avi: In fact, having more and smaller partitions is better than having fewer and larger partitions. So if you don’t need a clustering key for sorting and grouping, don’t use it.

@Bohdan_Smal: thank you)
