Datamodel for Scylla/Cassandra for table partition key is not known beforehand -> static field?

Jasper_Visser · January 29, 2023, 5:20pm

I am using ScyllaDB, but I think this also applies to Cassandra, since ScyllaDB is compatible with Cassandra.

I have the following table (I have about 5 tables of this kind):

create table batch_job_conversation (
    conversation_id uuid,
    primary key (conversation_id)
);

This is used by a batch job to make sure some fields are kept in sync. In the application, a lot of concurrent writes/reads can happen. Once in a while, I will correct the values with a batch job.

Many writes can hit the same row, so rows get overwritten often. The batch job currently picks up rows with this query:

select * from batch_job_conversation

Then the batch job reads the data at that point and makes sure things are in sync. I think this query is bad because it has to visit ALL partitions, which stresses every partition and the coordinator node.

My question is whether it is better for this kind of table to have a fixed field. Something like this:

create table batch_job_conversation
(
    always_zero     int,
    conversation_id uuid,
    primary key ((always_zero), conversation_id)
);

And then the query would be this:

select * from batch_job_conversation where always_zero = 0

For each batch job I can use a different partition key. The number of rows in these tables will be roughly the same (a few thousand at most). The same row will probably be overwritten many times.

Is it better to have a fixed value? Is there another way to handle this? I don’t have a logical partition key I can use.

Botond_Denes · February 9, 2023, 12:37pm

Having just a single partition in the entire table is also bad because it makes the load on the cluster uneven: only certain shards of certain nodes will have any work to do.
Scanning a table with just a few thousand partitions should not be a problem.
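
If the scan ever did become a concern, one common way to spread a full-table read over the cluster is to split it into token-range queries and run those independently. The following is only a rough sketch under stated assumptions, not something suggested in this thread: it assumes the default Murmur3 partitioner (tokens from -2^63 to 2^63 - 1), the helper names `token_ranges` and `scan_statements` are hypothetical, and the `%s` placeholders assume the Python driver's non-prepared parameter style. It only builds the statements and does not talk to a cluster.

```python
# Sketch: split the Murmur3 token ring into contiguous sub-ranges so that a
# full-table scan can be issued as several smaller range queries.
# Assumption: the table uses the default Murmur3 partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

# Table and column names are taken from the question's original schema.
SCAN_QUERY = (
    "SELECT conversation_id FROM batch_job_conversation "
    "WHERE token(conversation_id) >= %s AND token(conversation_id) <= %s"
)


def token_ranges(n):
    """Return n inclusive, non-overlapping (start, end) ranges covering the ring."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible token values
    step, extra = divmod(total, n)
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Spread the remainder over the first `extra` ranges.
        size = step + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def scan_statements(n):
    """Pair the scan query with the bound values for each sub-range."""
    return [(SCAN_QUERY, bounds) for bounds in token_ranges(n)]
```

Each `(query, bounds)` pair could then be handed to a driver session and executed on its own, possibly in parallel. For a table with only a few thousand partitions, as described in the question, a plain `select *` is likely fine and this extra machinery is probably unnecessary.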
