I am using ScyllaDb, but I think this also applies to Cassandra since ScyllaDb is compatible with Cassandra.
I have the following table (I got ~5 of this kind of tables):
create table batch_job_conversation ( conversation_id uuid, primary key (conversation_id) );
This is used by a batch job to make sure some fields are kept in sync. In the application, a lot of concurrent writes/reads can happen. Once in a while, I will correct the values with a batch job.
A lot of writes can happen to the same row, so it will overwrite the rows. A batch job currently picks up rows with this query:
select * from batch_job_conversation
Then the batch job will read the data at that point and makes sure things are in sync. I think this query is bad because it stresses all the partitions and the node coordinator because it needs to visit ALL partitions.
My question is if it is better for this kind of tables to have a fixed field? Something like this:
create table batch_job_conversation ( always_zero int, conversation_id uuid, primary key ((always_zero), conversation_id) );
And than the query would be this:
select * from batch_job_conversation where always_zero = 0
For each batch job I can use a different partition key. The amount of rows in these tables will be roughly the same size (a few thousand at most). The tables will overwrite the same row probably a lot of times.
Is it better to have a fixed value? Is there another way to handle this? I don’t have a logical partition key I can use.