How to ensure a Batch query reaches the correct partition?

Hi all. Completely new to Cassandra/Scylla-type databases. Here’s my question:

Let’s say I’m trying to model a simple event-sourcing database using ScyllaDB:

create table test_keyspace.event_streams (
    stream_id int,
    commit_number int,
    payload text,
    primary key (stream_id, commit_number)
);

So partitioning is based on stream_id.

I want to perform batch event inserts, where all events have the same stream_id, and I want this query to be sent to the right nodes (the ones matching the stream_id partition), how can I ensure that?

Example batch query:

begin batch

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 0, 'create') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 1, 'rename') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 2, 'foo') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 3, 'bar') if not exists;

apply batch;

Can I ensure the query above reaches only the nodes that are part of the partition of stream_id = 42?

Also, can anyone help me understand LOGGED vs UNLOGGED better?

ScyllaDB takes care of this automatically; you don’t have to do anything for a query to be sent only to the nodes it should reach.
Note that your driver will send the query to a single ScyllaDB node (the coordinator, chosen according to the session’s load-balancing policy), and the coordinator will take care of contacting only the replicas affected by the query.

BTW why do you want to use a BATCH? In general, individual queries are better.


Logged batches are written into a system table (the batch log) before they are executed. Upon failure, they are retried until they succeed.
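For comparison, an unlogged batch uses the same syntax with the UNLOGGED keyword added; it skips the batch-log write, so you lose the retry guarantee but avoid its overhead. A sketch against the table from the question (leaving out the conditional IF NOT EXISTS for simplicity):

```cql
begin unlogged batch

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 0, 'create');

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 1, 'rename');

apply batch;
```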


To add to this from the Rust driver’s perspective (I see that the question has a rust tag):
The Rust driver bases its choice of coordinator node for a batch on the batch’s first statement. This is a heuristic to get some form of shard-awareness for batches. It is a perfect fit for your scenario @bayov, assuming you only use prepared statements in batches - which you definitely should! Unprepared statements containing values carry a massive performance penalty in the Rust driver.
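A minimal sketch of that pattern with the `scylla` crate (the node address is a placeholder, a tokio runtime is assumed, and exact API details vary between driver versions):

```rust
use scylla::batch::Batch;
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder address; point this at your cluster.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Prepare the insert once. Prepared statements let the driver
    // compute the partition key from the bound values and route the
    // batch shard-aware.
    let prepared = session
        .prepare(
            "insert into test_keyspace.event_streams \
             (stream_id, commit_number, payload) values (?, ?, ?)",
        )
        .await?;

    // A logged batch (the default batch type); every statement is the
    // same prepared insert, and every value tuple carries stream_id = 42,
    // so the whole batch targets a single partition.
    let mut batch = Batch::default();
    batch.append_statement(prepared.clone());
    batch.append_statement(prepared.clone());

    session
        .batch(&batch, ((42i32, 0i32, "create"), (42i32, 1i32, "rename")))
        .await?;

    Ok(())
}
```

Because the driver can extract the partition key from the first prepared statement, it picks a coordinator that is a replica for stream_id = 42, which is exactly the heuristic described above.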


Hi! thanks for the response!

I’m trying to model event sourcing using ScyllaDB. I want persisted events to represent immutable facts in my system.

Sometimes, a single action in the system will generate more than one event (probably always less than a dozen though). I want to atomically commit them to the DB to ensure that either all of these events become immutable facts, or none of them do.

My desired transaction boundary is a single stream ID, so I won’t try to achieve atomic writes to multiple different stream IDs, only to one. And a stream ID should always exist in a single partition, so it should be possible to guarantee this transaction boundary.

Does that make sense?

What is the use-case for UNLOGGED mode? Is it simply to save network round-trips by sending multiple queries together, but without any transactional guarantees?

Yes, BATCH inserts to a single partition are indeed applied as a single mutation. A single mutation is applied atomically: it either succeeds or fails, there is no in-between.
