How to ensure a Batch query reaches the correct partition?

Hi all. Completely new to Cassanda/Scylla type databases. Here’s my question:

Let’s say I’m trying to model a simple event-sourcing database using ScyllaDB:

create table test_keyspace.event_streams (
    stream_id int,
    commit_number int,
    payload text,
    primary key (stream_id, commit_number)
)

So partitioning is based on stream_id.

I want to perform batch event inserts, where all events have the same stream_id, and I want this query to be sent to the right nodes (the ones matching the stream_id partition), how can I ensure that?

Example batch query:

begin batch

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 0, 'create') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 1, 'rename') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 2, 'foo') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 3, 'bar') if not exists;

apply batch;

Can I ensure the query above reaches only the nodes that are part of the partition of stream_id = 42?

Also, can anyone help me understand LOGGED vs UNLOGGED better?

ScyllaDB takes care of this automatically, you don’t have to do anything for a query to be only sent to the nodes it should be sent to.,
Note that your driver will send the query to a single ScyllaDB node (the coordinator – this is chosen according to the session’s load-balancing policy), the coordinator will take care of only contacting replicas which are affected by the query.

BTW why do you want to use a BATCH? In general, individual queries are better.

2 Likes

Logged batches are written into a system table, before they are executed. Upon failure, they will be retried until they succeed.

1 Like

To add to this from Rust driver perspective (I see that the question has a rust tag):
Rust driver will base it’s choice of coordinator node for the batch on the first statement of it. This is a heuristic to get some form of shard-awareness for batches. This heuristic is perfect for your scenario @bayov , assuming you only use prepared statements in batches - which you definitely should! Unprepared statements containing values in batches have a massive performance penalty in Rust driver.

1 Like

Hi! thanks for the response!

I’m trying to model event sourcing using ScyllaDB. I want persisted events to represent immutable facts in my system.

Sometimes, a single action in the system will generate more than one event (probably always less than a dozen though). I want to atomically commit them to the DB to ensure that either all of these events become immutable facts, or none of them do.

My desired transaction boundary is a single stream ID, so I won’t try to achieve atomic writes to multiple different stream IDs, only to one. And a stream ID should always exist in a single partition, so it should be possible to guarantee this transaction boundary.

Does that make sense?

What is the use-case for UNLOGGED mode? Is it simply to save network bandwidth and perform multiple queries together, but without any transactional-guarantees?

Yes, BATCH inserts to a single partition are indeed applied as a single mutation. A single mutation is applied atomically, it either succeed or fails, there is no in-between.

1 Like