How to ensure a Batch query reaches the correct partition?

Hi all. Completely new to Cassandra/Scylla-type databases. Here’s my question:

Let’s say I’m trying to model a simple event-sourcing database using ScyllaDB:

create table test_keyspace.event_streams (
    stream_id int,
    commit_number int,
    payload text,
    primary key (stream_id, commit_number)
);

So partitioning is based on stream_id.

I want to perform batch event inserts, where all events have the same stream_id, and I want this query to be sent to the right nodes (the ones matching the stream_id partition), how can I ensure that?

Example batch query:

begin batch

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 0, 'create') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 1, 'rename') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 2, 'foo') if not exists;

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 3, 'bar') if not exists;

apply batch;

Can I ensure the query above reaches only the nodes that are part of the partition of stream_id = 42?

Also, can anyone help me understand LOGGED vs UNLOGGED better?

ScyllaDB takes care of this automatically; you don’t have to do anything for a query to be sent only to the nodes it should reach.
Note that your driver will send the query to a single ScyllaDB node (the coordinator, chosen according to the session’s load-balancing policy), and the coordinator will take care of contacting only the replicas affected by the query.

BTW why do you want to use a BATCH? In general, individual queries are better.


Logged batches are written into a system table (the batch log) before they are executed. Upon failure, they are retried until they succeed.
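For comparison, an unlogged batch uses the same syntax with the UNLOGGED keyword added; it skips the batch-log write, so you lose the retry guarantee but avoid its overhead. A sketch against the table from the question (leaving out the conditional IF NOT EXISTS for simplicity):

```cql
begin unlogged batch

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 0, 'create');

insert into test_keyspace.event_streams (stream_id, commit_number, payload)
values (42, 1, 'rename');

apply batch;
```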


To add to this from the Rust driver’s perspective (I see that the question has a rust tag):
The Rust driver bases its choice of coordinator node for a batch on the batch’s first statement. This is a heuristic to get some form of shard-awareness for batches. It is a perfect fit for your scenario @bayov, assuming you only use prepared statements in batches - which you definitely should! Unprepared statements containing values carry a massive performance penalty in the Rust driver.
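A minimal sketch of that pattern with the `scylla` crate (the node address is a placeholder, a tokio runtime is assumed, and exact API details vary between driver versions):

```rust
use scylla::batch::Batch;
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder address; point this at your cluster.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Prepare the insert once. Prepared statements let the driver
    // compute the partition key from the bound values and route the
    // batch shard-aware.
    let prepared = session
        .prepare(
            "insert into test_keyspace.event_streams \
             (stream_id, commit_number, payload) values (?, ?, ?)",
        )
        .await?;

    // A logged batch (the default batch type); every statement is the
    // same prepared insert, and every value tuple carries stream_id = 42,
    // so the whole batch targets a single partition.
    let mut batch = Batch::default();
    batch.append_statement(prepared.clone());
    batch.append_statement(prepared.clone());

    session
        .batch(&batch, ((42i32, 0i32, "create"), (42i32, 1i32, "rename")))
        .await?;

    Ok(())
}
```

Because the driver can extract the partition key from the first prepared statement, it picks a coordinator that is a replica for stream_id = 42, which is exactly the heuristic described above.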


Hi! thanks for the response!

I’m trying to model event sourcing using ScyllaDB. I want persisted events to represent immutable facts in my system.

Sometimes, a single action in the system will generate more than one event (probably always less than a dozen though). I want to atomically commit them to the DB to ensure that either all of these events become immutable facts, or none of them do.

My desired transaction boundary is a single stream ID, so I won’t try to achieve atomic writes to multiple different stream IDs, only to one. And a stream ID should always exist in a single partition, so it should be possible to guarantee this transaction boundary.

Does that make sense?

What is the use-case for UNLOGGED mode? Is it simply to save network round-trips by sending multiple queries together, but without any transactional guarantees?

Yes, BATCH inserts to a single partition are indeed applied as a single mutation. A single mutation is applied atomically: it either succeeds or fails, there is no in-between.
