Using IN in CQL Queries, Performance

Guy · October 5, 2023, 10:46am

@Hartmut: Hi, which would be the recommended/preferred query pattern or anti-pattern?
(A vs. B)
CREATE TABLE test (
    pk   text,
    val1 text,
    val2 text,
    PRIMARY KEY (pk)
);

INSERT INTO test (pk, val1, val2) VALUES ('a', 'foo', 'bar');
INSERT INTO test (pk, val1, val2) VALUES ('b', 'some', 'data');
INSERT INTO test (pk, val1, val2) VALUES ('c', 'scylla', 'db');

-- A) individual queries (in parallel, pooled)
SELECT * FROM test WHERE pk='a';
SELECT * FROM test WHERE pk='c';
-- B) `IN ()`
SELECT * FROM test WHERE pk IN ('a', 'c');
B) obviously isn’t shard-aware and has to be orchestrated by the coordinator node.
I guess it may depend on the actual use case, how many rows are to be fetched and so on…
But still, I wonder if anyone has any experience or insights to share…?

@avi: Individual queries are generally better. You’re moving some of the coordination from the server to the client, which is more easily scaled. The single IN query cannot be made shard/token aware, so you pay with an extra hop.
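
As an illustration of pattern A, here is a minimal Python sketch. It assumes `session` is an already connected Session from the Python driver (cassandra-driver or its Scylla fork); the function name `fetch_by_keys` is hypothetical. It relies on the driver's `prepare` and `execute_async` calls. Whether each request actually reaches the owning shard depends on the driver and load-balancing policy in use, so treat this as a sketch rather than a tuned implementation.

```python
from typing import Any, Iterable, List


def fetch_by_keys(session: Any, keys: Iterable[str]) -> List[Any]:
    """Pattern A: one single-partition SELECT per key, issued concurrently.

    `session` is assumed to be a connected driver Session (hypothetical setup).
    """
    # A prepared statement lets a token-aware driver compute the routing key
    # for each request and send it to a replica that owns that partition.
    stmt = session.prepare("SELECT * FROM test WHERE pk = ?")

    # execute_async returns a future immediately, so all requests are in
    # flight at the same time instead of running one after another.
    futures = [session.execute_async(stmt, (key,)) for key in keys]

    rows: List[Any] = []
    for future in futures:
        # result() blocks until that request finishes and raises on failure.
        rows.extend(future.result())
    return rows
```

For long key lists you would probably want to cap the number of in-flight requests; the driver's `cassandra.concurrent.execute_concurrent_with_args` helper takes a `concurrency` argument for that purpose.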

@Hartmut:
CREATE TABLE test2 (
    pk   text,
    ck   text,
    val1 text,
    PRIMARY KEY (pk, ck)
);
SELECT * FROM test2 WHERE pk='a' AND ck IN ('x', 'y');
Conversely, when querying within a single partition, IN should be perfectly valid, correct?

@avi: Yes, in this case IN is preferable
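
For comparison, the single-partition case could look like the following Python sketch. It again assumes a connected driver `session`; the function name `fetch_rows_in_partition` is hypothetical. The partition key is fixed with `=`, and the list of clustering keys is bound to one `IN ?` marker of a prepared statement.

```python
from typing import Any, List, Sequence


def fetch_rows_in_partition(session: Any, pk: str, cks: Sequence[str]) -> List[Any]:
    """Single-partition read where IN restricts only the clustering key.

    `session` is assumed to be a connected driver Session (hypothetical setup).
    """
    # pk is fully specified, so the request targets exactly one partition and
    # a token-aware driver can still route it by that key.
    stmt = session.prepare("SELECT * FROM test2 WHERE pk = ? AND ck IN ?")

    # The whole list of clustering keys is bound to the single 'IN ?' marker.
    return list(session.execute(stmt, (pk, list(cks))))
```

Because all requested rows live in the same partition, one request covers them and no cross-node fan-out is needed.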
