Per partition local ordering

Artem_Golovko · November 27, 2024, 11:34pm

Installation details
ScyllaDB version: 6.2.1

Data model:

CREATE TABLE data(
  p TEXT,
  c TIMESTAMP,
  v BLOB,
  PRIMARY KEY(p, c)
)

Queries: Give me the first and the last value for each partition.

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) PER PARTITION LIMIT 1
Here is we can get the first point for each partition and don’t care about “global” ordering, so resulted rows won’t be sorted by clustering key and natural order will be applied only locally within each partition. Result may look like this:

p1 2024-11-01 00:35:00
p2 2024-11-01 00:30:00
p3 2024-11-01 00:31:00

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) ORDER BY c DESC PER PARTITION LIMIT 1
It’s not possible to use IN and ORDER BY together with paging enabled. The reason is that scylla applies order “localy” within each partition, but also applies it “globally” across the resulted rows that makes scylla store result in-memory to apply “global” sorting. But if I don’t care about “global” ordering and only wanted to specify ordering within each partition that introduce performance overhead. Result may look like this:

p1 2024-11-27 23:35:00
p2 2024-11-27 23:30:00
p3 2024-11-27 23:31:00

Question: Does scylla supports “per partition ordering” only? And if not, are there any plans to support it?

Botond_Denes · December 2, 2024, 12:42pm

The short answer is that ScyllaDB doesn’t support this and I’m not aware of any plans to support it either.

Note that making the ordering restricted to a single partition will still not make it possible to execute this query: a partition can be arbitrarily big and to order it according to any other order than the clustering one, we still have to read it all into memory, which might not be possible.

Artem_Golovko · December 6, 2024, 7:10pm

Thanks for the answer, but I still didn’t get why we need read whole result in memory when we are sorting by clustering key, but in reversed order? Let’s have a look at the following queries:

Get me first row for each partition and ignore sorting across the final result

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) 
PER PARTITION LIMIT 1

We don’t need to read everything in the memory, since we don’t need to sort the final result, right? To follow further examples, let’s introduce new keyword “PER PARTITION ORDER”, so we can write the same query in more explicit way

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) 
PER PARTITION LIMIT 1 
PER PARTITION ORDER DEFAULT

Get me first row for each partition and perform sorting across the final result

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) 
PER PARTITION LIMIT 1 
ORDER BY c ASC

We do need to read everything in memory, because we want the final result to be sorted.

And now I want to make the same query like the first one, but in reversed order. So, it should return the last row for each partition, but the final result not necessary to be sorted. With suggested keyword it will look like that

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) 
PER PARTITION LIMIT 1 
PER PARTITION ORDER REVERSED

But instead I need to do this that introduce overhead and only possible to do with pagging disabled

SELECT * FROM data WHERE p IN (‘p1’, ‘p2’, ‘p3’) 
PER PARTITION LIMIT 1 
ORDER by c DESC

Botond_Denes · December 10, 2024, 8:43am

You are right, examples (1) and (3) do not require reading all results into memory, as the end results are partially sorted only.
I think this PER PARTITION ORDER would be possible to implement.

Artem_Golovko · December 10, 2024, 2:33pm

Thanks, sounds great. Do I need to create feature request on the github for that? Or this feature may be considered only for the Enterprise version?

Botond_Denes · December 10, 2024, 3:06pm

Feel free to create a feature request issue in the open-source repository.

Topic		Replies	Views
Does Scylla reorder columns in newly created tables? ScyllaDB data-model	3	254	January 31, 2024
Materialized Views and Indexing, filtering columns by range, ALLOW FILTERING ScyllaDB data-model , materialized-views , secondary-index	0	148	May 8, 2024
Partitions in Scylla ScyllaDB	1	230	July 19, 2023
Local Secondary Index filtering ScyllaDB	9	758	November 29, 2023
Knowing ScyllaDB Limitations ScyllaDB data-model	1	2929	January 9, 2023

Per partition local ordering

Related topics