What is the difference between Clustering, Primary, Partition, and Composite (or Compound) Keys in ScyllaDB?

In ScyllaDB (and Apache Cassandra, for that matter), a Primary Key is defined within a table. It consists of one or more columns that together uniquely identify a row. Every table must include a Primary Key definition. For example, in the table:

CREATE TABLE heartrate_v1 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id)
);

The Primary Key is a single column – the pet_chip_id. If a Primary Key is made up of a single column, it is called a Simple Primary Key.
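Because the Primary Key identifies a row, a table with a Simple Primary Key holds at most one row per pet_chip_id. The sketch below illustrates this with heartrate_v1; the UUID and readings are made-up sample values, not data from the original post.

```cql
-- First reading for a (hypothetical) pet chip.
INSERT INTO heartrate_v1 (pet_chip_id, time, heart_rate)
VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:01:00', 100);

-- Same pet_chip_id, so this is the same row: the write replaces
-- the time and heart_rate values instead of adding a second row.
INSERT INTO heartrate_v1 (pet_chip_id, time, heart_rate)
VALUES (123e4567-e89b-12d3-a456-426655440b23, '2019-03-04 07:02:00', 103);

-- Returns a single row holding the most recent values.
SELECT * FROM heartrate_v1
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;
```

This is why a table meant to keep a history of readings needs more than pet_chip_id in its Primary Key.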

It is also possible to define a Primary Key that includes more than one column, in which case it is called a Composite (or Compound) Key. For example:

CREATE TABLE heartrate_v2 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id, time)
);

In this case, the first part of the Primary Key is called the Partition Key (pet_chip_id in the above example) and the second part is called the Clustering Key (time).
The Partition Key is responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns.

The Clustering Key is responsible for sorting the rows within the partition. It can be zero or more columns.
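As a rough illustration of how the two parts work together in heartrate_v2, the queries below pick one partition with pet_chip_id and then use the sort order on time. The UUID and timestamps are sample values chosen for the example.

```cql
-- All readings for one pet within an hour. The partition key selects
-- the partition; the clustering key (time) allows a range scan inside it.
SELECT time, heart_rate FROM heartrate_v2
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
  AND time >= '2019-03-04 07:00:00'
  AND time <  '2019-03-04 08:00:00';

-- Most recent reading for the same pet: rows are stored sorted by time,
-- so the order can be reversed and limited to one row.
SELECT time, heart_rate FROM heartrate_v2
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
ORDER BY time DESC
LIMIT 1;
```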

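The Partition Key itself can consist of more than one column. In that case, its columns are wrapped in an extra pair of parentheses inside the Primary Key definition. For example: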

CREATE TABLE heartrate_v3 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
pet_name text,
PRIMARY KEY ((pet_chip_id, time), pet_name)
);

In this case, the Partition Key includes two columns, pet_chip_id and time, and the Clustering Key is pet_name. A query that reads from a partition must specify all the columns defined in the Partition Key (here, both pet_chip_id and time).
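A short sketch of what this means for queries on heartrate_v3 follows. The literal values are illustrative, and the second query is left commented out because it is expected to be rejected unless ALLOW FILTERING is added.

```cql
-- Both partition key columns are given, so the partition can be located.
SELECT * FROM heartrate_v3
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
  AND time = '2019-03-04 07:01:00';

-- Only part of the partition key is given. Without ALLOW FILTERING
-- this is expected to fail, since the partition cannot be located.
-- SELECT * FROM heartrate_v3
-- WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;
```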

Look at another example:

CREATE TABLE heartrate_v4 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
pet_name text,
PRIMARY KEY (pet_chip_id, pet_name, heart_rate)
);

If there is more than one column in the Clustering Key (pet_name and heart_rate in the example above), the order of these columns defines the clustering order. For a given partition, all the rows are physically ordered inside ScyllaDB by the clustering order. This order determines what select queries you can efficiently run on this partition.
In this example, the ordering is first by pet_name and then by heart_rate.

In addition to the Partition Key columns, a query may restrict the Clustering Key columns. If it does, they must be used in the same order in which they were defined, without skipping an earlier column.
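The following queries against heartrate_v4 sketch which restrictions follow the clustering order. The UUID and the pet name 'Duke' are sample values, and the last query is commented out because it is expected to be rejected unless ALLOW FILTERING is added.

```cql
-- Partition key only: returns the whole partition,
-- sorted by pet_name and then by heart_rate.
SELECT * FROM heartrate_v4
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23;

-- Partition key plus the first clustering column.
SELECT * FROM heartrate_v4
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
  AND pet_name = 'Duke';

-- Both clustering columns, in their defined order;
-- a range is allowed on the last restricted column.
SELECT * FROM heartrate_v4
WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
  AND pet_name = 'Duke'
  AND heart_rate >= 100;

-- Skips pet_name and restricts heart_rate directly, which does not
-- follow the clustering order.
-- SELECT * FROM heartrate_v4
-- WHERE pet_chip_id = 123e4567-e89b-12d3-a456-426655440b23
--   AND heart_rate >= 100;
```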
