How alternator creates MV with 2 columns not present in PK of original table?

I want to utilize single table design in Scylla without resorting to DynamoDB. I suspected it does something non trivial to support Dynamo GSI Hash and Sort keys so I created a sample table through alternator and when I ran DESC on the keyspace I see the impossible:

CREATE TABLE alternator_dynamo_entities.dynamo_entities (
    "PK" text,
    "SK" text,
    ":attrs" map<text, blob>,
    "GSI1PK" text,
    "GSI1SK" text,
    PRIMARY KEY ("PK", "SK")
) WITH CLUSTERING ORDER BY ("SK" ASC) ...

CREATE MATERIALIZED VIEW alternator_dynamo_entities."dynamo_entities:GSI1" AS
    SELECT *
    FROM alternator_dynamo_entities.dynamo_entities
    WHERE "GSI1PK" IS NOT NULL AND "GSI1SK" IS NOT NULL
    PRIMARY KEY ("GSI1PK", "GSI1SK", "PK", "SK")
    WITH CLUSTERING ORDER BY ("GSI1SK" ASC, "PK" ASC, "SK" ASC) ...

Evidently I get error trying to replicate this schema manually since only 1 extra column not present in PK is allowed. How is this possible and what should I do to be able to create alternator-like secondary hash and sort keys?

As you noticed, in CQL (both Cassandra and Scylla), when you create a materialized view you are limited to adding at most one regular column from the base table to the view’s key. You are not allowed to add two. The reason for this limitation appears sporadically on documents on the web, but isn’t documented well enough and already three years ago I opened an issue Document better the restriction of a view to have only one new key column · Issue #6714 · scylladb/scylladb · GitHub to document it better (but unfortunately never got around to it).

The reason why adding two base attributes to the view key was not allowed wasn’t some laziness or oversight, but rather a genuine problem that can happen if the user can manipulate the “liveness” properties of these two attributes - their timestamp and ttl - separately - and whether we can handle in such case the liveness of the view row.

So, you may be asking now, why did we allow doing this seemingly-dangerous thing in Alternator? The thinking was (and issue #6714 was created to explain this in more detail) that because Alternator does not have TTLs, and doesn’t let you control timestamps (writes always use the current timestamps, later writes always have a later timestamp), it is not vulnerable to this problem. So we allow this case in Alternator, which is lucky because we need this feature for DynamoDB compatibility.

In the aforementioned issue, I also suggested that we should “discuss whether it might be possible for a CQL user who promises to adhere to certain limitations on the TTLs or timestamps, that are similar to Alternator’s, could also be allowed to use multiple regular base columns for the view key.”. But unfortunately, this doesn’t currently exist - the code doesn’t allow you to create such table via CQL.

By the way, Cassandra also doesn’t have this feature, and a request to add it has been open for 8 years now - [CASSANDRA-9928] Add Support for multiple non-primary key columns in Materialized View primary keys - ASF JIRA.

2 Likes

Thanks a ton! I stumbled upon this exact issue and even ended up down to reading the same unresolved issues with Cassandra. I understand the problem is genuine and requires user to restrict the usage of some advanced features of Scylla. Still, it would be great to eventually expose an option to allow this from CQL and remember not to touch that features. A better option would be to enforce that on DDL level so the table knows it has multiple column MVs and disables conflicting subset of CQL.