I’m using ScyllaDB to store subscription relationships in the following business scenario:
Requirements: The data model needs to support these queries:
app_id = ? AND user_id = ? AND ts >= <lastTs> LIMIT 10
for incremental data queries to sync with the client.app_id = ? AND user_id = ? AND target_user_id = ?
for precise subscription lookups.
Table Structure:
CREATE KEYSPACE IF NOT EXISTS userstore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
CREATE TABLE IF NOT EXISTS subscription (
app_id bigint,
user_id text,
target_user_id text,
expiry int,
ts bigint,
PRIMARY KEY((app_id, user_id), target_user_id)
) WITH CLUSTERING ORDER BY (target_user_id ASC);
CREATE MATERIALIZED VIEW IF NOT EXISTS subscription_by_time AS
SELECT app_id, user_id, target_user_id, ts, expiry
FROM subscription
WHERE app_id IS NOT NULL
AND user_id IS NOT NULL
AND target_user_id IS NOT NULL
AND ts IS NOT NULL
PRIMARY KEY((app_id, user_id), ts, target_user_id)
WITH CLUSTERING ORDER BY (ts ASC);CREATE KEYSPACE IF NOT EXISTS userstore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
Table Explanation:
- subscription: Supports Query 2 (app_id = ? AND user_id = ? AND target_user_id = ?).
- subscription_by_time: Supports Query 1 (app_id = ? AND user_id = ? AND ts >= LIMIT 10).
Issue:
- When a subscription relationship is updated (e.g., updating the ts field with the latest timestamp), this triggers an update on the materialized view, where ts is part of the primary key.
- This operation generates tombstones (deleted entries) in the materialized view. Tombstone accumulation leads to slower queries and, in severe cases, may cause cluster instability.
Is there a data structure that satisfies the functional requirements while avoiding tombstone issues?