Hello, I am the owner of a roblox extension and I had recently encounter dificulties with ScyllaDB that made me consider switching to MongoDB. My application is stable right now but I want to know what I did wrong in my context.
My queries are to SELECT based solely on the following fields server_id , place_id and last_updated. I had created secondary queries for those fields like this:
await client.execute(` CREATE INDEX IF NOT EXISTS game_servers_new_last_updated_idx ON game_servers_new (last_updated ); `);
My issue is with this query: SELECT game_id, active_players, last_updated FROM games WHERE last_updated < ? LIMIT ? ALLOW FILTERING
So my question is:
Even if i have secondary index on last_updated why does it says that my query can only run if ALLOW FILTERING is present?
Where is my mistake? How should be my data constructed?
To answer your question about the correct model schema for your ScyllaDB use case and why your query requires ALLOW FILTERING even with a secondary index:
Summary of the issue:
You created a table with server_id as the primary key, and a secondary index on last_updated.
Query:
SELECT game_id, active_players, last_updated FROM games WHERE last_updated < ? LIMIT ? ALLOW FILTERING
The database says you must use ALLOW FILTERING.
Explanation:
In ScyllaDB (and Cassandra), secondary indexes only allow efficient equality queries (WHERE last_updated = ?), not range queries (WHERE last_updated < ?).
For range queries on a secondary index (<, >, etc.), ScyllaDB does a full table scan behind the scenes and applies the filter after collecting the data. This operation is risky and may be expensive, so the system forces you to use ALLOW FILTERING to acknowledge the load and risk.
If you need to frequently query by ranges (e.g., last_updated < ?), you should change your schema so that last_updated is part of your primary key (partition key or clustering key). For example:
Range queries over secondary indexes will almost always require ALLOW FILTERING, indicating a potentially slow, unoptimized query.
Best Practice: Model your schema so that your most common query fields are part of your primary key (partition/clustering), or use materialized views if you need alternative fast access patterns.
You are a genius. I didnt know that. I perfectly got it now.
I have 1 more easy question about the data modeling:
server_id is a random uuid, and it shall be unique
place_id is a number
Multiple server ids can have the same place_id.
My question is the following:
Why did you opted for PRIMARYKEY(game_id, last_updated) ) instead of PRIMARYKEY(server_id, last_updated) )
My queries are the following:
server_id - exact query
place_id - exact query
only last_updated is a range query
Therefore Is it better for my db to be structured like this? : PRIMARYKEY(server_id, last_updated) )
So we have unique server id
and secondary index on place_id where exact querys will work by default?
Great to hear you got the concept, thanks for the compliments, it’s just about knowing how the DB operates and proper modeling
Regarding your question about the choice of primary key — whether to use PRIMARY KEY (game_id, last_updated) or PRIMARY KEY (server_id, last_updated) — here are some considerations:
Uniqueness & Query Patterns:
Since server_id is a unique random UUID and you will query by it with exact matches, using PRIMARY KEY (server_id, last_updated) is a good design choice. It ensures rows are uniquely identified by server_id and locally clustered by last_updated for efficient range queries per server.
In your case, game_id vs server_id as partition key depends on which identifier you query mostly and which uniquely identifies the row. If your intent is to store data keyed by servers (each server having multiple records distinguished by timestamp), then (server_id, last_updated) makes more sense.
Handling place_id queries:
Since place_id can be shared by many servers, an exact query on place_id would require a secondary index (or materialized view). Scylla handles exact queries on secondary indexes fine. Just be mindful of the performance implications if cardinality of place_id is very high.
To Summarize:
The schema should reflect your most common query patterns and uniqueness constraints.
If you frequently query by server_id and last_updated ranges, structure your table with PRIMARY KEY (server_id, last_updated).
Use a secondary index or materialized view to support queries by place_id since it is not part of the primary key.
This schema aligns well with your use case of unique servers queried by exact ID and timestamp ranges, plus location filtering via secondary indexing.
Thank you so much. Now everything makes way more sense to me. I think I will get back to scylla now that I know what my mistake was. It had incredible performance but I got upset since I didnt understeand what I did wrong.
I also tried right now to use a materialized view but I got:
[CENTRALIZED CACHE] Failed to refill server cache: Only EQ and IN relation are supported on the partition key (unless you use the token() function or ALLOW FILTERING) ResponseError: Only EQ and IN relation are supported on the partition key (unless you use the token() function or ALLOW FILTERING) at FrameReader.readError (/root/BetterBloxAPI/node_modules/cassandra-driver/lib/readers.js:389:17) at Parser.parseBody (/root/BetterBloxAPI/node_modules/cassandra-driver/lib/streams.js:209:66) at Parser._transform (/root/BetterBloxAPI/node_modules/cassandra-driver/lib/streams.js:152:10) at Transform._write (node:internal/streams/transform:171:8) at writeOrBuffer (node:internal/streams/writable:572:12) at _write (node:internal/streams/writable:501:10) at Writable.write (node:internal/streams/writable:510:10) at Protocol.ondata (node:internal/streams/readable:1009:22) at Protocol.emit (node:events:518:28) at addChunk (node:internal/streams/readable:561:12) { info: ‘Represents an error message from the server’, code: 8704, query: ‘SELECT server_id, place_id, last_updated, datacenter \n’ + ’ FROM game_servers_by_last_updated \n’ + ’ WHERE last_updated < ?’ }
Based on ScyllaDB best practices and your handling of the secondary index issue: If your changes remove the need for ALLOW FILTERING by designing queries around primary and clustering keys or by modeling data to access patterns, your solution is aligned with recommended ScyllaDB schema strategies. Secondary indexes are best used for low-cardinality, infrequently updated fields; for anything dynamic or high-volume, it’s better to rely on a schema designed for efficient partition and clustering key reads.
If your approach only needs secondary indexes for rare access paths and keeps core queries fast, this is considered acceptable and in line with current ScyllaDB guidance. If you encounter further performance issues or need a more opinionated schema review, please share your exact table definitions and query patterns—happy to offer more targeted advice!