Querying by non partition key column, creating an index

Guy · March 31, 2024, 8:05am

Originally from the User Slack

@Benaceur_Ayoub: I have a table "stores" where 'id' is the partition key, but sometimes I want to query by 'city'. The problem is that each city will contain thousands of stores, so if I create an index on 'city', would that make the partitions for that index large, and therefore not a viable solution? Is this correct?

@Felipe_Cardeneti_Mendes: Correct! Such an index would be large and likely very imbalanced

@Benaceur_Ayoub: So there is no way I can query by city?

@Felipe_Cardeneti_Mendes: Well, if there are up to a few thousand stores per city, an index will do just fine. If it gets to hundreds of thousands, then it won't.

You probably want a StoreByCity kind of table where you add more cardinality with, for example, a zip code range. Then you would simply run parallel queries until you have walked over all zips for a given city.

Or just do a full table scan with Spark if this is ad hoc.
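
To make the StoreByCity suggestion concrete, here is a minimal sketch of how it might look. Everything named here is an assumption for illustration and not from the thread: the table name `stores_by_city`, its columns, the bucket width of 100, and the helper names `zip_bucket` and `stores_in_city`. It also assumes numeric zip codes and that you already know which zip buckets belong to a city. The `execute` argument is any callable that takes a query string and a parameter tuple; with the Python driver, `session.execute` would probably fit, but that is untested here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence, Tuple

# Hypothetical schema: (city, zip_bucket) is a composite partition key, so one
# city is spread over several partitions instead of one very large partition.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS stores_by_city (
    city text,
    zip_bucket int,
    store_id uuid,
    name text,
    PRIMARY KEY ((city, zip_bucket), store_id)
)
"""

# One query per (city, zip_bucket) partition; %s placeholders are assumed.
SELECT_BUCKET = (
    "SELECT store_id, name FROM stores_by_city "
    "WHERE city = %s AND zip_bucket = %s"
)

Execute = Callable[[str, Tuple[Any, ...]], Iterable[Any]]


def zip_bucket(zip_code: str, width: int = 100) -> int:
    """Map a numeric zip code to a range bucket (assumed width: 100 zips)."""
    return int(zip_code) // width


def stores_in_city(
    execute: Execute,
    city: str,
    buckets: Sequence[int],
    max_workers: int = 8,
) -> List[Any]:
    """Query every bucket of a city in parallel and concatenate the rows."""

    def fetch(bucket: int) -> List[Any]:
        # Each call reads exactly one partition of the hypothetical table.
        return list(execute(SELECT_BUCKET, (city, bucket)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `buckets`.
        return [row for rows in pool.map(fetch, buckets) for row in rows]
```

On the write path, each store would be inserted with `zip_bucket(store_zip)` as its bucket. The bucket width is the tuning knob: narrower ranges give smaller partitions but more queries per city.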
