Originally from the User Slack
@Benaceur_Ayoub: I have a table "stores" where 'id' is the partition key, but sometimes I want to query by 'city'. The problem is that each city will contain thousands of stores, so if I create an index on 'city', that would make the partitions for that index large, and therefore this is not a viable solution. Is this correct?
@Felipe_Cardeneti_Mendes: Correct! Such an index would be large and likely very imbalanced
@Benaceur_Ayoub: So there is no way I can query by city?
@Felipe_Cardeneti_Mendes: Well, if the ratio of stores per city is up to a few thousand, an index will do just fine. If it gets to hundreds of thousands, then it wouldn't.
You probably want a StoreByCity kind of table where you add more cardinality with, for example, a zip code range. Then you would simply run parallel queries until you have walked over all zips for a given city.
Or just run a full table scan with Spark if this is ad hoc.
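
A minimal sketch of the StoreByCity idea, not a definitive implementation. It assumes a hypothetical table store_by_city whose partition key is (city, zip_bucket); the table name, column names, NUM_BUCKETS, and the fetch_bucket callback are all illustrative and do not come from the thread. It also derives a fixed number of buckets from the zip code instead of using literal zip ranges, so the reader always knows which partitions to walk for a city.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema (names are assumptions, not from the thread):
#   CREATE TABLE store_by_city (
#       city text, zip_bucket int, zip text, id uuid,
#       PRIMARY KEY ((city, zip_bucket), zip, id));
NUM_BUCKETS = 16  # assumed value; pick it so each partition stays reasonably small


def zip_bucket(zip_code, num_buckets=NUM_BUCKETS):
    """Map a zip code to a stable bucket that becomes part of the partition key."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    return int(digits or 0) % num_buckets


def query_city(city, fetch_bucket, num_buckets=NUM_BUCKETS):
    """Read every (city, bucket) partition in parallel and merge the rows.

    fetch_bucket(city, bucket) is a placeholder for a real driver call such as
    SELECT * FROM store_by_city WHERE city = ? AND zip_bucket = ?
    and must return an iterable of rows.
    """
    with ThreadPoolExecutor(max_workers=num_buckets) as pool:
        per_bucket = pool.map(lambda b: fetch_bucket(city, b), range(num_buckets))
        return [row for rows in per_bucket for row in rows]
```

On the write path, each store would be inserted with zip_bucket(zip) as part of its partition key, so a city's stores spread over at most NUM_BUCKETS partitions instead of a single large one.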