Data model with a lot of empty columns, collections

Guy · November 15, 2022, 4:21pm

Hello, I’m trying to find a good practice for choosing between regular columns and a Map column. For example, if I have 200 columns in a table and usually use only 30% of them, am I better off using a Map column (collection) instead of having empty columns? I’ve read here that storage is not affected but memory is. Any advice on this?

Guy · November 15, 2022, 4:22pm

Yes, storage is not affected by empty columns. They are simply not stored if empty. The same is largely true in memory.
We used to pick a different container for column storage based on the number of columns in the schema: a vector by default (very efficient lookups, but empty columns also take memory), and a set for larger column counts (less efficient lookups, but empty columns don’t use memory). We have now switched uniformly to a compact radix tree, which should also not use any memory for empty columns.
So overall, I think you are better off with columns. Although, if you are not yet using clustering keys, you might consider refactoring your schema so that some of these maybe-empty columns become separate rows.
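
To make the trade-off between the two older containers concrete, here is a minimal Python sketch. It is a toy model only, not ScyllaDB's actual implementation (which is written in C++ and now uses a radix tree); the function names, the 200-column schema, and the 60 populated columns are assumptions chosen to match the numbers in the question.

```python
# Toy model only: this does not reflect ScyllaDB's real in-memory row format.

def dense_row(schema, values):
    """Vector-style storage: one slot per schema column, whether set or not."""
    return [values.get(col) for col in schema]


def sparse_row(values):
    """Set/tree-style storage: keep only the columns that hold a value."""
    return {col: val for col, val in values.items() if val is not None}


# Hypothetical table with 200 columns, of which 30% (60) are populated.
schema = [f"c{i}" for i in range(200)]
values = {f"c{i}": i for i in range(60)}

dense = dense_row(schema, values)
sparse = sparse_row(values)

print(len(dense))   # slots allocated by the dense layout
print(len(sparse))  # entries held by the sparse layout
```

The dense layout pays for every column in the schema, while the sparse layout pays only for columns that are actually set, which is the property the reply attributes to the set and radix-tree containers.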

Guy · November 15, 2022, 4:23pm

Thank you for your answer! We are indeed organizing our schema in order to migrate our data from PostgreSQL.
Could you elaborate a little on the idea of turning an empty column into a separate row? An example, and an explanation of why it would be better that way, would help.

Guy · November 15, 2022, 4:24pm

I meant that if there is a pattern of some columns being empty in certain partitions, you could organize your schema so that these become separate clustering rows within the same partition instead. I can’t really write an example without knowing more about the schema. But having partitions with lots of columns is also completely fine.
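
As one possible reading of this suggestion, the Python sketch below reshapes a wide row into one row per populated column, keyed by the column name. Everything in it is hypothetical: the function name, the `user-1` partition key, and the column names are illustrative only, and the right layout depends on the real schema.

```python
# Hypothetical illustration: the schema and names are assumptions, not from the thread.

def to_clustering_rows(partition_key, wide_row):
    """Turn one wide row into (partition_key, attribute, value) rows.

    A row is emitted only for columns that hold a value, so empty
    columns simply do not exist in the reshaped data.
    """
    return [
        (partition_key, name, value)
        for name, value in sorted(wide_row.items())  # sorted like a clustering key
        if value is not None
    ]


wide = {"email": "a@example.com", "phone": None, "fax": None, "city": "Oslo"}
rows = to_clustering_rows("user-1", wide)

for row in rows:
    print(row)
```

In a table shaped like this, the attribute name would act as the clustering key. One trade-off is that all values then share a single column, and therefore a single type, so this layout fits best when the maybe-empty columns are of the same kind.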

alexsunny123 · November 22, 2022, 10:54am

thanks for the awesome information.
