Hello, I’m trying to find a good practice for column vs. Map column. For example, if I have 200 columns in a table and I usually use only 30% of them, am I better off using a Map columns (collection) instead of having empty columns? I’ve read here that storage is not affected but memory is. Any advice on this?
Yes, storage is not affected by empty columns. They are simply not stored if empty. It is similar in memory.
It used to be that we used a different container for columns storage based on the number of columns in the schema: we used a vector (very efficient lookups but empty columns also take memory) by default and switched to a set for larger column counts (less efficient lookups but empty columns don’t use memory). We now uniformly switched to a compact radix tree, which should also not use any memory for empty columns.
So overall, I think you are better off with columns. Although if you are not yet using clustering keys, you might consider refactoring your schema, so some of these maybe-empty columns are separate rows.
Thank you for your answer! We are effectively organizing our schema to migrate our data from PostgreSQL.
Can you elaborate a little more about the idea of an empty column as a separate row? Like giving me an example and why it will be better that way.
I meant that if there is a pattern of some columns being empty in certain partitions, you can maybe organize your schema such that these are separate clustering rows instead, part of the same partition. Can’t really write an example without knowing more about the schema. But having partitions with lots of columns is also completely fine.
thanks for the awesome information.