I am in the process of finalizing my proof-of-concept of switching from PostgreSQL to ScyllaDB for my commercial project that I’ve been working on for the past 3 years.
I have a working code for the data migration and most of the main operations that the project currently does for the existing PostgreSQL database.
I am a bit stuck on one dilemma.
My database has been growing steadily by approximately 21-28 GB every day in the past year, which is roughly 50-70 million rows per day.
Each row contains 12 numeric and timestamp fields, but also 4 text fields, each value varying between 1 and 6 UTF-8 characters long.
Those text fields contain values from 2 different lists of strings, each of which contains fewer than 300 unique string values.
1 of those 4 fields is going to be part of the composite partition key (the other part of the partition key is a date) in my ScyllaDB table.
In other words, a substantial percentage of my database contains repeated values in the NoSQL/denormalized schema.
Obviously, in a relational database those fields would contain foreign keys to one of the two lookup tables.
Although space is not an issue yet, but I anticipate it to become an issue in less than half a year.
Would you recommend that I create and use (in the client side code) the lookup tables, or would I be better off letting ScyllaDB split the table between multiple servers based on partitions?
I estimate that the cost of renting some extra servers (if the schema is not optimized for space savings) is roughly equivalent (or at least is comparable to) the cost of dealing with the increased code complexity if the schema is optimized for space.
So, each row is something close to 0.3 - 0.5 Kb in size.
How significant would database space savings be if I replace roughly 12 UTF8 characters with 4 short-integer foreign key values, but add a database overhead of 2 extra tables and all related compaction and other maintenance etc.?
I do not have nearly enough experience with ScyllaDB to figure this out, and I would greatly appreciate a nudge in either of these 2 directions.
Thank you very much in advance.