Optimize Deletion Through Partitioning

I have the following schema, which I tried to tailor to optimize data deletion by leveraging partitioning on the expiry and bucket fields. I thought this approach would help when I want to bulk delete data based on the expiry date, and possibly distribute the data across different buckets for balanced storage and efficiency, since everything in a specific partition would be marked as deleted at once (or not; at least that is what I thought).

Queries are offloaded to materialized views designed for particular access patterns, which look up records by id and compare expiry against the current date. Is it a recognized pattern in distributed databases for the base table to exist purely to optimize deletion, rather than being used directly for retrieval? I need to deal with time-based data that must be purged after a certain period, even if the data is never read through the materialized view.

I appreciate any advice or suggestions.

Thank you :slight_smile:

CREATE TABLE records (
   expiry date, // e.g., '2024-02-06'
   bucket smallint, // derived from id when inserting records, e.g. hash(id) % 1000 for 1000 buckets
   id text, // unique identifier
   some_relevant_id tinyint, // some id which belongs to id
   PRIMARY KEY ((expiry, bucket), id, some_relevant_id)
);
CREATE MATERIALIZED VIEW records_by_id AS
SELECT *
FROM records
WHERE expiry IS NOT NULL
  AND bucket IS NOT NULL
  AND id IS NOT NULL
  AND some_relevant_id IS NOT NULL
PRIMARY KEY (id, expiry, bucket, some_relevant_id)
WITH CLUSTERING ORDER BY (expiry ASC, bucket ASC, some_relevant_id ASC);
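
For context, the kind of bulk deletion I had in mind is a single partition-level delete per (expiry, bucket) pair, repeated for each of the 1000 buckets once a day has passed (the values below are only illustrative):

// hypothetical: drop everything that expires on a given date in a given bucket
DELETE FROM records WHERE expiry = '2024-02-06' AND bucket = 42;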

It sounds like TTL would be relevant for your use case. The Expiring Data with TTL (Time to Live) lesson on ScyllaDB University would be a good starting point.
You might also want to look at Time-Window Compaction Strategy (TWCS).
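
For instance, a per-record TTL can be set at write time (the values below are only illustrative):

// this row is automatically removed ~30 days after being written
INSERT INTO records (expiry, bucket, id, some_relevant_id)
VALUES ('2024-02-06', 42, 'abc123', 1)
USING TTL 2592000; // 30 days in seconds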

Thank you for the suggestion to explore TTL and TWCS in ScyllaDB University. These features are indeed powerful for managing data with natural expiration patterns, and I appreciate the guidance towards these resources.

However, my use case involves a dynamic expiry value that varies depending on some_relevant_id, which introduces a level of complexity that might not align perfectly with the standard application of TWCS. While TWCS is optimized for scenarios with predictable, uniform expiration patterns, my dataset features varied TTLs, making it challenging to leverage TWCS directly for efficient compaction and storage management.
My concern revolves around ensuring efficient data deletion that doesn’t affect read operations. This design choice has led me to rely on materialized views to handle my query needs, particularly for accessing data by id, which is not the partition key in my base table schema.

Given this context, I’m curious about the broader community’s experience and best practices regarding this approach. Specifically:

  • Is it a common practice to design the base table in a way that it’s optimized for certain operations (such as write throughput or deletion efficiency) and not directly use it for queries, instead relying on materialized views or secondary indexes to support read operations?
  • In scenarios where the base table’s partition key does not align with the primary query access patterns, how have others managed to maintain high performance and efficiency, especially concerning read operations?
  • Are there any particular considerations or pitfalls one should be aware of when adopting a strategy that relies on materialized views for the majority of the query workload, especially in the context of dynamic data models where aspects like expiry and some_relevant_id can dictate data retention and access patterns?

Thank you for your guidance, and I look forward to any further discussion or advice that can help refine our strategy. :slight_smile:

@Guy

Please correct me if I am wrong; here are my guesses.

Configuring my records table to use the Time Window Compaction Strategy with a 1-day window size might offer indirect benefits to query performance.
TWCS optimizes the compaction process within the base table by organizing data into time-based windows. This leads to more efficient storage and management of data, which indirectly benefits materialized views by ensuring they are updated and maintained with less overhead.

Operations on ScyllaDB, such as querying, inserting, updating, and deleting data, engage both partitioning logic and SSTable mechanics. For instance, an update involves finding the right partition and writing the new data to a memtable, which is eventually flushed to a new SSTable. By minimizing write amplification in the base table through efficient compaction, TWCS may indirectly reduce the load on the database during updates, resulting in faster and more efficient updates to materialized views.
As for TTL: would deliberately setting slightly different TTL values, even for records within the same partition, prevent all the data from expiring (or being marked for deletion) at the same moment, and thereby mitigate the potential performance issues associated with a large volume of data expiring simultaneously?
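
Concretely, something like this is what I mean (the jitter would be a small random offset added application-side; the exact numbers are made up):

// two records in the same partition with slightly staggered TTLs
INSERT INTO records (expiry, bucket, id, some_relevant_id)
VALUES ('2024-02-06', 42, 'a1', 1) USING TTL 86400; // base TTL: 1 day
INSERT INTO records (expiry, bucket, id, some_relevant_id)
VALUES ('2024-02-06', 42, 'a2', 1) USING TTL 86437; // base TTL + jitter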

@hanishi you should be using TWCS when your access patterns involve append-only TTL data with very few (or none at all) updates. You should NOT use it when manually deleting data.

Also notice that the compaction strategy of your choice will primarily be used for your base table only. That is, by default, once you create a view, your view table is going to default to STCS, though you can manually change it after the view gets created.
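
For example, switching a view's compaction strategy after creation looks something like this (view name taken from your schema; the strategy shown is just a placeholder):

ALTER MATERIALIZED VIEW records_by_id
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
};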

To be quite honest, I don’t understand what exactly you are trying to accomplish. Your base table feeds an underlying view, but you are primarily querying the view? Is it correct to assume that you want to delete all records with a given expiry date? If yes, then why don’t you do it the other way around (the base table follows your view schema, and then you have a view which has your existing base table schema)?
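
To sketch what I mean by swapping the two (a rough illustration reusing your column names; records_by_expiry is a hypothetical view name):

CREATE TABLE records (
   id text,
   expiry date,
   bucket smallint,
   some_relevant_id tinyint,
   PRIMARY KEY (id, expiry, bucket, some_relevant_id)
);

CREATE MATERIALIZED VIEW records_by_expiry AS
SELECT *
FROM records
WHERE expiry IS NOT NULL
  AND bucket IS NOT NULL
  AND id IS NOT NULL
  AND some_relevant_id IS NOT NULL
PRIMARY KEY ((expiry, bucket), id, some_relevant_id);

You would then query the base table by id directly, use the view to find which ids expire on a given date, and issue the deletes against the base table.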

Another thing you can do, as @Guy mentioned, is to rely on TTL and get rid of the expiry and bucket columns. You insert the data with a specific time-to-live period; if that period ever changes, you upsert the record with a new TTL. If you don’t do anything, then past X period of time your entries should get evicted.
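
In CQL terms, that would look roughly like this (a sketch assuming a simplified records table keyed only by id and some_relevant_id, without the expiry and bucket columns):

// initial write with a 30-day time-to-live
INSERT INTO records (id, some_relevant_id)
VALUES ('abc123', 1) USING TTL 2592000;

// the retention period changes later: upsert the same row with a new TTL
INSERT INTO records (id, some_relevant_id)
VALUES ('abc123', 1) USING TTL 5184000; // now 60 days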

Let us know whether this helps in any way. On top of that, since you asked about delete-specific patterns, maybe these talks can shed some light:


@felipemendes
Thank you for your insights, I really appreciate them.
The expiry date is critical for analytics, for understanding or predicting the volume of data that will be deleted in the future. I need to keep this field because it serves a different purpose than TTL, acting as a piece of actionable data rather than just a mechanism for data deletion.
However, I can still make use of TTL to handle the automatic deletion of data. When inserting records, a TTL value based on the expiry date can be set to delete the rows automatically. I originally tried to optimize data deletion by partitioning on the expiry and bucket fields because all rows in a partition are to be deleted on that expiry date or later.
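
As a sketch, using the original column names (the TTL would be computed application-side as the number of seconds remaining until the expiry date; the figure below is illustrative):

// expiry stays queryable for analytics; TTL handles the physical deletion
INSERT INTO records (expiry, bucket, id, some_relevant_id)
VALUES ('2024-02-06', 42, 'abc123', 1)
USING TTL 1209600; // e.g. 14 days left until 2024-02-06 at insert time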

Regarding the design of the records table we’re managing with ScyllaDB, it’s important to note that most of the records in the database are never accessed and only exist until their expiry date. This was the rationale behind my initial approach. However, as long as deletion by TTL remains efficient and does not negatively impact read operations on the base table, we could proceed with a simple base table structure without the need for any attached materialized views.

The adoption of TWCS is anticipated to offer significant benefits, particularly in the context of our reconsidered base table schema outlined below:


CREATE TABLE records (
   bucket_id int, // hash(id) % N, computed by the application at insert time
   id text,
   expiry date, // kept as actionable data for analytics
   some_relevant_id tinyint,
   PRIMARY KEY ((bucket_id), id, expiry, some_relevant_id)
) WITH CLUSTERING ORDER BY (id ASC, expiry ASC, some_relevant_id ASC)
AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
};

Here, bucket_id could be derived from the id by applying a consistent hash function and a modulo operation to limit the number of buckets. This way, although each id remains unique, several id values share the same partition, which mitigates the creation of a vast number of tiny partitions that would each contain the data for only a single id. (Please correct me if I am wrong about this.)

I still need to filter records by id when retrieving the data I need, though.
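
For example, a read would look something like this, with the bucket recomputed application-side from the id the same way it was derived at write time:

// bucket_id = hash('abc123') % N, computed in the application
SELECT expiry, some_relevant_id
FROM records
WHERE bucket_id = 42 AND id = 'abc123';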

Could you please take a moment to evaluate this rearranged records table? Your expertise and feedback would be immensely helpful in ensuring it’s optimized for our use case.

Thank you in advance for your time and assistance. :slight_smile: