Time series data modelling with Scylla

Hello!

I am trying to learn better data modelling with Scylla and wanted to understand the optimal setup for a time-series use case. As an example, say I have data coming in from multiple sensors (each with a unique ID) for a few hours every day, at 20k rows per second. I need to store the raw sensor data in a table, and also store 10s, 1min, 10min, 1hour, 1d, 1w, 1month, and 1year aggregates (first, last, min, max, avg), which the ingestor will compute and INSERT.

  1. For the raw sensor data, I think a good setup would be a sensor_raw table with PRIMARY KEY ((sensor_id, date), timestamp), ordered by timestamp descending?

  2. For the aggregates,

    1. I could have a single aggregates table with PRIMARY KEY ((sensor_id, period), timestamp) (ordered descending), but I believe this would be bad since the partition would keep growing forever.

    2. I could have a single aggregates table with PRIMARY KEY ((sensor_id, period, date), timestamp). This is probably better from a partition-size perspective, but for longer periods like 1d, 1w, and 1month, the client-side logic would get very messy, and I would have to force a single compaction strategy on all partitions.

    3. The other option is to have multiple aggregates tables, aggregates_<period>, each with a PRIMARY KEY ((sensor_id, bucket), timestamp) structure, where the bucket for shorter timeframes is yyyy-mm-dd and for longer timeframes like 1d and 1w is yyyy-mm, etc. This way, compaction can be time-windowed (TWCS) for the shorter timeframes and size-tiered for the longer ones.

What is the best practice here?

P.S.: This is a repost from Slack (general channel)

For optimal time-series data modeling in Scylla handling sensor data at high write rates with multiple aggregation levels, best practices include:

For raw sensor data:

  • Use a table like sensor_raw with PRIMARY KEY ((sensor_id, date), timestamp), ordered descending by timestamp. This partitions data by sensor and date, avoiding excessively large partitions.
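A sketch of that raw table in CQL (the value column and its type are placeholders, not from the thread):

```sql
-- One partition per (sensor, day); rows clustered newest-first within it.
CREATE TABLE sensor_raw (
    sensor_id bigint,
    date      date,
    ts        timestamp,
    value     double,   -- placeholder for the actual sensor columns
    PRIMARY KEY ((sensor_id, date), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```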

For aggregate data:

  • Avoid a single aggregates table partitioned only by (sensor_id, period) because partitions grow indefinitely.

  • Using (sensor_id, period, date) as the partition key is better for partition-size management, but it complicates client logic and forces a uniform compaction strategy.

  • A recommended approach is to create multiple aggregate tables for different periods, e.g., aggregates_10s, aggregates_1min, aggregates_1d, each with a primary key like PRIMARY KEY ((sensor_id, bucket), timestamp).

  • Define the bucket differently per aggregation period: for finer granularities like seconds or minutes, use daily buckets (e.g., yyyy-mm-dd), while for coarser granularities like daily, weekly, monthly, use larger buckets such as yyyy-mm or yyyy.

  • This allows tailoring the compaction strategy per aggregation table (time-window compaction for the frequently written fine-grained tables, size-tiered for the less frequently written ones), optimizing read/write performance and data retention.

In summary, multiple aggregation tables segmented by appropriate time buckets are preferable for time series in Scylla: they give better partition-size control, flexible compaction strategies, and cleaner client-side logic for higher-level aggregates.
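As a concrete example, one of the per-period tables might look like this (column names are illustrative, not prescribed):

```sql
-- aggregates_1min: one row per sensor per minute, bucketed by day.
CREATE TABLE aggregates_1min (
    sensor_id bigint,
    bucket    date,       -- day bucket (yyyy-mm-dd) for this fine-grained tier
    ts        timestamp,
    first_val double,
    last_val  double,
    min_val   double,
    max_val   double,
    avg_val   double,
    PRIMARY KEY ((sensor_id, bucket), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```

A coarser table such as aggregates_1d would keep the same shape but use a wider bucket (e.g., yyyy-mm) and a different compaction strategy.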


Thank you!

Here’s what I ended up doing:

For anything ≤ 1 min aggregation, I’m using monthly bucketed partitions (small tier). For our use case, since data input isn’t 24/7, each partition should stay under ~15 MB even at 5 s aggregation.

From 1 min up to (but not including) 1 day, I use yearly buckets (medium tier).

1 day aggregations go into a century bucket with a hardcoded key 2000-01-01 (large tier).

So, three tables total, with primary key ((sensor_id, aggregation_size, bucket), timestamp)
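In code, the bucket choice per tier works out to something like this (a hypothetical sketch, not my actual ingestor code; `Tier` and `bucket_key` are illustrative names):

```rust
// Maps an aggregation tier plus a calendar year/month to the partition
// bucket described above. All 1d+ aggregates share one fixed bucket.
enum Tier {
    Small,  // aggregations <= 1 min  -> monthly bucket
    Medium, // 1 min up to < 1 day    -> yearly bucket
    Large,  // >= 1 day               -> single hardcoded bucket
}

fn bucket_key(tier: Tier, year: i32, month: u32) -> String {
    match tier {
        // e.g. a 10 s aggregate from 2024-07-15 lands in "2024-07-01"
        Tier::Small => format!("{year:04}-{month:02}-01"),
        // e.g. a 10 min aggregate from anywhere in 2024 lands in "2024-01-01"
        Tier::Medium => format!("{year:04}-01-01"),
        // the "century" bucket for 1d/1w/1month/1year aggregates
        Tier::Large => "2000-01-01".to_string(),
    }
}
```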

I’m using TWCS for all bucket tables. The compaction interval is roughly aligned with the bucket period. Does that make sense, or would a daily compaction interval be better?

Also, any tips for choosing the caching settings?

Right now, I’ve set caching = { 'keys': 'ALL' }.

Are there best practices for rows_per_partition (set to ALL or a specific number)? I couldn’t find clear details in the OSS docs.

Our read pattern is usually “last 1000 rows.”

For the 1 min queries I am seeing high latencies, and I am wondering if this compaction setup is the problem.

My exact schema looks something like this:

CREATE TABLE sensors_small (
    sensor_id bigint,
    agg_size text,
    bucket date,
    start_ts timestamp,
    -- ... 3-4 numeric columns ...
    PRIMARY KEY ((sensor_id, agg_size, bucket), start_ts)
) WITH CLUSTERING ORDER BY (start_ts DESC)
    AND caching = {'keys': 'ALL', 'rows_per_partition': '500'}
    AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '31', 'compaction_window_unit': 'DAYS'}
    AND speculative_retry = '99.0PERCENTILE';
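Given that schema, our "last 1000 rows" pattern maps to a single-partition query, roughly:

```sql
-- Newest-first comes for free from the DESC clustering order.
SELECT * FROM sensors_small
WHERE sensor_id = ? AND agg_size = ? AND bucket = ?
LIMIT 1000;
```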

Please advise 🙏

It’s good that you are using TimeWindowCompactionStrategy (TWCS) and bucketing for partition size control. However, high query latencies for 1-minute aggregates can be due to several factors:

1. Compaction Window/Partition Alignment

  • Your compaction window is set to 31 days, but your buckets for 1-min aggregations are yearly. This mismatch might leave many SSTables open for reads, especially for recent data.

  • Try aligning your compaction window with your bucket granularity (e.g., compaction_window_size 365 with compaction_window_unit DAYS for yearly buckets, 30-31 days for monthly buckets). This helps TWCS drop expired SSTables more efficiently and reduces SSTable fan-out on reads.

2. Partition Size

  • If your partitions (with yearly buckets for 1-min data) grow too large, any read (such as “last 1000 rows”) may need to scan too many SSTables and tombstones.

  • Monitor partition size with nodetool cfstats or Scylla Monitoring to see if you should reduce bucket duration even further.

3. Caching Settings

  • Setting 'rows_per_partition': '500' is a reasonable start for “last 1000 rows” access, but ALL isn’t always best for every read pattern if partitions get large.

  • Experiment with increasing 'rows_per_partition' to match your most common query size, or consider partition-level caching if reads always load different ranges.

4. Speculative Retry and Latency Spikes

  • Setting speculative_retry to '99.0PERCENTILE' is aggressive; check your actual latency profile with Scylla Monitoring and only use this if requests are timing out frequently.

5. Bloom Filter and Read Statistics

  • Make sure you have enough RAM for bloom filters and row cache, especially if you are running queries on wide partitions.

6. SSTable Expiry and Query Filtering

  • If old SSTables aren’t expiring due to TTL misconfiguration or an alignment mismatch, your queries may hit many files per read.
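The partition-size and SSTable checks mentioned above can be run from the shell (keyspace/table names here are examples):

```shell
# Per-table stats: SSTable count, partition size estimates, tombstones.
nodetool cfstats my_keyspace.sensors_small

# Distribution of partition sizes and cell counts per partition.
nodetool tablehistograms my_keyspace.sensors_small
```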

Summary Table: Possible Tuning Points

Setting                   Current value     Recommended for 1-min aggregates
compaction_window_size    31                Align with bucket (e.g., 365)
compaction_window_unit    DAYS              Match bucket (DAYS)
bucket granularity        year              Consider shorter (e.g., month)
rows_per_partition        500               Match typical query batch size
caching                   keys: ALL         Limit cached rows if partitions grow large
speculative_retry         99.0PERCENTILE    Default; monitor actual needs

In short:

  • Align the compaction window and bucket length.

  • Check partition sizes and reduce the bucket if queries still slow down.

  • Match caching rows to your “last N rows” queries.

  • Monitor with Scylla Monitoring for hot partitions or pending compactions.
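Concretely, the window/bucket alignment could be applied like this (the table name and values are illustrative only; verify such a wide TWCS window against your retention needs):

```sql
-- For a yearly-bucketed table, per the alignment suggestion above.
ALTER TABLE my_keyspace.sensors_medium
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '365',
    'compaction_window_unit': 'DAYS'
};
```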

Thank you for the detailed response. Really appreciate it!

Compactions:

  • My 1-minute aggregations are in a monthly bucket only (the small-tier table with monthly buckets is used for ≤1 minute aggregations), and its compaction window is set to 31 days. The other tables also have their compaction windows aligned with their bucketing windows.

  • But with cfstats I can see a “high” number of SSTables (>100 is considered high, I believe?). For example, right now the medium-tier table (yearly bucket) shows SSTable count: 334; SSTables in each level: [334/4].

    • Is compaction not running frequently enough? (Running a major compaction manually drops this count significantly.)
    • What should I consider a healthy SSTable count per table here?

Speculative Retries:

  1. Would it be a good idea to remove this on the server side and use the client-side speculative execution option as needed? (We use the Rust Scylla driver.)
  2. Is there a metric in Scylla Monitoring that indicates how many speculative executions the coordinator is performing?

Caching, TTL & Partition Sizes

  • I am experimenting with rows_per_partition and will keep iterating to see what works for us.
  • We do not use TTL on the table, since our data archival process has its own business logic.
  • In the previous schema, there was no time bucketing; many partitions had reached ~100 MB, and we saw latencies increasing day by day. Now the maximum possible partition size is ~20 MB.

Great questions. Here’s my input on each:

Compactions & SSTable Count

  • Healthy SSTable Count: For TWCS tables, 100+ SSTables isn’t unusual for active, recent buckets, but sustained high counts (e.g., 334) may indicate a compaction backlog, especially if data is loaded uniformly. It’s normal for older buckets (with mostly expired or cold data) to eventually compact down to fewer files. Monitor compaction throughput and queue length via Scylla Monitoring; what matters most is that compactions catch up regularly, especially for recent data.

  • Manual Major Compaction: If a major compaction reduces the SSTable count sharply, frequent manual runs still shouldn’t be needed unless you see a prolonged backlog or latency spikes; ideally, TWCS handles active buckets automatically.

  • Guidelines: There isn’t a strict “healthy” number; aim for <100 SSTables per active bucket for most read patterns, but higher counts are okay if read latency, bloom filter hits, and compaction metrics are all healthy.

Speculative Retry

  • Client vs. Server: Using client-side speculative retries (via the Rust driver) is a solid strategy; it places low-latency control closer to your application. If you aren’t facing frequent server-side timeouts, consider disabling server-side speculative_retry and tuning this in your client instead.

  • Metric Visibility: Scylla Monitoring exposes server-side speculative executions through the coordinator’s speculative read metrics (scylla_storage_proxy_coordinator_speculative_data_reads and scylla_storage_proxy_coordinator_speculative_digest_reads). Watch for spikes in these that align with your app’s traffic and tail-latency events.

Caching, TTL, Partition Size

  • Experimenting with rows_per_partition: Definitely keep adjusting; ideally, set it to match your “last N rows” query size for the fastest cache-based lookups.

  • TTL handling: It’s fine to skip TTL when archival is governed by business logic; just ensure you don’t accumulate obsolete data in cold buckets.

  • Partition Size: Reducing the maximum partition size to ~20 MB (from ~100 MB) is excellent; you should see improved query performance and compaction health.


I have applied all these changes (disabled server-side speculative retry, tuned caching, etc.).

But I still see very big spikes in some cases. One suspicion I have right now is that the connection is getting overloaded: we have 600 rps of inbound API requests, which turns into ~1.6k actual queries per second to Scylla due to the bucketing. Is it possible that this is choking the single connection we have to each node?

There is an option to set a custom pool size, but the documentation says the recommended value is 1: SessionBuilder in scylla::client::session_builder - Rust

This is the Scylla session setup I have:

async fn scylla_setup(cfg: &Config) -> anyhow::Result<Session> {
    let spec_exec_policy = Some(Arc::new(PercentileSpeculativeExecutionPolicy {
        max_retry_count: 2,
        percentile: cfg.percentile, // currently set to 99.5 (will remove and try as well)
    }));

    let def_policy = DefaultPolicy::builder()
        .prefer_datacenter(cfg.dc.clone())
        .token_aware(true)
        .build();

    let profile = ExecutionProfile::builder()
        .speculative_execution_policy(spec_exec_policy)
        .consistency(Consistency::LocalOne)
        .load_balancing_policy(def_policy)
        .build();
    let profile_handle = profile.into_handle();

    let nodes: Vec<&str> = cfg.scylla_nodes.split(',').collect();
    if nodes.is_empty() {
        return Err(anyhow::anyhow!("no scylla nodes provided"));
    }

    SessionBuilder::new()
        .known_nodes(nodes)
        .default_execution_profile_handle(profile_handle)
        .build()
        .await
        .context("failed to setup session")
}

It is possible that the connection saturation and latency spikes you observe are related to the single connection per node setup in your session pool, which can become a bottleneck under high query rates. Although the official documentation recommends a pool size of 1, this is typically for lighter workloads. For higher throughput scenarios like yours—600 inbound API requests translating to 1.6k queries per second—it is recommended to increase the connection pool size to multiple connections per node to better distribute load and avoid connection choking.

You can configure this in your Rust driver setup like so:

SessionBuilder::new()
    .known_nodes(nodes)
    .default_execution_profile_handle(profile_handle)
    // Increase pool size as needed; PoolSize also has a PerShard variant
    .pool_size(PoolSize::PerHost(NonZeroUsize::new(4).unwrap()))
    .build()
    .await
    .context("failed to setup session")

Additionally, it is crucial to use the official Scylla Rust driver to ensure proper connection pooling, shard awareness, and optimized concurrency handling tailored for ScyllaDB’s architecture. Non-official or generic drivers may lack these optimizations, leading to bottlenecks. The official Scylla Rust driver is fully asynchronous, shard-aware, and designed for high-performance ScyllaDB access.

You can find the official Scylla Rust driver repository here:
https://github.com/scylladb/scylla-rust-driver

In summary:

  • Increasing the connection pool size beyond 1 is advisable to handle your high query rate and avoid choking a single connection.

  • Verify you are using the official Scylla Rust driver to leverage the best connection management and performance features.

  • Monitor client and server metrics to tune connection pool size and concurrency for your workload.

This approach should help reduce latency spikes and improve throughput under load. If problems persist, sharing detailed metrics and client logs will help diagnose further.


We are using the official driver.

The way to set the pool size is slightly different, though. As per the docs here: GenericSessionBuilder in scylla::client::session_builder - Rust

There are PerHost and PerShard pool sizes.

Which one is preferred? (We are using the shard-aware port.)

For a shard-aware deployment, PerShard is generally the one to tune: the driver already maintains connections per shard, so PerShard(n) scales concurrency evenly across every shard, while PerHost is mainly relevant for non-shard-aware endpoints. Increasing the pool size allows better concurrency and load balancing across each node, directly addressing the latency spikes you observed in this high-throughput time-series workload.
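For reference, a shard-aware setup would typically bump PerShard along these lines (module paths assume a recent driver version; double-check against the docs for yours):

```rust
use std::num::NonZeroUsize;

use scylla::client::session_builder::SessionBuilder;
use scylla::client::PoolSize;

// Default is PerShard(1); PerShard(2) opens two connections to every shard
// of every node. PerHost is mainly for non-shard-aware endpoints.
let session = SessionBuilder::new()
    .known_node("127.0.0.1:9042") // placeholder address
    .pool_size(PoolSize::PerShard(NonZeroUsize::new(2).unwrap()))
    .build()
    .await?;
```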

hmm that’s pretty cool