The ScyllaDB team is pleased to announce the release of ScyllaDB 2025.3.5, a production-ready patch release for the ScyllaDB 2025.3 Feature Release.
Bug Fixes
The following issues are fixed in this release.
Change Data Capture (CDC)
- Critical errors due to a malformed SSTable exception
  - Issue: Critical errors (`sstables::malformed_sstable_exception`) occurred because a column was reported as missing in the current schema for the `cdc_log` table. This could happen when a column was recreated too soon.
  - Fix: A check was added to prevent recreating a column too soon, and the logic was updated to set the column drop timestamp in the future to prevent the schema mismatch. scylladb#26340, scylladb#27036
- Notification about expiring ERM held for too long was broken
  - Issue: The system failed to properly notify when an Effective Replication Map (ERM) token was held for too long after its expiry.
  - Fix: The notification logic for the expiring ERM held for too long was corrected. scylladb#27141, scylladb#27275
Cloud/Connectivity
- EC2 metadata querying should use back-off for “service unavailable”
  - Issue: When querying EC2 metadata (used by AWS KMS), “service unavailable” responses (e.g., HTTP 503 errors) were not handled with a retry mechanism.
  - Fix: The KMS host was updated to include the HTTP error code in KMS errors, and an exponential backoff-retry mechanism was added specifically for 503 errors. scylladb#27062, scylladb#27063
- S3 client error handling for transient network errors
  - Issue: The S3 client was not classifying all transient network errors as retryable, leading to unnecessary failures.
  - Fix: Error handling for the S3 client was extended to correctly classify additional transient network errors as retryable. scylladb#27349, scylladb#27390
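As a rough illustration of the backoff behavior described for the EC2 metadata fix, here is a minimal Python sketch. It is not ScyllaDB's implementation: the names `fetch_with_backoff` and `MetadataError`, the delay values, and the choice to retry only on 503 are assumptions made for the example. The error type carries the HTTP status code, mirroring the note that KMS errors now include it.

```python
import time
from typing import Callable, Tuple

# Assumption for this sketch: only "service unavailable" is retried.
RETRYABLE_STATUS = {503}


class MetadataError(Exception):
    """Hypothetical error type that carries the HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"metadata request failed with HTTP {status}")
        self.status = status


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it returns HTTP 200, backing off on 503."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUS or last_attempt:
            # Non-retryable status, or retries exhausted: surface the code.
            raise MetadataError(status)
        # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
        sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the backoff schedule can be checked without waiting. A production version would typically also add jitter to the delay.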
Operations/Management
- Automatic cleanup improvements
  - Issue: Automatic cleanup logic was limited and lacked user-facing controls.
  - Fix: A node can now opt out of automatic cleanup. This update also introduced a RESTful API to reset the cleanup-needed flag and a `nodetool cluster cleanup` command to run cleanup on all dirty nodes. scylladb#26866, scylladb#27093
- Maintenance mode functionality was broken
  - Issue: Maintenance mode was non-functional, and the related test (`test_maintenance_mode`) did not perform as expected.
  - Fix: The service QoS was updated to fall back to the default scheduling group when using the maintenance socket, restoring maintenance mode functionality. scylladb#26816, scylladb#27039
- More logging for `load_new_sstables`/`download_new_sstables`
  - Issue: The logging output for `load_new_sstables` and `download_new_sstables` was insufficient: not all option values were logged.
  - Fix: The functions were updated to log all option values used during execution, and additional logging was added to streaming operations. scylladb#27299, scylladb#27341
- Node locator missing `_excluded` field in operations
  - Issue: The node locator was not preserving the `_excluded` field in `clone()` and was omitting it from the verbose formatter.
  - Fix: The locator logic was updated to preserve and include the `_excluded` field in all necessary places. scylladb#27290
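The `_excluded` fix above belongs to a common class of bug: a field is added to a type, but its copy and formatting paths are not updated. The Python sketch below shows the corrected shape under stated assumptions. The `Node` class, its other fields, and the `verbose()` method are hypothetical stand-ins and do not reflect the actual ScyllaDB locator code.

```python
class Node:
    """Hypothetical stand-in for a locator node with an exclusion flag."""

    def __init__(self, host_id: str, dc: str, excluded: bool = False):
        self.host_id = host_id
        self.dc = dc
        self._excluded = excluded

    def clone(self) -> "Node":
        # Copy every field, including _excluded. Leaving it out here would
        # silently reset the flag to its default on the copy.
        return Node(self.host_id, self.dc, excluded=self._excluded)

    def verbose(self) -> str:
        # Include _excluded so that verbose output shows the full state.
        return (
            f"Node(host_id={self.host_id}, dc={self.dc}, "
            f"excluded={self._excluded})"
        )
```

A test that clones a node with the flag set and compares the fields of the original and the copy is usually enough to catch this kind of omission.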
Stability/Reliability
- Conflicting tablet migrations in the scheduler
  - Issue: The tablet scheduler could emit conflicting migrations for the same tablet in different DCs or conflicting inter-node and intra-node migrations, resulting in incorrect reads.
  - Fix: The scheduler logic was updated to prevent emitting conflicting migrations in the plan and during merge colocation. scylladb#26038, scylladb#27304, scylladb#26048, scylladb#27312, scylladb#27330
- Load-and-stream with tablets failing with “Unable to load SSTable”
  - Issue: Load-and-stream operations with tablets would sometimes fail with an “Unable to load SSTable” error.
  - Fix: Synchronization logic was added to the `sstables_loader` to prevent bypassing synchronization when the topology is busy. scylladb#22707, scylladb#26730
- Multiple oversized memory allocation errors with Vnodes
  - Issue: Creating thousands of tables with Vnodes could lead to multiple `seastar_memory - oversized allocation` errors.
  - Fix: The issue was resolved by changing the internal type of a table metadata variable. scylladb#26787, scylladb#27198
- Node coredumped after tablet cleanup log line
  - Issue: A node could coredump after logging that tasks were stopped for compactions due to tablet cleanup.
  - Fix: The replica logic was updated to fail a timed-out single-key read on a cleaned-up tablet replica. scylladb#26229, scylladb#27155
- Premature break causes SSTables to be skipped during streaming
  - Issue: A premature loop break in the `tablet_sstable_streamer::stream` function was causing SSTables to be unexpectedly skipped.
  - Fix: The loop break condition in `tablet_sstable_streamer::stream` was fixed. scylladb#26979, scylladb#27153
- Race condition between tablet split and load-and-stream
  - Issue: A race condition could occur between the tablet split process and the load-and-stream operation.
  - Fix: Synchronization logic was implemented to correctly synchronize tablet split and load-and-stream. scylladb#26455, scylladb#26648
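For the conflicting tablet migrations fix listed first in this section, the underlying invariant is that a plan should contain at most one migration per tablet. The Python sketch below shows one simple way to enforce that invariant. It is not the scheduler's actual algorithm: the `Migration` type, the `build_plan` function, and the first-candidate-wins policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Migration:
    """Hypothetical migration record; field names are illustrative."""

    tablet_id: int
    src: str
    dst: str
    kind: str  # e.g. "inter-node" or "intra-node"


def build_plan(candidates: Iterable[Migration]) -> List[Migration]:
    """Keep at most one migration per tablet (first candidate wins)."""
    plan: List[Migration] = []
    busy: Set[int] = set()  # tablets that already have a planned migration
    for m in candidates:
        if m.tablet_id in busy:
            # A second migration for the same tablet would conflict with
            # the one already planned, regardless of DC or kind. Skip it.
            continue
        busy.add(m.tablet_id)
        plan.append(m)
    return plan
```

Tracking busy tablets by ID alone means that an inter-node and an intra-node candidate for the same tablet can never both be emitted, which is the property the release note describes.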