The ScyllaDB team is pleased to announce the release of ScyllaDB 2025.4.3, a production-ready patch release for the ScyllaDB 2025.4 Feature Release.
Bug Fixes
The following issues are fixed in this release.
Native backup on AWS S3
- The AWS error handling logic did not correctly process all restartable nested exceptions, which could cause operations to fail during transient cloud service disruptions. The logic was updated to handle nested exceptions correctly and to explicitly recognize all restartable nested exception types. This improves the resilience of AWS-based operations by ensuring that appropriate service errors are retried automatically.
scylladb#28243, scylladb#28344
- The S3 client depends on the seastar submodule, which needed an update. The seastar submodule was updated with assorted fixes for the S3 client, improving the stability and functionality of S3-related operations.
scylladb#28482
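The AWS error-handling fix in this section hinges on recognizing a retryable error even when it is wrapped inside another exception. The following is a minimal Python sketch of that idea, not ScyllaDB's C++ code: it walks an exception's cause chain and retries only when a retryable error appears somewhere in the chain. RetryableError, is_retryable, and call_with_retries are hypothetical names chosen for illustration.

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for transient errors (throttling, timeouts, ...)."""


def is_retryable(exc: BaseException) -> bool:
    """Return True if exc, or any exception nested inside it, is retryable.

    Follows the explicit cause (__cause__) first, then the implicit
    context (__context__), and guards against cycles in the chain.
    """
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, RetryableError):
            return True
        current = current.__cause__ if current.__cause__ is not None else current.__context__
    return False


def call_with_retries(op, attempts: int = 3, delay: float = 0.0):
    """Run op(), retrying only when the failure chain contains a retryable error."""
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts or not is_retryable(exc):
                raise
            time.sleep(delay)
```

The key point is that the check inspects the whole chain rather than only the outermost exception type, which is the gap the release note describes.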
Commitlog
- A race condition or corruption in the commitlog could cause a node to fail at startup when commitlog replay encountered a file with a corrupt file header. Commitlog replay now treats a file with a corrupt (non-zero) header as data loss rather than as a fatal startup error, which makes node startup and recovery from commitlog issues more robust.
scylladb#27682
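The commitlog change amounts to a different decision when a segment header fails to validate: record the segment as lost and keep going instead of aborting startup. The Python sketch below illustrates that decision under stated assumptions: the 16-byte header layout and the magic value are hypothetical and do not reflect ScyllaDB's on-disk format, and treating an all-zero header as an unused segment is an assumption based on the "non-zero" wording in the note above.

```python
import struct
from dataclasses import dataclass, field

MAGIC = b"SCLG"                     # hypothetical magic, not ScyllaDB's real value
HEADER = struct.Struct("<4sIQ")     # hypothetical layout: magic, version, segment id


class CorruptHeader(Exception):
    pass


@dataclass
class ReplayStats:
    replayed: list = field(default_factory=list)
    empty: list = field(default_factory=list)
    data_loss: list = field(default_factory=list)


def parse_header(raw: bytes):
    """Validate and decode the (hypothetical) segment header."""
    if len(raw) < HEADER.size:
        raise CorruptHeader("short header")
    magic, version, segment_id = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise CorruptHeader("bad magic")
    return version, segment_id


def replay(segments: dict) -> ReplayStats:
    """Replay segments (name -> raw bytes), tolerating corrupt headers."""
    stats = ReplayStats()
    for name, raw in segments.items():
        if not any(raw[:HEADER.size]):
            # Assumption: an all-zero (or empty) header marks an unused segment.
            stats.empty.append(name)
            continue
        try:
            parse_header(raw)
        except CorruptHeader:
            # Non-zero but invalid header: count as data loss, do not abort startup.
            stats.data_loss.append(name)
            continue
        stats.replayed.append(name)  # a real replay would apply mutations here
    return stats
```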
Native backup Connection & DNS
- The connection factory needed better handling of network instability and DNS resolution. It now uses a TTL timer, retries on failures, and uses all resolved DNS addresses; the code was also cleaned up and refactored. This improves connection reliability and fault tolerance, especially in dynamic environments with frequent DNS updates or transient failures.
scylladb#28404
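The connection factory behavior described above (a TTL on resolved addresses, retries on failure, and trying every resolved address) can be sketched in Python as follows. This is an illustration only: ConnectionFactory, its parameters, and the injected resolve and connect callables are hypothetical, and ScyllaDB itself is written in C++ on Seastar.

```python
import time
from typing import Callable, List, Optional


class ConnectionFactory:
    """Hypothetical sketch: cache DNS results for a TTL and try every address."""

    def __init__(self, host: str, resolve: Callable[[str], List[str]],
                 connect: Callable[[str], object], ttl: float = 30.0,
                 retries: int = 2, clock: Callable[[], float] = time.monotonic):
        self.host = host
        self._resolve = resolve
        self._connect = connect
        self._ttl = ttl
        self._retries = retries
        self._clock = clock
        self._addresses: List[str] = []
        self._expires_at = float("-inf")

    def _current_addresses(self, force: bool = False) -> List[str]:
        """Return cached addresses, re-resolving when the TTL has expired."""
        now = self._clock()
        if force or now >= self._expires_at or not self._addresses:
            self._addresses = self._resolve(self.host)
            self._expires_at = now + self._ttl
        return self._addresses

    def make_connection(self):
        last_error: Optional[Exception] = None
        for attempt in range(self._retries + 1):
            # On retries, force a fresh lookup so stale DNS entries are not reused.
            for addr in self._current_addresses(force=attempt > 0):
                try:
                    return self._connect(addr)
                except OSError as exc:
                    last_error = exc  # try the next resolved address
        raise ConnectionError(f"could not connect to {self.host}") from last_error
```

Injecting the resolver, connector, and clock keeps the sketch testable without a network; a production version would also add backoff between retry rounds.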
Vector Search - Data Modeling & Querying
- Passing a null vector to an Approximate Nearest Neighbor (ANN) query failed with an uninformative error. The CQL interface now fails with a clearer, explicit error when a null vector is passed to an ANN query, which makes vector search queries easier to debug.
scylladb#28052
- The change of the default CQL compression to LZ4WithDictsCompressor was not applied consistently to all table types, specifically Alternator tables and Materialized Views. The schema initialization process now applies sstable_compression_user_table_options to CQL auxiliary and Alternator tables, which ensures consistent performance and space usage across CQL, Alternator, and Materialized View tables.
scylladb#26914
Database & Internal
- The system_replicated_keys keyspace was not correctly marked as a system keyspace, which led to incorrect internal management, and the replicated_key_provider needed its KSNAME to be made public. The system_replicated_keys keyspace is now correctly marked and handled as a system keyspace, and KSNAME was made public. This ensures correct internal management of system keyspaces and allows the keyspace name to be referenced outside replicated_key_provider.
scylladb#27903, scylladb#28237
- The service layer did not propagate the topology guard to repair-based node operations (RBNO). The service now passes the topology guard to RBNO, which prevents assertion failures and keeps the cluster stable during topology changes.
scylladb#28298
Raft & Topology
- A node could sometimes remain in the Raft topology with a pending leave request, creating an inaccurate cluster state. The topology coordinator now completes pending operations for a replaced node, which ensures a cleaner and more accurate cluster topology state, particularly after node replacement operations.
scylladb#27990
- Disabling tablet balancing through the REST API (/storage_service/tablets/balancing) was not integrated with the internal topology request system and did not interrupt the tablet scheduler immediately. Disabling balancing via REST now goes through a topology request, and the RPC that disables balancing preempts tablet transitions, interrupting the tablet scheduler. As a result, load balancing is disabled atomically and consistently, and balancing activity stops promptly.
scylladb#27647, scylladb#27210
Repair
- The repair service lacked session support for the rebuild_with_repair operation. Session support has been added to repair_service::rebuild_with_repair, which enables more complex, stateful repair operations.
scylladb#27759
- Incorrect values were reported for progress_total and progress_completed for tablet repair tasks. The reporting logic was corrected, and progress reporting support was added to the tablet repair task. This provides accurate visibility into the progress and status of tablet repair operations.
scylladb#26896, scylladb#22564
- Memory corruption was detected when running a repair test with disjoint rows and a different shard count. The sstable_list_to_mark_as_repaired logic was fixed to work correctly with a multishard writer, which eliminates a memory-safety issue and improves the reliability of repair operations.
scylladb#27666, scylladb#28064
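For the tablet repair progress fix, a small sketch shows the invariant the corrected values should satisfy: both counters are derived from the same snapshot of tablets, and progress_completed never exceeds progress_total. The Python below assumes, hypothetically, that progress is counted in whole tablets; TabletState and task_progress are illustrative names, not ScyllaDB APIs.

```python
from enum import Enum
from typing import Dict, Tuple


class TabletState(Enum):
    # Hypothetical per-tablet repair states, for illustration only.
    PENDING = "pending"
    REPAIRING = "repairing"
    DONE = "done"


def task_progress(tablets: Dict[int, TabletState]) -> Tuple[int, int]:
    """Return (progress_total, progress_completed) from one snapshot of tablet states."""
    progress_total = len(tablets)
    progress_completed = sum(1 for state in tablets.values() if state is TabletState.DONE)
    return progress_total, progress_completed
```

Computing both numbers from a single snapshot avoids reporting a completed count taken at a different moment than the total.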
Storage & SSTables
- Concurrent SELECT ... FROM MUTATION_FRAGMENTS(...) queries alongside a regular SELECT on the same partition could lead to a segmentation fault (nullptr dereference). The row cache reader logic now passes a cache tracker to the snapshot in make_nonpopulating_reader(), which prevents the nullptr dereference and improves reliability under concurrent query load.
scylladb#26847, scylladb#28279
- Refreshing load statistics (load_stats) could fail with a no_such_column_family error because dropped tables were not handled correctly. The refresh logic now handles dropped tables correctly and no longer raises the error, which keeps load statistics reporting accurate and reliable for monitoring.
scylladb#28470
- An assertion failure could occur on a node in maintenance mode, because validation logic ran and the topology was not properly set up. Maintenance mode now skips validate_read_replica and properly sets up the topology, which keeps the node stable and allows maintenance operations to complete successfully.
scylladb#27988, scylladb#28498
- A non-UTF-8 character error could occur during a database snapshot test. The serialization of the partition key was fixed, which prevents crashes and ensures data integrity during database snapshots and key serialization.
scylladb#28195
Streaming
- A resource leak was detected in the streaming semaphore after new nodes started streaming. The handling of base resources in the reader_concurrency_semaphore was improved, which prevents resource exhaustion and improves long-term stability during node operations such as adding or replacing nodes.
scylladb#28083, scylladb#28245
- A use-after-free bug was present in the streaming_task_impl::run function of the node operations service. Coroutine lambda wrappers were removed in node_ops, which eliminates this memory-safety bug and improves the reliability of node operations.
scylladb#28200
- The streaming process did not consistently use a session variable. The streaming service now uses a session variable for streaming, which improves the correctness and consistency of streaming operations.
scylladb#28298
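The streaming semaphore leak in this section is a resource-accounting bug: base resources taken when a reader is admitted must be returned on every exit path, including failures. The Python sketch below shows that pattern, assuming as a simplification that resources consist of a reader count and a memory budget. ConcurrencySemaphore, Resources, and admit are hypothetical names only loosely inspired by the reader_concurrency_semaphore, not its actual API.

```python
import contextlib
from dataclasses import dataclass


@dataclass
class Resources:
    count: int
    memory: int


class ConcurrencySemaphore:
    """Hypothetical count-and-memory semaphore used to illustrate leak-free accounting."""

    def __init__(self, count: int, memory: int):
        self.initial = Resources(count, memory)
        self.available = Resources(count, memory)

    @contextlib.contextmanager
    def admit(self, memory: int):
        if self.available.count < 1 or self.available.memory < memory:
            raise RuntimeError("not enough resources")
        self.available.count -= 1
        self.available.memory -= memory
        try:
            yield
        finally:
            # Return the base resources on every exit path, even if the reader failed.
            self.available.count += 1
            self.available.memory += memory

    def leaked(self) -> bool:
        """True if any resources were not returned after all readers finished."""
        return self.available != self.initial
```

Tying the release to a context manager (RAII-style) means an aborted stream cannot skip the bookkeeping, which is the class of bug the note describes.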
Vector Search - Permissions
- Vector search permissions lacked the necessary scope to cover CDC streams and timestamps. The fix adds CDC streams and timestamps to vector search permissions, which ensures proper access control and security when using these features with vector search.
scylladb#28537