[RELEASE] ScyllaDB 2025.4.5

The ScyllaDB team is pleased to announce the release of ScyllaDB 2025.4.5, a production-ready patch release for the ScyllaDB 2025.4 feature release.

Bug Fixes

Compaction

  • The internal compaction function maybe_wait_for_sstable_count_reduction() could hang indefinitely, blocking compaction progress. The function’s logic has been fixed, so compaction no longer stalls waiting on it.
    scylladb#28801

Networking

  • A bug in the transport layer caused the connection-handling code to consume semaphore units incorrectly, which could leave client connections hung forever in the AUTHENTICATING state. The connection code now consumes only the semaphore units it initially took, so connections no longer hang and client-to-node connections are more stable.
    scylladb#28715

  • The transport layer’s counted data sink and source contained redundant futurize_invoke calls. These calls have been removed, giving a minor efficiency improvement in the core networking code.
    scylladb#27526
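The transport fix restores a general invariant: a holder that takes some number of semaphore units must account for exactly those units, or later acquirers can end up waiting forever. The following is a minimal Python model of that invariant, not ScyllaDB or Seastar code. UnitSemaphore and hold_units are hypothetical names, and the sketch assumes the hang comes from unit accounting drifting away from what was actually acquired.

```python
import asyncio
from contextlib import asynccontextmanager


class UnitSemaphore:
    """Hypothetical semaphore whose acquire/release take a unit count."""

    def __init__(self, units: int) -> None:
        self.available = units
        self._cond = asyncio.Condition()

    async def acquire(self, units: int) -> None:
        async with self._cond:
            # Waits until enough units are free; if units leak, this waits forever.
            await self._cond.wait_for(lambda: self.available >= units)
            self.available -= units

    async def release(self, units: int) -> None:
        async with self._cond:
            self.available += units
            self._cond.notify_all()


@asynccontextmanager
async def hold_units(sem: UnitSemaphore, units: int):
    """Take `units` and give back exactly that many, even if the body raises."""
    await sem.acquire(units)
    try:
        yield units
    finally:
        # Release only what this holder took: no more, no less.
        await sem.release(units)
```

Tying the release to the scope that performed the acquire (here, a context manager) is one way to keep units taken and units returned in agreement on every exit path.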

Kubernetes

  • In Kubernetes (k8s) environments, reactor stall notifications were disabled by default, which could hide performance problems. A configurable blocked-reactor-notify-ms parameter has been added to the Docker/distribution settings, giving operators control over reactor stall notifications in containerized deployments.
    scylladb#26971

Raft

  • The hints component logged excessively at INFO level during topology changes. Notifications about released Raft nodes being generated repeatedly made the problem worse. The fix ensures the released-node notification is generated only once and reduces the excessive logging, significantly cutting log volume during cluster state changes.
    scylladb#28301, scylladb#28611
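Generating the released-node notification "only once" follows a common deduplication pattern. The sketch below is a minimal model that assumes the notifier remembers which node IDs it has already reported. ReleasedNodeNotifier and on_node_released are hypothetical names, not ScyllaDB code.

```python
from typing import Callable, Set


class ReleasedNodeNotifier:
    """Hypothetical sketch: forward each 'node released' event at most once."""

    def __init__(self, listener: Callable[[str], None]) -> None:
        self._listener = listener
        self._notified: Set[str] = set()

    def on_node_released(self, node_id: str) -> bool:
        """Forward the first notification for a node; drop repeats.

        Returns True if the listener was called.
        """
        if node_id in self._notified:
            return False
        self._notified.add(node_id)
        self._listener(node_id)
        return True
```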

Reliability

  • A critical concurrency bug allowed group 0 to be modified concurrently during a keyspace drop, which could lead to unexpected behavior or crashes. The fix ensures group 0 state is modified safely during a keyspace drop, preventing concurrent metadata mutations and making schema operations more robust.
    scylladb#25938

  • Nested notifications during schema updates could deadlock the migration listener. The problem surfaced as flakiness in test_mv_build_during_shutdown. The deadlock is fixed, which improves stability during concurrent schema migrations and materialized view operations.
    scylladb#27364, scylladb#28557

  • A lambda-coroutine fiasco was found in hint_endpoint_manager.cc, the component that manages hints for each endpoint. This is a C++ lifetime bug in which a coroutine lambda’s captured variables are destroyed while the coroutine is still suspended. The bug has been fixed, improving the stability and reliability of the hints mechanism.
    scylladb#27520, scylladb#27732

  • A race could occur during commitlog shutdown if reserve replenishment finished before the allocation call was entered. The fix ensures the replenish queue is always aborted when the replenishment loop exits, making shutdown cleaner and more reliable.
    scylladb#28678, scylladb#28692
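The commitlog fix uses a general shutdown pattern: abort the queue on every exit path of the producer loop, so a consumer that arrives after the last item was produced fails fast instead of waiting forever. The following is a minimal asyncio model of that pattern using try/finally, not ScyllaDB’s commitlog code. ReplenishQueue, replenish_loop, QueueAborted, and the segment names are all hypothetical, and a fixed segment count stands in for "run until shutdown".

```python
import asyncio


class QueueAborted(Exception):
    """Raised to waiters once the replenish queue has been aborted."""


_ABORT = object()  # sentinel marking an aborted queue


class ReplenishQueue:
    """Hypothetical stand-in for the queue of pre-allocated reserve segments."""

    def __init__(self) -> None:
        self._q: asyncio.Queue = asyncio.Queue()

    def put(self, segment) -> None:
        self._q.put_nowait(segment)

    def abort(self) -> None:
        self._q.put_nowait(_ABORT)

    async def get(self):
        item = await self._q.get()
        if item is _ABORT:
            self._q.put_nowait(_ABORT)  # leave the marker for any other waiter
            raise QueueAborted()
        return item


async def replenish_loop(queue: ReplenishQueue, segments_to_make: int) -> None:
    """Produce reserve segments, then abort the queue however the loop exits."""
    try:
        for i in range(segments_to_make):
            queue.put(f"segment-{i}")
            await asyncio.sleep(0)  # yield to other tasks between allocations
    finally:
        # Without this, an allocation that starts after the last segment was
        # produced would block forever during shutdown.
        queue.abort()
```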

S3

  • S3 multipart uploads had no concurrency limit, which could cause resource contention and instability on ScyllaDB nodes during large uploads. The S3 client now limits multipart upload concurrency, improving stability during backup and restore operations that use S3.
    scylladb#28666

  • For very large files, the S3 client’s calculation of multipart part size and part count could produce more parts than the S3 limit of 10,000. The calculation has been corrected to respect the 10,000-part limit and size parts accordingly, so large S3 multipart uploads succeed.
    scylladb#28696
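The two S3 fixes above can be modeled together. The part size must be at least the object size divided by 10,000, rounded up. For example, a 1 TiB object split into 50 MiB parts would need 20,972 parts, so the part size has to grow to about 105 MiB. Separately, a semaphore caps how many part uploads run at once.

The following is a minimal Python sketch, not ScyllaDB’s C++ S3 client. The 10,000-part limit, the 5 MiB minimum part size (the last part is exempt), and the 5 GiB maximum part size are S3 limits from the AWS documentation. The preferred 50 MiB part size, the max_in_flight default, and the caller-supplied upload_part coroutine (a stand-in for the real UploadPart request) are hypothetical.

```python
import asyncio

MiB = 1024 ** 2
MIN_PART_SIZE = 5 * MiB          # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 ** 3    # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_part_size(total: int, preferred: int = 50 * MiB) -> int:
    """Pick a part size that keeps the upload within MAX_PARTS parts."""
    # Smallest size that fits `total` bytes into MAX_PARTS parts (ceiling division).
    needed = -(-total // MAX_PARTS)
    size = max(preferred, needed, MIN_PART_SIZE)
    if size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    return size


def part_count(total: int, part_size: int) -> int:
    """Number of parts needed; an empty object still uses one part."""
    return max(1, -(-total // part_size))


async def upload_parts(total: int, upload_part, max_in_flight: int = 4) -> int:
    """Upload all parts, with at most `max_in_flight` uploads running at once."""
    size = plan_part_size(total)
    count = part_count(total, size)
    limit = asyncio.Semaphore(max_in_flight)

    async def upload_one(part_number: int) -> None:  # S3 part numbers start at 1
        offset = (part_number - 1) * size
        length = min(size, total - offset)
        async with limit:  # blocks while max_in_flight parts are uploading
            await upload_part(part_number, offset, length)

    await asyncio.gather(*(upload_one(n) for n in range(1, count + 1)))
    return count
```

The semaphore bounds the number of parts in flight no matter how many parts the object has. In a real client, each in-flight part typically holds a buffer, so this also bounds memory use.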

Schema

  • DESCRIBE statements incorrectly listed internal Paxos state tables, cluttering the output of schema inspection commands. DESC TABLES, DESC KEYSPACE, and DESC SCHEMA now hide these internal tables, so schema inspection shows only user-facing objects.
    scylladb#28183, scylladb#28507

Vector Search

  • The HTTPS vector search client could fail because of several defects, including a C++ error, a missing timeout on the TLS handshake, and problems in the CA certificate rewriting logic. The fix adds the missing handshake timeout and stabilizes the TLS connection and certificate validation flows, improving the reliability of vector search clients configured for HTTPS.
    scylladb#28012, scylladb#28642