The ScyllaDB team is pleased to announce the release of ScyllaDB 2025.4.1, a production-ready patch release for the ScyllaDB 2025.4 Feature Release.
Bug Fixes
The following issues are fixed in this release.
Memory Management and Stability
- Prevent crashes when memory kill thresholds are exceeded
Creating a reader could crash ScyllaDB when memory kill thresholds were exceeded.
Exception propagation was enabled by removing noexcept, allowing graceful handling under memory pressure.
scylladb#27475
Alternator
- Fix use-after-free during internal table initialization
Under rare conditions, Alternator could trigger a use-after-free while preparing ScyllaDB internal tables, potentially leading to crashes.
Paging state encoding was fixed to ensure safe memory handling during Alternator operations.
scylladb#27125
Batchlog and Repair
- Prevent data resurrection when batchlog replay fails
Failed batchlog replays during repair could incorrectly update repair timestamps, allowing previously deleted data to reappear.
Repair now fails explicitly if batch replay or flush fails, and repair timestamps are updated only after successful completion.
scylladb#24415
- Harden batchlog replay logic and validation
Batchlog replay could incorrectly skip batches with unknown or invalid versions, masking replay errors.
Replay logic is now coroutine-based, validates batch versions, and fails explicitly when replay errors occur.
scylladb#26766
- Fail tablet repair on partial batch delivery
Tablet repair previously continued even if some batches failed to send, risking an inconsistent data state.
Tablet repair now fails if any batch is not successfully delivered, ensuring stronger consistency guarantees.
scylladb#26766
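The ordering described in the first item above (record the repair timestamp only after batchlog replay and flush have succeeded) can be illustrated with a minimal Python sketch. This is not ScyllaDB code: `RepairError`, `run_repair`, and all the callables passed to it are hypothetical names, and the exact order of steps is an assumption made for illustration.

```python
class RepairError(Exception):
    """Raised when a repair step fails and the repair must not be recorded."""


def run_repair(replay_batchlog, flush, stream_data, record_repair_time, now):
    """Run one repair and record its timestamp only if every step succeeded.

    All callables are hypothetical stand-ins; replay_batchlog and flush
    return True on success.
    """
    if not replay_batchlog():
        raise RepairError("batchlog replay failed")
    if not flush():
        raise RepairError("flush failed")
    stream_data()
    # Reached only when replay, flush, and streaming all succeeded.
    record_repair_time(now)
```

Because the timestamp is written last, a failed replay leaves the previous repair time in place, so tombstones are not purged on the basis of a repair that never completed.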
Cloud Storage and Streaming
- Classify additional S3 network errors as retryable
Some transient S3 network failures were treated as fatal, causing unnecessary operation failures.
Additional transient error types are now classified as retryable, improving the robustness of S3-backed workflows.
scylladb#27349
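As a rough illustration of what classifying errors as retryable means in practice, the Python sketch below retries only errors that belong to a set of transient types and lets everything else fail immediately. The release note does not say which error types were added, so `RETRYABLE_ERRORS`, `is_retryable`, and `call_with_retries` are assumptions chosen for the example, not ScyllaDB's actual S3 client logic.

```python
import time

# Hypothetical set of transient error types; the release note does not list
# which errors were added.
RETRYABLE_ERRORS = (ConnectionResetError, ConnectionAbortedError, TimeoutError)


def is_retryable(exc):
    """Return True if the error is considered transient."""
    return isinstance(exc, RETRYABLE_ERRORS)


def call_with_retries(operation, max_attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Fatal errors and the final failed attempt propagate to the caller.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Widening the retryable set changes only the classification step; the retry loop itself stays the same.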
Failure Detection and Raft
- Fix race conditions and flakiness in failure detector tests
Raft failure detector tests suffered from race conditions and intermittent failures.
Timeout handling and scheduling were improved, and abort-related exceptions are now handled consistently.
scylladb#27136
- Improve direct failure detector efficiency and scheduling
Direct failure detector pings incurred unnecessary overhead and suboptimal scheduling behavior.
The detector now runs in the gossiper scheduling group, avoids redundant cross-core calls, and passes timeouts explicitly.
scylladb#27483
- Ensure node removal notifications are emitted only once
Raft topology coordination could emit duplicate node removal notifications.
Notifications are now deduplicated to ensure consistent cluster state propagation.
scylladb#27913
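Deduplicating node removal notifications, as described in the last item above, amounts to remembering which nodes have already been announced. The Python sketch below shows one simple way to do that; `RemovalNotifier` and its listener callback are hypothetical and are not taken from the ScyllaDB code base.

```python
class RemovalNotifier:
    """Emit a node-removal notification at most once per node ID."""

    def __init__(self, listener):
        self._listener = listener  # hypothetical callback taking a node ID
        self._notified = set()     # node IDs that were already announced

    def on_node_removed(self, node_id):
        """Notify the listener unless this node was already announced."""
        if node_id in self._notified:
            return False
        self._notified.add(node_id)
        self._listener(node_id)
        return True
```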
Indexes and Query Processing
- Prevent invalid vector index configurations
Vector indexes could be created with invalid option values or without tablets enabled.
ScyllaDB now rejects invalid configurations and requires tablets for vector indexes.
scylladb#27234
- Fix query failures with low tombstone page limits
Low query_tombstone_page_limit values could result in misleading “Key column not found” errors.
Query execution was corrected to handle tombstone pagination safely.
scylladb#27001
Monitoring and Tooling
- Extend scylla-node-exporter with ethtool support
Network interface statistics exposed by scylla-node-exporter were incomplete.
The node exporter now includes ethtool, enabling richer network monitoring metrics.
scylladb#27508
SSTables and Storage
- Fix boot failures caused by missing SSTable TOC files
ScyllaDB could fail to boot if SSTable TOC files were missing or modified concurrently.
A new semaphore serializes SSTable linking and component rewrites, preventing corruption and startup failures.
scylladb#25919
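The idea of serializing linking and component rewrites behind a semaphore can be shown with a toy Python model. It is only a sketch of the general technique: `SSTableComponents` is a hypothetical class, the component names are illustrative, and the real fix lives in ScyllaDB's C++ code and may differ in detail.

```python
import threading


class SSTableComponents:
    """Toy model of an SSTable's component files guarded by a one-permit semaphore."""

    def __init__(self, names):
        self._names = list(names)
        self._ops = threading.Semaphore(1)  # serializes link and rewrite

    def rewrite(self, name):
        """Replace a component; it is briefly absent while being rewritten."""
        with self._ops:
            self._names.remove(name)
            self._names.append(name)

    def link(self):
        """Return a snapshot of the components, as a linking step would see them."""
        with self._ops:
            return sorted(self._names)
```

With a single permit, a link can never observe the moment when a component such as the TOC has been removed but not yet rewritten.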
Tablets and Topology
- Prevent conflicting tablet migrations
The tablet scheduler could emit conflicting inter-node and intra-node migrations, leading to incorrect reads.
Scheduler logic was corrected to avoid conflicting migrations during colocation merges.
scylladb#27304
- Improve topology coordinator robustness during streaming and repair
Certain topology coordinator operations were skipped or executed at incorrect times.
Session IDs are now set correctly for streaming, and repair compaction control is always executed.
scylladb#27867
Vector Search
- Reject invalid primary key restrictions in vector search queries
Vector search allowed queries with unsupported primary key restrictions.
Such queries now fail fast with a clear validation error.
scylladb#27668