Expansion of a 180-node cluster failed with bad_alloc

I have a 180-node cluster spread across 3 availability zones (AZs), with 60 nodes in each AZ. When I tried to add a new node to one of the AZs, the bootstrap failed partway through: during the expansion, other nodes in the cluster gradually went down (DN), and the cause of each DN was `bad_alloc`.

Note: the expansion ran concurrently with business testing; at the time, each node carried a write load of 1.5k and already held 1 TB of existing data.

I want to know why the expansion failed and how to fix it.

Here are the logs from the node being added:

Oct 25 11:29:39 hostname-01 scylla[67026]:  [shard  0:stre] repair - bootstrap_with_repair: started with keyspace=system_distributed_everywhere, nr_ranges=46337
Oct 25 11:31:58 hostname-01 scylla[67026]:  [shard  0:main] raft_group_registry - marking Raft server 60185caf-e956-415b-9e77-5821513b7ed0 as dead for raft groups
Oct 25 11:32:04 hostname-01 scylla[67026]:  [shard 17:stre] gossip - failure_detector_loop: Send echo to node 10.235.195.95, status = failed: seastar::rpc::timeout_error (rpc call timed out)
Oct 25 11:32:04 hostname-01 scylla[67026]:  [shard 17:stre] gossip - failure_detector_loop: Mark node 10.235.195.95 as DOWN
Oct 25 11:32:04 hostname-01 scylla[67026]:  [shard  0:stre] gossip - InetAddress 10.235.195.95 is now DOWN, status = NORMAL
Oct 25 11:32:04 hostname-01 scylla[67026]:  [shard  0:stre] node_ops - bootstrap[a7984c7a-e972-47fe-b159-ad74f17e5fa6]: Failed to get heartbeat response from node=10.235.195.95
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 35:stre] rpc - client 10.235.195.95:65363: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 44:stre] rpc - client 10.235.195.95:60764: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 42:stre] rpc - client 10.235.195.95:57450: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 17:stre] rpc - client 10.235.195.95:56897: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 38:stre] rpc - client 10.235.195.95:63302: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 21:stre] rpc - client 10.235.195.95:65301: server connection dropped: recv: Connection reset by peer
Oct 25 11:32:56 hostname-01 scylla[67026]:  [shard 25:stre] rpc - client 10.235.195.95:63241: server connection dropped: recv: Connection reset by peer
Oct 25 11:33:06 hostname-01 scylla[67026]:  [shard  0:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: sync data for keyspace=system_distributed_everywhere, status=started
Oct 25 11:33:08 hostname-01 scylla[67026]:  [shard  1:stre] seastar_memory - oversized allocation: 3338240 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x60d608e 0x60d6650 0x60d6928 0x5b91736 0x5b91a81 0x5b94781 0x43417a8 0x439ffe0 0x43a0f50 0x5bd57ef 0x5bd6ac7 0x5bfa673 0x5ba552a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x112763
                                                                               seastar::smp_message_queue::async_work_item<seastar::sharded<repair_service>::invoke_on<repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}, , seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}&&)::{lambda()#1}>
Oct 25 11:33:08 hostname-01 scylla[67026]:  [shard  9:stre] seastar_memory - oversized allocation: 3338240 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x60d608e 0x60d6650 0x60d6928 0x5b91736 0x5b91a81 0x5b94781 0x43417a8 0x439ffe0 0x43a0f50 0x5bd57ef 0x5bd6ac7 0x5bfa673 0x5ba552a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x112763
                                                                               seastar::smp_message_queue::async_work_item<seastar::sharded<repair_service>::invoke_on<repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}, , seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}&&)::{lambda()#1}>
Oct 25 11:33:08 hostname-01 scylla[67026]:  [shard 16:stre] seastar_memory - oversized allocation: 3338240 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x60d608e 0x60d6650 0x60d6928 0x5b91736 0x5b91a81 0x5b94781 0x43417a8 0x439ffe0 0x43a0f50 0x5bd57ef 0x5bd6ac7 0x5bfa673 0x5ba552a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x112763
                                                                               seastar::smp_message_queue::async_work_item<seastar::sharded<repair_service>::invoke_on<repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}, , seastar::future<void> >(unsigned int, seastar::smp_submit_to_options, repair::data_sync_repair_task_impl::run()::$_0::operator()()::{lambda(repair_service&)#1}&&)::{lambda()#1}>
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7523392716462325836,7524310420715959462], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7523392716462325836,7524310420715959462], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7524310420715959462,7524635915212153831], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7524310420715959462,7524635915212153831], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7524635915212153831,7524678583898654488], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 27 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7524635915212153831,7524678583898654488], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 27 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7524678583898654488,7525584058137208279], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 27 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7524678583898654488,7525584058137208279], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 27 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7525584058137208279,7526092835354564839], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7525584058137208279,7526092835354564839], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7526092835354564839,7526559927440628216], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 28 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7526092835354564839,7526559927440628216], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 28 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7526559927440628216,7526705102766251922], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: shard=4, keyspace=system_distributed_everywhere, cf=cdc_generation_descriptions_v2, range=(7526559927440628216,7526705102766251922], got error in row level repair: seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 26 free segments))
Oct 25 11:50:39 hostname-01 scylla[67026]:  [shard  4:stre] repair - repair[1102309a-c1f0-4ad6-ad59-263b4118d980]: get_sync_boundary: got error from node=10.249.140.11, keyspace=system_distributed_everywhere, table=cdc_generation_descriptions_v2, range=(7526705102766251922,7526775257311050526], error=seastar::rpc::remote_verb_error (failed to refill emergency reserve of 30 (have 27 free segments))
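The repeated `remote_verb_error` lines above all point at the same peer (10.249.140.11). When a log covers many peers, a quick script can rank them. Here is a hypothetical helper, with a regex tailored to the messages above, that counts the "failed to refill emergency reserve" errors per remote node and tracks the lowest free-segment count seen for each:

```python
import re
from collections import Counter

# Match the repair error lines above: capture the remote node and the
# number of free LSA segments it reported.
ERR = re.compile(
    r"got error from node=(?P<node>[\d.]+).*?"
    r"failed to refill emergency reserve of \d+ \(have (?P<free>\d+) free segments\)"
)

def reserve_errors_by_node(lines):
    counts = Counter()   # errors per remote node
    min_free = {}        # lowest free-segment count seen per node
    for line in lines:
        m = ERR.search(line)
        if m:
            node = m.group("node")
            counts[node] += 1
            free = int(m.group("free"))
            min_free[node] = min(min_free.get(node, free), free)
    return counts, min_free

# Two sample lines abbreviated from the log above.
sample = [
    "repair - get_sync_boundary: got error from node=10.249.140.11, "
    "table=cdc_generation_descriptions_v2, error=seastar::rpc::remote_verb_error "
    "(failed to refill emergency reserve of 30 (have 26 free segments))",
    "repair - get_sync_boundary: got error from node=10.249.140.11, "
    "table=cdc_generation_descriptions_v2, error=seastar::rpc::remote_verb_error "
    "(failed to refill emergency reserve of 30 (have 27 free segments))",
]
counts, min_free = reserve_errors_by_node(sample)
print(counts["10.249.140.11"], min_free["10.249.140.11"])  # → 2 26
```

A node whose minimum free-segment count keeps dipping below the reserve target (30 here) is running out of LSA emergency reserve, which matches the eventual `bad_alloc`.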

Check LSA and non-LSA memory per shard in the Detailed dashboard. It’s likely that one of the shards is overloaded with metadata.

What version are you running?

How many shards per node and how much memory per node do you have?

Do you have a large amount of keyspaces/tables?
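If the dashboard is not handy, the same per-shard check can be done against the node's Prometheus endpoint (port 9180 by default). Below is a minimal sketch that splits allocated memory into LSA and non-LSA per shard; the metric names `scylla_memory_allocated_memory` and `scylla_lsa_total_space_bytes` and the `shard` label are assumptions here, so verify the names your Scylla version actually exports (`curl http://<node>:9180/metrics`) before relying on it:

```python
import re
from collections import defaultdict

# Parse Prometheus text-format lines of the shape:
#   metric_name{...shard="N"...} value
METRIC = re.compile(r'^(\w+)\{[^}]*shard="(\d+)"[^}]*\}\s+([-0-9.eE+]+)')

def per_shard(metrics_text):
    values = defaultdict(dict)
    for line in metrics_text.splitlines():
        m = METRIC.match(line)
        if m:
            name, shard, val = m.group(1), int(m.group(2)), float(m.group(3))
            values[shard][name] = val
    # Non-LSA memory = total allocated minus LSA-managed space.
    # Both metric names are assumed, not confirmed for this version.
    return {
        shard: {
            "lsa": v.get("scylla_lsa_total_space_bytes", 0.0),
            "non_lsa": v.get("scylla_memory_allocated_memory", 0.0)
                       - v.get("scylla_lsa_total_space_bytes", 0.0),
        }
        for shard, v in values.items()
    }

# Fabricated sample payload, not real output from this cluster.
sample = """\
scylla_memory_allocated_memory{shard="0"} 6.0e9
scylla_lsa_total_space_bytes{shard="0"} 2.0e9
scylla_memory_allocated_memory{shard="1"} 6.4e9
scylla_lsa_total_space_bytes{shard="1"} 1.0e9
"""
print(per_shard(sample)[1]["non_lsa"])  # → 5400000000.0
```

A shard whose non-LSA share is far above its peers is the "overloaded with metadata" case described above.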

Thanks for your reply!
I am running Scylla version 5.4.
Each node has 48 shards and 310 GB of memory.
There are more than sixty tables in the cluster, but writes go almost entirely to one table, which holds approximately 170 billion objects, each 1 byte in size.
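For context, here is a back-of-envelope sketch of what those numbers imply per shard. The replication factor (RF=3) and the ~1.25 bytes-per-partition bloom-filter overhead are assumptions for illustration, not measurements from this cluster:

```python
# Back-of-envelope memory math for the cluster described above.
# RF=3 and the bloom-filter bytes/key figure are assumed, not measured.
nodes, shards_per_node = 180, 48
mem_per_node_gb = 310
partitions = 170e9          # "170 billion objects"
rf = 3                      # assumed replication factor

mem_per_shard_gb = mem_per_node_gb / shards_per_node
keys_per_node = partitions * rf / nodes
bloom_gb_per_node = keys_per_node * 1.25 / 1e9   # assumed bytes/key

print(round(mem_per_shard_gb, 2))   # → 6.46
print(round(bloom_gb_per_node, 1))  # → 3.5
```

Even though each object is only 1 byte, per-partition metadata (bloom filters, index summaries, cache entries) scales with the key count, so with ~6.5 GB per shard a few extra gigabytes of metadata per node leaves little headroom, consistent with the "shard overloaded with metadata" suggestion.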

The node's LSA memory was very low before the restart, but it increased after the abort and restart.

Here is more information about this expansion.
Backtrace:

JMX is enabled to receive remote connections on port: 7199
2024-09-15 02:38:21,423 INFO success: scylla-jmx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-15 02:38:21,423 INFO success: scylla entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-09-15 02:38:21,423 INFO success: rsyslog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborting on shard 31.
Backtrace:
  0x5bc3f38
  0x5bfa082
  /opt/scylladb/libreloc/libc.so.6+0x3dbaf
  /opt/scylladb/libreloc/libc.so.6+0x8e883
  /opt/scylladb/libreloc/libc.so.6+0x3dafd
  /opt/scylladb/libreloc/libc.so.6+0x2687e
  /opt/scylladb/libreloc/libstdc++.so.6+0xa4d18
  /opt/scylladb/libreloc/libstdc++.so.6+0xb4f4b
  /opt/scylladb/libreloc/libstdc++.so.6+0xb4fb6
  0x132ec3a
  0x3889ea3
  0x38949c1
  0x385c830
  0x385a9fd
  0x1a032f6
  0x1ab4521
  0x1ad68bb
  0x1ad6488
  0x5b81512
  0x5bd57ef
  0x5bd6ac7
  0x5bfa673
  0x5ba552a
  /opt/scylladb/libreloc/libc.so.6+0x8c946
  /opt/scylladb/libreloc/libc.so.6+0x112763
2024-10-25 03:32:57,087 INFO exited: scylla (terminated by SIGABRT; not expected)

Decoded backtrace:

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:64
(inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:825
(inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:855
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:867
(inlined by) seastar::sigabrt_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:4031
(inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4007
(inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:4003
/opt/scylladb/libreloc/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), BuildID[sha1]=7026fe8c129a523e07856d7c96306663ceab6e24, for GNU/Linux 3.2.0, stripped
 
addr2line: /opt/scylladb/libreloc/libc.so.6: don't know how to handle section `.relr.dyn' [0x      13]
__sigaction at ??:?
pthread_key_delete at ??:?
gsignal at ??:?
abort at ??:?
/opt/scylladb/libreloc/libstdc++.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=694f64a512ecd55e04f1587ddd790ec30cb0b726, stripped
 
__cxa_throw_bad_array_new_length at ??:?
std::rethrow_exception(std::__exception_ptr::exception_ptr) at ??:?
std::terminate() at ??:?
__clang_call_terminate at main.cc:?
seastar::shared_future<seastar::with_clock<seastar::lowres_clock> >::shared_state::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/./seastar/include/seastar/core/shared_future.hh:179
seastar::shared_future<seastar::with_clock<seastar::lowres_clock> >::get_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) const at /tmp/./seastar/include/seastar/core/shared_future.hh:256
(inlined by) seastar::shared_promise<seastar::with_clock<seastar::lowres_clock> >::get_shared_future(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) const at /tmp/./seastar/include/seastar/core/shared_future.hh:324
(inlined by) utils::flush_queue<db::replay_position, std::less<db::replay_position>, seastar::lowres_clock>::wait_for_pending(std::reverse_iterator<std::_Rb_tree_iterator<std::pair<db::replay_position const, utils::flush_queue<db::replay_position, std::less<db::replay_position>, seastar::lowres_clock>::notifier> > >, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/./utils/flush_queue.hh:164
(inlined by) utils::flush_queue<db::replay_position, std::less<db::replay_position>, seastar::lowres_clock>::wait_for_pending(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/./utils/flush_queue.hh:169
(inlined by) db::commitlog::segment::batch_cycle(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/db/commitlog/commitlog.cc:1087
seastar::future<db::rp_handle> db::commitlog::segment_manager::allocate_when_possible<db::commitlog::add_entry(utils::tagged_uuid<table_id_tag> const&, commitlog_entry_writer const&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::cl_entry_writer, db::rp_handle>(db::commitlog::add_entry(utils::tagged_uuid<table_id_tag> const&, commitlog_entry_writer const&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::cl_entry_writer, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/db/commitlog/commitlog.cc:1351
db::commitlog::add_entry(utils::tagged_uuid<table_id_tag> const&, commitlog_entry_writer const&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) at /tmp/db/commitlog/commitlog.cc:2426
replica::database::do_apply(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) at /tmp/replica/database.cc:2176
seastar::future<void> std::__invoke_impl<seastar::future<void>, seastar::future<void> (replica::database::* const&)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>), replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >(std::__invoke_memfun_deref, seastar::future<void> (replica::database::* const&)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>), replica::database*&&, seastar::lw_shared_ptr<schema const>&&, frozen_mutation const&, tracing::trace_state_ptr&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&&, seastar::bool_class<db::force_sync_tag>&&, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>&&) 于 /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:74
(已内连入)std::__invoke_result<seastar::future<void> (replica::database::* const&)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>), replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >::type std::__invoke<seastar::future<void> (replica::database::* const&)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>), replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >(seastar::future<void> (replica::database::* const&)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, 
db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>), replica::database*&&, seastar::lw_shared_ptr<schema const>&&, frozen_mutation const&, tracing::trace_state_ptr&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&&, seastar::bool_class<db::force_sync_tag>&&, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>&&) 于 /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:96
(已内连入)_ZNKSt12_Mem_fn_baseIMN7replica8databaseEFN7seastar6futureIvEENS2_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS2_12lowres_clockENSE_8durationIlSt5ratioILl1ELl1000000000EEEEEENS2_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSN_24per_partition_rate_limit12account_onlyENSS_19account_and_enforceEEEELb1EEclIJPS1_S8_SB_SD_SL_SP_SV_EEEDTclsr3stdE8__invokedtdefpT6_M_pmfspclsr3stdE7forwardIT_Efp_EEEDpOS11_ 于 /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/functional:170
(已内连入)seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>::direct_vtable_for<std::_Mem_fn<seastar::future<void> (replica::database::*)(seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)> >::call(seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)> const*, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) 于 /tmp/./seastar/include/seastar/util/noncopyable_function.hh:129
seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>::operator()(replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) const 于 /tmp/./seastar/include/seastar/util/noncopyable_function.hh:215
(已内连入)operator() 于 /tmp/./seastar/include/seastar/core/execution_stage.hh:342
(已内连入)seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>::direct_vtable_for<seastar::inheriting_concrete_execution_stage<seastar::future<void>, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >::make_stage_for_group(seastar::scheduling_group)::{lambda(replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)#1}>::call(seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)> const*, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, 
std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) 于 /tmp/./seastar/include/seastar/util/noncopyable_function.hh:129
seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>::operator()(replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>) const 于 /tmp/./seastar/include/seastar/util/noncopyable_function.hh:215
(已内连入)seastar::future<void> std::__invoke_impl<seastar::future<void>, seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >(std::__invoke_other, seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*&&, seastar::lw_shared_ptr<schema const>&&, frozen_mutation const&, tracing::trace_state_ptr&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&&, seastar::bool_class<db::force_sync_tag>&&, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>&&) 于 /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
(inlined by) std::__invoke_result<seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >::type std::__invoke<seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >(seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, 
std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*&&, seastar::lw_shared_ptr<schema const>&&, frozen_mutation const&, tracing::trace_state_ptr&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&&, seastar::bool_class<db::force_sync_tag>&&, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:96
(inlined by) _ZSt12__apply_implIRN7seastar20noncopyable_functionIFNS0_6futureIvEEPN7replica8databaseENS0_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS0_12lowres_clockENSG_8durationIlSt5ratioILl1ELl1000000000EEEEEENS0_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSP_24per_partition_rate_limit12account_onlyENSU_19account_and_enforceEEEEEESt5tupleIJS6_SA_SD_SF_SN_SR_SX_EEJLm0ELm1ELm2ELm3ELm4ELm5ELm6EEEDcOT_OT0_St16integer_sequenceImJXspT1_EEE at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/tuple:2288
(inlined by) _ZSt5applyIRN7seastar20noncopyable_functionIFNS0_6futureIvEEPN7replica8databaseENS0_13lw_shared_ptrIK6schemaEERK15frozen_mutationN7tracing15trace_state_ptrENSt6chrono10time_pointINS0_12lowres_clockENSG_8durationIlSt5ratioILl1ELl1000000000EEEEEENS0_10bool_classIN2db14force_sync_tagEEESt7variantIJSt9monostateNSP_24per_partition_rate_limit12account_onlyENSU_19account_and_enforceEEEEEESt5tupleIJS6_SA_SD_SF_SN_SR_SX_EEEDcOT_OT0_ at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/tuple:2299
(inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::apply<seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >(seastar::noncopyable_function<seastar::future<void> (replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce>)>&, std::tuple<replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >&&) at /tmp/./seastar/include/seastar/core/future.hh:1973
(inlined by) seastar::concrete_execution_stage<seastar::future<void>, replica::database*, seastar::lw_shared_ptr<schema const>, frozen_mutation const&, tracing::trace_state_ptr, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, seastar::bool_class<db::force_sync_tag>, std::variant<std::monostate, db::per_partition_rate_limit::account_only, db::per_partition_rate_limit::account_and_enforce> >::do_flush() at /tmp/./seastar/include/seastar/core/execution_stage.hh:251
operator() at ./build/release/seastar/./seastar/src/core/execution_stage.cc:149
(inlined by) seastar::future<void> seastar::futurize<void>::invoke<seastar::execution_stage::flush()::$_0&>(seastar::execution_stage::flush()::$_0&) at ./build/release/seastar/./seastar/include/seastar/core/future.hh:2003
(inlined by) seastar::lambda_task<seastar::execution_stage::flush()::$_0>::run_and_dispose() at ./build/release/seastar/./seastar/include/seastar/core/make_task.hh:45
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2651
(inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:3114
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3283
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4501
(inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61
(inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:111
(inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591
(inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:90
pthread_condattr_setpshared at ??:?
__clone at ??:?

The LSA memory of the node was very low before the restart, but it increased after the abort and restart.

LSA basically holds cache and memtables and is dynamically resized. The non-LSA panel is what’s interesting here: this cyan node (for some reason) had very high memory consumption, until it eventually OOM’d.
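To make the relationship concrete, here is a minimal sketch of the arithmetic behind the dashboard panels: non-LSA memory is roughly whatever the shard has allocated outside the LSA region. The metric values below are hypothetical illustrations, not readings from the affected node.

```python
# Sketch: non-LSA memory = total allocated memory minus the LSA region
# (cache + memtables). Values are hypothetical, for illustration only.

def non_lsa_bytes(total_allocated: int, lsa_total: int) -> int:
    """Non-LSA memory is whatever the allocator holds outside LSA."""
    return total_allocated - lsa_total

# Example: a 16 GiB shard where LSA (cache + memtables) holds 10 GiB.
GIB = 1 << 30
print(non_lsa_bytes(16 * GIB, 10 * GIB) / GIB)  # prints 6.0 (GiB outside LSA)
```

Since LSA shrinks under pressure (cache is evicted, memtables are flushed) while non-LSA allocations cannot be reclaimed the same way, sustained non-LSA growth is what eventually triggers bad_alloc.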

One possibility is, as @avikivity mentioned, high memory usage for metadata. Too many small keys under a single shard, perhaps?
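A back-of-envelope estimate shows why many small keys can be a problem. The ~100-byte per-partition overhead below is a hypothetical assumption for illustration, not a measured ScyllaDB constant.

```python
# Rough per-shard metadata estimate for many small partitions.
# overhead_per_partition is a hypothetical illustrative figure.

def metadata_estimate_bytes(data_bytes: int, avg_partition_bytes: int,
                            overhead_per_partition: int = 100) -> int:
    partitions = data_bytes // avg_partition_bytes
    return partitions * overhead_per_partition

# 1 TB of data made of 200-byte partitions -> 5 billion partitions,
# i.e. on the order of 500 GB of metadata under these assumptions.
TB = 10**12
print(metadata_estimate_bytes(1 * TB, 200) / 10**9)  # prints 500.0 (GB)
```

The point is that metadata scales with partition count, not data size, so 1 TB of tiny partitions can cost far more memory than 1 TB of large ones.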

It would be better to focus on this specific node: check its per-shard view and take it from there. Also, upgrade, and if the problem persists, raise a GitHub issue and upload the generated coredump.

Or you may find your way through scylladb/docs/dev/debugging.md at master · scylladb/scylladb · GitHub if you’d rather not upgrade at the moment (though note that 5.4 is an already-EOL release).

Right. But the fact that the non-LSA memory is low after the restart indicates the problem is not with metadata (since it would then recover) but with something else.

However, since it’s now low, we cannot investigate.

I suggest you monitor non-LSA memory, and if it starts increasing, we can try to understand why.
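The monitoring advice above can be sketched as a simple trend check over periodic non-LSA samples (e.g. scraped from the node’s metrics endpoint). The sample values here are synthetic, and the thresholds are illustrative assumptions.

```python
# Sketch: flag a steady non-LSA climb versus normal churn.
# Sample values are synthetic, for illustration only.

def is_monotonic_growth(samples: list, min_step: int = 0) -> bool:
    """True if every successive sample grows by more than min_step bytes."""
    return all(b - a > min_step for a, b in zip(samples, samples[1:]))

MIB = 1 << 20
steady = [512 * MIB, 530 * MIB, 560 * MIB, 600 * MIB]  # suspicious climb
noisy = [512 * MIB, 520 * MIB, 500 * MIB, 515 * MIB]   # normal fluctuation
print(is_monotonic_growth(steady))  # prints True
print(is_monotonic_growth(noisy))   # prints False
```

A strictly monotonic series over many samples is the pattern worth investigating; isolated spikes that fall back are usually just workload churn.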

p.s. 5.4 has reached end-of-life and is no longer supported.