Counter update timeouts & bad_alloc

Hi,

Is there any known issue with counters and memory allocation in Scylla?

I have just seen a node causing lots of counter update errors:

com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.0.21:9042] Timed out waiting for server response
ScyllaConnection: Recent Scylla error: com.datastax.driver.core.exceptions.TransportException: [nexthost/10.0.0.21:9042] Connection has been closed

And some bad_alloc errors, together with reactor stalls:

Nov 24 12:35:38 os-p-L12-1 scylla[2796088]: Reactor stalled for 69 ms on shard 2. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x168b018 0x1522f6f 0x16023c5 0x15d6281 0x15e78ed 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:35:38 os-p-L12-1 scylla[2796088]: Reactor stalled for 129 ms on shard 2. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x1608c4a 0x19a0460 0x18adc78 0x18ad5f4 0x1588e92 0x15e1ddc 0x15ef278 0x15ead06 0x15e7674 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:35:38 os-p-L12-1 scylla[2796088]: Reactor stalled for 237 ms on shard 2. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x19af3af 0x1602339 0x15d6281 0x15e7501 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:35:38 os-p-L12-1 scylla[2796088]: Reactor stalled for 445 ms on shard 2. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x168b036 0x1522f6f 0x16023c5 0x15d6281 0x15e78ed 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:37:08 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:38:12 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:39:15 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:40:21 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:41:22 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:42:27 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:43:30 os-p-L12-1 scylla[2796088]: Reactor stalled for 72 ms on shard 4. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x1608cd4 0x19a0460 0x18adc78 0x18ad5f4 0x1588e92 0x15e1ddc 0x15ef278 0x15ead06 0x15e7674 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:43:30 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:43:30 os-p-L12-1 scylla[2796088]: Reactor stalled for 132 ms on shard 4. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x19af33b 0x16023c5 0x15d6281 0x15e78ed 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:43:30 os-p-L12-1 scylla[2796088]: Reactor stalled for 240 ms on shard 4. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x1603285 0x1602380 0x15d6281 0x15e78ed 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab740 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d89355 0x4d3ccca 0x92a4 0x100322
Nov 24 12:44:35 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:45:39 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:46:41 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:47:47 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:48:50 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:49:50 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:50:48 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:52:01 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:52:01 os-p-L12-1 scylla[2796088]: Reactor stalled for 71 ms on shard 0. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x1608cd4 0x19a0460 0x18adc78 0x18ad5f4 0x1588e92 0x15e1ddc 0x15ef278 0x15ead06 0x15e7674 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d6955c 0x4d0f598 0x4d0ea71 0x1037db6 0x103539a 0x27b74 0x103426d
Nov 24 12:52:01 os-p-L12-1 scylla[2796088]: Reactor stalled for 131 ms on shard 0. Backtrace: 0x4d59732 0x4d58390 0x4d59640 0x7f48ede37a1f 0x1603271 0x1602380 0x15d6281 0x15e78ed 0x1604681 0x4d2a025 0x4d2a58f 0x4d2a8b8 0x4d2bf1f 0x4d2efe1 0x1dab4cc 0x1daa7f8 0x1df91ff 0x1de623f 0x1de789f 0x4d68f24 0x4d6a307 0x4d6955c 0x4d0f598 0x4d0ea71 0x1037db6 0x103539a 0x27b74 0x103426d
Nov 24 12:53:04 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:54:01 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:55:12 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:56:14 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:57:15 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:58:13 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 12:59:24 os-p-L12-1 scylla[2796088]:  [shard 2] cql_server - exception while processing connection: std::bad_alloc (std::bad_alloc)
Nov 24 13:00:04 os-p-L12-1 scylla[2796088]:  [shard 0] api - perform_keyspace_flush: keyspace=system_schema tables={}
Nov 24 13:00:04 os-p-L12-1 scylla[2796088]:  [shard 0] api - perform_keyspace_flush: keyspace=system_distributed_everywhere tables={}
Nov 24 13:00:04 os-p-L12-1 scylla[2796088]:  [shard 0] api - perform_keyspace_flush: keyspace=system_traces tables={}
Nov 24 13:00:04 os-p-L12-1 scylla[2796088]:  [shard 0] api - perform_keyspace_flush: keyspace=system tables={}

After restarting that node, everything was fine again.

Is there perhaps some memory issue with counters in Scylla?

Regards,
Christian

It looks like a bug.
Please open an issue with all relevant info.
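
A per-shard summary of the log may be a useful addition to the issue. The following is only a minimal sketch in Python, assuming journal lines in the same format as the excerpt above; the function name `summarize` and the two regular expressions are illustrative and not part of any Scylla tooling.

```python
import re
from collections import Counter

# Patterns assume the journal format shown in the excerpt above.
STALL_RE = re.compile(r"Reactor stalled for (\d+) ms on shard (\d+)")
BAD_ALLOC_RE = re.compile(r"\[shard (\d+)\] .*std::bad_alloc")


def summarize(lines):
    """Return (longest stall in ms per shard, bad_alloc count per shard)."""
    max_stall = {}
    bad_allocs = Counter()
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            ms, shard = int(m.group(1)), int(m.group(2))
            max_stall[shard] = max(ms, max_stall.get(shard, 0))
            continue
        m = BAD_ALLOC_RE.search(line)
        if m:
            bad_allocs[int(m.group(1))] += 1
    return max_stall, bad_allocs
```

Passing the saved journal output as an iterable of lines (for example an open file object) returns two mappings keyed by shard number, which makes it easy to see whether the errors are confined to one shard, as they appear to be for shard 2 here.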