Issue with nodetool rebuild: keeps getting stuck for a long time

Batman · June 10, 2024, 12:00pm

I am trying to add a new data center to an existing 3-node ScyllaDB cluster.

The existing DC and the new DC are at different locations (connected via IPsec).
The node in the new DC has joined the cluster successfully and shows as UN.

The problem arises when running the nodetool rebuild command on the new node.

It runs smoothly for a while, then gets stuck at a certain point and the process suddenly stops.
What could be causing this, and how can I fix it?

sudo nodetool status
shows all nodes as Up and Normal (UN).

Version: 5.4.4-0.20240228.58a1be93b212

Logs from journalctl:

get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6060: seastar::nested_exception: seastar::rpc::clos>
[shard 10:stre] rpc - client 192.168.10.11:56965: server connection dropped: recv: Connection timed out

[shard 2:stre] rpc - client 192.168.10.11:60587: server connection dropped: recv: Connection timed out

[shard 10:stre] rpc - client 192.168.10.11:63025: server connection dropped: recv: Connection timed out

[shard 2:stre] rpc - client 192.168.10.11:65042: server connection dropped: recv: Connection timed out

[shard 4:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6027: seastar::nested_exception: seastar::rpc::clos>

[shard 0:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6008: seastar::nested_exception: seastar::rpc::clos>

[shard 9:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6009: seastar::nested_exception: seastar::rpc::clos>

[shard 1:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6002: seastar::nested_exception: seastar::rpc::clos>

[shard 3:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6035: seastar::nested_exception: seastar::rpc::clos>

[shard 14:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6025: seastar::nested_exception: seastar::rpc::clos>

[shard 8:stre] rpc - client 192.168.10.11:51413: server connection dropped: recv: Connection timed out

[shard 6:stre] rpc - client 192.168.10.11:49731: server connection dropped: recv: Connection timed out

[shard 9:stre] rpc - client 192.168.10.11:57324: server connection dropped: recv: Connection timed out

[shard 9:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6037: seastar::nested_exception: seastar::rpc::clos>

[shard 14:stre] rpc - client 192.168.10.11:59324: server connection dropped: recv: Connection timed out

[shard 7:stre] rpc - client 192.168.10.11:49627: server connection dropped: recv: Connection timed out

[shard 13:stre] rpc - client 192.168.10.11:59263: server connection dropped: recv: Connection timed out

[shard 3:stre] repair - Failed to process get_row_diff_with_rpc_stream_handler from=192.168.10.11, repair_meta_id=6052: seastar::nested_exception: seastar::rpc::clos>

[shard 14:stre] rpc - client 192.168.10.11:55484: server connection dropped: recv: Connection reset by peer

[shard 3:stre] rpc - client 192.168.10.11:55953: server connection dropped: recv: Connection reset by peer

[shard 3:stre] rpc - client 192.168.10.11:49698: server connection dropped: recv: Connection reset by peer

[shard 5:stre] rpc - client 192.168.10.11:54500: server connection dropped: recv: Connection reset by peer

[shard 11:stre] rpc - client 192.168.10.11:53981: server connection dropped: recv: Connection reset by peer

[shard 7:stre] rpc - client 192.168.10.11:59392: server connection dropped: recv: Connection reset by peer

[shard 13:stre] rpc - client 192.168.10.11:51928: server connection dropped: recv: Connection reset by peer

[shard 13:stre] rpc - client 192.168.10.11:60148: server connection dropped: recv: Connection reset by peer

[shard 0:stre] rpc - client 192.168.10.11:57195: server connection dropped: recv: Connection reset by peer

[shard 2:stre] rpc - client 192.168.10.11:64142: server connection dropped: recv: Connection reset by peer

[shard 5:stre] rpc - client 192.168.10.11:63395: server connection dropped: recv: Connection reset by peer

[shard 4:stre] rpc - client 192.168.10.11:57259: server connection dropped: recv: Connection reset by peer

[shard 3:stre] rpc - client 192.168.10.11:52008: server connection dropped: recv: Connection reset by peer

[shard 0:stre] gossip - failure_detector_loop: Send echo to node 192.168.10.11, status = failed: seastar::rpc::closed_error (connection is closed)

[shard 0:goss] gossip - Fail to send EchoMessage to 192.168.10.11: seastar::rpc::closed_error (connection is closed)

[shard 0:stre] gossip - failure_detector_loop: Send echo to node 192.168.10.11, status = failed: seastar::rpc::timeout_error (rpc call timed out)

I followed the steps exactly as described in "Adding a New Data Center Into an Existing ScyllaDB Cluster".

avikivity · July 10, 2024, 8:12pm

Looks like the logs are truncated on the right edge. Please provide complete logs. Also look at the logs on the other nodes from the same time range.
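
A trailing ">" on a log line usually means journalctl's pager cut the line at the terminal width. Here is a minimal sketch of one way to capture full lines for a fixed time window, so the same window can be collected on every node. The unit name scylla-server, the output path, and the timestamps are assumptions for illustration; adjust them to your installation and to the time the rebuild stalled.

```python
import subprocess
from typing import List, Optional


def journalctl_command(unit: str,
                       since: Optional[str] = None,
                       until: Optional[str] = None) -> List[str]:
    """Build a journalctl invocation that prints complete, unpaged log lines."""
    # --no-pager bypasses the pager, which is what chops long lines
    # and marks them with ">" at the right edge.
    cmd = ["journalctl", "-u", unit, "--no-pager"]
    if since:
        cmd += ["--since", since]   # e.g. "2024-06-10 11:00:00" (hypothetical)
    if until:
        cmd += ["--until", until]
    return cmd


def dump_logs(path: str,
              unit: str = "scylla-server",   # assumed systemd unit name
              since: Optional[str] = None,
              until: Optional[str] = None) -> None:
    """Write the selected log window to a file that can be attached to the thread."""
    with open(path, "w") as out:
        subprocess.run(journalctl_command(unit, since, until),
                       stdout=out, check=True)


# Example (run on each node, with the same window):
# dump_logs("scylla-rebuild.log", since="2024-06-10 11:00:00", until="2024-06-10 12:00:00")
```

Using the same since/until values on the new node and on 192.168.10.11 makes it easier to line up the "connection dropped" messages with whatever the other side logged at that moment.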
