Originally from the User Slack
@Joakim_Lindqvist: Hey
I am trying to add new, larger nodes to my cluster (I intend to eventually replace the old ones with these new larger nodes, as we need more storage space for Scylla).
Unfortunately I am unable to add the new nodes: when they finish bootstrapping (which usually takes about 15 hrs or so) they hit errors and the node then aborts.
This is the error we see.
[shard 0:stre] node_ops - bootstrap[34b488af-6ad9-4065-8557-ec78861232e5]: Operation failed, sync_nodes={192.168.6.61, 192.168.10.216, 192.168.6.76, 192.168.9.4, 192.168.1.24, 192.168.5.184, 192.168.18.249, 192.168.30.19, 192.168.17.229, 192.168.3.159, 192.168.14.214, 192.168.13.35, 192.168.2.83, 192.168.29.74}: std::runtime_error ({shard 3: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-8976423482152603061,-8974123670229822782])), shard 10: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-9190837334122310944,-9164910947701726914]))})
This is on an older version (5.4.5). More error logs in thread.
Some more context around that error:
[shard 3:stre] repair - repair[b56697b2-892e-4919-a78c-58121332d977]: stats: repair_reason=bootstrap, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, ranges_nr=1956, round_nr=14961, round_nr_fast_path_already_synced=5876, round_nr_fast_path_same_combined_hashes=269, round_nr_slow_path=8815, rpc_call_nr=63189, tx_hashes_nr=162302573, rx_hashes_nr=577505758, duration=57885.67 seconds, tx_row_nr=549619, rx_row_nr=162341190, tx_row_bytes=178775229, rx_row_bytes=66915571166, row_from_disk_bytes={{192.168.6.61, 192013943004}, {192.168.6.76, 104041637280}}, row_from_disk_nr={{192.168.6.61, 435277471}, {192.168.6.76, 597054634}}, row_from_disk_bytes_per_sec={{192.168.6.61, 3.16346}, {192.168.6.76, 1.7141}} MiB/s, row_from_disk_rows_per_sec={{192.168.6.61, 7519.61}, {192.168.6.76, 10314.4}} Rows/s, tx_row_nr_peer={{192.168.6.76, 549619}}, rx_row_nr_peer={{192.168.6.76, 162341190}}
[shard 3:stre] repair - repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-8976423482152603061,-8974123670229822782])
[shard 0:stre] repair - repair[b56697b2-892e-4919-a78c-58121332d977]: sync data for keyspace=jupiter, status=failed: std::runtime_error ({shard 3: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-8976423482152603061,-8974123670229822782])), shard 10: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-9190837334122310944,-9164910947701726914]))})
[shard 0:stre] node_ops - bootstrap[34b488af-6ad9-4065-8557-ec78861232e5]: Operation failed, sync_nodes={192.168.6.61, 192.168.10.216, 192.168.6.76, 192.168.9.4, 192.168.1.24, 192.168.5.184, 192.168.18.249, 192.168.30.19, 192.168.17.229, 192.168.3.159, 192.168.14.214, 192.168.13.35, 192.168.2.83, 192.168.29.74}: std::runtime_error ({shard 3: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-8976423482152603061,-8974123670229822782])), shard 10: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-9190837334122310944,-9164910947701726914]))})
[shard 0:stre] node_ops - bootstrap[34b488af-6ad9-4065-8557-ec78861232e5]: Stopped heartbeat_updater
[shard 0:stre] node_ops - bootstrap[34b488af-6ad9-4065-8557-ec78861232e5]: Started bootstrap_abort[34b488af-6ad9-4065-8557-ec78861232e5]: ignore_nodes={}, leaving_nodes={}, replace_nodes={}, bootstrap_nodes={{192.168.6.61 -> {9207423165174751708, 9187498559333087291, 8423122522306558343, 8405555043027247516, 8290023616079636514, 8034258609937158032, 7726256095509837469, 7687620977502680315, 7638135239790972793, 7635132297564838548, 7311745190823822946, 7132063630340263340, 7085871688021077089, 7557852067083399172, 6888680491182908211, 6785851714100604243, 6679467625463114023, 8135379877196657263, 6648525004404105147, 6604230209082709445, 6332850166647891788, 7602965465955007103, 65325275849055025, 6015420634188937182, 8767413132168313835, 5972339601828660204, 5833097145632930320, 727097091325648403, 6521582088616115602, 5365461492849866348, 5288884199396740168, 5066504173021120075, 4960340349210109965, 4897412102629493784, 4759558065606708308, 4645103848508894313, 4642055106540598937, 4622330714313668163, 4392775317689017615, 4322423713949526287, 6454332036028576179, 4059763028017032877, 3861588847206460079, 4691441612551357429, 4114065484690078827, 375632169932585786, 3716584037404832811, 5520034194565344006, 3506503877857055525, 3432370250939554735, 3420831638662360404, 3269512086598296474, 9196261724844652738, 3182858873491604635, 6420497288061808764, 3076453397897102988, 3032591669664610137, 272634328729618940, 2690071207918855680, 265978181140148438, 2454555571964081964, 2349739074904864913, 8185309793164577846, 6832488992178764528, 2859670683275049064, 202839947670220205, 1952461322332603659, 1745704632338590392, 4517612929260273709, 1691697381815332893, 1672466998445143398, 148015723549577498, 145663985065843679, 1189711548744224293, 1295044004262312736, -4449460161971976897, -6188497202691048302, 5695155357446634358, -6292163157054999302, -4226332247088561412, 3839591626399957777, -410692625257722614, 8200955130688308851, -4560354798283704076, 
-3930689929219309785, -6628849540827226628, -3696035198278671492, 6219886873214661603, -1866711869758852026, -4980416527780995484, -4940933600217431689, 8109047591715434897, -3690155301235495959, 2503890063366368343, -3817574766701527087, -6853648264750688274, -3307673609887766476, 8921724895398731563, 4683550141096220083, -3491272318572856016, -3202435910679683011, -3190799986566944632, -1054525577021664251, -2990988261498342643, -1977973511750513258, 6963213557479684713, -1837276843410426793, 1561299558708353464, 1165093842657297940, -2036693037550577530, -3151354917288866774, -2322009821051109043, 5607748466170925839, -3772393489635600732, -2221171650186326990, -3044010036948653936, 7332887124477358067, -8269712601891159486, -4045614382343833543, -3635145318325668752, 363461640614559422, -496865206287994911, -3356308164319855694, -3506436338967522972, -1374193938833018489, -2803288673062949495, -677023464627144163, -2184178573303147609, -4893193647229102723, -9132612104944048653, -4501103244278878613, -2337146387799984847, 843620725760125766, -2690378285450879418, 6731165288641545993, 3953536587387814193, -6678088610194367599, -1250792532614803765, 336111802846124389, 2151747665710346492, -3818400244400706175, -1332592648102466550, -1081297284045596283, -1753804941770011732, -1007453709276796744, 7621259680862668078, -6763558765301434514, -7975685620977735771, 8425037882255707591, 3558098044398638807, -2039004712265860181, 2951768684063116394, -1589266830774279795, -625718027373983084, 7031838841439920283, -4533018672305678254, -1938436233097984486, -2042068370772302095, 6544372065657394631, -1538028653235323482, -7028428503345046258, 3539729448665287724, 3329029928511810290, 2318163353532257595, -4811739174086884353, -8486778243255411607, 4414777406879046868, -2234356615768408718, 697233312216780814, -3315033334460701974, -2485631619373484092, -2506748139506658411, 5415364107644511136, -2632815813567471499, 3557370417557600664, -8707067229112307075, 
-6792997377975599451, -2240764325472973009, -5036832492122505345, -4974592963969703047, -5039453950220634605, 5946191133796229632, -5675536295821670237, -5055742741157647965, 6307303170542371068, 5226862612474911028, -7820357969772956451, -5108739077317593020, 3025353856544890996, -5175986761212132784, -5182837721452714440, -7272445367738614846, -5275917684002970062, 645187494117391225, -5294591091655635865, 5557760440988230723, -7274974532709792331, -5486319354433601402, 4732416082259635498, -5766100130876015343, -6697095952020008061, -2288567973324737955, -6382705031769712030, 7516606599997922515, -6436442285650998845, -6607872232410918034, -2308845127873303635, -6308614672691338284, -663798280656578651, -2639260417329801524, -6777949777875490531, -6721585168859830518, 1479067709729585947, -4301703474222209226, -7138884826224297519, -703123416329340559, -7060607522019948478, 8536831266834569765, 7947958901175052235, -9050455807178718129, 2553010461858476534, -466563216826656342, -7166882758154477034, 7073178996115353810, -5726352218936538166, 1142057001848927943, 3899926504188369938, 3866290941183619179, -7504288436465968057, -2993954724134281777, -7641126727249421839, -9111724901351763839, -7734784279374049500, 1035531435881577277, 4064726124904265497, -781383598512561025, -7901718949335469819, -9015967997881658443, 1165603244797269517, -8041232312713362459, 30014572295844404, -8288889613432590572, 7409736441812882563, -996090741061670089, -861779399028909490, -8350442888488082225, 4740881017940774670, -5997825958963620666, -8481743127792987179, -8560790057504716878, 6072686050924683423, -8524385238978143234, 912553604800246338, -8619180440205333940, 7340722552613156666, -8901489408801186545}}}, repair_tables={}
[shard 0:stre] storage_service - aborting node operation ops_uuid=34b488af-6ad9-4065-8557-ec78861232e5
[shard 0:stre] storage_service - bootstrap[34b488af-6ad9-4065-8557-ec78861232e5]: Removed node=192.168.6.61 as bootstrap, coordinator=192.168.6.61
After about 15 minutes the shutdown process completes and outputs this error, which seems to just repeat the reason it aborted.
[shard 0:main] init - Startup failed: std::runtime_error ({shard 3: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-8976423482152603061,-8974123670229822782])), shard 10: std::runtime_error (repair[b56697b2-892e-4919-a78c-58121332d977]: 1 out of 33252 ranges failed, keyspace=jupiter, tables={build_parts, blob_index_v2, blob_incoming_references, blob_index, builds_by_name_index, block_context_by_blocks_index, builds, block_context, bucket_referenced_ref, content_id, object_last_access_v2, block_context_by_time_index, block_index, objects, bucket_referenced_blobs, buckets, buckets_v2}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false, failed_because=std::runtime_error (Failed to repair for keyspace=jupiter, cf=blob_incoming_references, range=(-9190837334122310944,-9164910947701726914]))})
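The table and token ranges that actually failed are buried inside the nested `std::runtime_error` text above. A minimal sketch of pulling them out with a regular expression follows; the function name `failed_ranges` and the shortened `sample` string are illustrative, not part of Scylla, and the pattern only assumes the `Failed to repair for keyspace=..., cf=..., range=(...]` wording seen in these logs.

```python
import re

# Matches the innermost failure text as it appears in the logs above:
# "Failed to repair for keyspace=<ks>, cf=<table>, range=(<start>,<end>]"
FAILED_RANGE = re.compile(
    r"Failed to repair for keyspace=(\w+), cf=(\w+), range=\((-?\d+),(-?\d+)\]"
)


def failed_ranges(log_text):
    """Return distinct (keyspace, table, start, end) tuples in first-seen order."""
    seen = []
    for ks, cf, start, end in FAILED_RANGE.findall(log_text):
        item = (ks, cf, int(start), int(end))
        if item not in seen:  # the same failure is repeated on several log lines
            seen.append(item)
    return seen


# Shortened copy of the error text from the logs above (hypothetical truncation).
sample = (
    "shard 3: ... (Failed to repair for keyspace=jupiter, "
    "cf=blob_incoming_references, "
    "range=(-8976423482152603061,-8974123670229822782])), "
    "shard 10: ... (Failed to repair for keyspace=jupiter, "
    "cf=blob_incoming_references, "
    "range=(-9190837334122310944,-9164910947701726914]))"
)

for ks, cf, start, end in failed_ranges(sample):
    print(f"{ks}.{cf}: ({start}, {end}]")
```

Run against the full log, this reduces the repeated error lines to the two failing ranges, both in `jupiter.blob_incoming_references`.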
@dor: It’s not an ‘official’ answer, just a guess: if you didn’t run a repair yourself, it’s the bootstrap streaming, which uses RBNO (repair-based node operations). You can disable it in the config and add the node the other way.
@Joakim_Lindqvist: I can try that. I was not running a repair at the time.
@dor: RBNO is repair under the hood
@Joakim_Lindqvist: Thank you for your suggestion, Dor. I can confirm that this was an issue with RBNO: once I disabled the use of repair during bootstrap, my nodes were able to bootstrap and join successfully. Thanks!
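The thread does not name the config option that was changed. As far as I know it is `enable_repair_based_node_ops` in `scylla.yaml`, but treat that name as an assumption and check it against the documentation for your version. Under that assumption, a rough sketch of checking what a node's config file currently says (the helper `rbno_setting` is hypothetical and only handles simple top-level `key: value` lines):

```python
def rbno_setting(yaml_text, key="enable_repair_based_node_ops"):
    """Return True/False if `key` is set on an uncommented top-level line,
    or None if it is absent (the server default then applies).

    The key name is an assumption; it is not given in the thread.
    """
    for line in yaml_text.splitlines():
        line = line.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if line.startswith(key + ":"):
            value = line.split(":", 1)[1].strip().lower()
            if value in ("true", "false"):
                return value == "true"
    return None


# Hypothetical config excerpt for a node about to be bootstrapped.
example = """\
cluster_name: 'example'
# enable_repair_based_node_ops: true
enable_repair_based_node_ops: false
"""

print(rbno_setting(example))
```

A result of `False` means the new node would bootstrap with regular streaming rather than repair, which is the setup that worked here.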