ScyllaDB returns inconsistent data after a full node rebuild

Hi! I am testing a node rebuild after losing the data volume in Docker (as described in "Rebuild a Node After Losing the Data Volume" in the Scylla Docs).

My steps

  1. Create a new cluster with static IPs (3 DCs, 2 nodes in each DC).
  2. Create a new keyspace with test data (replication=NetworkTopologyStrategy, rf=1). I took the data from the scylladb repo (mutant_data with 3 rows and tracking_data with 12 rows).
  3. Stop one Scylla node in Docker (it's a regular node, not a seed).
  4. Remove /var/lib/scylla/*
  5. Update the Scylla config (set auto_bootstrap=true and replace_address_first_boot=x.x.x.x), then start Scylla on the node.
  6. Run cqlsh on a different node and check data consistency (a simple select from the tables to check the row counts). The row counts are OK at this step (3 and 12).
  7. Wait until the test node becomes UN and run cqlsh on that node. Check the row counts in the tables. In my case mutant_data had 1 row and tracking_data had 6 rows (there should be 3 and 12 rows).
  8. Change the consistency level in cqlsh to QUORUM and run the check again. The row counts are now correct.
  9. Manually running ‘nodetool repair -pr’ after the node becomes UN fixes the problem. But I can’t do that in production, because the node becomes UN only after a long time (and I don’t know when that happens), and it starts serving traffic in an inconsistent state.

Questions:

  1. Why does a node return inconsistent data with CL=ONE|LOCAL_ONE after a full rebuild?
  2. How can I avoid this?

Image version: scylladb/scylla:4.6.8

Hello, Scylla replace operations fetch data from nodes in the same DC. With RF = 1, the replacing node is expected to miss some of the data after the replace. You can run a cross-DC repair to fix the data. In addition, I would recommend using an RF greater than 1 per DC.
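
To illustrate why this happens, here is a toy model in Python. It is only a sketch under simplifying assumptions, not Scylla's real token ring or streaming code: keys are split evenly between the two nodes of a DC by `key % 2`, node names like `dc1-n0` are made up, and QUORUM is modelled as "the local replica plus one remote replica". With RF=1 per DC, the other node in the same DC holds none of the replaced node's ranges, so there is nothing for it to stream.

```python
# Toy model: 3 DCs x 2 nodes, RF=1 per DC. All names are hypothetical.
DCS = ["dc1", "dc2", "dc3"]
NODES_PER_DC = 2


def owner(dc, key):
    """Return the single node in `dc` that owns `key` (RF=1 per DC)."""
    return f"{dc}-n{key % NODES_PER_DC}"


def build_cluster(keys):
    """Map each node name to the set of keys it stores."""
    cluster = {f"{dc}-n{i}": set() for dc in DCS for i in range(NODES_PER_DC)}
    for key in keys:
        for dc in DCS:
            cluster[owner(dc, key)].add(key)
    return cluster


def replace_node(cluster, node):
    """Wipe `node`, then stream its ranges from peers in the same DC only."""
    dc = node.split("-")[0]
    cluster[node] = set()
    for peer, data in cluster.items():
        if peer != node and peer.startswith(dc + "-"):
            # A peer can only supply keys that the replaced node owns
            # and that the peer itself holds. With RF=1 there are none.
            cluster[node] |= {k for k in data if owner(dc, k) == node}


def scan(cluster, keys, local_dc, cl):
    """Return the keys visible to a full scan coordinated in `local_dc`."""
    found = set()
    for key in keys:
        replicas = [owner(local_dc, key)]
        if cl == "QUORUM":
            # 3 replicas in total (one per DC), so a quorum of 2
            # always includes at least one replica in a remote DC.
            remote_dc = next(dc for dc in DCS if dc != local_dc)
            replicas.append(owner(remote_dc, key))
        if any(key in cluster[r] for r in replicas):
            found.add(key)
    return found


if __name__ == "__main__":
    keys = list(range(12))
    cluster = build_cluster(keys)
    replace_node(cluster, "dc1-n0")
    print(len(scan(cluster, keys, "dc1", "LOCAL_ONE")))  # 6
    print(len(scan(cluster, keys, "dc1", "QUORUM")))     # 12
```

In this model the replaced node comes back empty, a LOCAL_ONE scan in its DC sees only the rows owned by the other local node, and a QUORUM scan sees everything because it also reads a replica in another DC. A repair that involves the other DCs is what fills the gap.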
