Hi! I am testing a node rebuild after losing the data volume in Docker (as described in "Rebuild a Node After Losing the Data Volume | Scylla Docs").
My steps:
- Create a new cluster with static IPs (3 DCs, 2 nodes in each DC).
- Create a new keyspace with test data (replication=NetworkTopologyStrategy, rf=1). I took the data from the scylladb repo (mutant_data: 3 rows, tracking_data: 12 rows).
- Stop one Scylla node in Docker (it's a regular node, not a seed).
- Remove /var/lib/scylla/*
- Update the Scylla config (set auto_bootstrap=true and replace_address_first_boot=x.x.x.x), then start Scylla on the node.
- Run cqlsh on a different node and check data consistency (a simple SELECT from the tables, checking the row counts). The row counts are OK at this step (3 and 12).
- Wait until the test node becomes UN and run cqlsh on that node. Check the row counts in the tables. In my case mutant_data had 1 row and tracking_data had 6 rows (they should be 3 and 12).
- Change the consistency level in cqlsh to QUORUM and run the check again. The row counts are then correct.
- Manually running ‘nodetool repair -pr’ after the node becomes UN fixes the problem. But I can’t do that in production, because the node only becomes UN after a long time (and I don’t know when that happens), and it then starts serving traffic in an inconsistent state.
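
To make the last step concrete, here is a minimal Python sketch of how the "wait for UN, then repair" sequence could be scripted instead of run by hand. It assumes nodetool is on the PATH of the machine running the script and that node rows in the `nodetool status` output start with the two-letter state followed by the address. The function names, the polling interval, and the address 10.0.0.5 are placeholders I made up for illustration. Note that this only shortens the gap between the node becoming UN and the repair starting; it does not stop the node from serving reads before the repair finishes.

```python
import subprocess
import time


def node_state(status_output, address):
    """Return the two-letter state (e.g. 'UN', 'UJ', 'DN') for address, or None."""
    for line in status_output.splitlines():
        fields = line.split()
        # Assumed row shape: "UN  10.0.0.5  1.2 MB  256  ?  <host-id>  rack1"
        if len(fields) >= 2 and len(fields[0]) == 2 and fields[1] == address:
            return fields[0]
    return None


def wait_until_up_normal(address, poll_seconds=30, run=subprocess.run):
    """Poll `nodetool status` until the given address is reported as UN."""
    while True:
        result = run(["nodetool", "status"], capture_output=True, text=True)
        if node_state(result.stdout, address) == "UN":
            return
        time.sleep(poll_seconds)


def repair_primary_ranges(run=subprocess.run):
    """Run the same command as the manual step: nodetool repair -pr."""
    run(["nodetool", "repair", "-pr"], check=True)


# Intended usage on the replaced node (10.0.0.5 is a placeholder address):
#   wait_until_up_normal("10.0.0.5")
#   repair_primary_ranges()
```

The `run` parameter exists only so the polling logic can be exercised without a real cluster.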
Questions:
- Why does a fully rebuilt node return inconsistent data with CL=ONE|LOCAL_ONE?
- How can I avoid this?
Image version: scylladb/scylla:4.6.8