What I don’t understand is: if you are going from 6 nodes (RF=3) to 4 nodes (RF=2), do you need to load data from all 6 nodes even though the replication factor in the 6-node cluster is 3? If we do need to load data from all 6 nodes into the 4-node cluster, is there any risk of running out of space in the new cluster?
if you are going from 6 nodes (RF=3) to 4 nodes (RF=2), do you need to load data from all 6 nodes even though the replication factor in the 6-node cluster is 3?
Copying all 6 nodes does indeed seem like overkill. But the real answer is that it depends.
Are you dual-writing to both clusters? Do you expect all data present in the source cluster to match its target? Also, do you use NetworkTopologyStrategy and spread the data across 3 AZs?
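If yes, then you can start dual-writing, run a repair job, and once that repair job finishes, snapshot your data from a single AZ and copy it over. With RF=3 spread across 3 AZs (assuming each AZ is configured as its own rack), every AZ holds one full replica of the data, so a single AZ is enough. Both clusters should be in sync afterwards.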
If we do need to load data from all 6 nodes into the 4-node cluster, is there any risk of running out of space in the new cluster?
All SSTable data is going to get streamed to its replicas, so disk usage on the new cluster can temporarily exceed its steady-state size. You may want to let compaction catch up as you go through each Load and Stream step.
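
To put rough numbers on the space question, here is a back-of-the-envelope sketch in Python. It assumes evenly distributed tokens, that every loaded source copy is streamed to all target replicas, and that no compaction runs during the load. The 600 GB figure and the function name are made up for illustration; plug in your own numbers.

```python
def per_node_load_gb(unique_data_gb, source_copies_loaded, target_rf, target_nodes):
    """Approximate data landing on each target node BEFORE compaction
    merges duplicate copies (assumes even token distribution)."""
    total_streamed = unique_data_gb * source_copies_loaded * target_rf
    return total_streamed / target_nodes


unique_gb = 600.0  # hypothetical size of one full copy of the data set

# Loading SSTables from all 6 source nodes means loading all 3 source replicas.
all_six_nodes = per_node_load_gb(unique_gb, source_copies_loaded=3,
                                 target_rf=2, target_nodes=4)

# Loading from a single AZ means loading exactly one full replica.
single_az = per_node_load_gb(unique_gb, source_copies_loaded=1,
                             target_rf=2, target_nodes=4)

print(f"all 6 nodes: ~{all_six_nodes:.0f} GB per target node before compaction")
print(f"single AZ:   ~{single_az:.0f} GB per target node")
```

Under these assumptions, loading all 6 nodes puts roughly three times the steady-state volume on each target node until compaction merges the duplicates, which is why loading from a single AZ, or pausing between loads so compaction can catch up, keeps you further away from a full disk.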