Bootstrap repair of us-west nodes takes time in multi-dc cluster

Hi,

We have a multi-DC ScyllaDB cluster with 10 nodes in AWS us-east-1 and 10 nodes in us-west-2. We use scylla-ansible-roles to bring up new clusters. We have observed that bootstrapping nodes in us-west-2 takes much longer than in us-east-1. It looks like the table repair during bootstrap is what takes so long. Any pointers on how to debug and fix this?

Thanks,
Swaroop

Please provide the relevant logs that show what is taking time.

Do the nodes in different DCs have differing shard count? Are you using RBNO based bootstrap?

Here are a few lines from the logs. It looks like the repair during bootstrap takes a lot of time in us-west-2: in us-east-1 it took 13 seconds, whereas in us-west-2 it took about 27 minutes.

in US-EAST-1:
Feb 22 13:06:07 x.y.z.ec2.internal scylla[2966]: [shard 0:stre] repair - bootstrap_with_repair: started with keyspaces={system_traces, system_distributed_everywhere, system_distributed, system_auth}, nr_ranges_total=9179
Feb 22 13:06:20 x.y.z.ec2.internal scylla[2966]: [shard 0:stre] repair - bootstrap_with_repair: finished with keyspaces={system_traces, system_distributed_everywhere, system_distributed, system_auth}

in US-WEST-2:
Feb 22 13:07:34 a.b.c.ec2.internal scylla[3055]: [shard 0:stre] repair - bootstrap_with_repair: started with keyspaces={system_traces, system_distributed_everywhere, system_distributed, system_auth}, nr_ranges_total=9421
Feb 22 13:34:28 a.b.c.ec2.internal scylla[3055]: [shard 0:stre] repair - bootstrap_with_repair: finished with keyspaces={system_traces, system_distributed_everywhere, system_distributed, system_auth}
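
For anyone comparing runs like the two above, the elapsed time can be read off the started/finished pairs. Below is a minimal Python sketch of that arithmetic. The helper name `bootstrap_repair_seconds` is made up for illustration, the year is an assumption (syslog timestamps carry none), and the keyspace lists in the sample lines are shortened.

```python
from datetime import datetime

def bootstrap_repair_seconds(log_lines):
    """Seconds between the 'started' and 'finished' bootstrap_with_repair lines."""
    started = finished = None
    for line in log_lines:
        if "bootstrap_with_repair:" not in line:
            continue
        # The syslog timestamp is the first 15 characters, e.g. "Feb 22 13:06:07".
        # Assumption: a fixed year is prepended because the log has none.
        ts = datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
        if "started with" in line:
            started = ts
        elif "finished with" in line:
            finished = ts
    if started is None or finished is None:
        raise ValueError("missing started/finished line")
    return (finished - started).total_seconds()

# Sample lines taken from the logs above (keyspace lists shortened).
east = [
    "Feb 22 13:06:07 x.y.z.ec2.internal scylla[2966]: [shard 0:stre] repair - bootstrap_with_repair: started with keyspaces={...}",
    "Feb 22 13:06:20 x.y.z.ec2.internal scylla[2966]: [shard 0:stre] repair - bootstrap_with_repair: finished with keyspaces={...}",
]
west = [
    "Feb 22 13:07:34 a.b.c.ec2.internal scylla[3055]: [shard 0:stre] repair - bootstrap_with_repair: started with keyspaces={...}",
    "Feb 22 13:34:28 a.b.c.ec2.internal scylla[3055]: [shard 0:stre] repair - bootstrap_with_repair: finished with keyspaces={...}",
]
print(bootstrap_repair_seconds(east))  # 13.0
print(bootstrap_repair_seconds(west))  # 1614.0 (26 min 54 s)
```

The presence of `bootstrap_with_repair` in these lines also indicates that the bootstrap went through the repair-based path.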

@avikivity,
I have attached some logs for reference in this thread. Can you provide some pointers, please?

The attached logs do not contain any information about what might be causing the slowness.

I didn’t find any errors in the scylla-server logs. Let me know where else to check.

Can you please answer this? The answer might provide a lead.

Nodes in both DCs are of the same EC2 instance type. They have the same number of shards.
I think it is using RBNO-based bootstrap. I am using the scylla-ansible-roles git repo to create the Scylla cluster. In the repo, the scylla.yaml template is at scylla-ansible-roles/ansible-scylla-node/templates/scylla.yaml.j2 at master · scylladb/scylla-ansible-roles · GitHub
I don’t see any RBNO config in this template, so I think it is using RBNO (the default approach).

Could this be because there is only 1 seed and this seed is in us-east-1?

Seeds are only used when joining the cluster, they are not used afterwards.

The fact that small tables take a lot of time to repair, much more than one would expect, is a known problem, and we recently merged a pull request improving this.
That said, I don’t know why those small system tables take so much more time to stream in one DC, compared with the other.
How did you configure the replication of system_auth? This is a keyspace that the user is expected to adjust as the cluster is expanded.

I hope this new PR will reduce the time taken.

After all nodes in the cluster are up and running, I run a script which:

  1. changes the RF of system_auth
  2. adds new roles and deletes the cassandra role
  3. runs a nodetool repair. BTW, this repair also takes a lot of time to complete in a multi-DC cluster, just like bootstrap.
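
The first step in the list above typically comes down to a single ALTER KEYSPACE statement. As a rough sketch only (this is not the actual script), the Python below builds such a statement. It assumes NetworkTopologyStrategy; the function name, the datacenter names `dc_east` and `dc_west`, and the RF of 3 are all hypothetical placeholders. The real DC names should be taken from the cluster itself, for example from `nodetool status`.

```python
def alter_system_auth_cql(rf_per_dc):
    """Build an ALTER KEYSPACE statement for system_auth from a {dc_name: rf} mapping."""
    if not rf_per_dc:
        raise ValueError("at least one datacenter is required")
    # Sort by DC name so the generated statement is deterministic.
    opts = ", ".join(f"'{dc}': {int(rf)}" for dc, rf in sorted(rf_per_dc.items()))
    return (
        "ALTER KEYSPACE system_auth WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )

# Hypothetical DC names and replication factors, for illustration only.
print(alter_system_auth_cql({"dc_east": 3, "dc_west": 3}))
# ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_east': 3, 'dc_west': 3};
```

The generated statement would then be run through cqlsh or a driver, followed by the repair in step 3 so that existing system_auth data reaches the new replicas.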

Yes, it is repair that takes a long time for tiny tables. We have recently moved node-operations to use repair behind the scenes (hence the name RBNO) and now node operations are affected too.

For reference, this is the PR: repair: Introduce small table optimization by asias · Pull Request #15974 · scylladb/scylladb · GitHub