Repair by token ranges

Hello.
I’m trying to use the repair API with token ranges and have a number of questions.

  1. Empty [start | end]Token:
    Consider the following result:
curl 10.0.32.70:10000/storage_service/range_to_endpoint_map/markb | jq -r '.[] | select (.key[0] == "" or .key[1] == "")'

{
  "key": [
    "9218948411409126567",
    ""
  ],
  "value": [
    "10.0.32.70",
    "10.0.32.74",
    "10.0.32.77",
    "10.0.32.78",
    "10.0.32.75",
    "10.0.32.71"
  ]
}
{
  "key": [
    "",
    "-9213378514646091305"
  ],
  "value": [
    "10.0.32.70",
    "10.0.32.74",
    "10.0.32.77",
    "10.0.32.78",
    "10.0.32.75",
    "10.0.32.71"
  ]
}

The following command returns: {"message": "boost::wrapexcept<boost::bad_lexical_cast> (bad lexical cast: source type value could not be interpreted as target)", "code": 500}.

curl -X POST '10.0.32.70:10000/storage_service/repair_async/markb?ranges=9218948411409126567%3A'

Q1.1: Is it possible to repair this token range using the ranges parameter?
Q1.2: If the answer to Q1.1 is “no”, then what’s the proper command to repair such a token range (if I need to repair it at all)?

  2. Consider the following result:
curl 10.0.32.70:10000/storage_service/range_to_endpoint_map/markb | jq -r '.[] | select (.key[0] == "-1569604604390479271")'

{
  "key": [
    "-1569604604390479271",
    "-1519586698886497919"
  ],
  "value": [
    "10.0.32.70",
    "10.0.32.75",
    "10.0.32.72",
    "10.0.32.78",
    "10.0.32.76",
    "10.0.32.74"
  ]
}

Notice that host 10.0.32.71 is not in the endpoint list of this token range.

curl -X POST 'http://10.0.32.71:10000/storage_service/repair_async/markb?primaryRange=true&ranges=-1569604604390479271%3A-1519586698886497919'

Yet, according to the result of the following status query (with X being the id returned by the repair command above), the command “successfully” repairs this token range.
The result doesn’t change if I omit the primaryRange parameter.

curl 'http://10.0.32.71:10000/storage_service/repair_status/?id=X'
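For completeness, this is roughly how I check it; just a sketch, where I assume repair_async returns a bare numeric id and repair_status returns the quoted strings "RUNNING", "SUCCESSFUL" or "FAILED" (that's what I observe on my build):

ID=$(curl -s -X POST 'http://10.0.32.71:10000/storage_service/repair_async/markb?ranges=-1569604604390479271%3A-1519586698886497919')
# poll until the coordinator reports a terminal status
while [ "$(curl -s "http://10.0.32.71:10000/storage_service/repair_status/?id=$ID")" = '"RUNNING"' ]; do
    sleep 5
done
curl -s "http://10.0.32.71:10000/storage_service/repair_status/?id=$ID"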

Q2: How is it possible to successfully run a repair command for a token range on a host which isn’t present in the endpoint list of that token range?
Does this command really repair the token range on the other hosts when the repair master doesn’t hold any replica of this data?

Thanks in advance.

Please share more info about your deployment (ScyllaDB Version, OS, etc.)

Why don’t you use Scylla Manager to manage cluster repair and backup?

It’s a test system.
ScyllaDB 5.4.4 Community
Debian 10
3 DC x 3 nodes each

The reason I can’t use Scylla Manager is its 5-node limit for the open-source edition.

But I believe the results of these API calls are the same on all Scylla versions.
BTW, the nodetool repair [-pr] -st X -et Y keyspace commands return SUCCESSFUL as well, for whatever X, Y values and regardless of the host where I run them.
For example, the following command finishes “successfully” as well.

nodetool repair -pr -st 0 -et -1 keyspace

So, if all such commands return “success”, it would be good to know whether they really do any useful work. And if they don’t, then I need to know the correct algorithm for repairing my cluster by token ranges. I’m not able to construct such an algorithm from the available documentation at the moment, so I need some help here.

Let’s assume that I need to repair the keyspace markb in my cluster.
My algorithm, with the points that are still unclear to me, is below.

  1. Get the range (start_token, end_token) to endpoint mapping with the storage_service/range_to_endpoint_map API (as in Q2 of the initial post).
  2. Run the following command (or its API analogue with the same parameters) for every (start_token, end_token) pair on the first host of the endpoint list.
ssh 10.0.32.7x -- nodetool repair [-pr] -st <start_token> -et <end_token> markb

I believe that a (start_token, end_token) pair must be passed to the nodetool utility as-is from the storage_service/range_to_endpoint_map API call (both treat such an interval as exclusive-inclusive, so I don’t need any ±1 math here).
It’s not clear what to do with the token ranges returned by storage_service/range_to_endpoint_map without a start_token or end_token. Should I ignore these ranges? Should I run some special repair command on them?
It’s also not clear whether I need the -pr parameter here. It seems it should make no difference, because 10.0.32.70 holds the primary range and I run the command on that very host. I must (or could?) omit -pr when running this command on, say, 10.0.32.74, because it only holds a replica of this token range.
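To make the intent concrete, here is a rough sketch of the algorithm above as a script (hypothetical; it assumes jq is available, picks the first endpoint of each range as the repair master, and simply skips the wrap-around ranges with an empty token until I understand how to handle them):

KS=markb
API=10.0.32.70:10000
# one line per range: "<start_token> <end_token> <first endpoint>"
curl -s "http://$API/storage_service/range_to_endpoint_map/$KS" |
  jq -r '.[] | select(.key[0] != "" and .key[1] != "") | "\(.key[0]) \(.key[1]) \(.value[0])"' |
  while read -r START END HOST; do
      # repair this single range on the first replica of the range
      # (-n so ssh doesn't swallow the loop's stdin)
      ssh -n "$HOST" -- nodetool repair -st "$START" -et "$END" "$KS"
  done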

Please correct me if this is not a correct algorithm for repairing a keyspace by token ranges.

It seems that ScyllaDB’s implementation of nodetool repair with -st / -et is not compatible with Cassandra’s. ScyllaDB doesn’t validate the parameters at all.
Below are a couple of examples which finish with an error on, say, Cassandra 4.1.9.

  1. The host doesn’t have any replica of this token range:
# ssh 10.202.110.142 -- nodetool repair -st -3389892500827521034 -et -3359020880985253656 markb

[2025-07-10 16:38:15,028] Starting repair command #3 (46aff4a0-5dac-11f0-ac52-7d59c9ea8df9), repairing keyspace markb with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 1, pull repair: false, force repair: false, optimise streams: false, ignore unreplicated keyspaces: false, repairPaxos: true, paxosOnly: false)
[2025-07-10 16:38:15,028] Repair command #3 failed with error Nothing to repair for (-3389892500827521034,-3359020880985253656] in markb - aborting
[2025-07-10 16:38:15,029] Repair command #3 finished with error
error: Repair job has failed with the error message: Repair command #3 failed with error Nothing to repair for (-3389892500827521034,-3359020880985253656] in markb - aborting. Check the logs on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: Repair command #3 failed with error Nothing to repair for (-3389892500827521034,-3359020880985253656] in markb - aborting. Check the logs on the repair participants for further details
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
        at java.base/java.lang.Thread.run(Thread.java:829)
  2. Wrong token range boundaries:
# ssh 10.202.110.144 -- nodetool repair -st -3359020880985253656 -et -3312316956446460988 markb

[2025-07-10 16:42:34,644] Starting repair command #5 (e16e84c0-5dac-11f0-9606-23fd3a216128), repairing keyspace markb with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 1, pull repair: false, force repair: false, optimise streams: false, ignore unreplicated keyspaces: false, repairPaxos: true, paxosOnly: false)
[2025-07-10 16:42:34,645] Repair command #5 failed with error Requested range (-3359020880985253656,-3312316956446460988] intersects a local range ((-3359020880985253656,-3335108523467758869]) but is not fully contained in one; this would lead to imprecise repair. keyspace: markb
[2025-07-10 16:42:34,646] Repair command #5 finished with error
error: Repair job has failed with the error message: Repair command #5 failed with error Requested range (-3359020880985253656,-3312316956446460988] intersects a local range ((-3359020880985253656,-3335108523467758869]) but is not fully contained in one; this would lead to imprecise repair. keyspace: markb. Check the logs on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: Repair command #5 failed with error Requested range (-3359020880985253656,-3312316956446460988] intersects a local range ((-3359020880985253656,-3335108523467758869]) but is not fully contained in one; this would lead to imprecise repair. keyspace: markb. Check the logs on the repair participants for further details
        at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:137)
        at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:633)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:555)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:474)
        at java.management/com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor.lambda$execute$0(ClientNotifForwarder.java:108)
        at java.base/java.lang.Thread.run(Thread.java:829)

ScyllaDB returns “SUCCESSFUL” in both cases, which is incorrect in my opinion.

P.S.: The same behavior is on ScyllaDB 6.2.3 Community…

As explained in a different venue, the SUCCESSFUL message indicates that the non-replica node acts only as a repair coordinator. In fact, if you shut down one of the actual replicas involved in the repair, its status should change to FAILED. As for ranges missing a <start_token> or <end_token>, this indicates the range wraps around in /range_to_endpoint_map.
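If you want to address that wrap-around interval explicitly, one possible approach (just a sketch; I have not verified that repair_async accepts the partitioner’s numeric bounds spelled out like this, and the handling of the very first token of the ring is an assumption) is to split it at the Murmur3 token limits:

# hypothetical: repair the wrap-around range as two non-wrapping ranges,
# splitting at the Murmur3 bounds (-2^63 .. 2^63-1)
curl -X POST '10.0.32.70:10000/storage_service/repair_async/markb?ranges=9218948411409126567%3A9223372036854775807,-9223372036854775808%3A-9213378514646091305'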

That said, it is arguable whether the behavior you observe is incorrect or not, as repair does take place and thus fulfills its goal, though not in the most efficient way. Forcing it to only accept a replica as coordinator could be an option, warning the user about it another, and simply adding a boolean flag yet another.

Note that both /range_to_endpoint_map/{keyspace} and (my favorite) /describe_ring/{keyspace} include the endpoints owning a particular range. That said, incorrectly triggering a repair on a non-replica is unlikely to happen in practice unless manually forced by the user, as your example demonstrates.
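For example, something along these lines (a sketch; I’m assuming the describe_ring JSON exposes start_token, end_token and endpoints fields, so check the actual output on your version) lists only the ranges for which a given node is a replica:

# hypothetical field names; prints "start:end" for every range replicated on 10.0.32.74
curl -s 10.0.32.70:10000/storage_service/describe_ring/markb |
  jq -r --arg host 10.0.32.74 '.[] | select(.endpoints | index($host)) | "\(.start_token):\(.end_token)"'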

Please file a GitHub issue and explain your use case should our APIs be missing anything.

It becomes more interesting :slight_smile:
Did I get it right that Scylla’s nodetool repair implementation can run on a non-replica node to repair a full token range? That would mean it’s possible to run it on a single node to repair the whole cluster. Is that correct?

Well, that’s cool, but it should be clearly documented somewhere.
We all know it’s impossible to do such things in Cassandra or DataStax. Even the ScyllaDB docs say that we must run nodetool repair on all cluster hosts!

I’ll definitely open an issue on this if I have time: either a request to document this incompatible behavior (for example, to clearly state that a successful run on a non-replica node really repairs the corresponding range rather than silently swallowing the call and doing nothing useful, as one could expect), or to forbid such calls…

When you specify a token range to repair, yes. This alone is a very particular use case, which already requires retrieving the token-to-replica mapping. So the subsequent repair will often get executed on a replica node anyway.

You may check the scylla-server journalctl logs as you invoke it on a non-replica node and compare them against a run on a replica node. In the former case, all replicas plus the non-replica node are involved in the repair; in the latter, only the actual replicas, which is why the latter is more efficient.
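For instance, something like the following on each node while the repair is in flight (the exact log wording differs between versions, so treat the grep pattern as a rough filter rather than anything official):

# follow repair-related messages as they are emitted
journalctl -u scylla-server -f | grep -i repair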

I don’t remember the logic around plain nodetool repair (-pr isn’t subject to this, as it already assumes the primary ranges of the invoked replica), but if memory serves me well, ScyllaDB distributes the tasks accordingly across the natural endpoints. Best to check the logs on a relatively sparse table to confirm.