Originally from the User Slack
@Mark_Barinstein: Hello.
I’m trying to use the repair API with token ranges and have a number of questions.
- Empty [start | end]Token:
Consider the following result:
curl 10.0.32.70:10000/storage_service/range_to_endpoint_map/markb | jq -r '.[] | select (.key[0] == "" or .key[1] == "")'
{
"key": [
"9218948411409126567",
""
],
"value": [
"10.0.32.70",
"10.0.32.74",
"10.0.32.77",
"10.0.32.78",
"10.0.32.75",
"10.0.32.71"
]
}
{
"key": [
"",
"-9213378514646091305"
],
"value": [
"10.0.32.70",
"10.0.32.74",
"10.0.32.77",
"10.0.32.78",
"10.0.32.75",
"10.0.32.71"
]
}
The following command returns {"message": "boost::wrapexcept<boost::bad_lexical_cast> (bad lexical cast: source type value could not be interpreted as target)", "code": 500}
curl -X POST '10.0.32.70:10000/storage_service/repair_async/markb?ranges=9218948411409126567%3A'
Q1.1: Is it possible to repair this token range using the ranges parameter?
Q1.2: If A1.1 is “no”, then what’s the proper command to repair such a token range (if I need to repair it at all)?
- Consider the following result:
curl 10.0.32.70:10000/storage_service/range_to_endpoint_map/markb | jq -r '.[] | select (.key[0] == "-1569604604390479271")'
{
"key": [
"-1569604604390479271",
"-1519586698886497919"
],
"value": [
"10.0.32.70",
"10.0.32.75",
"10.0.32.72",
"10.0.32.78",
"10.0.32.76",
"10.0.32.74"
]
}
Notice that 10.0.32.71 is not in the endpoint list of this token range.
curl -X POST 'http://10.0.32.71:10000/storage_service/repair_async/markb?primaryRange=true&ranges=-1569604604390479271%3A-1519586698886497919'
Yet the command above “successfully” repairs this token range, according to the result of the following status command (where X is the ID returned by the command above).
The result is the same if I omit the primaryRange parameter.
curl 'http://10.0.32.71:10000/storage_service/repair_status/?id=X'
Q2: How is it possible to successfully run a repair of a token range on a host that is not present in the endpoint list of this token range?
Is it some “feature”?
Thanks in advance.
FYI:
nodetool repair [-pr] -st -1569604604390479271 -et -1519586698886497919 markb
The command above behaves the same: it always returns success regardless of the host I run it on.
Moreover, I even get success if I specify arbitrary start and end tokens, like:
nodetool repair -st 0 -et -1 markb
How could one trust such a result?
@Felipe_Cardeneti_Mendes: Well, your first repair_async command doesn’t contain an end token, so it fails. The empty boundaries in the range_to_endpoint_map output mean that the range wraps around the ring, so you should pass the actual start and end tokens instead.
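To make the wrap-around point concrete, here is a minimal Python sketch. It assumes that the two open-ended entries returned by range_to_endpoint_map are the two halves of one wrap-around range, so the start token of the entry with an empty end and the end token of the entry with an empty start can be joined into a single ranges value. `merge_open_ranges` is a hypothetical helper written for this illustration, not part of any ScyllaDB tooling.

```python
# Hypothetical helper (an assumption for illustration, not a ScyllaDB API):
# join the two open-ended entries of range_to_endpoint_map into the single
# wrap-around range they are assumed to describe.
def merge_open_ranges(entries):
    start = end = None
    for entry in entries:
        lo, hi = entry["key"]
        if lo != "" and hi == "":
            start = lo  # this half runs from lo to the end of the ring
        elif lo == "" and hi != "":
            end = hi    # this half runs from the beginning of the ring to hi
    if start is None or end is None:
        raise ValueError("expected one entry with an empty start and one with an empty end")
    return start, end


# The two open-ended entries from the output above (endpoint lists shortened).
entries = [
    {"key": ["9218948411409126567", ""], "value": ["10.0.32.70", "10.0.32.74"]},
    {"key": ["", "-9213378514646091305"], "value": ["10.0.32.70", "10.0.32.74"]},
]

start, end = merge_open_ranges(entries)
print(f"ranges={start}:{end}")
```

The printed value is what would go into the ranges parameter of repair_async (with the colon URL-encoded as %3A if needed).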
Personally, I find /storage_service/describe_ring/{keyspace} more convenient. Consider:
{
"start_token": "9160570400458636044",
"end_token": "-9185903984411353280",
"endpoints": [
"172.31.13.187",
"172.31.3.5"
],
"rpc_endpoints": [
"172.31.13.187",
"172.31.3.5"
],
"endpoint_details": [
{
"host": "172.31.13.187",
"datacenter": "datacenter1",
"rack": "rack2"
},
{
"host": "172.31.3.5",
"datacenter": "datacenter1",
"rack": "rack3"
}
]
}
And the following list of operations:
root@ip-172-31-4-30:/var/lib/scylla# curl -X POST '127.0.0.1:10000/storage_service/repair_async/system_traces?ranges=9160570400458636044:-9185903984411353280'
6
root@ip-172-31-4-30:/var/lib/scylla# curl 127.0.0.1:10000/storage_service/repair_async/system_auth?id=6
"SUCCESSFUL"
172.31.4.30 isn’t a replica, yet it received the request to repair a range owned by 2 other replicas. In that sense, it acts as the repair coordinator, and you’ll find the result of this coordination in its logs.
That said, you may call it a “feature”, but as you can imagine, it is a bit pointless to start a repair task on a non-replica coordinator, as you’ll transfer data around unnecessarily.
Overall, Scylla Manager streamlines all this logic, including for tablets, which have their own repair APIs.
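If you script this yourself, one way to avoid a non-replica coordinator is to check the range against describe_ring before sending the request. The sketch below is only an illustration under stated assumptions: `find_ring_range` and `repair_url` are hypothetical helper names, the ring data is the describe_ring entry shown earlier, and the code only builds the URL (it does not call the API). It also rejects ranges that do not match a ring entry exactly, which is a client-side sanity check and says nothing about how the server treats arbitrary ranges.

```python
from urllib.parse import urlencode


# Hypothetical helpers (assumptions for illustration, not ScyllaDB APIs).
def find_ring_range(ring, start, end):
    """Return the describe_ring entry whose boundaries match exactly."""
    for entry in ring:
        if entry["start_token"] == start and entry["end_token"] == end:
            return entry
    raise LookupError(f"{start}:{end} is not a range reported by describe_ring")


def repair_url(ring, keyspace, start, end, coordinator=None, port=10000):
    """Build a repair_async URL, refusing coordinators that are not replicas."""
    replicas = find_ring_range(ring, start, end)["endpoints"]
    if coordinator is None:
        coordinator = replicas[0]  # any replica would do; pick the first
    elif coordinator not in replicas:
        raise ValueError(f"{coordinator} is not a replica for {start}:{end}")
    query = urlencode({"ranges": f"{start}:{end}"})
    return f"http://{coordinator}:{port}/storage_service/repair_async/{keyspace}?{query}"


# describe_ring entry from the example above (only the fields used here).
ring = [
    {
        "start_token": "9160570400458636044",
        "end_token": "-9185903984411353280",
        "endpoints": ["172.31.13.187", "172.31.3.5"],
    }
]

print(repair_url(ring, "system_traces", "9160570400458636044", "-9185903984411353280"))
```

Passing `coordinator="172.31.4.30"` raises `ValueError` here, and a range such as 0:-1 raises `LookupError` because it is not an exact ring range.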
@Mark_Barinstein: Indeed, unlike storage_service/range_to_endpoint_map, the storage_service/describe_ring API result has no such “feature” with empty range boundaries:
$ curl -s ${host?}:10000/storage_service/describe_ring/${ks?} | jq -r ".[] | select ( .start_token == \"\" or .end_token == \"\" )"
$
$ curl -s ${host?}:10000/storage_service/describe_ring/${ks?} | jq -r '.[] | select ( (.start_token | tonumber) > (.end_token | tonumber) ) | {"start_token": .start_token, "endtoken": .end_token, "endpoints": .endpoints}'
{
"start_token": "9218948411409126567",
"endtoken": "-9213378514646091305",
"endpoints": [
"10.0.32.70",
"10.0.32.74",
"10.0.32.77",
"10.0.32.78",
"10.0.32.75",
"10.0.32.71"
]
}
$
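The same wrap-around check can be sketched in Python, which compares the tokens as exact integers. This may matter because jq's tonumber generally converts to double-precision floats, which cannot represent every 64-bit token exactly (the comparison above still works, since the two tokens differ in sign). `wrap_around_ranges` is a hypothetical helper, and the first sample entry is illustrative data taken from the range_to_endpoint_map output earlier in the thread.

```python
# Hypothetical helper (an assumption for illustration): return the
# describe_ring entries whose start token is greater than their end token,
# i.e. the ranges that wrap around the ring.
def wrap_around_ranges(ring):
    return [
        {
            "start_token": entry["start_token"],
            "end_token": entry["end_token"],
            "endpoints": entry["endpoints"],
        }
        for entry in ring
        if int(entry["start_token"]) > int(entry["end_token"])  # exact integer compare
    ]


# Sample data in describe_ring shape (endpoint lists shortened).
ring = [
    {
        "start_token": "-1569604604390479271",
        "end_token": "-1519586698886497919",
        "endpoints": ["10.0.32.70", "10.0.32.75"],
    },
    {
        "start_token": "9218948411409126567",
        "end_token": "-9213378514646091305",
        "endpoints": ["10.0.32.70", "10.0.32.74"],
    },
]

for entry in wrap_around_ranges(ring):
    print(entry["start_token"], entry["end_token"], entry["endpoints"])
```

With a live cluster, `ring` would be the parsed JSON body of the describe_ring response instead of the inline sample.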
Will use describe_ring instead. I started using range_to_endpoint_map because of its smaller result set, without the fields that are useless in my case.
Thanks!
BTW, there are more problems with the Scylla nodetool repair utility (and the similar API).
Its result is sometimes incompatible with Cassandra’s.
https://forum.scylladb.com/t/repair-by-token-ranges/4964