Running CQL query on different nodes in the cluster gives different results

I have a cluster of 3 nodes in a single DC. I followed the instructions given in https://docs.scylladb.com/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc/ to add 3 new nodes in a new DC. After the nodes in the new DC started, I checked nodetool status and confirmed that all of them are up and running. Since all the nodes are part of the same cluster, I assume the query results should be the same regardless of which node I run the CQL query on, right?
But I see different data when the query is run on different nodes. In fact, the results differ even between nodes of the same DC! The following differences are observed (this is not a complete list):

  1. On N1, there are no records for some of the primary keys, but on N2, there are a few records for those same keys.
  2. On N1, the count of records shows a different value than on N2.
  3. This difference is observed among different nodes in the same DC as well as cross-DC.

My keyspace previously used SimpleStrategy with a replication factor of 2. While adding the new DC, as part of the steps described in the documentation, I modified it to use NetworkTopologyStrategy with a replication factor of 2 in each DC ('existing-dc' and 'new-dc' stand in for my actual DC names):

ALTER KEYSPACE ks WITH replication = { 'class' : 'NetworkTopologyStrategy', 'existing-dc' : 2, 'new-dc' : 2};

Why is there this difference? What am I missing? Here are the sample table and index definitions:

CREATE TABLE ks.cf (
    hourofyear int,
    operationtime bigint,
    action text,
    entityid text,
    entitytype text,
    operatorid text,
    PRIMARY KEY (hourofyear, operationtime)
) WITH CLUSTERING ORDER BY (operationtime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX auditid ON ks.cf (entityid);
CREATE INDEX agentid ON ks.cf (operatorid);
CREATE INDEX auditaction ON ks.cf (action);

A sample query:

select count(*) from ks.cf where hourofyear = 4444;

This query gives different results on different runs (even within the same minute). Sometimes it shows the same result on all nodes. No data is being written to this keyspace anymore, so why do I see this difference across runs?

cqlsh> select count(*) from ks.cf where hourofyear = 4446;

 count
-------
  1072

cqlsh> select count(*) from ks.cf where hourofyear = 4446;

 count
-------
  1545

The two runs were just a few seconds apart. Why is there a difference? Can someone shed some light on this, please?

*The question was asked on Stack Overflow by Shobhana Sriram

You didn’t mention when you last ran nodetool repair (or used Scylla Manager to run repair) on this cluster.

ScyllaDB (like Cassandra) uses eventual consistency, which means a write request is acknowledged as soon as the consistency level (CL) of that request is satisfied.

If you used CL=ONE for your writes, then only one replica needs to ACK for the application to consider the write successful. Replication to the 2nd replica happens asynchronously (and can also fail for various reasons).
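For example, here is a minimal cqlsh sketch (the table and columns come from the schema above; the values are made up) showing how the session-level CONSISTENCY command controls how many replicas must ACK a write before it returns:

-- cqlsh-only command: sets the CL for this session (the default is ONE,
-- meaning a single replica ACK is enough)
CONSISTENCY QUORUM
-- with RF=2 in each of the 2 DCs there are 4 replicas in total,
-- so QUORUM now waits for 3 ACKs before returning success
INSERT INTO ks.cf (hourofyear, operationtime, action)
VALUES (4446, 1600000000000, 'sample');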

Here comes the anti-entropy mechanism, which you can read more about here: https://docs.scylladb.com/architecture/anti-entropy/

You must make sure a cluster-wide repair completes within every table's gc_grace_seconds window (10 days by default); otherwise tombstones can expire before being propagated to all replicas, and deleted data may resurface.

You should also have fully repaired your cluster before adding the 2nd DC, or at least do it now that the 2nd DC has been added. This is also covered in our docs.
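As a sketch (Scylla Manager is the recommended way to schedule repairs, but plain nodetool works too), a full cluster-wide repair means running the following on every node, one node at a time:

# Run on each node in turn. -pr repairs only the token ranges this
# node owns as a primary replica, so looping over all nodes covers
# the whole cluster exactly once; naming the keyspace (ks) limits
# the repair's scope to it.
nodetool repair -pr ks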

On top of all the above, you are running your CQL queries with the default CL=ONE (unless you explicitly changed it). This means any single replica in the cluster (from either of the 2 DCs) can answer a given read request, and since, as explained above, the replicas are most likely inconsistent right now, different replicas return different answers.
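For example, raising the read CL in cqlsh forces the coordinator to query several replicas and reconcile their answers before responding, which makes the results far more stable (a sketch reusing the query from the question):

cqlsh> CONSISTENCY QUORUM
Consistency level set to QUORUM.
cqlsh> select count(*) from ks.cf where hourofyear = 4446;

With RF=2 in each of the 2 DCs, QUORUM waits for 3 of the 4 replicas; LOCAL_QUORUM (both replicas in the local DC) avoids the cross-DC round trip if that latency matters.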

Read more about Architecture → Ring Architecture and consistency levels here:
https://docs.scylladb.com/architecture/console-CL-full-demo/
https://docs.scylladb.com/architecture/ringarchitecture/

I highly recommend visiting Scylla University to learn more about everything I covered here, and much more: https://university.scylladb.com/courses/scylla-essentials-overview/lessons/architecture/

*The answer was provided on Stack Overflow by TomerSan