We are a big data company writing over 300K to our cluster every second.
When we run a select query on the logs table with “consistency 1”, we get empty output. However, when we change the consistency to “all”, we receive the data.
What could be the reason for this?
QUERY: select * from (table name)
Furthermore, this usually happens when we select one of the logs from the logs table, process it, delete it, and then select another one.
When using “consistency 1”, data will be served by a single replica. If this replica happens to be missing the data you want to query, you will get an empty result. When using “consistency all”, all replicas participate in the read, and if at least one of them has the data, you will get it in the response. Furthermore, ScyllaDB will detect that the replicas returned different data, and this will trigger a “read repair”, which makes sure all replicas hold the same data. After such a query, subsequent “consistency 1” requests should also return the data reliably.
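To make the difference concrete, here is a toy model in Python of the behaviour described above. It is only a sketch, not how ScyllaDB is implemented: each replica is a plain dict, the `read` function and the `log-1` key are made up for illustration, a “consistency 1” read always asks the first replica, and timestamps and tombstones are ignored.

```python
# Toy model only: three replicas as dicts, the first one missed the write.
# All names here are hypothetical and exist just for this illustration.

def read(replicas, key, consistency):
    """Read key from one replica ("one") or from every replica ("all")."""
    if consistency == "one":
        # Assumption: the first replica is the one that gets asked.
        # In a real cluster the coordinator picks the replica.
        return replicas[0].get(key)
    if consistency == "all":
        values = [r[key] for r in replicas if key in r]
        if not values:
            return None
        value = values[0]
        # Simplified read repair: replicas that missed the value receive it.
        for r in replicas:
            r.setdefault(key, value)
        return value
    raise ValueError(f"unsupported consistency level: {consistency}")


replicas = [{}, {"log-1": "payload"}, {"log-1": "payload"}]

before_repair = read(replicas, "log-1", "one")   # replica 0 has nothing yet
read_at_all = read(replicas, "log-1", "all")     # found, and replica 0 is repaired
after_repair = read(replicas, "log-1", "one")    # replica 0 now has the data
```

In this model the first “consistency 1” read comes back empty, the “consistency all” read finds the row and repairs the lagging replica, and the second “consistency 1” read succeeds.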
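I recommend running repair on the cluster periodically. Furthermore, if you want to reliably get back the data you wrote, both your reads and your writes should use at least “consistency quorum”. With quorum on both sides, the replicas that acknowledged a write and the replicas that answer a read always have at least one node in common. With anything less, your queries will be prone to discrepancies like the one you describe above.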
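The quorum recommendation follows from simple arithmetic: a read is guaranteed to see an acknowledged write when the number of replicas required for the write plus the number required for the read exceeds the replication factor. A minimal sketch of that check is below. The helper names are hypothetical, and the replication factor of 3 is an assumption, since the question does not state the keyspace's actual setting.

```python
# Sketch of the overlap rule: read replicas + write replicas > replication factor.
# Helper names are made up for this example; RF = 3 is an assumed value.

def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    required = {
        "one": 1,
        "two": 2,
        "three": 3,
        "quorum": replication_factor // 2 + 1,
        "all": replication_factor,
    }
    return required[level]


def reads_see_writes(write_level, read_level, replication_factor):
    """True if every read set must overlap every acknowledged write set."""
    writes = replicas_required(write_level, replication_factor)
    reads = replicas_required(read_level, replication_factor)
    return writes + reads > replication_factor


RF = 3  # assumption, not taken from the question

one_one = reads_see_writes("one", "one", RF)              # 1 + 1 is not > 3
quorum_quorum = reads_see_writes("quorum", "quorum", RF)  # 2 + 2 > 3
one_all = reads_see_writes("one", "all", RF)              # 1 + 3 > 3
```

Under these assumptions, writing and reading at “consistency 1” gives no overlap guarantee, while quorum on both sides does. Writing at 1 and reading at “all” also overlaps, which matches what you observed, but it makes every read depend on all replicas being available.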