I am currently facing an issue while attempting to search the same string in two different columns using the IN operator in ScyllaDB. My aim is to achieve a query akin to column1 = 'c1' OR column2 = 'c1' in other databases. However, as I’ve learned, the OR operator isn’t supported in ScyllaDB.
I came across some discussions suggesting the use of the IN operator to circumvent this limitation. Following the guidance, I tried executing the following query:
WHERE (column1,column2) IN (('c1'),('c1'))
Unfortunately, I encountered an error stating:
Multi-column relations can only be applied to clustering columns but was applied to: column1
I have been trying to resolve this issue through various resources including the
ScyllaDB-Users slack, GitHub, and other online platforms, but haven’t found a clear solution yet. The responses on the community forum have been quite slow and I am in urgent need of resolving this.
I would immensely appreciate any guidance or alternative solutions to perform such a query in ScyllaDB. Has anyone faced a similar issue or have any insights on how to correctly use the IN operator for this scenario? Your help would expedite my project progress significantly.
As you realized from the error, you can’t do multi-column relations without specifying clustering columns in their order. If we allowed that, it would be very expensive to scan through all non-key components. Doing a multi-column relation on a composite key, however, is another story, and can be circumvented by using two AND clauses.
However, you are neither restricting your selection to a partition, nor a clustering column in your example, which makes me wonder what exactly you are trying to accomplish.
If your example query is realistic and you are doing a full table scan, then why don’t you use the token() function instead? Even better, combine it with a query engine which will allow you to run your OR clause you want. Spark and Presto can easily accomplish this.
If a full table scan is not what you want, then why don’t you simply issue two queries ? c1 || c2 can potentially retrieve: c1, c2 and (c1, c2). All you have to do is to ignore the latter if duplicates are a concern.
In summary, you can’t accomplish what you want with an IN clause like this.