Originally from the User Slack
@Ahmed: We’ve set up Elasticsearch for full-text search on a specific column. The challenge is retrieving full records from ScyllaDB in bulk. After finding results in Elasticsearch, querying ScyllaDB with multiple keys using the IN operator requires ALLOW FILTERING and doesn’t use the secondary index. Searching one by one is too slow. Is there a better way to fetch bulk data efficiently from ScyllaDB based on Elasticsearch results?
@avi: You should store the primary key in Elasticsearch along with the column you’re indexing.
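A minimal sketch of what this buys you, assuming each Elasticsearch document carries the ScyllaDB primary key in its `_source` next to the indexed text. The field names `id` and `description` are hypothetical. Each search hit then maps directly to a key you can read from ScyllaDB without touching a secondary index.

```python
from typing import Any

# Hypothetical schema: ScyllaDB primary key column "id", searchable text "description".
# For a composite key, list every key column, e.g. ("tenant_id", "id").
PK_FIELDS = ("id",)

def primary_keys_from_hits(response: dict[str, Any]) -> list[tuple[Any, ...]]:
    """Return one primary-key tuple per Elasticsearch hit, in ranking order."""
    keys = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        keys.append(tuple(source[field] for field in PK_FIELDS))
    return keys

# Trimmed shape of an Elasticsearch search response; each document was indexed
# with both the searchable column and the ScyllaDB primary key.
response = {"hits": {"hits": [
    {"_id": "a", "_source": {"id": 42, "description": "red running shoes"}},
    {"_id": "b", "_source": {"id": 7, "description": "red rain boots"}},
]}}

print(primary_keys_from_hits(response))  # [(42,), (7,)]
```

Using source filtering in the search request to return only the key fields keeps the response small when you fetch thousands of hits.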
@Ahmed: Even with the primary key stored in Elasticsearch, retrieving thousands of records from ScyllaDB is still challenging due to the need for multiple queries.
@avi: If you fire them off in parallel (not using IN), the entire cluster bandwidth can be utilized.
@Ahmed: That’s what I’m doing right now: using multiple threads to execute a prepared statement. It works, but I was looking for a proper solution.
@avi: Launching thousands of threads will be slow; it’s better to use async.
@Ahmed: I’m using a thread pool with about 10 threads max.
@avi: That’s not enough for thousands of keys. Better to use async.
@Ahmed: Thanks, it works great.
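A minimal sketch of the async approach described in this thread, assuming Python’s asyncio: every key gets its own single-partition query (no IN), and a semaphore caps how many are in flight at once, so thousands of keys don’t need thousands of threads. `fetch_row`, the `ks.items` table, and `MAX_IN_FLIGHT` are hypothetical; `fetch_row` only simulates latency in place of a real driver call.

```python
import asyncio
from typing import Any

MAX_IN_FLIGHT = 256  # assumption: tune to cluster size and query latency

async def fetch_row(key: int) -> dict[str, Any]:
    """Hypothetical stand-in for one async single-partition read, e.g. a
    prepared SELECT * FROM ks.items WHERE id = ? issued through the driver."""
    await asyncio.sleep(0.01)  # simulate one network round-trip
    return {"id": key}

async def fetch_all(keys: list[int], limit: int = MAX_IN_FLIGHT) -> list[dict[str, Any]]:
    sem = asyncio.Semaphore(limit)

    async def bounded(key: int) -> dict[str, Any]:
        async with sem:  # at most `limit` queries in flight at any moment
            return await fetch_row(key)

    # gather preserves input order, so rows line up with the Elasticsearch ranking
    return list(await asyncio.gather(*(bounded(k) for k in keys)))

if __name__ == "__main__":
    rows = asyncio.run(fetch_all(list(range(2000))))
    print(len(rows))  # 2000
```

With the Python driver itself, `session.execute_async()` or `cassandra.concurrent.execute_concurrent_with_args(session, prepared, params, concurrency=...)` give the same bounded-concurrency pattern without asyncio. Prepared statements also let a token-aware driver route each read straight to a replica that owns the key.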