Using Elasticsearch for full-text search on a specific column with ScyllaDB - performance

Guy · February 16, 2025, 4:41am

Originally from the User Slack

@Ahmed: We’ve set up Elasticsearch for full-text search on a specific column. The challenge is retrieving full records from ScyllaDB in bulk. After finding results in Elasticsearch, querying ScyllaDB with multiple keys using the IN operator requires ALLOW FILTERING and doesn’t use the secondary index. Searching one by one is too slow. Is there a better way to fetch bulk data efficiently from ScyllaDB based on Elasticsearch results?

@avi: You should store the primary key in Elasticsearch along with the column you’re indexing
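
A minimal sketch of what this suggestion implies on the read path, assuming each Elasticsearch document carries the ScyllaDB primary key in its `_source`. The field names `id` and `body`, and the helper `keys_from_hits`, are hypothetical and used only for illustration; only the `hits.hits[]._source` layout is the standard search response shape.

```python
def keys_from_hits(response, key_fields=("id",)):
    """Return one primary-key tuple per hit in an Elasticsearch search response.

    key_fields names the _source fields that make up the ScyllaDB primary key
    (hypothetical names; adjust to the real schema).
    """
    return [
        tuple(hit["_source"][field] for field in key_fields)
        for hit in response["hits"]["hits"]
    ]


# Hand-written sample in the shape of a search response, not real output.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "_source": {"id": 42, "body": "full text search"}},
            {"_id": "b", "_source": {"id": 7, "body": "bulk fetch"}},
        ]
    }
}

keys = keys_from_hits(sample_response)
```

Each resulting tuple can then be bound to a prepared single-partition `SELECT`, so no secondary index or `ALLOW FILTERING` is involved.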

@Ahmed: Even with the primary key stored in Elasticsearch, retrieving thousands of records from ScyllaDB is still challenging due to the need for multiple queries.

@avi: If you fire them off in parallel (not using IN) the entire cluster bandwidth can be utilized.

@Ahmed: That’s what I’m doing right now: using multiple threads to execute a prepared statement. It works, but I was looking for a proper solution.

@avi: Launching thousands of threads will be slow, better to use async

@Ahmed: I’m using a thread pool with a maximum of about 10 threads.

@avi: That’s not enough for thousands of keys. Better to use async.

@Ahmed: Thanks, it works great.
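
The async approach recommended in this thread can be sketched roughly as follows. This is not the code Ahmed ended up with, only an illustration of issuing one single-key query per Elasticsearch hit while capping how many are in flight at once. `fetch_all`, `fake_fetch_one`, and the `max_in_flight` value are hypothetical; `fake_fetch_one` stands in for a real single-partition query (for example, a prepared `SELECT ... WHERE id = ?` issued through the driver's asynchronous execution API).

```python
import asyncio


async def fetch_all(keys, fetch_one, max_in_flight=256):
    """Run fetch_one(key) for every key concurrently.

    At most max_in_flight requests are outstanding at any time, so thousands
    of keys do not need thousands of threads. Results keep the order of keys.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded_fetch(key):
        async with semaphore:
            return await fetch_one(key)

    # asyncio.gather returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded_fetch(key) for key in keys))


async def fake_fetch_one(key):
    """Stand-in for a real single-partition ScyllaDB query (assumption)."""
    await asyncio.sleep(0.001)  # simulate network latency
    return {"id": key, "body": f"row-{key}"}


rows = asyncio.run(fetch_all(range(1000), fake_fetch_one))
```

The concurrency cap is a tuning knob rather than a recommended value: too low leaves the cluster idle, too high can overload the client or trigger timeouts.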
