In our use case, we’d like to fetch data from ScyllaDB and load it into Elasticsearch. Fetching records one by one takes too much time.
I couldn’t find an equivalent of a binlog for ScyllaDB.
What’s the right way to do this?
And if you want to read everything on top of the live additions you get through CDC, you can write a simple Scala Spark application that loads everything needing full-text search from Scylla into Elasticsearch. Sample apps are available online, or have a look at the series of blog posts around the Scylla Migrator, which explain how to properly leverage DataFrames.
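A minimal sketch of such a bulk load, assuming the spark-cassandra-connector and elasticsearch-spark libraries are on the classpath; the host names, keyspace, table, columns, and index name are all hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object ScyllaToElastic {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scylla-to-elastic")
      .config("spark.cassandra.connection.host", "scylla-node1") // hypothetical Scylla contact point
      .config("es.nodes", "elastic-node1")                       // hypothetical Elasticsearch node
      .getOrCreate()

    // Read the whole table as a DataFrame; the connector splits the scan
    // by token range, so it is parallelized across executors.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table")) // hypothetical names
      .load()

    // Ship only the columns that actually need full-text search.
    df.select("id", "title", "body") // hypothetical columns
      .saveToEs("my_index")          // hypothetical Elasticsearch index

    spark.stop()
  }
}
```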
FWIW, Scylla supports the LIKE operator, in case a simple search will cut it for you (assuming your partitions are not huge), as an alternative to the Lucene query language and inverted indexes that Elasticsearch uses.
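For illustration, a small sketch of such a LIKE query issued from Scala via the DataStax Java driver; the keyspace, table, and column names are hypothetical, and note that LIKE with ALLOW FILTERING scans the rows it filters, so it is only cheap when the data being filtered is small:

```scala
import com.datastax.oss.driver.api.core.CqlSession
import scala.jdk.CollectionConverters._

object LikeQuery {
  def main(args: Array[String]): Unit = {
    // Connects to localhost:9042 by default; configure contact points as needed.
    val session = CqlSession.builder().build()

    // Server-side substring match; ALLOW FILTERING is required here.
    val rows = session.execute(
      "SELECT id, title FROM my_keyspace.my_table WHERE title LIKE '%search term%' ALLOW FILTERING"
    )
    rows.asScala.foreach(row => println(row.getString("title")))

    session.close()
  }
}
```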
Some useful links:
Not sure how useful this will be:
*The answer was provided on Stack Overflow by Lubos.*