Originally from the User Slack
@Dharun: Hi, is it okay to run ScyllaDB with Spark on the same node, the way DataStax does (Cassandra + Spark)?
If so, will it impact performance?
@Felipe_Cardeneti_Mendes
@Felipe_Cardeneti_Mendes: tl;dr: it is possible, as long as you tune both services properly. The easiest thing to do is simply run them separately.
Long answer:
Just as with Cassandra, the main reason NOT to run both colocated on the same machine is to keep Spark from consuming database resources. With ScyllaDB, you want to ensure that your Spark workers are pinned to CPUs the database is not using, to avoid performance problems.
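For example, a minimal sketch of the database side, assuming a 16-core node where cores 0-7 are reserved for ScyllaDB (the core ranges here are illustrative, not from the thread):

    # Pin ScyllaDB to cores 0-7; the script persists the setting
    # (typically under /etc/scylla.d/cpuset.conf)
    sudo scylla_cpuset_setup --cpuset 0-7
    sudo systemctl restart scylla-server

That leaves the remaining cores free for the Spark workers.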
@Dharun: Is there any documentation for tuning?
We have a DSE setup and are considering a similar setup with open source Spark and ScyllaDB.
@Felipe_Cardeneti_Mendes: https://opensource.docs.scylladb.com/stable/getting-started/scylla-in-a-shared-environment.html
This covers the ScyllaDB-specific tuning. Check the Spark documentation for how to pin its workers to specific cores.
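For instance, under the same assumed split (cores 8-15 left for Spark), one way to pin a standalone worker is plain Linux taskset; the master URL is a placeholder:

    # Launch the worker with CPU affinity restricted to cores 8-15;
    # the executor JVMs it forks inherit that affinity
    taskset -c 8-15 $SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077

(start-worker.sh is the Spark 3.x name; older releases ship it as start-slave.sh.)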
@Dharun: @Felipe_Cardeneti_Mendes
I see the following mentioned:
"On Red Hat / CentOS, open a terminal and edit /etc/sysconfig/scylla-server, and add --memory 2G to restrict Scylla to 2 gigabytes of RAM."
But where in /etc/sysconfig/scylla-server do I set the args? Under SCYLLA_ARGS?
@Felipe_Cardeneti_Mendes: Yes, append it to the SCYLLA_ARGS line. Or you can run scylla_memory_setup instead.
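For illustration, a sketch of both options on a Red Hat-style install; the pre-existing flags shown in SCYLLA_ARGS below are only an example, so check what your file actually contains before editing:

    # Option 1: append --memory to the SCYLLA_ARGS line
    # in /etc/sysconfig/scylla-server
    SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --memory 2G"

    # Option 2: let the helper script persist the limit for you
    # (it writes a config fragment under /etc/scylla.d/)
    sudo scylla_memory_setup --memory 2G

    # Either way, restart the service for the change to take effect
    sudo systemctl restart scylla-server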