Django + ScyllaDB at chat scale — is raw scylla-driver + prepared-statement registry the canonical pattern, or any thin abstraction the community recommends?

Installation details
ScyllaDB version: planning ScyllaDB Cloud (latest GA); greenfield, not yet deployed
Cluster size: planned 3 nodes (32 vCPU each), RF=3, single region
OS: ScyllaDB Cloud (managed); Django app runs on Ubuntu 22.04


Hi! We are building a real-time chat on Django — this is our first ScyllaDB workload.

Scale target: ~30M messages/day, ~200k concurrent WebSocket users, ~1M DAU.

What we are doing today:

  • Using scylla-driver directly (shard-aware fork, NOT cassandra-driver)
  • One module-level registry of ~25 CQL strings (all with ? placeholders), prepared once at startup
  • TokenAwarePolicy(DCAwareRoundRobinPolicy) + ConsistencyLevel.LOCAL_QUORUM + dict_factory
  • Four small caller helpers: one(name, params), rows(name, params), all_pages(name, params), run(name, params) — nobody writes inline CQL
  • Bulk writes from Kafka consumers: group rows by full partition key (room_id, bucket=YYYYMM), one UNLOGGED BatchStatement per partition, fired with execute_concurrent(concurrency=128)
  • Schema is query-driven: messages clustered msg_id DESC inside (room_id, bucket); two-table inbox = user_inbox UPDATE-in-place + skinny user_inbox_index order index
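
To make the grouping step in the bulk-write bullet concrete, here is a minimal sketch of how rows might be grouped by full partition key before batching. The row shape (dicts with `room_id` and a timezone-aware `created_at`) and the helper names `bucket_for` and `group_by_partition` are assumptions for illustration, not our actual code; the bucket is computed in UTC so a message near a month boundary always lands in the same partition regardless of server timezone.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_for(ts: datetime) -> int:
    """Return the monthly bucket as an integer YYYYMM, computed in UTC.

    Assumes `ts` is timezone-aware; a naive datetime would be interpreted
    as local time by astimezone(), which is usually not what you want.
    """
    ts = ts.astimezone(timezone.utc)
    return ts.year * 100 + ts.month


def group_by_partition(rows):
    """Group rows by the full partition key (room_id, bucket).

    Each resulting group targets exactly one partition, so it can be sent
    as a single unlogged batch without crossing partitions.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["room_id"], bucket_for(row["created_at"]))
        groups[key].append(row)
    return dict(groups)
```

Each value in the returned dict would then become one unlogged batch for that partition.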

What we deliberately did NOT use:

  • django-cassandra-engine / django-scylla — dynamic ORM CQL seems to defeat prepared statements + token-aware routing
  • cassandra.cqlengine Object Mapper — older versions miss shard-aware routing

My questions:

  1. Is this raw scylla-driver + prepared-statement registry the canonical pattern at chat scale, or is there a community-favored thin abstraction (typed wrappers / lightweight model layer) we should consider?
  2. Anything Django-specific we should add or avoid (connection lifecycle across WSGI/ASGI workers, settings handling, migration tooling)? Django ORM still owns our Postgres tables (users, wallet); Scylla is for chat only.
  3. For bulk writes — is concurrency=128 reasonable for a 3-node, 32-shard-per-node cluster, or should we tune by per-shard throughput?

Thanks for any guidance. We want to make sure we’re not missing a pattern that production Scylla shops use as standard.

Hi there,

Sounds like an interesting use case! I think your approach should be mostly fine. The important thing is that the queries in your app match the schema, and it seems you are on the right path.

Are you using any framework for orchestration (like LangGraph or similar)? We’re working on tighter integration with some of the popular AI tools, which might be helpful for you. Also, you haven’t mentioned vector search, but ScyllaDB has vector search support which might be relevant for you as well, docs here.

Thanks @Attila_Toth

No LangGraph / vector workload on our roadmap — durable record storage + a high-throughput write path is the whole reason we picked ScyllaDB.

Pressing on the three questions because they’re where I most want a Scylla SA opinion before we sign off on the design:

1) Can we use any ORM in Django with ScyllaDB, or is raw scylla-driver the only safe path at scale?
The Python options I’ve evaluated:

  • django-cassandra-engine — Django models → dynamic CQL. Concern: dynamic CQL defeats prepared statements + token-aware routing.
  • cassandra.cqlengine ObjectMapper — older versions missed shard-aware routing; not sure about the current state.
  • django-scylla — small community, looks abandoned.
  • aiocassandra / acsylla — async wrappers, not full ORMs.
  • Raw scylla-driver + a module-level prepared-statement registry — what we’re doing today.

What we have: one registry of ~25 CQL strings (all ? placeholders), prepared once at startup; four thin caller helpers (one / rows / all_pages / run); TokenAwarePolicy(DCAwareRoundRobinPolicy) + LOCAL_QUORUM + dict_factory. No ORM, no dynamic CQL anywhere.
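
For reference, the registry shape described above can be sketched roughly as follows. This is a simplified illustration, not our production class: the name `StatementRegistry`, the example statement name `recent_messages`, and the `body` column are hypothetical. It relies only on `session.prepare()` and `session.execute()`, which the driver's `Session` provides, and it is written against any object exposing those two methods so it can be exercised without a cluster.

```python
class StatementRegistry:
    """Prepare a fixed set of named CQL strings once, then execute by name."""

    def __init__(self, session, cql_by_name):
        self._session = session
        # Prepare everything up front so a typo in any statement fails at
        # startup rather than on the first request that needs it.
        self._prepared = {
            name: session.prepare(cql) for name, cql in cql_by_name.items()
        }

    def run(self, name, params=()):
        # An unknown name raises KeyError: callers cannot pass inline CQL.
        return self._session.execute(self._prepared[name], params)

    def one(self, name, params=()):
        # First row of the result, or None when the result is empty.
        return next(iter(self.run(name, params)), None)

    def all_pages(self, name, params=()):
        # Iterating a driver result set fetches further pages on demand,
        # so list() materialises every page; use with care on wide reads.
        return list(self.run(name, params))


# Hypothetical statement set; `body` is an illustrative column name.
CQL = {
    "recent_messages": (
        "SELECT msg_id, body FROM messages "
        "WHERE room_id = ? AND bucket = ? LIMIT ?"
    ),
}
```

Because routing depends on the prepared statement carrying its partition-key metadata, keeping every query behind a prepared handle like this is what lets the token-aware policy pick a replica.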

Question: do any of your large Django + ScyllaDB customers actually run an ORM layer in prod and still preserve prepared-statement + token-aware-routing properties? If so, which ORM and what version? Or is raw-driver-with-prepared-registry the de-facto pattern your SA team recommends, and we should stop looking at ORMs entirely?

2) Django connection lifecycle (Gunicorn WSGI + Daphne ASGI in the same app)
Web traffic is Gunicorn -w N -k gthread; WebSocket consumers are Daphne. Same Django project, same settings.SCYLLA_* env. Two specific asks:

  • Should the Cluster + Session be lazy-initialised post-fork (so preforked Gunicorn workers don’t share file descriptors), or is there a recommended Scylla-side pattern for sharing a driver across forked workers?
  • For ASGI consumers: the driver is sync; do you have customers running database_sync_to_async (Channels) wrappers around session.execute_async(), or do they typically push the Scylla work onto sync background workers (Kafka consumers / Celery) and keep ASGI handlers Scylla-free? We’re doing the latter, but want to confirm it’s the recommended boundary.
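
To show what we mean by lazy post-fork initialisation in the first bullet, here is a minimal sketch. The class name `PerProcess` is hypothetical, and `factory` stands in for whatever builds the Cluster and Session (for example a function that calls `Cluster(...).connect()`); nothing here is a Scylla-recommended API. The idea is simply to remember which process created the object and rebuild it when the PID changes, so a forked worker never reuses sockets opened by its parent.

```python
import os
import threading


class PerProcess:
    """Lazily build one value per OS process, rebuilding after a fork."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._pid = None
        self._value = None

    def get(self):
        pid = os.getpid()
        if self._pid != pid:
            with self._lock:
                # Re-check under the lock so concurrent threads in the
                # same worker only build the value once.
                if self._pid != pid:
                    self._value = self._factory()
                    self._pid = pid
        return self._value
```

In a Django project this could live in a module-level variable, with the helpers calling `get()` on every use; the PID check is cheap. One caveat of this sketch: if a fork happens while another thread holds the lock, the child inherits a locked lock, so it is safest when nothing touches the holder before the workers are forked.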

3) execute_concurrent concurrency target for a 3-node × 32-shards-per-node × RF=3 cluster
Bulk-write pattern: rows grouped by full partition key → one BatchStatement(UNLOGGED) per partition (never cross-partition) → fired with execute_concurrent(concurrency=128, raise_on_first_error=False).
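
Since we pass `raise_on_first_error=False`, `execute_concurrent` returns one `(success, result_or_exc)` pair per statement, in the order the statements were submitted, instead of raising. A minimal sketch of how the failures might be separated out for retry or dead-lettering follows; the helper name `split_results` is hypothetical, and it works on plain tuples so it does not need a cluster to try out.

```python
def split_results(statements, results):
    """Pair each submitted statement with its outcome.

    `results` is expected to have the shape execute_concurrent returns
    with raise_on_first_error=False: one (success, result_or_exc) pair
    per statement, in submission order.

    Returns (ok_count, failed), where failed is a list of
    (statement, exception) pairs that the caller can retry or log.
    """
    ok_count = 0
    failed = []
    for stmt, (success, result_or_exc) in zip(statements, results):
        if success:
            ok_count += 1
        else:
            failed.append((stmt, result_or_exc))
    return ok_count, failed
```

Without a step like this, a failed batch is silently dropped, because nothing raises.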

Is there a per-shard throughput rule-of-thumb your perf team uses to derive a starting concurrency= value, or is it purely empirical (client.requests.queue_size + p99 write latency observation)? Workload is well below cluster capacity at planned scale — we’re looking for a defensible starting number for the SA review, not raw throughput tuning.
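
One generic way to derive a starting number, independent of any Scylla-specific rule, is Little's law: requests in flight ≈ request rate × latency. The sketch below applies it; the function name `starting_concurrency`, the headroom factor, and any peak multiplier or latency you feed in are assumptions to be replaced with measured values, and this is not an official sizing formula.

```python
import math


def starting_concurrency(requests_per_sec, latency_sec, headroom=2.0):
    """Estimate in-flight requests via Little's law (L = rate * latency).

    `headroom` is an arbitrary safety multiplier, not a measured value.
    """
    return max(1, math.ceil(requests_per_sec * latency_sec * headroom))


# 30M messages/day spread evenly is roughly 347 messages per second.
# Real traffic is bursty, and per-partition batching sends fewer requests
# than messages, so treat this only as an order-of-magnitude anchor.
avg_messages_per_sec = 30_000_000 / 86_400
```

Even with a generous peak multiplier and a pessimistic latency, the estimate at this message rate stays far below 128 per process, which suggests that 128 acts as an upper bound on bursts rather than a throughput requirement.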

Happy to share our connection-class code if seeing the actual implementation helps you give a concrete answer. Schema is small enough to include if useful (single-partition reads + writes only, no materialized views, no LWT, no ALLOW FILTERING, no secondary indexes).