Need help with modelling the post/comment architecture to support complex queries

Bloodpriest · September 3, 2023, 5:12am

I need threaded comments in my posts but I cant seem to figure out how to model the create table.
My original table-
CREATE TABLE IF NOT EXISTS posts (
id string,
parent_id string,
comments_count bigint,
shares_count bigint,
settings blob,
user_id uuid,
created_at timestamp,
updated_at timestamp,
body string,
PRIMARY KEY ((user_id, parent_id), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
This was my original idea but the problem with this is I cant write a query that loads all the comments to this post since, theres no way I can pass the user_id.

Any ideas? I need per user to be able to see the posts theyve posted on their timeline, & individual posts to hold all associated comments too.

Attila_Toth · September 3, 2023, 10:08pm

Hi there,

What is the exact query you want to run? Generally, once you know the queries it’s easier to come up with a model that works and performant

Bloodpriest · September 4, 2023, 2:02am

I need something along the lines of-

select * from posts where user_id = 3 and parent_id = :not_set;
(for list of posts made from their personal account)

select * from posts where parent_id = :not_set and created_at < ....;
(for posts to display on homepage, this can infact use user_id if I want to show relevant posts, but ill have to pass a list of user_ids I want that are say friends of the current user so he can view their recent posts etc.)

& now the most difficult problem-
parent_id :not_set means that this is a post from a user. if parent_id = <some_id>, then this some_id is the id of an existing post, & it means that this post is actually a comment to another post. Im trying to implement hierarchical comments using the same post model. I want to include threaded comments. But the problem with this is, theres no way for me to provide the user_id in the query that loads the comments.
select * from posts where parent_id= <post_id>;
where post_id is the post im trying to load the comments for

So for the first query (or maybe 2nd query too) to work efficiently, I need user_id, but I cant provide user_id for the 3rd query since idk which user has commented on the post. How do we solve this?

Attila_Toth · September 4, 2023, 6:21pm

Maybe an approach you can try is adding a Materialized View with parent_id as the key? So in this case you’d have a schema like this:

CREATE TABLE IF NOT EXISTS demo.posts (
	id text,
	parent_id text,
	comments_count bigint,
	shares_count bigint,
	settings blob,
	user_id uuid,
	created_at timestamp,
	updated_at timestamp,
	body text,
	PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

This would work well with the following queries:

select * from posts where user_id = 3
select * from posts where user_id = 3 and created_at < some_timestamp

Then you’d have a MV (example):

CREATE MATERIALIZED VIEW demo.posts_by_parent AS
        SELECT * FROM demo.posts
        WHERE parent_id IS NOT NULL and created_at IS NOT NULL
        PRIMARY KEY(parent_id, user_id, created_at);

So if you need to query by the parent id you can use this view like this:

select * from posts_by_parent where parent_id= <post_id>

Bloodpriest · September 5, 2023, 2:29am

Hmm, this does look great. 3 questions though-

We cannot have a materialized view partition key given as a tables partition/clustering key?
Is this approach scalable for practical situations? Imagine a load equivalent to that of say- twitter
when 2 partition keys are mentioned, the 1st locates the node which holds the data via hash, & am I right to think that the 2nd key also does the same thing within that located node. Kind of like locating a virtual node within a node to more effectively pin point the location?

Attila_Toth · September 5, 2023, 9:50am

We cannot have a materialized view partition key given as a tables partition/clustering key?

All PRIMARY KEY columns in the base table need to be included in the PRIMARY KEY in the materialized view

Is this approach scalable for practical situations? Imagine a load equivalent to that of say- twitter

Yes, materialized views are scalable, but the DB will take up more space on disk because the data gets duplicated in the MV.

when 2 partition keys are mentioned, the 1st locates the node which holds the data via hash, & am I right to think that the 2nd key also does the same thing within that located node. Kind of like locating a virtual node within a node to more effectively pin point the location?

In a composite partition key (two or more columns) all fields in the partition key are used to generate the hash

Bloodpriest · September 5, 2023, 11:24am

thankyou so much for your time & help.
I appreciate it.

Bloodpriest · September 8, 2023, 9:57am

sorry, ive got a bit of a problem.

i created the post table & the MV. I needed to change the post table. So once I changed the create table query, I tried to delete the original post table but I couldnt due to the MV & it seems drop isnt supported on MVs. How do I drop/delete the MV?

Attila_Toth · September 8, 2023, 4:13pm

Can’t you do drop materialized view keyspace.mv ?

Bloodpriest · September 8, 2023, 5:04pm

oh yeah this works. thanks again.
I couldnt find any docs on the drop. My bad

Thankyou

Topic		Replies	Views
How to Optimize Tombstone Issues Caused by Updating Primary Keys in ScyllaDB Materialized Views? ScyllaDB data-model , materialized-views , tombstone	1	44	April 15, 2025
Issue in creating Mat view with aggregations in view itself ScyllaDB materialized-views	4	58	July 11, 2024
Number of Materialized Views and Impact on Performance ScyllaDB performance , materialized-views	0	182	April 22, 2024
Reddit like app - use single or multiple tables? ScyllaDB	2	251	December 14, 2022
Deleting Materialized View and affect on tombstone data, disk usage ScyllaDB materialized-views , tombstone	0	16	August 18, 2024

Need help with modelling the post/comment architecture to support complex queries

Related topics