Need help with modelling the post/comment architecture to support complex queries

I need threaded comments in my posts but I cant seem to figure out how to model the create table.
My original table-
CREATE TABLE IF NOT EXISTS posts (
id string,
parent_id string,
comments_count bigint,
shares_count bigint,
settings blob,
user_id uuid,
created_at timestamp,
updated_at timestamp,
body string,
PRIMARY KEY ((user_id, parent_id), created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);
This was my original idea but the problem with this is I cant write a query that loads all the comments to this post since, theres no way I can pass the user_id.

Any ideas? I need per user to be able to see the posts theyve posted on their timeline, & individual posts to hold all associated comments too.

Hi there,

What is the exact query you want to run? Generally, once you know the queries it’s easier to come up with a model that works and performant

I need something along the lines of-

select * from posts where user_id = 3 and parent_id = :not_set;
(for list of posts made from their personal account)

select * from posts where parent_id = :not_set and created_at < ....;
(for posts to display on homepage, this can infact use user_id if I want to show relevant posts, but ill have to pass a list of user_ids I want that are say friends of the current user so he can view their recent posts etc.)

& now the most difficult problem-
parent_id :not_set means that this is a post from a user. if parent_id = <some_id>, then this some_id is the id of an existing post, & it means that this post is actually a comment to another post. Im trying to implement hierarchical comments using the same post model. I want to include threaded comments. But the problem with this is, theres no way for me to provide the user_id in the query that loads the comments.
select * from posts where parent_id= <post_id>;
where post_id is the post im trying to load the comments for

So for the first query (or maybe 2nd query too) to work efficiently, I need user_id, but I cant provide user_id for the 3rd query since idk which user has commented on the post. How do we solve this?

Maybe an approach you can try is adding a Materialized View with parent_id as the key? So in this case you’d have a schema like this:

CREATE TABLE IF NOT EXISTS demo.posts (
	id text,
	parent_id text,
	comments_count bigint,
	shares_count bigint,
	settings blob,
	user_id uuid,
	created_at timestamp,
	updated_at timestamp,
	body text,
	PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

This would work well with the following queries:

select * from posts where user_id = 3
select * from posts where user_id = 3 and created_at < some_timestamp

Then you’d have a MV (example):

CREATE MATERIALIZED VIEW demo.posts_by_parent AS
        SELECT * FROM demo.posts
        WHERE parent_id IS NOT NULL and created_at IS NOT NULL
        PRIMARY KEY(parent_id, user_id, created_at);

So if you need to query by the parent id you can use this view like this:

select * from posts_by_parent where parent_id= <post_id>

Hmm, this does look great. 3 questions though-

  1. We cannot have a materialized view partition key given as a tables partition/clustering key?
  2. Is this approach scalable for practical situations? Imagine a load equivalent to that of say- twitter
  3. when 2 partition keys are mentioned, the 1st locates the node which holds the data via hash, & am I right to think that the 2nd key also does the same thing within that located node. Kind of like locating a virtual node within a node to more effectively pin point the location?

We cannot have a materialized view partition key given as a tables partition/clustering key?

All PRIMARY KEY columns in the base table need to be included in the PRIMARY KEY in the materialized view

Is this approach scalable for practical situations? Imagine a load equivalent to that of say- twitter

Yes, materialized views are scalable, but the DB will take up more space on disk because the data gets duplicated in the MV.

when 2 partition keys are mentioned, the 1st locates the node which holds the data via hash, & am I right to think that the 2nd key also does the same thing within that located node. Kind of like locating a virtual node within a node to more effectively pin point the location?

In a composite partition key (two or more columns) all fields in the partition key are used to generate the hash

thankyou so much for your time & help.
I appreciate it. :sparkling_heart:

1 Like

sorry, ive got a bit of a problem.

i created the post table & the MV. I needed to change the post table. So once I changed the create table query, I tried to delete the original post table but I couldnt due to the MV & it seems drop isnt supported on MVs. How do I drop/delete the MV?

Can’t you do drop materialized view keyspace.mv ?

oh yeah this works. thanks again.
I couldnt find any docs on the drop. My bad

Thankyou

1 Like