@Joseph_Stroman: The expected behavior is that Alternator returns all the items it can and, if there are leftovers, provides the last evaluated key (cursor) to continue “paginating” through the table. The cursor itself still works: when I use it, the orders are eventually returned. But they should be returned on the initial request, without using the cursor.
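For reference, here is that paging contract in client terms. A minimal boto3 sketch, where the endpoint, credentials, and key names are assumptions (the index name matches the one in the logs further down):

```python
import boto3

# Point a DynamoDB client at Alternator. Endpoint, credentials, and
# table/index names are hypothetical placeholders.
dynamodb = boto3.client(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="alternator",
    aws_secret_access_key="secret",
)

kwargs = {
    "TableName": "orders",
    "IndexName": "orderStatus-createdAt-all",
    "KeyConditionExpression": "orderStatus = :s",
    "ExpressionAttributeValues": {":s": {"S": "open"}},
}
items = []
while True:
    page = dynamodb.query(**kwargs)
    # A page may legitimately carry zero items while still returning a
    # LastEvaluatedKey; the client has to keep paging until the key is gone.
    items.extend(page.get("Items", []))
    cursor = page.get("LastEvaluatedKey")
    if not cursor:
        break
    kwargs["ExclusiveStartKey"] = cursor
```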
@Felipe_Cardeneti_Mendes: Interesting. I wonder if this has anything to do with tombstones — I would guess yes because the paging infrastructure is the same
Plus it’s an index, so the likelihood of scanning many tombstones is higher than on the base table
So … can you check it?
Look at a setting called query_tombstone_page_limit — increase it to see if you stop receiving an empty response
@Joseph_Stroman: It worked!
So I assume I should decrease the gc_grace_seconds for this table?
Anything I should be aware of with that? (I’m guessing it’s more resource intensive)
@Felipe_Cardeneti_Mendes: Ok, so you just found an implementation detail. Dynamo doesn’t have a tombstone concept, since it doesn’t use an LSM tree, but we do.
The idea is to return a quick response to the client so the request doesn’t time out, and to allow other concurrent queries to proceed
So I think the default setting is sane; we just need to document this finding. I will open an issue (and maybe send a PR)
Tombstones slow down your read path. Many tombstones can even cause timeouts.
So the balance is how many tombstones Scylla scans through before it decides to send an empty page back to your client, as a way to prevent the request from timing out (essentially: I am still processing your query, which is expensive because of these tombstones)
Ideally you want to compact them away faster, so either tune compaction settings, or gc_grace, or both. tombstone_gc=repair is also an option (see the sketch below)
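To make that concrete: a sketch of those knobs in CQL via the Python driver. The keyspace and table names are hypothetical, and the assumption is that Alternator keeps the base table in a keyspace named after it:

```python
from cassandra.cluster import Cluster

# Hypothetical names; the alternator_<table> keyspace naming is an assumption.
session = Cluster(["127.0.0.1"]).connect()

# Shorten how long tombstones are retained before compaction may purge
# them (the default gc_grace_seconds is 10 days).
session.execute(
    'ALTER TABLE "alternator_orders"."orders" WITH gc_grace_seconds = 3600'
)

# Or tie tombstone purging to repair instead of a timer.
session.execute(
    "ALTER TABLE \"alternator_orders\".\"orders\" "
    "WITH tombstone_gc = {'mode': 'repair'}"
)
```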
@Joseph_Stroman: Thanks for the explanation, I’m reading over these docs links to get more familiar with it:
https://opensource.docs.scylladb.com/stable/kb/gc-grace-seconds.html
https://opensource.docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#which-strategy-is-best
https://opensource.docs.scylladb.com/stable/kb/compaction.html
Also this one, but not sure if it’s relevant since I’m using the open source version:
https://enterprise.docs.scylladb.com/stable/kb/garbage-collection-ics.html
If you know any other useful docs let me know, and thanks again for the response
@Felipe_Cardeneti_Mendes: I guess before even reading docs you may want to first see “how bad” it is. When you read many tombstones, you will typically see warnings in the logs telling you about it. This is controlled by the tombstone_warn_threshold setting and is logged on a per-page basis.
For example, if most of your reads are fine and only a few are showing this, maybe you don’t need to bother
If most of your reads come back as empty pages, that’s when you would perhaps optimize it a bit
In general these docs work; see also https://opensource.docs.scylladb.com/stable/cql/ddl.html#ddl-tombstones-gc
At the end of the day, it is about wisely choosing your battles. E.g.: Dynamo doesn’t have tombstones, but writes are super expensive.
@Joseph_Stroman: Ok, so just for reference: we are storing an “orderbook”, where we expect many entries, and in our case each entry will definitely be updated once, and in rare cases twice.
I’m seeing two different tombstone related logs for the same “index”:
Sep 06 12:10:58 **** scylla[4420]: [shard 1:stmt] querier - Read 0 live rows and 2056 tombstones for ****:orderStatus-createdAt-all partition key "open" {{-8692956017313256067, pk{00046f70656e}}} (see tombstone_warn_threshold)
Sep 06 12:11:01 **** scylla[4420]: [shard 1:stmt] querier - Read 0 live rows and 10000 tombstones for ****:orderStatus-createdAt-all partition key "open" {{-8692956017313256067, pk{00046f70656e}}} (see tombstone_warn_threshold)
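As a quick way to gauge “how bad” it is, a small sketch that tallies those querier warnings; the regex is an assumption based on the two lines above:

```python
import re
import sys

# Matches: "querier - Read <N> live rows and <M> tombstones for ..."
PATTERN = re.compile(r"querier - Read (\d+) live rows and (\d+) tombstones")

pages = live = dead = 0
for line in sys.stdin:  # e.g. pipe `journalctl -u scylla-server` into this
    m = PATTERN.search(line)
    if m:
        pages += 1
        live += int(m.group(1))
        dead += int(m.group(2))

print(f"{pages} warned pages: {live} live rows vs {dead} tombstones")
```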
I know there is a difference in how indexing is handled compared to DynamoDB, but I’m not sure if that’s affecting this
I’m guessing there are two logs because one partition reached the tombstone limit (10000)
I set it back to the default so I can figure out the correct strategy. In my case, once it reaches that limit I think the tombstones should just be compacted away, because we don’t care at all about the data that was there before an update
I also wouldn’t mind immediate tombstone deletion; I’m just unsure of the consequences of that
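For what it’s worth, “immediate deletion” would be gc_grace_seconds = 0, sketched below with the same hypothetical names. The usual consequence is that a replica that missed a delete can resurrect the data once the tombstone is purged; that matters little on a single node, but it bites as soon as you replicate:

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

# gc_grace_seconds = 0 makes tombstones purgeable as soon as compaction
# touches them. On one node there is no replica to miss the delete, but
# with replication this risks resurrecting deleted data.
session.execute(
    'ALTER TABLE "alternator_orders"."orders" WITH gc_grace_seconds = 0'
)
```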
@Felipe_Cardeneti_Mendes: Oh, well … this is probably going to take long for me to advise on, and even worse via Slack
If you’d like, we can schedule a strategy session to go through it.
Note it isn’t a support channel, but a place for you to openly discuss your use case, ask questions, go over strategies, and such.
https://github.com/scylladb/scylladb/issues/20474
@Joseph_Stroman: Ok I think I found a fix that works for us for now, but if not I will schedule a strategy session.
Since we are only using one node and need to iterate fast I think this will suffice:
I moved the grace period to 1 hour for all tables and indexes (and set the default in the yaml file to 1 hour)
And I set the tombstone page limit to 1000000, which may be overkill, but it prevents the client from needing to paginate, and if tombstones ever accumulate to that level I assume we have bigger problems.
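A sketch of what that bulk change could look like; the alternator_ keyspace prefix is an assumption, and indexes (materialized views under the hood, listed in system_schema.views) would need the same treatment:

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

# Apply a 1-hour grace period to every Alternator-managed base table.
# The keyspace-name filter is an assumption about the naming scheme.
for row in session.execute(
    "SELECT keyspace_name, table_name FROM system_schema.tables"
):
    if row.keyspace_name.startswith("alternator_"):
        session.execute(
            f'ALTER TABLE "{row.keyspace_name}"."{row.table_name}" '
            "WITH gc_grace_seconds = 3600"
        )
```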
And I’ll be following the issue to see what you guys recommend; I can add any details if you need
@Felipe_Cardeneti_Mendes: Question: are you receiving all items in the initial pages and then nothing in the subsequent ones? Or are you getting empty pages at the beginning until you reach “live data”?
In my short-lived test, for whatever reason I don’t yet understand, I get all live items first and empty pages after
So I am trying to understand if what I am seeing aligns with what you are reporting.
Nvm, I figured it out.
@Joseph_Stroman: cool cool, but just for clarity: yes, it was the original problem, the initial pages were empty (I assume because there were tombstones at the top). Also, sorting by creation time descending made it return what we wanted, which makes sense
We just have indexes that sort on other keys, so we couldn’t use that as the fix everywhere
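For completeness, the descending sort mentioned above maps to ScanIndexForward=False on a DynamoDB Query; client setup is as in the earlier sketch and the names are still hypothetical:

```python
import boto3

dynamodb = boto3.client(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-east-1",
    aws_access_key_id="alternator",
    aws_secret_access_key="secret",
)

# Newest-first: with createdAt as the sort key, live rows come back
# before the tombstones left by older, deleted entries.
page = dynamodb.query(
    TableName="orders",
    IndexName="orderStatus-createdAt-all",
    KeyConditionExpression="orderStatus = :s",
    ExpressionAttributeValues={":s": {"S": "open"}},
    ScanIndexForward=False,
)
```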