Originally from the User Slack
@Shresth_Jain: Hi, is there a size limit for partitions cached in Scylla version 2024.1.12-0.20241023.6140bb5b2d0a?
I have a partition with around 24k records, and despite running 100-200 RPS queries for the same key, it seems like data is still being fetched from disk. The cache usage in Scylla Monitor isn’t spiking, and this is causing latency to reach several seconds.
Any suggestions?
@dor: There is no limit. Do you retrieve the entire partition, or just a few rows in each query? Either way, 24k rows means more work per query. It's best to work closely with our solution architect.
@Shresth_Jain: I retrieve an aggregate of 60 columns (min, max, sum, count). So all I retrieve is a single row, which is the aggregation for a particular partition key.
FYI: I am running the exact same query each time.
Also, 24k is the maximum number of rows in a partition; the average is less than 2-3k.
@dor: We cache rows, but not the aggregation values. You can turn on CQL tracing to see what's happening under the hood.
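For reference, a minimal sketch of enabling tracing, assuming Java driver 4.x (matching the setTracing()/setPageSize() calls later in the thread); in cqlsh the equivalent is simply TRACING ON;. The query, keyspace, and table names here are illustrative:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.*;

public class TraceExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Enable tracing on a single statement (query shown is illustrative)
            SimpleStatement stmt = SimpleStatement
                    .newInstance("SELECT * FROM ks.tbl WHERE user_id = ?", "u1")
                    .setTracing(true);
            ResultSet rs = session.execute(stmt);
            // Fetch and print the trace events recorded by the server
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            trace.getEvents().forEach(e ->
                    System.out.println(e.getSourceElapsedMicros() + " - " + e.getActivity()));
        }
    }
}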
@Shresth_Jain: Will surely check this. Thanks.
Also, is 24k rows too much for a single partition? Say, if I do need to read all this data for a query, would it still be better to look for ways to split this data across multiple partitions (maybe by changing the table schema)?
@dor: Well, not necessarily; we do have parallel aggregations that read and count partitions in parallel. Again, it's best to work with the SA who supports you to figure out the pros/cons of your data model.
@Shresth_Jain: Got it. I checked the trace for the query and found that data is indeed being fetched from the cache, but it happens in pages: about 1.1k records are fetched each time (from disk or cache) before the next range is fetched. I hope this page size is configurable.
For example:
8 - /10.161.0.187 - read_data: message received from /10.161.0.172
17 - /10.161.0.187 - Start querying singular range {{-1489087050843527006, 000738333733383437}}
22 - /10.161.0.187 - Found cached querier for key 1bbdae86-c1ae-4af3-888d-f28226a579e1 and range(s) {{{-1489087050843527006, 000738333733383437}}}
28 - /10.161.0.187 - Reusing querier
30 - /10.161.0.187 - Continuing paged query, previous page's trace session is 440edff0-b081-11ef-bed7-6bf6b765cac2
149 - /10.161.0.187 - [reader concurrency semaphore] executing read
6322 - /10.161.0.187 - Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1164 clustering row(s) (1164 live, 0 dead) and 0 range tombstone(s)
6329 - /10.161.0.187 - Caching querier with key 1bbdae86-c1ae-4af3-888d-f28226a579e1
6332 - /10.161.0.187 - Querying is done
6339 - /10.161.0.187 - read_data handling is done, sending a response to /10.161.0.172
55361 - /10.161.0.172 - read_data: got response from /10.161.0.187
72169 - /10.161.0.172 - Creating read executor for token -1489087050843527006 with all: {10.161.0.172, 10.161.0.68, 10.161.0.187} targets: {10.161.0.187} repair decision: NONE
72170 - /10.161.0.172 - Added extra target 10.161.0.172 for speculative read
72170 - /10.161.0.172 - Creating speculating_read_executor
72172 - /10.161.0.172 - read_data: sending a message to /10.161.0.187
3 - /10.161.0.187 - read_data: message received from /10.161.0.172
11 - /10.161.0.187 - Start querying singular range {{-1489087050843527006, 000738333733383437}}
15 - /10.161.0.187 - Found cached querier for key 1bbdae86-c1ae-4af3-888d-f28226a579e1 and range(s) {{{-1489087050843527006, 000738333733383437}}}
19 - /10.161.0.187 - Reusing querier
20 - /10.161.0.187 - Continuing paged query, previous page's trace session is 440edff0-b081-11ef-bed7-6bf6b765cac2
113 - /10.161.0.187 - [reader concurrency semaphore] executing read
6130 - /10.161.0.187 - Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1165 clustering row(s) (1165 live, 0 dead) and 0 range tombstone(s)
6134 - /10.161.0.187 - Caching querier with key 1bbdae86-c1ae-4af3-888d-f28226a579e1
6136 - /10.161.0.187 - Querying is done
6140 - /10.161.0.187 - read_data handling is done, sending a response to /10.161.0.172
78915 - /10.161.0.172 - read_data: got response from /10.161.0.187
95919 - /10.161.0.172 - Creating read executor for token -1489087050843527006 with all: {10.161.0.172, 10.161.0.68, 10.161.0.187} targets: {10.161.0.187} repair decision: NONE
95920 - /10.161.0.172 - Added extra target 10.161.0.172 for speculative read
This also seems to explain why the requests served by the coordinator were around 6 ops/s while the read requests were around 150 ops/s: with ~24k records in the partition and ~1.1k rows per page, each request at the coordinator results in roughly 24 page reads, giving total reads ≈ 24 × 6 ≈ 150. Am I correct? Also, is this good/expected?
@dor: Regarding the page size: it's configurable on the client side.
@Shresth_Jain: Could you please help me with this?
I tried .setPageSize(2000) in the Java driver, but the page details in the trace are still the same.
Reference:
SimpleStatement query = SimpleStatement
        .newInstance(Constants.SCYLLA_GET_USER_DEAL_DETAILS_QUERY)
        .setTracing(true)
        .setPageSize(2000);
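For completeness, a minimal sketch of setting the page size session-wide through the driver's programmatic configuration (assuming Java driver 4.x), as an alternative to per-statement setPageSize(); as @avi notes below, the server caps pages at 1MB regardless, so values above that cap have no visible effect:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class PageSizeExample {
    public static void main(String[] args) {
        // Applies to every request on this session, instead of calling setPageSize() per statement
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withInt(DefaultDriverOption.REQUEST_PAGE_SIZE, 2000)
                .build();
        try (CqlSession session = CqlSession.builder().withConfigLoader(loader).build()) {
            session.execute("SELECT release_version FROM system.local"); // illustrative query
        }
    }
}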
@avi: ScyllaDB will limit itself to 1MB pages, so increasing the page size won’t help.
If I understand correctly, you’re reading the entire partition at a rate of 6 partitions/sec, and since it has 24k rows you’re reading 144k rows/sec. That’s fine for a single-threaded workload. If you add more threads, reading other partitions, you’ll get a much higher row rate.
From the tracing, the partition is cached. You can also check monitoring; there are cache statistics in the Detailed dashboard.
@Shresth_Jain: For reading with more threads, the only option would be horizontal or vertical scaling, right?
Can we configure the 1MB limit?
@avi: More threads = change the application to have more threads that read data independently
What problem are you trying to solve?
@Shresth_Jain: I have an API built with Java Spring. The API receives a userId as its parameter, runs a range aggregate query on ScyllaDB, and returns the aggregated row as the response.
I am using the ScyllaDB Java driver.
@avi: It’s working fine. Changing parameters won’t help.
If your application has a single thread of execution, it won’t be able to utilize all the nodes and all the CPUs that ScyllaDB runs on.
@Shresth_Jain: So you are saying that using more threads at the application level will improve performance?
Is there any particular configuration in the ScyllaDB Java driver for this?
@avi: You need to make the application parallel. More threads, each consuming a different partition.
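A minimal sketch of that pattern, assuming Java driver 4.x and a hypothetical schema (ks.deals with partition key user_id and a numeric column amount); executeAsync() keeps many partitions in flight at once, so even a single application thread can spread reads across nodes and shards:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.*;
import java.util.List;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

public class ParallelReads {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Hypothetical aggregate query, one partition per userId
            PreparedStatement ps = session.prepare(
                    "SELECT min(amount), max(amount), sum(amount), count(amount) "
                    + "FROM ks.deals WHERE user_id = ?");
            List<String> userIds = List.of("u1", "u2", "u3");
            // Fire all queries without waiting; each reads a different partition in parallel
            List<CompletionStage<AsyncResultSet>> pending = userIds.stream()
                    .map(id -> session.executeAsync(ps.bind(id)))
                    .collect(Collectors.toList());
            // Collect the single aggregated row from each response
            pending.forEach(stage -> {
                Row row = stage.toCompletableFuture().join().one();
                System.out.println(row.getFormattedContents());
            });
        }
    }
}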
@Shresth_Jain: The Spring API creates a new thread for every API request, so wouldn't this already make the application parallel?
@Guy: Hey @Shresth_Jain, were you able to solve this?