How to parse the commitlog?

bo_li · March 5, 2024, 7:02am

When I open the file CommitLog-2-8901636397.log directly, it shows up as garbled binary data. How can I make the file human-readable? Is there a parsing tool available? Thanks!

Botond_Denes · March 5, 2024, 9:16am

Currently there are no end-user tools to parse commit logs. I do plan to add such a tool in the near future, so stay tuned.

Out of curiosity, why would you like to dump the content of this commit-log file?

Botond_Denes · March 5, 2024, 9:19am

There is an indirect way to dump the commit-log:

1. Start up a single ScyllaDB instance, wait until it finishes booting, then stop it. Make sure to start this instance such that it doesn't join any existing cluster!!!
2. Copy over the contents of `/var/lib/data/system_schema/*` to the new ScyllaDB instance, overwriting everything that is already there.
3. Copy over the commitlog file to `/var/lib/commitlog`.
4. Start the node. It will replay the commit log. As soon as it finishes starting up, stop it.
5. The data will be in sstables; you can use scylla-sstable to examine it.

Note that this method will make the data go through insertion to memtable, then flush. Some (or even all) of the data might be compacted away in the process.

bo_li · March 6, 2024, 6:19am

Thank you, I look forward to it.
To make it easier to compute statistics on certain data, I would like to migrate that data to other databases by reading the commitlog. Just an idea.

tzach · March 6, 2024, 7:34am

This sounds like a risky reimplementation of CDC.

You will need to deduplicate updates from each replica, and you will lose the pre- and post-images that CDC provides.
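
To illustrate the deduplication point, here is a minimal sketch rather than anything from ScyllaDB itself. It assumes the same write carries the same timestamp on every replica, and it uses a hypothetical flattened `write_record` type instead of real mutations. It does nothing to recover pre- or post-images.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, flattened view of one cell write decoded from a commitlog.
// Real ScyllaDB mutations are considerably richer than this.
struct write_record {
    std::string partition_key;
    std::string column;
    std::int64_t timestamp;  // write timestamp, assumed identical on all replicas
    std::string value;
};

// Merge the writes seen on several replicas, keeping the first copy of each
// (partition key, column, timestamp) combination and dropping the repeats.
std::vector<write_record> deduplicate(
        const std::vector<std::vector<write_record>>& per_replica) {
    std::set<std::tuple<std::string, std::string, std::int64_t>> seen;
    std::vector<write_record> out;
    for (const auto& replica : per_replica) {
        for (const auto& w : replica) {
            if (seen.emplace(w.partition_key, w.column, w.timestamp).second) {
                out.push_back(w);
            }
        }
    }
    return out;
}
```

With a replication factor of 3, every write would show up in up to three commitlogs, so a step like this would have to run over the logs of all replicas before the data is exported.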

bo_li · April 8, 2024, 7:48am

Hi, I use the pretty_printer function of frozen_mutation to print the content of the commitlog, making it easy for humans to read.

To be able to call it offline, I added a parsing tool. Now I have a question: how should I initialize db and sys_ks? Do you have any suggestions? Thank you.

Botond_Denes · April 11, 2024, 6:00am

Starting those two services is quite risky in the sense that they might try to create files. Worse still, if you run this tool on a live ScyllaDB node, they might access and mutate the files of the ScyllaDB node.
So I think the way to go forward is to cut the dependency between the commitlog replayer and these services, by creating an interface and two implementations. The interface would have all the methods that the commitlog replayer requires from these services, e.g. find_column_family(), extensions(), etc. One implementation would use the db and sys_ks to implement these methods, just like the current code. Another implementation would be in the tools, which would try to do just the minimum amount of implementation possible, to allow the commitlog replayer to replay the commitlog. This might need some experimenting to get it working.
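
The interface-plus-two-implementations idea could look roughly like the sketch below. All names here (`replay_context`, `offline_replay_context`, `table_schema`, `log_entry`, `replay`) are hypothetical and only illustrate the shape; the real replayer uses ScyllaDB's own types and Seastar futures, and needs more methods than the single `find_column_family()` shown.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ScyllaDB's real table id and schema types.
using table_id = std::uint64_t;

struct table_schema {
    std::string keyspace;
    std::string table;
};

// Everything the replayer needs from its environment, and nothing more.
class replay_context {
public:
    virtual ~replay_context() = default;
    // Returns the schema for a table id, or std::nullopt if it is unknown.
    virtual std::optional<table_schema> find_column_family(table_id id) const = 0;
};

// Tool-side implementation: backed by an in-memory map, so it never creates
// or modifies files. The server-side one would wrap db and sys_ks instead.
class offline_replay_context final : public replay_context {
    std::map<table_id, table_schema> _tables;
public:
    void add_table(table_id id, table_schema s) {
        _tables.emplace(id, std::move(s));
    }
    std::optional<table_schema> find_column_family(table_id id) const override {
        auto it = _tables.find(id);
        if (it == _tables.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};

// A commitlog entry reduced to what this sketch needs.
struct log_entry {
    table_id table;
    std::string payload;
};

// The replayer depends only on the interface. Entries that refer to an
// unknown table are skipped; the rest become printable lines.
std::vector<std::string> replay(const replay_context& ctx,
                                const std::vector<log_entry>& entries) {
    std::vector<std::string> out;
    for (const auto& e : entries) {
        auto s = ctx.find_column_family(e.table);
        if (!s) {
            continue;
        }
        out.push_back(s->keyspace + "." + s->table + ": " + e.payload);
    }
    return out;
}
```

The point of the split is that the tool only has to populate the offline context with schemas, which keeps it from starting services that could write to a live node's data directory.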
