How to parse the commitlog?

Viewing the file CommitLog-2-8901636397.log directly just shows garbled binary content. How can I make the file human-readable? Is there a tool available for parsing it? Thanks!

Currently there are no end-user tools to parse commit logs. I do plan to add such a tool in the near future, so stay tuned.

Out of curiosity, why would you like to dump the content of this commit-log file?

There is an indirect way to dump the commit-log:

  1. Start up a single, standalone Scylla instance, wait until it finishes booting, then stop it. Make sure to start this instance in such a way that it doesn’t join any existing cluster!!!
  2. Copy over the contents of `/var/lib/scylla/data/system_schema/*` from the node that wrote the commitlog to the new Scylla instance, overwriting everything that is already there.
  3. Copy the commitlog file to `/var/lib/scylla/commitlog`.
  4. Start the node. It will replay the commitlog. As soon as it finishes starting up, stop it.
  5. The data will now be in sstables, which you can examine with `scylla-sstable`.

Note that with this method the data is first inserted into a memtable and then flushed, so some (or even all) of it might be compacted away in the process.

Thank you, I look forward to it.
To make it easier to compute statistics on certain data, I would like to migrate the data to other databases via the commitlog. Just an idea.

This sounds like a risky reimplementation of CDC.

You will need to deduplicate the updates coming from each replica, and you will lose the pre- and post-images that CDC provides.
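
A minimal C++ sketch of that deduplication step, for illustration only. It assumes each commitlog entry has already been decoded into a flat per-cell record. `decoded_write` and `deduplicate()` are hypothetical names, not ScyllaDB APIs. It also assumes that a write carries a single timestamp (assigned by the client or the coordinator) that is the same on every replica it reaches, so the RF copies of one write compare equal.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical flattened view of one cell write decoded from a commitlog.
// This is NOT a ScyllaDB type; a real tool would build it from mutations.
struct decoded_write {
    std::string table;
    std::string partition_key;
    std::string clustering_key;
    std::string column;
    std::int64_t timestamp;  // assumed identical on every replica
    std::string value;
};

// Keeps the first occurrence of each write and drops the copies that were
// decoded from the commitlogs of the other replicas. Two records count as
// duplicates only if every field, including the timestamp, matches.
std::vector<decoded_write> deduplicate(const std::vector<decoded_write>& writes) {
    using write_key = std::tuple<std::string, std::string, std::string,
                                 std::string, std::int64_t, std::string>;
    std::set<write_key> seen;
    std::vector<decoded_write> result;
    for (const auto& w : writes) {
        write_key key{w.table, w.partition_key, w.clustering_key,
                      w.column, w.timestamp, w.value};
        // insert() reports false if an identical key was already present.
        if (seen.insert(key).second) {
            result.push_back(w);
        }
    }
    return result;
}
```

For example, a write to `ks.t` that appears once in each of two replicas' commitlogs comes out once. This only removes exact copies; it cannot reconstruct the pre- and post-images that CDC provides.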

Hi, I used the `pretty_printer` function of `frozen_mutation` to print the content of the commitlog in a human-readable form.

To be able to call it offline, I added a parsing tool. Now I have a question: how should I initialize `db` and `sys_ks`? Do you have any suggestions? Thank you.

Starting those two services is risky because they might try to create files. Worse still, if you run this tool on a live ScyllaDB node, they might access and mutate that node's files.
So I think the way forward is to cut the dependency between the commitlog replayer and these services by introducing an interface with two implementations. The interface would contain all the methods the commitlog replayer requires from these services, e.g. `find_column_family()`, `extensions()`, etc. One implementation would use `db` and `sys_ks` to implement these methods, just like the current code does. The other implementation would live in the tool and would implement only the bare minimum needed for the commitlog replayer to replay the commitlog. Getting this to work might need some experimenting.
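
A minimal, self-contained C++ sketch of that split, for illustration only. `replayer_context`, `offline_replayer_context`, `table_info`, `extensions_info` and `describe_entry()` are hypothetical names, and the signatures are simplified assumptions: only the method names `find_column_family()` and `extensions()` come from the suggestion above, while the real ScyllaDB methods take and return ScyllaDB's own types.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the ScyllaDB types the replayer really uses.
struct table_info {
    std::string keyspace;
    std::string name;
};
struct extensions_info {};

// The interface: only the methods the commitlog replayer needs.
class replayer_context {
public:
    virtual ~replayer_context() = default;
    virtual const table_info& find_column_family(const std::string& id) const = 0;
    virtual const extensions_info& extensions() const = 0;
};

// Tool-side implementation: a read-only context built from a schema loaded
// offline, so no db/sys_ks service has to be started. (The node-side
// implementation would forward these calls to db and sys_ks; it is omitted
// here because it needs the real ScyllaDB services.)
class offline_replayer_context final : public replayer_context {
    std::unordered_map<std::string, table_info> _tables;
    extensions_info _ext;
public:
    explicit offline_replayer_context(std::unordered_map<std::string, table_info> tables)
        : _tables(std::move(tables)) {}

    const table_info& find_column_family(const std::string& id) const override {
        auto it = _tables.find(id);
        if (it == _tables.end()) {
            throw std::out_of_range("unknown table id: " + id);
        }
        return it->second;
    }

    const extensions_info& extensions() const override { return _ext; }
};

// Replayer-side code depends only on the interface, not on concrete services.
std::string describe_entry(const replayer_context& ctx, const std::string& table_id) {
    const table_info& t = ctx.find_column_family(table_id);
    return t.keyspace + "." + t.name;
}
```

Because code such as `describe_entry()` only sees `replayer_context`, the tool can pass the offline implementation while the node keeps passing the one backed by `db` and `sys_ks`.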
