Tools to analyse slow queries in ClickHouse
At Contentsquare, we rely a lot on ClickHouse. We use it as the main storage of all our analytics products. We also use it for deeper analysis done by our Data Analysis and Data Science teams. Therefore a lot of people need to write ClickHouse queries. While ClickHouse is blazing fast, it can also be slow when used badly. In this article we will show you some tools that help us understand why a query is slow.
This article is a bit advanced and is meant for people already familiar with ClickHouse. Although some parts are linked, you can read each of them independently.
Setup
Note that this article was written with ClickHouse v21.9.2 in mind, but most features have been available since v20. The common thread of all the tools we will see is the analysis of the following query:
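The exact query matters less than its shape; here is an illustrative version, where table1, table2, field1, field2 and value are placeholder names (rand_val is the column the query plan will refer to later):

```sql
SELECT avg(value)
FROM table1
WHERE (field1 = 'A' OR field2 = 'B')
  AND rand_val IN
  (
      SELECT rand_val
      FROM table2
      WHERE field1 = 'A' OR field2 = 'B'
  )
```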
This query contains a subquery and a few filters. Both tables contain 30 million rows.
If you want to reproduce what is done in the article, here are the commands that create the data needed for the query:
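A minimal sketch of such commands, reusing the placeholder schema above and filling both tables with random data (the engine, ordering key and value distribution are illustrative choices, not the original setup):

```sql
CREATE TABLE table1
(
    field1   String,
    field2   String,
    value    UInt64,
    rand_val UInt64
)
ENGINE = MergeTree
ORDER BY tuple();

CREATE TABLE table2
(
    field1   String,
    field2   String,
    rand_val UInt64
)
ENGINE = MergeTree
ORDER BY tuple();

-- 30 million rows of random data in each table
INSERT INTO table1
SELECT
    if(rand() % 2 = 0, 'A', 'B')  AS field1,
    if(rand(1) % 2 = 0, 'B', 'C') AS field2,
    rand64()                      AS value,
    rand64()                      AS rand_val
FROM numbers(30000000);

INSERT INTO table2
SELECT
    if(rand() % 2 = 0, 'A', 'B')  AS field1,
    if(rand(1) % 2 = 0, 'B', 'C') AS field2,
    rand64()                      AS rand_val
FROM numbers(30000000);
```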
This will create 2 tables, each containing 30 million rows.
Queries can be slow for a lot of external reasons (network congestion, limited disk space, high CPU usage, etc.). To be able to reliably assert that a query is slow, you should try to run it several times. Moreover, you should do this test on an isolated system so that another workload can’t bias the results.
Benchmarking queries
ClickHouse provides clickhouse-benchmark, a utility to run a query multiple times and get some statistics.
It allows you to:
- Run one (or many) queries multiple times and get percentiles of the speed
- Run queries concurrently
- Put restrictions on queries (for example: max 1 GB of memory used, or only one thread processing the queries)
For more information, look at the official ClickHouse documentation.
In this example we will use clickhouse-benchmark to replay the slow query 20 times and get the median time.
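One possible invocation, piping the placeholder query from the setup into clickhouse-benchmark and using -i to set the number of iterations:

```bash
echo "SELECT avg(value) FROM table1 WHERE (field1 = 'A' OR field2 = 'B') AND rand_val IN (SELECT rand_val FROM table2 WHERE field1 = 'A' OR field2 = 'B')" \
    | clickhouse-benchmark -i 20
```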
We saw that the median time of the query is 2.9 seconds and the speed is quite stable.
Query plan
Our previous query took 2.9 seconds to process 30 million rows on 4 threads. We consider that slow. Why is that?
We first need to understand how the query is processed by ClickHouse. This is where the EXPLAIN (or EXPLAIN PLAN) keyword comes in handy. With this keyword, you can ask ClickHouse to generate a query plan that describes all the logical operators that need to be applied to the data to process the query.
Let’s ask ClickHouse to generate this plan:
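On the placeholder query from the setup, this is simply the query prefixed with the keyword:

```sql
EXPLAIN
SELECT avg(value)
FROM table1
WHERE (field1 = 'A' OR field2 = 'B')
  AND rand_val IN (SELECT rand_val FROM table2 WHERE field1 = 'A' OR field2 = 'B')
```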
And here comes the answer from ClickHouse:
This plan allows us to understand how ClickHouse processes the query. With experience you can guess what step of the plan is linked to what SQL operator. But if you don’t want to guess, you can ask ClickHouse to show a much more verbose plan with the following command:
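A sketch of that command on the same placeholder query (the actions and indexes settings are described right after):

```sql
EXPLAIN PLAN actions = 1, indexes = 1
SELECT avg(value)
FROM table1
WHERE (field1 = 'A' OR field2 = 'B')
  AND rand_val IN (SELECT rand_val FROM table2 WHERE field1 = 'A' OR field2 = 'B')
```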
- actions = 1 lets you see the corresponding actions (and therefore SQL expressions) done by each logical operator
- indexes = 1 lets you see the number of blocks of data filtered by a WHERE clause on the primary key. In our case nothing is filtered (granule = 3663/3663 & part = 8/8) since we didn't use the primary key.
This detailed view allows us to see that our subquery requires creating a Set (the CreatingSet (Create set for subquery) part). This Set contains the result of our SubQuery operations. In our case this is all the rand_val values that match the filter condition (the Filter in the WHERE part).
The creation of a Set is an expensive operation in most programming languages. A Set in ClickHouse is actually a HashSet, which means that when we ask it to create a Set of 30 million elements, it has to compute 30 million hashes. This is important for later.
Execution plan
The query plan gave us an idea of the main logical operators in the query. Let's see how the query would be executed by ClickHouse using the EXPLAIN PIPELINE keyword.
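On the placeholder query, that looks like:

```sql
EXPLAIN PIPELINE
SELECT avg(value)
FROM table1
WHERE (field1 = 'A' OR field2 = 'B')
  AND rand_val IN (SELECT rand_val FROM table2 WHERE field1 = 'A' OR field2 = 'B')
```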
This plan shows the real operators (called processors and located in the src/Processors folder of the ClickHouse codebase) that will manage the query, such as ExpressionTransform, Resize, or AggregatingTransform.
Moreover, the plan shows us the number of threads for each operation. We can see that the subquery (defined by CreatingSet) starts by reading data from disk and filtering it using 4 threads, but the creation of the Set (CreatingSetsTransform) is done by only 1 thread. This is a major bottleneck as the creation of a Set is slow by nature.
It is therefore likely that, as we suspected, the slowness comes from the Set creation needed for the subquery.
Temporary verbose log
At this stage, the suspected cause of our slowness was the Set (needed for the subquery). We now needed to confirm it by looking at the logs.
We ran the query and asked ClickHouse to be as verbose as possible.
Once connected with our clickhouse-client, we sent the following command to enable tracing (note that this might not work using a JDBC client).
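A sketch using the send_logs_level setting, which makes the server stream its logs back to the client session at the chosen verbosity:

```sql
SET send_logs_level = 'trace';
```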
We re-ran our query and checked the logs:
This is what we learned by carefully reading the logs:
- time taken to load, decompress and deserialize the data from disk
- how many threads were used for a specific operation
- how much data was pruned due to a filter on the primary key
Unfortunately, the logs wouldn't let us know how long the filter operations took (like SELECT ... WHERE field1='A' or field2='B') since they're included in the aggregation operation.
Nevertheless, those logs showed that our query took 3 seconds. Almost 80% of that time (2.3 seconds) was spent on the Set creation! We now had proof of the bottleneck.
Query log & trace log
Instead of manually enabling debug logs while the cluster is running, we could have used standard ClickHouse core tables that contain this information.
Here are the 2 most useful tables to understand what is happening in ClickHouse.
Query log
Any ClickHouse installation has a system.query_log table enabled by default. Whenever a query starts or finishes, it is logged into that table along with its metadata (including its duration).
If you know approximately when you ran your query, you can find it by filtering on the fields event_time and query.
For example:
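A query along these lines (the time window, query fragment and limit are placeholders to adapt to your case):

```sql
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    memory_usage,
    query
FROM system.query_log
WHERE type = 'QueryFinish'                     -- keep only completed queries
  AND event_time >= now() - INTERVAL 1 HOUR    -- approximate time of the slow query
  AND query LIKE '%rand_val IN%'               -- a fragment of the query text
ORDER BY query_duration_ms DESC
LIMIT 10
```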
The interesting fields of the query_log table are:
- query_duration_ms: our best indicator for slow queries
- read_rows, read_bytes, result_bytes, memory_usage: to get an idea of the amount of data
- query: the raw text of the query that we can use to filter on
- Settings.Names & Settings.Values: if you want to reproduce the query with the settings used during the slow query (for example if the query was configured with the max_threads setting to use only 1 core instead of half the server CPUs)
- is_initial_query: when using a cluster, this boolean differentiates queries sent by a client from queries sent by another server of the cluster
- query_id: if you know exactly the query you're looking for because you ran it manually with clickhouse-client and got the id
This table will help you check your query history and see if similar queries were always slow or if it was a temporary glitch on a specific one.
Trace log
The trace_log table is another valuable source of information. It lets us profile the time spent in the actual C++ functions for any given query.
While it requires a deeper understanding of ClickHouse source code, it does provide invaluable insights while benchmarking or testing new ClickHouse features.
Note that it is not enabled by default, as it has an impact on the overall performance.
To activate it, you need to put the following XML block in the settings file of ClickHouse (the path is /etc/clickhouse-server/config.xml by default).
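A sketch of such a block (the values below are common defaults; the block goes inside the root configuration element):

```xml
<trace_log>
    <database>system</database>
    <table>trace_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>
```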
You also need to install the clickhouse-common-static-dbg package in order to demangle the symbols in this table. Without it, you'll only see a bunch of integers instead of function call stacks.
Note that if you are not using Ubuntu, you can download and install the package from GitHub.
The interesting fields of the trace_log table are:
- timestamp_ns: the timestamp in nanoseconds at the moment the sample was taken
- trace_type: the metric the sample represents, out of 4 types: CPU, Real, Memory, MemorySample
- trace: the stack trace at the moment the sample was taken
- query_id: the id of the query sampled
Once the trace_log is activated:
- Connect to ClickHouse with clickhouse-client
- Set query_profiler_real_time_period_ns and query_profiler_cpu_time_period_ns to 10000000 to ask ClickHouse to trace the C++ functions that are running. It will be done every 10 milliseconds (see the sketch after this list).
- Run the query you want to monitor
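A sketch of steps 2 and 3 inside the clickhouse-client session (the query is the placeholder one from the setup):

```sql
-- Sample the running stacks every 10 ms (10,000,000 ns), for both wall-clock and CPU time
SET query_profiler_real_time_period_ns = 10000000;
SET query_profiler_cpu_time_period_ns  = 10000000;

-- Then run the query to monitor
SELECT avg(value)
FROM table1
WHERE (field1 = 'A' OR field2 = 'B')
  AND rand_val IN (SELECT rand_val FROM table2 WHERE field1 = 'A' OR field2 = 'B');
```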
The result from clickhouse-client provides you with the query id:
Note: this query, which took 2.9 seconds in the previous parts, now takes 4.7 seconds because of the overhead of the trace_log table.
Then:
- run another connection to ClickHouse (don't use the previous connection in order to avoid adding noise into the trace_log)
- allow ClickHouse to demangle
- query the trace_log table to see the 2 functions that took most of the time (see the sketch after this list)
- specify the type of trace you're looking for ('Real' = 0, 'CPU' = 1, 'Memory' = 2, 'MemorySample' = 3)
- specify the query_id you're looking for
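One way to put those filters together; the query_id is a placeholder, demangling additionally requires the allow_introspection_functions setting, and counting every frame of every sampled stack is our own aggregation choice:

```sql
SET allow_introspection_functions = 1;

SELECT
    count() AS samples,                                 -- each sample ~= 10 ms here
    demangle(addressToSymbol(arrayJoin(trace))) AS symbol
FROM system.trace_log
WHERE trace_type = 'CPU'                                -- 'Real' = 0, 'CPU' = 1, 'Memory' = 2, 'MemorySample' = 3
  AND query_id = 'replace-with-your-query-id'
GROUP BY symbol
ORDER BY samples DESC
LIMIT 20
```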
In this example, we’re looking for the CPU trace. If a C++ function has 8 CPU traces it means more or less that ClickHouse spent 8 * 10 milliseconds on it.
The results are a bit verbose because the stack calls are very deep. Here is a summary of what we found:
We can see that:
- 137 traces (so 137*10 milliseconds spent) are from the Set::insertFromBlockImplCase which, as per the source code, corresponds to the creation of the Set
- 195 traces (so 195*10 milliseconds spent) are from the Set::executeImplCase which is also linked to the Set

Overall, 3.32 seconds were spent on the Set out of the 4.7 seconds of the query. The Set represents 70% of the query duration.
Flamegraph
Reading such a trace_log is not easy. We found that a flamegraph was a much better visualisation. We used clickhouse-flamegraph for that job.
To install it, run:
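One way is the install script from the project's GitHub repository (verify the URL against the Slach/clickhouse-flamegraph README before piping it to a shell):

```bash
curl -sL https://raw.githubusercontent.com/Slach/clickhouse-flamegraph/master/install.sh | sudo bash
```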
By default clickhouse-flamegraph reads all the data in the trace_log table and prints a flamegraph for each query and each type of metric (CPU, Real, Memory, MemorySample).
The tool accepts filters, allowing us to narrow our window to a specific time frame, keeping only CPU traces for our specific query.
For example, here we're printing a flamegraph of CPU usage for the query done in the trace_log part:
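A sketch of such an invocation; the flag names below are assumptions to verify against clickhouse-flamegraph --help for your version, and the dates and query id are placeholders:

```bash
clickhouse-flamegraph \
    --trace-types CPU \
    --query-ids "replace-with-your-query-id" \
    --date-from "2021-09-22 10:00:00" \
    --date-to "2021-09-22 11:00:00" \
    --output-dir ./flamegraphs
```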
And here is the result:
This provided a much more visual way to see that ClickHouse was spending most of its time in DB::Set::insertFromBlockImpl and DB::Set::executeImpl.
The flamegraph is interactive: click on a specific block to zoom in on the execution stack.
To sum up
We've seen how to identify a slow query and understand why it is so slow. These tools are specific to ClickHouse, but you should also consider standard sysadmin tools like sar to check CPU, memory and disk usage. Even if they don't give you a solution to speed up the query, they help you spot the root cause, which is a good start. If you want to continue on this topic, you can look at this presentation made by Altinity.