CI Profiling Using collectl
Continuous integration (CI) is a best practice that helps enable rapid delivery of software. The faster the CI runs, the better. How can CI be effectively profiled? The collectl package is a good utility for recording system performance on Linux/Unix based systems for every CI run. It is simple to use and adds little overhead while collecting data. This post will show you how to use it in your CI builds.
Installation
Your package manager may have the collectl package available. Debian Stretch is one Linux distribution that has it.
If you are using Docker, you can try your base image's package manager or use a multi-stage build to copy it from the pdouble16/collectl-utils image.
FROM pdouble16/collectl-utils:latest
FROM debian:stretch-slim
COPY --from=0 /usr/bin/collectl /usr/bin/collectl
Collecting Data
Collecting data in CI consists of running the collectl process in the background and saving the data into one or more files. The files will be processed later by collectl or plotted using colplot. You’ll want to include the data files as part of the artifacts saved by your CI tool.
At the beginning of your CI commands, run something similar to the following:
mkdir collectl
nohup collectl --ALL -f collectl/ci-collectl &
collectl will run in the background collecting data from all subsystems that it knows about. This may be overkill, but the overhead is small and you never know what data you may need. The -f option doesn’t specify an exact file name for the data, but a prefix. The host name and timestamp will be appended to the filename. The files will look like:
- collectl/ci-collectl-b147b761927e-20180217-003928.raw
- collectl/ci-collectl-b5595de7e57a-20180425-192552.raw
At the end of the CI process, stop collecting by sending the collectl process a SIGTERM signal:
pkill collectl
sleep 3s
The sleep command may be necessary to allow collectl to finish writing data before exiting. Ensure your CI tool archives the directory where the data files are generated.
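A fixed sleep is a guess at how long collectl needs to flush its data. As a minimal sketch, assuming a POSIX-like shell, the helpers below record the PID of the background process in a file and poll until it has actually exited. The function names, the PID file convention, and the default timeout are hypothetical and not part of collectl.

```shell
# Hypothetical helpers; names and the PID-file convention are illustrative.

# start_profiler PIDFILE COMMAND [ARGS...]
# Runs COMMAND in the background via nohup and saves its PID in PIDFILE.
start_profiler() {
    pidfile=$1
    shift
    nohup "$@" >/dev/null 2>&1 &
    echo "$!" >"$pidfile"
}

# stop_profiler PIDFILE [TIMEOUT_SECONDS]
# Sends SIGTERM, then polls once per second until the process is gone.
# Returns non-zero if the process is still running after the timeout.
stop_profiler() {
    pidfile=$1
    timeout=${2:-10}
    pid=$(cat "$pidfile") || return 1
    kill -TERM "$pid" 2>/dev/null || return 0   # process already exited
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    ! kill -0 "$pid" 2>/dev/null
}
```

In a build, this could be called as start_profiler collectl/collectl.pid collectl --ALL -f collectl/ci-collectl at the start and stop_profiler collectl/collectl.pid at the end. Targeting a single PID also avoids pkill matching unrelated processes with the same name.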
AWS CodeBuild Example
Here is an example using AWS CodeBuild. It is assumed that the Docker image used for the build has collectl installed. Otherwise, commands may be added to the install section to install it.
version: 0.2
phases:
  install:
    commands:
      - mkdir -p collectl
      - nohup collectl --ALL -f collectl/ci-collectl &
  build:
    commands:
      - [build commands]
  post_build:
    commands:
      - pkill collectl ; sleep 3s
artifacts:
  files:
    - collectl/**/*
After the build is complete, you can grab the collectl files from your artifact S3 bucket. Other CI systems can be configured similarly.
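Once the artifacts are downloaded, the directory may hold several data files, one per host and start time. As a minimal sketch, the hypothetical helper below picks the most recently modified file for a given prefix. It assumes the files end in .raw as in the examples above; the .raw.gz pattern is an extra assumption for installations that write compressed files.

```shell
# latest_raw_file DIR PREFIX
# Prints the most recently modified data file in DIR whose name starts with
# PREFIX. Hypothetical helper; the .raw.gz suffix is an assumption.
latest_raw_file() {
    dir=$1
    prefix=$2
    newest=
    for f in "$dir/$prefix"-*.raw "$dir/$prefix"-*.raw.gz; do
        [ -e "$f" ] || continue           # skip patterns that matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, latest_raw_file collectl ci-collectl prints the newest file written with the prefix used in this post, and returns a non-zero status when no file matches.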
Analysis
collectl has support for text based output and plot files. Plot files can be read by several visualization packages to produce graphs. One of them, specific to collectl, is colplot.
The number of output options for collectl is large. This post will only touch on the starting points to get useful information. There are several help options in collectl: collectl -h, collectl -x, collectl -X, and more.
In these examples we assume a data file is present in a directory named input inside the current directory. This is important because we need to mount the directory as a volume in the Docker container so collectl can read it.
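The mount-and-playback pattern can be wrapped in a small function so the data file does not have to live in a directory named input. This is a minimal sketch: the function name and the DOCKER override variable are hypothetical, and --rm is added only so the container is removed after each run. It relies on the image passing its arguments through to collectl, as the commands in this post do.

```shell
# collectl_playback RAW_FILE [COLLECTL_ARGS...]
# Mounts the directory containing RAW_FILE at /input and plays the file back.
# Hypothetical helper; set DOCKER to use a different docker binary.
collectl_playback() {
    raw=$1
    shift
    # Docker volume mounts need an absolute host path.
    dir=$(cd "$(dirname "$raw")" && pwd) || return 1
    "${DOCKER:-docker}" run --rm -v "$dir:/input" \
        pdouble16/collectl-utils:latest \
        -p "/input/$(basename "$raw")" "$@"
}
```

For example, collectl_playback input/ci-collectl-b5595de7e57a-20180425-192552.raw --top should be equivalent to the longer docker run command for the top output.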
Text Based
The collectl command can play back any of the files it has generated, with many options for selecting the kind of data, the time periods, etc.
Top Output
A simulation of the top command can be produced, including sorting by various metrics. Use collectl --showtopopts to see all of the options and sort fields.
# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --top
### RECORD 1 >>> b5595de7e57a <<< (1524684413.001) (Wed Apr 25 19:26:53 2018) ###
# TOP PROCESSES sorted by time (counters are /sec) 19:26:53
# PID User PR PPID THRD S  VSZ  RSS CP SysT  UsrT Pct AccuTime RKB  WKB MajF MinF Command
  241 root 20  203   63 S   3G 635M  1 1.88 53.91  92 00:55.79 272  429    0 2747 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  319 root 20  241   28 S   3G 541M  1 0.82 39.09  66 00:39.91 118   75    0 2357 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  336 root 20  241   15 S   3G 196M  1 0.24  6.16  10 00:06.40   0    2    0  912 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  203 root 20  201   14 S   3G  63M  1 0.10  1.32   2 00:01.42 397    1    2  208 /docker-java-home/bin/java
   21 root 20    1    0 R  65M  26M  0 0.03  0.26   0 00:00.56   0    0    0    1 /usr/bin/perl
   39 root 20   24    7 S 270M  13M  0 0.05  0.10   0 00:00.16   0    0    0    4 docker-containerd
   29 root 20   28    5 S  39M  25M  0 0.01  0.11   0 00:00.12 411    0    4   37 docker
   24 root 20    1    8 S 250M  40M  1 0.01  0.06   0 00:00.19   3    0    0   10 /usr/local/bin/dockerd
    5 root 20    1    7 S  50M  46M  0 0.01  0.00   0 00:23.61 412    0    0    0 /codebuild/readonly/bin/executor
    1 root 20    0    0 S   4M 652K  0 0.00  0.00   0 00:00.00   0    0    0    0 sh
### RECORD 2 >>> b5595de7e57a <<< (1524684473.001) (Wed Apr 25 19:27:53 2018) ###
# TOP PROCESSES sorted by time (counters are /sec) 19:27:53
# PID User PR PPID THRD S  VSZ  RSS CP SysT  UsrT Pct AccuTime RKB  WKB MajF MinF Command
  241 root 20  203   52 S   3G 713M  1 0.99 34.09  58 01:30.87 281 3517    0  341 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
   24 root 20    1   10 S 349M  59M  1 1.14  5.54  11 00:06.87  38 3020    0   90 /usr/local/bin/dockerd
   21 root 20    1    0 R  65M  26M  0 0.03  0.28   0 00:00.88   0    1    0    1 /usr/bin/perl
   39 root 20   24    9 S 351M  15M  0 0.01  0.16   0 00:00.33  88   20    0    7 docker-containerd
  203 root 20  201   13 S   3G  63M  1 0.00  0.10   0 00:01.52   0    1    0    1 /docker-java-home/bin/java
  336 root 20  241   15 S   3G 196M  1 0.00  0.03   0 00:06.43   0    1    0    0 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
    1 root 20    0    0 S   4M 652K  0 0.00  0.00   0 00:00.00   0    0    0    0 sh
    5 root 20    1    7 S  50M  46M  0 0.00  0.00   0 00:23.61   0    0    0    0 /codebuild/readonly/bin/executor
  201 root 20    5    0 S   4M 724K  1 0.00  0.00   0 00:00.00   0    0    0    0 /bin/sh
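With many records, it can help to reduce the top output to the peak CPU percentage seen for each process. The sketch below assumes the 18-column layout shown in this example, where Pct is the 12th field and Command the 18th; the function name is hypothetical, and a different --top sort or collectl version may change the columns.

```shell
# top_peaks: reads collectl --top playback text on stdin and prints
# "PEAK_PCT PID COMMAND" per process, highest peak first.
# Hypothetical helper; assumes Pct is field 12 and Command is field 18.
top_peaks() {
    awk '
        /^#/ || NF < 18 { next }     # skip record banners and header lines
        !($1 in peak) || $12 + 0 > peak[$1] { peak[$1] = $12 + 0 }
        { cmd[$1] = $18 }
        END { for (pid in peak) printf "%s %s %s\n", peak[pid], pid, cmd[pid] }
    ' | sort -k1,1nr -k2,2n
}
```

Piping the playback output into top_peaks gives a short list that is easier to scan than the individual records.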
Subsystem Metrics
Metrics from CPU, disk, network, etc. can be individually selected by specifying the appropriate subsystem.
# docker run pdouble16/collectl-utils:latest --showsubsys
These generate summary, which is the total of ALL data for a particular type
  b - buddy info (memory fragmentation)
  c - cpu
  d - disk
  f - nfs
  i - inodes
  j - interrupts by CPU
  m - memory
  n - network
  s - sockets
  t - tcp
  x - interconnect (currently supported: OFED/Infiniband)
  y - slabs

These generate detail data, typically but not limited to the device level
  C - individual CPUs, including interrupts if -sj or -sJ
  D - individual Disks
  E - environmental (fan, power, temp) [requires ipmitool]
  F - nfs data
  J - interrupts by CPU by interrupt number
  M - memory numa/node
  N - individual Networks
  T - tcp details (lots of data!)
  X - interconnect ports/rails (Infiniband/Quadrics)
  Y - slabs/slubs
  Z - processes

An alternative format lets you add and/or subtract subsystems to the defaults by
immediately following -s with a + and/or - eg:
  -s+YZ-x  adds slabs & processes and removes interconnet summary data
  -s-n     removes network summary data
  -s-all   removes ALL subsystems, something that can handy when playing back
           data collected with --import and you ONLY want to see that data
Here is c – CPU:
# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys c
#<----CPU[HYPER]----->
#cpu sys inter  ctxsw
  82  11  4058   9527
  67   4  2072   2561
  32   4  3853   4983
  13   5  3077   4252
Memory m:
# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys m
#<-----------Memory----------->
#Free Buff Cach Inac Slab  Map
 126M 111M   1G 888M 177M   2G
 899M 111M   1G 875M 177M   1G
 905M 111M   1G 875M 178M   1G
 900M 113M   1G 876M 180M   1G
 899M 113M   1G 875M 180M   1G
 898M 113M   1G 875M 181M   1G
The subsystems can be combined. This shows CPU, Memory, Disk and Network together:
# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys cmdn
#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead Reads KBWrit Writes   KBIn  PktIn  KBOut PktOut
  82  11  4058   9527 126M 111M   1G 888M 177M   2G      4     1   3956    127     41     29      2     30
  67   4  2072   2561 899M 111M   1G 875M 177M   1G      0     0    208     16    735    333     12    189
  32   4  3853   4983 905M 111M   1G 875M 178M   1G      0     0    392     40    845    285     11    158
  13   5  3077   4252 900M 113M   1G 876M 180M   1G      0     0   5020     92      0      0      0      0
  10   2  2000   2958 899M 113M   1G 875M 180M   1G      0     0   3300     60      0      0      0      0
   7   2  2250   3253 898M 113M   1G 875M 181M   1G      0     0   3324     61      0      0      0      0
  44  10  3368   3792 874M 114M   1G 872M 181M   1G   1448    66   2436     48      0      0      0      0
  68  18  4906   9175 879M 116M   1G 857M 184M   1G   2624    42   3632    153      0      0      0      0
  61  14  3145   4551 852M 117M   1G 857M 185M   1G      0     0   2780     56      0      0      0      0
  59  14  3620   4941 863M 117M   1G 857M 185M   1G      0     0   3320     60      0      0      0      0
  82  21  4964  16063 769M 117M   1G 938M 187M   1G      0     0   2920     42      0      0      0      0
  58   5  3972   3893 875M 117M   1G 857M 184M   1G      0     0    260     11    115    556   3705    648
  63   4  9565   6729 875M 117M   1G 857M 184M   1G      0     0    820     52    194   2277  20274   2567
  42   3  6730   5186 875M 117M   1G 857M 184M   1G      0     0      0      0     94   1463  14313   1708
   0   0   876   1489 875M 117M   1G 857M 184M   1G      0     0      0      0      0      0      0      0
   1   0  1054   1794 875M 117M   1G 857M 184M   1G      0     0      0      0      0      0      0      0
   0   0   948   1595 875M 117M   1G 857M 184M   1G      0     0      0      0      0      0      0      2
   0   0   819   1449 875M 117M   1G 857M 184M   1G      0     0      0      0      0      0      0      0
   3   1  1814   2411 874M 117M   1G 856M 184M   1G    104    17    140      8     40     83     37     95
  44   3  2883   3419 873M 117M   1G 854M 184M   1G      0     0     52      3     75    247    143    226
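For a quick number to compare between builds, the per-sample rows can be reduced to an average and a peak. This is a minimal sketch with a hypothetical function name; it assumes the cpu percentage is the first column, as it is in the --subsys c and --subsys cmdn output of this post, and that header lines start with #.

```shell
# cpu_summary: reads collectl text output on stdin and prints the number of
# samples plus the average and maximum of the first column (cpu %).
# Hypothetical helper; assumes cpu is the first field of each data row.
cpu_summary() {
    awk '
        /^#/ || NF == 0 { next }     # skip headers and blank lines
        { n++; sum += $1; if ($1 + 0 > max) max = $1 + 0 }
        END { if (n) printf "samples=%d avg=%.1f max=%d\n", n, sum / n, max }
    '
}
```

Fed the four CPU rows from the --subsys c example (82, 67, 32, 13), it prints samples=4 avg=48.5 max=82.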
Graphical Output
The -P option outputs data as a plot file, which is a text file with one sample per line and fields separated by spaces. The file can be imported into a spreadsheet and graphed. We’re going to look at the colplot tool to do this for us.
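If you do want to go the spreadsheet route, some spreadsheet importers handle CSV more smoothly than space-separated text. The sketch below converts plot-format text to CSV. It assumes the column header is the comment line beginning with #Date and that all other lines starting with # are file headers; that layout is not shown in this post, so check it against your own -P output. The function name is hypothetical.

```shell
# plot_to_csv: reads space-separated plot data on stdin and writes CSV.
# Hypothetical helper; assumes the column header line starts with "#Date".
plot_to_csv() {
    awk '
        /^#Date/ { sub(/^#/, ""); gsub(/ +/, ","); print; next }
        /^#/ || NF == 0 { next }     # drop other header lines and blanks
        { gsub(/ +/, ","); print }
    '
}
```

The converted output can be redirected to a .csv file and opened directly in a spreadsheet.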
The pdouble16/collectl-utils Docker image has both the collectl and colplot programs installed. If the container is started without arguments, there is a web interface for colplot running on container port 80. Issue the following command and point your browser to http://localhost:8080/colplot/.
$ mkdir -p input
$ docker run -d --rm -p 8080:80 -v $(pwd)/input:/input pdouble16/collectl-utils
The user interface allows you to choose the types of graphs and level of detail. The UI is designed primarily for continuous system monitoring, so you’ll need to set the time range to only include your build time.
The output below is summary charts for CPU, Disk, Memory, Network and Swap.
Conclusion
Performance of builds can sometimes be difficult to diagnose. collectl records system-level metrics that can be collected on every build and analyzed later when necessary. It is easy to use and lightweight. I hope this improves your continuous delivery efforts!