May 16, 2018

CI Profiling Using collectl

Continuous integration is a best practice that helps enable rapid software delivery, and the faster CI runs, the better. So how can a CI build be profiled effectively? The collectl package is a good utility for recording system performance on Linux/Unix based systems for every CI run. It is simple to use and adds little overhead while collecting data. This post will show you how to use it in your CI builds.

Installation

Your package manager may have the collectl package available. Debian Stretch is one Linux distribution that has it.
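
For example, on a Debian-based system (assuming the package is available in your configured repositories):

sudo apt-get update
sudo apt-get install -y collectl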

If you are using Docker, you can try your base image package manager or use a multi-stage build to get it from the pdouble16/collectl-utils image.

# first stage provides the collectl executable
FROM pdouble16/collectl-utils:latest
# final stage copies it into the build image
FROM debian:stretch-slim
COPY --from=0 /usr/bin/collectl /usr/bin/collectl

Collecting Data

Collecting data in CI consists of running the collectl process in the background and saving the data into one or more files. The files will be processed later by collectl or plotted using colplot. You’ll want to include the data files as part of the artifacts saved by your CI tool.

At the beginning of your CI commands, run something similar to the following:

# create a directory for the data files and start collectl in the background
mkdir collectl
nohup collectl --ALL -f collectl/ci-collectl &

collectl will run in the background collecting data from all of the subsystems that it knows about. This may be overkill, but the overhead is small and you never know what data you may need. The -f option doesn’t specify an exact file name for the data, but a prefix: the host name and a timestamp are appended. The files will look like:

  • collectl/ci-collectl-b147b761927e-20180217-003928.raw
  • collectl/ci-collectl-b5595de7e57a-20180425-192552.raw

At the end of the CI process, stop collecting by sending the collectl process a SIGTERM signal:

# pkill sends SIGTERM by default
pkill collectl
sleep 3s

The sleep command may be necessary to allow collectl to finish writing data before exiting. Ensure your CI tool archives the directory where the data files are generated.
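
If your CI runner keeps a single shell session for the whole job, an alternative sketch is to track the process ID explicitly instead of relying on a fixed sleep (the COLLECTL_PID variable name is my own):

nohup collectl --ALL -f collectl/ci-collectl &
COLLECTL_PID=$!

# ... build steps ...

# SIGTERM asks collectl to flush and exit; wait blocks until it has finished writing
kill -TERM "$COLLECTL_PID"
wait "$COLLECTL_PID" 2>/dev/null || true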

AWS CodeBuild Example

Here is an example using AWS CodeBuild. It assumes that the Docker image used for the build has collectl installed; otherwise, commands can be added to the install phase to install it.

version: 0.2
phases:
  install:
    commands:
      # start collectl before the build so the entire run is profiled
      - mkdir -p collectl
      - nohup collectl --ALL -f collectl/ci-collectl &
  build:
    commands:
      - [build commands]
  post_build:
    commands:
      # stop collectl and give it time to flush before artifacts are gathered
      - pkill collectl ; sleep 3s
artifacts:
  files:
    - collectl/**/*

After the build is complete, you can grab the collectl files from your artifact S3 bucket. Other CI systems can be configured similarly.
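
For example, the AWS CLI can pull the data down into the input directory used in the analysis below; the bucket and key prefix here are placeholders for your project’s values, and depending on your artifact packaging setting you may instead receive a single zip to extract:

$ mkdir -p input
$ aws s3 cp s3://my-artifact-bucket/my-build/collectl/ ./input/ --recursive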

Analysis

collectl supports text-based output as well as plot files. Plot files can be read by several visualization packages to produce graphs; one of them, built specifically for collectl, is colplot.

collectl has a large number of output options, and this post only touches on the starting points for getting useful information. Several help screens are built in: collectl -h, collectl -x, collectl -X, and more.

In these examples, we assume a data file is present in a directory named input under the current directory. This matters because the directory is mounted as a volume in the Docker container so that collectl can read it.

Text Based

The collectl command can play back any of the files it has generated, with many options for selecting the kind of data, the time period, and so on.

Top Output

A simulation of the top command can be produced, including sorting by various metrics. Use collectl --showtopopts to see all of the options and sort fields.
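
That help screen can be run through the same Docker image used for playback; its output is omitted here for brevity:

# docker run pdouble16/collectl-utils:latest --showtopopts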

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --top
 
### RECORD    1 >>> b5595de7e57a <<< (1524684413.001) (Wed Apr 25 19:26:53 2018) ###
 
# TOP PROCESSES sorted by time (counters are /sec) 19:26:53
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
  241  root     20   203   63 S    3G  635M  1  1.88 53.91  92  00:55.79  272  429    0 2747 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  319  root     20   241   28 S    3G  541M  1  0.82 39.09  66  00:39.91  118   75    0 2357 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  336  root     20   241   15 S    3G  196M  1  0.24  6.16  10  00:06.40    0    2    0  912 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
  203  root     20   201   14 S    3G   63M  1  0.10  1.32   2  00:01.42  397    1    2  208 /docker-java-home/bin/java
   21  root     20     1    0 R   65M   26M  0  0.03  0.26   0  00:00.56    0    0    0    1 /usr/bin/perl
   39  root     20    24    7 S  270M   13M  0  0.05  0.10   0  00:00.16    0    0    0    4 docker-containerd
   29  root     20    28    5 S   39M   25M  0  0.01  0.11   0  00:00.12  411    0    4   37 docker
   24  root     20     1    8 S  250M   40M  1  0.01  0.06   0  00:00.19    3    0    0   10 /usr/local/bin/dockerd
    5  root     20     1    7 S   50M   46M  0  0.01  0.00   0  00:23.61  412    0    0    0 /codebuild/readonly/bin/executor
    1  root     20     0    0 S    4M  652K  0  0.00  0.00   0  00:00.00    0    0    0    0 sh
 
### RECORD    2 >>> b5595de7e57a <<< (1524684473.001) (Wed Apr 25 19:27:53 2018) ###
 
# TOP PROCESSES sorted by time (counters are /sec) 19:27:53
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
  241  root     20   203   52 S    3G  713M  1  0.99 34.09  58  01:30.87  281 3517    0  341 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
   24  root     20     1   10 S  349M   59M  1  1.14  5.54  11  00:06.87   38 3020    0   90 /usr/local/bin/dockerd
   21  root     20     1    0 R   65M   26M  0  0.03  0.28   0  00:00.88    0    1    0    1 /usr/bin/perl
   39  root     20    24    9 S  351M   15M  0  0.01  0.16   0  00:00.33   88   20    0    7 docker-containerd
  203  root     20   201   13 S    3G   63M  1  0.00  0.10   0  00:01.52    0    1    0    1 /docker-java-home/bin/java
  336  root     20   241   15 S    3G  196M  1  0.00  0.03   0  00:06.43    0    1    0    0 /usr/lib/jvm/java-8-openjdk-amd64/bin/java
    1  root     20     0    0 S    4M  652K  0  0.00  0.00   0  00:00.00    0    0    0    0 sh
    5  root     20     1    7 S   50M   46M  0  0.00  0.00   0  00:23.61    0    0    0    0 /codebuild/readonly/bin/executor
  201  root     20     5    0 S    4M  724K  1  0.00  0.00   0  00:00.00    0    0    0    0 /bin/sh
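
To sort by a different metric, pass one of the sort fields reported by --showtopopts to --top; resident set size (rss) is used here as an illustration, and the exact field names may vary by collectl version:

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --top rss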

Subsystem Metrics

Metrics for CPU, disk, network, and so on can be selected individually by specifying the appropriate subsystems.

# docker run pdouble16/collectl-utils:latest --showsubsys
 
These generate summary, which is the total of ALL data for a particular type
  b - buddy info (memory fragmentation)
  c - cpu
  d - disk
  f - nfs
  i - inodes
  j - interrupts by CPU
  m - memory
  n - network
  s - sockets
  t - tcp
  x - interconnect (currently supported: OFED/Infiniband)
  y - slabs
 
These generate detail data, typically but not limited to the device level
 
  C -  individual CPUs, including interrupts if -sj or -sJ
  D -  individual Disks
  E -  environmental (fan, power, temp) [requires ipmitool]
  F -  nfs data
  J -  interrupts by CPU by interrupt number
  M -  memory numa/node
  N -  individual Networks
  T -  tcp details (lots of data!)
  X -  interconnect ports/rails (Infiniband/Quadrics)
  Y -  slabs/slubs
  Z -  processes
 
An alternative format lets you add and/or subtract subsystems to the defaults by
immediately following -s with a + and/or -
  eg: -s+YZ-x adds slabs & processes and removes interconnet summary data
      -s-n removes network summary data
      -s-all removes ALL subsystems, something that can handy when playing back
             data collected with --import and you ONLY want to see that data
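
As a quick sketch of that +/- syntax applied to playback, the following drops the network summary from what would otherwise be shown:

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw -s-n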

Here is c – CPU:

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys c
 
#<----CPU[HYPER]----->
#cpu sys inter  ctxsw
  82  11  4058   9527
  67   4  2072   2561
  32   4  3853   4983
  13   5  3077   4252

Memory m:

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys m
 
#<-----------Memory----------->
#Free Buff Cach Inac Slab  Map
 126M 111M   1G 888M 177M   2G
 899M 111M   1G 875M 177M   1G
 905M 111M   1G 875M 178M   1G
 900M 113M   1G 876M 180M   1G
 899M 113M   1G 875M 180M   1G
 898M 113M   1G 875M 181M   1G

The subsystems can be combined. This shows CPU, Memory, Disk and Network together:

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys cmdn
 
#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  82  11  4058   9527 126M 111M   1G 888M 177M   2G      4      1   3956    127     41     29      2      30
  67   4  2072   2561 899M 111M   1G 875M 177M   1G      0      0    208     16    735    333     12     189
  32   4  3853   4983 905M 111M   1G 875M 178M   1G      0      0    392     40    845    285     11     158
  13   5  3077   4252 900M 113M   1G 876M 180M   1G      0      0   5020     92      0      0      0       0
  10   2  2000   2958 899M 113M   1G 875M 180M   1G      0      0   3300     60      0      0      0       0
   7   2  2250   3253 898M 113M   1G 875M 181M   1G      0      0   3324     61      0      0      0       0
  44  10  3368   3792 874M 114M   1G 872M 181M   1G   1448     66   2436     48      0      0      0       0
  68  18  4906   9175 879M 116M   1G 857M 184M   1G   2624     42   3632    153      0      0      0       0
  61  14  3145   4551 852M 117M   1G 857M 185M   1G      0      0   2780     56      0      0      0       0
  59  14  3620   4941 863M 117M   1G 857M 185M   1G      0      0   3320     60      0      0      0       0
  82  21  4964  16063 769M 117M   1G 938M 187M   1G      0      0   2920     42      0      0      0       0
  58   5  3972   3893 875M 117M   1G 857M 184M   1G      0      0    260     11    115    556   3705     648
  63   4  9565   6729 875M 117M   1G 857M 184M   1G      0      0    820     52    194   2277  20274    2567
  42   3  6730   5186 875M 117M   1G 857M 184M   1G      0      0      0      0     94   1463  14313    1708
   0   0   876   1489 875M 117M   1G 857M 184M   1G      0      0      0      0      0      0      0       0
   1   0  1054   1794 875M 117M   1G 857M 184M   1G      0      0      0      0      0      0      0       0
   0   0   948   1595 875M 117M   1G 857M 184M   1G      0      0      0      0      0      0      0       2
   0   0   819   1449 875M 117M   1G 857M 184M   1G      0      0      0      0      0      0      0       0
   3   1  1814   2411 874M 117M   1G 856M 184M   1G    104     17    140      8     40     83     37      95
  44   3  2883   3419 873M 117M   1G 854M 184M   1G      0      0     52      3     75    247    143     226
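
If the recording covers more than the build itself, playback can be narrowed to a time window with --from and --thru (the times below are illustrative; see collectl -x for the accepted formats):

# docker run -v $(pwd)/input:/input pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw --subsys cmdn --from 19:26 --thru 19:40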

Graphical Output

The -P option outputs data in plot format: a text file with one sample per line and fields separated by spaces. Such a file can be imported into a spreadsheet and graphed; a quick sketch of generating plot files follows, after which we’ll use the colplot tool to do the graphing for us.
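
Here is a sketch, assuming a /plot output directory mounted into the container (the file names collectl writes depend on the subsystems selected):

$ mkdir -p plot
$ docker run -v $(pwd)/input:/input -v $(pwd)/plot:/plot pdouble16/collectl-utils:latest -p /input/ci-collectl-b5595de7e57a-20180425-192552.raw -P -f /plot --subsys cmdn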

The pdouble16/collectl-utils Docker image has both the collectl and colplot programs installed. If the container is started without arguments, there is a web interface for colplot running on container port 80. Issue the following command and point your browser to http://localhost:8080/colplot/.

$ mkdir -p input
$ docker run -d --rm -p 8080:80 -v $(pwd)/input:/input pdouble16/collectl-utils

The user interface allows you to choose the types of graphs and the level of detail. The UI is designed primarily for continuous system monitoring, so you’ll need to set the time range to cover only your build.

The summary charts below cover CPU, Disk, Memory, Network and Swap.

Conclusion

The performance of builds can sometimes be difficult to diagnose. collectl records system-level metrics that can always be captured and then analyzed later when necessary. It is easy to use and lightweight. I hope this improves your continuous delivery efforts!

About the Author


Patrick Double

Principal Technologist

I have been coding since 6th grade, circa 1986, and professionally (i.e. post-college) since 1998, when I graduated from the University of Nebraska-Lincoln. Most of my career has been in web applications using JEE. I work the entire stack from user interface to database. I especially like solving application security and high availability problems.
