May 4, 2016

Graylog with Spring Boot: Open source log and event analysis

One of the most difficult things to do in any application is find a way to make use of the mountains of logs that are generated, especially in a Production environment. Over the years a number of tools have emerged to tackle this problem. Designed as massive data stores and analysis engines, tools like Splunk allow you to collate and process logs from a vast array of sources and manipulate the data they contain into usable alerts, dashboards, and automated actions.

I’m a huge fan of Splunk and have seen it used to great effect in large-scale deployments. It’s user friendly and extremely powerful, but has a huge downside: price. This makes it unattractive for small projects and impractical for personal/hobby development. But all is not lost – Graylog is an open-source, free-to-use alternative that has many of the top features supplied by Splunk. In this article I’ll talk about how I familiarized myself with the basics of Graylog’s recently released version 2 and show off an example Spring Boot app that uses it.

What is it?

The simplest way to describe Graylog is as an event archive and analysis toolkit. The basic idea is that you publish events that happen inside your application and Graylog indexes them into an ElasticSearch cluster, while its own configuration and metadata live in a MongoDB instance. Later, these events can be searched and then combined to generate trends, charts, tables, and other views of that data.

How does data get to it?

A lot of different mechanisms exist. In the example application I show off HTTP and UDP endpoints that accept GELF (Graylog Extended Log Format) data, but you can also configure inputs that accept plaintext or even hook up your OS’s syslog. The real magic comes in how you can interface those endpoints into your application. One of the most powerful approaches is to hook into your existing logging framework, which gives Graylog access to all the significant events you’re already capturing (most notably errors) and dispatches them in the background. This lets you add Graylog to your application transparently, without tightly coupling your code to it.
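
To make the GELF format a little more concrete, here is a minimal sketch that builds a GELF 1.1 message by hand and sends it as a single uncompressed UDP datagram. The class name `GelfUdpSketch`, the host name `quote-service`, and the target `localhost:12201` are assumptions for illustration only; the example app relies on a Logback appender rather than hand-written code like this.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class GelfUdpSketch {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a GELF 1.1 JSON document; custom fields get the required "_" prefix. */
    static String toGelf(String host, String shortMessage, Map<String, Object> extras) {
        StringBuilder json = new StringBuilder("{\"version\":\"1.1\"");
        json.append(",\"host\":\"").append(escape(host)).append('"');
        json.append(",\"short_message\":\"").append(escape(shortMessage)).append('"');
        for (Map.Entry<String, Object> e : extras.entrySet()) {
            json.append(",\"_").append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) json.append(v);
            else json.append('"').append(escape(String.valueOf(v))).append('"');
        }
        return json.append('}').toString();
    }

    /** Sends the JSON as one datagram; large messages would need GELF chunking. */
    static void send(String json, String graylogHost, int port) throws Exception {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(graylogHost), port));
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> extras = new LinkedHashMap<>();
        extras.put("elapsed_time", 1234L);
        extras.put("symbol", "ABC");
        send(toGelf("quote-service", "Generated quote", extras), "localhost", 12201);
    }
}
```

Because UDP is fire-and-forget, `send` succeeds even when nothing is listening, which is part of what makes this transport cheap to bolt onto an existing logger.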

How do I use it once data is there?

At its core Graylog is a facade over an ElasticSearch cluster, with MongoDB holding its configuration. Dynamic queries can be written to find any permutation of event data. From a user interface perspective, most of the focus is on Dashboards. A Dashboard is a collection of widgets, each of which is a chart/table/trend backed by a query. By crafting queries and widgets you can keep track of ongoing events in your application, keep an eye on performance and system load, monitor suspicious events, or track down details on an error one of your users encountered.

You mentioned an example app?

Out on GitHub there is a Spring Boot application that fires up a self-contained (as much as possible) Graylog instance, configures it, and publishes events to it. I say ‘as much as possible’ for the following reasons:

  1. There is no pure-Java MongoDB: I’ve configured an embedded Mongo provider to download binaries at runtime in a way that doesn’t pollute your computer.
  2. Graylog is intended to run as a standalone process – it ships with a control script to launch it within its own JVM instance. I’ve created the necessary Spring configs and beans to get it to launch within the SpringBoot JVM but this comes with its own limitations (namely memory and classpath separation) which may make this configuration impractical for other setups.

 
When the application starts it launches an ElasticSearch node, downloads and launches MongoDB, and initializes the Graylog instance. Once those are all up and running it makes a number of ReST calls to the Graylog service interface to establish a basic configuration and a customized dashboard.
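
As a rough idea of what one of those configuration calls can look like, the sketch below builds a request that asks Graylog to create a GELF UDP input. Treat the details as assumptions rather than the example app's actual code: the base URL `http://localhost:12900`, the `admin`/`admin` credentials, the `/system/inputs` path, the input type string, and the configuration keys are what I would expect for a Graylog 2.0 setup and should be checked against the API browser of your own instance. It uses the `java.net.http` client, so it needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GraylogInputSetupSketch {

    /** JSON body for a global GELF UDP input; title must not need JSON escaping. */
    static String gelfUdpInputJson(String title, int port) {
        return "{"
                + "\"title\":\"" + title + "\","
                + "\"type\":\"org.graylog2.inputs.gelf.udp.GELFUDPInput\","
                + "\"global\":true,"
                + "\"configuration\":{\"bind_address\":\"0.0.0.0\",\"port\":" + port + "}"
                + "}";
    }

    /** Builds (but does not send) the POST request with HTTP Basic authentication. */
    static HttpRequest createInputRequest(String baseUrl, String user, String password, String json) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/system/inputs"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createInputRequest("http://localhost:12900", "admin", "admin",
                gelfUdpInputJson("GELF UDP (sketch)", 12201));
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException | InterruptedException e) {
            System.err.println("Graylog not reachable: " + e.getMessage());
        }
    }
}
```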

Maven

Normally I prefer to use Gradle but in this case I needed to use the perfectly-capable Maven to handle dependencies. Some of Graylog’s dependencies have malformed POM files which Gradle does not handle as gracefully as Maven.

The Maven build, other than handling the download of dependencies, also takes care of some minor setup by extracting a ZIP archive containing some native libraries needed by Graylog – more on these a bit later.

Spring Boot

The only major tweak I had to make related to SpringBoot was to use a Milestone version of the as-yet-unreleased version 1.4.0. This was primarily to get the benefit of it upgrading to a newer version of ElasticSearch that is compatible with Graylog. Other than that all of the Spring code should be pretty familiar to most Java developers.

One item that may be of interest is that I’ve set up a Swagger Rest API for my application (in addition to the one Graylog supplies) that allows you to view and exercise the very simple Rest service the app provides. It uses a library called SpringFox which I highly recommend taking a look at if you develop Rest services in Spring.

MongoDB

I’ve chosen to use an excellent embedded MongoDB library and configure it using a Singleton Spring bean. Most of this configuration isn’t strictly necessary but helps in keeping all the files that get downloaded in a single place so they get deleted once you’re done with the example.

    // Download Mongo artifacts into the project directory to ease cleanup
    IDownloadConfig downloadConfig = new DownloadConfigBuilder()
            .defaultsForCommand(Command.MongoD)
            .artifactStorePath(ARTIFACT_STORE_PATH)
            .build();
 
    // Extract Mongo artifacts into the project directory to ease cleanup
    IRuntimeConfig runtimeConfig = new RuntimeConfigBuilder()
            .defaults(Command.MongoD)
            .artifactStore(new ExtractedArtifactStoreBuilder()
                    .defaults(Command.MongoD)
                    .download(downloadConfig)
                    .extractDir(EXTRACTED_STORE_PATH)
            )
            .build();
 
    // Store Mongo data into the project directory to ease cleanup
    Storage replication = new Storage("./data/mongodb/data", null, 0);
 
    MongodStarter starter = MongodStarter.getInstance(runtimeConfig);
 
    IMongodConfig mongodConfig = new MongodConfigBuilder()
            .version(Version.Main.PRODUCTION)
            .cmdOptions(new MongoCmdOptionsBuilder()
                    .useNoJournal(false)
                    .useSmallFiles(true)
                    .build())
            .net(new Net(MongoProperties.DEFAULT_PORT, Network.localhostIsIPv6()))
            .replication(replication)
            .build();
 
    mongo = starter.prepare(mongodConfig);
    process = mongo.start();

This starts up a Mongo instance that can be interacted with as if it were a standalone instance. When the Spring Boot app stops, the Mongo instance stops as well.

ElasticSearch

Nothing is needed to get ElasticSearch up and running other than including the `spring-boot-starter-data-elasticsearch` dependency in the project’s build file. SpringBoot auto-detects its presence on the classpath and starts up the ElasticSearch node. A bit of configuration is necessary in the application config file to get the appropriate type of node set up so that Graylog can work with it:

    spring:
      data:
        elasticsearch:
          clusterName: "graylog"
          properties:
            node:
              local: false
              data: true
            http:
              enabled: true

Graylog

This is the trickiest part – Graylog is a Java application but its API isn’t really designed to be started as part of another program. The first trick is to extract as much as possible from Graylog’s normal boot path and embed it in a Singleton bean, and the easiest way to make this work is to piggyback on Graylog’s built-in launcher that expects to be triggered from the command line.

    // Use the CommandLine launch parser which takes care of triggering all the sub-modules
    // Graylog needs to launch.
    final Cli.CliBuilder builder = Cli.builder("graylog")
            .withDefaultCommand(Help.class)
            .withCommands(Help.class, ShowVersion.class);
    ServiceLoader<CliCommandsProvider> commandsProviders = ServiceLoader.load(CliCommandsProvider.class);
    // Load all Graylog command handlers so we can process the CLI command
    for (CliCommandsProvider commandsProvider : commandsProviders) {
        commandsProvider.addTopLevelCommandsOrGroups(builder);
    }
 
    final Cli cli = builder.build();
    // Run the actual CLI command through the parser
    final Runnable command = cli.parse("server", "-f", graylogConfigFile);
    // Build the command (which is a Runnable) and start into another Thread - necessary
    // since the runnable blocks on the Graylog server process
    final Thread graylogThread = new Thread(command);
    graylogThread.start();

Next, I had to configure the SIGAR library. This is a set of native libraries that Graylog accesses via JNI, so it has to be put onto the special Java library path. In case you aren’t familiar with JNI, this library path is entirely separate from the standard Java classpath. For more information on JNI take a look at my post on the subject.

    System.setProperty("java.library.path", "./sigar-libs");
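
One thing to be aware of is that the line above replaces whatever library path the JVM was started with. A slightly more careful variant prepends the directory instead, as in the sketch below. The class and method names are hypothetical. Also note that changing `java.library.path` at runtime generally does not affect `System.loadLibrary`, so this only helps with libraries that read the property themselves, which is the behavior the line above relies on for SIGAR.

```java
import java.io.File;

final class LibraryPathSketch {

    /** Returns the given path list with dir prepended, unless dir is already listed. */
    static String prepend(String existing, String dir) {
        if (existing == null || existing.isEmpty()) {
            return dir;
        }
        // File.pathSeparator is ":" or ";", neither of which is special in a regex
        for (String entry : existing.split(File.pathSeparator)) {
            if (entry.equals(dir)) {
                return existing;
            }
        }
        return dir + File.pathSeparator + existing;
    }

    /** Updates the java.library.path system property, keeping existing entries. */
    static void addToLibraryPath(String dir) {
        System.setProperty("java.library.path",
                prepend(System.getProperty("java.library.path"), dir));
    }

    public static void main(String[] args) {
        addToLibraryPath("./sigar-libs");
        System.out.println(System.getProperty("java.library.path"));
    }
}
```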

The last thing I had to do was the only really tricky part. Fortunately, Graylog supplies a live Swagger specification for its ReST API, but unfortunately it’s configured to use the System classloader rather than the Thread Context classloader. This causes issues when run within a container like SpringBoot. To work around this problem I had to use some very inadvisable Reflection to swap out the System classloader with a custom one that would delegate to the Thread Context when a Swagger resource was being loaded. I’ll be submitting a PR to Graylog to hopefully resolve this issue in the near future.

    try {
        Field sclField = ReflectionUtils.findField(ClassLoader.class, "scl");
        ReflectionUtils.makeAccessible(sclField);
        ReflectionUtils.setField(sclField, null, new DecoratingClassLoader() {
            @Override
            public URL getResource(String name) {
                if (name.startsWith("swagger/")) {
                    return Thread.currentThread().getContextClassLoader().getResource(name);
                }
                return super.getResource(name);
            }
        });
    } catch(Exception e) {
        LOGGER.error("Failed to replace SystemClassLoader, this means Graylog's Swagger won't work.", e);
    }

Graylog works just fine without doing this, but for this example I wanted to make sure the documentation was functional.
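
If the reflection gets in the way of seeing the actual idea, here is the delegation pattern on its own, using only the JDK. This is a simplified sketch, not the code from the example app: `PrefixDelegatingClassLoader` is a hypothetical name, and it extends plain `ClassLoader` instead of Spring's `DecoratingClassLoader`. The delegate is passed as a `Supplier` so that the context classloader is looked up at the moment of each request, as in the snippet above.

```java
import java.net.URL;
import java.util.function.Supplier;

/** Routes resource lookups under one prefix to a different class loader. */
final class PrefixDelegatingClassLoader extends ClassLoader {
    private final String prefix;
    private final Supplier<ClassLoader> delegate;

    PrefixDelegatingClassLoader(ClassLoader parent, String prefix, Supplier<ClassLoader> delegate) {
        super(parent);
        this.prefix = prefix;
        this.delegate = delegate;
    }

    @Override
    public URL getResource(String name) {
        if (name.startsWith(prefix)) {
            // Resolve the delegate lazily so the calling thread's loader is used
            return delegate.get().getResource(name);
        }
        return super.getResource(name);
    }

    public static void main(String[] args) {
        ClassLoader loader = new PrefixDelegatingClassLoader(
                ClassLoader.getSystemClassLoader(),
                "swagger/",
                () -> Thread.currentThread().getContextClassLoader());
        // Served by the context class loader (null if no such resource exists)
        System.out.println(loader.getResource("swagger/index.html"));
        // Served through the normal parent-first lookup
        System.out.println(loader.getResource("java/lang/Object.class"));
    }
}
```

Only `getResource` is overridden here, so class loading itself is untouched; that keeps the blast radius of the workaround as small as possible.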

Publishing data to Graylog

Now that we have Graylog configured we need to start pushing data to it. I set up two different paths: a UDP interface that publishes anything sent to a specific Logback logger, and a ReST endpoint. To use these I set up one Aspect that captures calls to a specific service and another that captures errors thrown by that service.

    @Inject
    protected ObjectMapper mapper;
 
    @Inject
    protected GraylogRestInterface graylog;
 
    @Around("execution(* com.objectpartners.plummer.graylog.service.QuoteService.*(..))")
    public QuoteResource errorThrown(ProceedingJoinPoint joinPoint) throws Throwable {
        Stopwatch timer = Stopwatch.createStarted();
 
        QuoteResource quote = (QuoteResource) joinPoint.proceed();
 
        // Log message via Graylog UDP Logback Appender
        LOGGER.info("Generated quote - {}@{}", quote.getSymbol(), quote.getPrice());
 
        // Log message via Graylog HTTP Input
        GelfMessage message = new GelfMessage();
        message.setShortMessage(String.format("Quote %s@%2.2f", quote.getSymbol(), quote.getPrice()));
        message.setFullMessage(mapper.writeValueAsString(quote));
        message.getAdditionalProperties().put("elapsed_time", timer.stop().elapsed(TimeUnit.MICROSECONDS));
        graylog.logEvent(message);
 
        return quote;
    }
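
For readers who want to try the HTTP path without Spring, AOP, or Guava, the same flow can be sketched with the JDK alone: time a call, build a GELF document carrying the elapsed time, and POST it to a GELF HTTP input. The names `TimedGelfHttpSketch`, `timed`, `gelfJson`, and `gelfPost`, the host value `quote-service`, and the URL `http://localhost:12201/gelf` are assumptions for illustration, and the code needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

final class TimedGelfHttpSketch {

    /** Runs the call and reports its duration in microseconds, even if it throws. */
    static <T> T timed(Supplier<T> call, LongConsumer elapsedMicros) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            elapsedMicros.accept(TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start));
        }
    }

    /** Minimal GELF 1.1 document; shortMessage must not need JSON escaping. */
    static String gelfJson(String shortMessage, long elapsedMicros) {
        return "{\"version\":\"1.1\",\"host\":\"quote-service\","
                + "\"short_message\":\"" + shortMessage + "\","
                + "\"_elapsed_time\":" + elapsedMicros + "}";
    }

    /** Builds (but does not send) the POST to a GELF HTTP input. */
    static HttpRequest gelfPost(String inputUrl, String json) {
        return HttpRequest.newBuilder(URI.create(inputUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        long[] micros = new long[1];
        String quote = timed(() -> "ABC@12.34", m -> micros[0] = m);
        HttpRequest request = gelfPost("http://localhost:12201/gelf",
                gelfJson("Quote " + quote, micros[0]));
        try {
            HttpResponse<Void> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println("Graylog answered with HTTP " + response.statusCode());
        } catch (IOException | InterruptedException e) {
            System.err.println("Could not reach Graylog: " + e.getMessage());
        }
    }
}
```

In a real service you would want to send the message asynchronously so that a slow or unavailable Graylog never delays the business call being measured.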

Graylog in Action

So, after all this heavy lifting what is the end result?

Example Graylog dashboard

The image above is of the customized dashboard the example sets up. This is a fairly simple example but you can see four different widgets each showing off a different method of tracking and displaying data to the user.

  • Quote count: Counts the total number of Quote objects the SpringBoot app has generated
  • Symbol Breakdown: Tracks the unique stock symbols that have been requested, broken down by most common
  • Generation time: Tracks the average time it took to generate quotes over time
  • Errors: All exceptions that are logged get displayed here for easy perusal

Dashboards are built from ElasticSearch queries that can be easily constructed using the Graylog web GUI.

Every message that is sent to Graylog is stored in ElasticSearch and can be inspected and queried, which aids debugging and troubleshooting in the future. Assuming you send the appropriate data into Graylog this means you may never have to dig through application logs again. Below you can see an example of an archived event in Graylog which was automatically parsed from JSON and put into an easy-to-read and easy-to-search format.

Graylog JSON message inspector

 

What would it take to use Graylog in my project?

For the purposes of the example I’m doing a lot that you shouldn’t do in a production application. If all you need is a prototyping setup or a ready-to-go development configuration the example can be used to get up and running quickly. Beyond that, Graylog has a ready-to-go Docker image that you can easily combine with the Mongo and ElasticSearch images for a more stable and realistic Graylog setup if that is desired. In a true production environment it would be advisable to use a standalone MongoDB instance for better performance as well as a true ElasticSearch cluster to aid distributed processing and storage.

Wrap Up

If Graylog sounds interesting to you I highly recommend taking a look. They’ve got excellent documentation and adding it to your application will definitely make your life easier in the future. If you have any questions about the example application or the article please leave them below. Otherwise, happy coding!

About the Author

Object Partners profile.

One thought on “Graylog with Spring Boot: Open source log and event analysis”

  1. Mustafa says:

    Thanks for the article. I am wondering why you did not install Graylog separately and only send data to it from your Spring application?

    1. Mike Plummer says:

      Hi Mustafa! You make a very good point – normally you would stand up separate Graylog, ElasticSearch, and MongoDB instances. I chose to combine them here as a very simple demonstration of Graylog in action and a prototype for a minimally-invasive way to experiment with the tool. Separating out the example app into multiple deployables would have made it very complex to run and required a lot of dev environment setup. I did experiment a little bit with using Docker to split the difference but I decided that it wasn’t viable at the time since Docker hadn’t yet gained full support on Mac & Windows.

      Thanks for reading! Let me know if you have any other questions.

  2. dkopack says:

    Using something like docker-compose would make more sense here. It would be minimally invasive for experimentation and it would actually be an architecture you could deploy to production.

    1. Mike Plummer says:

      Excellent point. Unfortunately, at the time I wrote this post (well over a year ago) Docker & Docker Compose weren’t 100% stable on Mac & Windows, thus I had to make do with this solution.

      Adding Docker support would be really great if you think it’s finally stable across the major platforms. Feel free to submit a PR to the GitHub repo if you’d like to contribute!

  3. Koman says:

    Fantastic article and example. Thank you.

    Quick question – I cannot for the life of me work out where the logs are for the Graylog HTTP input. Any suggestions would be wonderful.

    I am busy implementing your ideas for our go-live next week. Wish me luck 🙂

    1. Mike Plummer says:

      In the example Graylog isn’t configured to generate its own audit logfiles. You might try taking a look at this page for info on how to enable those – Graylog is technically a separate application we’re shoving into Spring Boot for the purposes of this example so it doesn’t integrate with the Spring Boot logback capability but instead uses its own Log4J system. It’s quite a bit simpler if you install Graylog and use it as designed as a standalone system using a DEB or RPM package since those come with default file locations pre-configured (see here).

      Good luck with your go-live!

  4. tianfuan says:

    Hello, I want to ask you, in your GelfMessage class, attribute host, shortMessage and fullMessage do not have get and set methods. How do other classes and themselves call?

    1. Mike Plummer says:

      This project uses the Lombok library which dynamically constructs bytecode for common boilerplate like getters, setters, and constructors. See the documentation for the “@Data” annotation on the class you reference.

  5. tianfuan says:

    thank you! If I wanna connect my own MongoDB database, how should I change it?

  6. tianfuan says:

    What should I do if I want to change the shape of the pie chart?

  7. Baliram Babar says:

    I am using Linux. I configured MongoDB, ElasticSearch, and Graylog, added the dependency, and wrote a logback.xml, but log data is still not coming into Graylog. I am using Spring Boot with microservices. Please help me.

  8. Baliram Babar says:

    Graylog implementation

  9. Baliram Babar says:

    I installed MongoDB, ElasticSearch, and Graylog on Ubuntu, added the dependency, and added logback.xml to the microservice, but I am still facing the problem of log data not being sent to Graylog.
    Please help me understand how to use Graylog in Spring Boot with microservices.

    1. Mike Plummer says:

      If you could post a link to a GitHub repo I can take a quick look
