Sep 26, 2017

Masking sensitive data in Log4j 2

A growing practice across many organizations is to log as much information as is feasible, to allow for better debugging and auditing. Tools like Splunk and ELK make it even easier to index the logs, treating them almost like databases. However, with PCI and HIPAA standards, those same organizations may want to mask much of that data to prevent unauthorized or unprotected access to it. In this blog post I’ll detail one potential approach to masking that data, so developers do not need to worry about filtering individual log statements.

Prerequisites

You’re going to need to use Log4j 2 (potentially with SLF4J as well).  A sample pom.xml for just these dependencies would include the lines:
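As a rough sketch, assuming a Maven build and the standard Log4j 2 artifact coordinates, the dependency section might look something like the following. The `log4j2.version` property is a placeholder introduced here for illustration; define it with whichever release you actually use.

```xml
<!-- Sketch only: log4j2.version is a placeholder property you must define yourself -->
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>${log4j2.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>${log4j2.version}</version>
    </dependency>
    <!-- Only needed if your code logs through the SLF4J 1.x API -->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>${log4j2.version}</version>
    </dependency>
</dependencies>
```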

If you’re using a different logging framework, then I imagine this guide may not be very helpful.

Setup

I’m going to dive right in, as there are a few different files we need to create or modify to get log masking to work. The first file we will create is a pretty basic one. It’s going to hold all of our logging markers, so that we can tell Log4j to only run the masking on the log statements that need it. Masking our logs means we’re taking a performance hit, so we should not do it any more than we need to:
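A minimal sketch of such a marker holder could look like this. The class name `LoggingMarkers` and the marker names `"JSON"` and `"XML"` are assumptions made for illustration, and the sketch uses the Log4j 2 API markers rather than SLF4J ones.

```java
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;

// Hypothetical holder class: one shared Marker per kind of content we may want to mask.
public final class LoggingMarkers {
    public static final Marker JSON = MarkerManager.getMarker("JSON");
    public static final Marker XML = MarkerManager.getMarker("XML");

    private LoggingMarkers() {
        // constants only, not meant to be instantiated
    }
}
```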

I’ve got two basic Markers in that class, one for JSON, and one for XML. You can define as many as you need — for different content types, data types, etc. For this tutorial we’re only going to be using the JSON marker.

Let’s continue by extending the LogEventPatternConverter:
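What follows is a hedged sketch of such a converter rather than a definitive implementation. The class name `LogMaskingConverter`, the `"ssn"` field, and the `****` mask are all assumptions chosen for illustration, and the marker is matched by its name (`"JSON"`) to keep the example self-contained.

```java
import java.util.regex.Pattern;

import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.pattern.ConverterKeys;
import org.apache.logging.log4j.core.pattern.LogEventPatternConverter;

// Registers the converter as a Log4j plugin that handles the "%cm" pattern key.
@Plugin(name = "LogMaskingConverter", category = "Converter")
@ConverterKeys({"cm"})
public class LogMaskingConverter extends LogEventPatternConverter {
    private static final String NAME = "cm";

    // Assumption: we only mask the string value of a JSON field called "ssn".
    private static final Pattern SSN_FIELD =
            Pattern.compile("(\"ssn\"\\s*:\\s*\")[^\"]*(\")");

    private LogMaskingConverter(String[] options) {
        super(NAME, NAME);
    }

    // Static factory that Log4j's pattern parser looks up reflectively.
    public static LogMaskingConverter newInstance(final String[] options) {
        return new LogMaskingConverter(options);
    }

    @Override
    public void format(LogEvent event, StringBuilder outputMessage) {
        String message = event.getMessage().getFormattedMessage();
        Marker marker = event.getMarker();
        // Only pay the masking cost for statements logged with the JSON marker.
        if (marker != null && "JSON".equals(marker.getName())) {
            message = mask(message);
        }
        outputMessage.append(message);
    }

    private String mask(String message) {
        // Keep the field name and quotes (groups 1 and 2), replace only the value.
        return SSN_FIELD.matcher(message).replaceAll("$1****$2");
    }
}
```

Application code never calls `newInstance()` directly; Log4j invokes it when it parses a pattern containing the converter key.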

Ok, let’s stop and analyze the important bits. The “ConverterKeys” value and the “NAME” field we pass to the LogEventPatternConverter define the pattern key that we will reference in our “log4j2.xml” config; without it in the pattern, our masking never runs. I believe you cannot override the default “%m”, so we are defining our own custom pattern “cm”. In fact, we will call “cm” INSTEAD of the default “m” in our configuration.

Next, the constructor and the “newInstance()” method are required for our converter to be properly invoked by Log4j. The “format()” method holds the crux of our work. You can see that it takes the formatted message, and returns it unchanged if we do not have any Markers for the current logging statement. If we DO have markers (for example, our JSON one), then and only then will we attempt to mask the message.

I’ve implemented a simple JSON regex replacement for the mask method, but there are many different approaches you can take: you can hydrate the JSON and replace the values based on name/path, you can inspect an object to see if it’s annotated with a “DoNotMask” annotation, or you can even define simple regex values to replace (e.g. credit cards, SSNs). The implementation I provide is meant as a proof-of-concept example, and is not prod-ready. Also, if you DO decide to implement multiple strategies for different markers, it makes sense to move that logic into specific classes (I have included everything in one file for simplicity).
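To make the regex approach concrete, here is a small standalone sketch that uses only the JDK. The class name `JsonFieldMasker`, the configurable field list, and the `****` mask are assumptions for illustration. It only handles string-valued fields and, like any regex over JSON, it is a proof of concept rather than a parser.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: masks the string values of the named JSON fields.
public final class JsonFieldMasker {
    private static final String MASK = "****";
    private final Pattern pattern;

    public JsonFieldMasker(List<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        // Quote each name so characters like '.' are matched literally.
        String names = fieldNames.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Group 1: "name": "   then the value (allowing escaped characters)   Group 2: closing quote
        this.pattern = Pattern.compile(
                "(\"(?:" + names + ")\"\\s*:\\s*\")(?:[^\"\\\\]|\\\\.)*(\")");
    }

    public String mask(String json) {
        // Keep the field name and quotes, replace only the value.
        return pattern.matcher(json)
                .replaceAll("$1" + Matcher.quoteReplacement(MASK) + "$2");
    }
}
```

With `new JsonFieldMasker(Arrays.asList("ssn", "cardNumber"))`, fields other than those two pass through untouched, and text that contains no matching field is returned unchanged.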

As a simple demonstration of this class, let’s also include the tests:

At this point however, we are still not ready to use our class, as Log4j does not know to look for it. For this, we need to update the log4j2.xml file:
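A complete but minimal configuration along these lines might look like the sketch below. The package `com.example.logging`, the console appender, and the rest of the pattern are assumptions for illustration; only the `packages` attribute and the `%cm` key matter for masking.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "com.example.logging" is a placeholder: use the package that contains your converter -->
<Configuration status="WARN" packages="com.example.logging">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <!-- %cm replaces the usual %m so the message passes through the masking converter -->
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %cm%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>
```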

The key parts here are to update the “packages” attribute of the “Configuration” node to include the package (or a parent package) of your LogEventPatternConverter, and to use “cm” rather than “m” in the pattern. If your logs should be masked but are instead prefixed by a “c”, then Log4j has not picked up your converter, and you should make sure that the names are correct, and that the package is included in the “Configuration” node!

Usage

So hopefully now we have everything hooked up so that our log statements can be masked. In order to take advantage of our converter, we need to log our statements with the appropriate Marker:
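For instance, assuming the Log4j 2 API is used directly and a marker named `"JSON"` is the one the converter checks for, a logging call might look like this. The class `PaymentService` and its method are hypothetical.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;

// Hypothetical service used only to show marked versus unmarked log statements.
public class PaymentService {
    private static final Logger LOGGER = LogManager.getLogger(PaymentService.class);
    private static final Marker JSON = MarkerManager.getMarker("JSON");

    public void logRequest(String jsonPayload) {
        // Logged with the JSON marker, so the converter will attempt to mask it.
        LOGGER.info(JSON, "Received request: {}", jsonPayload);
        // Logged without a marker, so the message is written as is.
        LOGGER.info("Request handled");
    }
}
```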

If all went well, you should now see your sensitive data being replaced with your mask. As a final note, if you are using Spring Boot, Log4j is by default configured BEFORE Spring Boot components and @Value fields are initialized, so if you put your fields-to-mask into a properties file, it may take some extra configuration to make sure Log4j picks them up.

Igor Shults


Five thoughts on “Masking sensitive data in Log4j 2”

  1. Manjunath says:

    Thank you so much, your blog really helped me.

  2. Namrata Dakua says:

    this is good example.

    Can you please also share the complete log4j2.xml file?

  3. Kannan says:

    Thanks Igor!

  4. Bineeth Baburaj says:

    Your explanation looks very helpful. Could you please provide one example on how to implement this?

  5. Fred says:

    Useless because not full. If you take all the files it doesnt run….
    Where is newIstance called????
