Feb 23, 2009

Another Failure To Use Code Coverage Numbers Correctly

I’ve been approached recently by members of various projects about how to improve code coverage numbers. In an abuse of the numbers, or a misunderstanding of their purpose, there are mandates to keep code coverage at 80%. Other teams have been closer to 18%. After much discussion, I thought I’d revisit the flaw in how the tool’s usefulness gets interpreted.

The executive summary is that 100% code coverage means nothing helpful.

To be sure we’re thinking of the same thing, code coverage in this case refers to a report generated by a software development tool to try to quantify the lines of “production” code that are touched by “test” code. “Production” code is, of course, the source of the software intended to be released, while “test” code is software written to automatically test the production code. In our Java world, we usually mean JUnit or something like it.

The implication is that lines hit with automated tests are going to help find flaws before they’re encountered when the software is running. The idea behind the code coverage reports is that if 100% of the lines are covered, then the software must be good.

This is total garbage. This is terribly wrong. It is dangerously misleading.

Consider this (paraphrased but otherwise) actual test found in software.

public void testFooMethod(){
  try{
    foo.method();
  } catch(Exception e){}
}

Yes, this is an actual test case. What it “tests” is nothing. What it will do, however, is cover some or all of the lines in the method, giving some points to the code coverage report. Sadly, that is the test’s intent: bump the score.

That’s a failure to understand the usefulness of code coverage. Let’s pause for a moment and reflect on why this test sucks.

How can we tell, without knowing anything about foo or its method, that this tests nothing?

Well, “nothing” is harsh; the test class won’t compile if foo hasn’t been previously defined, or if it doesn’t have a method named method, or if that method is not visible to our test case. So, with that, it does test existence and accessibility. But since it wouldn’t even compile otherwise, those problems can be assumed to have been fixed if the test is actually running.

All this tests is that we can call it. The try/catch block even prevents the test from failing if the method throws an Exception. Bad because the test will always pass. Bad because nothing is validated. Bad test. Bad.
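The danger of that try/catch can be seen directly. Here’s a minimal runnable sketch; this Foo is a stand-in of my own invention whose method() always throws, representing a defective production method, not code from any real project.

```java
// A sketch of why the swallowed exception is dangerous. Foo is a stand-in
// whose method() always throws, representing a defective production method.
public class SwallowedExceptionDemo {
    static class Foo {
        public void method() throws Exception {
            throw new Exception("real defect");
        }
    }

    // Mirrors the shape of the original test: the defect is discarded
    // and the "test" reports success.
    static boolean swallowingTestPasses() {
        Foo foo = new Foo();
        try {
            foo.method(); // the defect fires here...
        } catch (Exception e) {
            // ...and is silently thrown away
        }
        return true; // nothing asserted, so nothing can fail
    }

    public static void main(String[] args) {
        System.out.println("swallowing test passes despite the defect: "
                + swallowingTestPasses());
        // In JUnit, declaring `throws Exception` on the test method and
        // deleting the try/catch would let this same defect fail the test.
    }
}
```

In JUnit, the fix is free: declare `throws Exception` on the test method, drop the try/catch, and a throwing method fails the test on its own.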

What’s missing from this test is any preparation of foo, any interrogation of its initial state, any interrogation of its final state, or possibly any external changes.

What does this method do? Certainly we can expect the method (at least eventually) to do something. The test should reflect at least a little bit of what it does, while the aggregate of all of the tests that cover the method should allow us to tell what it does entirely. From this test, all we can tell is that the following will suffice as a replacement in our class foo:

public void method() throws Exception{}

Methods like those should be removed anyway, unless they’re only there to satisfy an interface, and even then some consideration needs to be made, but that’s a different article altogether.

So, back to how to solve our peers’ code coverage problem.

The real answer, of course, is to write better tests. When someone tries to get me to buy into “no tests for setters and getters” as an argument for not counting them in the coverage totals, and therefore buying into a smaller number, I try to counter with an argument that proper testing will use all of the accessors. In that way, everything will get counted in the coverage reports. Do comprehensive tests, and the code coverage will take care of itself.

Consider again our silly foo.method() call from before. Let us suppose that foo has a property that should remain unmodified, and one that should get modified. This simple test will cover both of those, actually adding value to the test.

public void testFooMethodModifies(){
  Object unmodifiable = new Object();
  Object modifiable = new Object();
  foo.setUnmodifiable(unmodifiable);
  foo.setModifiable(modifiable);
  foo.method();
  assertSame(unmodifiable, foo.getUnmodifiable());
  assertNotSame(modifiable, foo.getModifiable());
}

Not too much different, and suddenly our coverage has gone from one method of five (since we need to accept that the accessors were there to begin with) to five of five! Wow, we’re at 100% for what we know about foo.method().

If we fill our test coffers with such tests, will we have satisfied our test coverage? Well, technically, we will probably hit 100% of all of the methods if we do it like that, especially in the beans, but we still fall way short.

In the previous example, we’re satisfied that one of our objects was left alone, but we only confirmed that the other was changed, not that it was changed correctly. More precisely, we know only that the object was replaced; we are not certain that it was replaced with the right thing.
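Pinning down the replacement takes one more assertion on the actual value. A minimal sketch, assuming a hypothetical contract of my own invention: suppose method() is specified to replace the modifiable property with a fresh empty list.

```java
// A minimal sketch, assuming a hypothetical contract: method() is specified
// to replace the modifiable property with a fresh empty list. This Foo is
// an illustration, not code from any real project.
import java.util.ArrayList;
import java.util.List;

public class ReplacementValueTest {
    static class Foo {
        private Object modifiable = new Object();
        public Object getModifiable() { return modifiable; }
        public void method() { this.modifiable = new ArrayList<String>(); }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.method();
        // assertNotSame only proved a swap happened; this checks the value.
        if (!(foo.getModifiable() instanceof List)
                || !((List<?>) foo.getModifiable()).isEmpty()) {
            throw new AssertionError("method() should install a fresh empty list");
        }
        System.out.println("replacement verified, not just detected");
    }
}
```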

What we can assume from these tests is that our method could look like this:

public void method() throws Exception{
  this.modifiable = new Object();
}

While less useless than the previous understanding of the method, it’s not particularly useful, as we still have a method that may or may not be functioning correctly.

The trouble with code coverage and tests like this is that either of the faux tests shown so far will satisfy the requirement for code coverage for this method or class, but neither actually tests it. Therein lies the deception of complete code coverage from the code coverage report.

We can hit 100%, but it doesn’t mean anything.

I maintain that code coverage tools are useful for helping us identify weak spots. It’s easy to see when a line of code isn’t hit, and that’s a helpful bit of information to start with.

The reports are almost as much a problem as a solution, since they very easily hide poor testing. In this article alone, the first test suffices to meet coverage for some or all of our single method. The second goes a little further to ensure that some of the other parts of the object are also exercised. The only good assertion in the examples so far, however, is that our expected unmodified value remained unmodified.

I frequently try to convince my peers to stop trying to hit the coverage number. Look at the report and see what isn’t being tested. See what the method is supposed to do, or which lines in it were missed, and conjure some tests for that. “Some” is the key word there, I warn. With this approach, which I’m coining as Reactive Testing, we can be sure that methods are being tested, not simply covered.

Take another (reworked for privacy) method that has a score showing 2/3 coverage.

public String noNull(final String string){
  if(string == null)
    return "";
  return string;
}

The test that’s “covering” the method? Of course, it calls it with only a string.

public void testNoNull(){
  foo.noNull("not null");
}

Again, a bad test, because it only tests existence, access, and invocation. Thankfully, this one isn’t wrapped with a try/catch. The two of three lines that get covered are the “if()” and the “return string” lines. The lines that get tested, however, are far fewer.

Easy to correct? Sure. How about comparing the inputs and outputs to see if they match? What about the other case in the condition, when the input is null? Two test lines (or methods, if you’re looking to up the count), and we can rest assured not only that the coverage is there, but also that the method is actually tested.

public void testNoNull(){
  assertEquals("", foo.noNull(null));
  assertEquals("input", foo.noNull("input"));
}

There we have 100% coverage, and valid tests that cover all of the possible inputs and expected outputs for the method. Additionally, we can take even this one test method and completely rebuild our tested method. Of course, it’s a simple example, but building on that idea, we should be able to take all of the test cases and rebuild our methods if we’ve got good coverage.
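To make that rebuilding idea concrete, here is a runnable sketch: the two assertions above pin down both branches of noNull, so the method can be reconstructed from its tests alone. The class name is mine, for illustration only.

```java
// The two test assertions pin down both branches of noNull, so the
// method can be reconstructed from its tests alone.
public class RebuiltNoNull {
    static String noNull(final String string) {
        return string == null ? "" : string;
    }

    public static void main(String[] args) {
        // The original test cases double as the reconstruction's proof.
        if (!noNull(null).equals("")) throw new AssertionError();
        if (!noNull("input").equals("input")) throw new AssertionError();
        System.out.println("rebuilt method satisfies both tests");
    }
}
```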

Here, I tell my peers, is how to solve their problem.

For every method that does some work, create tests that set up the object state before the call and investigate it afterwards, create tests that try different combinations of parameters and evaluate them afterwards, and tests that test the output of the method being tested whether that is a returned object, external work, or both.
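That setup/invoke/inspect shape can be sketched in a few lines. This Counter and its contract are assumptions of mine for illustration; the point is that the test checks both the returned value and the state afterwards.

```java
// A minimal sketch of the setup/invoke/inspect pattern. Counter and its
// contract are illustrative assumptions, not code from the article.
public class SetupInvokeInspectTest {
    static class Counter {
        private int count;
        public Counter(int start) { this.count = start; }
        public int getCount() { return count; }
        public int addAndGet(int delta) { count += delta; return count; }
    }

    public static void main(String[] args) {
        // 1. Set up the object state before the call.
        Counter counter = new Counter(10);
        // 2. Invoke with a chosen parameter and check the returned value.
        int returned = counter.addAndGet(5);
        if (returned != 15) throw new AssertionError("expected 15, got " + returned);
        // 3. Inspect the state afterwards: the side effect, not just the return.
        if (counter.getCount() != 15) throw new AssertionError("state not updated");
        System.out.println("return value and side effect both verified");
    }
}
```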

This is sometimes easier when starting to write tests from scratch rather than correcting tests that already pass, so I suggest simply starting new tests, optionally removing the old ones.

Look for, or think of, conditions and branches, and add a test case or variable swap for each. Look for calls to other services and try good and bad responses. Look for loops and iterations and try null, empty, and small sets that feed into them. Make tests that put valid and invalid values in each parameter passed to a method, or each object property that is used by the method.

As these permutations are imagined, create a descriptive test stub. When it seems like the pondering is done, start filling in the stubs. If more cases are thought of as the tests are written, add more stubs. If it turns out that one test can do the work of other stubs, remove the excess stubs. It can seem impossible to think of all the permutations, but really, if it’s too complex, break the method apart, and then make tests for the different pieces instead of the whole.
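That last suggestion, breaking an overly complex method apart, can be sketched as follows. The method and helper names here are illustrative assumptions of mine; the point is that each small piece is easy to enumerate cases for.

```java
// Sketch: a method with a branch and a loop is split so each piece can be
// tested alone, with null, empty, and small inputs. Names are illustrative.
public class SplitForTesting {
    // Piece one: the branch, testable in isolation.
    static String trimOrEmpty(String s) {
        return s == null ? "" : s.trim();
    }

    // Piece two: the loop, which now only composes the tested piece.
    static String join(java.util.List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(trimOrEmpty(p));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Null, empty, and small inputs for each piece, as suggested above.
        if (!trimOrEmpty(null).equals("")) throw new AssertionError();
        if (!trimOrEmpty(" a ").equals("a")) throw new AssertionError();
        if (!join(java.util.Collections.<String>emptyList()).equals("")) throw new AssertionError();
        if (!join(java.util.Arrays.asList(" a", "b ")).equals("ab")) throw new AssertionError();
        System.out.println("each piece tested on its own");
    }
}
```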

Note, none of that addresses coverage. It only addresses complete testing.

Using code coverage to ensure that testing is getting done is a recipe for disaster. Making code coverage the reason to have tests is a tremendous part of the formula for failure. Believing that complete code coverage is the same as good test coverage is akin to blind faith.

I’m working with my peers to try to improve their understanding of unit testing. I’m trying hard to remove the desire to have 100% coverage, as nice as it sounds. It is achievable, and when you’re testing completely, you’ll be hitting that 100% mark. The converse, however, doesn’t hold.

Simply having 100% code coverage means nothing.
