Converting a single-threaded application to a many-threaded application is often times a tricky proposition. There is more to just the challenge of refactoring the code to work in a concurrent fashion. Tricky little unexpected details from the deep, dark depths start to bubble up. Would you believe an innocuous library like C3P0 can cause major performance problems?
I was recently given the opportunity to enhance the performance of a program written in Groovy used to synchronize data from one system to another. The initial execution of this program was to process 28 million records and subsequent runs were estimated to process 4 million records. Projections based on the state of the program at the time indicated the initial load would take 15 days and the subsequent runs would take 2 days. This obviously needed improvement.
The first major improvement was to implement multi-threading. The Groovy GPars library is very easy to use. Overall projected execution time dropped as the number of threads introduced increased. This was great progress as the estimated execution time decreased to 10 days with only a small amount of refactoring to accommodate splitting up the data to be processed concurrently. Not exactly mind blowing performance gains, but things were moving in the right direction.
Something was amiss at this point, however. Ramping up past 16 threads did not increase performance, and in fact started to make performance worse. This was not expected.
Additional code tweaks were made to try to take advantage of batching up statements, caching some data, and other reorganization. This gained perhaps 2 more days for an estimated total of 8 days of processing. Still, the threading issue was causing problems. Why wouldn’t this program run faster with more threads? It just didn’t make sense.
There were enough connections in the connection pool. The database accepted all of those connections, so there wasn’t an issue of waiting for free connections. The program barely moved the CPU needle on the 8 core, 8GB machine it was running on. The queries generated by Hibernate were sensible. Using VisualVM, I could see garbage collection was high, which is to be expected from a program of this type, but even that wasn’t holding anything back.
The program was initially written to be run on multiple machines. So, we set it up to run on two machines with 16 threads each. Execution time was cut in half! We tried it on three machines. Execution time was cut to about a third. So, it probably couldn’t be related to the database connections or free connections in the pool or even the database itself. I set it up so that I could run two instances of the program on one machine. Execution time was half. It couldn’t be the limitations of one machine, because again, the CPU barely lifted a finger.
There is a plugin available for VisualVM called Threads Inspector. So I opened up VisualVM again and set up a single run of 32 threads. I attached to the running instance of my program and clicked on the Threads tab to see if I could glean any information from it. The threads were waiting about 75% of the time. That seemed unusual. They were not blocking, they were waiting. Waiting on what? You can do thread dumps of individual threads so while one of the threads was in a wait state, I dumped it. It was waiting on the connection pool library, C3P0. Here is part of the offending thread dump:
java.lang.Object.wait() com.mchange.v2.c3p0.stmt.GooGooStatementCache.acquireStatement com.mchange.v2.c3p0.stmt.GooGooStatementCache.checkoutStatement
Apparently, it was waiting on acquiring a statement from the statement cache. C3P0 has two settings for the statement cache: maxStatements and maxStatementsPerConnection. maxStatements is a global setting over the entire pool, where maxStatementsPerConnection is a maximum per connection. If maxStatements is enabled, the number of statements in the pool is shared between all of the connections. As the number of threads increased, this pool became more and more stretched between the connections. I changed the value of this setting to 0 and set maxStatementsPerConnection to 50 and reran the program.
Thank the maker! Processing times immediately plummeted to an estimated two days with 32 threads and ramping up to 128 threads brought that number to about 16 hours. At this point, the database was becoming the bottleneck, but processing times came down to a much more manageable number.
Lessons learned? Configuration settings for a single-threaded application may not make sense for a multi-threaded application. But who would have thought something like the C3P0 library could cause so much grief?