Java Synchronization Method Performance Comparison

Some highly concurrent work I recently did got me thinking about the various methods Java provides for synchronizing access to data. I decided to do some testing to see how they compare. The test code is at the bottom of the post.

I want to start with a few caveats:

  • This level of detail is interesting to understand, but you should not choose your synchronization method based on performance considerations. Use the method that results in the cleanest code. In most cases where there are synchronization-related performance issues, the problem is what the developer synchronized, not the synchronization method. Critical sections (the code that is synchronized) should be as short as possible without resulting in code that is difficult to maintain.
  • This functionality is likely to be very implementation dependent. You may get very different results with a different JVM vendor or version, or on different hardware.

I tested the following synchronization methods: the synchronized keyword, a volatile field, AtomicInteger, Lock, and a fair Lock. The test is fairly simple. I ran various numbers of threads that increment counters using the different synchronization methods. Each thread enters a critical section, increments a counter, and exits the critical section in a tight loop, so contention is very high. Each synchronization method has its own counter. The result is the time, in milliseconds, it takes the counter to increment to 10 million. I used the following scenarios with various numbers of threads:

  • All threads incrementing a single counter. Each counter has a column in the table. The fair lock method is only tested with 1 and 2 threads and not included in the serial or concurrent tests.
  • Each thread sequentially incrementing each counter. This is the “Serial” column.
  • Each thread incrementing a single counter with the counters distributed among the threads. This is the “Concurrent” column. Since there are 4 methods tested this column is only populated when the thread count is a multiple of 4 so the counters can be evenly distributed among the threads.

The test results are below. My hardware and JVM version are shown in the results. I have a single CPU with 4 hyper-threaded cores so up to 8 threads may be running at a time.
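If you want to check the level of hardware parallelism on your own machine before running the test, the JVM reports it directly (a trivial check, not part of the test code below):

```java
public class CpuCount {

	public static void main(String[] args) {
		// logical processors visible to the JVM (cores x hardware threads)
		int cpus = Runtime.getRuntime().availableProcessors();
		System.out.println("Available processors: " + cpus);
	}
}
```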

Initializing test...

Mac OS X (i386) version 10.6.8 

java.version "1.6.0_29"
Java(TM) SE Runtime Environment (1.6.0_29-b11-402-10M3527)
Java HotSpot(TM) Client VM (build 20.4-b02-402, mixed mode)

JVM Stabilization Count Limit: 10000000
Test Count Limit: 10000000

Threads Synchronized   Volatile      Atomic        Lock    FairLock      Serial  Concurrent
      1         236          91         116         289         321         734
      2        2195         604         577         922       29686        4461
      3        2197         603         876         564                    4676
      4        2451         965        1071         567                    5093         864
      5        2401        1006        1118         588                    5199
      6        2341        1037        1221         592                    5113
      7        2398        1038        1343         592                    5389
      8        2378        1048        1451         600                    5527        2283
      9        2399        1163        1449         605                    5699
     12        2364        1465        1383         618                    6128        2537
     24        2464        2957        1431         605                    7817        2557
     48        2486        5604        1355         611                   10008        2755
     96        2487       10849        1182         596                   16793        3192

 Test complete

The results are not hard to interpret so I won’t go through them in detail. Here are the main things I noted:

  • Volatile performs better than the other methods under minimal contention but degrades as contention increases.
  • Synchronized and Lock are both fairly consistent regardless of contention.
  • The non-fair Lock is the best overall choice for performance and consistency under contention.
  • Adding threads does not necessarily improve performance. The fastest time incrementing all the counters was the single serial thread. The amount of contention, the number of cores available, and the amount of time threads spend waiting on resources all affect the point at which adding threads reduces performance.
  • Fair lock is extremely slow. It continues to degrade rapidly as contention increases. It was impractical to include it in more tests. You should only use this where fairness is really required.
  • The serial vs. concurrent columns show expected behavior, but it's worth noting in more detail. Where both are present they use the same number of threads to do the same total work. The threads in the serial column each work with all the counters, incrementing them sequentially in a loop; with 4 threads, every thread increments every counter. The threads in the concurrent column each work with a single counter; with 4 threads, each thread increments a different counter, minimizing contention; with 8 threads, 2 threads share each counter; and so on. The results clearly demonstrate how important it is to minimize thread contention in highly concurrent applications.
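As an aside on minimizing contention: on Java 8 and later (newer than the 1.6 JVM used for these results), java.util.concurrent.atomic.LongAdder is designed for exactly this write-heavy shared-counter pattern. It spreads increments across internal cells so contending threads rarely touch the same memory, at the cost of a slightly more expensive sum(). A minimal sketch:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderCounter {

	public static void main(String[] args) throws InterruptedException {
		final LongAdder counter = new LongAdder();

		Thread[] threads = new Thread[8];
		for (int i = 0; i < threads.length; i++) {
			threads[i] = new Thread(new Runnable() {
				public void run() {
					for (int n = 0; n < 1000000; n++) {
						// each increment lands on a per-thread cell, avoiding a single hot spot
						counter.increment();
					}
				}
			});
			threads[i].start();
		}
		for (Thread t : threads) {
			t.join();
		}

		// sum() folds the cells together; the total is exact since all threads have joined
		System.out.println(counter.sum()); // prints 8000000
	}
}
```

Unlike AtomicInteger, LongAdder cannot tell you the value at the moment of each increment, so it fits statistics-style counters rather than this test's "stop at 10 million" loop.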

If you find different performance on other JVMs or hardware please post a comment. Also let me know if you find a bug in the test code.

Here’s the test code:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SyncTest {

	private enum TestType {
		SYNCHRONIZED, VOLATILE, ATOMIC, LOCK, FAIR_LOCK
	}

	private final Lock lock = new ReentrantLock(false);
	private final Lock fairLock = new ReentrantLock(true);
	private int synchronizedCounter;
	private int fairLockCounter;
	private int lockCounter;
	private volatile int volatileCounter;
	private final AtomicInteger atomicCounter = new AtomicInteger();

	public static void main(String[] args) throws Exception {
		new SyncTest().run();
	}

	public void run() throws Exception {

		System.out.println("Initializing test...");
		System.out.println();
		System.out.printf("%s (%s) version %s%n", System.getProperty("os.name"), System.getProperty("os.arch"),
				System.getProperty("os.version"));
		System.out.println();
		System.out.printf("java.version \"%s\"%n", System.getProperty("java.version"));
		System.out.printf("%s (%s)%n", System.getProperty("java.runtime.name"),
				System.getProperty("java.runtime.version"));
		System.out.printf("%s (build %s, %s)%n", System.getProperty("java.vm.name"),
				System.getProperty("java.vm.version"), System.getProperty("java.vm.info"));
		System.out.println();

		int jvmStabilizationEndValue = 100000;
		int endValue = 10000000;
		System.out.println("JVM Stabilization Count Limit: " + jvmStabilizationEndValue);
		System.out.println("Test Count Limit: " + endValue);
		System.out.println();

		// run to let JVM do any optimizations and stabilize
		runIndividualTest(TestType.SYNCHRONIZED, 1, jvmStabilizationEndValue);
		runIndividualTest(TestType.VOLATILE, 1, jvmStabilizationEndValue);
		runIndividualTest(TestType.ATOMIC, 1, jvmStabilizationEndValue);
		runIndividualTest(TestType.LOCK, 1, jvmStabilizationEndValue);
		runTestsConcurrently(1, jvmStabilizationEndValue);
		runTestsSerially(1, jvmStabilizationEndValue);

		System.out
				.printf("Threads Synchronized   Volatile      Atomic        Lock    FairLock      Serial  Concurrent%n");

		runAllTests(1, endValue);
		runAllTests(2, endValue);
		runAllTests(3, endValue);
		runAllTests(4, endValue);
		runAllTests(5, endValue);
		runAllTests(6, endValue);
		runAllTests(7, endValue);
		runAllTests(8, endValue);
		runAllTests(9, endValue);
		runAllTests(12, endValue);
		runAllTests(24, endValue);
		runAllTests(48, endValue);
		runAllTests(96, endValue);

		System.out.println("\n Test complete");
	}

	private void runAllTests(int threadCount, int endValue) throws Exception {

		long synchronizedElapsed = runIndividualTest(TestType.SYNCHRONIZED, threadCount, endValue);
		long volatileElapsed = runIndividualTest(TestType.VOLATILE, threadCount, endValue);
		long atomicElapsed = runIndividualTest(TestType.ATOMIC, threadCount, endValue);
		long lockElapsed = runIndividualTest(TestType.LOCK, threadCount, endValue);

		long serialElapsed = runTestsSerially(threadCount, endValue);
		long concurrentElapsed = runTestsConcurrently(threadCount, endValue);

		if (concurrentElapsed > 0) {

			System.out.printf("%7d %11d %11d %11d %11d %11s %11d %11d%n", threadCount, synchronizedElapsed,
					volatileElapsed, atomicElapsed, lockElapsed, "", serialElapsed, concurrentElapsed);

		} else if (threadCount <= 2) {

			long fairLockElapsed = runIndividualTest(TestType.FAIR_LOCK, threadCount, endValue);
			System.out.printf("%7d %11d %11d %11d %11d %11d %11d%n", threadCount, synchronizedElapsed, volatileElapsed,
					atomicElapsed, lockElapsed, fairLockElapsed, serialElapsed);
		} else {

			System.out.printf("%7d %11d %11d %11d %11d %11s %11d%n", threadCount, synchronizedElapsed, volatileElapsed,
					atomicElapsed, lockElapsed, "", serialElapsed);
		}
	}
	}

	private long runIndividualTest(final TestType testType, int threadCount, final int endValue) throws Exception {

		final CyclicBarrier testsStarted = new CyclicBarrier(threadCount + 1);
		final CountDownLatch testsComplete = new CountDownLatch(threadCount);

		for (int i = 0; i < threadCount; i++) {
			startTestThread(testType, testsStarted, testsComplete, endValue);
		}

		return waitForTests(testsStarted, testsComplete);
	}

	private long runTestsSerially(int threadCount, final int endValue) throws Exception {

		final CyclicBarrier testsStarted = new CyclicBarrier(threadCount + 1);
		final CountDownLatch testsComplete = new CountDownLatch(threadCount);

		for (int i = 0; i < threadCount; i++) {
			Thread t = new Thread() {
				public void run() {

					try {
						testsStarted.await();

						runSynchronizedTest(endValue);
						runVolatileTest(endValue);
						runAtomicTest(endValue);
						runLockTest(endValue);

					} catch (Throwable t) {
						t.printStackTrace();
					} finally {

						testsComplete.countDown();
					}
				}

			};
			t.start();
		}

		return waitForTests(testsStarted, testsComplete);
	}

	private long runTestsConcurrently(int threadCount, int endValue) throws Exception {

		if (threadCount % 4 != 0) {
			return -1;
		}

		final CyclicBarrier testsStarted = new CyclicBarrier(threadCount + 1);
		final CountDownLatch testsComplete = new CountDownLatch(threadCount);

		threadCount /= 4;
		for (int i = 0; i < threadCount; i++) {
			startTestThread(TestType.SYNCHRONIZED, testsStarted, testsComplete, endValue);
			startTestThread(TestType.VOLATILE, testsStarted, testsComplete, endValue);
			startTestThread(TestType.ATOMIC, testsStarted, testsComplete, endValue);
			startTestThread(TestType.LOCK, testsStarted, testsComplete, endValue);
		}

		return waitForTests(testsStarted, testsComplete);
	}

	private void startTestThread(final TestType testType, final CyclicBarrier testsStarted,
			final CountDownLatch testsComplete, final int endValue) {

		Thread t = new Thread() {
			public void run() {

				try {
					testsStarted.await();

					switch (testType) {
					case SYNCHRONIZED:
						runSynchronizedTest(endValue);
						break;
					case VOLATILE:
						runVolatileTest(endValue);
						break;
					case ATOMIC:
						runAtomicTest(endValue);
						break;
					case LOCK:
						runLockTest(endValue);
						break;
					case FAIR_LOCK:
						runFairLockTest(endValue);
						break;
					}

				} catch (Throwable t) {
					t.printStackTrace();
				} finally {

					testsComplete.countDown();
				}

			}
		};
		t.start();
	}

	private long waitForTests(CyclicBarrier testsStarted, CountDownLatch testsComplete) throws Exception {

		testsStarted.await();
		long startTime = System.currentTimeMillis();

		testsComplete.await();
		long endTime = System.currentTimeMillis();
		reset();

		return endTime - startTime;
	}

	private void reset() {

		synchronized (this) {
			synchronizedCounter = 0;
		}

		volatileCounter = 0;
		atomicCounter.set(0);

		lock.lock();
		try {
			lockCounter = 0;
		} finally {
			lock.unlock();
		}

		fairLock.lock();
		try {
			fairLockCounter = 0;
		} finally {
			fairLock.unlock();
		}
	}

	private void runSynchronizedTest(long endValue) {

		boolean run = true;
		while (run) {
			run = incrementSynchronizedCounter(endValue);
		}
	}

	private synchronized boolean incrementSynchronizedCounter(long endValue) {

		return ++synchronizedCounter < endValue;
	}

	private void runVolatileTest(long endValue) {
		boolean run = true;
		while (run) {
			// note: ++ on a volatile field is a read-modify-write and is not atomic,
			// so concurrent increments can be lost
			run = ++volatileCounter < endValue;
		}
	}

	private void runAtomicTest(long endValue) {
		boolean run = true;
		while (run) {
			run = atomicCounter.incrementAndGet() < endValue;
		}
	}

	private void runLockTest(long endValue) {

		boolean run = true;
		while (run) {
			lock.lock();
			try {
				run = ++lockCounter < endValue;
			} finally {
				lock.unlock();
			}
		}
	}

	private void runFairLockTest(long endValue) {

		boolean run = true;
		while (run) {
			fairLock.lock();
			try {
				run = ++fairLockCounter < endValue;
			} finally {
				fairLock.unlock();
			}
		}
	}
}

6 thoughts on “Java Synchronization Method Performance Comparison”

  1. Nataly says:

    My results

    Initializing test…

    Windows Vista (x86) version 6.1

    java.version “1.6.0_05”
    Java(TM) SE Runtime Environment (1.6.0_05-b13)
    Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode)

    JVM Stabilization Count Limit: 10000000
    Test Count Limit: 10000000

    Threads Synchronized   Volatile      Atomic        Lock    FairLock      Serial  Concurrent
          1         243         128         112         381         450         866
          2        1530         139         204         833       16670        2746
          3        2213         184         292         625                    3401
          4        2569         167         342         612                    3504        1057
          5        2633         267         341         604                    3549
          6        2321         205         305         609                    3901
          7        2644         265         304         610                    3846
          8        2375         275         294         608                    3576        2057
          9        2556         381         261         610                    3543
         12        2302         395         331         610                    3809        2049
         24        2385         578         179         604                    4106        2008
         48        2295         751           2         602                    4409        1698
         96        2587        1166           0         596                    3732        1326

    Test complete

  2. 3bdullelah says:

    Good Job
    this is my test :

    Initializing test…

    Windows 7 (x86) version 6.1

    java.version “1.7.0_01”
    Java(TM) SE Runtime Environment (1.7.0_01-b08)
    Java HotSpot(TM) Client VM (build 21.1-b02, mixed mode, sharing)

    JVM Stabilization Count Limit: 10000000
    Test Count Limit: 10000000

    Threads Synchronized   Volatile      Atomic        Lock    FairLock      Serial  Concurrent
          1         288         110         135         308         345         836
          2        1800         295         518         733       29562        3104
          3        1569         550         762         593                    3498
          4        1759         516         977         541                    3760        1172
          5        1651         566         914         538                    3822
          6        1792         734         937         537                    3916
          7        1774         728         874         536                    4124
          8        1707         755         947         531                    4089        2134
          9        1710         914         945         536                    4258
         12        1718        1207         883         534                    4694        2066
         24        1710         989         739         538                    5477        1976
         48        1717        3990         424         562                    6437        1751
         96        1747        6735          51         532                    8552         668

    Test complete
    ———-
    thanks

  3. 3bdullelah says:

    why the result like this??

  4. Javier (@jbbarquero) says:

    Thanks for the good job.

    Note: executed using a STS 2.8.1 (eclipse 3.7.1)

    Initializing test…

    Windows Vista (amd64) version 6.0

    java.version “1.7.0”
    Java(TM) SE Runtime Environment (1.7.0-b147)
    Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

    JVM Stabilization Count Limit: 10000000
    Test Count Limit: 10000000

    Threads Synchronized   Volatile      Atomic        Lock    FairLock      Serial  Concurrent
          1         376         149         177         413         471        1123
          2         565         673         721        1222       28143        3241
          3        1191        2192        1535         720                    5615
          4        1160        2965        2403         732                    7585        3518
          5        1126        3504        2715         733                    7573
          6        1160        3804        2648         688                    8276
          7        1140        4281        2648         702                    8549
          8        1194        4012        2615         716                    8877        3629
          9        1143        4720        2668         705                    9502
         12        1127        5741        2544         695                   10466        3712
         24        1185       11187        2547         706                   16032        5088
         48        1201       15893        2263         710                   20615        4506
         96        1179       38910        1871         706                   42475        7399

    Test complete

  5. Dan Gradl says:

    I’m not sure this test/article is entirely fair to fair locks (pun intended). I don’t dispute that fair locks perform significantly slower than the other forms of locking. But the use of fair locks in the example is partly at fault. You have threads that are repeatedly doing the same very short processing (<0ms probably) continuously, and they are more or less circling each other trying to get in line to execute for a brief moment.

    A more realistic scenario (one in which I used it) might be to handle a series of requests coming in rapid succession updating a shared object (session in my case) that need to be processed in the order received. The differences here are that each thread executes a bit longer (maybe 50ms), they only do the processing once, and they go back to a pool of threads. So more or less in this case the fair locking provides a queue that operates across multiple threads.

    I think the bottom line is that yea you don't want to use it for a very high contention scenario, yea it's going to have less throughput. But there are use cases where it is essential and the performance hit is less of a concern. I may try to throw together a sample of this scenario and post it.

  6. Linuxhippy says:

    Those results are bogus – with locks you have a single thread at a time running (or just a few, however most will be in waiting state for the lock to be released).

    With volatile you have all threads always in runnable state – and as volatile will not guard your read-modify-write operation (increment) most of the produced values will be redundant – while putting severe stress on the cache coherency logic propagating its own modified value to the other cores.

    Bottom line: Your tests are comparing apples with peaches. Your volatile test does result in a lot more work performed compared to the lock based tests.
