Immutable Data Structures in Concurrent Java Applications

Concurrent applications have multiple threads running simultaneously. Access to data shared by multiple threads requires synchronization, which is often a source of fragile, hard-to-maintain code, hard-to-find bugs, and performance issues. You can minimize synchronization and the headaches that go with it by using immutable data structures. In this article I’ll demonstrate how to create and use immutable data structures to reduce and localize the need for synchronization.

Consider an application that uses a set of contacts. The contacts are stored in a map with the contact name as the key and a Contact object as the value. Various threads in the application access the contacts. Both the Contact class and the map require synchronization. Here is a thread-safe implementation of the Contact class:

public class Contact {
    private volatile String name;
    private volatile String email;
    private volatile String phone;

    public String getName() {return name;}
    public void setName(String name) {this.name = name;}

    public String getEmail() {return email;}
    public void setEmail(String email) {this.email = email;}

    public String getPhone() {return phone;}
    public void setPhone(String phone) {this.phone = phone;}
}

I could have chosen to synchronize all the setters and getters, but making the fields volatile is a simpler solution: each read of a field sees the most recent write to that field. Using this implementation for the contact objects and a ConcurrentHashMap for the contact map provides a thread-safe solution. Usually, though, thread safety at the class level alone is not sufficient to support application logic, as the following method demonstrates:

public void handleGmailNotification(Contact contact) {
    if (contact.getEmail().endsWith("gmail.com")) {
        sendGmailNotification(contact.getEmail());
    }
}

This method is supposed to send a special email to contacts with a Gmail address. If another thread changes the contact’s email between the execution of the first two lines of code, the Gmail-specific email may be sent to a non-Gmail account. That may happen only once a month and will likely be a difficult bug to find, especially if the code that changed the email was added by a developer unaware of the usage above. Coarser-grained synchronization can fix the problem, but it requires that all code that deals with the Contact’s email address be correctly synchronized. Multiply that by the many different pieces of data shared by multiple threads in a highly concurrent application and you get fragile, bug-ridden, hard-to-maintain code. Making Contact immutable alleviates these issues:

public final class Contact {
    private final String name;
    private final String email;
    private final String phone;

    public Contact(String name, String email, String phone) {
        this.name = name;
        this.email = email;
        this.phone = phone;
    }

    public String getName() {return name;}
    public String getEmail() {return email;}
    public String getPhone() {return phone;}
}

Using the immutable version, I can be sure that the Contact object I’m interacting with will not change. If I need to change the email address, I create a new Contact object and put it in the contact map in place of the existing one.
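As a minimal sketch of that replace-instead-of-mutate step, assume the contact map is a ConcurrentHashMap<String, Contact> keyed by name. The withEmail method and the ContactUpdates helper are hypothetical additions for illustration, not part of the article’s code. ConcurrentHashMap.computeIfPresent performs the read-copy-replace for a single key atomically, so the new Contact is always derived from the value currently stored.

```java
import java.util.concurrent.ConcurrentHashMap;

// Immutable contact as above, plus a hypothetical withEmail helper.
final class Contact {
    private final String name;
    private final String email;
    private final String phone;

    Contact(String name, String email, String phone) {
        this.name = name;
        this.email = email;
        this.phone = phone;
    }

    String getName() { return name; }
    String getEmail() { return email; }
    String getPhone() { return phone; }

    // Returns a new Contact that differs only in its email; this instance is unchanged.
    Contact withEmail(String newEmail) {
        return new Contact(name, newEmail, phone);
    }
}

// Hypothetical helper, not from the article.
final class ContactUpdates {
    private ContactUpdates() {}

    // computeIfPresent runs atomically for this key, so the replacement is built
    // from the latest stored Contact; absent names are left alone.
    static void changeEmail(ConcurrentHashMap<String, Contact> contacts,
                            String name, String newEmail) {
        contacts.computeIfPresent(name, (key, current) -> current.withEmail(newEmail));
    }
}
```

A thread that read the old Contact before the change keeps working with a consistent snapshot: both getEmail calls in handleGmailNotification return the same address, because that object can never change.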

The ConcurrentHashMap, while thread-safe, can cause similar issues. This code is supposed to send a text message and an email to every contact:

public void sendMessages(Map<String, Contact> contactMap) {
    sendTextToPhone(contactMap.values());
    sendEmail(contactMap.values());
}

If another thread updates the contact map while this is executing, the two calls may see different sets of contacts, and we have another hard-to-find bug. You could make the map unmodifiable using Collections.unmodifiableMap, but it’s not truly immutable: it is only a read-only view, and the underlying map can still be modified. Also, an unmodifiable map gives you no way to change the contact list. A map implementation that creates a modified copy instead of changing itself is a nice alternative. Here’s a simple implementation:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public final class ImmutableMap<K, V> implements Map<K, V> {
    private final Map<K, V> map;

    public ImmutableMap() {
        this(Collections.<K, V>emptyMap());
    }

    public ImmutableMap<K, V> immutablePut(K key, V value) {
        Map<K, V> newMap = new HashMap<>(map);
        newMap.put(key, value);
        return new ImmutableMap<>(newMap);
    }

    public ImmutableMap<K, V> immutableRemove(K key) {
        Map<K, V> newMap = new HashMap<>(map);
        newMap.remove(key);
        return new ImmutableMap<>(newMap);
    }

    private ImmutableMap(Map<K, V> delegate) {
        // delegate is always a private copy, so wrapping it makes it effectively immutable
        this.map = Collections.unmodifiableMap(delegate);
    }

    // All the Map methods are simply delegated to the map field.
    // To conserve space they are not shown here.
}

The immutablePut and immutableRemove methods return a modified copy of the map, leaving the original map unchanged. All the methods from the Map interface delegate to the unmodifiable wrapper around the underlying HashMap, so any mutating methods throw UnsupportedOperationException. Immutable versions of other mutating map methods can be implemented using the same pattern as immutablePut and immutableRemove. Because an instance can never change after it is created, you can safely use it in any way with no concern about causing issues in other parts of the application, or vice versa. This assumes the values are themselves immutable, like the Contact class above.
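The listing above elides the delegating methods. As a runnable sketch of the same copy-on-write idea, the class below extends java.util.AbstractMap instead of implementing Map directly; that is a substitution for the hand-written delegation, not the author’s code. AbstractMap derives the read methods (and equals, hashCode, toString) from entrySet() and its put already throws UnsupportedOperationException, so only the read paths need supplying. Collections.unmodifiableMap is still just a view, which is safe here only because the wrapped HashMap is a fresh copy no other code can reach.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: copy-on-write map built on AbstractMap. Any call that would change
// the map (put, clear, removing an existing key) throws UnsupportedOperationException.
final class ImmutableMap<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> map;

    ImmutableMap() {
        this(new HashMap<>());
    }

    private ImmutableMap(Map<K, V> delegate) {
        // delegate is always a private copy, so the read-only view cannot change
        this.map = Collections.unmodifiableMap(delegate);
    }

    ImmutableMap<K, V> immutablePut(K key, V value) {
        Map<K, V> newMap = new HashMap<>(map);
        newMap.put(key, value);
        return new ImmutableMap<>(newMap);
    }

    ImmutableMap<K, V> immutableRemove(K key) {
        Map<K, V> newMap = new HashMap<>(map);
        newMap.remove(key);
        return new ImmutableMap<>(newMap);
    }

    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return map.entrySet(); // read-only set; Entry.setValue is rejected as well
    }

    // Fast paths; AbstractMap would otherwise scan entrySet() linearly.
    @Override public V get(Object key) { return map.get(key); }
    @Override public boolean containsKey(Object key) { return map.containsKey(key); }
    @Override public int size() { return map.size(); }
}
```

A caller such as sendMessages that receives one of these maps can call values() twice and is guaranteed to see the same contacts both times, while a writer simply builds and publishes a new instance.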

It’s likely the application we’ve been using as an example will require a central reference to the contact list. Here’s a contact service implementation using the ImmutableMap:

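import java.util.concurrent.locks.ReentrantLock;

public class ContactService {
    private final ReentrantLock lock = new ReentrantLock();
    // volatile so every reader sees the most recently published map
    private volatile ImmutableMap<String, Contact> contacts = new ImmutableMap<>();

    public void addContact(Contact contact) {
        lock.lock();
        try {
            // copy-on-write: build a new map, then publish it with one assignment
            contacts = contacts.immutablePut(contact.getName(), contact);
        } finally {
            lock.unlock();
        }
    }

    public ImmutableMap<String, Contact> getContacts() {
        return contacts;
    }
}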

Some synchronization is still required so that, when multiple threads add contacts to the central list, all additions are preserved. Without the lock, two threads calling addContact at the same time could each copy the same existing map, add their own contact, and assign the result; whichever assignment came last would be the only one saved, and the other contact would be lost. The contacts field must be volatile so that every thread calling getContacts sees the latest reference. The synchronization required in this case is very localized and easy to maintain. Any code that gets the contacts from the service can interact with them without synchronization concerns.
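The lock in ContactService serializes the copy-and-assign step. A commonly used alternative, swapped in here and not taken from the article, is compare-and-set through AtomicReference.updateAndGet (Java 8+), which retries the copy instead of blocking. The sketch below uses plain email strings as values so it stands alone; AtomicContactService and the demo class are hypothetical names. The demo adds one contact from each of many threads at once, so a lost update would show up as a short count.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a lock-free alternative to ContactService (not the article's code).
final class AtomicContactService {
    private final AtomicReference<Map<String, String>> contacts =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    void addContact(String name, String email) {
        // updateAndGet re-runs the function if another thread published a new
        // map first, so the function must be side-effect free: it only builds a copy.
        contacts.updateAndGet(current -> {
            Map<String, String> copy = new HashMap<>(current);
            copy.put(name, email);
            return Collections.unmodifiableMap(copy);
        });
    }

    Map<String, String> getContacts() {
        // get() has volatile-read semantics, so callers see the latest published map.
        return contacts.get();
    }
}

final class AtomicContactServiceDemo {
    private AtomicContactServiceDemo() {}

    // Adds one distinct contact per thread concurrently and returns the final size.
    static int addConcurrently(int threadCount) {
        AtomicContactService service = new AtomicContactService();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            workers[i] = new Thread(() ->
                    service.addContact("contact" + id, "contact" + id + "@example.com"));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return service.getContacts().size();
    }

    public static void main(String[] args) {
        System.out.println(addConcurrently(50)); // prints 50: no additions are lost
    }
}
```

The trade-off: under heavy write contention the retry loop may copy the map several times for a single update, so the lock-based version can be the cheaper choice when writers collide often.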

This paradigm works best when updates are infrequent and the data set is relatively small. If updates are frequent or the data set is very large, the copying required can become a performance issue, since every update copies the entire map. Don’t let that deter you from adopting this paradigm where you can. Immutable data structures make concurrent programming a much simpler and safer endeavor. It is a proven approach that is fundamental in functional programming languages such as Scala and is gaining traction in the object-oriented world.
