Every Java programmer has heard this advice before: “Prefer ArrayList over Vector. Vector is fully synchronized, and as such you’re paying the synchronization penalty even when you don’t need it.”
ArrayList is not synchronized, so when you need it you need to perform synchronization yourself, or alternatively, as the ArrayList javadoc entry says: “… the list should be “wrapped” using the Collections.synchronizedList method.” Something like this:
- List list = Collections.synchronizedList(new ArrayList());
The resulting List will be synchronized, and therefore can be considered safe.
Or is it?
Not really. Consider the very contrived example below.
- final List
list = Collections.synchronizedList( new ArrayList()); - final int nThreads = 1;
- ExecutorService es = Executors.newFixedThreadPool(nThreads);
- for (int i = 0; i <>
- es.execute(new Runnable() {
- public void run() {
- while(true) {
- try {
- list.clear();
- list.add("888");
- list.remove(0);
- } catch(IndexOutOfBoundsException ioobe) {
- ioobe.printStackTrace();
- }
- }
- }
- });
- }
As long nThreads is 1, everything runs just fine. However, increase the number of nThreads to 2, and you start getting this:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.remove(Unknown Source)
at java.util.Collections$SynchronizedList.remove(Unknown Source)
Changing the synchronized List to Vector doesn’t help either. What happened here? Well, individual method calls of synchronized List and Vector are synchronized. But list.add() and list.remove() can be called in any order between the 2 threads. So if you print list.size() after list.add(), the output is not always 1. Sometimes it’s 0, sometimes it’s 2. Likewise, thread 1 may call list.add(), but before it gets a chance to call list.remove(), thread 2 gets into action and calls list.clear(). Boom, you get IndexOutOfBoundsException.
In that example above, the 3 calls to List’s methods have to be atomic. They must happen together as one unit, no interference from other threads, or else we’ll get the IndexOutOfBoundsException again. The fact that the individual methods are synchronized is irrelevant. In fact, we can go back to using the non-synchronized ArrayList, and the program will work, as long as we synchronize properly to make the 3 calls happen as one atomic, indivisible unit of execution:
- synchronized (list) {
- list.clear();
- list.add("888");
- list.remove(0);
- }
The moral of the story is that just because a class is fully synchronized, doesn’t mean it’s threadsafe (UPDATE: as in doesn’t mean your code will be threadsafe from using it–thanks Alex). You still have to be on the look for those sequence of method calls that have to occur atomically, because method level synchronization won’t help in this regard. In other words, watch what you’re doing. (And yes, we should still prefer ArrayList over Vector.)
Threadsafe Iteration & ConcurrentModificationException
Sometimes it’s not so obvious when exactly we’re supposed to synchronize our use of Collections. Ever encountered a ConcurrentModificationException before? I bet it’s probably because your code looks something like this (a.k.a.: why the for-each loop isn’t such a great idea actually):
- final List
list = new ArrayList(); - list.add("Test1");
- list.add("Test2");
- list.add("Test3");
- for(String s : list) {
- if(s.equals("Test1")) {
- list.remove(s);
- }
- }
ConcurrentModificationException will be thrown in this case, even when there’s only a single thread running. To fix this problem, we can’t use the for-each loop since we have to use the remove() method of the iterator, which is not accessible within the for-each loop. Instead we have to do this:
- for(Iterator
iter = list.iterator(); iter.hasNext();) { - String s = iter.next();
- if(s.equals("Test1")) {
- iter.remove();
- }
- }
The point is that iteration is something that we’d probably want to happen atomically–that is, while we’re iterating over a collection, we don’t want other threads to be modifying that collection. If it happens most probably something is wrong with our design.
This is why if you look into the JDK source code, the implementation for Iterator usually checks the expected modification count (i.e.: how many times is this collection supposed to have been modified?) against the collection’s current modification count (how many times this collection has been modified). If they don’t tally, the Iterator assumes that another thread has modified the collection while the iteration is going on, so it throws the ConcurrentModificationException. (This is exactly what happens in our single-threaded case about too by the way–the call list.remove() increases the modification count such that it no longer tallies with the one that the iterator holds (whereas iter.remove() resets the mod count so they still tally.) ConcurrentModificationException is a useful exception–it informs us of a probable fault in our design.
When Volatile Fails
However, ConcurrentModificationException is not 100% reliable. The implementation of Iterator.next() may look something like this:
- public E next() {
- if (modCount != expectedModCount)
- throw new ConcurrentModificationException();
- // other stuff
That is, ConcurrentModificationException is supposed to be thrown when the mod counts don’t tally. But it may not get thrown because the thread on which the modCount check is running is seeing a stale value of modCount. That is, let’s say you have thread A iterating through a collection. Whenever you call iter.next(), it checks that modCount == expectedModCount. But modCount may have been modified by thread B, and yet A is still seeing the unmodified value. If you remember, this is what the volatile keyword is about–it is to guarantee that a thread will always see the most recent value of a variable marked as such.
So why didn’t Joshua Bloch (or whoever took his place in Sun to take care of the Collections API) mark the modCount volatile? That would at least make the concurrent modification detection mechanism more reliable, yes? Well… no. Actually marking modCount volatile won’t help, because although volatile guarantees uptodateness, it doesn’t guarantee atomicity.
What does that mean? Well, if you examine the implementation of ArrayList, you’ll see that methods that modify the list increment the modCount (non-volatile) variable by one (i.e.: modCount++). So theoretically, if we mark modCount as volatile, whenever thread B says modCount++, thread A should always immediately see the value and throws ConcurrentModificationException.
But there is a problem: the increment operator (++) is not atomic. It is actually a compound read-modify-write sequence. So while thread B is in the middle of executing modCount++, it’s entirely possible that the thread scheduler will kick thread B out and decide to run thread A, which then checks for modCount before B has a chance to write back the new value of modCount.
Hidden Iteration
As if things aren’t hairy enough as they are, it’s not always obvious when an iteration over a collection is happening. Sure, it’s probably pretty easy to spot iteration code we’ve written ourselves, so we can synchronize those. However, much less obvious are the iterations that happen within the harmless-looking methods of the Collections API. If you examine the source code of java.util.AbstractCollection class, for example, you’ll see that methods like contains(), containsAll(), addAll(), removeAll(), retainAll(), clear()… practically almost all of them trigger an iteration over the collection. Iteration suddenly becomes a LOT harder to spot!
So What Do We Do?
Sounds pretty hopeless, isn’t it? Well… nah. A very, very smart CS professor named Doug Lea has figured it out for the rest of us. He came up with concurrent collection classes, which handle the problems listed above for the most common cases. These concurrent collections have been part of the standard Java API in java.util.concurrent package. For most cases, they are drop-in replacement for the corresponding non-threadsafe classes in java.util package, and if you haven’t taken a good look at them, it’s time that you do!
No comments:
Post a Comment