Alexandra's Tech: Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Thursday, 14 April 2016

Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Consider the following function:

 public static int sum(List<Integer> values) {
    return values.parallelStream().mapToInt(i -> i).sum();
 }

It's using java 8 parallel streams to sum a list of Integers.

In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.

Generate the lists to be tested

The following code creates an IntStream of random Integers.

 Random random = new Random();
 int n = 10000000;
 Stream<Integer> stream = IntStream.range(0, n)
        .map(i -> random.nextInt())
        .boxed();

Collect it into a list:

 List<Integer> list = stream.collect(Collectors.toList());

Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.

 ArrayList<Integer> al = new ArrayList<>(list);
 LinkedList<Integer> ll = new LinkedList<>(list);

Benchmark

Cheeky little wrapper function to add the timing:

 public static long timedSum(List<Integer> values){
    long startTime = System.nanoTime();
    sum(values);
    long endTime = System.nanoTime();
    return endTime - startTime;
 }

Note: Do not do benchmarking like this. Instead use JMH. I'm just a bad bad person.

Running the Code

Let's run the code a few times for each type of list and take the average time (after a warm up, of course).

 System.out.println("ArrayList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(al))
                .average()
                .getAsDouble()).intValue());
 System.out.println("LinkedList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(ll))
                .average()
                .getAsDouble()).intValue());

Results

ArrayList 17907140
LinkedList 160238673

The ArrayList performs about x10 better than the LinkedList.

Analysis of Results

Parallel streams use a fork/join model when they execute. This means that the task is split into subtasks. Each subtask is executed and the results combined. In this example, the list is partitioned and each partition is summed. These partitions are then summed until we have one final answer.

Clearly, this partitioning will be most effective when the partitions are roughly equal. An ArrayList supports random access, so partitioning is much quicker and easier than with a LinkedList; decomposing the list into these partitions is O(n).

Alexandra's Tech

Thursday, 14 April 2016

Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Generate the lists to be tested

Benchmark

Running the Code

Results

Analysis of Results

No comments:

Post a Comment

Scala with Cats: Answers to revision questions

Report Abuse

Labels