Consider the following function:
It's using java 8 parallel streams to sum a list of Integers.
In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.
The following code creates an IntStream of random Integers.
Collect it into a list:
Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.
Cheeky little wrapper function to add the timing:
public static int sum(List<Integer> values) {
return values.parallelStream().mapToInt(i -> i).sum();
}
It's using java 8 parallel streams to sum a list of Integers.
In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.
Generate the lists to be tested
Random random = new Random();
int n = 10000000;
Stream<Integer> stream = IntStream.range(0, n)
.map(i -> random.nextInt())
.boxed();
Collect it into a list:
List<Integer> list = stream.collect(Collectors.toList());
Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.
ArrayList<Integer> al = new ArrayList<>(list);
LinkedList<Integer> ll = new LinkedList<>(list);
Benchmark
Cheeky little wrapper function to add the timing:
public static long timedSum(List<Integer> values){
long startTime = System.nanoTime();
sum(values);
long endTime = System.nanoTime();
return endTime - startTime;
}
Note: Do not do benchmarking like this. Instead use JMH. I'm just a bad bad person.
Running the Code
Let's run the code a few times for each type of list and take the average time (after a warm up, of course).
System.out.println("ArrayList " +
new Double(LongStream.range(0, 5)
.map(i -> timedSum(al))
.average()
.getAsDouble()).intValue());
System.out.println("LinkedList " +
new Double(LongStream.range(0, 5)
.map(i -> timedSum(ll))
.average()
.getAsDouble()).intValue());
Results
ArrayList 17907140
LinkedList 160238673
The ArrayList performs about x10 better than the LinkedList.
LinkedList 160238673
The ArrayList performs about x10 better than the LinkedList.
Analysis of Results
Parallel streams use a fork/join model when they execute. This means that the task is split into subtasks. Each subtask is executed and the results combined. In this example, the list is partitioned and each partition is summed. These partitions are then summed until we have one final answer.
Clearly, this partitioning will be most effective when the partitions are roughly equal. An ArrayList supports random access, so partitioning is much quicker and easier than with a LinkedList; decomposing the list into these partitions is O(n).
No comments:
Post a Comment