Alexandra's Tech: April 2016

VCSH

Vcsh allows you to maintain multiple git repos in one directory. I use this for having separate git repos for dot config files that live in my home directory.

Setting up a new vcsh git repo for a a dot config file (eg .vimrc)

vcsh init vim

vcsh vim add ~/.vimrc

vcsh vim commit -m "first commit"

vcsh vim remote add origin git@github-personal:polyglotpiglet/vim.git

On github created a new repo called 'vim'

vcsh vim push -u origin master

vcsh vim push

MyRepos (MR)

I have several such vcsh repos. Wouldn't it be nice to bulk pull updates for all these repos? Or bulk check them out?

With mr you define a ~/.mrconfig file which lists all the repos to manage.

I added the following configuration parameter to my git config:

git config --global push.default matching

This means that 'mr push' works. Mr push does 'git push' on each of the repos and by setting this config param it means that you push to the remote branch with the same name as the local branch by default.

Consider the following function:

 public static int sum(List<Integer> values) {
    return values.parallelStream().mapToInt(i -> i).sum();
 }

It's using java 8 parallel streams to sum a list of Integers.

In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.

Generate the lists to be tested

The following code creates an IntStream of random Integers.

 Random random = new Random();
 int n = 10000000;
 Stream<Integer> stream = IntStream.range(0, n)
        .map(i -> random.nextInt())
        .boxed();

Collect it into a list:

 List<Integer> list = stream.collect(Collectors.toList());

Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.

 ArrayList<Integer> al = new ArrayList<>(list);
 LinkedList<Integer> ll = new LinkedList<>(list);

Benchmark

Cheeky little wrapper function to add the timing:

 public static long timedSum(List<Integer> values){
    long startTime = System.nanoTime();
    sum(values);
    long endTime = System.nanoTime();
    return endTime - startTime;
 }

Note: Do not do benchmarking like this. Instead use JMH. I'm just a bad bad person.

Running the Code

Let's run the code a few times for each type of list and take the average time (after a warm up, of course).

 System.out.println("ArrayList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(al))
                .average()
                .getAsDouble()).intValue());
 System.out.println("LinkedList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(ll))
                .average()
                .getAsDouble()).intValue());

Results

ArrayList 17907140
LinkedList 160238673

The ArrayList performs about x10 better than the LinkedList.

Analysis of Results

Parallel streams use a fork/join model when they execute. This means that the task is split into subtasks. Each subtask is executed and the results combined. In this example, the list is partitioned and each partition is summed. These partitions are then summed until we have one final answer.

Clearly, this partitioning will be most effective when the partitions are roughly equal. An ArrayList supports random access, so partitioning is much quicker and easier than with a LinkedList; decomposing the list into these partitions is O(n).

Alexandra's Tech

Thursday, 14 April 2016

vcsh with myrepos (mr)

VCSH

Setting up a new vcsh git repo for a a dot config file (eg .vimrc)

MyRepos (MR)

Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Generate the lists to be tested

Benchmark

Running the Code

Results

Analysis of Results

Scala with Cats: Answers to revision questions

Report Abuse

Labels