Thursday, 14 April 2016

vcsh with myrepos (mr)

VCSH


Vcsh allows you to maintain multiple git repos in one directory. I use this for having separate git repos for dot config files that live in my home directory.

Setting up a new vcsh git repo for a a dot config file (eg .vimrc)


vcsh init vim
vcsh vim add ~/.vimrc
vcsh vim commit -m "first commit"
vcsh vim remote add origin git@github-personal:polyglotpiglet/vim.git

On github created a new repo called 'vim'

vcsh vim push -u origin master
vcsh vim push

MyRepos (MR)


I have several such vcsh repos. Wouldn't it be nice to bulk pull updates for all these repos? Or bulk check them out? 

With mr you define a ~/.mrconfig file which lists all the repos to manage.


I added the following configuration parameter to my git config:

git config --global push.default matching

This means that 'mr push' works. Mr push does 'git push' on each of the repos and by setting this config param it means that you push to the remote branch with the same name as the local branch by default. 




Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Consider the following function:
 public static int sum(List<Integer> values) {
    return values.parallelStream().mapToInt(i -> i).sum();
 }

It's using java 8 parallel streams to sum a list of Integers.

In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.

Generate the lists to be tested


The following code creates an IntStream of random Integers.

 Random random = new Random();
 int n = 10000000;
 Stream<Integer> stream = IntStream.range(0, n)
        .map(i -> random.nextInt())
        .boxed();

Collect it into a list:
 List<Integer> list = stream.collect(Collectors.toList());

Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.
 ArrayList<Integer> al = new ArrayList<>(list);
 LinkedList<Integer> ll = new LinkedList<>(list);

Benchmark


Cheeky little wrapper function to add the timing: 
 public static long timedSum(List<Integer> values){
    long startTime = System.nanoTime();
    sum(values);
    long endTime = System.nanoTime();
    return endTime - startTime;
 }

Note: Do not do benchmarking like this. Instead use JMH. I'm just a bad bad person. 

Running the Code


Let's run the code a few times for each type of list and take the average time (after a warm up, of course). 

 System.out.println("ArrayList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(al))
                .average()
                .getAsDouble()).intValue());
 System.out.println("LinkedList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(ll))
                .average()
                .getAsDouble()).intValue());

Results

ArrayList   17907140
LinkedList 160238673

The ArrayList performs about x10 better than the LinkedList. 

Analysis of Results


Parallel streams use a fork/join model when they execute. This means that the task is split into subtasks. Each subtask is executed and the results combined. In this example, the list is partitioned and each partition is summed. These partitions are then summed until we have one final answer. 

Clearly, this partitioning will be most effective when the partitions are roughly equal.  An ArrayList supports random access, so partitioning is much quicker and easier than with a LinkedList; decomposing the list into these partitions is O(n). 


Scala with Cats: Answers to revision questions

I'm studying the 'Scala with Cats' book. I want the information to stick so I am applying a technique from 'Ultralearning...