Saturday, 26 November 2016

Memory alignment: Oops and viewing mem structure with Jol

Why are 32-bit machines limited to 4GB memory? 

On a 32-bit machine there are only 2^32 (~= 4 billion) distinct memory addresses.

Each memory address can hold 8 bits (= 1 byte).

That means that in total we can reference ~ 4 billion bytes which is approximately equal to 4 GB.

In practice, all of this memory won't be available to the JVM. Some of it will be used by the OS and any other processes running on the machine. Even less will be available to the heap because the JVM needs to store lots of other things including thread stacks, GC info, native memory, compiled code etc.

So 64-bit machines are the solution? 

64 bit machines can reference 2^64 memory addresses which is more than 16 million terabytes of data. Therefore, it should be fine for a Java heap right?

The problem with this addressing is that you end up with huge pointers to objects that you have to store in the heap! The addresses suddenly take twice the amount of memory, which isn't terribly efficient.

What if you don't actually want a multi-terabyte heap? Is there some kind of middle ground?

The middle ground: Compressed Oops

Imagine we have 35-bit addressing. That would mean that we could store up to 2^35 (~= 32GB) memory addresses in heap.

Wouldn't that be great?! Obviously it isn't going to be easy because there aren't any 35-bit machines with 35-bit registers in which to hold memory addresses.

However, the JVM can do something very clever, where it assumes that is has a 35 bit number but the last 3 bits are all zero. When it writes to the memory address it adds the three zeros to lookup the address correctly but when it's just holding the memory address, it only stores 32 bits.

However this means that the JVM can only access every 8th memory address:

000 (0), 1_000 (8), 10_000 (16), 11_000 (24), 100_000 (32), 101_000 (40) etc

This means that the JVM needs to allocate memory in 8-byte chunks (because each memory address holds 1 byte and we are dealing with the memory addresses in chunks of 8).

However this is how the JVM works anyway, so it's all fine and nothing is lost. Convenient, eh?

There is a little bit of fragmentation. If you allocate an object that is 7 bytes then you have 1 empty byte but the impact isn't too big. Though this is probably why we don't try and use 2^36 addresses. That would mean being 16-byte aligned which would likely have much worse fragmentation and wasted memory gaps.

When does the JVM use Compressed Oops? 

From Java 7 onwards, by default if the heap size is >4GB but <32GB then the flag,  +XX:UseCompressedOops is switched on.

Jol

To see how objects are aligned in memory, Jol is a cute little tool.

Example 1:

import org.openjdk.jol.info.ClassLayout;import org.openjdk.jol.vm.VM;import static java.lang.System.out;
public class Jol {
    public static void main(String[] args) throws Exception {
        out.println(VM.current().details());        out.println(ClassLayout.parseClass(A.class).toPrintable());    }
    public static class A {
        boolean f;    }
}

Gives the following output:

# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

com.ojha.Jol$A object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     1 boolean A.f                            N/A
     13     3         (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total

We can see that the object header took 12 bytes, the boolean took 1 byte and 3 bytes were wasted

Example 2 (Same as above but add in a little integer)

import org.openjdk.jol.info.ClassLayout;import org.openjdk.jol.vm.VM;import static java.lang.System.out;
public class Jol {
    public static void main(String[] args) throws Exception {
        out.println(VM.current().details());        out.println(ClassLayout.parseClass(A.class).toPrintable());    }
    public static class A {
        boolean f;        int j;    }
}

Output:

# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

com.ojha.Jol$A object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0    12         (object header)                N/A
     12     4     int A.j                            N/A
     16     1 boolean A.f                            N/A
     17     7         (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total





Sunday, 26 June 2016

JIT Fun Part 4: Intrinsics and Inlining

Definition: Intrinsic


Intrinsic methods in Java are ones that have a native implementation by default in the JDK. They will have a Java version but most of the time it will get inlined and and the intrinsic implementation will be called instead.

Note that what methods are made intrinsic may depend on the platform.

Examples of Intrinsic methods include Math functions (sin, cos, min, max). The full list is here.

Example


Code:

We run this method with the following jvm flags:

-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

In the output we see the following:

 @ 97   java.lang.Math::max (11 bytes)   (intrinsic)

Woohoo! It got inlined with the intrinsic code. 


JIT Fun Part 3: -XX:PrintCompilation

Ok enough guessing about these graphs. Let's look at what's really happening.

But first...

Compiler levels


Level Compiler
0 Interpreter
1 C1 and destined to stay in C1 forever
2 C1 but only pays attention to loop/method counters
3 C1 but gathers details for C2 eg counters, percentage of time conditional evaluates to true
4 C2

Paths through these levels:

0 -> 3 -> 4

  • This is the most common case. Method initially sent to C1 level 3 after a fair few calls, then if it is called a lot or contains a loop with lots of iterations it gets promoted to the super fast level 4 (= C2).

0 -> 3 -> 1

  • This happens if the method is really tiny. It gets sent to level 3 where it is analysed and we realise that it will never go to C2 so we stick it into C1 level 1 forever.


0 -> 2 -> 3 -> 4

  • If C2 is busy at the time of promotion from level 0 then we know we won't get promoted for a while so we go hang out in level 2 rather than hopping off to 3 directly. Then we may get promoted to 4 if we deserve it. 

Example


I've got the following code:

And the times (endTime - startTime) look like this:




I ran the code with -XX:PrintCompilation and I'm using Java 8 so tiered complation is enabled by default.

Let's see what is happening to the rawr method:

90
    164  171 %     3       com.ojha.Rawr::main91 @ 50 (131 bytes)

What does this mean?

i (ith iteration of my for loop which is on the x axis) = 90
timestamp since vm start = 164
this method was 171st in the queue to be compiled
% indicates on stack replacement (method contains a loop)
3: This is the most important bit! This means we are moving into C1, ie doing the first compilation.
50 is the bytecode index of the loop.

More output

159
    167  176       3       com.ojha.Rawr::main (131 bytes)

231
    169  172       3       java.io.PrintStream::ensureOpen (18 bytes)
    169  182 %     4       com.ojha.Rawr::main @ 50 (131 bytes)

At i = 231, the loop has been called enough times for the Rawr main method to be compiled at C2 (that's what the 4 means). Note that at the same time java.io.PrintStream.ensureOpen has been called enough times to be compiled at C1 level 3.

1818
    3       com.ojha.Rawr::main @ -2 (131 bytes)   made not entrant

This is basically saying that the level 3 version of the rawr method shouldn't be used any more (correct because we should now use the level 4 version). We can see this dip in the graph at x = 1818.

1999
    193  234   !   3       java.io.PrintStream::println (24 bytes)
    193  182 %     4       com.ojha.Rawr::main @ -2 (131 bytes)   made not entrant

At the very end of the method Rawr main is over so the level 4 version of the compiled code is made not entrant (ie nothing should 'enter' or call that compiled code).

Note that the ! indicated that the method contains a try catch loop. 

JIT Fun Part 2: Throwing the JIT a curve ball

This post follows on from the previous post about visualising JIT.

Let's start with a simple, silly main method:

Plotting the elapsed time in ns:




We can see that:

Up to about the 70th run it is running in the interpreter
Then it drops into C1 until 200
At 200 it optimises again into C2

At 600 it hits our curve ball and deoptimises the method.
Method eventually drops back in C2 at around 800.

Curve ball:

The compiler thought that our String curveBall was always going to be null and it added that into the compiler. However, when we set it to something other than null the compiler realised that it has made a wrong assumption and had to deoptimise that compiled method. 

JIT Fun Part 1: Quick JIT visualisation of tiered compliation

See the following code:
It's not doing anything exciting, just running the same inner loop 1000 times.

I have taken those times and put them in a file called 'output.txt'.

See the following python function which reads in the numbers from output.txt and plots them on a graph:

Here is the graph that it produces:



At first the code is running in the interperter, then the C1 JIT, then it settles into the C2 JIT. A couple of spikes probably indicate GC.




Thursday, 14 April 2016

vcsh with myrepos (mr)

VCSH


Vcsh allows you to maintain multiple git repos in one directory. I use this for having separate git repos for dot config files that live in my home directory.

Setting up a new vcsh git repo for a a dot config file (eg .vimrc)


vcsh init vim
vcsh vim add ~/.vimrc
vcsh vim commit -m "first commit"
vcsh vim remote add origin git@github-personal:polyglotpiglet/vim.git

On github created a new repo called 'vim'

vcsh vim push -u origin master
vcsh vim push

MyRepos (MR)


I have several such vcsh repos. Wouldn't it be nice to bulk pull updates for all these repos? Or bulk check them out? 

With mr you define a ~/.mrconfig file which lists all the repos to manage.


I added the following configuration parameter to my git config:

git config --global push.default matching

This means that 'mr push' works. Mr push does 'git push' on each of the repos and by setting this config param it means that you push to the remote branch with the same name as the local branch by default. 




Java 8: Parallel Streams - Performance with ArrayList vs LinkedList

Consider the following function:
 public static int sum(List<Integer> values) {
    return values.parallelStream().mapToInt(i -> i).sum();
 }

It's using java 8 parallel streams to sum a list of Integers.

In this post we will demonstrate and explain the difference in the performance of this function when passing in a LinkedList vs an ArrayList.

Generate the lists to be tested


The following code creates an IntStream of random Integers.

 Random random = new Random();
 int n = 10000000;
 Stream<Integer> stream = IntStream.range(0, n)
        .map(i -> random.nextInt())
        .boxed();

Collect it into a list:
 List<Integer> list = stream.collect(Collectors.toList());

Here there are no guarantees about type (though it's probably an ArrayList) so let's create two lists, one LinkedList and one ArrayList.
 ArrayList<Integer> al = new ArrayList<>(list);
 LinkedList<Integer> ll = new LinkedList<>(list);

Benchmark


Cheeky little wrapper function to add the timing: 
 public static long timedSum(List<Integer> values){
    long startTime = System.nanoTime();
    sum(values);
    long endTime = System.nanoTime();
    return endTime - startTime;
 }

Note: Do not do benchmarking like this. Instead use JMH. I'm just a bad bad person. 

Running the Code


Let's run the code a few times for each type of list and take the average time (after a warm up, of course). 

 System.out.println("ArrayList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(al))
                .average()
                .getAsDouble()).intValue());
 System.out.println("LinkedList " + 
        new Double(LongStream.range(0, 5)
                .map(i -> timedSum(ll))
                .average()
                .getAsDouble()).intValue());

Results

ArrayList   17907140
LinkedList 160238673

The ArrayList performs about x10 better than the LinkedList. 

Analysis of Results


Parallel streams use a fork/join model when they execute. This means that the task is split into subtasks. Each subtask is executed and the results combined. In this example, the list is partitioned and each partition is summed. These partitions are then summed until we have one final answer. 

Clearly, this partitioning will be most effective when the partitions are roughly equal.  An ArrayList supports random access, so partitioning is much quicker and easier than with a LinkedList; decomposing the list into these partitions is O(n). 


Thursday, 31 March 2016

Java 8: Diamond Inference for methods

As of Java 8, we don't need to specify the types when we pass in typed parameters.


 public static void main(String... args) {
    operateOnHashMap(new HashMap<>()); // empty diamonds :D 
 }
 static void operateOnHashMap(Map<String, Integer> map) {
    // do stuff with map
 }

Java 8: Functional Interface

Definition: 

A function interface in java is an interface with a single abstract method. They are used to type lambda expressions.

Example: 

In the following example, 'Cat' is a functional interface.

 interface Cat {
    void meow();
 }

 public static void main(String... args) {
    Cat cat = () -> System.out.println("Meowwwwww");    
    cat.meow();
 }

It is good practice to annotate functional interfaces with the annotation @FunctionalInterface .

Java 8: Runnables with Lambdas

Old (anonymous inner classes)

 Runnable meow = new Runnable() {
    @Override    
    public void run() {
        System.out.println("meow");    
    }
 };
 meow.run();

Output: 
    meow

    Process finished with exit code 0


New.1 (lambda)

 Runnable rawr = () -> System.out.println("rawr");
 rawr.run();

Output: 
    rawr

    Process finished with exit code 0


New.2 (passing function call)

 public static void main(String... args) {
    Runnable lalala = Main::LaLaLa;    
    lalala.run();
 }

 public static void LaLaLa() {
    System.out.println("lalala");
 }

Output: 
    lalala

    Process finished with exit code 0


Thursday, 25 February 2016

Algorithm to rotate a matrix using transpose

This is a horrible question but there is a neat trick.

To rotate 90 degrees just take transpose then reverse the rows. To rotate -90 (270) degrees take the transpose then reverse the columns.

val matrix = Array(Array(1,2), Array(3,4), Array(5,6))
print(matrix)

1 2
3 4
5 6

Transpose

def transpose(matrix: Array[Array[Int]]): Array[Array[Int]] = {
  matrix.head.indices.map(i => matrix.map(_(i))).toArray
}

print(transpose(matrix))

1 3 5
2 4 6

Rotate 90 degrees clockwise 


Transpose and reverse rows:

def rotate90(matrix: Array[Array[Int]]): Array[Array[Int]] = {
  transpose(matrix).map(_.reverse)
}
print(rotate90(matrix))

5 3 1
6 4 2

Voila.

Rotate 90 degrees anticlockwise


Transpose and reverse columns:

def rotateMinus90(matrix: Array[Array[Int]]): Array[Array[Int]] = {
  val t = transpose(matrix)
  t.head.indices.foreach { i =>
    (0 until (t.length/2)).foreach { j =>
      val temp = t(j)(i)
      t(j)(i) = t(t.length - j - 1)(i)
      t(t.length - j - 1)(i) = temp
    }
  }
  t
}
print(rotateMinus90(matrix))

2 4 6
1 3 5

Full code:


Pimp my library pattern

Let's say I want to add a function to the String class to 'calibrate' data for comparison, eg remove trailing whitespace/convert to lowercase/remove punctuation

I want to call it like " StrING   ".calibrate which would give me "string"


import scala.language.implicitConversions
object StringCalibrationImplicits {

  class CalibratedString(s: String) {
    def calibrate = {
      // also might wanna filter out punctuation marks or whatever      s.trim.toLowerCase
    }
  }

  implicit def calibrate(s: String): CalibratedString = new CalibratedString(s)
  
}

Then from another class

import com.ojha.StringCalibrationImplicits._
val s = "  StrING   "s.calibrate



Tagging in ScalaTest

Really useful for separating tests running in different CI builds.

I usually define a tags.scala which contains the tag definitions:

package com.ojha

import org.scalatest.Tag

object UnitTest extends Tag("com.ojha.UnitTest")

object IntegrationTest extends Tag("com.ojha.IntegrationTest")


Then tag tests like this:

it should "be a cat" taggedAs UnitTest in {
  
}


Then from the sbt console to run only the unit tests:

"test-only -- -n com.ojha.UnitTest"

To run everything except integration tests:

"test-only -- -l com.ojha.IntegrationTest"








Tuesday, 16 February 2016

Maths puzzle: Scatter graph with horizontal line

Question

If you have a 2D scatter plot with n points, you want to draw a horizontal line such that the perpendicular distance between the line and the points is minimised. 

Answer

Intuitively, you will take the mean of the y-coordinates and draw the line through the y-axis at this point.

Can we prove this a little more formally?

You basically want to find the minimum $k$ for $\sum_{i=0}^{n} (y_i - k)^2$

$$ \frac{d}{dk} \sum_{i=0}^{n} (y_i - k)^2 $$

$$ = \frac{d}{dk} \sum_{i=0}^{n} (y_i^2 + k^2 - 2 * y_i*k) $$

$$ = \sum_{i=0}^{n} (2*k - 2 * y_i) $$

To find minimum, set to 0 and solve:

$$ 0 = \sum_{i=0}^{n} (2*k - 2 * y_i) $$

$$ => 0 = n * k + \sum_{i=0}^{n}  -  y_i $$

$$ => k = \frac{ \sum_{i=0}^{n}  y_i}{n}$$

Second derivative is $n$ which is positive so this is minimum.



Bit twiddling tricks



  • Power of two: if (n & (n-1) == 0) then n is a power of two
  • Odd or even: if (n & 1 == 0) then n is even
  • Set nth bit of number i to 1:  i | (1 << n)
  • Set nth bit of number i to 0: i & ~(1 << n )

Solution to Codility Chapter 10: Peaks


The 12 Days of Christmas

Question. 

On the first day of Christmas my true love gave to me: a partridge in a pear tree

On the second day of Christmas my true love gave to me: two turtle doves and a partridge in a pear tree

Etc.

On the 12th day of Christmas, of which type of bird/person do you have the greatest number? 

Answer:

Exhaustively, on day 12:

1 * 12 partridges
2 * 11 turtle doves
3 * 10 french hens
4 * 9 calling birds
5 * 8 gold rings
6 * 7 geese a-laying
7 * 6 swans a-swimming
8 * 5 maids a-milking
9 * 4 ladies dancing
10 * 3 lords a-leaping
11 * 2 pipers piping
12 * 1 drummers drumming

(yes i had to google what was what on each day)

So the answer is that you have 42 geese and 42 swans.

Intuitively we can say that the biggest value is when we are multiplying the numbers that are closest together, in this case 6*7. So the answer for the max is $floor(\frac{13}{2}) * (13-floor(\frac{13}{2}) )$

How to prove this mathematically:

Firstly, is there a general formula?

Yes, where $i$ is the day you first getting a particular type of animal then clearly on day twelve you have $i*(13 - i) $

To find the max we differentiate:

$$\frac{d}{di} i*(13 - i) $$

$$ = \frac{d}{di} 13i - i^2 $$

$$ = 13 - 2i $$

Set this to zero and solve

$$ 13 - 2i = 0 => i = \frac{13}{2}$$

Is this definitely maximum? Differentiating again = $-2$ which means this is maximal.

Obviously, our data is discrete, so we check the values on either side of 6.5, 6 and 7. In this case they give the same answer: 6 * 7 and 7*6 = 42.

(Fun fact when I was fourteen my friends and I made up our own version of this song. All I can remember is the following:

5 g strings
4 manly hugs
3 french kisses
2 bulging biceps
And a nice piece of ass for me

I've forgotten everything I ever learned in A level chemistry, but it's good to know that the important stuff sticks. )

Quicksort in scala

Basic Algorithm


Step 1: Choose pivot p as some element in the array.
Step 2: Partition the array so that everything on the left of p is < p and everything on the right > p (this step is $O(n)$).
Step 3: Recurse on the left and right.

Best and worse cases


Intuitively the best case is when the pivot happens to be the median each time. $O(n * log(n))$

The worst case would be choosing pivot = first element where the array is already sorted. $O(n^2)$

Implementation


There are two main implementation to consider. The first is the most intuitive:


Analysis of Lomuto quicksort

We can see that the partition method is $O(n) = a*n + c_1$

The check 'if (r < l)' occurs in constant time, $c_2$

The go method is more interesting. We can say that in the best case, when the partition is at the median every time

$$T(n) = 2 T (\frac{n}{2}) + (a * n + c_1) + c_2 = 2 T (\frac{n}{2}) + a * n + c_3  $$

Constant doesn't really make much difference as $n \rightarrow \infty $ so let's drop it:

$$T(n) = 2 T (\frac{n}{2}) + a * n  $$

$$T(\frac{n}{2}) = 2 T (\frac{n}{4}) + \frac{a * n}{2} $$
$$ => T(n) = 2 \{ 2 T (\frac{n}{4}) + \frac{a * n}{2} \} + a * n $$
$$ => T(n) = 4 T (\frac{n}{4}) + 2 * a * n $$
$$ => T(n) = 2^k * T(\frac{n}{2^k}) + k * a * n $$

We want to find this formula in terms of $T(1)$

$$ 1 = \frac{n}{2^k}$$

$$ => 2^k = n $$

$$ => k = log_2(n) $$

Substitute this value of $k$ into the earlier formula:

$$ => T(n) = 2^{log_2(n)} * T(1) + log_2(n) * a * n $$

$$ => T(n) = n * T(1) + a * n * log_2(n)  $$

$T(1)$ will happen in constant time $c$

$$ => T(n) = n * c + a * n * log_2(n)  $$

So we can see that the time complexity should be $n * log_2(n)$

Hoare Partitioning


The other implementation of quicksort is less intuitive. Apparently it does fewer swaps (x3 fewer) than lomuto, but the analysis is too complicated for me to bother with right now.

def hoare(): Unit = {
  def swap(i: Int, j:Int) = { val t = a(i); a(i) = a(j); a(j) = t }

  def aux(l: Int, r: Int): Unit = {
    val p = a(l + r / 2)
    var i = l
    var j = r

    while (i < j) {
      while (a(i) < p) i += 1            while (a(j) > p) j -= 1
      if (i <= j) { swap(i,j); i+= 1; j -= 1 }

      if (l < j) aux(l, j)
      if (r > i) aux(i,r)

    }
  }
  aux(0, a.length-1)
}





Saturday, 6 February 2016

Probability Puzzles

Question 1:

Three people are trying to win a game. Each of them is given a hat that is pink or yellow with equal probability ($\frac{1}{2}$). They can each see the colour of the other two hats, but not their own hat colour. They can define a strategy before playing the game but cannot communicate with each other when the game starts. 

Each of them has to guess the colour of their own hat (the guesses are kept secret from each other).  Valid guesses are 'pink',  'yellow' or 'don't know'. 

They win the game if at least one of them guesses correctly and none of them guesses incorrectly. 

Answer 1: 

The strategy should be that if someone sees the other two people wearing the same colour hat, they should guess the opposite colour. If they see two people wearing different coloured hats then should guess 'don't know'. 

The possible outcomes are the following: 

$$ P(3 pink) = \frac{1}{8} $$

$$ P(3 yellow) = \frac{1}{8} $$

$$ P(2 pink, 1 yellow) = \frac{3}{8} $$

$$ P(1 pink, 2 yellow) = \frac{3}{8} $$

Therefore this strategy will work $\frac{3}{4}$ of the time. 

Question 2: 

A plane has 100 passengers who have each been allocated one of 100 seats. However the first passenger to board the plane chooses his seat randomly (perhaps he lost his ticket). The remaining 99 passengers sit in their own seat if it is unoccupied or also choose a seat randomly if someone is sitting in their allocated seat. What is the probability that the last passenger sits in their own seat? 

Answer 2: 

As soon as somebody sits in the first passenger's seat, they break the cycle and everything after that is fine. Therefore, when the last passenger boards the plane the only seats that can be free are either the first passenger's seat or the last passenger's own seat, with equal probability, so the answer is $\frac{1}{2}. $

Question 3: 

A family has two children.

a) One of them is definitely a girl. What is the probably that the other one is also a girl? 
b) The older one is a girl. What is the probability that the younger one is also a girl? 
c) One of the children is a girl with a very unusual name (let's say Rumplestiltskin). What is the probability that the other child is also a girl? (Note that we don't know whether she is older or younger than her sibling).

Answer 3:

a) Three equal options (G B), (B G) or (G G). Answer = $\frac{1}{3}$ 

b) Two equal options (G B) or (G G). Answer = $\frac{1}{2}$

c) 

Let's say that the overall probability that a girl is called Rumplestiltskin = $\frac{1}{10000}$

Possible events: 

P(Girl called RumplestiltskinBoy) =  $ \frac{1}{10000} * \frac{1}{2} * \frac{1}{2} = \frac{1}{40000}$

P(BoyGirl called Rumplestiltskin) = $ \frac{1}{2} * \frac{1}{2} * \frac{1}{10000} = \frac{1}{40000} $

P(Girl with a normal nameBoy) = $ \frac{9999}{10000} * \frac{1}{2} * \frac{1}{2} = \frac{9999}{40000}$

P(BoyGirl with  normal name) = $ \frac{1}{2} * \frac{1}{2} * \frac{9999}{10000} = \frac{9999}{40000} $

P(Girl with normal nameGirl with normal name) = $\frac{9999}{10000} * \frac{1}{2} * \frac{9999}{10000} * \frac{1}{2} = \frac{99980001}{400000000}$

P(Girl with normal nameGirl called Rumplestiltskin) = $ \frac{9999}{10000} * \frac{1}{2} * \frac{1}{10000} * \frac{1}{2} = \frac{9999}{400000000}$

P(Girl called RumplestiltskinGirl with normal name) = $\frac{1}{10000} * \frac{1}{2} * \frac{9999}{10000} * \frac{1}{2} = \frac{9999}{400000000}$

P(Girl called RumplestiltskinGirl called Rumplestiltskin)* = $\frac{1}{10000} * \frac{1}{2} * \frac{1}{10000} * \frac{1}{2} = \frac{1}{400000000}$ 

Conditional probability: 

$$ P (A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Probability of two girls where one girl is called Rumplestiltskin = $\frac{19999}{400000000}$

Probability that one is called Rumplestiltskin = $\frac{39999}{400000000}$

So the P(two girls given one is called Rumplestiltskin) = 
$$ \frac{19999}{40000000} / \frac{39999}{400000000} = 0.499987... \approx \frac{1}{2}$$

(Yep maths is magic) 

*someone call child services

Question 4:

50% of people that complete in a competition win a prize. 95% of people that won the prize felt that they deserved a prize. 75% of people that did not win a prize thought they deserved a prize. 

Whats the probability of getting a prize in the competition if you feel you deserve one? 

Answer 4:

Bayes theorem:  
$$P(A \mid B) = \frac {P(B \mid A) P(A)} {P(B)} $$

$$ => P(prize \mid deserve) = \frac{ P (deserve \mid prize) P (prize) } { P (deserve) } $$

$$=> P(prize \mid deserve) = \frac{0.95 * 0.5}{0.85} = 0.5588... \approx 0.559 $$

Question 5:

What is the expected number of coin tosses to get n heads in a row? 

Answer 5:

Expected number of coin tosses to get one head: 
$$E_1 = 2$$

To get n heads you first need n-1 heads in a row. Then you can either toss another head with probability $\frac{1}{2}$ or a tail, in which case you need $E_n$ more heads. This defines the following recurrence relation:

$$E_n = \frac{1}{2}(E_{n-1} +1) + \frac{1}{2}(E_{n-1} + 1 + E_n)$$

$$ => E_n = 2 E_{n-1} + 2 $$

Hypothesis: 

$$ E_n = 2^{n+1} - 2$$

With a simple proof by induction we see this is indeed the case. 

Friday, 29 January 2016

5-card poker probabilities

(I do this as a warm up exercise when I need to think about combinatorics, weird as that may sound)

The total number of choices for 5 cards is

$$ 52 * 51 * 50 * 49 * 48 = 311875200 $$

One pair: 

$$ {13 \choose 1} * { 4 \choose 2} * { 12 \choose 3 } * { 4 \choose 1}^3 = 1098240 $$

Two pair:

$$ {13 \choose 2 } * { 4 \choose 2}^2 * { 11 \choose 1} * {4 \choose 1} = 123552 $$

Three of a kind: 

$$ {13 \choose 1} * {4 \choose 3} * { 12 \choose 2 } * { 4 \choose 1}^2 = 54912 $$

Straight: 

$$  {10 \choose 1} * { 4 \choose 1} ^ 5 - (straight flush,royal flush) =   {10 \choose 1} * { 4 \choose 1} ^ 5 - {10 \choose 1}* {4 \choose 1} = 10200 $$

Flush:

$$ {13 \choose 5} * {12 \choose 1} - (royal flush,straight flush) = {13 \choose 5} * {12 \choose 1} - {10 \choose 1}* {4 \choose 1} = 5108 $$

Full House: 

$$ {13 \choose 1} {4 \choose 3} * { 12 \choose 1} * { 4 \choose 2} = 3744 $$

Four of a kind:

$$ {13 \choose 1} * { 12 \choose 1} * { 4 \choose 1} = 624$$

Straight flush (not including royal flush):

Per suit there are 10 choices of straight (A - 5, 2 - 6, 3 - 7, 4 - 8, 5 - 9, 6 - 10, 7 - J, 8 - Q, 9 - K, 10 - A)

$$ {10 \choose 1} * {4 \choose 1} - {4 \choose 1} = 36 $$

Royal flush: 

$$ {4 \choose 1} = 4 $$



Thursday, 28 January 2016

Least Squares Regression

If you have a scatter graph of n points, how can you find a 'line of best fit' for the data?

There exists some line $l_i$ that we need to find:

$$l_i = m (x_i) + c$$

However, the y-intercept of this line (c) is the y-value when x = 0. This may not be the most useful results so one neat adjustment we can make is redefined the line so that it crosses the y axis when $x$ = $\bar{x}$.

$$l_i = m (x_i - \bar{x}) + c $$

For a least squares regression we want to find m and c such that the following sum is minimised:

$$ \sum_{i=1}^{n}(y_i - l_i)^2 = \sum_{i=1}^{n}(y_i - (m(x_i - \bar{x}) + c))^2 \equiv Q $$

Solution: 


Minimising $Q$ involves differentiating Q by $c$ and $m$, setting each result to zero and solving for $c$ and $m$.


Firstly differentiate wrt $c$: 


$$Q = \sum_{i=1}^{n}(y_i - (m(x_i - \bar{x}) + c))^2 $$

$$\frac{dQ}{dc} = 2 \sum_{i=1}^{n}(y_i - (m(x_i - \bar{x}) + c)) * (-1) = 0 $$

$$ =>  - \sum _{i=1}^{n} y_i + m  \sum _{i=1}^{n} (x_i - \bar{x}) + \sum _{i=1}^{n}c = 0 $$

Note that $  \sum _{i=1}^{n} (x_i - \bar{x}) = 0 $

Therefore:

$$  - \sum _{i=1}^{n} y_i  + n c = 0 $$

$$ => c = \frac{\sum _{i=1}^{n} y_i } { n } = \bar{y} $$

Now let's differentiate wrt $m$:


$$Q = \sum_{i=1}^{n}(y_i - (m(x_i - \bar{x}) + \bar{y}))^2 $$

$$ \frac{dQ}{dm} = 2 \sum_{i=1}^{n}(y_i - (m(x_i - \bar{x}) + \bar{y}))(\bar{x}-x_i) = 0 $$

$$ => \sum_{i=1}^{n} (y_i - (m x_i - m \bar{x} + \bar{y}))(\bar{x} - x_1) = 0 $$

$$ => \sum_{i=1}^{n} (y_i - m x_i + m \bar{x} - \bar{y}))(\bar{x} - x_1) = 0 $$

$$ => \sum_{i=1}^{n} [(y_i \bar{x}- m x_i \bar{x} + m \bar{x}^2 - \bar{y}\bar{x}) - (y_i x_i - m x_i^2 + m \bar{x} x_i - \bar{y} x_i)] = 0 $$

$$ => m \sum_{i=1}^{n} ( \bar{x}^2 + x_i^2 - 2 \bar{x} x_i) + \sum_{i=1}^{n} (y_i \bar{x} - \bar{y} \bar{x} - y_i x_i - \bar{y} x_i) = 0 $$ 

$$ => m  \sum_{i=1}^{n} (x_i - \bar{x})^2 -  \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = 0 $$

$$ => m = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2 }$$

Voila. Now to find the line, we can easily calculate $c$ and $m$. 


Formal definition of a Monad

A monad it a parametric type with two functions defined, $flatMap$ and $unit$.
trait Monad[T] {
  def flatMap[U](f: T => Monad[U]): Monad[U]
}
def unit[T](t: T): Monad[T]

Monads must satisfy the following three laws:

Associativity law: $$ m.flatMap(f).flatMap(g) = m.flatMap ( x => f(x).flatmap(g)) $$

Left unit law:  $$ unit(x).flatMap(f) = f(x) $$

Right unit law: $$ m.flatMap(unit) = m $$















Wednesday, 27 January 2016

Dijkstra in scala

The easiest way to understand dijkstra is by watching a youtube video.

Step 1: Mark all distance to start node as 0 and distance to all other nodes as infinity.
Step 2: Take current node and update distance to all its linked nodes that are unvisited
Step 3: Choose the next node which is the closest unvisited node.
Repeat steps 2 and 3 until target nodes is marked visited.

The following gist shows the code. Full code on github.

How to implement a graph

Let's take the edit distance example graph from an earlier post.

This is our example graph:


An edge exists between two nodes if they differ by only one letter.

There are two main ways of representing graphs in memory:

1. Adjacency Matrix

This is a 2x2 matrix with a bit set if there is an edge between two vertices. Note that because this is a undirected graph this matrix will be symmetrical.

       cat    cab    car    bar    tar    tat
cat             1       1
cab   1                1
car    1       1                1  
bar                      1                1
tar                                1              1
tat                                          1

2. Adjacency List

For each Vertex maintain a list containing all linked vertices.

cat      [cab, car]
cab     [cat, car]
car      [cat, bar, cab]
bar      [car, tar]
tar       [bar, tat]
tat       [tar]

(Note that to solve the problem, it's a simple matter of building a graph and executing dijkstra's algorithm to find the shortest path between two nodes.)

Computing edit distance between strings in Scala

Question: Assuming you have a set of n letter dictionary words ($S$), a start word ($s_0 \in S$) and a target word ($s_t \in S$) find the shortest chain of words ($C$) where one letter differs between consecutive words in the chain and the first and last words in the chain are $s_0$ and $s_t$ respectively.

Eg $S = [cat, cab, car, tat, tar, bar], s_0 = cat, s_t = tar $
                      $ => C = [cat -> car -> bar -> tar]$

Top level solution


To solve this programatically, we can see that this is a graph programming problem - we can build a graph representing the data then find the shortest path within the graph from $s_0$ to $s_t$.


Scala has no graph implementation in the standard collections, but there is a neat little project called scala-graph which you can use.

def computeEditDistance(start: String, end: String, dictionary: Set[String]): Option[Int] = {
  val graph = Graph.from(dictionary, computeEdges(dictionary))
  val path = graph.get(start).shortestPathTo(graph.get(end))
  path.map(_.edges.size)
}

How to compute the edges


The interesting part is the computeEdges function. Here is a naive implementation:

private def computeEdgesSlowly(dictionary: Set[String]): List[UnDiEdge[String]] = {
  val (_, edges) 
     = dictionary.foldLeft (dictionary.drop(1), List.empty[UnDiEdge[String]]) {
        case (acc, word) =>
          val (unvisitedWords, edgeList) = acc
          val edgesForThisWord = unvisitedWords.collect { 
            case (w) if differsByOneLetter(w, word) 
               => word ~ w 
          }
          (unvisitedWords.drop(1), edgeList ++ edgesForThisWord)
      }
  edges
}

For every word in the list we check it against every other unvisisted word in the list meaning that this algorithm is O(n^2) where n = the number of words in the dictionary.

Eg

cat is compared to  [ cab, car, tat, tar, bar ]
cab is compared to [ car, tat, tar, bar ]
car is compared to  [ tat, tar, bar ]
tat is compared to   [ tar, bar ]
tar is compared to   [ bar ]

Cleverer algorithm for computing edges

It is possible to compute the edges in O(n) which is a huge improvement on the implementation above. :D :D

Example:  [ cat, cab, car, tat, tar, bar ]

Step 1: 

Firstly we compute 'subs' for each word in the dictionary

    'cat' ->  [ '?at'  , 'c?t' ,  'ca?' ]
    'cab' -> [ '?ab' , 'c?b' ,  'ca?' ]
    etc...

val mapOfSubs = dictionary.map { word =>
  val subs = (0 until word.length) map { case i =>
    word.patch(i, Seq('?'), 1)
  }
  word -> subs
}.toMap

Step 2:

We build a hashmap from the subs

    '?at' ->  [ 'cat', 'tat' ]
    'c?t' ->  [ 'cat' ]
    'ca?' -> [ 'cat', 'cab', 'car' ]
    '?ab' -> [ 'cab' ]
    'c?b' -> [ 'cab' ]
    etc...

val cachedMap = mutable.HashMap.empty[String, Set[String]]
dictionary.foreach { word =>
  val subs = mapOfSubs.get(word)
  subs.foreach(o => o.foreach {
    sub => 
      cachedMap(sub) 
        = Set(word) ++ cachedMap.getOrElse(sub, Set.empty[String])
  })
}

Step 3:

Now we build up the list of edges. For each word, we get 3 subs for each word. Then we look up each of these 3 subs in the above hashmap to get all the nodes for which the current words has an edge with. We keep a list of nodes we have visited so that we don't add each node twice (the edges are undirected).

val visited = mutable.Seq.empty[String]
dictionary.flatMap { word =>
  word :+ visited
  val subs = mapOfSubs(word)
  subs.flatMap { s =>
    val links = cachedMap(s)
    links.collect { 
      case l if !visited.contains(l) 
        => word ~ l // ~ = undirected edge    }
  }
}.toSeq


Full code is here.


Tuesday, 26 January 2016

Tree algorithms in Scala: DFS, BFS and Height of binary tree

Tree:


         3
   5          2
1   4      6

Preorder Traversal


root -> left subtree -> right subtree
3 ->  5 -> 1 -> 4 -> 2 -> 6

def preOrder(node: Node): Unit = {
  if (node != null){
    print(node.value + " ")
    preOrder(node.left)
    preOrder(node.right)
  }
}

def iterativePreOrder(root: Node): Unit = {
  val stack = new mutable.Stack[Node]()
  var node = root
  while (stack.nonEmpty || node != null) {
    if (node != null) {
      print(node.value + " ")
      if (node.right != null) stack.push(node.right)
      node = node.left
    }
    else {
      node = stack.pop()
    }
  }
}

InOrder Traversal


left subtree -> root -> right subtree
1 -> 5 -> 4 -> 3 -> 6 -> 2

def inOrder(node: Node): Unit = {
  if (node != null){
    inOrder(node.left)
    print(node.value + " ")
    inOrder(node.right)
  }
}

def iterativeInOrder(root: Node): Unit = {
  val stack = new mutable.Stack[Node]()
  var node = root
  while (stack.nonEmpty || node != null) {    if (node != null) {      stack.push(node)      node = node.left    }    else {      node = stack.pop()      print(node.value + " ")      node = node.right    }  }
}

PostOrder Traversal


left subtree -> right subtree -> root
1 -> 4 -> 5 -> 6 -> 2 -> 3

def postOrder(node: Node): Unit = {
  if (node != null) {
    postOrder(node.left)
    postOrder(node.right)
    print(node.value + " ")
  }
}

def iterativePostOrder(root: Node): Unit = {
  val stack = new mutable.Stack[Node]()
  var node = root
  var prev: Node = null  while (stack.nonEmpty || node != null) {
    if (node != null) {
      stack.push(node)
      node = node.left
    }
    else {
      node = stack.pop()
      if (node.right != null && prev != node.right) {
        stack.push(node)
        node = node.right
      }
      else {
        print(node.value + " ")
        prev = node
        node = null      }
    }
  }
}

Breadth first search

3 -> 5 -> 2 -> 1 -> 4 -> 6 

def bfs(root: Node): Unit = {
  val q = new mutable.Queue[Node]()
  var node = root

  q.enqueue(node)
  while (q.nonEmpty) {
    node = q.dequeue()
    print(node.value + " ")
    if (node.left != null) q.enqueue(node.left)
    if (node.right != null) q.enqueue(node.right)
  }
}


Height of a binary tree



def height(root: Node): Int = {
  val q = new mutable.Queue[(Node, Int)]()
  var node = root

  q.enqueue((node, 1))
  var max = 1  while (q.nonEmpty) {
    val d = q.dequeue()
    node = d._1

    if (node.left != null) {
      max = d._2 +1      q.enqueue((node.left, max))
    }
    if (node.right != null) {
      max = d._2 +1      q.enqueue((node.right, max))
    }
  }
  max
}



Wednesday, 20 January 2016

Scala: Pass by name vs pass by value

Scala call by value vs call by name



In scala when you pass by name (using =>) the expression gets evaluated every time it is used.

Passing by name and passing by value will result in the same output if pure functions are used and both evaluations terminate.

Note that in scala the default is passing by value which is more performant. 

Scala with Cats: Answers to revision questions

I'm studying the 'Scala with Cats' book. I want the information to stick so I am applying a technique from 'Ultralearning...