Alexandra's Tech: October 2014

Wednesday, 15 October 2014

Solution to Codility Lesson 11: Ladder

The Question:

You have to climb up a ladder. The ladder has exactly N rungs, numbered from 1 to N. With each step, you can ascend by one or two rungs. More precisely:

with your first step you can stand on rung 1 or 2,

if you are on rung K, you can move to rungs K + 1 or K + 2,

finally you have to stand on rung N.

Your task is to count the number of different ways of climbing to the top of the ladder.

For example, given N = 4, you have five different ways of climbing, ascending by:

1, 1, 1 and 1 rung,

1, 1 and 2 rungs,

1, 2 and 1 rung,

2, 1 and 1 rungs, and

2 and 2 rungs.

Given N = 5, you have eight different ways of climbing, ascending by:

1, 1, 1, 1 and 1 rung,

1, 1, 1 and 2 rungs,

1, 1, 2 and 1 rung,

1, 2, 1 and 1 rung,

1, 2 and 2 rungs,

2, 1, 1 and 1 rungs,

2, 1 and 2 rungs, and

2, 2 and 1 rung.

The number of different ways can be very large, so it is sufficient to return the result modulo 2P, for a given integer P.

Write a function:

class Solution { public int[] solution(int[] A, int[] B); }

that, given two non-empty zero-indexed arrays A and B of L integers, returns an array consisting of L integers specifying the consecutive answers; position I should contain the number of different ways of climbing the ladder with A[I] rungs modulo 2B[I].

For example, given L = 5 and:

    A[0] = 4   B[0] = 3
    A[1] = 4   B[1] = 2
    A[2] = 5   B[2] = 4
    A[3] = 5   B[3] = 3
    A[4] = 1   B[4] = 1

the function should return the sequence [5, 1, 8, 0, 1], as explained above.

Assume that:

L is an integer within the range [1..30,000];

each element of array A is an integer within the range [1..L];

each element of array B is an integer within the range [1..30].

Complexity:

expected worst-case time complexity is O(L);

expected worst-case space complexity is O(L), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified.

The Solution

The key here is to notice that the number of ways of ascending for N=4 is Fibonacci(N+1).

I created a cache of the first x fibonacci numbers modulo 2^y where x is the max value in array a and y is the max value in array b.

Then, for every earlier fibonacci number f (f <= x) you can look up in the cache and take cache[f] and modulo it again by whatever value is in b.

So for example, if we consider the first few fibonacci numbers and cache them modulo 2^3:

Fib(index) = 1, 1, 2, 3, 5, 8, 13, 21

index = 0, 1, 2, 3, 4, 5, 6, 7

cache value = Fib(index) % 8 = 1, 1, 2, 3, 5, 0, 5, 5

Note that in this example, array b will contain values in the range (1, 3).

So we iterate over array a, look up the corresponding value in the cache and take it modulo corresponding value in b.

Happy days.

Solution is here:

Monday, 6 October 2014

Nerd conversation over dinner: Designing a key-value store

Question 1

Hi, so, imagine you have an interface that looks like this:

Interface {
void store(Object key, Object value);
Object getValue(Object key);
}

How would you implement this?

Answer 1

In memory hash map.

Question 2

Ok, so what if you have 12 billion values and they can be like a gig each?

Answer 2

So in this case you want to write to disk. You can store in memory until your heap is like 80% full then some background process (or 1 very expensive write) can write to files.

Question 3

How are these files structured?

Answer 3

Every time you write you create a new file. The advantage of this is when you write you are don't need any more disk seeks.

Question 4

So how does a read work?

Answer 4

When you do a read you search backwards through the files for your key.

Question 5

Can you think of any optimisations you could do to make your reads more efficient?

Answer 5

Bloom filter. Every file should have a bloom filter associated with it so you can look up the key in the bloom filter and determine either that the key does not exist in the file or that it might exist in the file.

Therefore, when you do a read you check each file's bloom filter to determine whether or not you need to search the file.

[Note that Cassandra uses bloom filters like this and in Cassandra you can resize your bloom filter to make it more accurate - in fact Cassandra allows you to configure bloom filters for trade-offs between accuracy and memory usage. ]

Another thing you could do is have some sort of indexing within your file to say hash(key) % x => your value lies in Y part of the file (e.g. first quarter of the file).

Question 6

What if you have update your value for a particular key so that your older files are full of stale data that isn't needed?

Answer 6

Kind of like garbage collection you could have some process that performs 'compaction'. Perhaps when there is lots of memory you could go through and compact the files, removing data that isn't needed. Locks on files won't be required if you can store new values in memory until your compaction process is finished.

Union Find Algorithm

Imagine:

- you have a set of n objects
- where objects can have some sort of connection between them
- and you want to determine if there is a path between two objects.

We assume that the connection between two of these objects is reflexive, symmetric and transitive.

How could this be implemented?

Basically, you can maintain disjointed subsets of components illustrating which are transitively connected.

Eg:

In this example the division of nodes would be: {0, 3, 4, 8} {1} {2, 5} {6, 7}

Implementation 1: Bias towards find

You could store it in an array showing the relationships:

[0, 1, 2, 0, 0, 2, 3, 3, 0]

(eg elements at index 2 and index 5 are both two meaning they are in the same subset)

With this algorithm find is O(1) but union is O(n) which is still a bit slow.

Implementation 2: Maintain little trees

Again maintain an array but this time it is an array of little trees.

Start with [0, 1, 2, 3, 4, 5, 6, 7, 8]

union(2,3) => Update 2 to have root 3: [0, 1, 3, 3, 4, 5, 6, 7, 8]

union(3, 8) => Update 3 to have root 8: [0, 1, 3, 8, 4, 5, 6, 7, 8]

union(1, 6) => Update 1 to have root 6: [0, 6, 3, 8, 4, 5, 6, 7, 8]

union(7, 3) => Update 7 to have root 8 (3's root): [0, 6, 3, 8, 4, 5, 6, 7, 8]

With this implementation both find and union are worst case O(n)

Improvements

The problem with implementation 2 is that your little trees can end up not so little and can get very tall. There are some optimisations you can do to improve performance to minimise tree depth.

Sunday, 5 October 2014

Auction Kata: Thread safety and the CopyOnWrite pattern

Recently I did a Kata where I implemented an 'Auction' class which accepts bids by users on items for some price.

The requirements specified that we should be able to retrieve the following:

- all the items that a user has bid on

- all bids on an item

- and the current winning bid on an item.

This suggested to me that I should have two maps:

HashMap<Item, PriorityQueue<Bid>> bidsForItems;

HashMap<User, HashSet<Item>> itemsUsersHaveBidOn;

Easy so far.

Making it Thread Safe

So it was quite easy to make this thread safe. I simple utilised Concurrent collections.

ConcurrentHashMap<Item, PriorityBlockingQueue<Bid>> bidsForItems;

ConcurrentHashMap <User, CopyOnWriteSet<Item>> itemsUsersHaveBidOn;

The bid method was pretty much unchanged.

However, there is one problem with this implementation: the two maps can be out of sync.

Eg imagine two threads:

Thread 1: Updates bidsForItems saying that there is a new bid on Item Fridge which happens to be a winning bid by User Chris;
Thread 1: Updates itemsUsersHaveBidOn map saying that User Chris has placed a bid on Item Fridge.

Thread 2 wants to call GetWinningBid for Item Fridge, find out which User has placed the winning bid, and see what other items this user has bid on.

If Thread 2 executes between the two actions by thread 1 then Thread 2 might get inconsistent results.

Making it Consistent

To make it consistent I utilised the CopyOnWrite Pattern..

Firstly, I abstracted the two maps into a 'DataStore' class:

Then the 'Auction' class must hold an atomic reference to the DataStore:

Then the bid method looks like this:

We take a copy of the existing datastore and create a new one from it to include the new bid. Then we perform a compare and swap operation to update the AtomicReference to the datastore. The compare and swap method in the while condition will return true if the value is updated (and the loop with terminate) but false if the datastore that we copied has been modified since we took our copy.

The problem with this implementation is that we are creating a new HashMap every time there is a bid. There is one nice optimisation that can be done where instead of creating a new Map every time, we have an immutable wrapper class around it like this:

Saturday, 4 October 2014

Getting Started with Ansible on Mac OS X

Ansible is a great tool for config management on a set of remote machines. However, Ansible can be used to execute commands locally too. My uses for Ansible include writing a"playbook" (i.e. an Ansible script) to install software locally if I get a new laptop and provisioning virtual machines in amazon's EC2.

In the this post you will see how to execute a one liner on set of hosts. More fun stuff will be in later posts. :-)

1. Firstly, install Ansible. Assuming you use homebrew (and if you are on a mac you really should be!) execute the following:

brew update
brew install ansible

2. Create a working directory.

mkdir ansible_wd
cd ansible_wd

3. Within your current directory create a file called hosts and open it with your favourite editor. This hosts file will contain a list of machine names/ip addresses etc.

vim hosts

4. In hosts paste the following:

[group_name]
localhost

This means you have a group called [group_name] (change this to something sensible!) with one host in the group - localhost.

5. Assuming you have an RSA key set up , you need to paste your public key in ~/.ssh/authorized keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6. Now you are ready to run your first Ansible command:

Make sure you have your sound on and execute the following:

ansible all -i hosts -a "say hello ansible"

What does this do?

'all' means execute on all hosts
'- i' option means inventory which is your hosts file
'-a' means you are executing an ad-hoc command on the relevant hosts.
"say hello ansible" is the command to be executed on every hosts. Note that on Mac OS X, say should be installed by default.

Another example to install htop locally using homebrew:

ansible group_name -i hosts -a "brew install htop"

So you might be wondering what is the point? Why would I bother setting up Ansible to execute local commands like "brew install htop" in a more roundabout way than simply typing the command? As mentioned at the start of the post, one useful thing you can do is add commands to an Ansible script which you can execute on any machine to install software. This will save massive amounts of time when you get your new dev machine and you can magically install everything by running one script. For me, it's open-jdk, htop, wget among other things.

You might wonder why can't you simply write a shell script to execute a series of steps? Well you could, but Ansible is a bit cleverer than that - it can manage state and do all sorts of crazy stuff which we will see in coming posts.

In the next post we will learn how create an Ansible playbook to execute statements on remote or local machines.

Comparable or Comparator in Java?

I was writing some code recently where users can place bids on items.

I wanted to store the bids in a PriorityQueue to enable easy retrieval of the max bid on an item.

Bid.java

I wanted to implement either a Comparator or make Bid Comparable to compare two bids and order them by highest prices.

Attempt 1: Make Bid implement Comparable<Bid>:

However:

- this doesn't really make sense because it only makes sense to compare two bids if they are associated with the same Item.
- given the current class structure, there is no way to handle this in the compareTo(Bid bid) method.

Therefore we should write a Comparator specifically for our use case to be passed into the Priority Queue. :-)

So in general it depends on whether the implementation of compare two is specific to the object or the usage.

Setting up Active MQ on Unix

Install activemq locally

Download gzip'd tar'd file from one of the mirror sites

eg.

wget http://mirror.vorboss.net/apache/activemq/5.9.1/apache-activemq-5.9.1-bin.tar.gz

tar zxvf apache-activemq-5.9.1-bin.tar.gz

cd apache-activemq-5.9.1/bin

chmod 755 activemq

Start activemq

./activemq start

Check it is running:

netstat -an|grep 61616

You can go to the admin console in a web browser:

http://localhost:8161/admin/

Default username and password is admin/admin.

Create JMS project

I created a maven project from the command line:

mvn archetype:generate -DgroupId=com.ojha.play -DartifactId=activemq -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

In the pom I added a dependency on active mq:

<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>activemq-all</artifactId>
<version>5.9.0</version>
</dependency>

Implement producer and consumer

Producer:

Consumer:

I ran the producer and checked the console in the web browser to see the message:

Then ran the consumer to dequeue the message.