Monday, 6 October 2014

Nerd conversation over dinner: Designing a key-value store

Question 1

Hi, so, imagine you have an interface that looks like this:

Interface {
    void store(Object key, Object value);
    Object getValue(Object key);
}

How would you implement this? 

Answer 1


In memory hash map.

Question 2


Ok, so what if you have 12 billion values and they can be like a gig each? 


Answer 2


So in this case you want to write to disk. You can store in memory until your heap is like 80% full then some background process (or 1 very expensive write) can write to files.

Question 3


How are these files structured?

Answer 3


Every time you write you create a new file. The advantage of this is when you write you are don't need any more disk seeks.

Question 4


So how does a read work?

Answer 4


When you do a read you search backwards through the files for your key.

Question 5


Can you think of any optimisations you could do to make your reads more efficient?

Answer 5


Bloom filter. Every file should have a bloom filter associated with it so you can look up the key in the bloom filter and determine either that the key does not exist in the file or that it might exist in the file.

Therefore, when you do a read you check each file's bloom filter to determine whether or not you need to search the file.

[Note that Cassandra uses bloom filters like this and in Cassandra you can resize your bloom filter to make it more accurate - in fact Cassandra allows you to configure bloom filters for trade-offs between accuracy and memory usage. ]

Another thing you could do is have some sort of indexing within your file to say hash(key) % x => your value lies in Y part of the file (e.g. first quarter of the file).

Question 6


What if you have update your value for a particular key so that your older files are full of stale data that isn't needed?

Answer 6


Kind of like garbage collection you could have some process that performs 'compaction'. Perhaps when there is lots of memory you could go through and compact the files, removing data that isn't needed. Locks on files won't be required if you can store new values in memory until your compaction process is finished.




No comments:

Post a Comment

Scala with Cats: Answers to revision questions

I'm studying the 'Scala with Cats' book. I want the information to stick so I am applying a technique from 'Ultralearning...