Recently, I've made some contributions to an open source project for which an understanding of Cassandra would be useful. With no prior knowledge of Cassandra, the following post summarises what I learned in an hour of online research.
Cassandra is decentralised. Data is stored across multiple nodes, and this collection of nodes is known as a cluster. Among these nodes there is a lot of replication, as specified by the user. For example, imagine there are 12 nodes. The user can specify that they want each piece of data replicated on 6 of the 12 nodes, which obviously increases redundancy and reduces the risk of data loss.
Replication factors are defined per keyspace. A keyspace is a container for application data, similar to a schema in a relational database.
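As a rough sketch of what that looks like in CQL (the keyspace name my_app is made up for illustration), a keyspace with a replication factor of 6 could be created like this:

```
-- Hypothetical keyspace: every piece of data is replicated to 6 nodes.
-- SimpleStrategy spreads replicas around the cluster with no notion of data centres.
CREATE KEYSPACE my_app
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 6};
```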
When a user writes to Cassandra, they must also specify a consistency. This is the number of nodes to which the data must be written before the write is reported back to the user as successful. Continuing our earlier example: if we have 12 nodes in total, a replication factor of 6, and a user issues a write with a consistency of three, then once the data has been written to 3 of the 6 replicas (i.e. the replication is 50% complete) the user is informed that their write has been successful.
This idea of consistency also applies to reads. If a user reads some data with a consistency of two, then Cassandra reads from two of the six replicas and returns the most up-to-date result to the user.
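In cqlsh the consistency level is set with the CONSISTENCY command, which applies to the whole session rather than to a single statement. Here's a sketch of the example above, using an invented my_app.users table:

```
CREATE TABLE my_app.users (id int PRIMARY KEY, name text);

-- Write at consistency THREE: reported successful once 3 of the 6 replicas have the data
CONSISTENCY THREE;
INSERT INTO my_app.users (id, name) VALUES (1, 'Alice');

-- Read at consistency TWO: 2 of the 6 replicas are consulted and the most
-- up-to-date result is returned
CONSISTENCY TWO;
SELECT name FROM my_app.users WHERE id = 1;
```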
Possible consistencies are the following:
ANY - fire and forget (applies to writes only)
ONE
TWO
THREE
QUORUM - means that a majority of nodes must be consistent (so 4 out of our 6)
ALL - all six nodes must have the data
Cassandra can be configured to be data centre aware. If this is the case, then replication factors must be defined per data centre.
With data centre aware configuration, there are two additional consistencies that can be used.
EACH_QUORUM means that you must be at QUORUM consistency in each data centre.
EACH_QUORUM Example:
2 data centres
Each data centre has 6 nodes so the cluster size is 12
A replication factor of 6, with Cassandra not data centre aware.
QUORUM consistency means that the request is complete when 4 nodes are consistent. However, these 4 nodes could be unevenly distributed between the data centres; they might even all be in London.
What if the London data centre has a power cut? Suddenly we have lost the benefit of the replication.
Therefore, we can make Cassandra data centre aware and define a replication factor of 3 per data centre.
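In CQL this per data centre replication factor is what NetworkTopologyStrategy is for. A sketch, assuming the data centres are named 'London' and 'Winchester' in the cluster's snitch configuration, switching our made-up keyspace over:

```
-- 3 replicas in London and 3 in Winchester, so 6 replicas in total
ALTER KEYSPACE my_app
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'London': 3,
                      'Winchester': 3};
```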
Now, if we use a consistency of EACH_QUORUM, we get consistency across two nodes in each data centre.
If one of your data centres is unavailable for some reason then any read or write that a user attempts at QUORUM will fail. However, applications can be set up to reduce consistency if a data centre or some nodes are unavailable.
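As a cqlsh sketch (again using the invented my_app.users table), an EACH_QUORUM write needs 2 of the 3 replicas in each data centre to acknowledge it, so it fails outright while London is down:

```
CONSISTENCY EACH_QUORUM;
-- Needs a quorum (2 of 3) in London AND a quorum (2 of 3) in Winchester;
-- if either data centre is unreachable, the write fails
INSERT INTO my_app.users (id, name) VALUES (2, 'Bob');
```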
There is also LOCAL_QUORUM consistency, which can be used when Cassandra is data centre aware. It means that you achieve QUORUM consistency in the data centre of the node you are connected to. If you are interested, the node you are currently connected to is called the coordinator.
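Sticking with the same sketch, a LOCAL_QUORUM read only needs 2 of the 3 replicas in the coordinator's own data centre to respond, so it keeps working even if the other data centre is unreachable:

```
CONSISTENCY LOCAL_QUORUM;
-- Only 2 of the 3 replicas in the coordinator's data centre need to respond
SELECT name FROM my_app.users WHERE id = 1;
```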
Data centre awareness can extend to Cassandra knowing exactly what rack each node is on.
Hinted Handoffs
Let's look at one more example to understand hinted handoffs.
2 data centres in London and Winchester.
6 nodes in each data centre => cluster size = 12
Replication factor of 3 per data centre => replication factor = 6
There is a power outage at the data centre in London so it is unavailable.
The user is connected to one of the nodes in the Winchester data centre and attempts a write at consistency ONE. This write succeeds and the data is replicated among 3 nodes in Winchester. However, because the London data centre is unavailable, the data cannot be replicated across 3 nodes in London. Instead, the coordinator stores a hint and remembers that this data needs to be synced to London. If the London data centre becomes available within 3 hours, the coordinators with stored hints will push them to the recovered nodes to bring them up to date.
If it takes longer than 3 hours for the nodes to recover, then this self-healing is not possible and instead we need to run a repair on the nodes (the nodetool repair command). This means that the recovered nodes go round asking other nodes for information to bring their own state up to date.
Of course, this three hour limit is configurable in Cassandra (it's the max_hint_window_in_ms setting in cassandra.yaml).
Okay, so that was fun. I've only scratched the surface with Cassandra. It seems to me that you could theoretically configure it to run almost like a relational database, though apparently its query language (CQL) is less powerful than SQL in that it is missing joins and the like. Of course replication has its pros and cons (expense vs safety), but it seems like a pretty clever piece of software and I look forward to learning more about it. :D