Saturday, December 19, 2009

Terrastore and the Cap Theorem

This is an edited version of the original "Terrastore and the Cap Theorem" article, updated to reflect latest Terrastore developments.

Terrastore is a new born document store based on the wonderful Terracotta technology, focused on providing a feature-rich, scalable, yet consistent, data store.

Data stores today are often classified depending on how they deal with availability, partition-tolerance and consistency, and how they resolve the conflict between them.
Such a conflict is described as the CAP theorem, stating that you can only pick two of them and relax the third: the matter isn't trivial, but it's not the focus of this post, so I will not go any further with it and point you at some well written posts such as this one from Julian Browne, and this one from Jeff Darcy.

With the CAP theorem in mind, a few people wisely asked: how does Terrastore relate to it?

Let's discuss!

Terrastore architecture in a nutshell

Terrastore is a distributed/partitioned store with a master-based topology powered by Terracotta.
It can be configured to work with a single cluster, made up of one active master, one or more optional passive masters, and one or more servers; or it can be configured to work with a clusters ensemble: a set of more clusters working together as a single whole.
All Terrastore documents are automatically partitioned between clusters (if more than one is provided as an ensemble) and intra-cluster server nodes, so that every node only holds in its own main memory a subset of all documents.
All Terrastore server nodes are equal peers, talking to each other only for requesting documents actually owned by some other node.
Data replication is managed by the central (Terracotta) master: each active master manages data replication inside its own cluster; so when working as an ensemble, data never cross a cluster boundary and each active master scales independently from others.

Consistency (C)

Terrastore is a consistent store.
What does it mean? Does it support serializable transactions? Is it fully ACID?

Well, Terrastore is somewhat in the middle between ACID and BASE.
ACID means Atomic, Consistent, Isolated and Durable: that is, the flagship characteristics of your old relational database.
BASE means Basically Available, Soft state, Eventually consistent: that is, the flagship characteristics of many distributed non-relational databases currently on the market, which do not (normally) give any time guarantee in regards to data consistency among all distributed nodes (see my previous post about the topic).

Terrastore provides per-document consistency, meaning that all single updates to a single document will be atomic, isolated and durable: you will always see the latest version of a document, but no support is provided for transactions between different documents or between consecutive operations on the same document.

Avoiding cross-operation and cross-document transactions makes Terrastore more scalable than relational databases, while at the same time as safe as them to reason about updates order.
The complexity of a consistent write operation is very low, because no synchronous replication is involved between nodes: it has a best-case complexity of O(1), requiring only a semi-asynchronous replication toward the master, and a worst case of O(2) if the written document should be owned by another node (and hence internally routed there).

Availability (A) and Partition Tolerance (P)

I'll talk about availability and partition tolerance together because you can't really consider the former without taking into account the latter.

Terrastore availability depends on whether we're talking about clusters, masters or servers.

In case of a Terrastore ensemble made up of several clusters, if one or more clusters are completely unreachable all data hosted inside them will be unavailable, but other clusters with related data will remain available.
This means Terrastore can tolerate partitions between clusters: a cluster will continue serving its and other available clusters data, while considering as unavailable data belonging to unreachable clusters.

When moving our focus inside a single cluster, Terrastore will be fully available as far as there's one reachable master and one reachable server.
You can dynamically run and shutdown how many server nodes you want, and they can fail at any rate.
But, if connection between server(s) and master(s) is lost, the cluster will suddenly be unavailable: this means that Terrastore can't tolerate partitions between servers and masters inside a single cluster.

Final words: is Terrastore superior to other solutions?

The answer is obviously: no, it isn't.
Whatever other product-owners say, the truth is just that: there's no silver bullet, nor universal solution.
Just go with a non-distributed non-relational solution if you need super-fast access to your data.
Take a look at a "base"-oriented solution if you have very large amount of data, probably distributed on several data centers, and need both partition-tolerance and full data availability.
Or go with Terrastore if you need a distributed store with consistency features.

Evaluate.
Pick your choice.
And be happy.