Amazon SimpleDB 101 & Why It Matters

Amazon continues to amaze us with its Amazon Web Services series of offerings. The latest is SimpleDB, which will be available in limited beta in a few weeks. And it is bound to have a major impact on web infrastructure. As Amazon says in its email to existing developers:

This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

As we’ve already noted,

…the center of gravity is shifting away from monolithic centralized data management to massively parallel distributed data management.

If you are in the business of managing massive amounts of distributed data, you cannot gloss over the Amazon WS trifecta: data-in-the-cloud is the future and, with WS, Amazon is way ahead of the pack.

What about the offerings of other vendors? Google, for example, has BigTable, and truth be told, SimpleDB has a distinctly BigTable-ish feel to it. But a side-by-side comparison makes it clear that Amazon WS in general, and SimpleDB in particular, is superior, for the following reasons:

  • Google’s offerings (not only BigTable but Google Base, Gdisk, etc.) all have an ad hoc, grab-bag-of-tools feeling to them, devoid of any integrated strategy. Or if there is one, it is well-hidden.
  • Amazon WS clearly involves a well-designed master plan aimed at changing the face of software as a service, each new offering akin to a chess piece in a game focused on creating strategic long-term value. And with SimpleDB, the queen has moved to the center.
  • Amazon WS is based on the YOYODA principle — You Own Your Own Data, Always. Along with Amazon S3, SimpleDB is a sharp arrow in the quiver of open data proponents.
  • Amazon WS includes a built-in, flexible payment system so users are neither forced to offer their app for free nor have an “ad-supported” model forced upon them. Now you can build a data-based web app on SimpleDB and seamlessly charge for it.

Tersely put, SimpleDB is hugely disruptive. It will take some time to evolve the new thinking patterns and new design disciplines that this technology forces us to consider. To do so, consider this breakdown of the similarities and differences between SimpleDB and conventional relational databases.

Very, very simplistically speaking, domains are like tables, with items like rows and attributes like columns. A query cannot cross domains, so in this analogy you can’t “join” domains. But that sort of thinking is a holdover from the relational database normalized model. In reality a domain is much more like a database, so we have to stop thinking in terms of tables and joins.
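To make the analogy concrete, here is a toy in-memory sketch in Python. It is not the SimpleDB API: the Domain class and its put and query methods are hypothetical names invented for illustration, and attribute values are kept in sets on the assumption that an attribute can hold more than one value. It shows a domain holding schema-less items, queries scoped to that one domain, and two query results combined by set intersection to stand in for a join.

```python
from collections import defaultdict


class Domain:
    """Toy stand-in for a domain: item name -> {attribute -> set of values}.

    Hypothetical illustration only; this is not the SimpleDB client API.
    """

    def __init__(self, name):
        self.name = name
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, **attributes):
        # Items need no fixed schema; each attribute accumulates values.
        for attr, value in attributes.items():
            self.items[item_name][attr].add(value)

    def query(self, attr, value):
        # A query only ever sees the items of this one domain.
        return {
            item_name
            for item_name, attrs in self.items.items()
            if value in attrs.get(attr, ())
        }


org = Domain("org")
org.put("acme", kind="company")
org.put("eng", kind="department", company="acme")
org.put("alice", kind="employee", department="eng", city="Boston")
org.put("bob", kind="employee", department="eng", city="Austin")
org.put("carol", kind="employee", department="sales", city="Boston")

# Intersecting two result sets plays the role a join would play in SQL.
boston_engineers = org.query("department", "eng") & org.query("city", "Boston")
print(boston_engineers)
```

Because every kind of item lives in the same domain, the "join" is just set arithmetic over item names rather than a cross-table operation.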

Say we had an SQL database with tables for “Company,” “Departments” and “Employees.” In SimpleDB, the items (rows) for all three could go in one domain (database). You can then run queries on this domain and, using operators like UNION and INTERSECT, do the equivalent of joins.

Existing web technologies such as Ruby on Rails, Django and Hibernate all have an Object Relational Mapper (ORM), which maps language objects to relational database tables. If designers of these ORMs want to stay in the scalable apps game, they should take a serious look at using SimpleDB as a data store. Better yet, they should build ORMs from the ground up to integrate with SimpleDB.

More than two years ago I wrote that Web 2.0 needs Data 2.0. The combination of EC2, S3 and SimpleDB is a toolkit for assembling massively scalable, REST-addressable web databases. Data 2.0 is now officially here. May the fun and games begin.

Nitin Borwankar is a database guru based in the San Francisco Bay Area. You can find his writings on his blog, TagSchema.