Does Microsoft's Cosmos DB promise too much?

Microsoft says its new Azure cloud database is all types of databases in one. Here's why that might be a problem

Microsoft apparently missed database godfather Michael Stonebraker’s memo. In 2005 Stonebraker declared the “one size fits all” mentality of the database market is an idea whose “time has come and gone.” Fast forward to 2017 and Microsoft launched Azure Cosmos DB, a new database that promises to do... everything.

No, really. Everything.

Relational data? Check. Documents? Yep. Graph? Of course. Strong consistency? Bingo! Eventual consistency? That, too! In fact, Cosmos DB has five consistency models to choose from.

Not surprisingly, euphoric cries greeted the press release, with one developer gushing that it “absolutely beats any competitor in the cloud” and, as such, “not sure why would you go for anything else today.” Microsoft, even less surprisingly, agreed, calling Azure Cosmos DB “the first globally-distributed data service that lets you elastically scale throughput and storage across any number of geographical regions while guaranteeing low latency, high availability, and [five well-defined] consistency [models].”

The problem with such “everything under the sun” products, however, is that what they gain in breadth they often lose in depth. As Jared Rosoff, a former MongoDB executive, told me, “When you tell me your database does everything what I hear is that it’s mediocre at all of it.”

He may have a point.

You had one job…

While NoSQL hasn’t killed the general purpose relational database (see MySQL’s continued strength as proof), it has given the market different ways to accommodate diverse application requirements. As ArangoDB board member Luca Olivari told me, “Key value stores are blazingly fast with extremely simple data, document stores are brilliant for complex data, and graph solutions shine with highly interconnected data.”

Some discount this splintering of the database market. As Olivari went on to tell me, mastering these systems involves “a steep learning curve (in truth, many steep learning curves)” while “keeping your data consistent, your application fault-tolerant, and your architecture lean is rather impossible.”

Like it or not, this is the world we live in. Check out DB-Engines.com and you’ll find hundreds of databases, each carving out its own niche. Stonebraker called this trend over a decade ago:

The last 25 years of commercial DBMS development can be summed up in a single phrase: “One size fits all.” This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements … This concept is no longer applicable to the database market, and [we believe] the commercial world will fracture into a collection of independent database engines.

This prediction, written years before MongoDB, Apache Cassandra, Neo4j, and other NoSQL databases entered the market, has been prescient. Thoughtworks’ Martin Fowler has explained the reason for such “polyglot persistence”: “Any decent sized enterprise will have a variety of different data storage technologies for different kinds of data.”

Rosoff was even more direct in a conversation we had: “You get huge benefits from specialization.” When I pressed him on this (MongoDB, after all, is often credited with leadership in the NoSQL pack because it’s more general purpose in nature), he explained, “There are 100x gains to be had by being special purpose. MongoDB got huge scale and perf by dumping joins and transactions. There is no free lunch.”

In other words, a do-everything-equally-well database probably doesn’t exist. Indeed, it almost certainly can’t exist.

Multi-model dreaming?

Not everyone agrees, of course. For several years there has been a growing trend around “multi-model” databases, with Azure Cosmos DB simply the newest among a list that includes ArangoDB, OrientDB, and more. With such multi-model databases, and particularly one backed by Microsoft’s heft and experience in the database market, Serdar Yegulalp argues, Azure Cosmos DB could “challenge assumptions about whether the hard choices we had to make when picking such products even need to be made anymore.”

Olivari took this one step further, telling me, “Native multi-model databases, like ArangoDB, are built to process data in different shapes: key/value pairs, documents and graphs. They allow developers to naturally use all of them with a simple query language that feels like coding. That’s one language to learn, one core to know and operate, one product to support, thus an easier life for everyone.”

Still, it’s one thing to say you can support multiple database models by mapping data from disparate models to a common back-end, and quite another to say you support specific databases. At least, support them well. Microsoft’s DocumentDB (which morphed into Cosmos DB) tried to beat MongoDB at the document database game, failed, and then had to embrace MongoDB’s wire protocol to allow MongoDB developers to use their preferred MongoDB drivers and toolchain but push the data into DocumentDB. IBM tried the same thing back in 2013.

Neither has done much to divert MongoDB’s popularity into their wallets. It’s hard to imagine a multi-model database, by definition a jack-of-all-trades, unseating any of the popular databases.

It’s also difficult to imagine developers getting excited about having to master... everything. As Rosoff told me, “Most developers struggle with mastering one model. Giving them more choices just makes that harder.”

Perhaps this is one reason that multi-model databases have been sliding in popularity over the past year: OrientDB, for example, fell to 46th place on DB-Engines’ ranking, from 41st place a year ago. Take a tour of the other multi-model databases and the story is much the same.

Azure Cosmos DB may be different. It may redefine the genre and deliver every database model, all types of consistency, and massive scale. Microsoft, with its expertise in databases, might be able to pull it off. The only thing working against its success is the entrenched idea that an “everything database” cannot truly compete with specialized databases.

Stonebraker, the man who did more to give us general purpose relational databases than pretty much anyone else, has already given his verdict on the one-size-fits-all approach: “this strategy has failed already, and will fail more dramatically off into the future.”
