Techworld

The time for NoSQL standards is now

Like Larry Ellison's yacht, the RDBMS is sailing into the sunset. But if NoSQL is to take its place, a standard query language and APIs must emerge soon

A decline for Oracle over the next 15 years is inevitable. It will be impossible to sustain the RDBMS-only paradigm against all logic as the new wave of databases lumped in under NoSQL and big data takes over. Oracle is responding with partnerships, and it already has a NoSQL database, but it's difficult to imagine a transition that leaves Oracle's revenue stream intact -- smells almost like Novell, circa 1996.

Yet the RDBMS will take its time to fade. The reason? Aside from the obvious -- it's an entrenched technology -- the advantages that made the RDBMS ubiquitous in the first place are going to keep it around a bit longer.


It may surprise you that I don't consider "transactions" to be one of those advantages. They've been overrated for some time. It's absurd to purport that a transaction, which must be too fine-grained to be useful across multiple request/response cycles, is an indispensable tool for most applications. Moreover, there are other ways to assure reasonable consistency.

The key advantage of the RDBMS, rather, is standardization. The history of relational databases has proven that even a poor job of standardization can create a market better than none at all. Indeed, standardization is the key obstacle to the takeover by the new breed of databases.

A new era for data

A transition to NoSQL (which I prefer to call NewDB) is inevitable. The relational database was created in an era of slow 10MB hard drives and low expectations. NoSQL is the stuff of the Internet age.

NoSQL has been created for an era when storage is cheap, while performance and scalability expectations are high. It's written for an era of digital hoarders. Current business, marketing, and information technology trends have ensured that I am now fully aware of exactly what Kim Kardashian likes and how much you like her. I'm not sure why I need this information, but there it is, and it must be analyzed.

We also live in an era when even as the market is improving, internal IT departments of brick-and-mortar companies are being sized up for outsourcing. The demand for the skilled expertise of those who care for and feed Oracle databases is likely to be forcefully abated. At most companies, the DBA is often no more than a skilled system administrator.

All of this means we need databases that do not require us to flatten the data and force it into a structure that the application must transform to use. We need databases that can handle today's massive data storage across as many disks as necessary to meet our needs for immediate gratification. A delay is simply not acceptable.

Standards, anyone?

Yet there are obstacles to this transition. First, NoSQL lacks a dominant force. For the RDBMS, no matter which product you choose, you have at least a subset of ANSI standard SQL on which you can depend. For any of the new databases, you may have Pig, Hive, SPARQL, Mongo Query Language, Cypher, or others. These languages have little in common. For the RDBMS, you have some connector standard, at least, in the venerable ODBC. For NewDB, you must rely on a database-specific connector.

Most people knew the RDBMS wasn't for everything and everyone, but the standard created the market. Markets lure a cascade of long-term institutional and individual investment; they also create longevity. Standards survive products, vendors, and the myopic, short attention spans of venture capitalists. There are thousands of niches in our industry -- expensive software that nearly no one uses -- but the technologies that have been most profitable over the long term have played in a standardized space. The technologies that have outlived their VC-funded startups are those where a market was created.

What does NewDB need to dislodge Oracle? Principally, a service programming interface for database drivers, application APIs for the major languages and platforms, and a standard query language.

That shouldn't and mustn't require much effort for anything but the specific and the exotic. In other words, the mandatory features of a query language should cover normal queries like "retrieve by ID" or "find by property or child property." A basic driver and standardized API should be usable for CRUD operations and elementary finder queries. Optional features and specific APIs should be required only for features that are unique to a database or database type, such as the distance between two records along a graph.
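To make the split between mandatory and optional features concrete, here is a minimal Python sketch of what such a driver contract might look like. No such standard exists: the names StandardStore, InMemoryStore, find_by, supports, and the "graph.distance" feature string are all hypothetical, invented purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, Optional
import uuid

_MISSING = object()  # sentinel: the property path does not exist in a record


def resolve(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.city' into nested dicts."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


class StandardStore(ABC):
    """Hypothetical mandatory contract: CRUD plus elementary finders."""

    @abstractmethod
    def create(self, record: Dict[str, Any]) -> str: ...

    @abstractmethod
    def get(self, record_id: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def update(self, record_id: str, record: Dict[str, Any]) -> bool: ...

    @abstractmethod
    def delete(self, record_id: str) -> bool: ...

    @abstractmethod
    def find_by(self, path: str, value: Any) -> Iterator[Dict[str, Any]]: ...

    def supports(self, feature: str) -> bool:
        """Optional features are opt-in; a graph driver might override this."""
        return False


class InMemoryStore(StandardStore):
    """Toy implementation standing in for a real database driver."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def create(self, record):
        record_id = uuid.uuid4().hex
        self._records[record_id] = dict(record)
        return record_id

    def get(self, record_id):
        return self._records.get(record_id)

    def update(self, record_id, record):
        if record_id not in self._records:
            return False
        self._records[record_id] = dict(record)
        return True

    def delete(self, record_id):
        return self._records.pop(record_id, None) is not None

    def find_by(self, path, value):
        # "Find by property or child property" via a dotted path.
        return (r for r in self._records.values() if resolve(r, path) == value)
```

Application code written against StandardStore would call supports("graph.distance") before reaching for a database-specific extension, and everything else would stay portable across drivers.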

This doesn't need to be a perfect effort. Certainly no RDBMS vendor's dreams have been complete without their individual incompatible extensions and quirks that lock in customers. The standard just needs to be "good enough" to make people feel like they could switch databases. In some ways, this is more important for the so-called polyglot persistence of NewDB. I may start out with a document database problem that seems made for MongoDB when a new requirement makes the problem just fit a graph database like Neo4j so much better.

A common denominator for queries

In this polyglot nature lies the problem. A query language designed for a graph database isn't necessarily attuned to a document database or a key-value pair structure. In many cases this is OK because most of the time we're querying for simple items. Most of these databases support some form of hierarchical query (that is, I want all parents, grandparents, or great-grandparents of red-headed children or all customers who ordered any product that contained a particular model of transistor no matter how deeply embedded), where SQL does not. Obviously, a NewDB standard query language should support that. A standard need not always say "must"; it can say "if possible."
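The transistor example can be sketched in a few lines of Python over nested documents. This is only an illustration of what a hierarchical query has to do, not any database's actual query engine; the document shape (orders, products, parts, model), the sample data, and the function names are assumptions made up for the example.

```python
from typing import Any, Dict, Iterator, List

# Illustrative data: each customer holds orders, each product nests its parts.
CUSTOMERS: List[Dict[str, Any]] = [
    {"name": "Alice", "orders": [{"products": [
        {"model": "radio", "parts": [
            {"model": "amplifier board", "parts": [{"model": "2N2222"}]},
        ]},
    ]}]},
    {"name": "Bob", "orders": [{"products": [
        {"model": "lamp", "parts": [{"model": "BC547"}]},
    ]}]},
]


def contains_part(component: Dict[str, Any], model: str) -> bool:
    """True if this component, or any part nested at any depth, is the model."""
    if component.get("model") == model:
        return True
    return any(contains_part(child, model) for child in component.get("parts", []))


def customers_with_part(customers: List[Dict[str, Any]], model: str) -> Iterator[str]:
    """Yield the names of customers who ordered a product containing the model."""
    for customer in customers:
        products = (
            product
            for order in customer.get("orders", [])
            for product in order.get("products", [])
        )
        if any(contains_part(product, model) for product in products):
            yield customer["name"]


print(list(customers_with_part(CUSTOMERS, "2N2222")))  # ['Alice']
```

The recursion in contains_part is the piece a standard query language would need to express declaratively, since the depth of nesting is not known when the query is written.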

Last year, there was a ray of hope: A couple of Microsoft researchers noted that NoSQL needed standards. This sparked a renewed effort to implement UnQL and LINQ support for MongoDB, which covered both the point-to-point nature of platform support and an attempt at a basic unified query language. Spring Data is unifying the Java world around a CRUD interface, however, so we're still in the mud. UnQL doesn't seem to have legs. What about Neo4j and LINQ?

What's needed now is for the NoSQL vendors (10gen, Couchbase, and so on), interested parties (such as SpringSource, Red Hat, Microsoft, and IBM), and various projects to come together, take some of these separate efforts, and propose standards. First, define the query level. Then define the connector standards.

Aside from the market-creating effects of standardization, this kind of mature effort tends to make purchasing managers feel more comfortable that they haven't bet on the next "OODBMS" pipe dream. There's money in them hills, and the equipment to mine it is called standardization. It is not only in our best interest as technology consumers that this process starts, but also in the best interests of the NoSQL vendors and projects themselves.

So how about it, Apache Hadoopers? How about it, 10gen? How about it, Neo Technology? Couchbase? Go ahead, compete, but let's also raise the tide for all boats. Well, maybe not the SS Ellison -- I'm sure he'll get by.

This article, "The time for NoSQL standards is now," was originally published at InfoWorld.com.
