One of the most fundamental choices to make when developing an application is whether to use a SQL or NoSQL database to store the data. Conventional SQL (i.e. relational) databases are the product of decades of technology evolution, good practice, and real-world stress testing. They are designed for reliable transactions and ad hoc queries, the staples of line of business applications. But they also come burdened with restrictions—such as rigid schema—that make them less suitable for other kinds of apps.
NoSQL databases arose in response to those limitations. NoSQL systems store and manage data in ways that allow for high operational speed and great flexibility on the part of the developers. Many were developed by companies like Google, Amazon, Yahoo, and Facebook that sought better ways to store content or process data for massive websites. Unlike SQL databases, many NoSQL databases can be scaled horizontally across hundreds or thousands of servers.
The advantages of NoSQL don’t come without a cost, though. NoSQL systems don’t generally provide the same level of data consistency as SQL databases. In fact, while SQL databases have traditionally sacrificed performance and scalability for the ACID properties behind reliable transactions, NoSQL databases have largely ditched those ACID guarantees for speed and scalability.
In short, SQL and NoSQL databases offer different tradeoffs. While they may compete in the context of a specific project—as in, which to choose for this application or that application—they are complementary in the bigger picture. Each is suited to different use cases. The decision is not so much a case of either/or as it is a question of which tool is right for the job.
NoSQL vs. SQL
The fundamental difference between SQL and NoSQL is not all that complicated. Each has a different philosophy for how data should be stored and retrieved.
With SQL databases, all data has an inherent structure. A conventional database like Microsoft SQL Server, MySQL, or Oracle Database uses a schema—a formal definition of how data inserted into the database will be composed. For instance, a given column in a table may be restricted to integers only. As a result, the data recorded in the column will have a high degree of normalization. A SQL database’s rigid schema also makes it relatively easy to perform aggregations on the data, for instance by way of JOINs.
With NoSQL, data can be stored in a schema-less or free-form fashion. Any data can be stored in any record. Among the NoSQL databases, you will find four common models for storing data, which lead to four common types of NoSQL systems:
- Document databases (e.g. CouchDB, MongoDB). Inserted data is stored in the form of free-form JSON structures or “documents,” where the data could be anything from integers to strings to freeform text. There is no inherent need to specify what fields, if any, a document will contain.
- Key-value stores (e.g. Redis, Riak). Free-form values—from simple integers or strings to complex JSON documents—are accessed in the database by way of keys.
- Wide column stores (e.g. HBase, Cassandra). Data is stored in columns instead of rows as in a conventional SQL system. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
- Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
Schema-less data storage is useful in the following scenarios:
- You want fast access to the data, and you’re more concerned with speed and simplicity of access than reliable transactions or consistency.
- You’re storing a large volume of data, and you don’t want to lock yourself into a schema, as changing the schema later could be slow and painful.
- You’re taking in unstructured data from one or more sources that produce it, and you want to keep the data in its original form for maximum flexibility.
- You want to store data in a hierarchical structure, but you want those hierarchies to be described by the data itself, not an external schema. NoSQL allows data to be casually self-referential in ways that are more complex for SQL databases to emulate.
Querying NoSQL databases
The Structured Query Language used by traditional databases provides a uniform way to communicate with the server when storing and retrieving data. SQL syntax is highly standardized, so while individual databases may handle certain operations differently (e.g., window functions), the basics remain the same.
By contrast, each NoSQL database tends to have its own syntax for querying and managing the data. CouchDB, for instance, uses requests in the form of JSON, sent via HTTP, to create or retrieve documents from its database. MongoDB sends JSON objects over a binary protocol, by way of a command-line interface or a language library.
Some NoSQL products can use SQL-like syntax to work with data, but only to a limited extent. For example, Apache Cassandra, a column store database, has its own SQL-like language, the Cassandra Query Language or CQL. Some of the CQL syntax is straight out of the SQL playbook, like the SELECT or INSERT keywords. But there is no way to perform a JOIN or subquery in Cassandra, and thus the related keywords don’t exist in CQL.
A design choice common to NoSQL systems is a “shared-nothing” architecture. In a shared-nothing design, each server node in the cluster operates independently of every other node. The system doesn’t have to get consensus from every single node to return a piece of data to a client. Queries are fast because they can be returned from whichever node is closest or most convenient.
Another advantage of shared-nothing is resiliency and scale-out. Scaling out the cluster is as easy as spinning up new nodes in the cluster and waiting for them to sync with the others. If a NoSQL node goes down, the other servers in the cluster will continue to chug along. All the data remains available, even if fewer nodes are available to serve requests.
Note that a shared-nothing design is not exclusive to NoSQL databases. Many conventional SQL systems can be set up in a shared-nothing fashion, although that typically involves sacrificing consistency across the cluster for performance.
If NoSQL provides so much freedom and flexibility, why not abandon SQL entirely? The simple answer: Many applications still call for the kinds of constraints, consistency, and safeguards that SQL databases provide. In those cases, some “advantages” of NoSQL may turn to disadvantages. Other limitations stem from the fact that NoSQL systems are relatively new.
Even if you’re taking in free-form data, you almost always need to impose constraints on it to make it useful. With NoSQL, imposing constraints involves shifting the responsibility from the database to the application developer. For instance, the developer could impose structure through an object relational mapping system, or ORM. But if you want the schema to live with the data itself, NoSQL does not typically do that.
Some NoSQL solutions provide optional data typing and validation mechanisms for data. Apache Cassandra, for instance, has a slew of native data types that are reminiscent of those found in conventional SQL.
NoSQL systems trade strong or immediate consistency for better availability and performance. Conventional databases ensure that operations are atomic (all parts of a transaction succeed, or none do), consistent (all users have the same view of the data), isolated (transactions don’t compete), and durable (once completed they will survive a server failure).
These four properties, collectively referred to as ACID, are handled differently in most NoSQL systems. Instead of immediate consistency across the cluster, you have eventual consistency, due to the time needed to copy updates to other nodes in the cluster. Data inserted into the cluster is eventually available everywhere, but you can’t guarantee when.
Transaction semantics, which in a SQL system guarantee that all steps in a transaction (e.g. executing a sale and reducing inventory) are either completed or rolled back, aren’t typically available in NoSQL. For any system where there needs to be a “single source of truth,” such as a bank, the NoSQL approach won’t work well. You don’t want your bank balance to be different depending on which ATM you go to; you want it to be reported as the same thing everywhere.
Some NoSQL databases have partial mechanisms for working around this. For instance, MongoDB has consistency guarantees for individual operations, but not for the database as a whole. Microsoft Azure CosmosDB lets you select a level of consistency per request, so you can choose the behavior that fits your use case. But with NoSQL, expect eventual consistency as the default behavior.
Most NoSQL systems are conceptually similar, but are implemented very differently. Each tends to have its own metaphors and mechanisms for how data is queried and managed.
One side effect of that is a potentially high degree of coupling between the application logic and the database. This isn’t so bad if you pick a NoSQL system and stick with it, but it can become a stumbling block if you change systems down the road.
If you migrate from, say, MongoDB to CouchDB (or vice versa), you must do more than just migrate data. You also must navigate the differences in data access and programmatic metaphors—in other words, you must rewrite the parts of your application that access the database.
Another downside to NoSQL is the relative lack of expertise. Where the market for conventional SQL talent is still quite large, the market for NoSQL skills is nascent.
For reference, Indeed.com reports that as of the end of 2017, the volume of job listings for conventional SQL databases—MySQL, Microsoft SQL Server, Oracle Database, and so on—remains higher over the last three years than the volume of jobs for MongoDB, Couchbase, and Cassandra. The demand for NoSQL expertise is growing, but it’s still a fraction of the market for conventional SQL.
Merging SQL and NoSQL
We can expect some of the differences between SQL and NoSQL systems to disappear over time. Already many SQL databases now accept JSON documents as a native data type, and can perform queries against that data. Some even have native ways to impose constraints on JSON data, so that it is handled with the same rigors as conventional row-and-column data.
On the flip side, NoSQL databases are not only adding SQL-like query languages, but other capabilities of traditional SQL databases. For instance, at least two document databases – MarkLogic and RavenDB – promise to be ACID compliant.
Here and there are signs that future generations of databases will straddle the paradigms and offer both NoSQL and SQL functionality. Microsoft’s Azure Cosmos DB, for instance, uses a set of primitives under the hood to interchangeably reproduce the behaviors of both kinds of systems. Google Cloud Spanner is a SQL database that combines strong consistency with the horizontal scalability of NoSQL systems.
Still, pure SQL and pure NoSQL systems will have their place for many years to come. Look to NoSQL for fast, highly scalable access to free-form data. This comes with a few costs, like consistency of reads and other safeguards common to SQL databases. But for many applications, those safeguards may well be worth trading for what NoSQL offers.