A Different Way to Scale
Back in 1998, developers were struggling to keep up with the demands of the ever-growing internet. Users were coming online by the millions, and keeping up with hardware demands was becoming difficult. At the time, traditional databases scaled vertically with the growing demand, essentially by adding more CPU, memory and disk space to a single server. While this worked well when the internet was small, scaling vertically at this faster pace would prove impractical.
NoSQL was an attempt to solve this problem by changing how scaling was done. Instead of making your database server bigger or faster, why not simply add another, smaller server and distribute the load equally across them? This horizontal scaling model was less expensive, fast to perform and easy enough to implement that businesses did not need to plan capacity far in advance.
This distributed model also had an unintended consequence: speed. The server no longer needed to spend as much effort organizing its data. It simply wrote to the next available location in the distributed cluster. Trading organization for elasticity freed up CPU and I/O to perform more read and write operations.
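To make the idea of spreading load across servers concrete, here is a minimal Python sketch of one common approach: hash each record's key and use the result to pick a server. The server names and the `shard_for` helper are hypothetical, and real NoSQL systems use more sophisticated placement schemes, so treat this as an illustration rather than how any particular product works.

```python
import hashlib


def shard_for(key: str, servers: list[str]) -> str:
    """Pick a server for a key by hashing the key modulo the server count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical cluster of three small servers.
servers = ["db-0", "db-1", "db-2"]

# Every key maps to exactly one server, and the same key always maps
# to the same server as long as the server list does not change.
placement = {user: shard_for(user, servers) for user in ["alice", "bob", "carol", "dave"]}
```

One caveat with plain modulo hashing: adding a server changes where most keys land, which is one reason production systems tend to use schemes such as consistent hashing instead.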
Not All NoSQL Was Created Equal
As NoSQL grew in popularity, it began to take on different forms depending on the need. While there were dozens of variations, most fell into one of four categories:
Columnar Database
The first type of NoSQL is referred to as a Columnar Database. This form of NoSQL is optimized for column-level I/O: data is structured by column as opposed to by row. The biggest difference between a columnar database and a traditional relational database is that a record doesn’t need a value for every column. Rather, it’s possible to group related columns into column families. A single record might be defined as an ID, a column family for “contact” information, and another column family for “address” information.
One of the distinct advantages of the Columnar architecture is how efficiently data is stored. Column families don’t require fields to always be present, whereas in a relational database you need to provide null values for any fields a row doesn’t use. This may seem insignificant; however, null padding can start to add up when you factor in the cost of storage.
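As a rough illustration of column families and sparse storage, the Python sketch below models two records as nested dictionaries. The table contents, the `get_cell` and `padded_row` helpers, and the column names are all made up for this example; it shows the shape of the data, not any specific database's storage format.

```python
# Two records with "contact" and "address" column families.
# Fields that a record does not have are simply absent.
rows = {
    "user-1": {
        "contact": {"email": "a@example.com", "phone": "555-0100"},
        "address": {"city": "Austin", "zip": "78701"},
    },
    "user-2": {
        "contact": {"email": "b@example.com"},  # no phone, no address family
    },
}

# The fixed set of columns a relational table would declare up front.
ALL_COLUMNS = [
    ("contact", "email"),
    ("contact", "phone"),
    ("address", "city"),
    ("address", "zip"),
]


def get_cell(row_id, family, column, default=None):
    """Read one cell, returning `default` when the family or column is absent."""
    return rows.get(row_id, {}).get(family, {}).get(column, default)


def stored_cells(row_id):
    """Count the cells actually stored for a record."""
    return sum(len(columns) for columns in rows[row_id].values())


def padded_row(row_id):
    """Build the fixed-width row a relational table would hold, nulls included."""
    return [get_cell(row_id, family, column) for family, column in ALL_COLUMNS]
```

Here `user-2` stores a single cell, while the equivalent fixed-schema row from `padded_row("user-2")` carries three null values alongside it.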
Another advantage of the Columnar model is the speed at which you can aggregate columns. Because the values of a column are stored together, aggregation functions within column families can be performed with much less CPU and I/O using this architecture.
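The difference is easier to see side by side. The following Python sketch stores the same hypothetical orders in a row-oriented and a column-oriented layout and sums one field in each. The data and function names are invented for illustration, and a real engine would work with on-disk blocks rather than Python lists.

```python
# Row-oriented layout: each record keeps all of its fields together.
orders_by_row = [
    {"id": 1, "customer": "alice", "total": 20.0},
    {"id": 2, "customer": "bob", "total": 35.5},
    {"id": 3, "customer": "alice", "total": 12.25},
]

# Column-oriented layout: each column is stored as its own list.
orders_by_column = {
    "id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "total": [20.0, 35.5, 12.25],
}


def total_from_rows(rows):
    """Visit every record and pick out one field from each."""
    return sum(row["total"] for row in rows)


def total_from_columns(columns):
    """Read only the one column being aggregated."""
    return sum(columns["total"])
```

Both functions return the same total, but the column-oriented version never touches the `id` or `customer` values, which is the property that saves work at scale.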
Key-Value Database
The second form of NoSQL is Key-Value. This database architecture is exactly what it sounds like: objects are stored in associative arrays and accessed via a unique key. Unlike traditional relational databases, where the schema is determined upfront and enforced strictly during CRUD operations, this form of NoSQL imposes no structure on the stored values.
The primary advantage of this type of NoSQL is read performance. Since data is accessed using hash keys, the database can locate items very quickly and with minimal effort. This architecture is recommended for applications with a high ratio of reads to writes and a relatively simple data structure. Keep in mind, though, that data storage tends to be more expensive with Key-Value Databases due to the cost of premium SSD storage.
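The access pattern described above can be sketched in a few lines of Python using a dictionary, which is itself a hash-based associative array. The `KeyValueStore` class and the key names are hypothetical and exist only to show the shape of the interface; a real Key-Value Database adds persistence, replication and networking on top.

```python
from typing import Any


class KeyValueStore:
    """A tiny in-memory key-value store backed by a Python dict."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # No schema is enforced: any value can be stored under any key.
        self._items[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookup by key is a single hash lookup, independent of store size.
        return self._items.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key and report whether it was present."""
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["sku-1", "sku-7"]})
store.put("page-views", 1024)
```

Note that the two stored values have completely different shapes, and the only way to reach either of them is through its key.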
Graph Database
Graph Database architecture was born out of the need to store hierarchical, highly connected data that is difficult to represent in traditional relational databases. Graph architecture stores data in nodes and maintains relationships between nodes using edges; together, the nodes and edges form a graph.
When compared to traditional relational databases, Graph architecture can offer faster data access because the data is stored in a more natural format. This format avoids the joins and complex queries otherwise needed to associate highly normalized data. Keep in mind that the performance benefits are only seen when the node tree is very deep and there are many relationships between nodes. In smaller or mid-size applications, a relational database will typically outperform Graph.
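To show what following relationships without joins looks like, here is a small Python sketch that stores edges as adjacency lists and walks them with a breadth-first traversal. The node names and the `reachable_within` helper are made up for this example, and a real Graph Database would expose this through its own query language.

```python
from collections import deque

# Each node maps to the nodes it points at (its outgoing edges).
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def reachable_within(start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` edges,
    mapped to the number of hops needed to reach it."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not part of the result
    return distances
```

Each extra hop is just another step along stored edges. In a relational model, the same question would usually require one self-join per hop.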
Document Database
Document Database is the most widely adopted form of NoSQL to date. Document Databases use collections to store documents. Think of a collection as a table and a document as a record in that table. The only difference is that a collection doesn’t enforce any rules on its documents (apart from each having a unique identifier). This is in direct contrast to the rigid schema present in traditional database architecture, and even in some NoSQL variations.
Situations where the definition of data can change often or in unpredictable ways tend to favor Document Database over other types of NoSQL. In addition to allowing for fluid definitions of documents, the query engine is designed to handle this volatility in a very intuitive way. For example, a query against a collection filtering by a property will only consider documents containing that property. Types are also factored in when filtering. For example, when filtering by a property with an expected numeric value, only documents containing numeric values for that property will be considered.
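The filtering behavior described above can be imitated in plain Python. The sketch below is an in-memory stand-in, not a real query engine: the `documents` list, the field names and the `find_greater_than` helper are all hypothetical. It skips documents that lack the property or hold a non-numeric value for it.

```python
documents = [
    {"_id": 1, "name": "Widget", "price": 9.99},
    {"_id": 2, "name": "Gadget", "price": "12.50"},  # price stored as a string
    {"_id": 3, "name": "Service plan"},              # no price property at all
    {"_id": 4, "name": "Gizmo", "price": 25},
]


def find_greater_than(docs, field, threshold):
    """Return documents whose `field` is a number greater than `threshold`.

    Documents without the field, or with a non-numeric value for it,
    are skipped rather than raising an error.
    """
    matches = []
    for doc in docs:
        value = doc.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > threshold:
            matches.append(doc)
    return matches


expensive_ids = [doc["_id"] for doc in find_greater_than(documents, "price", 5)]
```

Documents 2 and 3 are never compared against the threshold: one holds a string where a number is expected, and the other has no `price` at all.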
The cost of this flexibility is a greater emphasis on managing business rules and data requirements in your application code. This trade-off is typically not an issue since most code bases have field definitions and constraints built into their objects anyway.
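As a sketch of what managing those rules in application code can look like, the Python example below validates a record before it is turned into a document. The `Customer` class, its fields and its rules are invented for illustration; the point is only that the constraints live in the object rather than in the database.

```python
from dataclasses import asdict, dataclass


@dataclass
class Customer:
    customer_id: str
    email: str
    age: int

    def __post_init__(self) -> None:
        # These checks play the role a relational schema would otherwise play.
        if not self.customer_id:
            raise ValueError("customer_id is required")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        if isinstance(self.age, bool) or not isinstance(self.age, int) or self.age < 0:
            raise ValueError("age must be a non-negative integer")


def to_document(customer: Customer) -> dict:
    """Convert a validated object into a plain dict ready to be stored."""
    return asdict(customer)


document = to_document(Customer(customer_id="c-1", email="a@example.com", age=34))
```

Constructing `Customer(customer_id="c-2", email="not-an-email", age=34)` raises a `ValueError`, so the invalid record never reaches the collection.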
The Future of NoSQL
As scaling continues to grow as a priority, players such as Microsoft Azure and Google are attempting to solve the horizontal scaling challenge by abstracting away the server entirely. At present these features are somewhat clunky and prone to error; however, we should anticipate these solutions becoming more stable and dominant in the years to come.
We wrote a great article that goes into detail on cloud architecture: our blog post on cloud vs. dedicated hosting. We suggest checking it out if you are considering a database hosting solution.
To learn more about MongoDB we suggest checking out their website at mongodb.com.