A Different Way to Scale

Back in 1998, developers were struggling to keep up with the demands of the ever-growing internet. Users were coming online by the millions, and keeping up with hardware demands was becoming difficult. At the time, traditional databases scaled vertically with growing demand, essentially by adding more CPU, memory, and disk space to a single server. While this worked well when the internet was small, scaling vertically at this faster pace would prove impractical.

NoSQL was an attempt to solve this problem by changing how scaling was done. Instead of making your database server bigger or faster, why not simply add another smaller server and distribute the load evenly across them? This horizontal scaling model was less expensive, fast to perform, and easy enough to implement that businesses did not need to plan capacity far in advance.

[Figure: Scaling vertically vs. scaling horizontally as a user base grows from small to large. Scaling vertically (more RAM, faster CPU, larger disk) is labeled slow; scaling horizontally (more smaller servers) is labeled fast.]

This distributed model also had an unintended consequence: speed. The server no longer needed to do as much work organizing its data. It simply wrote to the next available location in the distributed cluster. Trading organization for elasticity freed up CPU and I/O to perform more read and write operations.
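
The idea of spreading load across several small servers can be sketched in a few lines. This is a minimal illustration, not how any particular NoSQL product places data: the server names (`db-1` and so on), the `user-N` keys, and the modulo-on-a-hash placement rule are all assumptions made for the example.

```python
import hashlib


def shard_for(key: str, servers: list[str]) -> str:
    """Pick a server for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def distribute(keys: list[str], servers: list[str]) -> dict[str, list[str]]:
    """Group keys by the server each one is assigned to."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[shard_for(key, servers)].append(key)
    return placement


# Hypothetical cluster of three small servers and 9,000 user keys.
servers = ["db-1", "db-2", "db-3"]
keys = [f"user-{n}" for n in range(9000)]
placement = distribute(keys, servers)

for server, held in placement.items():
    print(server, len(held))  # each server ends up with roughly a third of the keys
```

One caveat with this simple rule: changing the number of servers changes the result of the modulo for most keys, so real clusters usually rely on schemes such as consistent hashing to limit how much data moves when a server is added.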

Not All NoSQL Was Created Equal

As NoSQL grew in popularity, it began to take on different forms depending on the need. While there were dozens of variations, most fell into one of four categories:

Columnar Database

The first type of NoSQL is referred to as a Columnar Database. This form of NoSQL is optimized for column-level I/O: data is structured by column as opposed to by row. The biggest difference between a Columnar Database and a traditional relational database is that a record doesn’t need a value for every column. Instead, columns are grouped into column families. A single record might be defined as an ID, a column family for “contact” information, and another column family for “address” information.

[Figure: Two rows, identified by row keys 1001 and 1002, each with a “Contact” column family and an “Address” column family. The rows hold different columns: one has FirstName (Becky), LastName (Smith), Zip (65806), City (Springfield), and State (MO); the other has FirstName (Joe), MiddleName (Taylor), LastName (Wilson), Address (123 Main St.), and Zip (65721).]

One of the distinct advantages of the Columnar architecture is how efficiently data is stored. Column families don’t require every field to be present, whereas a relational database has to store a null value for any field a row doesn’t use. This may seem insignificant, but null padding adds up once you factor in the cost of storage.

Another advantage of the Columnar model is the speed at which you can aggregate columns. Because the values of a column are stored together, aggregation functions within column families can be performed with much less CPU and I/O than scanning entire rows.
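
To make the column-family idea concrete, here is a small sketch that models rows as nested dictionaries. It only illustrates the data shape and the null-padding argument, and says nothing about how a real columnar engine lays data out on disk. The sample values come from the figure above, but which row key owns which record is an assumption, and the helper names are hypothetical.

```python
# Row key -> column family -> column -> value.
# Only the columns a row actually has are stored.
rows = {
    "1001": {
        "contact": {"FirstName": "Joe", "MiddleName": "Taylor", "LastName": "Wilson"},
        "address": {"Address": "123 Main St.", "Zip": "65721"},
    },
    "1002": {
        "contact": {"FirstName": "Becky", "LastName": "Smith"},
        "address": {"City": "Springfield", "State": "MO", "Zip": "65806"},
    },
}


def column_values(rows, family, column):
    """Collect one column across all rows, skipping rows that lack it."""
    return [
        families[family][column]
        for families in rows.values()
        if column in families.get(family, {})
    ]


def stored_cells(rows):
    """Count the cells that are actually stored."""
    return sum(len(columns) for families in rows.values() for columns in families.values())


def null_padded_cells(rows):
    """Count the cells a fixed-schema table would need (every row gets every column)."""
    all_columns = {
        (family, column)
        for families in rows.values()
        for family, columns in families.items()
        for column in columns
    }
    return len(all_columns) * len(rows)


print(column_values(rows, "contact", "MiddleName"))  # only the row that has a middle name
print(stored_cells(rows), "cells stored vs", null_padded_cells(rows), "with null padding")
```

With two rows the difference is only a handful of cells, but the gap grows with every optional column and every row that leaves it empty.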

Key-Value Database

The second form of NoSQL is Key-Value. This database architecture is exactly what it sounds like: objects are stored in associative arrays and accessed via a unique key. Unlike traditional relational databases, where the schema is determined up front and enforced strictly during CRUD operations, this form of NoSQL imposes no schema at all.

[Figure: Three items, each identified by a hash key (u0000001a, u0000002b, u0000003c) and holding its own set of attributes, such as firstname=john, lastname=doe, dob=01/02/1989; firstname=jane; and firstname=jack, lastname=jackson.]

The primary advantage of this type of NoSQL is read performance. Since data is accessed by hash key, the database can locate an item very quickly and with minimal effort. This architecture is recommended for applications with a high ratio of reads to writes and a relatively simple data structure. Keep in mind, though, that data storage tends to be more expensive with Key-Value Databases due to the cost of premium SSD storage.
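
The access pattern described above maps closely onto an in-memory hash table. The sketch below is a toy stand-in for a key-value store, not the API of any real product; the `KeyValueStore` class and its method names are hypothetical, and the pairing of hash keys with attributes is assumed from the figure.

```python
class KeyValueStore:
    """A toy key-value store: items are looked up by hash key, with no schema."""

    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        """Store (or overwrite) an item; any set of attributes is accepted."""
        self._items[key] = dict(attributes)

    def get(self, key, default=None):
        """Fetch an item by key with a single hash lookup."""
        return self._items.get(key, default)

    def delete(self, key):
        """Remove an item if it exists; return True when something was removed."""
        return self._items.pop(key, None) is not None


store = KeyValueStore()
store.put("u0000001a", {"firstname": "john", "lastname": "doe", "dob": "01/02/1989"})
store.put("u0000002b", {"firstname": "jane"})
store.put("u0000003c", {"firstname": "jack", "lastname": "jackson"})

print(store.get("u0000002b"))  # items may carry different attributes
print(store.get("u9999999z"))  # unknown keys return the default (None)
```

Notice that nothing here supports searching by attribute. Looking up by key is cheap, but finding every item with a given lastname would mean scanning all items, which is why this model suits simple, read-heavy access patterns.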

Graph Database

Graph Database architecture was born out of the need to store hierarchical data that is difficult to represent in traditional relational databases. Graph architecture stores data in nodes and maintains the relationships between nodes using edges. Together, the nodes and edges form a graph.

[Figure: A graph of nodes (User, Orders, Items, Invoices, Payments, Supplier) connected by labeled edges (Purchase, Have, Contain, Ships From, Belong To, From).]

When compared to traditional relational databases, Graph architecture can offer faster data access because the data is stored in a more natural format. This format avoids the joins and complex queries otherwise needed to associate highly normalized data. Keep in mind that the performance benefits are only seen when the graph is very deep and the relationships between nodes are many. In small or mid-size applications, a relational database will typically outperform Graph.
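
The point about avoiding joins is easier to see with a small traversal. The sketch below stores edges directly alongside nodes and walks them with a breadth-first search. The node names are taken from the figure, but exactly which nodes each edge connects is an assumption, and this plain-Python adjacency list only stands in for what a graph database does natively.

```python
from collections import deque

# (source node, edge label, target node); the connections are assumed for illustration.
edges = [
    ("User", "HAVE", "Orders"),
    ("Orders", "CONTAIN", "Items"),
    ("Items", "SHIPS_FROM", "Supplier"),
    ("Orders", "HAVE", "Invoices"),
    ("Invoices", "HAVE", "Payments"),
]


def build_adjacency(edges):
    """Index outgoing edges by source node so neighbors are one lookup away."""
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
        adjacency.setdefault(target, [])
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _label, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
print(find_path(adjacency, "User", "Supplier"))
# ['User', 'Orders', 'Items', 'Supplier']
```

In a relational schema, the same question ("which supplier ships this user's items?") would typically join a users table, an orders table, an items table, and a suppliers table. Here each hop is a direct lookup of a node's outgoing edges.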

Document Database

Document Database is the most widely adopted form of NoSQL to date. Document Databases use collections to store documents. Think of a collection as a table and a document as a record in that table. The only difference is that a collection doesn’t enforce any rules on its documents (apart from each having a unique identifier). This is in direct contrast to the rigid schema present in traditional database architecture, and even in some NoSQL variations.



[Figure: A “Contact” collection containing three documents, each with a different set of fields.]

{
   "id":"0001",
   "firstName":"john",
   "lastName":"doe"
}

{
   "id":"0002",
   "firstName":"jane",
   "lastName":"doe",
   "dob":"1/2/1986"
}

{
   "id":"0003",
   "firstName":"jack",
   "lastName":"jackson",
   "address":{
      "line1":"123 main st.",
      "city":"springfield",
      "state":"mo",
      "zip":60821
   }
}

Situations where the definition of data can change often or in unpredictable ways tend to favor a Document Database over other types of NoSQL. In addition to allowing for fluid definitions of documents, the query engine is designed to handle this volatility in a very intuitive way. For example, a query against a collection filtering by a property will only consider documents containing that property. Types are also factored in when filtering: when filtering by a property with an expected numeric value, only documents containing numeric values for that property will be considered.
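
The filtering behavior described above can be imitated in plain Python. This sketch is not the query engine of any specific document database; it only mirrors the two rules from the paragraph (skip documents that lack the property, and skip values of the wrong type). The helper names are hypothetical, and the fourth contact, with its zip stored as a string, is an invented sample added to show the type rule.

```python
contacts = [
    {"id": "0001", "firstName": "john", "lastName": "doe"},
    {"id": "0002", "firstName": "jane", "lastName": "doe", "dob": "1/2/1986"},
    {"id": "0003", "firstName": "jack", "lastName": "jackson",
     "address": {"line1": "123 main st.", "city": "springfield", "state": "mo", "zip": 60821}},
    # Hypothetical extra document: same property, but stored as a string.
    {"id": "0004", "firstName": "becky", "lastName": "smith",
     "address": {"zip": "65806"}},
]


def get_path(document, path):
    """Follow a dotted path such as 'address.zip'; return (found, value)."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def find_with_property(documents, path):
    """Only documents that contain the property are considered."""
    return [doc for doc in documents if get_path(doc, path)[0]]


def find_greater_than(documents, path, threshold):
    """Numeric filter: documents with a missing or non-numeric value are skipped."""
    matches = []
    for doc in documents:
        found, value = get_path(doc, path)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if found and is_number and value > threshold:
            matches.append(doc)
    return matches


print([doc["id"] for doc in find_with_property(contacts, "dob")])
print([doc["id"] for doc in find_greater_than(contacts, "address.zip", 60000)])
```

The second query returns only the document whose zip is stored as a number. The document with a string zip is silently left out rather than raising an error, which is convenient but also a reason to keep types consistent when writing data.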

The cost of this flexibility is a greater emphasis on managing business rules and data requirements in your application code. This trade-off is typically not an issue since most code bases have field definitions and constraints built into their objects anyway.
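
As a rough illustration of moving those rules into application code, the sketch below validates a contact document before the application uses it. The `Contact` class, its `from_document` helper, and the specific rules (required non-empty string fields, optional string `dob`) are assumptions chosen for the example, not requirements of any database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    """Application-side definition of a contact; the database enforces none of this."""

    id: str
    first_name: str
    last_name: str
    dob: Optional[str] = None

    @classmethod
    def from_document(cls, document: dict) -> "Contact":
        """Build a Contact from a raw document, rejecting documents that break the rules."""
        for field in ("id", "firstName", "lastName"):
            value = document.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"contact document needs a non-empty string '{field}'")
        dob = document.get("dob")
        if dob is not None and not isinstance(dob, str):
            raise ValueError("'dob' must be a string when present")
        return cls(
            id=document["id"],
            first_name=document["firstName"],
            last_name=document["lastName"],
            dob=dob,
        )


contact = Contact.from_document(
    {"id": "0002", "firstName": "jane", "lastName": "doe", "dob": "1/2/1986"}
)
print(contact)

try:
    Contact.from_document({"id": "0009", "firstName": "sam"})  # missing lastName
except ValueError as error:
    print("rejected:", error)
```

Keeping the rules in one place like this means every read and write path shares the same definition of a valid contact, which is the job a relational schema would otherwise do.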

The Future of NoSQL

As scaling continues to grow as a priority, players such as Microsoft Azure and Google are attempting to solve the horizontal scaling challenge by abstracting away the server entirely. At present these features are somewhat clunky and prone to error; however, we should anticipate these solutions becoming more stable and dominant in the years to come.

We wrote a great article that goes into detail on cloud architecture in our blog on cloud vs dedicated hosting. We suggest checking that out if you are interested in or considering a database hosting solution.

To learn more about MongoDB we suggest checking out their website at mongodb.com.
