Tech - NOSQL Tutorial - Part 1 (Introduction)

Several people and organizations have pointed to the growing volume and consumption of data in the current digital age. I have deliberately used the word 'digital' instead of 'internet', because the internet, media (video, audio) and devices are increasingly converging. Today, one can browse a newspaper on a Smart TV and, conversely, watch a movie on a phone or tablet. Then, of course, there is the 'social' chatter on sites such as Facebook, Flickr and Twitter, to name a few.

All of this has implications for both the transmission and the storage of data. In this series of articles I am going to cover the latter.

I have worked on various RDBMS, from SQL Server and Oracle right up to DB2 and Teradata. I have always been curious about data and how it is applied, so I have been researching some of the technologies that power large-scale data. A key part of that landscape has been the rising use of 'NOSQL' databases. The 'NO' doesn't stand for No as in Dr. No, but for 'Not Only'. This should dispel any perception that these databases are meant to replace the traditional RDBMS. Now that we have that sorted out, hopefully I will not have hordes of database enthusiasts descending on me!

To start with, no standards have yet emerged among NOSQL databases; the field has so far been driven by open source products and organizations. In general, the most common types of NOSQL databases are the following -
  1. Key - Value based - Think of it as a hash with key:value pairs of data. Examples - Amazon Dynamo, Voldemort.
  2. Column based - A multi-dimensional sorted map. Examples - Google Bigtable, HBase.
  3. Document based - Schema free, document based databases. Examples - Apache CouchDB, MongoDB.
  4. Graph - Inspired by graph theory (yes, from mathematics). Examples - Neo4j.
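To make the four models a little more concrete, here is a minimal sketch using plain Python structures. It is an illustration under stated assumptions, not the API of Dynamo, HBase, MongoDB or Neo4j: every name and piece of data below is hypothetical. The column example borrows the (row key, family:qualifier, timestamp) layout of the Bigtable/HBase data model, but sorts only when a row is scanned, whereas a real column store keeps rows sorted by key as it stores them.

```python
# Toy, in-memory illustrations of the four NOSQL data models.
# Hypothetical names and data; not any real database's API.
from collections import deque

# 1. Key-value: an opaque value looked up by a single key.
kv_store = {}
kv_store["user:42"] = b'{"name": "Alice"}'  # the store does not interpret the value

# 2. Column based: a sparse map of (row key, "family:qualifier", timestamp) -> value.
column_store = {}

def put_cell(row, column, timestamp, value):
    """Store one versioned cell; older versions of the same cell are kept."""
    column_store[(row, column, timestamp)] = value

def scan_row(row):
    """Return one row's cells, ordered by column, newest timestamp first."""
    cells = [(col, ts, val) for (r, col, ts), val in column_store.items() if r == row]
    return sorted(cells, key=lambda cell: (cell[0], -cell[1]))

put_cell("com.example", "contents:html", 2, "<html>v2</html>")
put_cell("com.example", "contents:html", 1, "<html>v1</html>")
put_cell("com.example", "anchor:cnn.com", 1, "Example")

# 3. Document: schema-free records; two documents need not share fields.
documents = [
    {"_id": 1, "title": "NOSQL Part 1", "tags": ["nosql", "intro"]},
    {"_id": 2, "title": "Book review", "rating": 4},
]
tagged = [doc["_id"] for doc in documents if "nosql" in doc.get("tags", [])]

# 4. Graph: nodes joined by edges, queried by traversal.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}

def reachable(start):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

For instance, `scan_row("com.example")` lists the `anchor:cnn.com` cell first and then both versions of `contents:html`, newest first, while `reachable("alice")` walks the edges to find `bob` and `carol`. The point of the sketch is the shape of each query: a single lookup, a sorted range of versioned cells, a filter over loosely structured records, and a traversal.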

While these databases differ from one another, they share some common characteristics -
  1. They have been designed from the ground up to be highly scalable and very fast.
  2. They are very web friendly, with APIs to access and manipulate data.
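  3. They may not offer full ACID (Atomicity, Consistency, Isolation, Durability) guarantees. Their trade-offs are instead usually described in terms of the CAP theorem (Consistency, Availability, Partition tolerance), which states that a distributed system cannot guarantee all three at once; many NOSQL databases relax consistency in exchange for availability and partition tolerance.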
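The CAP trade-off is easier to see with a toy two-replica store. The sketch below is a simplified illustration, not how any particular NOSQL database replicates data: the `ToyCluster` class, its `"CP"`/`"AP"` modes and the last-write-wins reconciliation are all assumptions made for this example (last-write-wins is only one of several conflict-resolution strategies). When a partition separates the replicas, the CP mode refuses writes to stay consistent, while the AP mode keeps accepting writes and lets the replicas diverge until the partition heals.

```python
# Toy illustration of the CAP trade-off with two replicas.
# Hypothetical classes and modes; not a real database's replication protocol.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

class ToyCluster:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False
        self.clock = 0  # logical clock used to order writes

    def write(self, replica, key, value):
        """Write through one replica; return False if the write is refused."""
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse writes that cannot reach every replica,
            # so both replicas keep identical data.
            return False
        self.clock += 1
        record = (self.clock, value)
        if self.partitioned:
            # Availability first: accept the write locally; replicas may diverge.
            replica.data[key] = record
        else:
            # Both replicas are reachable: apply the write to both.
            self.a.data[key] = record
            self.b.data[key] = record
        return True

    def read(self, replica, key):
        record = replica.data.get(key)
        return None if record is None else record[1]

    def heal(self):
        """End the partition and reconcile every key with last-write-wins."""
        self.partitioned = False
        for key in set(self.a.data) | set(self.b.data):
            newest = max(r.data[key] for r in (self.a, self.b) if key in r.data)
            self.a.data[key] = self.b.data[key] = newest

ap = ToyCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)  # accepted on replica a only
ap.write(ap.b, "x", 3)  # accepted on replica b only
```

At this point the AP cluster answers `2` from replica `a` and `3` from replica `b`: it stayed available but the two replicas disagree, and only after `heal()` do both return `3`. A `ToyCluster("CP")` in the same situation would return `False` for the writes during the partition, giving up availability so that both replicas keep returning the same value.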
As most people are used to RDBMS, comparisons are inevitable. Since we don't have a standard yet for NOSQL databases, I will refrain from making them for now.

Some of the popular websites that use the various NOSQL databases (among other databases) are Twitter (Cassandra, HBase), Facebook (Cassandra), Google (Bigtable) and LinkedIn (Voldemort). Typically, the data generated on these sites runs into several terabytes per day, and it is quite varied too, from search terms to tweets to photos. Imagine doing a sentiment or cohort analysis on a database with millions or even billions of rows and you get the picture.

One trend seems to be emerging: the era of using only an RDBMS for all data storage is over. NOSQL databases are emerging as a credible option for some applications. As they say, horses for courses. For now, I am hooked on this new breed of databases.

In the next part, I will start looking at one of these databases and dig a bit deeper.
