TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

NoSQL and Hadoop: Document-Based versus Relational Databases

NoSQL and Hadoop have overlapping capabilities, but they are not competitors. We examine seven features that differentiate a NoSQL document database from a relational database.

By Sachin Sinha, Mehul Shah
January 10, 2017

If the term "big data" has been bandied about your organization as something worth exploring further, you've likely also heard about Hadoop and NoSQL. Both technologies are closely associated with big data and therefore overlap in internal architecture and functionality. For example, both are well suited to managing large and rapidly growing data sets, and both handle a variety of data formats by following the schema-on-read design paradigm.

Both can leverage commodity hardware and scale horizontally, also referred to as scaling out. Contrast this with scaling up, in which you upgrade your existing servers with more powerful hardware, as with traditional relational databases. With regard to data formats, both technologies can manage structured, semistructured, and unstructured data.

With these overlapping capabilities, it might seem that NoSQL and Hadoop are direct competitors, right? Not exactly. Although each technology is great for handling big data, they are intended for different types of workloads. A simple way to distinguish them is to look at the workloads they handle best. Hadoop is good for analytics and historical-archive use cases, whereas NoSQL shines in operational workloads, complementing its relational counterparts.

NoSQL databases started their journey as key-value stores; document/JSON and graph databases joined them later. Although the simplicity of key-value stores made them popular, people increasingly asked for more when it came to storing complex, hierarchical data of the kind typically stored in JSON or XML. That gave rise to the document-oriented database.

Today, document-oriented databases are one of the main categories of NoSQL databases. The central concept of a document-oriented database is a hierarchical document. Although each document-oriented database differs in its internal implementation, in general they all assume data is encoded in JSON or a JSON-like format. These DBMSs provide many advantages over relational databases, especially schema flexibility, high availability, and data distribution across multiple nodes in a cluster.

Below are the key features that differentiate a NoSQL document database from a relational database.

High availability: Because document databases are distributed horizontally across a cluster, they are highly available and can offer much better SLAs than their relational counterparts.

Consistency: Document databases usually lean toward relaxed consistency models, so reads may lag behind the most recent writes. It's a classic CAP (consistency, availability, and partition tolerance) theorem tradeoff in which you get higher availability and horizontal scaling in return for looser consistency. Most document databases provide either strong or weak consistency. An exception is the Azure cloud-based DocumentDB, which provides four consistency models: strong, bounded staleness, session, and eventual. This gives application builders more choices.
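To make the lag between writes and reads concrete, here is a minimal in-memory sketch in Python. It is not the replication protocol of any particular product: the `Primary` and `Replica` classes, the `catch_up` method, and the `cart:42` key are all hypothetical names invented for illustration. The replica applies the primary's write log only when asked to, so a read that arrives before it catches up returns the older value.

```python
class Primary:
    """Accepts writes and keeps an ordered log of them (hypothetical model)."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads; applies the primary's log only when catch_up is called."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        # Apply every log entry this replica has not seen yet, in order.
        pending = primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica()

primary.write("cart:42", ["book"])
replica.catch_up(primary)

primary.write("cart:42", ["book", "pen"])
stale = replica.read("cart:42")   # replica has not applied the second write yet

replica.catch_up(primary)
fresh = replica.read("cart:42")   # replica has now converged with the primary

print(stale, fresh)
```

A strongly consistent read in this toy model would have to go to the primary (or wait for `catch_up`), which is the extra latency the tradeoff above refers to.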

Partitioning/sharding: Data in document databases is partitioned using a hash- or round-robin-based approach. This allows large data volumes to be stored and managed at scale. It also spreads read/write access across nodes, allowing much higher throughput than their relational counterparts.
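The two placement strategies named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheme any specific database uses: the function names, the choice of MD5 as the hash, and the `customer-N` keys are all hypothetical.

```python
import hashlib
from itertools import cycle


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing it (stable across runs)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def round_robin_assign(keys, num_partitions):
    """Assign keys to partitions in arrival order: 0, 1, ..., n-1, 0, 1, ..."""
    partitions = cycle(range(num_partitions))
    return {key: next(partitions) for key in keys}


keys = [f"customer-{i}" for i in range(8)]

by_hash = {key: hash_partition(key, 4) for key in keys}
by_round_robin = round_robin_assign(keys, 4)

print(by_hash)
print(by_round_robin)
```

With hashing, the partition for a key can be recomputed from the key alone, so a lookup goes straight to one node. Round-robin spreads writes evenly but needs a separate record of where each key landed.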

Data model flexibility: This is one of the hallmarks of document databases: they allow developers to model the database exactly as the objects in their application. In addition, schema-on-read enables faster development, reducing overall time to market. Because stored documents mirror the application's objects, they eliminate the object-relational impedance mismatch. In an ever-changing business landscape, it becomes even more important that the data model can evolve easily while application development time stays reasonable.
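As a rough illustration of schema-on-read, the Python sketch below keeps two order documents of different shapes in the same collection and applies structure only when reading. The field names (`customer`, `items`, `gift_note`) and helper functions are hypothetical and chosen only for this example.

```python
import json

# Two documents in one collection; the second omits "city" and adds "gift_note".
collection = [
    {
        "_id": 1,
        "customer": {"name": "Ada", "city": "London"},
        "items": [
            {"sku": "A1", "qty": 2, "price": 5.0},
            {"sku": "B7", "qty": 1, "price": 12.5},
        ],
    },
    {
        "_id": 2,
        "customer": {"name": "Lin"},
        "items": [{"sku": "A1", "qty": 1, "price": 5.0}],
        "gift_note": "Happy birthday",
    },
]


def order_total(doc):
    """Interpret the document at read time; a missing 'items' list counts as empty."""
    return sum(item["qty"] * item["price"] for item in doc.get("items", []))


def customer_city(doc, default="unknown"):
    """Tolerate documents written before (or without) the 'city' field."""
    return doc.get("customer", {}).get("city", default)


totals = {doc["_id"]: order_total(doc) for doc in collection}
cities = [customer_city(doc) for doc in collection]

# The nested object serializes to JSON and back without any mapping layer.
round_tripped = json.loads(json.dumps(collection[0]))

print(totals, cities)
```

In a relational design the same order would typically be split across customer, order, and line-item tables, and adding `gift_note` would require a schema change before the first such row could be written.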

Querying capability: Document databases provide multiple ways to query the data. These methods range from something as simple as REST operations (GET/PUT/POST/DELETE) to SQL-like queries. Some of these databases (such as MongoDB) allow secondary indexes. Azure DocumentDB indexes all the properties of a document by default without compromising performance. Some databases (such as Azure DocumentDB) provide a rich, SQL-like query language syntax and support most ANSI SQL operations, whereas MongoDB has a rich ecosystem of developer tools that aid faster delivery and easy adoption.
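The idea behind a secondary index can be sketched in plain Python. This is a conceptual model only, not the index structure of MongoDB or any other product: the `build_index` and `scan` helpers and the sample documents are hypothetical. The index maps each value of a non-key field to the IDs of the documents that contain it, so a query on that field avoids scanning every document.

```python
from collections import defaultdict

# Documents keyed by their primary ID (hypothetical sample data).
documents = {
    "d1": {"name": "Ada", "city": "London"},
    "d2": {"name": "Lin", "city": "Taipei"},
    "d3": {"name": "Sam", "city": "London"},
}


def build_index(docs, field):
    """Build a secondary index: field value -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index


def scan(docs, field, value):
    """Answer the same query without an index by checking every document."""
    return {doc_id for doc_id, doc in docs.items() if doc.get(field) == value}


city_index = build_index(documents, "city")

via_index = city_index.get("London", set())   # one dictionary lookup
via_scan = scan(documents, "city", "London")  # touches every document

print(sorted(via_index), sorted(via_scan))
```

Both paths return the same document IDs; the difference is the work done per query, which is paid for by the cost of keeping the index up to date on every write.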

Transaction support: Document databases are usually weak in this area and provide BASE (basically available, soft state, eventually consistent) semantics rather than full ACID guarantees. Many of them support transactions within a single collection only. Generally they don't provide transactions across collections. If they do, it's at the cost of higher latency for reads and writes.

Elastic scale: Cloud-based document databases provide elastic scale to meet the growing demands of the application. Both storage and computing resources can be scaled to provide additional capacity. Both AWS DynamoDB and Azure DocumentDB provide elastic scale that can be programmed easily, so applications can scale out during peak hours and scale back to the regular capacity when the extra resources are not needed.

Some of the big-name products currently on the market can be broadly classified into two high-level categories:

On premises: Cassandra, MongoDB, CouchDB

Public cloud: AWS DynamoDB, Azure DocumentDB, and Google Cloud Datastore

It should be noted that the on-premises products also run in the cloud, mostly on infrastructure-as-a-service. The native cloud-based offerings such as AWS DynamoDB and Azure DocumentDB operate in platform-as-a-service mode with very low maintenance overhead and a smaller TCO.

A Final Word

We have just scratched the surface of document-based databases to help readers understand their differences from, and advantages over, relational databases. Many new cloud-based and mobile applications are adopting a polyglot persistence strategy of using multiple data stores. An application can use a relational database alongside a document-based database to handle these new scenarios. We want to call out the advantages of native cloud-based document databases such as Azure DocumentDB and AWS DynamoDB because they have a much lower TCO than on-premises counterparts such as MongoDB and Cassandra.

Whether on premises or cloud only, document-based NoSQL provides features, functionality, and flexibility that were unavailable a few years ago. We believe this is the new frontier in big data for operational workloads, one that is bound to expand many times over in the coming years.

About the Authors

Sachin Sinha is director of big data analytics at ThrivON. In this role, Mr. Sinha is responsible for the design of innovative architectures, development of methodologies, and delivery of solutions in big data, analytics, and data warehousing that help clients realize maximum value from their data assets. For over 15 years, Mr. Sinha has designed, architected, and delivered big data, data warehousing, and business analytics solutions. Specializing in data engineering and architecture, Mr. Sinha's domestic and international consulting portfolio includes a broad array of organizations in the healthcare, financial services, insurance, pharmaceutical, and energy domains. You can contact the author at ssinha@thrivon.com.

Mehul Shah is a senior solutions architect for Microsoft. He engages and consults with business and technical leaders of large enterprise customers and partners on cloud strategy and architecture, digital transformation, data and analytics strategy, architecture, design, and data program/project planning. He has over 15 years of experience in information management and has successfully led, managed, and executed enterprise information management projects for commercial and government organizations. He earned an MBA in marketing and analytics and an MS in computer science from the University of Maryland. You can contact the author at scorpmel@gmail.com or visit his blog at mehulshah008.blogspot.com.
