If you work with databases, you’ve almost certainly run across NoSQL; MongoDB is one of the highest-trading stocks in the world right now and a large portion of databases are using MongoDB, and its market share has been on a sharp incline recently. NoSQL offers the ability to access tremendous quantities of information, on a level that SQL couldn’t ever hope to match. However, a lot of people are still using SQL, and with good reason. Let’s stack these database types against each other, and see how they stack up.
Busting some myths
First of all, NoSQL does not supersede SQL. That would be like saying cars are superior to boats because they’re newer technology. NoSQL and SQL do the same thing: store data. They take different approaches to this, and it’s important to know which one is right for your project. The short version is this: SQL goes deep, and noSQL goes wide.
It’s a bad idea to choose your database technology based on your stack. There are reasons why stacks like LAMP and MEAN have derived the way they did, but don’t assume they are rules: you can connect to a MySQL or SQL Server in Node.js; you can use a MongoDB NoSQL database in your PHP or .NET application. You may not find as many tutorials and resources, but your requirements should determine the database type, not the language.
ACID, CAP, and the difficulty of multi-row transactions
NoSQL manages to store data more efficiently by storing it in ways that aren’t as well-structured. Since you’re not dealing with tables any more, it’s impossible to do things like multi-row transactions unless you’re very, very clever. If you can wrap your head around your chosen noSQL database format you can usually kludge something together, but it’s a bit like putting in screws with a hammer.
On the other hand, the complexity of SQL can lead to performance drain. You won’t notice it if you’re dealing with smaller data volumes, but it can start to take a toll on the RDBMS and those delays will pass on to your users.
Atomicity isn’t a concern for most noSQL databases: the ACID philosophy that defined DB management for decades doesn’t apply. Instead, they use CAP: consistency, availability, and partition tolerance. If you’re okay with multi-row queries not being transactionally guaranteed, noSQL is usually fine. It’s not ideal, but it’ll do the job. Because noSQL systems use normalized data models, many of them don’t need to update multiple rows at once anyway, since all the data about a single entity is found in a single record.
NoSQL supports scale-out models, allowing you to easily add new servers right into your cluster. The beauty of this is that it will allow you to grow slowly without the need to sub out your present hardware investment.
When you want NoSQL
NoSQL excels when dealing with high volumes of information low-complexity transactions. It isn’t relational, which is both its strength and weakness—it’s endlessly horizontally scalable, but it can’t access and index specific pieces of complex data with the same speed or precision.
Say you are building a service that has a high amount of users, and you wish to collect data on these users. Using a form of noSQL will allow you to start collecting data in docs, possibly a doc for each user, and make major changes without having to update the docs. Maybe you added new features and now store a preference for that feature in the doc. Old profiles won’t have a preference set, and will be set to default, and it’s just slapped in when they change it. No new columns have to be added to a structured database, no outage and no major overhauls.
If you have a JSON doc for each user that contains data on the user, you can keep collecting and storing it into that doc, while adding new things to collect later. You might start by collecting their basic info like name, address, and email but later decide to start collecting more data on how they use to app in order to improve the experience. In a lot of use cases, you do not need complex queries- simply load the file you need for the user’s configuration and account info, and just store everything you want about them. There’s seldom a need to retrieve data from files of multiple users at once, and this covers what would be a large number of queries in a structured database.
Spotify would be a great example of this. GDPR data requests have revealed they collect sometimes hundreds of megabytes of data for a single user, stored in several JSON documents. They log what you listen to, every time you pause/play or seek a song, even what brand of bluetooth headphones you have connected for every single action. It’s very likely the data collection came gradually, with the flexible format allowing them to add it in piecemeal without making any major updates to their database system.
In this world of data mining, noSQL is king. With this ‘collect first and figure out what to do with it later’ approach many large tech companies use, it’s apparent why it’s necessary. You might wonder if a non structured database would make it harder to actually make use of this data- the data is usually later aggregated with different sources and mined with specialized tools outside of the system it was collected in. There’s an entire industry centered around data warehousing and mining, often utilizing the latest in machine learning, AI, and voodoo magic rituals.
Where SQL is still king
SQL remains the champ if you need to access specific pieces of complex information, stored in a single server. We talked about noSQL scaling well horizontally (that is, across multiple physical servers) but SQL scales well vertically – it’s a better fit for a single, high-powered database server. For a majority of actual work you’ll be doing, that’s fine: ‘one server’ could be a single desktop box, but it could also be a floor-to-ceiling monolith running an industrial system. Vertical scaling only stops being viable when you’re dealing with truly huge amounts of data.
Imagine you’re a hospital. Patient records don’t just have a lot of individual data points in them, they have highly complex data points. You need to store basic information like name and ID number alongside things like blood test results, patient histories, prescriptions. For patients with chronic illnesses or long-term disabilities, a patient file could be huge: tens or hundreds of thousands of words going back decades. It’s not enough to just be able to access that information: you need to access specific, deep parts of it quickly. A bad drug interaction in 1972 is relevant to a prescription today, and you need a language that can query with that level of specificity. With noSQL, that could be an extremely complex process. With SQL?
WHERE interaction = 'negative';
That’s not just simple, it’s probably one of the most simple pieces of SQL you can write. Three short lines of code that would take noSQL half a page and still might give you half a page of useless data alongside it. Or hey, you want to know whether anybody in the entire hospital has had a negative reaction to a particular drug?
WHERE interaction = 'negative';
You change one line of code. There’s a reason SQL has been around since the 70s and is still sitting proud on its throne – nothing else allows for short, simple queries with that sort of accuracy. SQL can’t handle large-scale data volumes, but it’s still a remarkably robust and elegant system.
Sharding is one way to scale database performance. It’s usually done either by storing sections of data in different ‘buckets’ or by using multiple databases with the same schema, allowing the use of many servers. It’s a more advanced topic, but it’s important to note that there are some options for horizontal scaling of a traditional SQL database when performance is the concern.
JOINing it all together
If you’re happily running a SQL database, don’t immediately jump ship for NoSQL. Migrating to a different type of database can be challenging and costly. Instead, take a closer look at the challenges in potential scaling. If you find that your data is not growing at the rates that you expect, or you require support for multi-row transactions (like in credit and debit systems, for example), then stick with SQL. However, if you find that you have emerging use cases that will benefit more from using NoSQL then you might want to explore it—you’ll develop a valuable new skill set while preparing yourself for the future.
Let’s break this down:
|Queries||Yes||No, though often support sql-like queries is added, like CQL for Apache Cassandra|
|Data format||Tables||Docs, Graphs, JSON […]|
|Speed||Fast (when going deep)||Fast (when going wide)|
I talked about hammers and drills earlier, and that’s really the heart of it: do you want a tool that can penetrate deep and with high accuracy, or do you need to bash through data on a scale where precision starts to get in the way? It’s a bad tradesman who says one is better than the other – they’re just good at different jobs.
If you’re looking for a developer with plenty of MySQL and noSQL experience, you should consider talking to CodeClouds. We have a dedicated offshore team available on a competitive and affordable subscription pricing model that lets you get the exact service you need.