How you can save millions with NoSQL…
With all the hullabaloo around NoSQL companies going public and then losing a third of their valuation, being bought out, and some even going into administration and then being resurrected by customers – you could be forgiven for thinking the NoSQL bubble has spectacularly burst.
It’s crunch time for NoSQL companies, that’s for sure. They need to grow up. Rapidly. They need to become easier to deploy, use and manage. But they’re working on that.
The effort to deploy NoSQL databases, though, is worth it for so many customers. I wanted to bang the drum a little for NoSQL in these troubled times, so here’s my list of how you can really benefit from using them.
Oh and it’s a stream of thought kinda thing – these are in no particular order of preference.
1. You could save millions on Oracle Coherence or Software AG Terracotta licenses
These middleware layers are used to cache frequently used data in the Java tier. They are used extensively in Financial Services to reduce load on underlying databases, acting either as in-process caches or as distributed memory shared across many machines.
They’re bloody expensive though!!! Like… eye-wateringly expensive. Even for very rich banking types.
You can use a NoSQL key-value store to do a similar job, and for much less cash. An in-memory key-value store for transient data can easily be powered by the extremely lightweight Redis NoSQL database.
If you’re in the cloud, take a look at AWS DynamoDB.
These also have useful functionality for complex and custom data types – so they may even be easier to code against for some use cases.
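To make the caching idea a bit more concrete, here's a minimal sketch of the cache-aside pattern in Python. Treat it as a toy under stated assumptions: `TTLCache` and `get_or_load` are names invented for illustration, and the in-process dictionary merely stands in for a real key-value store such as Redis, so nothing needs installing to run it.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process key-value cache with per-entry expiry.

    Hypothetical stand-in for a networked key-value store; not a real client.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on read.
            del self._data[key]
            return None
        return value


def get_or_load(
    cache: TTLCache,
    key: str,
    loader: Callable[[str], Any],
    ttl_seconds: float = 30.0,
) -> Any:
    """Cache-aside: try the cache first, fall back to the slow source on a miss."""
    cached = cache.get(key)
    if cached is not None:  # note: a cached value of None counts as a miss here
        return cached
    value = loader(key)  # e.g. the expensive query against the underlying database
    cache.set(key, value, ttl_seconds)
    return value
```

The `loader` callable is where the expensive database query would go. Repeated reads within the expiry window never reach it, which is the load reduction the pricey middleware is being paid for.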
2. Save millions by not coding around relational database structural issues
Relational databases are great…
For relational data.
For structures that change rapidly, or that are deeply nested, they introduce a bunch of overhead.
Imagine an XML or JSON document structure. Say a complex FpML trade document. Let’s say you want to update it whole, but also want to be able to retrieve it by key fields on an ad-hoc basis.
To do this in relational databases you have to either code a special function to handle introspecting the XML data – not the fastest – or duplicate some of the fields as relational columns – not easy on storage space.
And then you’ve got to code around the issue, and spend a lot of time thinking of the best way to do things. It’s just not fun, or productive.
Using a document NoSQL database which can natively handle either XML (MarkLogic) or JSON (pretty much all of them – MongoDB, ArangoDB, CosmosDB, MarkLogic, et al) data will greatly simplify storage and query of complex document structures.
And they’ll save you a boatload of development and testing time, too.
Oh and their licenses are cheaper, and they run on commodity hardware. That’s easy maths.
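Here's a rough Python sketch of what "store it whole, fetch it by key fields" looks like. It is only an illustration: `DocumentStore`, `get_path` and the dotted-path syntax are assumptions made up for this example, not the API of MarkLogic, MongoDB or any other product, and real document databases index far more cleverly than this.

```python
import copy
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Tuple


def get_path(doc: Any, path: str) -> Any:
    """Walk a dotted path such as 'trade.product.currency'; None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class DocumentStore:
    """Hypothetical in-memory document store with indexes on nested fields."""

    def __init__(self, indexed_paths: Iterable[str]) -> None:
        self._docs: Dict[str, dict] = {}
        self._indexed_paths: List[str] = list(indexed_paths)
        # (path, value) -> ids of the documents holding that value at that path
        self._index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def _entries(self, doc: dict) -> List[Tuple[str, Any]]:
        # Only simple scalar values are indexed in this sketch.
        pairs = [(path, get_path(doc, path)) for path in self._indexed_paths]
        return [(p, v) for p, v in pairs if isinstance(v, (str, int, float, bool))]

    def put(self, doc_id: str, doc: dict) -> None:
        """Insert or replace the whole document; no schema declared up front."""
        old = self._docs.get(doc_id)
        if old is not None:
            for entry in self._entries(old):
                self._index[entry].discard(doc_id)
        self._docs[doc_id] = copy.deepcopy(doc)
        for entry in self._entries(doc):
            self._index[entry].add(doc_id)

    def find(self, path: str, value: Any) -> List[dict]:
        """Ad-hoc lookup by a nested key field, via the index."""
        ids = sorted(self._index.get((path, value), set()))
        return [copy.deepcopy(self._docs[doc_id]) for doc_id in ids]
```

The document goes in and comes out in one piece, and the key fields are indexed where they already live inside it. There are no duplicated columns and no hand-written XML introspection function.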
3. Save pulling out your hair doing complex queries over thousands of pieces of related data
If you’ve got ridiculously complex relationships (in data… not your personal life…) between entities in a complex graph of information, and need to traverse those relationships, you need an effective way to index that data in order to make those queries fly.
This is where the SPOGI-style indexes in Graph NoSQL databases come in! Good choices here are AllegroGraph (very standards compliant), GraphDB, and Neo4j (not W3C standards compliant, but has a very nice query language of its own).
Shortest path queries are really, really computationally complex. Especially if they’re calculating the ‘cost’ of the paths as they traverse them from data in the graph.
If you’re doing a lot of these queries (e.g. in a sat-nav style application), then you need a dedicated data store.
Only Graph NoSQL databases provide that.
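To give a feel for what a cost-aware shortest path query involves, here's a small sketch of Dijkstra's algorithm in Python. It is purely illustrative: the `shortest_path` function and the adjacency-dictionary graph format are assumptions for this example, edge costs are assumed to be non-negative, and a graph database will use its own indexes and traversal engine rather than anything this simple.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# node -> {neighbour: cost of the edge to that neighbour}
Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: cheapest (cost, path) from start to goal, or None."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    best: Dict[str, float] = {start: 0.0}  # cheapest known cost to each node

    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbour, edge_cost in graph.get(node, {}).items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None  # goal is unreachable from start
```

Every hop means looking up a node's outgoing edges and their costs. That lookup is exactly what a graph store's indexes are built to make cheap, and it is why running lots of these over relational joins hurts so much.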
4. If you have a whole bucket load of data about each record… but only sometimes
Sometimes it’s possible for a single record to have only a few properties out of thousands of possible ones.
You may not want the overhead of defining them all up front – or you may simply not know them all up front!
You may also only want to pull some groups of properties back in one go (e.g. a summary, or one aspect of the entity), and want it to be fast.
The way wide column stores, aka Columnar NoSQL databases, aka BigTable clones, work makes this very efficient.
Be it Hypertable (a good commercial offering), Cassandra (available commercially as DataStax Enterprise), or Accumulo (good for securing individual data fields), these NoSQL databases can simplify your application and drive more performance.
They’re also easier to understand if your mind is totally stuck in the world of tables and columns!
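If it helps to picture it, here's a tiny Python sketch of the sparse-row idea. The `WideColumnTable` class and its method names are invented for illustration, and the nested dictionaries are only a mental model. Cassandra, Accumulo and friends each have their own data models and storage formats.

```python
from collections import defaultdict
from typing import Any, Dict


class WideColumnTable:
    """Hypothetical sparse table: row key -> column family -> column -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        # Columns come into existence when first written; nothing is declared
        # up front, and a row only stores the columns it actually has.
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        """Fetch one group of related properties for a row in a single call."""
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))
```

A record with three properties stores three values, not thousands of empty columns, and asking for the "summary" family never touches the rest of the row.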
5. You may be totally indecisive, or have every type of data imaginable, but don’t want 10 different database products
In this instance, a hybrid NoSQL database may be for you. A variety of query types – be they simple key-value/name-fetch applications, document structures, or graph queries – can all be handled by a single, true-hybrid database using one API.
These databases include MarkLogic Server or ArangoDB. Definitely try those out.
In Summary
There are a few difficulties in using NoSQL databases – but the benefits far, far outweigh them. The great thing is, you can download the above databases and try them out in minutes.
No great time overhead, and no sales droids to talk to until you think you may find them valuable. Just have a go today.
I cannot recommend enough that you open your mind and try them out. The possibilities are truly endless.
Not to mention, profitable for you and your employer!!!
Hi Adam,
I support a relational db with a high update to read ratio. Each user has a profile saved to the db that contains the widget layout. Every widget change (add/remove, move position) results in an update to the database. Currently, there are 600 to 1000 concurrent users. When does it make sense to look into NoSQL solutions? If so, what type of NoSQL solution would I start to look at?
Thank you!
I would be very tempted to use a key-value store for these types of preferences. Very fast in-memory updates, but backed by disk so the user has preferences saved between sessions (although that is optional… depends on your requirements). Consider Redis, or a hosted version like Redis Labs, or AWS DynamoDB. Hope that helps!
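As a very rough sketch of that idea in Python: one key per user, updates applied in memory, and an optional snapshot to disk. `PreferenceStore`, its methods and the JSON snapshot file are all assumptions for illustration. A real key-value store handles persistence, concurrency and networking for you, and this toy does none of that.

```python
import json
import os
from typing import Any, Dict


class PreferenceStore:
    """Hypothetical per-user layout store: in-memory updates, optional disk snapshot."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._data: Dict[str, Any] = {}
        if os.path.exists(path):
            # Reload whatever was saved in an earlier session.
            with open(path, "r", encoding="utf-8") as handle:
                self._data = json.load(handle)

    def set_layout(self, user_id: str, layout: Dict[str, Any]) -> None:
        # Each widget add/remove/move is a single in-memory write keyed by user.
        self._data[user_id] = layout

    def get_layout(self, user_id: str) -> Dict[str, Any]:
        return self._data.get(user_id, {})

    def save(self) -> None:
        # Write to a temporary file, then swap it in, so a crash mid-write
        # does not leave a half-written snapshot behind.
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(self._data, handle)
        os.replace(tmp_path, self._path)
```

How often `save` runs is the requirements question mentioned above: on every change, on a timer, or never if layouts only need to last for the session.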