Real world NoSQL performance pitfalls…

I’ve said for a long time that I am concerned about the accuracy of performance claims on Open Source NoSQL databases. In this blog post I give you a couple of links to people who have tried these databases in the Real World and found them wanting.

Most databases will (or should!) perform just fine out of the box. You try the examples and samples that come with them and they will be blisteringly fast. Step beyond these, however, and you start to come across performance issues. This is true of any technology that either a) has not been around long enough to have Best Practice guides/expertise or b) has only been tested against theoretical workloads rather than Enterprise real-world workloads across many customers, on business critical systems.

Clifford Farrugia has a great article on MongoDB performance pitfalls, for example, that makes this very point. Amongst other things he mentions that MongoDB suffers in complex queries because in its implementation these large result sets have to be processed by a JavaScript engine. He also points out that the language bindings for MongoDB can have their own performance issues. I suspect this again is due to a lack of real-world workload performance testing.

Clifford Farrugia has another article where he compares disc usage between MySQL and MongoDB. To a certain extent a document-orientated NoSQL database will always use more disc space, because by default it has a bunch of indexes turned on “just in case”. What is particularly interesting here though are the subtle points he makes around MongoDB not having data compression, and how much impact longer data item names (that’s right – names, not content) have on storage use.
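To see why item names matter, here is a rough sketch of the effect. It is not MongoDB code: it uses compact JSON from Python's standard library as a stand-in for the on-disc format, and the field names and record count are made up for illustration. The assumption is simply that every document carries its own copy of its field names and that nothing is compressed, which is the situation the article describes.

```python
import json

def stored_size(docs):
    """Total bytes when each document is serialised with its own field names.

    Compact JSON is used here as a stand-in for the real storage format
    (an assumption for illustration, not MongoDB's actual layout).
    """
    return sum(
        len(json.dumps(d, separators=(",", ":")).encode("utf-8"))
        for d in docs
    )

# Hypothetical records: identical values, only the field names differ.
n = 1000
long_names = [
    {"customer_identifier": i, "last_modified_timestamp": 1700000000 + i}
    for i in range(n)
]
short_names = [
    {"cid": i, "lmt": 1700000000 + i}
    for i in range(n)
]

long_bytes = stored_size(long_names)
short_bytes = stored_size(short_names)
overhead = long_bytes - short_bytes

print(f"long names:  {long_bytes} bytes")
print(f"short names: {short_bytes} bytes")
print(f"extra bytes spent purely on names: {overhead}")
```

The values are the same in both sets, so the whole difference comes from the names being repeated in every record. A relational table stores its column names once in the schema, so it does not pay this per-row cost.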

He makes the valid point that if your app is not read-heavy, then having a memory-mapped implementation that relies on having enough RAM for all the cached data (rather than on-the-fly cache optimisation like MarkLogic) can be an expensive alternative. Yes, Open Source software is free, but the hardware ain’t.

I’ve also found a great thread on Stack Overflow about CouchDB performance being less than ideal. This makes some great points again about design deficiencies and their impact on performance. Particularly worrying are comments about CouchDB’s HTTP layer being slow (ironic given it’s an Apache project), and that a document in the hundreds of KB (yes, KB) is larger than what someone might expect to be managed by CouchDB! CouchDB’s on the fly JSON encoding/decoding comes in for criticism too.

It does sound like I’m bashing the others, and to an extent I am, but I’m not making this stuff up. Just search for “<your fave NoSQL DB> performance issues” and you’ll see a plethora of problems. Do the same query for MarkLogic, and the first thread you come across that isn’t from our website or a detailed tuning guide is this one on Stack Overflow. I found this particularly amusing, as both of the top two Answers mention the authors dropping Open Source databases for MarkLogic when the performance or scale reaches a critical point. Clearly for real world loads MarkLogic is on to something.
