Hybrid NoSQL: MarkLogic as a Key-Value store: Update

I’ve managed to change the code of redis-benchmark to give real results on my computer for yesterday’s MarkLogic comparison. Read on for the final results!

I forked the Redis code and patched src/redis-benchmark.c to add support for the HSET and HMSET commands. It was actually a lot easier than expected, thanks to how well the benchmark harness is written to support random key names and test data.

Below are the results for HSET – setting a single random field on a named hash:-

adamfowbookwork:src adamfowler$ ./redis-benchmark -h 192.168.123.4 -n 100000 -r 100000 -t hset -c 50
====== HSET ======
 100000 requests completed in 4.60 seconds
 50 parallel clients
 3 bytes payload
 keep alive: 1

0.00% <= 1 milliseconds
8.01% <= 2 milliseconds
98.28% <= 3 milliseconds
99.69% <= 4 milliseconds
99.81% <= 5 milliseconds
99.89% <= 6 milliseconds
100.00% <= 6 milliseconds
21739.13 requests per second

And below are the results for HMSET, setting 10 random fields on a named hash in a single call:-

adamfowbookwork:src adamfowler$ ./redis-benchmark -h 192.168.123.4 -n 100000 -r 100000 -t hmset -c 50
====== HMSET (10 keys) ======
 100000 requests completed in 5.85 seconds
 50 parallel clients
 3 bytes payload
 keep alive: 1

0.00% <= 1 milliseconds
0.28% <= 2 milliseconds
71.92% <= 3 milliseconds
97.45% <= 4 milliseconds
99.31% <= 5 milliseconds
99.51% <= 6 milliseconds
99.58% <= 7 milliseconds
99.60% <= 8 milliseconds
99.67% <= 9 milliseconds
99.70% <= 11 milliseconds
99.78% <= 12 milliseconds
99.84% <= 13 milliseconds
99.88% <= 14 milliseconds
99.88% <= 17 milliseconds
99.90% <= 19 milliseconds
99.90% <= 20 milliseconds
99.92% <= 22 milliseconds
99.92% <= 25 milliseconds
99.95% <= 64 milliseconds
99.99% <= 65 milliseconds
100.00% <= 65 milliseconds
17094.02 requests per second

This makes HMSET look slower, but if you think about it you’re actually saving 10 pieces of data per request, so it’s pretty damn performant: 10x the amount of data for roughly a 20% per-request performance hit. Not bad.

I was expecting somewhere between 3.5x and 5x speed over MarkLogic’s saving of documents with 10 fields (XML elements)…

Our results from yesterday for saving 100,000 10-element documents in MarkLogic: 3225.80 documents per second.

This means that for simple aggregates, Redis hashes are only 5.3 times faster than MarkLogic. So my guesstimate wasn’t far out. I guess the Redis guys and gals have done some performance tuning since the blog post I based my performance factors on!

A different MarkLogic Client

MarkLogic Content Pump (MLCP) has overheads and is likely not as highly tuned as redis-benchmark, so I’ve also downloaded and tested the xmlsh application with its MarkLogic extension.

I’m still reading files off disc, so there’s a significant IO penalty that redis-benchmark doesn’t suffer from.

Here’s the ingest.sh file:-

#!/bin/sh
export XMLSH=./                    # xmlsh installation root
export MLCONNECT=xcc://admin:admin@192.168.123.4:7777/Documents
export XMODPATH=./ext              # where the MarkLogic extension module lives
date                               # timestamps bracket the run for the throughput sums
./unix/xmlsh ./ingest.xh
date

And the ingest.xh XMLSH file:-

import module ml=marklogic
ml:put -baseuri /test/ -uri "doc{seq}.xml" -x -repair none -maxthreads 20 -maxfiles 30 -r aggregates

This now takes 28 seconds, which means a total throughput of 3571.43 documents per second. This can probably be improved upon, but it’s sufficient for my tests today.

Using a RamDisk instead of spinning disc

After some tuning, I created a RamDisk on my Mac and pointed the ingest at it instead of the spinning HDD; ingest.xh itself is unchanged:-

ml:put -baseuri /test/ -uri "doc{seq}.xml" -x -repair none -maxthreads 20 -maxfiles 30 -r aggregates

This repeatedly gave execution in 27 seconds, for a throughput of 3703.70 docs/second.

This means that Redis is 4.61 times faster than MarkLogic in storing these very simple aggregates.

NB I also assigned 6 cores (3 physical cores / 6 hardware threads) to the VMware image to see if that helped… it didn’t!

Configuring memory caches to minimise disc writes

MarkLogic appends data to in-memory stands and journals writes to disc. This is limited to 16 MB, though, and my data load is about 35 MB. By changing the list cache and tree cache sizes to 64 MB I should avoid added disc penalties.

Running the same test again gives 39 seconds – so slower! This probably means I’m actually within the margin of error between my test runs.

Configuring 2 forests instead of 1

My testing shows I’m CPU bound rather than IO bound on the MarkLogic server. This is because the app server threads block until the Forest (shard) managing thread completes. That’s probably ideal for a VMware disc-and-network test, as these are not the bottlenecks.

I’ll configure 2 forests instead of the default 1 to balance out this problem. A slight cheat, as Redis was only running on 1 core and this is equivalent to running 2 Redis instances (2 cores), but interesting nonetheless.

I also changed the VMware image back to 4 hardware threads (2 physical cores) so it’s an apples-to-apples comparison again.

The results were… still 28 seconds. So it looks like I’m still CPU bound, and that’s where the issue lies in my testing.

What does this mean?

A Key-Value store is generally expected to be around 50x faster than a document store due to how it works – we see only up to a 4.6x performance difference. That’s a factor of ten less speedy (technical term!). We have shown that the more complex the aggregate operations get, and especially the more paranoid you are about data consistency and durability, the closer the performance of the two databases becomes.

This is not surprising. Redis is built for blazingly fast read queries. MarkLogic is built for fast but complex query loads – they therefore have different performance characteristics on the writing side of things.

Practically, it shows that a document store like MarkLogic is not fundamentally disadvantaged in pure write performance. So if you have complex query loads or data models, use MarkLogic; if not, feel free to use Redis.

How can the tests be improved?

I would really like to spend time altering redis-benchmark so that hash, list, and field names – and values – can all be randomised, letting you test update versus create workloads. I’ve done this for HMSET using random hash and field names.

I also suspect that the bigger the aggregates (more fields/elements), the nearer MarkLogic will get to Redis. Trying 100 elements per hash/document may be a useful test. As would nested hashes, if indeed Redis supports those?

I’d also like to have a benchmarking app written in C or C++ against MarkLogic’s REST API rather than Java against the XCC API, as this would give better client performance and exercise the part of the server most applications actually use (REST).

I suspect MarkLogic’s performance would be significantly closer to Redis with a better client – or one that simply generates the XML from a variable rather than reading it from the file system/RAM disc. I’ll hunt around for an alternative to MLCP (Content Pump) and XMLSH in the meantime.

Actually, I’ll contribute the Redis benchmark code back first, as I’m sure you’ll want to play with those tests! Here’s the pull request for my changes. Enjoy!
