I review the latest beta of the ArangoDB hybrid Document and Graph store…
ArangoDB is a hybrid, or multi-model, NoSQL Document and Graph store. This is becoming a common combination, and provides a lot of flexibility and power. I’ve been watching ArangoDB since it started three years ago, so it’ll be interesting to see what has changed…
Vital Statistics
Latest release: Version 3.2 Beta (Jun 13th, 2017) (Current production version is 3.1)
Commercial backer: ArangoDB GmbH (spun out of triAGENS GmbH, an IT consultancy in Germany)
Website: https://www.arangodb.com
Twitter: @arangodb
Licensing: Community core (Apache 2.0) with Enterprise supported version
Sales model: Subscription model, including support for the community version, not just enterprise
Release Press Release: https://www.arangodb.com/2017/06/arangodb-3-2-beta-release-pluggable-storage-engine-rocksdb-distributed-graph-processing-clusterfoxx/
Release Full Details: As above
What’s new
The biggest change is the new pluggable storage layer, which lets you use Facebook’s RocksDB key-value store as the storage engine (I’ll review RocksDB separately in a future article). This is a huge change that will require load testing in your applications, as consistency and locking work differently than with the original MMFiles engine.
Plus, as MongoDB customers found out when WiredTiger was introduced as a storage engine, you’ll always want to test and retest a new storage layer for unanticipated regressions and performance issues.
It’s a positive change though: it allows document-level locking and prevents writes from blocking reads, and vice versa. This should greatly improve performance under heavy concurrent read and write loads.
Support for the Pregel graph processing model will be a boon for developers. Think of it as an efficient version of Map/Reduce for graph algorithms, processed across data spread throughout the cluster. Interestingly, you can choose single node or cluster execution. Certain graph algorithms need all the data on a single machine, so being able to choose should keep graph aficionados happy!
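To make the superstep idea concrete, here’s a toy, single-process Python sketch of a Pregel-style connected components job: in each superstep, every vertex that received messages keeps the smallest label it has seen and forwards it to its neighbours, and the job halts when no messages are left in flight. It’s only an illustration of the model under my own assumptions; it does not use ArangoDB’s Pregel API, and the function name and the max_supersteps cap are hypothetical.

```python
from collections import defaultdict


def pregel_connected_components(vertices, edges, max_supersteps=50):
    """Toy Pregel-style job: label each vertex with the smallest id in its component."""
    # Build an undirected adjacency list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Each vertex starts labelled with its own id.
    label = {v: v for v in vertices}

    # Superstep 0: every vertex sends its label to its neighbours.
    inbox = defaultdict(list)
    for v in vertices:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages in flight: every vertex has voted to halt
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            # Only vertices whose label improves send new messages.
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbours[v]:
                    outbox[n].append(smallest)
        inbox = outbox
    return label


# Two components: {1, 2, 3} and {4, 5}.
print(pregel_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

In a cluster, the vertices would be spread across servers and the messages exchanged between supersteps would cross the network; this toy keeps everything in one process.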
The Enterprise version also benefits from LDAP security support and encryption at rest. These are key requirements for many serious Enterprise customers, such as banks and governments.
Satellite Collections (Enterprise Edition only) is a great new feature: a collection, typically a smaller one, is replicated in full to every server in the cluster, so joins against it can be processed locally instead of needing a network hop to fetch key collection data from another node. This may not sound like much, but I can tell ya from running distributed processing algorithms over NoSQL databases that this will go a long way toward tuning large document processing algorithms.
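A toy Python model of why this helps: an orders collection is sharded across servers, and each order is joined with a product. If products are sharded too, a join often has to go to another server; with a satellite copy of products on every server, it never does. The server count, the shard function and the collection names are my own assumptions for illustration, not ArangoDB’s actual sharding logic.

```python
NUM_SERVERS = 3


def server_for(key: str) -> int:
    """Toy deterministic shard function: sum of character codes modulo the server count."""
    return sum(map(ord, key)) % NUM_SERVERS


def remote_lookups(orders, satellite: bool) -> int:
    """Count order->product joins that must fetch the product from another server."""
    hops = 0
    for order_key, product_key in orders:
        if satellite:
            # A full copy of the products collection lives on every server,
            # so the join is always answered locally.
            continue
        if server_for(product_key) != server_for(order_key):
            hops += 1
    return hops


# Hypothetical (order key, product key) pairs.
orders = [("o1", "p1"), ("o2", "p2"), ("o3", "p1")]
print(remote_lookups(orders, satellite=False))  # 3
print(remote_lookups(orders, satellite=True))   # 0
```

The trade-off is storage and write cost: every change to a satellite collection has to reach every server, which is why it suits smaller, read-mostly lookup data.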
What I like
One of the big problems in a distributed graph store is efficient processing of graph algorithms across multiple servers in a cluster. In order to use commodity servers rather than expensive, high-powered servers, you need to spread the data out. Of course, for graph algorithms that follow paths between nodes, this introduces lag, as every step onto another server adds a network round trip.
I’m glad to see ArangoDB spending a lot of engineering time on this key problem. If they can crack it with Pregel in this release, they’ll be way ahead of several of their competitors.
Take, for example, the recently added SmartGraphs feature (Enterprise edition only), which manages the sharding of nodes in a graph based on a known attribute. Consider a social graph: most people have connections in the same country (unless you’re me and write about IT!), so sharding a graph based on the country of a user works well. By sharding on this attribute, you minimise the network hops for most queries.
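Here’s a toy Python comparison of the idea: the same small, hypothetical social graph sharded two ways, by a numeric id and by the user’s country, counting the edges whose endpoints land on different shards (each one a network hop during a traversal). The users, the fixed country-to-shard mapping and the shard count are invented for illustration; this is not how ArangoDB itself computes SmartGraph shards.

```python
NUM_SHARDS = 2

# Hypothetical social graph: each user has a numeric id and a country attribute.
users = {
    "ann": {"id": 0, "country": "UK"},
    "bob": {"id": 1, "country": "UK"},
    "cat": {"id": 2, "country": "DE"},
    "dan": {"id": 3, "country": "DE"},
}
# Friendships: most are within a country, one crosses the border.
edges = [("ann", "bob"), ("cat", "dan"), ("ann", "cat")]

# Hypothetical fixed mapping from country to shard.
COUNTRY_SHARD = {"UK": 0, "DE": 1}


def shard_by_id(user):
    return user["id"] % NUM_SHARDS


def shard_by_country(user):
    return COUNTRY_SHARD[user["country"]]


def cross_shard_edges(users, edges, shard_fn):
    """Count edges whose two endpoints are stored on different shards."""
    return sum(1 for a, b in edges if shard_fn(users[a]) != shard_fn(users[b]))


print(cross_shard_edges(users, edges, shard_by_id))       # 2
print(cross_shard_edges(users, edges, shard_by_country))  # 1
```

Only the one international friendship crosses shards when sharding by country, which is the effect the feature relies on.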
Combined with SatelliteCollections (also Enterprise edition only), mentioned above, you can tune your data storage to ensure fast graph queries even when the graph is sharded, handling tricky cases like me with my international, jet-setting Twitter connections. ArangoDB told me they believe the combination of these features will be very useful for IoT use cases in particular. Please add a comment to this article if you’ve found the combination useful – I’d love to hear war stories!
When I first wrote about ArangoDB in State of NoSQL 2016, I assessed it as not yet quite ACID compliant, because its replicas were only eventually consistent and could lag behind. This has since been solved: additional primary servers can now be marked as ‘followers’ for shards held on other primary servers. Updates to a shard’s primary and its followers are synchronous, giving high availability of data whilst ensuring ACID compliance.
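To show what ‘synchronous’ means for the client, here’s a toy Python model: a write is only acknowledged after the primary and every follower have applied it, and if any copy can’t be reached, the partial write is undone and the client gets an error instead of an ‘ack’. The class names and the rollback strategy are my own illustrative assumptions; ArangoDB’s real replication protocol, and how it handles a failed follower, are not modelled here.

```python
_MISSING = object()


class Replica:
    """One copy of a shard, held in memory for this toy example."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def apply(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class SyncShard:
    """Toy primary/follower shard: a write is acknowledged only once every copy has applied it."""

    def __init__(self, primary, followers):
        self.replicas = [primary, *followers]

    def write(self, key, value):
        previous = []
        try:
            for replica in self.replicas:
                previous.append((replica, replica.data.get(key, _MISSING)))
                replica.apply(key, value)  # wait for each copy in turn
        except ConnectionError:
            # Undo the partial write so no copy is left ahead of the others.
            for replica, old in previous:
                if old is _MISSING:
                    replica.data.pop(key, None)
                else:
                    replica.data[key] = old
            raise
        return "ack"


primary, follower = Replica("primary"), Replica("follower")
shard = SyncShard(primary, [follower])
print(shard.write("order/1", {"total": 42}))  # ack
print(follower.data["order/1"])               # {'total': 42}
```

The point of the sketch is the ordering: the ‘ack’ is only returned after the last copy has the data, which is what lets the database honestly tell you a write is saved.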
This is key for mission-critical data scenarios where you need to be damned sure the data is safely on disc when the database tells you it’s saved. I cannot emphasise enough how important I think being ACID compliant is to Enterprise customers with mission-critical application needs.
I love the fact that ArangoDB’s free version uses a true Open Source license – the Apache 2.0 license. Unlike the AGPL v3 that other vendors use, this license does not restrict organisations who want to use the software, and perhaps customise it, for a production commercial application.
For me, ArangoDB as a company seems to have a nice X factor. I can’t quite put my finger on it. I think it probably stems from my love for hybrid Document and Graph stores, which these guys and gals do very well. Also because everyone I’ve talked to there seems really friendly. Heck, even their President took time out to find some information for me for this article!
ArangoDB’s engineering team seems to think about customer problem spaces and likely uses, and to add functionality right there. They’re not led by dogma, their features don’t appear to be dreamed up internally with no reference to the outside world, and new features seem to come with a real-world use case.
I would not be surprised if their customers found them very responsive to adding new features based solely on their needs in the real world, rather than sticking rigidly to an artificial road map made up of guesswork. I have no primary evidence to back that up though – it’s just a gut feel. I’d love to hear about your experiences, so please add a comment if you’ve dealt with ArangoDB personally as a customer.
What’s not so good
Although I myself have said before that a Document store can operate like a key-value store [1] [2] [3], I’m always at pains to point out that to do this well you should provide the same data type and data operations support as a key-value store like Redis.
The ‘ArangoDB as a key-value store’ use case seems limited to the marketing department (for a variety of reasons, detailed in the appendix below). It’s a shame, because I think it would be relatively easy for Document NoSQL databases like ArangoDB to support these use cases in their APIs, thus providing a migration path for companies moving from key-value stores to hybrid NoSQL databases. One database product to rule them all, as it were.
Having said that, it’s a minor gripe, and most people will use ArangoDB for its document and graph database features, and AQL query language, as they provide a rich set of functionality for the majority of use cases.
ArangoDB as a company is fairly new, having only been formed in 2014 in Germany (albeit spun out of an existing company). It appears to have around 20-50 staff from what I can glean from their Careers website, so getting effective local support outside of the EU region and time zone may be tricky. They do appear to have just hired a US sales leadership team, so they are getting there fast!
ArangoDB do take part in many events in the US and Europe each year. I suggest you go talk to them there.
Where it is used
ArangoDB’s website appears to only have news releases for the last year, so I only found limited information on this. There is a very well hidden Case Studies page though – thanks to ArangoDB’s President, Luca Olivari, for pointing me toward that.
Oxford University appear to use ArangoDB to reduce hospital attendance and healthcare costs, and improve test results. A phone app uses a finger-end blood pressure monitor to send information on a patient back to the NHS trust. This is via a Node.js application that stores data in ArangoDB. This of course means you can effectively track a patient and decide intelligently when to ask them to come in to hospital, reducing the load on the NHS. It’s a great little use case which is easily replicable to other problem areas.
An unnamed customer (those pesky customer NDAs! I know them well.), a Fortune 100 company, uses ArangoDB for Digital Certificate and Cryptographic Key management. Wowsers! Interesting use case for a Document store!
There’s also a really interesting use case involving exposing data microservices using ArangoDB’s Foxx service support. You can find a full explanation of that use case on its own page. This is particularly interesting as it saves the developer from having to first write a data layer service and then a thin wrapper microservice around it. Foxx provides the flexibility to blend ArangoDB data services into your existing microservices structure.
ArangoDB’s website can be found at https://www.arangodb.com.
Appendix
ArangoDB as a key-value store – supplementary detail
ArangoDB claims to be a Document store, Graph store, AND a key-value store – it’s right there on the front of their website. Whilst it is technically true that you can store arbitrary data values against a document ID, there is no mention of using ArangoDB for true key-value store workloads in their technical documentation.
See this subsection on Data models: it mentions Document and Graph, but not key-value. There’s no mention of key-value use cases like sets and lists in their API documentation either.
There are use cases where ArangoDB replaced key-value stores, but in those cases the switch worked because the customer realised the document model was what it needed, not a true key-value store. The multi-model page on ArangoDB’s website explicitly states you have to use a document as a container for a value, and makes no mention of pure key-value store functionality like lists, sets, maps, or their access APIs.
ArangoDB have told me they view key-value uses as a specialised version of document operations. For example, if you store a document by a key you can manipulate the content using AQL. These could be list / array operations, type determination and conversion operations, stack operations (via push / pop on arrays) – no inherent optimised storage for sorted sets or maps though.
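A toy Python sketch of the ‘document as a container for a value’ approach: each key maps to a document whose value attribute holds the payload, and stack operations are just array appends and pops on that attribute. The class and method names are hypothetical; in ArangoDB itself you’d express the equivalent updates in AQL rather than through anything like this class.

```python
class DocumentKV:
    """Toy key-value layer over a document collection: each value sits in a document's "value" attribute."""

    def __init__(self):
        self.collection = {}  # _key -> document (a plain dict here)

    def set(self, key, value):
        self.collection[key] = {"_key": key, "value": value}

    def get(self, key, default=None):
        doc = self.collection.get(key)
        return default if doc is None else doc["value"]

    def push(self, key, item):
        """Stack push: append to the array stored in the document, creating it if needed."""
        doc = self.collection.setdefault(key, {"_key": key, "value": []})
        doc["value"].append(item)

    def pop(self, key):
        """Stack pop: remove and return the last array element (IndexError if empty)."""
        return self.collection[key]["value"].pop()


kv = DocumentKV()
kv.set("greeting", "hello")
kv.push("jobs", "resize-image")
kv.push("jobs", "send-email")
print(kv.pop("jobs"))  # send-email
print(kv.get("jobs"))  # ['resize-image']
```

Note what’s missing compared with a dedicated key-value store such as Redis: there’s no native sorted set or map type here, so something like a leaderboard would mean re-sorting or scanning the whole array on every update.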
This support is a bit esoteric and very hard to find. I hope in future there’ll be some movement on the documentation and features in this area, given they make a big play on this in their marketing material.
Permissions
Logo reproduced with permission of ArangoDB.