It’s hard to believe, but I’ve been at MarkLogic for 13 months and had this blog going for a year! What’s been achieved with NoSQL, how has the market changed, and what does the future hold? I discuss these issues in this post…
Joining MarkLogic
When you’re hired into the technical organisation (Pre-Sales or Consultancy) at MarkLogic, you have to undergo a 20-hour project. You download and use the software and documentation in your spare time to create an application, then demo it to your peers in pre-sales and consultancy as one part of the interview process.
You’re not given training, though, and only a few pointers on where to look and how to fix system issues when you hit them – just as you would in a real pre-sales situation. My application loaded Open Data from the UK Government about expenses and let you search through it, providing various table-based reports on the data.
CSV datasets are common in UK Government open data. Although not particularly great from a search perspective (unlike a huge unstructured XML document), they do hold a lot of interesting data. Coming from a User Experience Platform (UXP) company, my demo at least looked good!
I used my JavaScript and JSON knowledge to plug a jQuery table into the underlying search results page information to provide an interactive table. It worked fine, and obviously I was hired!
It’s an interesting process to go through. The piece I liked most was getting to know the good and bad parts of a product before jumping into a company. You get a feel for what is good to show people, and what bits need more work.
User Interface is always the bugbear of Enterprise Software vendors, and I knew that doing UI via XQuery is non-trivial – there had to be A Better Way [tm].
MarkLogic 6 released
Last September, shortly after I joined on 30 August, we launched MarkLogic Server 6. This provided some of these UI improvements through an updated and much improved Application Builder, which made use of a new REST API rather than purely XQuery calls.
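To give a flavour of the difference: instead of writing an XQuery module for every query, an application can now call simple HTTP endpoints such as /v1/search. A minimal Node.js sketch (assuming a REST app server on port 8011; a real install defaults to digest authentication, which is omitted here for brevity):

```javascript
// Minimal sketch: querying the MarkLogic 6 REST API from Node.js.
// Assumes a REST app server on localhost:8011; digest authentication
// (the default) is omitted for brevity.
var http = require('http');

var options = {
  host: 'localhost',
  port: 8011,
  path: '/v1/search?q=expenses&format=json', // full-text search, JSON results
  headers: { Accept: 'application/json' }
};

http.get(options, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    var results = JSON.parse(body);
    console.log('Total matches: ' + results.total);
  });
});
```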
Application Builder is meant to be a quick start – people are supposed to customise apps afterwards. It’s a bit of a jump, though, from no coding to suddenly needing a bit of JS here, XQuery there, and some XSLT. This is probably why adoption of Application Builder has slowed internally in favour of toolkits like Roxy.
Roxy is an MVC framework for XQuery applications, as well as a code deployment tool. It gets the grunt work out of the way in much the same way as Ruby on Rails does. Indeed, that’s where its inspiration comes from.
At this point I’d started creating a very complex search and content re-use application in Roxy. It grabbed messages from Facebook and Twitter public feeds and let you visualise networks of communication. This technique can be used to gain commercial insight, or even to spot people plotting riots.
There was still a lot of XQuery. I built a drag-and-drop capability in JavaScript – you could drag messages of interest to a bucket, and they were collected together for later analysis or reading.
Going through this experience I was starting to think we needed a UI layer to take all of the heavy UI lifting away from XQuery entirely… But no time to investigate further.
I then built a situational awareness application. When loaded in a browser, this HTML5 application immediately gave you map tiles for your area and the location of your equipment. As the situation and position of things changed over time, the interface automatically refreshed.
Cool piece of code, but how to make it truly real time? We have real-time alerting in MarkLogic Server, but we tend to demo it with polling-based web apps. Fine for a small demo, but not great where the frequency of updates is very high.
I decided to build a Node.js based app to provide the HTML5 application with a WebSockets server. MarkLogic doesn’t have WebSockets built in, hence using Node. I quickly realised that interacting with our REST API needed an abstraction layer. This is where MLJS Core (formerly MLDB) came from.
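The fan-out pattern itself is simple. A sketch using the ws npm package (the /alert HTTP endpoint here is a hypothetical hook that a MarkLogic alerting action could POST changed documents to – not a built-in feature):

```javascript
// Sketch: a Node.js WebSocket fan-out server sitting between MarkLogic
// and the browser. The HTTP endpoint on 8081 is hypothetical -- a
// MarkLogic alerting action would POST changed documents to it.
var http = require('http');
var WebSocket = require('ws');

var wss = new WebSocket.Server({ port: 8080 }); // browsers connect here

// Broadcast a message to every connected HTML5 client
function broadcast(message) {
  wss.clients.forEach(function (client) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(message);
    }
  });
}

// MarkLogic pushes alerts to this endpoint instead of clients polling
http.createServer(function (req, res) {
  var body = '';
  req.on('data', function (chunk) { body += chunk; });
  req.on('end', function () {
    broadcast(body); // push the update to all browsers immediately
    res.end();
  });
}).listen(8081);
```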
MarkLogic and the User Interface
I created mldb, now mljs, to ease integration with the REST API. With any integration there is repetitiveness, so mljs core handled authentication, connection settings, logging and other grunt work. It also defaults to the same settings as a vanilla MarkLogic install, so it’s quick to get up and running with.
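To give a feel for that zero-plumbing idea, here is a rough sketch (the option and method names below are illustrative assumptions rather than the exact mljs API – check the mljs documentation for the real signatures):

```javascript
// Rough sketch of the mljs-style approach; method and option names
// here are illustrative assumptions, not the exact mljs API.
var mljs = require('mljs');

// Zero configuration: defaults match a vanilla MarkLogic install
var db = new mljs();

// Or override only what differs -- the library handles digest
// authentication and connection plumbing for you.
db.configure({ host: 'myserver', port: 8011,
               username: 'app-user', password: 'secret' });

// One call replaces all the hand-rolled HTTP code above
db.search('expenses', function (result) {
  console.log('Matches: ' + result.doc.total);
});
```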
This worked a treat, and gave me a very responsive situational awareness application supporting content and geospatial search, with alerting on any new or changed information.
Once this low-level functionality was complete, it became obvious that I could plug in an Ajax toolkit instead of a Node.js library, so I could communicate with MarkLogic in the browser rather than just in Node.js.
I made this modification, whilst still supporting Node.js use, and set about plugging some widgets into this. I first created a search page with a query bar widget, results, facet selection, paging and sorting. This was actually really quick to do.
I made this pluggable so you could customise how different results looked, with support for XML, JSON, and snippet results. It worked well. I even plugged the Highcharts graphing library into this to visualise results.
Something was missing though. I had to manually create plumbing code to link running a search with updating each widget. This upset the development-patterns lover in me!
I created a controller architecture that you can plug any widget into, or create your own widgets for. You can even have multiple search controllers per page.
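The idea is a simple publish-subscribe arrangement. A stripped-down sketch of the pattern (illustrative only, not the actual mljs implementation):

```javascript
// Stripped-down sketch of the search controller pattern: widgets
// register with a context, and the context pushes new results to all
// of them after each search. Not the actual mljs code.
function SearchContext(db) {
  this.db = db;
  this.widgets = [];
}

SearchContext.prototype.register = function (widget) {
  this.widgets.push(widget); // widget must implement updateResults()
};

SearchContext.prototype.doSearch = function (query) {
  var self = this;
  this.db.search(query, function (results) {
    // No per-page plumbing: every widget is notified centrally
    self.widgets.forEach(function (w) { w.updateResults(results); });
  });
};

// Usage: one controller wiring together independent widgets
// var ctx = new SearchContext(db);
// ctx.register(resultsWidget);
// ctx.register(facetsWidget);
// ctx.doSearch('cheese');
```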
MarkLogic World
Now I had a core library and a widget framework which were saving me hours in customer demo builds. I wanted to share this with an audience, so I submitted a talk proposal for MarkLogic World.
This was accepted, and so I travelled to Vegas in April 2013 to give a fifteen-minute talk to our customers and staff. It was well received, and it was pleasantly surprising how many people had already used the basic, non-widget, mljs code.
MarkLogic World was a good conference, although numbers were limited due to sequestration and the event not being located near our user power base of Washington, D.C. This was also the first time we’d gone public about our upcoming version 7, due out in the fall.
We announced that, for the first time in NoSQL history, we would ship a database for multiple data structures. We already had documents, of course, and in-memory columnar database capabilities thanks to our range index implementation.
In MarkLogic 7 this would be extended by supporting the magical yet simple structure of a triple. A triple consists of a subject, a predicate and an object. A subject is a standard IRI identifier. A predicate can be thought of as either a property name or a relationship name. The object is then a simple intrinsic type, or another subject.
Using this simple structure you can express simple facts like ‘adam likes cheese’, or build up a sophisticated web of facts like Wikipedia. This is also a fundamental technology in the semantic web of interrelated data first proposed by Sir Tim Berners-Lee.
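To make the ‘adam likes cheese’ example concrete, here it is expressed in Turtle syntax and loaded over REST via the /v1/graphs endpoint that MarkLogic 7 introduces (a sketch; the example IRIs are invented for illustration):

```javascript
// Sketch: loading a single triple into MarkLogic 7 over REST.
// The example IRIs are invented for illustration.
var http = require('http');

// Turtle syntax: subject, predicate, object, full stop.
var turtle = '<http://example.org/adam> ' +
             '<http://example.org/likes> ' +
             '"cheese" .';

var req = http.request({
  host: 'localhost',
  port: 8011,
  method: 'POST',              // POST merges triples into the graph
  path: '/v1/graphs?default',  // target the default graph
  headers: { 'Content-Type': 'text/turtle' }
}, function (res) {
  console.log('Triple stored, status: ' + res.statusCode);
});

req.end(turtle);
```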
This also enables you to model relationships between entities. An entity could be a subject in its own right, or indeed a MarkLogic document or document version, pointing back to a document URI in the database.
So you could extract facts from many police interviews and build up a web of personal relationships in a gang, or combine your own data with third-party reference data published as Linked Open Data in RDF.
Exciting times. The first real application of this I saw was the BBC’s sport ontology: when you click on the ‘Premier League’ page, it pulls back stories that mention the Premier League, but also stories about teams in that league, or players that play for those teams. All done via relationships defined in a triple store.
MarkLogic 7 Early Access
After MarkLogic World we entered a series of early access programme stages. This was very well subscribed, with people already versed in semantic technology putting MarkLogic through its paces.
This included one of our employees loading 2.5 billion (yes, billion) triples into a single MarkLogic node. I also used this to position future application ideas to various parts of the UK government, taking them beyond just content search.
I’ve used mljs to create a new semantic context controller and a set of widgets that allow mere mortals to interactively create a complex SPARQL query without resorting to code.
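Behind the widgets, the controller just emits standard SPARQL against MarkLogic 7’s /v1/graphs/sparql endpoint. A minimal sketch of the kind of query generated (the vocabulary IRIs are invented for illustration):

```javascript
// Sketch: running a generated SPARQL query against MarkLogic 7's
// /v1/graphs/sparql endpoint. The vocabulary IRIs are invented.
var http = require('http');

var sparql =
  'SELECT ?person WHERE { ' +
  '  ?person <http://example.org/likes> "cheese" . ' +
  '}';

var req = http.request({
  host: 'localhost',
  port: 8011,
  method: 'POST',
  path: '/v1/graphs/sparql',
  headers: {
    'Content-Type': 'application/sparql-query',
    Accept: 'application/sparql-results+json'
  }
}, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    // Print each matching subject from the result bindings
    var bindings = JSON.parse(body).results.bindings;
    bindings.forEach(function (b) { console.log(b.person.value); });
  });
});

req.end(sparql);
```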
I’ve also created an entity facts widget, a subject search results widget, and a Highcharts-based graph explorer widget. I’ve even added support for a suggested MarkLogic ontology, so you can use a list of subjects or facts to quickly get to related content (documents), or to the documents the facts were inferred or extracted from.
This effort with my colleagues Jochen Joerg and Ken Tune resulted in us coming second in a worldwide internal demo competition. I even won best rookie entrant!
Our demo shows how to take two disparate relational databases and import them into the triple store using the direct mapping of the W3C’s RDB2RDF standard. I then run inferencing over them and link them to already-imported claims documents.
The demo culminates in running a combined semantic and content query to discover joint customers who have high-cost claims and poor balances. All within five minutes. Pretty cool stuff.
What the future holds
This week marks our MarkLogic Summit series. We run events in New York, London and D.C. to showcase recent and upcoming product features. What makes these unique is that our customers present at them to show their real-life uses of MarkLogic.
The New York event was last Thursday, with London happening this Tuesday. The New York event was so popular, alongside our press release of MarkLogic 7, that our website went down due to the load! We received the same number of requests in an hour that we get in a typical day! A nice problem to have, and it shows how much interest we’re generating from this release.
MarkLogic is the only NoSQL database with enterprise features and a built-in search engine that rivals FAST, Autonomy or the Google Search Appliance; the only one to manage values, documents and triples; and, most importantly, the only one trusted as the primary data source for mission-critical systems in defence, intelligence, publishing and financial trade stores.
It’s been an exciting year, albeit a busy one! I don’t see it quietening down any time soon either. MarkLogic can now be rented on Amazon, and soon on the G-Cloud UK government procurement catalogue. The UK public sector is starting to really embrace the technology, with lots of interest in just the last couple of months.
So raise a glass (well, pint in my case) to another intriguing and successful year at MarkLogic! Cheers!