I’ve always thought that the key to any technology’s adoption is being able to access it from as close to the user as possible. In this post I explain how a Big Data store can be queried from an instant messaging client.
I bet you’re thinking “Oh, I remember using Instant Messaging… back in 2001!”, but if you think about it, we use these services all the time: Facebook Chat, Twitter, Salesforce.com’s Chatter. Under the hood, Google Chat is even a fully XMPP-compliant, Jabber-compatible Instant Messaging network; Google just exposes it via Gmail and its own Google Chat application.
When we want to find information, we step out of our normal workflow, go to google.com and type in a plain-text query. Results are displayed as text paragraphs with a sprinkle of metadata. It’s hardly rocket science, or the most sophisticated UI possible, but it works and provides great value.
Imagine if you could query the underlying information store just as you chat to your friends on Facebook or Twitter. Most modern browsers have an ‘awesome bar’ that is intelligent enough to tell the difference between a web page, a bookmark and a search engine query. What if you had a desktop ‘font of all knowledge’ into which you could type advanced queries?
Big Data is a big challenge, and queries can be very sophisticated. For humans to understand them, they need to be expressed in a friendly, natural-language style of grammar. We want to be able to say ‘Hey Facebook, find my friends from school in 1994 and show me their friends; I may know them… Even better, sort them by how likely I am to know them.’ You may work in the NHS and need to ask “What was the total amount spent on all types of statins in August by all GP surgeries in the East Midlands?”. You could even be an information analyst asking “Show me what new reports have been added that match the search I carried out on Monday. And if any more arrive, alert me to them as soon as they do.”
Most importantly, you want to do this from the place where you are most comfortable. You don’t want to go out to several different applications, all with different query styles and capabilities. You definitely don’t want to trawl manually through a bunch of potentially useless answers. (Hi Google. No, I don’t want tech information published in 2007; it’ll be woefully out of date for my needs today.)
How do you find precise information at work? I bet you ask someone who works with that information all the time, right? They act as a filter and a guide to the set of information most likely to answer your query: an Information Buddy, if you will. I bet you even contact them first over your company’s IM network. (Hello, fellow ex-IBMers. Sametime, anyone?)
I have chat clients open all the time: Skype, MSN, Yahoo, AIM, Google Chat, Jabber, iChat, Facebook, Twitter. ALL of the above, all the time, just in case. It occurred to me that this is the best way to get short bursts of information delivered to me via MarkLogic’s alerting framework. It then became obvious that I’d have to be a buddy of a computer programme’s chat account in order to receive that information… but chat is two-way…
So why not allow the computer programme to answer questions asked in plain text? Why not pass those queries on to MarkLogic’s NoSQL database to search across a vast array of data? MarkLogic can rank the results by relevance, and the programme can send short summaries, with links to the full story, to my IM client.
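The post doesn’t show how replies are formatted, but the idea of turning relevance-ranked hits into short, IM-sized summaries with links can be sketched in a few lines of TypeScript. This is a minimal sketch, not MLNodeJabber’s code: the `SearchHit` shape, `formatForIM` and the link base URL are all hypothetical, and a real MarkLogic search response is structured differently.

```typescript
// Hypothetical, simplified shape of one search hit. The real MarkLogic
// search response is richer and structured differently.
interface SearchHit {
  uri: string;     // document URI in the database
  score: number;   // relevance score: higher means more relevant
  snippet: string; // matching text extract
}

// Turn search hits into a short plain-text reply for an IM window:
// the top `maxHits` results by score, each with a truncated snippet
// and a link to the full document.
function formatForIM(
  hits: SearchHit[],
  linkBase: string,
  maxHits = 3,
  maxSnippet = 80,
): string {
  if (hits.length === 0) return "No results found.";
  // Copy before sorting so the caller's array is left untouched.
  const top = [...hits].sort((a, b) => b.score - a.score).slice(0, maxHits);
  return top
    .map((hit, i) => {
      const text =
        hit.snippet.length > maxSnippet
          ? hit.snippet.slice(0, maxSnippet - 3) + "..."
          : hit.snippet;
      return `${i + 1}. ${text}\n   ${linkBase}${encodeURIComponent(hit.uri)}`;
    })
    .join("\n");
}
```

Keeping each reply to a handful of lines matters in a chat window; the full story is only one click away.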
NodeJS is also great for rapidly creating a server. I needed something that could receive RESTful service calls (carrying alerts) from MarkLogic and pass them on to a Jabber buddy (me!). I also needed to be able to ask this buddy for information and have it query MarkLogic. Combining my MLDB driver with a NodeJS library for XMPP/Jabber Instant Messaging was an obvious fit.
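Here is a minimal sketch of the alert path, assuming MarkLogic’s alert action POSTs a small JSON body to the Node server. The payload fields (`to`, `title`, `uri`) and the function names are hypothetical, and the HTTP server and XMPP connection are deliberately left out: the `send` callback stands in for whatever the XMPP library uses to deliver a chat message.

```typescript
// Hypothetical alert payload posted by a MarkLogic alert action.
interface AlertPayload {
  to: string;    // Jabber ID of the subscriber, e.g. "analyst@example.com"
  title: string; // title of the newly matching document
  uri: string;   // its URI in the database
}

interface OutgoingChat {
  to: string;
  body: string;
}

// Stand-in for the XMPP library's "send a chat message" call.
type SendChat = (msg: OutgoingChat) => void;

// Validate a raw HTTP request body and turn it into a chat message.
// Returns null when the body is not a usable alert.
function alertToChat(rawBody: string): OutgoingChat | null {
  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const p = data as Partial<AlertPayload>;
  if (typeof p.to !== "string" || !p.to.includes("@")) return null;
  if (typeof p.title !== "string" || typeof p.uri !== "string") return null;
  return { to: p.to, body: `New match: ${p.title} (${p.uri})` };
}

// Handle one REST call from MarkLogic and return the HTTP status code
// the server should respond with.
function handleAlertRequest(rawBody: string, send: SendChat): number {
  const msg = alertToChat(rawBody);
  if (msg === null) return 400; // Bad Request: payload rejected
  send(msg);
  return 204; // No Content: alert accepted and forwarded to the buddy
}
```

Separating the routing logic from the transport keeps it easy to test without a live MarkLogic server or Jabber account.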
I spent three hours putting the nuts and bolts in place: having the Jabber client connect when the server starts, supporting multiple clients, having each IM user provide their own credentials rather than relying on a generic (and wide-open) service account, interpreting commands from a Jabber buddy, and providing some basic help for people new to the system.
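The command handling and per-buddy credentials might look something like this. The command names, the `login` syntax and the in-memory `sessions` map are assumptions for illustration, not the actual MLNodeJabber protocol; a real bot would also need to take care never to log or echo passwords sent over chat.

```typescript
// Hypothetical command set for the bot; not MLNodeJabber's actual syntax.
type Command =
  | { kind: "help" }
  | { kind: "login"; user: string; password: string }
  | { kind: "search"; query: string }
  | { kind: "unknown" };

const HELP_TEXT = [
  "Commands:",
  "  help                      show this message",
  "  login <user> <password>   use your own MarkLogic credentials",
  "  search <terms>            search the database",
].join("\n");

// Split a chat message into a verb and its arguments.
function parseCommand(message: string): Command {
  const trimmed = message.trim();
  const space = trimmed.indexOf(" ");
  const verb = (space === -1 ? trimmed : trimmed.slice(0, space)).toLowerCase();
  const rest = space === -1 ? "" : trimmed.slice(space + 1).trim();

  if (verb === "help") return { kind: "help" };
  if (verb === "login") {
    const [user, password, ...extra] = rest.split(/\s+/);
    if (user && password && extra.length === 0) return { kind: "login", user, password };
  }
  if (verb === "search" && rest !== "") return { kind: "search", query: rest };
  return { kind: "unknown" };
}

// Credentials are held per buddy (keyed by Jabber ID) so each person
// queries as themselves rather than via a shared service account.
const sessions = new Map<string, { user: string; password: string }>();

// Produce the bot's reply to one incoming chat message.
function reply(fromJid: string, message: string): string {
  const cmd = parseCommand(message);
  if (cmd.kind === "help") return HELP_TEXT;
  if (cmd.kind === "login") {
    sessions.set(fromJid, { user: cmd.user, password: cmd.password });
    return `Logged in as ${cmd.user}.`; // never echo the password back
  }
  if (cmd.kind === "search") {
    if (!sessions.has(fromJid)) return "Please log in first. Type 'help' for commands.";
    // A real bot would run the search against MarkLogic here.
    return `Searching for: ${cmd.query}`;
  }
  return "Sorry, I didn't understand that. Type 'help' for commands.";
}
```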
The idea is basically to create a bot like the IRC chat bots of old. Starting from scratch, and including research time, I had mine working and performing a search against a MarkLogic database within three hours, as you can see in the picture below.
As you can see, the search returns snippets, with results ordered by relevance. Thanks to the low latency and high performance of NodeJS and MarkLogic, the search executes blisteringly fast.
For my next trick I’m going to allow people to create saved searches. They’ll be able to do this in an ordinary MarkLogic web application, like those I often build with Roxy, using a very simple, generic XQuery library and alert handler.
I can then extend this so that an analyst sat at their computer can react the moment they think ‘Oh bottom! Major panic. I gotta be all over <subject A>, right now, and until further notice!’. My NodeJS app, which I’ve imaginatively called MLNodeJabber, will then both set up the alert on the fly and give them a summary of the latest, most relevant information, say over the last 48 hours.
It could even take into account your default profile settings, such as the collections you care about, your location and the types of information that most concern you. This gives the most relevant matches possible, tailored just for you.
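As a rough sketch of how a ‘watch this subject’ request could become both a standing alert and a 48-hour catch-up query narrowed by the user’s profile: `planWatch`, `Profile` and all the field names here are hypothetical. In practice the alert would be registered through MarkLogic’s alerting framework and the summary run as an ordinary search; other profile settings, such as information types, could be passed through the same way.

```typescript
// Hypothetical profile defaults; field names are illustrative only.
interface Profile {
  collections: string[]; // collections the user cares about
  location?: string;     // e.g. "East Midlands"
}

interface WatchPlan {
  // Standing alert: matches arriving from now on are pushed to the buddy.
  alert: { subscriber: string; query: string; collections: string[] };
  // One-off catch-up search over the recent time window.
  summary: {
    query: string;
    collections: string[];
    location: string | undefined;
    since: string; // ISO-8601 lower bound on document date
    limit: number; // maximum number of hits to summarise
  };
}

const HOUR_MS = 60 * 60 * 1000;

// Build both halves of a "watch" request from one chat command.
function planWatch(
  jid: string,
  subject: string,
  profile: Profile,
  now: Date,
  windowHours = 48,
  limit = 5,
): WatchPlan {
  const since = new Date(now.getTime() - windowHours * HOUR_MS).toISOString();
  return {
    alert: { subscriber: jid, query: subject, collections: profile.collections },
    summary: {
      query: subject,
      collections: profile.collections,
      location: profile.location,
      since,
      limit,
    },
  };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the time window easy to test.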