MarkLogic V6, twitter, and json

So part of what we do in pre-sales is to see what content we can store in MarkLogic and then search over. A big topic these days is Social Media Analytics. This can be used by companies to see the current trends when their products are mentioned. Marketing people can discover links they didn’t expect, such as  the link between Gatorade and Flu. A lot of people seem to drink this when they have Flu, so they may want a marketing campaign for that use. Clever stuff.

I decided I would use MarkLogic 6’s new native support for managing JSON to store a bunch of tweets. This could be the result of a search, or just my own user’s statuses page content. This is not as straighforward as it sounds. This is partly because of OAuth that authenticates requests to twitter and is mandatory on V1.1 (which gives me nice detailed JSON tweet information) and partly because of how JSON strings are parsed by the new json:transform-from-json command. This command gives you an xml representation of JSON, and can be passed options to parse JSON in a variety of ways.

OAuth

My colleague, Norman Walsh, kindly pointed me in the direction of his OAuth xqy module on GitHub. I grabbed a copy and then started playing. I used his instructions but missed a few things due to brevity. This page should help others in a similar position. He uses this to grab his statuses on twitter so I figured it would be a good fit. I had a few issues with it whilst trying to hit the 1.1 twitter API’s json endpoints. I managed to fix this using the following steps (I think – contact me if it doesn’t work for you).

  1. Use git to grab a copy of Norman’s oauth.xqy module
  2. edit the oa:sign method so it reads this this (This uses the built in hmac-sha1 algorithm in MarkLogic 6):-
    declare function oa:sign($key as xs:string, $data as xs:string) as xs:string {
      xdmp:hmac-sha1($key,$data,"base64") 
    };
  3. Now create yourself an xquery module that has access to the oauth library. I use roxy, hence the URL below. Yours will likely be different.
    let $service-document := 
          <oa:service-provider realm="http://twitter.com">
           <oa:request-token>
             <oa:uri>http://twitter.com/oauth/request_token</oa:uri>
             <oa:method>GET</oa:method>
           </oa:request-token>
           <oa:user-authorization>
             <oa:uri>http://twitter.com/oauth/authorize</oa:uri>
           </oa:user-authorization>
           <oa:user-authentication>
             <oa:uri>http://twitter.com/oauth/authenticate</oa:uri>
             <oa:additional-params>force_login=true</oa:additional-params>
           </oa:user-authentication>
           <oa:access-token>
             <oa:uri>http://twitter.com/oauth/access_token</oa:uri>
             <oa:method>POST</oa:method>
           </oa:access-token>
           <oa:signature-methods>
             <oa:method>HMAC-SHA1</oa:method>
           </oa:signature-methods>
           <oa:oauth-version>1.0</oa:oauth-version>
           <oa:authentication>
             <oa:consumer-key>YOUR APP'S CONSUMER KEY</oa:consumer-key>
             <oa:consumer-key-secret>YOUR APP'S CONSUMER SECRET</oa:consumer-key-secret>
           </oa:authentication>
          </oa:service-provider>
      let $access-token := "YOUR USER'S APP ACCESS TOKEN"
      let $access-token-secret := "YOUR USER'S APP ACCESS SECRET"
      let $options
        := <oa:options>
           <screen_name>YOURSCREENNAME</screen_name>
           <count>25</count>
           <page>1</page>
         </oa:options>
      let $oaresult := oa:signed-request($service-document,
                        "GET", "https://api.twitter.com/1/statuses/home_timeline.json",
                        $options, $access-token, $access-token-secret)

    This returns a string (NOT an xml oa:response node, weirdly, must ask Norman about that…) Here’s the output:

    <oa:response xmlns:oa="http://marklogic.com/ns/oauth">[{"created_at":"Sat Sep 29 23:44:08 +0000 2012","id":252192064891146240,"id_str":"252192064891146240","text":"dozens"...

JSON in MarkLogic 6

json:transform-from-xml needs a top level json element, not an array, so you now need to add these lines too:-

  let $rawjsona := fn:substring-after($oaresult,"[")
  let $rawjson := fn:substring($rawjsona,1,fn:string-length($rawjsona) - 1)
  let $wrappedjson := fn:concat("{""response"": [",$rawjson, "]}")
  let $xml := json:transform-from-json($wrappedjson)

You now have an XML representation of your JSON that you can insert in to MarkLogic 6! Cool huh? Here’s what it generates:-

<json xmlns="http://marklogic.com/xdmp/json/basic" type="object">
 <response type="array">
  <json type="object">
   <created__at type="string">Sat Sep 29 19:32:52 +0000 2012</created__at>
   <id type="number">252128830393831425</id><id__str type="string">252128830393831425</id__str><text type="string">RT @FrazerLloyd: @LadyxFarrah Scholes is still one of the best in the world. 
...
  </json>
 </response>
</json>

I believe there are also ways to directly stream in JSON using a HTTP POST/PUT to an AJAX Server, a new type of server for MarkLogic 6, but I’ve not had a play with that yet.

I’ve tested the above URL on V1 of the twitter API (as above) and V 1.1. To switch to V 1.1 simply change the URL to https://api.twitter.com/1.1/statuses/home_timeline.json

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.