XQuery gotchas part 1

Making this part one, because I can imagine the need for many more parts in future!

I like XQuery. Took very little time to get used to it. Learning curve comparable to when I learnt Ruby a couple of years ago. As with any language though there are things in there that hate us mere mortals. Here are a few that have gotten me in the last 6 weeks.

Methods not being called

This one is particularly infuriating. Took me 90 minutes to find this one. If you have a method declared in another module like this:-

declare function m:do-something($userid as xs:string, $xml as element(f:some-element)) { xdmp:log("I should totally do something here") };

If you call it incorrectly like this, you will get an error (too few arguments):-

let $output := m:do-something("myuser")

If you call it like this, it will work and log a message (on MarkLogic):-

let $output := m:do-something("myuser", <f:some-element>some content</f:some-element>)

If, however, you fetch the some-element from a FLWOR expression or similar, and it happens not to exist, what do you think should happen? :-

let $output := m:do-something("myuser", (/f:some-element)[1]) (: Note: no documents with some-element as a root in the database :)

Coming from other programming languages you may expect one of two things: either a runtime error because some-element is required in the method declaration, or a runtime error when the value is used inside the method. In reality, neither of these things happens. Because the second parameter is required, and the value you pass in is the empty sequence (), the method IS NOT CALLED! No error will occur, and the log message will not be written. As far as I can tell this is MarkLogic's function mapping at work: a sequence passed to a single-item parameter causes the function to be called once per item, and an empty sequence has no items.

To me this is utterly, utterly weird. If you need the method to be invoked even if the value is the empty sequence () then you need this declaration:-

declare function m:do-something($userid as xs:string, $xml as element(f:some-element)?) { xdmp:log("I should totally do something here") };

Note the question mark after the parameter's type. This means 'zero or one f:some-element'. Needless to say finding this thing was an absolute pig.
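Here is a minimal sketch of one way to make the problem visible, assuming the silent skip really does come from MarkLogic's function mapping (which is on by default in 1.0-ml modules). The local: function and the f namespace URI are made up for illustration. With mapping switched off, the call should fail with a type error rather than quietly doing nothing:-

```xquery
xquery version "1.0-ml";

(: Assumption: function mapping is what skips the call. Turn it off for this module. :)
declare option xdmp:mapping "false";

(: Hypothetical namespace URI, for illustration only. :)
declare namespace f = "http://example.com/ns/f";

(: Same shape as m:do-something above, with a required (non-optional) second parameter. :)
declare function local:do-something($userid as xs:string, $xml as element(f:some-element)) {
  xdmp:log("I should totally do something here")
};

(: If no document has f:some-element as its root, the second argument is ().
   With mapping off, that should no longer be silently ignored. :)
local:do-something("myuser", (/f:some-element)[1])
```

Whether you prefer the option or the question mark depends on what you want: the option turns a missing value into a loud failure, the question mark lets the method run and decide for itself.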

Returning root elements

Executing this will give you all reports in the previously declared i namespace:-

/i:report

What about this one? :-

/i:report[1]

You may expect the first report, in document order. Much as this one returns the first title, in document order:-

/i:report/i:title[1]

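Except /i:report[1] returns ALL reports. This is because the [1] applies to the i:report step, not to the result of the whole path. It means 'the first i:report child of each document node', and every report document has exactly one of those: its root element. To select the first report in document order you instead need:-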

(/i:report)[1]
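The difference can be seen without a database. This is a small sketch using two in-memory documents; the i namespace URI and the $docs variable are made up for illustration, and $docs stands in for 'all documents' the way a leading / does on MarkLogic:-

```xquery
xquery version "1.0";

(: Hypothetical namespace URI, for illustration only. :)
declare namespace i = "http://example.com/ns/i";

let $docs := (
  document { <i:report><i:title>First</i:title></i:report> },
  document { <i:report><i:title>Second</i:title></i:report> }
)
return (
  (: [1] is applied per document node, so each document contributes its report: 2 :)
  count($docs/i:report[1]),
  (: the parentheses build one sequence first, then [1] picks a single item: 1 :)
  count(($docs/i:report)[1])
)
```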

This is one of the first examples in the XQuery book, but sometimes can still catch you out. E.g. in something like this:-

fn:distinct-values(/i:message[./i:collector-ref/i:type/text() = "facebook"]/i:sender/i:identity-ref/i:service-id/text())

This would return no results whereas this:-

fn:distinct-values((/i:message)[./i:collector-ref/i:type/text() = "facebook"]/i:sender/i:identity-ref/i:service-id/text())

Would return all unique senders of facebook messages that you have stored.

So remember, if you’re selecting a root node and using a predicate on it, always surround the root node XPath with () first.

Namespacing

The ye olde namespacing issue. This has driven many a new MarkLogic employee to distraction from what I've been told. The problem is not so much the language of XQuery, but more how a human's brain understands XML. Let's say you have this:

<message>
 <sender>
  <name>Adam Fowler</name>
  <email>adam.fowler@marklogic.com</email>
 </sender>
 <body>Lorem ipsum dolor sit amet</body>
</message>

This is very similar to the basic examples on the web. You see XPath examples like the following too:-

/message/sender/name/text()

They all work lovely, and indeed would return 'Adam Fowler' from the above message (assuming this was the only message in the database). The problem comes when you get this:-

<message xmlns="http://mydomain.com/ns/message">
 <sender>
  <name>Adam Fowler</name>
  <email>adam.fowler@marklogic.com</email>
 </sender>
 <body>Lorem ipsum dolor sit amet</body>
</message>

Now your XPath would not return anything. Even worse, if your XQuery module had a default namespace like this:-

declare default element namespace "http://www.w3.org/1999/xhtml";

Then even the first example would fail. This is because /message does not mean ‘The message node in any namespace’ but instead means ‘The message node in the default namespace’. If your default is blank, or different from the target document, then you will never get the result you want.

The best way to avoid these issues I've found is to a) always declare a default namespace in XHTML-producing modules, and b) always use namespace prefixes everywhere in XPath (even if the namespace is blank).

So for the first example you could do this:-

declare namespace n="";
/n:message/n:sender/n:name/text()

Whereas in the second example you could do this:-

declare namespace n="http://mydomain.com/ns/message";
/n:message/n:sender/n:name/text()

Both would work no matter what the default namespace of the module is. Much safer.
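To see the prefixed path surviving a module default namespace, here is a small self-contained sketch. It builds the namespaced message in memory rather than reading it from the database, and the $doc variable is just for illustration:-

```xquery
xquery version "1.0";

(: The module default that trips up unprefixed paths. :)
declare default element namespace "http://www.w3.org/1999/xhtml";
declare namespace n = "http://mydomain.com/ns/message";

let $doc := document {
  <message xmlns="http://mydomain.com/ns/message">
    <sender>
      <name>Adam Fowler</name>
    </sender>
  </message>
}
return (
  (: unprefixed names resolve to the XHTML namespace, so nothing matches: 0 :)
  count($doc/message/sender/name),
  (: prefixed names match the document's own namespace: Adam Fowler :)
  $doc/n:message/n:sender/n:name/text()
)
```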

Also, you should avoid the temptation to do this:-

/*:message/*:sender/*:name/text()

This means 'the message node in any namespace' which will work, but which will be computationally expensive. Also, if you embed part of a document from another namespace within your message document, and it also happens to have a message node, then the results of the query would be unpredictable. Safer to avoid doing this.
