Many of us know what we need to know to get by in most programming languages. Even to do some advanced things this is usually enough. This post points out a few interesting things I’ve found in the W3C XQuery 1.0 spec that you may not have seen before, and gives useful examples for each.
And yes, I did speed read the whole document. Clearly I have no life! Still, saves you from having to do it! … You’re very much welcome.
Yeah, still trying to get my head around these… As per my previous post. I’ll post more once I figure out what on Earth is going on.
Static and Dynamic analysis phases
Like many programming language compilers, XQuery processing goes through a static analysis phase followed by a dynamic evaluation phase. During the static phase the query is parsed and analysed, not evaluated, and each expression may be assigned a ‘static type’. This term is used throughout the XQuery spec, so is worth highlighting here. No input data is needed in this phase: only the query itself is required, along with any schema imports and imported function modules (so that their declared types can be used).
Once this is done the dynamic evaluation phase begins, which is where the expressions are evaluated against the input values. A dynamic type is associated with each value as it is computed. This will either be the same as, or more specific than, the static type. E.g. a function may be declared as returning xs:integer*, but if it returns just one integer, the dynamic type of its returned value would be xs:integer: exactly one integer.
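Here is a minimal sketch of that difference. The function local:evens is made up for illustration; it is declared as returning xs:integer*, but for this particular call it happens to return a single integer, which instance of lets you check at run time.

```xquery
xquery version "1.0";

(: Hypothetical function: declared to return zero or more integers. :)
declare function local:evens($max as xs:integer) as xs:integer* {
  for $i in 1 to $max
  where $i mod 2 = 0
  return $i
};

(: For $max = 3 the only even number is 2, so the value is one integer. :)
let $result := local:evens(3)
return (
  $result instance of xs:integer*,  (: true: matches the declared type :)
  $result instance of xs:integer    (: true: exactly one integer at run time :)
)
```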
Quantified expressions (some and every) are a handy way to avoid the need for loops when analysing a set of values or elements. Consider this example:
xquery version "1.0-ml";
let $expr1 := (34,45,56,47)
return some $x in $expr1 satisfies $x = 47
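The every keyword is the counterpart of some: it is true only when the condition holds for all items. A small sketch with made-up values, using plain XQuery 1.0 rather than the MarkLogic dialect:

```xquery
xquery version "1.0";

let $prices := (34, 45, 56, 47)
return (
  some $p in $prices satisfies $p gt 50,   (: true: 56 is greater than 50 :)
  every $p in $prices satisfies $p gt 30,  (: true: all four values are greater than 30 :)
  every $p in $prices satisfies $p gt 40   (: false: 34 is not :)
)
```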
Predicates and expression rewrites (Just before 2.4 Concepts)
From the spec: “The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.”
$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]
Or in other words, the above could be evaluated by an implementation as this:-
$N[xs:date(@x) gt xs:date("2000-01-01")][@x castable as xs:date]
Which, if @x is not castable to an xs:date, would raise an error. Better to be safe and use the following:-
$N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01") else false()]
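To see the conditional form in action, here is a small sketch. The three sample elements bound to $N are invented for illustration; only the first has an @x that is both a valid date and later than 2000-01-01, so only that one should be selected, and the bad value never reaches xs:date().

```xquery
xquery version "1.0";

(: Made-up sample data: one good date, one non-date, one date that is too early. :)
let $N := (
  <e x="2001-05-01"/>,
  <e x="not a date"/>,
  <e x="1999-12-31"/>
)
return
  $N[if (@x castable as xs:date)
     then xs:date(@x) gt xs:date("2000-01-01")
     else fn:false()]
```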
As in many languages, certain things other than booleans can be evaluated as a boolean; XQuery calls this the effective boolean value. It is used most commonly to test whether a variable has a value associated with it, or is the empty sequence. Quite often in MarkLogic you will fetch a request field and test if it has a value before deciding what to do:-
let $q := xdmp:get-request-field("q")
let $results := if ($q) then local:do-query($q) else local:get-latest()
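This also works for testing nodes: a sequence whose first item is a node is true. See the effective boolean value rules in the spec for the full list. Be aware that a number with value 0 or NaN returns false, even though 0 is a valid number, and so does a zero-length string.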
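A quick way to get a feel for these rules is to call fn:boolean directly, which returns the effective boolean value of its argument. The values below are just illustrations:

```xquery
xquery version "1.0";

(
  fn:boolean(()),                (: false: empty sequence :)
  fn:boolean(""),                (: false: zero-length string :)
  fn:boolean("false"),           (: true: any non-empty string, whatever it says :)
  fn:boolean(0),                 (: false: numeric zero :)
  fn:boolean(xs:double("NaN")),  (: false: NaN :)
  fn:boolean(<a/>),              (: true: the first item is a node :)
  fn:boolean(<a>false</a>)       (: true: still a node, its content is not inspected :)
)
```

Note that a sequence of two or more atomic values, such as (1, 2), has no effective boolean value and raises an error instead.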
I bet you’ve all written expressions using node(), text() or item(), right? Consider the following:
$var/contact[./address/*]
Which is used to fetch all contacts whose address has at least one child element. What you need to be careful of is using this instead:
$var/contact[./address/node()]
This is because node() is not the same as *. The * test matches only elements, whereas node() matches any kind of node, including comments and text. So for the following, the predicate with /node() would be true for the contact, when you would expect it to be false:-
let $var :=
  <contacts>
    <contact>
      <address>
        <!-- no address -->
      </address>
    </contact>
  </contacts>
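If you want to check the difference between the two node tests for yourself, a sketch like the following should do it. The variable name $contact and the sample data are my own; the address holds nothing but a comment.

```xquery
xquery version "1.0";

let $contact :=
  <contact>
    <address>
      <!-- no address -->
    </address>
  </contact>
return (
  fn:exists($contact/address/*),         (: false: * matches elements only, and there are none :)
  fn:exists($contact/address/node()),    (: true: the comment is a node :)
  fn:exists($contact/address/comment())  (: true: the same comment, selected explicitly :)
)
```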
// is not a valid path expression (3.2 Path Expressions Paragraph 4)
Remember if you’re naughty/lazy/don’t know the exact position of a named element you could match like this:-
let $element := fn:doc("somedoc.xml")//addressLine1
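This works fine, because // here is shorthand for /descendant-or-self::node()/ and is followed by a step. It also works as a condition, since a non-empty sequence of nodes has an effective boolean value of true:-
if (fn:doc("somedoc.xml")//addressLine1) then "found address" else "no address"
What the spec rules out is // with nothing after it. A lone / is valid and selects the root of the tree containing the context node, but a lone // is a syntax error. Where you do know part of the path, spelling it out narrows the search:-
if (fn:doc("somedoc.xml")/contact//addressLine1) then "found address" else "no address"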
You’ve probably used axes through their common abbreviations, such as / for child:: and @ for attribute::. There are a couple of extras worth mentioning though. Say you wanted to return a snippet of information, for example the elements immediately before and after your matched element. How would you do that? You can use the following-sibling:: and preceding-sibling:: axes to get them. Also of note are the parent:: and ancestor:: axes, which are roughly the opposites of / and // respectively.
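Here is a rough sketch of that snippet idea, with a made-up chapter of four paragraphs. One assumption to flag: in XQuery 1.0 the sibling and ancestor axes belong to the optional Full Axis Feature, so this only runs on a processor that supports it.

```xquery
xquery version "1.0";

let $doc :=
  <chapter>
    <para>one</para>
    <para>two</para>
    <para>three</para>
    <para>four</para>
  </chapter>
let $match := $doc/para[. = "three"]
return (
  $match/preceding-sibling::para[1],  (: nearest earlier sibling: the "two" paragraph :)
  $match,                             (: the matched paragraph itself :)
  $match/following-sibling::para[1],  (: nearest later sibling: the "four" paragraph :)
  fn:name($match/parent::*)           (: name of the parent element: "chapter" :)
)
```

The [1] works in both directions because positions on a reverse axis such as preceding-sibling:: are counted backwards from the context node.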
This really follows on from axes, above: the spec lists all of the abbreviations, which I bet you thought weren’t abbreviations but actual syntax, right? E.g. .. for parent::node(). Of particular interest is the last item on this list:
let $doc := <list><a/><b/><c/></list>
let $nodes1 := ($doc/b, $doc/a)
let $nodes2 := ($doc/c, $doc/b)
let $all := ($nodes1, $nodes2) (: $all contains b, a, c, b :)
let $unique := $all/. (: $unique contains a, b, c :)
For a sequence of nodes E, E/. removes duplicates based on node identity. Results are returned in document order. Note that this only works on nodes: applying / to a sequence of atomic values such as (1,2,3,4) raises a type error, so use fn:distinct-values for those.
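One caveat worth a sketch: the left hand side of / has to be a sequence of nodes, so the E/. trick is not available for plain numbers or strings. For atomic values, fn:distinct-values does the de-duplication by value instead. The sample values are made up.

```xquery
xquery version "1.0";

let $values1 := (1, 2, 3, 4)
let $values2 := (3, 4, 5, 6)
let $all := ($values1, $values2)
(: Removes duplicates by value; the order of the result is implementation dependent. :)
return fn:distinct-values($all)
```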
You have to be careful in XQuery with the = and != operators. This is due to XQuery’s support for sequences. Consider the following. Rather than them all returning false, only the third returns false:-
(1, 2) = (2, 3)
(2, 3) = (3, 4)
(1, 2) = (3, 4)
This is because the = operator returns true if any item on the left hand side matches the value of any item on the right hand side. Not all items must match for it to return true.
Consider this example too. Normally you would think = returns the opposite of !=. I.e. A != B is equal to fn:not(A = B). This is not the case:-
(1, 2) = (2, 3)
(1, 2) != (2, 3)
Both return true. This is because != returns true if any item on the left hand side is not equal to any item on the right hand side. Here 1 != 2 is enough, and by the same rule even (1, 2) != (1, 2) returns true.
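The following sketch puts these comparisons side by side, along with one way to ask the stricter question "does every item on the left have a match on the right?". The sequences are just sample values.

```xquery
xquery version "1.0";

let $a := (1, 2)
let $b := (2, 3)
return (
  $a = $b,          (: true: 2 = 2 :)
  $a != $b,         (: true: 1 != 2 :)
  fn:not($a = $b),  (: false: so != is not the negation of = :)
  $a != $a,         (: true: 1 != 2, even though both sides are the same sequence :)
  every $x in $a satisfies $x = $b  (: false: 1 has no match in $b :)
)
```

When you are comparing single values, the value comparison operators eq and ne avoid these surprises, since they raise an error rather than quietly matching across a sequence of more than one item.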
You may have come across computed constructors before but not realised how useful they are. For example, the following two definitions are equivalent:-
let $first := <book><para>text</para></book>
let $second := element book { element para { "text" } }
But the second one is more text. So why would you ever use it? Well, the advantage is that using a computed constructor you can have a dynamic element name. Consider you had the following XML:-
let $source := <results><result key="FirstName" value="Adam"/><result key="LastName" value="Fowler"/></results>
And wanted to change it to this:-
<results><FirstName>Adam</FirstName><LastName>Fowler</LastName></results>
You could use the following XQuery to do that:-
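return element results {
  for $result in $source/result
  (: fn:string() turns the attribute into text content; passing the attribute node itself would copy it onto the new element as value="..." :)
  return element { $result/@key } { fn:string($result/@value) }
}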
How cool is that???
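The same trick works for attributes, using a computed attribute constructor. As a sketch, here is the reverse transformation, turning the named elements back into key/value pairs. The element and attribute names are taken from the example above; everything else is my own illustration.

```xquery
xquery version "1.0";

let $source :=
  <results><FirstName>Adam</FirstName><LastName>Fowler</LastName></results>
return
  element results {
    for $e in $source/*
    return element result {
      attribute key { fn:local-name($e) },  (: element name becomes the key :)
      attribute value { fn:string($e) }     (: element text becomes the value :)
    }
  }
```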
Order by options (second to last example before 3.8.4)
Unbeknownst to most of us, order by has a few extra options – stable, collation and empty least. Consider this example:-
for $b in $books/book
stable order by $b/title collation "http://www.example.org/collations/fr-ca",
                $b/price descending empty least
return $b
Firstly, the stable keyword specifies that if two books with the same order by values are encountered, then they are returned in the same order as the source document. This helps the output be predictable with reference to the input, e.g. over multiple calls with the same input.
Secondly, the collation option is useful for when you want to specifically control how string ordering occurs. Here we see we’re using the French Canadian collation. Equally you could have a collation that did an order by ignoring capitalisation or white space.
Finally, empty least specifies that a missing price is treated as lower than any other price. Because this sort key is descending, results without a price are returned at the end of the output rather than the start. Useful if you want to show partial data results, but only after results with full information available.
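Where the empty values land depends on both the empty least/greatest choice and the sort direction, which is easy to get backwards. A small sketch with made-up books (book A has no price) shows the three combinations you are most likely to meet:

```xquery
xquery version "1.0";

(: Made-up sample data: book A has no price. :)
let $books :=
  <books>
    <book><title>B</title><price>10</price></book>
    <book><title>A</title></book>
    <book><title>C</title><price>20</price></book>
  </books>
return (
  (: descending, empty least: C, B, then unpriced A last :)
  (for $b in $books/book
   order by xs:decimal($b/price) descending empty least
   return fn:string($b/title)),

  (: ascending, empty least: unpriced A first, then B, C :)
  (for $b in $books/book
   order by xs:decimal($b/price) ascending empty least
   return fn:string($b/title)),

  (: ascending, empty greatest: B, C, then unpriced A last :)
  (for $b in $books/book
   order by xs:decimal($b/price) ascending empty greatest
   return fn:string($b/title))
)
```

The xs:decimal() call is there so that prices compare as numbers; without a schema, an untyped price would be ordered as a string.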
If you have a query where you truly don’t care about the order of the results, then wrapping it in unordered { } may give a performance increase, because the processor is free to return the items in whatever order is cheapest. This is common if you have nested queries, and always do ordering in the final top level query.
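A typical case is an inner query whose result is only counted or aggregated, so its order can never matter. A minimal sketch with made-up data follows; whether it actually runs faster is entirely up to the implementation.

```xquery
xquery version "1.0";

(: Made-up sample data. :)
let $books :=
  <books>
    <book><title>A</title><price>10</price></book>
    <book><title>B</title><price>20</price></book>
    <book><title>C</title><price>30</price></book>
  </books>
return
  fn:count(
    (: Only the number of matches is used, so their order is irrelevant. :)
    unordered {
      for $b in $books/book
      where xs:decimal($b/price) gt 15
      return $b
    }
  )
```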
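If you know that an element is really of a particular sub type, and want to pass it to a function that requires that sub type without changing its identity or value, then you can use treat instead of cast. Unlike cast, treat creates nothing new: it simply asserts the type, and raises a dynamic error if the value does not actually match. For example:-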
$myaddress treat as element(*, USAddress)
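The USAddress example needs an imported schema, so here is a schema-free sketch of the same idea using atomic values. The variable $value is deliberately declared with the very general type item(), so that treat as has something to narrow.

```xquery
xquery version "1.0";

(: Declared as item(), although the value is actually an xs:integer. :)
let $value as item() := 42
return (
  (: treat passes the existing value through unchanged, after checking its type :)
  ($value treat as xs:integer) + 1,
  (: cast builds a new xs:integer value from the string :)
  ("42" cast as xs:integer) + 1
)
```

By contrast, "42" treat as xs:integer would raise a dynamic error, because the value is a string and treat never converts anything.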
In Summary
I hope you’ve enjoyed all these weird and wonderful parts of the XQuery 1.0 Spec. I’ll have to put something similar together for the XQuery 3.0 spec which is built in to MarkLogic 6.0! (Now there’s a teaser…)
Nice article. The following statement is incorrect: `As you see from the above list, * would also match comments, so the following with /* would return true, when you would expect it to return false`. I believe that you meant to use `node()` where you have `*`.