Construct Terms
Variables
In construct terms, variables serve as place holders for the subterms they are bound to. Construct terms may contain label variables, namespace variables, and subterm variables without restriction. Allowing variable restrictions in construct terms is not desirable, as a construct term is merely a specification of how the variables should be reassembled and is not intended to constrain the set of possible variable bindings. Obviously, however, label variables may take only values that are admissible as term labels and namespace variables may only be bound to URIs or IRIs.
In the bookstore example from Section \ref{sec:examples:bookstore}, assume that there is the following set of answer substitutions for the variables Title and Author:
| σ1 | Title | title { "Vikinga Blot" } | |
| Author | author { last { "Ingelman-Sundberg" }, first { "Catharina" }} | ||
| σ2 | Title | title { "Boken Om Vikingarna" } | |
| Author | author { last { "Ingelman-Sundberg" }, first { "Catharina" }} | ||
| σ3 | Title | title { "Folket i Birka på Vikingarnas Tid" } | |
| Author | author { last { "Wahl" }, first { "Mats" }} | ||
| σ4 | Title | title { "Folket i Birka på Vikingarnas Tid" } | |
| Author | author { last { "Nordqvist" }, first { "Sven" }} | ||
| σ5 | Title | title { "Folket i Birka på Vikingarnas Tid" } | |
| Author | author { last { "Ambrosiani" }, first { "Björn" }} |
The following construct term collects a single title/author pair for these substitutions (one for each substitution):
results {
result { var Title, var Author }
}
The result of applying the substitutions above to this construct term are the following five data terms:
results {
result {
title { "Vikinga Blot" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}
}
results {
result {
title { "Boken Om Vikingarna" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}
}
results {
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Wahl" }, first { "Mats" } }
}
}
results {
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Nordqvist" }, first { "Sven" } }
}
}
results {
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Ambrosiani" }, first { "Björn" } }
}
}
Grouping and Sorting: all and some
It is often desirable to collect all bindings for a variable in a single answer term. The grouping constructs all and some serve this purpose:
- all groups all possible instances of the enclosed subterms resulting from different variable bindings as children of the enclosing term. At least one instance has to exist, and the number of instances always needs to be finite (otherwise the program does not terminate).
- some groups non-deterministically some of the possible instances of the enclosed subterms resulting from variable bindings as children of the enclosing term. Some is quantified by a number which restricts the (maximum) number of alternatives to use. At least one instance has to exist.
The requirement that there has to exist at least one instance in both grouping constructs may seem unintuitive. However, a construct term can only be evaluated if the rule it is part of ``fires'', i.e.\ the query part succeeds and thus yields at least one substitution for the variables occurring in the query. If this behaviour is not desired, the grouping constructs can be combined with optional (see below).
Consider again the substitutions of Example 1. The following construct term creates a list of result subterms (one for each title/author combination from the substitutions) below a results term using the all-construct to collect all instances:
results {
all result { var TITLE, var AUTHOR }
}
The result of applying the substitutions to this construct term might be the
following data term (compare with the set of data terms from Example
1):
results {
result {
title { "Vikinga Blot" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
},
result {
title { "Boken Om Vikingarna" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
},
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Wahl" }, first { "Mats" } }
},
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Nordqvist" }, first { "Sven" } }
},
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Ambrosiani" }, first { "Björn" } }
}
}
Formally, all t or some n t denote the grouping of all or some instances of t obtained from all possible bindings of the variables that are free in the term t. Subterms of t that again have the form all t' or some n' t' are recursively evaluated in the same manner (see below). A variable is free in a (sub)term t, if it (1) occurs in t, and (2) is not in the scope of another, nested grouping construct. E.g. in the term
results {
all result { var TITLE, var AUTHOR }
}
both variables TITLE and AUTHOR are not free, since they are in the scope of an all construct. In the term
results {
result { all var TITLE, var AUTHOR }
}
the variable AUTHOR is free, whereas the variable TITLE is not free. A variable is said to be free for a grouping construct, if it is free in the term enclosed by the grouping construct. E.g. in the term all t, all variables that are free in t are free for the outermost all. All free variables in a construct term need to have the same binding in each of the substitutions that are used for grouping.
Consider a slightly modified variant of the previous construct term. Note that only the variable AUTHOR is in the scope of the all construct, while the variable TITLE is free.
result { var TITLE, all var AUTHOR }
The result of applying the set of answer substitutions of Example 1 to this construct term is the following set of data terms:
result {
title { "Vikinga Blot" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}
result {
title { "Boken Om Vikingarna" },
author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}
result {
title { "Folket i Birka på Vikingarnas Tid" },
author { last { "Wahl" }, first { "Mats" },
author { last { "Nordqvist" }, first { "Sven" },
author { last { "Ambrosiani" }, first { "Björn" }
}
Note that each of the three resulting data terms uses only one binding for the variable TITLE of the construct term, but groups possibly several bindings of the variable AUTHOR. In each instance (i.e. data term), the grouping construct groups together substitutions that have the same binding for TITLE. As there exists only one substitution for each of the titles ``Vikinga Blot'' and ``Boken Om Vikingarna'', the grouping construct only groups a single substitution in the first two data terms. In the third data term, three substitutions are grouped (each having the same binding for TITLE, but a different binding for AUTHOR).
The grouping constructs all and some are similar to the so-called collection constructs {.} and [.] in XMAS \cite{xmas-intro} and to the grouping construct {.} in XML-RL \cite{xml-rl}.
Nesting of Grouping Constructs
Grouping constructs may be nested to perform more complex restructuring tasks. Recall that a term of the form all t collects all instances of t with different bindings for the free variables in t. If t contains nested grouping constructs, each instance of t is further grouped according to the nested grouping constructs. For example, the construct term
results {
all result {
all var TITLE,
var AUTHOR
}
}
creates for each binding of the variable AUTHOR (i.e.\ the variable that is free for the outer all) an instance of the subterm result. In each instance, the inner all collects all instances of the variable TITLE (that are part of substitutions with the same binding for AUTHOR). Thus, the construct term creates a list of book titles for each author, and groups the result subterms below a results term. Likewise, the construct term
results {
all result {
var TITLE,
all var AUTHOR
}
}
lists for each book title all authors. Intuitively, nested grouping constructs are similar to nested iteration constructs in imperative languages (like for or while loops), where the inner loop performs a complete run for each iteration of the outer loop. Note, however, that nested grouping constructs do not compute the ``cross-product'', but instead have to respect the different answer substitutions: in the example above, every result elements contains a book title, and only the authors of that book, whereas the cross-product would list for each result also the authors of other books. If it is desirable to compute the cross-product, it is necessary to appropriately modify the query/query term such that it selects titles and authors independently.
Explicit Grouping: group by
In many cases, it is desirable to group by variables whose values should not appear in the result, wherefore they are not part of the subterm that is enclosed by a grouping construct. For example, a construct term might group resulting instances based on the position of a row in an HTML table while not including this position (i.e. the integer number) in the result. While this result could be achieved by using several rules (one for creating the result and one for filtering out superfluous parts), this solution is very cumbersome. For this reason, the grouping constructs all and some may be accompanied by a group by clause containing the (additional) variables by which the instances are grouped. Such clauses have the form
all <subterm> group by { <variables> }
or
some <n> <subterm> group by { <variables> }
where <n> is the maximum number of instances for some, <subterm is the subterm of which instances are created, and <variables is a comma-separated list of variables. All these variables are considered to be part of the free variables of the subterm enclosed by the grouping construct and thus used for grouping, regardless of whether they appear in <subterm or not.
Consider an HTML table, the cells containing arbitrary values. The following query term retrieves all cell values, together with row and column number:
desc table {{
position var Row tr {{
position var Col td { var Value }
}}
}}
Now assume that the table should be "transposed", i.e. rows and columns are exchanged. The following construct term creates such a transposed table. Since the positions are necessary for grouping but should not be included in the resulting data term, it uses group by for this purpose:
table [
all tr [
all td [ var Value ] group by { var Row }
] group by { var Col }
]
The construct term is evaluated as follows: For each different binding of Col (Col is the only free variable in the scope of the outer all), an instance of tr [ ... ] is created. Within each instance, the inner all creates an instance of td [ ... ] for each different binding of Row (within the set of substitutions having the same binding for Col).
Sorting: order by
The grouping constructs all and some create sequences of subterms in arbitrary order (although they should try to return results in the same order in which the corresponding subterms appear in the original sources, if possible). In order to sort the resulting sequence according to the bindings for certain variables, the grouping constructs all and some may be augmented by a sorting specification. Sorting specifications are very similar to explicit grouping and have the form
all <subterm> order by (<comparison>) [ <variables> ]or
some <n> <subterm> order by (<comparison>)[ <variables> ]
where <n> is the maximum number of instances for some, <subterm> is the subterm of which instances are created, and <variables> is a comma-separated list of variables. <comparison> is the name of the comparison function to be used in sorting. Comparison functions take as arguments two lists of terms (representing two different substitutions for the variables in <variables>) and return a value indicating whether the first list is less than, equal to, or greater than the second list. The current prototype runtime system supports the two exemplary comparison functions lexical and numeric (both in ascending order); further comparison functions may be programmed natively in the implementation language of the prototype (i.e. Haskell).
The list of variables influences the grouping in two ways: (1) instances are grouped as if the variables occurred in a group by clause (i.e. are considered part of the variables free for the grouping construct) and (2) the instances are sorted on the bindings of the variables in the list using the specified comparison function. In the two exemplary functions, sorting is performed primarily with respect to the first variable in the list and more specific for each of the following variables. For instance, a variable list [var Last,var First] would specify to sort primarily by the last names, and within instances with the same last name sort by the first name.
results {
all result { all var Author, var Title } order by (lexical) [ var Title ]
}
Consider the following query term (evaluated against the XML document representing the data of bookstore A in the sample data:
bib {{
book {{
var Title -> title {{ }}
var Author -> author {{ var First -> first {{ }}, var Last -> last {{ }} }}
}}
}}
The following construct term creates a list of authors for each book title. Authors are sorted by last name and then by first name. Note that grouping is performed on the variable Author, as well as the variables Last and First.
results {
all result {
all var Author order by (lexical) [var Last, var First],
var Title
}
}
Functions and Aggregations
In addition to arranging data in a new structure, it is often desirable to perform some sort of computation to create new content. For example, a bookstore might want to present books with the value added tax added to all prices, or calculate totals for the items contained in a customer's virtual shopping cart. For this reason, construct terms in Xcerpt may contain functions (i.e.\ computations with a fixed number of arguments) and aggregations (i.e.\ computations with a variable number of arguments). Both functions and aggregations take the form
<fname> ( <arguments> )
where <fname> is the function or aggregation name and <arguments> is a comma-separated list of arguments (variables or other non-grouping subterms). In the case of aggregations, <arguments> may also contain the grouping constructs all and some.
table [
th [ td [ "Title" ], td [ "Price Net" ], td [ "VAT" ], td [ "Total" ] ],
all tr [
td [ var Title ],
td [ var Price ],
td [ mult( var Price, 0.16 ) ],
td [ mult( var Price, 1.16 ) ]
],
tr [
td [ "Totals" ],
td [ sum( all var Price ) ],
td [ sum( all mult( var Price, 0.16) ) ],
td [ sum( all mult( var Price, 1.16) ) ],
]
]
Since a type system is not in the scope of this thesis, the current implementation assumes implicit type casting in functions and aggregations. For example, the function mult (for multiplication) implicitly assumes that all parameters are numbers. In order to provide a comprehensive set of functions and aggregations, a type system would however be beneficial.
The current implementation supports a number of exemplary functions and aggregations, which are summarised in Table \ref{table:functions}. For some frequently used functions, this table also gives an abbreviated, infix notation that may be used instead of the more verbose general form. Beyond these, a wide range of functions are conceivable. The document XQuery 1.0 and XPath 2.0 Functions and Operators \cite{xquery-functions} gives an overview over functions that are desirable in Web query languages.
| Name | Abbreviated | Default | Description |
| Functions | |||
| add(n,m) | n + m | - | adds the two numeric arguments n and m |
| sub(n,m) | n - m | - | subtracts the two numeric arguments n and m |
| mult(n,m) | n * m | - | multiplies the two numeric arguments n and m |
| div(n,m) | n / m | - | divides the two numeric arguments n and m |
| concat(n,m) | n ++ m | - | concatenates the two string arguments n and m |
| glb(n,m) | - | - | calculates the greatest lower bound of two terms n and m wrt. simulation order |
| lub(n,m) | - | - | calculates the least upper bound of two terms n and m wrt. simulation order |
| Aggregations | |||
| count(...) | - | 0 | count the number of arguments |
| sum(...) | - | 0 | compute the sum of all (numeric) arguments |
| avg(...) | - | NaN | compute the average of all (numeric) arguments |
| min(...) | - | +inf | compute the minimum of all (numeric) arguments |
| max(...) | - | -inf | compute the maximum of all (numeric) arguments |
| join(...) | - | "" | join all string arguments to a single string |
| first(...) | - | exception | return the first argument |
| reverse(...) | - | empty | return the arguments in reverse order |
Functions and Aggregations in Construct Terms: Exemplary functions and aggregations available in construct terms. All functions and aggregations perform an implicit type casting to the type given (e.g. "numeric" or "string"). The default value for aggregation functions is used in case the argument list is empty; exception indicates a runtime error and empty the empty list. For normal functions, default values are not applicable.
Optional Subterms: optional
Recall that the construct optional in query terms allows to express that certain subterms of a query term need only be matched if a corresponding subterm exists in the data term against which the query term is evaluated. In case the optional subterm contains variables, it might happen that some of the substitutions resulting from the evaluation of the query do not contain bindings for these variables (as the corresponding subterm did not participate in the matching). As a consequence, construct terms containing such variables need to make provisions for such cases. This is expressed by marking subterms containing variables that are possibly unbound as optional. Like in query terms, such subterms take the form
optional <subterm>but it is also possible to add a default value to be used if no instance can be created as in
optional <subterm> with default <default>
where <subterm> is the subterm containing the optional variables. In case at least one of the variables in <subterm> is not bound (i.e. no ground instance can be created), the first form simply omits the optional subterm, whereas the second form substitutes the subterm <default> for <subterm>. <default> may be any construct term.
students {{
student {{
name { var Name },
optional matrnr { var MatrNr }
}}
}}
Assume that a teacher wants to create an HTML table listing all student names
with inscription numbers, leaving columns empty for each substitution that does
not contain a binding for the variable MatrNr. The corresponding
construct term would look as follows:
table [
all tr [
td [ var Name ],
td [ optional var MatrNr ]
]
]
By using a default specification, it us also possible to insert the string
"unknown" instead of simply leaving the columns empty for those
substitutions that do not contain a value for MatrNr, as in the
following construct term:
table [
all tr [
td [ var Name ],
td [ optional var MatrNr with default "unknown" ]
]
]
The optional does not necessarily prefix the variable immediately, but
may instead enclose a whole subterm containing optional variables; the following
construct term does not generate a second column if there is no inscription
number available, instead of leaving the second column empty:
table [
all tr [
td [ var Name ],
optional td [ var MatrNr ]
]
]
Grouping Constructs and Optional Subterms
As mentioned above, the grouping constructs all and some require the existence of at least one instance of the enclosed subterms, because a query only succeeds if there exists at least one binding for its variables. With optional, this restriction can be lifted: query terms with optional subterms might match while not yielding any bindings for the variables occurring in the grouping construct. With optional in the construct term, it is thus possible to express possibly empty groupings:
students {{
student {{
name { var Name },
optional exercise {{ score { var Score } }}
}}
}}
Given the set of answer substitutions of this query term, the following
construct term computes the sum of all exercises for each student. If no
exercise has been submitted, the sum is 0:
scores {
all student {
name { var name },
total { sum( optional all var Score with default 0 ) }
}
}
In case no exercise has been submitted (i.e.\ there exists no binding for the
variable Score for a student), there exists no instance for all var Score
and the default value of 0 is used as the only argument to the aggregation function
sum(...). Otherwise, a sequence of scores is created and summed up using
sum(...). Since the default value of sum(...) happens to be 0 if the number of
arguments is zero (cf.\ Table \ref{table:functions}), the with default clause
may also be omitted in this case. Alternatively, the construct term could be written as
scores {
all student {
name { var name },
total { optional sum( all var Score ) with default 0 }
}
}
In this case, the aggregation function sum(...) is not evaluated if there
is no instance for all var Score; the value 0 is substituted without any
further computation.
Last modified 2005-07-14 02:56 PM