Skip to content

Xcerpt

Sections
Personal tools
You are here: Home » Documentation » Language » Construct Terms
Xcerpt Team
Researchers:
Sebastian Schaffert
François Bry
Sacha Berger
Tim Furche
Paula Patranjan
Michael Eckert

Students:
Mira Blazheva
Oliver Bolzer
Michael Brade
Raja Gigova
Clemens Ley
Inna Romanenko
Andreas Schroeder
Christoph Wieser
 

Construct Terms

Document Actions
Construct terms serve to reassemble variable bindings, which are determined by query terms, so as to form new data terms. Whereas query terms are patterns for the data (and thus may contain partial term specifications), construct terms are patterns for the result (and thus may only contain total term specifications). Construct terms may furthermore contain variables (but no -> restrictions), and so-called grouping constructs used for grouping different substitutions. Like in data terms, both constructs [ ] and { } may occur in construct terms for expressing ordered and unordered sequences of subterms. The constructs [[ ]] and {{ }} are not allowed, as they express partial term specifications which do not make sense when constructing data items.

Variables

In construct terms, variables serve as place holders for the subterms they are bound to. Construct terms may contain label variables, namespace variables, and subterm variables without restriction. Allowing variable restrictions in construct terms is not desirable, as a construct term is merely a specification of how the variables should be reassembled and is not intended to constrain the set of possible variable bindings. Obviously, however, label variables may take only values that are admissible as term labels and namespace variables may only be bound to URIs or IRIs.

In the bookstore example from Section \ref{sec:examples:bookstore}, assume that there is the following set of answer substitutions for the variables Title and Author:

σ1 Title title { "Vikinga Blot" }
Author author { last { "Ingelman-Sundberg" }, first { "Catharina" }}
σ2 Title title { "Boken Om Vikingarna" }
Author author { last { "Ingelman-Sundberg" }, first { "Catharina" }}
σ3 Title title { "Folket i Birka på Vikingarnas Tid" }
Author author { last { "Wahl" }, first { "Mats" }}
σ4 Title title { "Folket i Birka på Vikingarnas Tid" }
Author author { last { "Nordqvist" }, first { "Sven" }}
σ5 Title title { "Folket i Birka på Vikingarnas Tid" }
Author author { last { "Ambrosiani" }, first { "Björn" }}

The following construct term collects a single title/author pair for these substitutions (one for each substitution):

results {
  result { var Title, var Author }
}
  

The result of applying the substitutions above to this construct term are the following five data terms:

results {
  result {
    title { "Vikinga Blot" },
    author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
  }
}

results {
  result {
    title { "Boken Om Vikingarna" },
    author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
  }
}

results {
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Wahl" }, first { "Mats" } }
  }
}

results {
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Nordqvist" }, first { "Sven" } }
  }
}

results {
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Ambrosiani" }, first { "Björn" } }
  }
}
  

Grouping and Sorting: all and some

It is often desirable to collect all bindings for a variable in a single answer term. The grouping constructs all and some serve this purpose:

  • all groups all possible instances of the enclosed subterms resulting from different variable bindings as children of the enclosing term. At least one instance has to exist, and the number of instances always needs to be finite (otherwise the program does not terminate).
  • some groups non-deterministically some of the possible instances of the enclosed subterms resulting from variable bindings as children of the enclosing term. Some is quantified by a number which restricts the (maximum) number of alternatives to use. At least one instance has to exist.

The requirement that there has to exist at least one instance in both grouping constructs may seem unintuitive. However, a construct term can only be evaluated if the rule it is part of ``fires'', i.e.\ the query part succeeds and thus yields at least one substitution for the variables occurring in the query. If this behaviour is not desired, the grouping constructs can be combined with optional (see below).

Grouping Constructs:

Consider again the substitutions of Example 1. The following construct term creates a list of result subterms (one for each title/author combination from the substitutions) below a results term using the all-construct to collect all instances:

results {
  all result { var TITLE, var AUTHOR }
}
  
The result of applying the substitutions to this construct term might be the following data term (compare with the set of data terms from Example 1):
results {
  result {
    title { "Vikinga Blot" },
    author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
  },
  result {
    title { "Boken Om Vikingarna" },
    author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
  },
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Wahl" }, first { "Mats" } }
  },
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Nordqvist" }, first { "Sven" } }
  },
  result {
    title { "Folket i Birka på Vikingarnas Tid" },
    author { last { "Ambrosiani" }, first { "Björn" } }
  }
}
  

Formally, all t or some n t denote the grouping of all or some instances of t obtained from all possible bindings of the variables that are free in the term t. Subterms of t that again have the form all t' or some n' t' are recursively evaluated in the same manner (see below). A variable is free in a (sub)term t, if it (1) occurs in t, and (2) is not in the scope of another, nested grouping construct. E.g. in the term

results {
  all result { var TITLE, var AUTHOR }
}

both variables TITLE and AUTHOR are not free, since they are in the scope of an all construct. In the term

results {
  result { all var TITLE, var AUTHOR }
}

the variable AUTHOR is free, whereas the variable TITLE is not free. A variable is said to be free for a grouping construct, if it is free in the term enclosed by the grouping construct. E.g. in the term all t, all variables that are free in t are free for the outermost all. All free variables in a construct term need to have the same binding in each of the substitutions that are used for grouping.

Consider a slightly modified variant of the previous construct term. Note that only the variable AUTHOR is in the scope of the all construct, while the variable TITLE is free.

result { var TITLE, all var AUTHOR }
  

The result of applying the set of answer substitutions of Example 1 to this construct term is the following set of data terms:

result {
  title { "Vikinga Blot" },
  author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}

result {
  title { "Boken Om Vikingarna" },
  author { last { "Ingelman-Sundberg" }, first { "Catharina" } }
}
 
result {
  title { "Folket i Birka på Vikingarnas Tid" },
  author { last { "Wahl" }, first { "Mats" },
  author { last { "Nordqvist" }, first { "Sven" },
  author { last { "Ambrosiani" }, first { "Björn" }
}
  

Note that each of the three resulting data terms uses only one binding for the variable TITLE of the construct term, but groups possibly several bindings of the variable AUTHOR. In each instance (i.e. data term), the grouping construct groups together substitutions that have the same binding for TITLE. As there exists only one substitution for each of the titles ``Vikinga Blot'' and ``Boken Om Vikingarna'', the grouping construct only groups a single substitution in the first two data terms. In the third data term, three substitutions are grouped (each having the same binding for TITLE, but a different binding for AUTHOR).

The grouping constructs all and some are similar to the so-called collection constructs {.} and [.] in XMAS \cite{xmas-intro} and to the grouping construct {.} in XML-RL \cite{xml-rl}.

Nesting of Grouping Constructs

Grouping constructs may be nested to perform more complex restructuring tasks. Recall that a term of the form all t collects all instances of t with different bindings for the free variables in t. If t contains nested grouping constructs, each instance of t is further grouped according to the nested grouping constructs. For example, the construct term

results {
  all result {
     all var TITLE,
     var AUTHOR
  }
}

creates for each binding of the variable AUTHOR (i.e.\ the variable that is free for the outer all) an instance of the subterm result. In each instance, the inner all collects all instances of the variable TITLE (that are part of substitutions with the same binding for AUTHOR). Thus, the construct term creates a list of book titles for each author, and groups the result subterms below a results term. Likewise, the construct term

results {
  all result {
     var TITLE,
     all var AUTHOR
  }
}

lists for each book title all authors. Intuitively, nested grouping constructs are similar to nested iteration constructs in imperative languages (like for or while loops), where the inner loop performs a complete run for each iteration of the outer loop. Note, however, that nested grouping constructs do not compute the ``cross-product'', but instead have to respect the different answer substitutions: in the example above, every result elements contains a book title, and only the authors of that book, whereas the cross-product would list for each result also the authors of other books. If it is desirable to compute the cross-product, it is necessary to appropriately modify the query/query term such that it selects titles and authors independently.

Explicit Grouping: group by

In many cases, it is desirable to group by variables whose values should not appear in the result, wherefore they are not part of the subterm that is enclosed by a grouping construct. For example, a construct term might group resulting instances based on the position of a row in an HTML table while not including this position (i.e. the integer number) in the result. While this result could be achieved by using several rules (one for creating the result and one for filtering out superfluous parts), this solution is very cumbersome. For this reason, the grouping constructs all and some may be accompanied by a group by clause containing the (additional) variables by which the instances are grouped. Such clauses have the form

all <subterm> group by { <variables> }
or
some <n> <subterm> group by { <variables> }

where <n> is the maximum number of instances for some, <subterm is the subterm of which instances are created, and <variables is a comma-separated list of variables. All these variables are considered to be part of the free variables of the subterm enclosed by the grouping construct and thus used for grouping, regardless of whether they appear in <subterm or not.

Explicit Grouping:

Consider an HTML table, the cells containing arbitrary values. The following query term retrieves all cell values, together with row and column number:

desc table {{
  position var Row tr {{
    position var Col td { var Value }
  }}
}}

Now assume that the table should be "transposed", i.e. rows and columns are exchanged. The following construct term creates such a transposed table. Since the positions are necessary for grouping but should not be included in the resulting data term, it uses group by for this purpose:

table [
  all tr [
    all td [ var Value ] group by { var Row }
  ] group by { var Col }
]

The construct term is evaluated as follows: For each different binding of Col (Col is the only free variable in the scope of the outer all), an instance of tr [ ... ] is created. Within each instance, the inner all creates an instance of td [ ... ] for each different binding of Row (within the set of substitutions having the same binding for Col).

Sorting: order by

The grouping constructs all and some create sequences of subterms in arbitrary order (although they should try to return results in the same order in which the corresponding subterms appear in the original sources, if possible). In order to sort the resulting sequence according to the bindings for certain variables, the grouping constructs all and some may be augmented by a sorting specification. Sorting specifications are very similar to explicit grouping and have the form

all <subterm> order by (<comparison>) [ <variables> ] 
or
some <n> <subterm> order by (<comparison>)[ <variables> ]

where <n> is the maximum number of instances for some, <subterm> is the subterm of which instances are created, and <variables> is a comma-separated list of variables. <comparison> is the name of the comparison function to be used in sorting. Comparison functions take as arguments two lists of terms (representing two different substitutions for the variables in <variables>) and return a value indicating whether the first list is less than, equal to, or greater than the second list. The current prototype runtime system supports the two exemplary comparison functions lexical and numeric (both in ascending order); further comparison functions may be programmed natively in the implementation language of the prototype (i.e. Haskell).

The list of variables influences the grouping in two ways: (1) instances are grouped as if the variables occurred in a group by clause (i.e. are considered part of the variables free for the grouping construct) and (2) the instances are sorted on the bindings of the variables in the list using the specified comparison function. In the two exemplary functions, sorting is performed primarily with respect to the first variable in the list and more specific for each of the following variables. For instance, a variable list [var Last,var First] would specify to sort primarily by the last names, and within instances with the same last name sort by the first name.

Sort the list of books by the book titles in ascending lexical order:
results {
  all result { all var Author, var Title } order by (lexical) [ var Title ] 
}

Consider the following query term (evaluated against the XML document representing the data of bookstore A in the sample data:

bib {{
  book {{
    var Title -> title {{ }}
    var Author -> author {{ var First -> first {{ }}, var Last -> last {{ }} }}
  }}
}}
  

The following construct term creates a list of authors for each book title. Authors are sorted by last name and then by first name. Note that grouping is performed on the variable Author, as well as the variables Last and First.

results {
  all result { 
    all var Author order by (lexical) [var Last, var First], 
    var Title
  }
}
  

Functions and Aggregations

In addition to arranging data in a new structure, it is often desirable to perform some sort of computation to create new content. For example, a bookstore might want to present books with the value added tax added to all prices, or calculate totals for the items contained in a customer's virtual shopping cart. For this reason, construct terms in Xcerpt may contain functions (i.e.\ computations with a fixed number of arguments) and aggregations (i.e.\ computations with a variable number of arguments). Both functions and aggregations take the form

<fname> ( <arguments> )

where <fname> is the function or aggregation name and <arguments> is a comma-separated list of arguments (variables or other non-grouping subterms). In the case of aggregations, <arguments> may also contain the grouping constructs all and some.

Shopping Cart: Adding the VAT and Computing Totals: Consider the XML document representing the data of bookstore A in the example documents. The following construct term might be used to create an HTML presentation of books in a shopping cart were prices are shown both without and with value added tax (in this case: 16%). The last row computes totals for all prices.
table [
  th [ td [ "Title" ], td [ "Price Net" ], td [ "VAT" ], td [ "Total" ] ],
  all tr [
    td [ var Title ],
    td [ var Price ],
    td [ mult( var Price, 0.16 ) ],
    td [ mult( var Price, 1.16 ) ]
  ],
  tr [
    td [ "Totals" ],
    td [ sum( all var Price ) ],
    td [ sum( all mult( var Price, 0.16) ) ],
    td [ sum( all mult( var Price, 1.16) ) ],
  ]
]
  

Since a type system is not in the scope of this thesis, the current implementation assumes implicit type casting in functions and aggregations. For example, the function mult (for multiplication) implicitly assumes that all parameters are numbers. In order to provide a comprehensive set of functions and aggregations, a type system would however be beneficial.

The current implementation supports a number of exemplary functions and aggregations, which are summarised in Table \ref{table:functions}. For some frequently used functions, this table also gives an abbreviated, infix notation that may be used instead of the more verbose general form. Beyond these, a wide range of functions are conceivable. The document XQuery 1.0 and XPath 2.0 Functions and Operators \cite{xquery-functions} gives an overview over functions that are desirable in Web query languages.

Name Abbreviated Default Description
Functions
add(n,m) n + m - adds the two numeric arguments n and m
sub(n,m) n - m - subtracts the two numeric arguments n and m
mult(n,m) n * m - multiplies the two numeric arguments n and m
div(n,m) n / m - divides the two numeric arguments n and m
concat(n,m) n ++ m - concatenates the two string arguments n and m
glb(n,m) - - calculates the greatest lower bound of two terms n and m wrt. simulation order
lub(n,m) - - calculates the least upper bound of two terms n and m wrt. simulation order
Aggregations
count(...) - 0 count the number of arguments
sum(...) - 0 compute the sum of all (numeric) arguments
avg(...) - NaN compute the average of all (numeric) arguments
min(...) - +inf compute the minimum of all (numeric) arguments
max(...) - -inf compute the maximum of all (numeric) arguments
join(...) - "" join all string arguments to a single string
first(...) - exception return the first argument
reverse(...) - empty return the arguments in reverse order

Functions and Aggregations in Construct Terms: Exemplary functions and aggregations available in construct terms. All functions and aggregations perform an implicit type casting to the type given (e.g. "numeric" or "string"). The default value for aggregation functions is used in case the argument list is empty; exception indicates a runtime error and empty the empty list. For normal functions, default values are not applicable.

Optional Subterms: optional

Recall that the construct optional in query terms allows to express that certain subterms of a query term need only be matched if a corresponding subterm exists in the data term against which the query term is evaluated. In case the optional subterm contains variables, it might happen that some of the substitutions resulting from the evaluation of the query do not contain bindings for these variables (as the corresponding subterm did not participate in the matching). As a consequence, construct terms containing such variables need to make provisions for such cases. This is expressed by marking subterms containing variables that are possibly unbound as optional. Like in query terms, such subterms take the form

optional <subterm>
but it is also possible to add a default value to be used if no instance can be created as in
optional <subterm> with default <default>

where <subterm> is the subterm containing the optional variables. In case at least one of the variables in <subterm> is not bound (i.e. no ground instance can be created), the first form simply omits the optional subterm, whereas the second form substitutes the subterm <default> for <subterm>. <default> may be any construct term.

Consider the XML document representing the student database of Section \ref{sec:examples:studentdb}. The following query term retrieves the student name and optionally his inscription number (contained in the subterm labelled matrnr) from this document. Note the use of optional to indicate optional selections.
students {{
  student {{
    name { var Name },
    optional matrnr { var MatrNr }
  }}
}}
Assume that a teacher wants to create an HTML table listing all student names with inscription numbers, leaving columns empty for each substitution that does not contain a binding for the variable MatrNr. The corresponding construct term would look as follows:
table [
  all tr [
    td [ var Name ],
    td [ optional var MatrNr ]
  ]
]
By using a default specification, it us also possible to insert the string "unknown" instead of simply leaving the columns empty for those substitutions that do not contain a value for MatrNr, as in the following construct term:
table [
  all tr [
    td [ var Name ],
    td [ optional var MatrNr with default "unknown" ]
  ]
]
The optional does not necessarily prefix the variable immediately, but may instead enclose a whole subterm containing optional variables; the following construct term does not generate a second column if there is no inscription number available, instead of leaving the second column empty:
table [
  all tr [
    td [ var Name ],
    optional td [ var MatrNr ]
  ]
]

Grouping Constructs and Optional Subterms

As mentioned above, the grouping constructs all and some require the existence of at least one instance of the enclosed subterms, because a query only succeeds if there exists at least one binding for its variables. With optional, this restriction can be lifted: query terms with optional subterms might match while not yielding any bindings for the variables occurring in the grouping construct. With optional in the construct term, it is thus possible to express possibly empty groupings:
Grouping and Optional: The following query term selects student names and scores of submitted exercises. The subterm labelled exercises is marked optional in order to also select students without any exercise submissions.
students {{
  student {{
    name { var Name },
    optional exercise {{ score { var Score } }}
  }}
}}
Given the set of answer substitutions of this query term, the following construct term computes the sum of all exercises for each student. If no exercise has been submitted, the sum is 0:
scores {
  all student {
    name { var name },
    total { sum( optional all var Score with default 0 ) }
  }
}
In case no exercise has been submitted (i.e.\ there exists no binding for the variable Score for a student), there exists no instance for all var Score and the default value of 0 is used as the only argument to the aggregation function sum(...). Otherwise, a sequence of scores is created and summed up using sum(...). Since the default value of sum(...) happens to be 0 if the number of arguments is zero (cf.\ Table \ref{table:functions}), the with default clause may also be omitted in this case. Alternatively, the construct term could be written as
scores {
  all student {
    name { var name },
    total { optional sum( all var Score ) with default 0 }
  }
}
In this case, the aggregation function sum(...) is not evaluated if there is no instance for all var Score; the value 0 is substituted without any further computation.
In practise, it is irrelevant whether the optional encloses the grouping construct (as in the examples above) or vice versa; both approaches are reasonable.
Created by wastl
Last modified 2005-07-14 02:56 PM
 

Powered by Plone

This site conforms to the following standards: