
XQuery Summer Institute

Advancing XML-Based Scholarship from Representation to Discovery


The goal of the two-week Institute is for every participant to leave with an understanding of XQuery and XML databases sufficient to continue working productively with those tools at their home institutions. Although designed for scholars who already have some experience with XML-based markup languages, the Institute will not take any background knowledge for granted.

Broadly speaking, the first week will cover the basics of XML, TEI, XHTML and related standards before proceeding to an introduction to XPath and XQuery. The second week will introduce participants to the practical uses of XQuery in XML databases, leading up to a workshop on how to develop full-blown digital projects using nothing other than XQuery and the eXist database. The morning sessions of the Institute will consist of formal instruction while the afternoons will feature workshops and practical exercises.

Week One

The first day will review XML essentials. The morning session will cover the fundamentals of XML, with particular attention to features that often trip up intermediate users, such as namespaces, character entities, and CDATA sections. In the afternoon workshop, participants will practice encoding a number of plain text documents in XML.
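To make these three stumbling blocks concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` parser. The sample document, its element names, and the `http://example.org/...` namespace URIs are hypothetical and are not taken from the Institute's materials.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a default namespace, a prefixed namespace,
# a predefined entity (&amp;), a numeric character reference (&#233;),
# and a CDATA section containing characters that would otherwise need escaping.
SAMPLE = """<doc xmlns="http://example.org/ns" xmlns:x="http://example.org/extra">
  <title>Fish &amp; Chips &#233;</title>
  <x:note><![CDATA[1 < 2 & 3 > 2]]></x:note>
</doc>"""

root = ET.fromstring(SAMPLE)

# Namespaces: the parser expands every name to {namespace-uri}local-name,
# so an unqualified lookup finds nothing.
unqualified = root.find("title")

# Qualified lookups succeed, either with the expanded name or a prefix map.
title = root.find("{http://example.org/ns}title")
note = root.find("x:note", {"x": "http://example.org/extra"})

# Entities and CDATA are resolved by the parser; the application sees plain text.
print(root.tag)      # {http://example.org/ns}doc
print(unqualified)   # None
print(title.text)    # Fish & Chips é
print(note.text)     # 1 < 2 & 3 > 2
```

The unqualified lookup returning `None` is the typical namespace surprise: an element in a default namespace cannot be matched by its bare name.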

The second day will review the basics of the TEI P5 specification using the Folger Library’s digital texts of Shakespeare as a reference. In the afternoon, participants will mark up texts in TEI and discuss how they made encoding decisions.

The third day will introduce the XML Path Language (XPath). In the morning session, participants will use XPath expressions to navigate XML document hierarchies. The full set of XPath axes will be covered along with their abbreviated syntax. The morning session will also cover how to use node tests and predicates. During the afternoon, participants will work through practical exercises using XPath to navigate XML documents.
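As a rough illustration of path steps, node tests, and predicates, the sketch below uses Python's `xml.etree.ElementTree`, which implements only a limited subset of XPath (child and descendant steps, `..`, attribute and positional predicates), not the full set of axes. The `play`/`act`/`scene`/`speech` structure is a simplified, hypothetical stand-in and is not the Folger TEI encoding.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical structure for illustration only (not TEI).
PLAY = """<play>
  <act n="1">
    <scene n="1">
      <speech who="Bernardo">Who's there?</speech>
      <speech who="Francisco">Nay, answer me.</speech>
    </scene>
  </act>
  <act n="2">
    <scene n="1">
      <speech who="Polonius">Give him this money.</speech>
    </scene>
    <scene n="2">
      <speech who="King">Welcome.</speech>
    </scene>
  </act>
</play>"""

root = ET.fromstring(PLAY)

# Child step with a name test: act elements directly under the root.
acts = root.findall("act")

# Descendant step (abbreviated //): every speech at any depth.
speeches = root.findall(".//speech")

# Attribute predicate: speakers inside the act whose n attribute is "1".
act1_speakers = [s.get("who") for s in root.findall("./act[@n='1']//speech")]

# Positional predicate: the first scene child of each act.
first_scenes = root.findall("act/scene[1]")

# Parent step (abbreviated ..): the scene that contains Polonius's speech.
parent_scene = root.findall(".//speech[@who='Polonius']/..")[0]

print(len(acts), len(speeches))   # 2 4
print(act1_speakers)              # ['Bernardo', 'Francisco']
print(len(first_scenes))          # 2
print(parent_scene.tag)           # scene
```

A full XPath processor additionally supports axes such as `ancestor`, `following-sibling`, and `preceding`, which this subset cannot express.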

The fundamentals of the XQuery language will be introduced on the fourth day of the Institute. After an introduction to the concept of functional programming languages, participants will learn how to write FLWOR (for, let, where, order by, return) expressions, which form the building blocks of XQuery. During the afternoon, participants will write simple XQuery expressions to query the Folger Library’s editions of Shakespeare in TEI.
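For readers who know Python better than XQuery, the shape of a FLWOR expression (for, let, where, order by, return) can be loosely mirrored with a generator expression and `sorted`. This is only an analogy, not XQuery, and the sample `scene` data with its `sp` and `l` elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each <sp> is a speech and each <l> is a verse line.
SCENE = """<scene>
  <sp who="Hamlet"><l>a</l><l>b</l><l>c</l></sp>
  <sp who="Horatio"><l>d</l></sp>
  <sp who="Ophelia"><l>e</l><l>f</l></sp>
</scene>"""

root = ET.fromstring(SCENE)

results = sorted(
    (
        (sp.get("who"), line_count)                  # return: what each result looks like
        for sp in root.findall("sp")                 # for: iterate over the speeches
        for line_count in [len(sp.findall("l"))]     # let: bind a computed value to a name
        if line_count >= 2                           # where: keep only matching items
    ),
    key=lambda pair: pair[1],                        # order by: sort on the line count,
    reverse=True,                                    # in descending order
)

print(results)   # [('Hamlet', 3), ('Ophelia', 2)]
```

As in a FLWOR expression, nothing is mutated: each clause narrows or reshapes a sequence of values, which is the functional style the morning session introduces.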

The first week will conclude on Friday with an introduction to functions and operators in XPath and XQuery. The morning session will explore the standard library of functions, highlighting the functions that work on textual data. During the afternoon, the class will write simple XQuery programs that combine FLWOR expressions with a range of built-in functions to render the Shakespeare TEI documents as XHTML.

Week Two

The goal of the second week is to teach participants practical uses of XQuery for the digital humanities. We will reserve time during the second week for participants to discuss their projects and solicit feedback from fellow participants and Institute consultants. Where feasible, the afternoon workshops will deal with challenging problems raised by participants.

On Monday, participants will conduct full-text searches with XQuery in the eXist database. The morning session will introduce eXist and its extension functions. During the afternoon session, participants will use FLWOR expressions to conduct full-text searches of their documents and filter the result set.

On Tuesday, the concept of user-defined functions will be explained. Participants will write custom functions and organize those functions into modules in order to build large systems and to share code with others. The morning will also treat the type system and error handling. During the afternoon session, the class will rewrite the expressions from the previous day as custom functions.

On Wednesday, the class will learn how to index their documents efficiently in eXist and how to connect them with other sources of information on the web. The morning session will review the fundamentals of indexing in eXist and how to “tune” indexes to different kinds of XML documents. The afternoon session will teach participants how to create mash-ups with their documents by connecting to online application programming interfaces (APIs). As a practical exercise, the class will connect with an online entity extraction API to identify people, places, and events in their XML documents. Participants will reflect on the benefits and drawbacks of using entity extraction services for digital humanities projects.

On Thursday, participants will learn to build a web interface for the Folger Library’s edition of Shakespeare in TEI. The morning session will draw together what the participants have learned to this point to create a rich application interface that accepts search terms and displays formatted results for users. During the afternoon session, participants will have the opportunity to add features to their web interface and to customize it according to their interests.

The final day of class will cover the deployment of eXist databases to the “cloud.” The morning session will show how eXist databases may be moved from participants’ desktops to cloud-based servers. During the afternoon, participants will discuss the advantages of using XQuery and the eXist database to deploy TEI-based projects publicly, and we will handle any final questions about deployment.
