The University of Sheffield
School of Computer Science

Henry Noble Undergraduate Dissertation 2017/18

A Comparative Analysis of Common XML Labelling Schemes

Supervised by S. North

Abstract

The hierarchical structure of XML data allows it to be interpreted as a tree. Assigning a 'label' to each element in an XML tree can facilitate query processing and provide useful positional information. A correct, efficient labelling scheme expedites query execution whilst minimising the need for additional data storage.

An XML file can be described as dynamic if it is frequently altered, or static if it is never modified. Labelling static XML is a fairly simple task, but dynamic XML can be difficult to label: multiple elements are often assigned the same label, and newly inserted elements may receive incorrect labels.

In this dissertation, a range of labelling schemes for static and dynamic XML is tested for correctness. Each scheme is applied to a selection of XML data and measured for speed and storage efficiency; the results indicate that labelling schemes can differ greatly in these traits. The accompanying software can also display a visual representation of an XML tree.