Dependencies in complex-value databases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University

Loading...
Thumbnail Image
Date
2004
DOI
Open Access Location
Journal Title
Journal ISSN
Volume Title
Publisher
Massey University
Rights
The Author
Abstract
The relational data model has been the dominant model in database design for more than three decades. It considers data to be stored in matrices where rows correspond to individuals, columns correspond to attributes, and every cell constains [sic] a single atomic value. However, today's database technology trends, e.g. spatial, genetic or web-based data, require extended data models. Within the last decade new, complex-value data models such as the higher-order entity-relationship model, object-oriented data models, semi-structured data models, and XML have evolved which allow cells to contain lists, sets, multisets, trees, matrices or even more complex type constructors, references to other cells (which lead to infinite structures), and null values (indicating missing, unknown or vague data). Matrices as such allow the storage of inconsistent data, invalid in the semantic sense. As this is not acceptable, additional requirements called dependencies have to be formulated when designing a database. The correct specification and use of dependencies needs a sound mathematical basis. For the relational data model more than 90 different classes of dependencies have been defined and studied intensively. The major problems in dependency theory are the axiomatisability of classes of dependencies, determination of the closure of a chosen set of dependencies (as certain dependencies can be implied by others) and the characterisation of semantically desirable properties for well-designed databases (such as absence of redundancies or abnormal update behavior) by syntactic properties on closed sets of dependencies. With few exceptions research has only dealt with dependencies for the relational data model. Only recently, the emergence of XML as the standard format for web-based data and the rapidly increasing usage of persistent XML databases revealed the lack of a sound mathematical basis for complex-value data models. If they are expected to serve as first class data models they require a theoretical investigation of issues like integrity, consistency, data independence, recovery, redundancy, access rights, views and integration. The goal of this thesis is to develop a dependency theory for complex-value databases that is independent from any individual data model. Therefore, an abstract algebraic approach is taken that can be adapted to the presence of different combinations of type constructors such as records, lists, sets and multisets. Data models are classified according to the data types they support. In this framework the major objective is to initiate research on the following problems - investigate the axiomatisation of important dependency classes, relevant to complex-value data models, by sound and complete sets of inference rules that permit the determination of all dependencies implied by some chosen set of dependencies. - characterise semantically desirable properties by normal forms for complex-value data models and investigate whether these normal forms can always be achieved without violating other desirable properties. - develop efficient algorithms for determining the closure of a chosen set of dependencies and for restructuring databases such that normal forms are satisfied and no information is lost. In a single thesis it is impossible to consider all classes of relational dependencies in all different combinations of type constructors. Therefore the focus is put on extending two popular classes of relational dependencies: functional and multi-valued dependencies. The axiomatisation and implication of functional dependencies is investigated for all combinations of record, list, set and multiset type. Furthermore, a normal form with respect to functional dependencies in the presence of records and lists is proposed and semantically justified. It is also shown how to obtain databases which are in this normal form. Finally, axiomatisation and implication for the class of multi-valued dependencies and the class of functional and multi-valued dependencies are studied in the context of records and lists. The work of this thesis may lead to a unified dependency theory for complex-value data models.
Description
Keywords
Database design, Database management, Relational databases
Citation