Distribution design for complex value databases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University

Massey University
The Author
Distribution design for databases usually addresses the problems of fragmentation, allocation and replication. However, the main purposes of distribution are to improve performance and to increase system reliability. The former is particularly relevant when the desire to distribute data originates from the distributed nature of an organization in which many data needs arise only locally, i.e., some data are retrieved and processed at only one or at most a few locations. Therefore, query optimization should be treated as an intrinsic part of distribution design. Because fragmentation, allocation and distributed query optimization are interdependent, studying each of these problems in isolation is not an efficient way to obtain an overall optimal distribution design. However, the combined problem of fragmentation, allocation and distributed query optimization is NP-hard, and thus requires heuristics to generate efficient solutions. In this thesis the foundations of fragmentation and allocation in databases are investigated with respect to query processing, using a query cost model. The databases considered are defined on complex value data models, which capture complex value, object-oriented and XML-based databases. The emphasis on complex value databases enables a large variety of schema fragmentations, while at the same time imposing restrictions on the way schemata can be fragmented. It is shown that the allocation of locations to the nodes of an optimized query tree is only marginally affected by the allocation of fragments. This implies that optimization of query processing and optimization of fragment allocation are largely orthogonal to each other, leading to several scenarios for fragment allocation. Therefore, it is reasonable to assume that optimized queries are given, with subqueries having selection and projection operations applied to the leaves.
With this assumption, heuristic procedures can be developed to find an “optimal” fragmentation and allocation. In particular, cost-based algorithms for primary horizontal, derived horizontal and vertical fragmentation are presented.
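As a rough illustration of the cost-based idea described above, the following Python sketch splits a small relation into horizontal fragments using simple predicates and then places each fragment at the site that minimizes remote traffic. It is not the thesis's algorithm: the function names, the sample data, the access frequencies and the toy cost formula (frequency times fragment size) are all assumptions made for this example.

```python
def horizontal_fragments(rows, predicates):
    """Group rows by the truth values of the given simple predicates."""
    fragments = {}
    for row in rows:
        key = tuple(bool(p(row)) for p in predicates)
        fragments.setdefault(key, []).append(row)
    return fragments


def remote_traffic(frag, site, access_freq, sizes):
    """Toy cost (assumed): rows shipped to other sites if frag is stored at site."""
    return sum(
        freq * sizes[frag]
        for (reader, f), freq in access_freq.items()
        if f == frag and reader != site
    )


def greedy_allocate(fragments, sites, access_freq):
    """Place each fragment independently at the site with the least remote traffic."""
    sizes = {k: len(v) for k, v in fragments.items()}
    return {
        frag: min(sites, key=lambda s: remote_traffic(frag, s, access_freq, sizes))
        for frag in fragments
    }


def allocation_cost(placement, fragments, access_freq):
    """Total toy cost of a placement: sum of remote traffic over all fragments."""
    sizes = {k: len(v) for k, v in fragments.items()}
    return sum(
        remote_traffic(frag, site, access_freq, sizes)
        for frag, site in placement.items()
    )


# Hypothetical sample data: five rows, one selection predicate, two sites.
rows = [
    {"id": 1, "city": "Auckland"},
    {"id": 2, "city": "Auckland"},
    {"id": 3, "city": "Auckland"},
    {"id": 4, "city": "Wellington"},
    {"id": 5, "city": "Wellington"},
]
predicates = [lambda r: r["city"] == "Auckland"]
fragments = horizontal_fragments(rows, predicates)

# (reading site, fragment key) -> number of accesses per time unit (assumed values)
access_freq = {
    ("A", (True,)): 10,
    ("B", (True,)): 1,
    ("A", (False,)): 2,
    ("B", (False,)): 8,
}

placement = greedy_allocate(fragments, ["A", "B"], access_freq)
central = {frag: "A" for frag in fragments}  # baseline: everything stored at site A

greedy_cost = allocation_cost(placement, fragments, access_freq)
central_cost = allocation_cost(central, fragments, access_freq)
print(placement, greedy_cost, central_cost)
```

With these assumed frequencies, the fragment read mostly at site A is stored at A and the other at B, which lowers the toy cost relative to storing everything at one site. A realistic cost model would also account for query plans, joins and update propagation, which this sketch ignores.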
Distributed databases