Distribution design for complex value databases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University
Date
2007
Publisher
Massey University
Rights
The Author
Abstract
Distribution design for databases usually addresses the problems of fragmentation, allocation and replication. However, the main purposes of distribution are to improve performance and to increase system reliability. Performance is particularly relevant when the desire to distribute data stems from the distributed nature of an organization in which many data needs arise only locally, i.e., some data are retrieved and processed at only one or at most a very few locations. Query optimization should therefore be treated as an intrinsic part of distribution design. Because fragmentation, allocation and distributed query optimization are interdependent, studying each of these problems in isolation is not an effective way to obtain an overall optimal distribution design. However, the combined problem of fragmentation, allocation and distributed query optimization is NP-hard and thus requires heuristics to generate efficient solutions.
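To make the point about NP-hardness and heuristics concrete, the sketch below looks at fragment allocation alone and compares exhaustive search with a simple greedy heuristic. It is a hypothetical toy, not the thesis's cost model: each fragment is stored at exactly one site, site capacities are invented, and every read from a site that does not hold a fragment is assumed to ship the whole fragment. The names `exhaustive`, `greedy` and the example data are illustrative only. Exhaustive search enumerates |sites|^|fragments| placements, while the greedy pass runs in polynomial time but may miss the optimum.

```python
from itertools import product

# Hypothetical example: 3 fragments, 2 sites with limited storage.
SITES = ["A", "B"]
FRAGMENTS = ["F1", "F2", "F3"]
SIZE = {"F1": 2, "F2": 1, "F3": 1}
CAPACITY = {"A": 2, "B": 2}
# ACCESS[(site, fragment)] = how often the site reads the fragment
ACCESS = {("A", "F1"): 5, ("B", "F1"): 4, ("A", "F2"): 3,
          ("A", "F3"): 2, ("B", "F3"): 1}


def cost(alloc, access, size):
    """Remote-access cost: each read from a site not holding the fragment ships the whole fragment."""
    return sum(freq * size[f] for (s, f), freq in access.items() if alloc[f] != s)


def feasible(alloc, size, capacity):
    """True if no site stores more than its capacity."""
    used = {s: 0 for s in capacity}
    for f, s in alloc.items():
        used[s] += size[f]
    return all(used[s] <= capacity[s] for s in capacity)


def exhaustive(fragments, sites, access, size, capacity):
    """Optimal allocation by brute force over all |sites| ** |fragments| placements."""
    best = None
    for choice in product(sites, repeat=len(fragments)):
        alloc = dict(zip(fragments, choice))
        if feasible(alloc, size, capacity):
            c = cost(alloc, access, size)
            if best is None or c < best[0]:
                best = (c, alloc)
    return best


def greedy(fragments, sites, access, size, capacity):
    """Heuristic: place high-demand fragments first, each at the free site that reads it most."""
    free = dict(capacity)
    alloc = {}

    def demand(f):
        return sum(freq for (_, g), freq in access.items() if g == f) * size[f]

    for f in sorted(fragments, key=demand, reverse=True):
        for s in sorted(sites, key=lambda site: access.get((site, f), 0), reverse=True):
            if free[s] >= size[f]:
                alloc[f] = s
                free[s] -= size[f]
                break
        else:
            raise ValueError(f"no site has room for fragment {f}")
    return cost(alloc, access, size), alloc


if __name__ == "__main__":
    print(exhaustive(FRAGMENTS, SITES, ACCESS, SIZE, CAPACITY))
    print(greedy(FRAGMENTS, SITES, ACCESS, SIZE, CAPACITY))
```

On this data exhaustive search finds cost 11 (F1 at B, F2 and F3 at A), whereas the greedy pass settles for cost 13 because it commits the most heavily used fragment F1 to its preferred site A first and thereby forces the other fragments away from the site that reads them.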
In this thesis the foundations of fragmentation and allocation in databases are investigated with respect to query processing, using a query cost model. The databases considered are defined on complex value data models, which capture complex value, object-oriented and XML-based databases. The emphasis on complex value databases enables a large variety of schema fragmentations, while at the same time imposing restrictions on the way schemata can be fragmented.
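The query cost model itself is not spelled out in this abstract. As a hypothetical illustration of what such a model can look like, the sketch below charges only for shipping intermediate results between sites in a query tree whose selections and projections sit directly above the fragment scans, and it places each inner operator at the site of its largest input. Fuller cost models usually also weigh local processing costs; the class `Node`, the functions and the example sizes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    op: str                       # operator label, e.g. "scan F1", "select", "join"
    size: float                   # estimated size of this node's output
    site: Optional[str] = None    # evaluation site: fixed for leaves, chosen for inner nodes
    children: List["Node"] = field(default_factory=list)


def place_at_largest_input(node: Node) -> Node:
    """Heuristic: evaluate each inner node at the site of its largest input, bottom-up."""
    for child in node.children:
        place_at_largest_input(child)
    if node.children:
        node.site = max(node.children, key=lambda c: c.size).site
    return node


def shipping_cost(node: Node, unit_cost: float = 1.0) -> float:
    """Cost of moving each child's output to its parent's site, summed over the tree."""
    total = 0.0
    for child in node.children:
        if child.site != node.site:
            total += child.size * unit_cost
        total += shipping_cost(child, unit_cost)
    return total


def query_cost(root: Node, query_site: str, unit_cost: float = 1.0) -> float:
    """Shipping inside the tree plus delivering the final result to the querying site."""
    delivery = 0.0 if root.site == query_site else root.size * unit_cost
    return shipping_cost(root, unit_cost) + delivery


def example_tree() -> Node:
    # Hypothetical query: selection and projection already pushed onto the fragment scans.
    left = Node("select", 10, children=[Node("scan F1", 100, "S1")])
    right = Node("project", 30, children=[Node("scan F2", 40, "S2")])
    return Node("join", 20, children=[left, right])


if __name__ == "__main__":
    tree = place_at_largest_input(example_tree())
    print(tree.site, query_cost(tree, "S3"))
```

Here the join lands at S2, the site of its larger input, so only the 10 units selected from F1 move inside the tree and the 20-unit result is then shipped to the querying site S3, for a total of 30.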
It is shown that the allocation of locations to the nodes of an optimized query tree is only marginally affected by the allocation of fragments. This implies that the optimization of query processing and the optimization of fragment allocation are largely orthogonal to each other, leading to several scenarios for fragment allocation. It is therefore reasonable to assume that optimized queries are given, with subqueries that have selection and projection operations applied to the leaves. Under this assumption, heuristic procedures can be developed to find an “optimal” fragmentation and allocation. In particular, cost-based algorithms for primary horizontal, derived horizontal and vertical fragmentation are presented.
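The abstract names cost-based algorithms for primary horizontal, derived horizontal and vertical fragmentation without describing them. For orientation only, the sketch below shows the classic textbook building blocks for the horizontal case on flat records: primary fragmentation by minterm predicates, derived fragmentation by a semijoin with the owner relation, and a naive frequency-based allocation. It is not the thesis's cost-based algorithm, ignores complex (nested) values and does not cover vertical fragmentation; the relations, predicates and function names are hypothetical.

```python
from itertools import product

# Hypothetical flat example data; the thesis itself works with complex (nested) values.
DEPARTMENTS = [
    {"dno": 1, "city": "Auckland", "budget": 200},
    {"dno": 2, "city": "Wellington", "budget": 50},
    {"dno": 3, "city": "Auckland", "budget": 80},
]
EMPLOYEES = [
    {"eno": 10, "dno": 1}, {"eno": 11, "dno": 2},
    {"eno": 12, "dno": 1}, {"eno": 13, "dno": 3},
]
# Simple predicates, assumed to come from the selection conditions of frequent queries.
PREDICATES = {
    "north": lambda r: r["city"] == "Auckland",
    "large": lambda r: r["budget"] > 100,
}


def minterm_fragments(rows, predicates):
    """Primary horizontal fragmentation: one fragment per non-empty minterm.

    A minterm fixes every simple predicate to true or false; each row satisfies
    exactly one minterm, so the fragments are disjoint and cover the relation.
    """
    names = sorted(predicates)
    fragments = {}
    for signs in product([True, False], repeat=len(names)):
        label = tuple(n if s else "not " + n for n, s in zip(names, signs))
        rows_here = [r for r in rows
                     if all(predicates[n](r) == s for n, s in zip(names, signs))]
        if rows_here:
            fragments[label] = rows_here
    return fragments


def derived_fragments(member_rows, owner_fragments, fk, key):
    """Derived horizontal fragmentation: semijoin the member relation with each owner fragment."""
    derived = {}
    for label, owner_rows in owner_fragments.items():
        owner_keys = {r[key] for r in owner_rows}
        derived[label] = [m for m in member_rows if m[fk] in owner_keys]
    return derived


def allocate(access):
    """Toy allocation: put each fragment at the site that reads it most often.

    With unlimited capacity and no replication this minimizes remote reads;
    access[fragment][site] is an access frequency.
    """
    return {frag: max(freq, key=freq.get) for frag, freq in access.items()}


if __name__ == "__main__":
    frags = minterm_fragments(DEPARTMENTS, PREDICATES)
    for label, rows in derived_fragments(EMPLOYEES, frags, "dno", "dno").items():
        print(label, [e["eno"] for e in rows])
```

With these predicates the minterm for "large and not north" is empty and is dropped, leaving three department fragments; each employee then follows the department fragment that its `dno` points into.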
Keywords
Distributed databases