Managing scientific data is by no means a trivial task, even in a single-site environment
with a small number of researchers involved. We discuss some issues concerned with posing
well-specified experiments in terms of parameters or instrument settings and the metadata
framework that arises from doing so. We are particularly interested in parallel computer
simulation experiments, where very large quantities of warehouse-able data are involved. We
consider SQL databases and other framework technologies for manipulating experimental data.
Our framework manages the outputs from parallel runs that arise from large cross-products
of parameter combinations. Considerable useful experiment planning and analysis can be done
with the sparse metadata without fully expanding the parameter cross-products. Extra value
can be obtained from simulation output that can subsequently be data-mined. We have
particular interests in running large-scale Monte Carlo physics model simulations. Finding
ourselves overwhelmed by the problems of managing data and compute resources, we have
built a prototype tool using Java and MySQL that addresses these issues. We use this example
to discuss type-space management and other fundamental ideas for implementing a laboratory
information management system.
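
The idea of planning with sparse metadata, without fully expanding the parameter cross-products, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java and MySQL prototype described above; the class name SparseCrossProduct, its methods, and the parameter names and values (temperature, lattice_size, seed) are all hypothetical. Only the per-parameter value lists are stored, the number of runs is computed from their lengths, and any single run's parameter combination is decoded from its index on demand.

```python
from itertools import product
from math import prod


class SparseCrossProduct:
    """Stores one value list per parameter; combinations are derived on demand.

    Illustrative only: the name and layout are assumptions, not the
    authors' implementation.
    """

    def __init__(self, axes):
        # axes: dict mapping parameter name -> list of values (assumed layout)
        self.names = list(axes)
        self.values = [list(axes[name]) for name in self.names]

    def size(self):
        # Number of runs in the full cross-product, computed without expanding it.
        return prod(len(vals) for vals in self.values)

    def combination(self, index):
        # Decode a run index into one parameter combination (mixed-radix
        # decoding, with the last parameter varying fastest).
        if not 0 <= index < self.size():
            raise IndexError(index)
        combo = {}
        for name, vals in zip(reversed(self.names), reversed(self.values)):
            index, digit = divmod(index, len(vals))
            combo[name] = vals[digit]
        # Return the parameters in their declared order.
        return {name: combo[name] for name in self.names}

    def __iter__(self):
        # Full expansion, generated lazily and only when explicitly requested.
        for values in product(*self.values):
            yield dict(zip(self.names, values))


# Hypothetical Monte Carlo experiment: 3 temperatures x 2 lattice sizes x 4 seeds.
plan = SparseCrossProduct({
    "temperature": [1.0, 2.0, 3.0],
    "lattice_size": [16, 32],
    "seed": [1, 2, 3, 4],
})
print(plan.size())          # 24 runs, described by only 9 stored values
print(plan.combination(5))  # parameters for run 5 alone
```

In this sketch the stored metadata grows with the sum of the value-list lengths, while the number of runs grows with their product, which is why counting and addressing runs can remain cheap even when the cross-product itself is large.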
James, H.A., Hawick, K.A. (2005), Sparse cross-products of metadata in scientific simulation management, Research Letters in the Information and Mathematical Sciences, 7, 89-115