Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

Search Results

  • Compressing DNA sequence databases with coil
    (BioMed Central, 2008-05-20) White, W. Timothy J.; Hendy, Michael D.
    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
  • LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites
    (BioMed Central, 2008-11-21) Grievink, Liat Shavit; Penny, David; Hendy, Mike D.; Holland, Barbara R.
    Background: Commonly used phylogenetic models assume a homogeneous evolutionary process throughout the tree. It is known that these homogeneous models are often too simplistic, and that with time some properties of the evolutionary process can change (due to selection or drift). In particular, as constraints on sequences evolve, the proportion of variable sites can vary between lineages. This affects the ability of phylogenetic methods to correctly estimate phylogenetic trees, especially for long timescales. To date there is no phylogenetic model that allows for change in the proportion of variable sites, and the degree to which this affects phylogenetic reconstruction is unknown. Results: We present LineageSpecificSeqgen, an extension to the seq-gen program that allows generation of sequences with both changes in the proportion of variable sites and changes in the rate at which sites switch between being variable and invariable. In contrast to seq-gen and its derivatives to date, we interpret branch lengths as the mean number of substitutions per variable site, as opposed to the mean number of substitutions per site (which is averaged over all sites, including invariable sites). This allows specification of the substitution rates of variable sites, independently of the proportion of invariable sites. Conclusion: LineageSpecificSeqgen allows simulation of DNA and amino acid sequence alignments under a lineage-specific evolutionary process. The program can be used to test current models of evolution on sequences that have undergone lineage-specific evolution. It facilitates the development of both new methods to identify such processes in real data, and means to account for such processes. The program is available at: http://awcmee.massey.ac.nz/downloads.htm.
  • Point trajectory planning of flexible redundant robot manipulators using genetic algorithms
    (Cambridge, 2002) Yue, Shigang; Henrich, Dominik; Xu, W. L.; Tso, S. K.
    The paper focuses on the problem of point-to-point trajectory planning for flexible redundant robot manipulators (FRMs) in joint space. Compared with non-redundant flexible manipulators, an FRM offers additional freedom in point-to-point trajectory planning because of its kinematic redundancy. A method based on genetic algorithms (GAs) is presented for planning point-to-point motions of FRMs that minimize vibration and/or execution time. Kinematic redundancy is incorporated into the method through the planning variables. Quartic and quintic polynomials are used to describe the segments connecting the initial, intermediate, and final points in joint space. Trajectory planning for an FRM is formulated as a constrained optimization problem. A planar FRM with three flexible links is used in simulation, and case studies show that the method is applicable.
  • Internet-based 'social sharing' as a new form of global production: The case of SETI@home
    (Elsevier, 2008) Engelbrecht, H. A.
    Benkler (Sharing nicely: on shareable goods and the emergence of sharing as a modality of economic production, Yale Law Journal, 2004, vol. 114, pp. 273-358) has argued that 'social sharing' via Internet-based distributed computing is a new, so far under-appreciated modality of economic production. This paper presents results from an empirical study of SETI@home (the Search for Extraterrestrial Intelligence), which is the classic example of such a computing project. The aim is to explain SETI@home participation and its intensity in a cross-country setting. The data are for a sample of 172 developed and developing countries for the years 2002-2004. The results indicate that SETI@home participation and its intensity can be explained largely by the degree of ICT access (proxied by the International Telecommunication Union's 'Digital Access Index'), as well as GDP per capita and dummy variables for major country groups. Some other variables, such as the Human Development Index, perform less well. Although SETI@home is a global phenomenon, it is nevertheless concentrated mostly in rich countries. However, there are indications of a slowly narrowing global SETI@home digital divide. © 2006 Elsevier Ltd. All rights reserved.
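
The coil entry contrasts its approach with the conventional practice of gzip-compressing a whole flat file. The Python sketch below measures only that baseline; it does not implement coil's edit-tree coding. The toy FASTA-style records, the function names, and the definition of compression ratio as original size divided by compressed size are assumptions made for illustration, not details taken from the paper.

```python
import gzip


def gzip_compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by gzip-compressed size (assumed definition of the ratio)."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


def make_toy_flat_file(n_records: int = 200) -> bytes:
    """Build a hypothetical FASTA-style flat file of near-identical sequences.

    Each record is the same base sequence with one position replaced by 'N',
    loosely mimicking the redundancy found in EST collections.
    """
    base = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGCTAGGATCCA"
    records = []
    for i in range(n_records):
        pos = i % len(base)
        seq = base[:pos] + "N" + base[pos + 1:]
        records.append(f">seq{i}\n{seq}\n")
    return "".join(records).encode("ascii")


flat_file = make_toy_flat_file()
ratio = gzip_compression_ratio(flat_file)
print(f"{len(flat_file)} bytes, gzip ratio {ratio:.2f}")
```

gzip uses DEFLATE, which only finds repeats inside a 32 KB sliding window, so similarity between records that lie far apart in a large flat file is largely invisible to it. That limitation may help explain why database-aware schemes such as coil can do better on collections of similar sequences.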
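
The LineageSpecificSeqgen entry distinguishes branch lengths measured per variable site from lengths averaged over all sites. Invariable sites accumulate no substitutions, so the two conventions differ by the fraction of variable sites. The helper functions below are hypothetical (they are not part of seq-gen or LineageSpecificSeqgen). They also assume that the proportion of invariable sites is constant along the branch, which the lineage-specific model itself relaxes.

```python
def per_site_length(per_variable_site: float, p_invariable: float) -> float:
    """Mean substitutions per site, averaged over all sites including invariable ones.

    Assumes the proportion of invariable sites is fixed along the branch;
    invariable sites contribute zero substitutions to the average.
    """
    if not 0.0 <= p_invariable <= 1.0:
        raise ValueError("p_invariable must lie in [0, 1]")
    return per_variable_site * (1.0 - p_invariable)


def per_variable_site_length(per_site: float, p_invariable: float) -> float:
    """Inverse conversion: mean substitutions per variable site."""
    if not 0.0 <= p_invariable < 1.0:
        raise ValueError("p_invariable must lie in [0, 1)")
    return per_site / (1.0 - p_invariable)


print(per_site_length(0.2, 0.5))  # 0.1

# Two hypothetical lineages with the same rate at variable sites but
# different proportions of invariable sites.
b_var = 0.2
for p_inv in (0.2, 0.6):
    print(p_inv, per_site_length(b_var, p_inv))
```

Under the per-site convention, the second lineage would appear to have evolved half as much (0.08 versus 0.16) even though its variable sites changed at the same rate. Specifying lengths per variable site keeps the substitution rate at variable sites independent of the proportion of invariable sites, which is the separation the abstract describes.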
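
The trajectory-planning entry builds joint-space motions from polynomial segments. The sketch below constructs one such segment: the standard quintic that matches position, velocity and acceleration at both ends, computed in closed form. This is a textbook construction, not the authors' code. The class and function names are hypothetical, the quartic segments and flexible-link dynamics are omitted, and no genetic algorithm is implemented. In a GA-based planner of the kind the abstract describes, intermediate joint values and segment durations could plausibly serve as decision variables.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuinticSegment:
    """One joint's motion q(t) = sum(c[k] * t**k) for 0 <= t <= duration."""
    coeffs: Tuple[float, float, float, float, float, float]
    duration: float

    def position(self, t: float) -> float:
        return sum(c * t**k for k, c in enumerate(self.coeffs))

    def velocity(self, t: float) -> float:
        return sum(k * c * t**(k - 1) for k, c in enumerate(self.coeffs) if k >= 1)

    def acceleration(self, t: float) -> float:
        return sum(k * (k - 1) * c * t**(k - 2) for k, c in enumerate(self.coeffs) if k >= 2)


def quintic_segment(q0, qf, v0, vf, a0, af, T) -> QuinticSegment:
    """Quintic matching position, velocity and acceleration at t = 0 and t = T."""
    if T <= 0:
        raise ValueError("segment duration T must be positive")
    dq = qf - q0
    c3 = (20 * dq - (8 * vf + 12 * v0) * T - (3 * a0 - af) * T**2) / (2 * T**3)
    c4 = (-30 * dq + (14 * vf + 16 * v0) * T + (3 * a0 - 2 * af) * T**2) / (2 * T**4)
    c5 = (12 * dq - 6 * (vf + v0) * T - (a0 - af) * T**2) / (2 * T**5)
    return QuinticSegment((q0, v0, a0 / 2, c3, c4, c5), T)


# Hypothetical two-segment motion of a single joint: 0 -> 0.6 rad (via point) -> 1.0 rad.
# The via-point values and durations stand in for variables a GA might search over.
via_q, via_v = 0.6, 0.5
seg1 = quintic_segment(0.0, via_q, 0.0, via_v, 0.0, 0.0, T=1.0)
seg2 = quintic_segment(via_q, 1.0, via_v, 0.0, 0.0, 0.0, T=1.5)
print(seg1.position(seg1.duration), seg2.velocity(0.0))
```

Because each segment matches velocity and acceleration at the via point, the joint path is continuous up to acceleration across segment boundaries. A planner would then score candidate via points and durations against its vibration or time objective.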