Massey Documents by Type
Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294
Search Results
Item Genome-wide copy number variation in sheep : detection and utility as a genetic marker for quantitative traits, with reference to gastrointestinal nematodiasis : thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at Massey University, Palmerston North, New Zealand (Massey University, 2018) Yan, Juncong
Gastrointestinal nematodes are perhaps the most important parasites of domestic sheep worldwide. Genetic selection for nematode resistance in domestic sheep is being promoted in many countries, including New Zealand. There are several strategies to identify genetic markers associated with quantitative traits. Single nucleotide polymorphism (SNP)-based strategies have been widely used in animal breeding. However, SNPs cannot explain all the genetic variation for a particular trait. A new kind of variation, copy number variation (CNV), has been identified as contributing to genetic variation in production and disease traits. Compared with other domestic animals, CNV in sheep is poorly investigated. The primary objective of this thesis was to explore the utility of genome-wide CNV as a genetic marker for the analysis of quantitative traits in sheep. Five different studies were undertaken to fulfil the objective. The first two studies used 50K SNP BeadChip genotype data and next generation sequencing (NGS) data to detect CNV. Extensive CNV differences were evident between breeds as well as between detection algorithms. NGS-based detection resulted in better CNV resolution than SNP-based detection. Subsequently, a genome-wide association study (with a small sample size) using CNV detected from high-density (HD) SNP genotype data identified four CNV regions significantly associated with a couple of traits pertaining to gastrointestinal nematodiasis in Romney sheep, while no significant SNP associations were found.
Somatic mosaicism of CNV, influenced by age (high in foetuses compared to adults), individuals, detection algorithm and type of tissue analysed, was also evident in a separate study. The final study detected CNV differences and SNP-based selection signatures in two Romney lines selected for gastrointestinal nematode resistance or resilience. Several significant SNPs and line-specific CNV regions were identified. However, only one SNP overlapped a CNV region, indicating that SNP-based selection signatures and CNV could represent different aspects of sheep immunogenetics. Overall, CNV could be a potential genetic marker, although methods for detection and validation need to be refined. The conclusions from this thesis expand our understanding of CNV in sheep and its potential application in the genetic breeding of sheep in the future.

Item DNA sequence reading by image processing : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University (Massey University, 1993) Fan, Baozhen
The research described in this thesis is the development of a DNA sequence reading system. Macromolecular sequences of DNA are the encoded form of the genetic information of all living organisms. DNA sequencing has therefore played a significant role in the elucidation of biological systems. DNA sequence reading is a part of DNA sequencing. This project is for reading DNA sequences directly from DNA sequencing gel autoradiographs within a general purpose image processing system. The DNA sequence reading software is developed based on the waterfall software development approach combined with exploratory programming. Requirement analysis, software design, detailed design, implementation, system testing and maintenance are the basic development stages. The feedback from implementation and system testing to detailed design is much stronger in image processing than in many other kinds of software development.
After an image is captured from a gel autoradiograph, the background of the image is normalised and the contrast is enhanced. The captured image consists of several lane sets of bands. Each lane set represents one part of a DNA sequence. The lane sets are separated automatically into subimages to be read individually. The gap lines between the lane sets are detected for separation. The geometric distortions are corrected by finding the boundaries of the lane set in the subimage. The left boundary of the lane set is used to straighten the lane set and the right boundary is used to warp the lane set into a standard width. If separation of the lane sets or geometry correction is unsuccessful by automatic processing, manual selection is used. After the band features are enhanced, the individual bands are extracted and the positions of the bands are determined. The band positions are then converted into the order of the DNA sequence. Different parts of a sequence, read as subsequences, are merged into a longer sequence. In most cases, the individual lane sets in a captured image can be separated automatically. Manual processing is necessary to handle the cases where the lane sets are too close. The system may reach an accuracy of 98% if the bands are clear. Manual checking and correction of the detected bands helps to obtain a reliable sequence. If a lane set on the autoradiograph is indistinct or bands are too close, the accuracy may be reduced, in extreme cases to the point where the lane set is unreadable. For a 512x512 image captured from a gel autoradiograph, preprocessing takes 90 seconds and processing each subimage takes 40 seconds on a 33 MHz 486 PC.
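The step in which band positions are converted into sequence order can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each lane set has four lanes labelled A, C, G and T, that each detected band is reported as a (lane, migration distance) pair, and that the band that has migrated farthest is read first. The names `Band` and `bands_to_sequence` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical band record: (lane label, migration distance from the well in pixels).
Band = Tuple[str, float]

VALID_LANES = {"A", "C", "G", "T"}


def bands_to_sequence(bands: List[Band]) -> str:
    """Read a lane set by ordering its bands from farthest to nearest migration.

    Assumption: the shortest fragments travel farthest, so the band with the
    largest migration distance gives the first base of the read.
    """
    for lane, _ in bands:
        if lane not in VALID_LANES:
            raise ValueError(f"unknown lane label: {lane!r}")
    # Sort by migration distance, largest first.
    ordered = sorted(bands, key=lambda band: band[1], reverse=True)
    return "".join(lane for lane, _ in ordered)


# Example with four detected bands, one per lane.
example_bands = [("G", 120.0), ("A", 310.5), ("T", 250.0), ("C", 180.2)]
print(bands_to_sequence(example_bands))  # prints "ATCG"
```

A real reader would also have to handle bands at nearly equal distances in different lanes, which is one reason the abstract stresses manual checking.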
If processing a 430x350 mm autoradiograph with 16 lane sets, assuming 6 images are required, it takes about 40 minutes.

Item The determination of nucleotide arrangement in oligonucleotides derived from deoxyribonucleic acid : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Biochemistry at Massey University (Massey University, 1969) Simes, Lynda Joy
The importance of DNA sequences. The genetic material of all animals, plants, bacteria, and of many animal and bacterial viruses has long been established as deoxyribonucleic acid or DNA, and recent studies (1a) have shown that its major function is to carry the genetic information required by a cell for the synthesis of species-specific proteins. This information is stored by the nucleic acid macromolecule in the form of a linear code determined by its intrinsic nucleotide sequence or primary structure.
The information is carried in such a way that a specific sequence of three nucleotides has the ability to code for one of every type of amino acid found in protein. In recent years the message corresponding to each nucleotide triplet has been established (2). Besides coding for amino acids, nucleotide sequences exist which are known to code for ribosomal and transfer RNAs. There are probably other sequences which are involved in a variety of special roles, the more important being regulating and initiating transcription of the genetic message into functional messenger RNA, and the initiation of DNA replication. Because little information is available about the actual arrangement of bases needed to effect these functions, a knowledge of the complete nucleotide sequence of a biologically active DNA molecule may help to elucidate the nature of these extremely important processes. In addition, it is hoped that new approaches can be found towards a better understanding of the actual changes to DNA caused by mutagens and carcinogenic agents, and it is also hoped that some knowledge can be obtained of the extent to which degenerate codons exist in genetic material, and the function of these codons when they do occur.

Item A comparison of next-generation sequencing protocols for microbial profiling : a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Genetics, Massey University, Palmerston North, New Zealand (Massey University, 2016) Fong, Yang (Richard)
The introduction of massively parallel sequencing has revolutionized analyses of microbial communities. Illumina and other whole-genome shotgun (WGS) sequencing protocols have promised improved opportunities for investigation of microbial communities.
In the present work, we compared and contrasted the findings from different NGS library preparation protocols (Illumina Nextera, Nextera-XT, NEXTFlex PCR-free and Ion-Xpress-400bp) and two sequencing platforms (MiSeq and Ion-Torrent). Short reads were analysed using the rapid database matching software PAUDA and visualization software MEGAN5, which provides a conservative approach for taxonomic identification and functional analyses. In analyses of a Tamaki River water sample, biological inferences were made and compared across platforms and protocols. Even with a relatively small number of reads generated on the MiSeq sequencing platform, important pathogens were identified in the water sample. Far greater phylogenetic resolution was obtained with WGS sequencing protocols than has been reported in similar studies that have used 16S rDNA Illumina sequencing protocols. TruSeq and Nextera-XT sequencing protocols produced similar results. The latter protocol offered cheaper and faster results from less DNA starting material. Proteobacteria (alpha, beta and gamma), Actinobacteria and Bacteroidetes were identified as major microbial elements in the Tamaki River sample. Our findings support the emerging view that short read sequence data and enzymatic library prep protocols provide a cost-effective tool for evaluating, cataloguing and monitoring microbial species and communities. This is an approach that complements, and provides additional insight into, microbial culture “water testing” protocols routinely used for analysing aquatic environments.

Item Beyond BLASTing : ribonucleoprotein evolution via structural prediction and ancestral sequence reconstruction (Massey University, 2016) Daly, Toni K
Primary homology in DNA and protein sequence has long been used to infer a relationship between similar sequences. However, gene sequence, and thus protein sequence, can change over time.
In evolutionary biology that time can be millions of years, and related sequences may become unrecognisable via primary homology. This is demonstrated most effectively in chapter 4a (figure 10). Conversely, the number of possible folds that proteins can adopt is limited by the attractions between residues and is therefore not infinite. This means that folds may arise via convergence between evolutionarily unrelated DNA sequences. This thesis aims to look at a process that will wring more information from the primary protein sequence than is usually used, and find other factors that can support or refute the placement of a protein sequence within the family in question. Two quite different proteins are used to demonstrate the flexibility of the workflow: the Major Vault Protein, whose monomers make up the enigmatic vault particle, and the argonaute family of proteins (AGO and PIWI), which appear to have a major hand in quelling parasitic nucleic acid and in control of endogenous gene expression. Principally, the method relies on prediction of three-dimensional structure. This requires at least a partially solved crystal structure, but once one exists this method should be suitable for any protein. Whole genome sequencing is now a routine practice, but annotation of the resultant sequence lags behind for lack of skilled personnel. Automated pipeline data does a good job in annotating close homologs, but more effort is needed for correct annotation of the exponentially growing data bank of uncharacterised (and wrongly characterised) proteins.
Lastly, in deference to budding biologists the world over, I have tried to find free, stable software that can be used on an ordinary personal computer and by a researcher with minimal computer literacy to help with this task.

Item Exploring biological sequence space : selected problems in sequence analysis and phylogenetics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computational Biology at Massey University (Massey University, 2012) McComish, Bennet James
As the volume and complexity of available sequence data continue to grow at an exponential rate, the need for new sequence analysis techniques becomes more urgent, as does the need to test and to extend the existing techniques. These include, among others, techniques for assembling raw sequence data into usable genomic sequences; for using these sequences to investigate the evolutionary history of genes and species; and for examining the mechanisms by which sequences change over evolutionary time scales. This thesis comprises three projects within the field of sequence analysis. It is shown that organelle genome DNA sequences can be assembled de novo using short Illumina reads from a mixture of samples, and deconvoluted bioinformatically, without the added cost of indexing the individual samples. In the course of this work, a novel sequence element is described that probably could not have been detected with traditional sequencing techniques. The problem of multiple optima of likelihood on phylogenetic trees is examined using biological data. While the prevalence of multiple optima varies widely with real data, trees with multiple optima occur less often among the best trees. Overall, the results provide reassurance that the value of maximum likelihood as a tree selection criterion is not often compromised by the presence of multiple local optima on a single tree.
Fundamental mechanisms of mutation are investigated by estimating nucleotide substitution rate matrices for edges of phylogenetic trees. Several large alignments are examined, and the results suggest that the situation may be more complex than we had anticipated. It is likely that genome scale alignments will have to be used to further elucidate this question.
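
To make the idea of a per-edge substitution matrix concrete, the sketch below tabulates observed nucleotide changes between two aligned sequences standing for the two ends of one tree edge. This is only a count-based illustration under strong assumptions, not the estimation method used in the thesis: it assumes both end sequences are known and aligned, and it yields observed proportions over the edge, not instantaneous rates. The function names are hypothetical.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def substitution_counts(ancestor: str, descendant: str) -> List[List[int]]:
    """Count aligned site pairs; rows index the ancestor base, columns the descendant base."""
    if len(ancestor) != len(descendant):
        raise ValueError("aligned sequences must have equal length")
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, d in zip(ancestor.upper(), descendant.upper()):
        # Sites with gaps or ambiguity codes are skipped.
        if a in index and d in index:
            counts[index[a]][index[d]] += 1
    return counts


def row_proportions(counts: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total, giving observed change proportions per ancestor base."""
    proportions = []
    for row in counts:
        total = sum(row)
        proportions.append([c / total if total else 0.0 for c in row])
    return proportions


# Example: one A-to-G change and one gapped site across eight aligned positions.
counts = substitution_counts("ACGTAC-A", "ACGTGCTA")
print(counts)
print(row_proportions(counts))
```

Turning such counts into a rate matrix requires a substitution model and a fitting procedure, which is where the complexity noted in the abstract arises.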
