Massey Documents by Type
Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294
Search Results
4 results
Item Some statistical techniques for analysing Bluetooth tracking data in traffic modelling : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand (Massey University, 2021) Aslani, Ghazaleh

The economy and the environment are both affected by traffic congestion. People spend time stuck in traffic, which limits their free time. Every city's road infrastructure is under increasing pressure, particularly in large cities, due to population growth and vehicle ownership patterns. Traffic control and management are therefore crucial to reducing congestion and making effective use of existing road infrastructure. Bluetooth is a widely used wireless technology for short-distance data exchange. It allows mobile phones, GPS units, and in-vehicle applications such as navigation systems to connect with the personal devices of drivers and passengers. Each Bluetooth device carries a unique electronic identifier, its Media Access Control (MAC) address. The idea is that, as a Bluetooth-equipped device travels along a road, its MAC address, detection time, and location can be recorded anonymously at different points. Bluetooth technology can be integrated into Intelligent Transportation Systems (ITS) to enable better and more effective traffic monitoring and management, and hence reduce congestion. This thesis aims to develop statistical methods for analysing Bluetooth tracking data in traffic modelling. One of the challenges of using Bluetooth data, particularly for travel time estimation, is multiple Bluetooth detections, which occur when a sensor records the same device several times as it passes through the detection zone.
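The multiple-detection problem can be made concrete with a small sketch (a hypothetical Python illustration with a made-up gap threshold, not code or data from the thesis): repeated records of the same MAC address within a short time gap are collapsed into a single passage through the detection zone.

```python
# Hypothetical illustration (not from the thesis): collapsing multiple
# Bluetooth detections of one device at one sensor into single passages.
# A new detection of the same MAC address within GAP_SECONDS of the
# previous one is treated as part of the same pass through the zone.
GAP_SECONDS = 60  # assumed threshold, chosen for illustration only

def collapse_detections(records):
    """records: list of (mac_address, timestamp_seconds), sorted by time.
    Returns one (mac, first_seen, last_seen) tuple per passage."""
    passages = []
    open_passage = {}  # mac -> index of its most recent passage
    for mac, t in records:
        i = open_passage.get(mac)
        if i is not None and t - passages[i][2] <= GAP_SECONDS:
            passages[i] = (mac, passages[i][1], t)  # extend the passage
        else:
            open_passage[mac] = len(passages)       # start a new passage
            passages.append((mac, t, t))
    return passages

records = [("AA:01", 0), ("AA:01", 20), ("BB:02", 30), ("AA:01", 500)]
print(collapse_detections(records))
# -> [('AA:01', 0, 20), ('BB:02', 30, 30), ('AA:01', 500, 500)]
```

The gaps within a passage (here 20 seconds for AA:01) are the raw material for the gap distribution studied below.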
We employ cluster analysis to examine whether meaningful traffic information can be extracted from multiple detections and from the observed gap distribution, that is, the time differences between records when multiple detections occur. We also develop a novel regression method to investigate the relationship between Bluetooth data and Automatic Traffic Counts (ATCs) through weighted regression analysis, in order to explore potential causes of bias in the representativeness of Bluetooth detections. Finally, we pursue the practical objective of recovering ATCs from Bluetooth data as a statistical calibration problem, based on a new time-varying-coefficients Poisson regression model.

Item Learning statistics at a distance : a thesis presented in partial fulfilment of the requirements for the degree of Master of Educational Studies in Mathematics at Massey University (Massey University, 2001) Curry, Lois Marian

There is evidence from many leading statistics educators that students often find statistics a difficult subject to learn. This is often attributed to the abstract nature of the concepts and the change in thinking required to understand the theory of probability and the innate variation that exists around us. For mature-aged students, these difficulties may be compounded by a lack of basic mathematical skills and anxiety about learning statistics. In addition, learning at a distance may increase the problems students have in gaining a good understanding of the concepts. The purposes of this qualitative study were to determine the value mature-aged students placed on having a compulsory statistics paper in their business or applied science degree, and to record the difficulties these students attributed to their choice of the distance mode of learning, along with their strategies or suggestions for dealing with them. Recommendations for the design of distance courses for mature-aged students are also discussed.
The main findings were:
• The lack of mathematical skills was the main reason that students were tentative about tackling a statistics course. Older students and those with little secondary education may be particularly affected.
• Anxiety was not as extensive as had been reported in overseas studies but is still an issue for statistics educators.
• Almost all students saw value in having a compulsory statistics course in their degree and were aware of the need to interpret data presented to them in their study, work or everyday life.
• The mature-aged students demonstrated good metacognitive skills and other learning strategies. Determination to succeed and high motivation were apparent, although many students found the course unexpectedly difficult.
• Opinions varied about the effectiveness of available resources. Support mechanisms were deemed important, as were some face-to-face component in the statistics course and some flexibility in time-frames.

Item Analysis of complex surveys : a thesis presented in partial fulfillment of the requirements for the degree of Masterate in Science in Statistics at Massey University (Massey University, 1997) Young, Jane

Complex surveys are surveys that involve a survey design other than simple random sampling. In practice, sample surveys often require a complex design because of factors such as cost, time and the nature of the population. Standard statistical methods such as linear regression, contingency tables and multivariate analyses assume data that are independently and identically distributed (IID); that is, the data are assumed to have been selected by a simple random sampling design. These assumptions are generally not met when the data come from a complex design. A measure of the efficiency of a design is given by the ratio of the variance under the actual design to the variance under a simple random sample of the same sample size.
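The variance ratio just described can be estimated by a minimal simulation (illustrative only, with made-up numbers, not from the thesis): draw many repeated samples under each design, compute the sample mean each time, and take the ratio of the two empirical variances.

```python
# Hypothetical simulation (not from the thesis): estimating
# deff = Var_design(estimate) / Var_SRS(estimate) for the sample mean
# under one-stage cluster sampling, where units in the same cluster
# are more alike than units drawn completely at random.
import random

random.seed(1)
# Assumed population: 50 clusters of 20 units; units share a
# cluster-level shift, so within-cluster values are correlated.
population = []
for _ in range(50):
    shift = random.gauss(0, 3)
    population.append([shift + random.gauss(0, 1) for _ in range(20)])
all_units = [y for cluster in population for y in cluster]

def cluster_mean():   # sample 5 whole clusters (n = 100 units)
    picked = random.sample(population, 5)
    units = [y for cl in picked for y in cl]
    return sum(units) / len(units)

def srs_mean():       # simple random sample of the same size
    units = random.sample(all_units, 100)
    return sum(units) / len(units)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

reps = 2000
deff = variance([cluster_mean() for _ in range(reps)]) \
     / variance([srs_mean() for _ in range(reps)])
print(deff)  # well above 1 here, reflecting within-cluster similarity
```

With these assumed numbers the cluster design is far less efficient than SRS of the same size, which is exactly the pattern the abstract describes.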
This ratio is known as the design effect (deff). There are two forms of design effect: one proposed by Kish (1965) and another, termed the misspecification effect (meff), by Skinner et al. (1989). Throughout the thesis, the design effect referred to is the misspecification effect of Skinner et al. (1989). Cluster sampling generally yields a deff greater than one, and stratified sampling yields a deff less than one. Some researchers have adopted a model-based approach to parameter estimation rather than the traditional design-based approach. In the model-based approach, each possible respondent has a distribution of possible values, often leading to the equivalent of an infinite background population, called the superpopulation. Both approaches are discussed throughout the thesis. Most standard computing packages have been developed for simple random sample data; specialized packages, such as PC CARP and SUDAAN, are needed to analyse complex survey data correctly. Three examples of statistical analyses of complex sample surveys were explored using these specialized packages, and their output was compared with that of a standard statistical package, the SAS System. Although SAS produced the correct point estimates, its standard errors were much smaller than those from SUDAAN. In regression, for example, this led to a much larger number of variables appearing to be significant when they were not. The examples illustrate the consequences of using a standard statistical package on complex data. Statisticians have long argued the need for appropriate statistics for complex surveys.

Item An analysis of the missing data methodology for different types of data : a thesis presented in partial fulfilment of the requirements for the degree of Master of Applied Statistics at Massey University (Massey University, 2000) Scheffer, Judith-Anne

Missing data is a perennial problem in data analysis.
It is widely recognised that data are costly to collect, yet methods for dealing with missing data have in the past relied on case deletion. There is no single best fix, but rather many different methodologies suited to different situations. This study was motivated by the writer's time spent analysing data in a nutrition study, realising how much data was wasted by case deletion and how this could bias the inferences drawn from the results. A better method (or methods) of dealing with missing data than case deletion is required, to ensure that valuable information is not lost. The literature on this topic has grown rapidly in recent times, and algorithms based on the new methods have been written and incorporated into a number of statistical packages and add-on libraries. These packages are reviewed for their practicality and application in this area. The nutrition data is then analysed with different methodologies and software packages to assess different types of imputation. A set of questions is posed, based on the type of data, the type and extent of missingness, the intended end use of the data, the size of the dataset, and how extensive the analysis needs to be. These questions can guide the investigator towards an appropriate form of imputation for the data at hand. A comparison of imputation methods and results is given, with the principal finding that imputing missing data is a very worthwhile exercise for reducing bias in survey results, and one that can be carried out by any researcher analysing their own data. Further to this, a conjecture is given for using Data Augmentation for ordinal data, particularly Likert scales. Previously such data have been restricted to person or item mean imputation, or hot-deck methods, while model-based methods of imputation are far superior for other types of data.
Model-based methods for Likert data are achieved by inserting the linear-by-linear association model into standard missing data methodology.
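The cost of case deletion that motivated this thesis can be made concrete with a back-of-envelope calculation (illustrative only, not the nutrition data): under listwise deletion, a row survives only if every one of its cells is observed, so even modest per-column missingness compounds quickly.

```python
# Illustrative only (not from the thesis): how quickly listwise case
# deletion discards rows when missingness is scattered across columns.
# With p columns, each independently missing with probability m, a row
# is complete with probability (1 - m) ** p.
def complete_case_fraction(p_columns, miss_rate):
    return (1 - miss_rate) ** p_columns

for p in (1, 5, 10, 20):
    print(p, round(complete_case_fraction(p, 0.05), 2))
# Even at 5% missingness per column, 20 columns leave only ~36% of
# rows complete -- the rest are thrown away by case deletion.
```

This is the information loss that imputation methods such as Data Augmentation aim to avoid.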
