Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

  • Item
    Improving the robustness and privacy of HTTP cookie-based tracking systems within an affiliate marketing context : a thesis presented in fulfilment of the requirements for the degree of Doctor of Philosophy at Massey University, Albany, New Zealand
    (Massey University, 2021) Amarasekara, Bede Ravindra
    E-commerce activities provide a global reach for enterprises large and small. Third parties generate visitor traffic for a fee through affiliate marketing, search engine marketing, keyword bidding and organic search, amongst others. Improving the robustness of the underlying tracking and state management techniques is therefore a vital requirement for the growth and stability of e-commerce. In an inherently stateless ecosystem such as the Internet, HTTP cookies have been the de facto tracking vector for decades. In a previous study, the thesis author exposed circumstances under which cookie-based tracking systems can fail, some due to technical glitches, others due to manipulations made for monetary gain by fraudulent actors. Following a design science research paradigm, this research explores alternative tracking vectors discussed in previous research studies within a cross-domain tracking environment. It evaluates their efficacy within the current context and demonstrates how to use them to improve the robustness of existing tracking techniques. Research outputs include methods, instantiations and a privacy model artefact based on the information-seeking behaviour of different categories of tracking software and their resulting privacy intrusion levels. This privacy model provides clarity and is useful for practitioners and regulators in creating regulatory frameworks that do not hinder technological advancement but instead curtail privacy-intrusive tracking practices on the Internet. The method artefacts are instantiated as functional prototypes, publicly available on the Internet, to demonstrate the efficacy and utility of the methods through live tests. The research contributes to the theoretical knowledge base through generalisation of empirical findings and to industry through problem-solving design artefacts.
  • Item
    Building privacy-preservation models for distributed processing platforms : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy (Ph.D.) in Computer Science, Massey University, New Zealand
    (Massey University, 2020) Bazai, Sibghat Ullah
    The widespread proliferation of data collection has raised serious privacy concerns in recent years. Data anonymization approaches have been proposed as a technique to preserve the privacy of data. However, most existing data anonymization approaches have been designed to work with a small number of datasets within a single-machine environment and are thus often not suitable for big data. To resolve these limitations, many scalable data anonymization solutions that work with distributed processing platforms (e.g., MapReduce and Spark) have emerged to take advantage of the scalability and other support required for big data. However, due to the lack of inherent support for the algorithms involved in data anonymization techniques, these existing proposals often encounter many implementation and performance bottlenecks. In the studies presented in this thesis, we propose a set of novel data anonymization approaches that work well on the most popular distributed processing platforms for big data, such as MapReduce and Spark. Our first set of studies addresses the privacy concerns that arise when the MapReduce platform processes sensitive data without appropriate privacy protection, which may allow adversaries to break two very important security principles: data confidentiality and integrity. Firstly, we propose a privacy-preservation platform as an extra layer on MapReduce that provides a set of privacy services to produce different sets of privacy-preserving anonymized datasets that can be safely processed by MapReduce. Secondly, we offer a privacy-preserving $k$-NN based classifier for MapReduce. Instead of working with plaintext, our $k$-NN classifier can work on any anonymized dataset, protecting the privacy of the input data while still providing accurate classification results. In our second set of studies, we address the concerns in Apache Spark, which lacks appropriate support for many popular data anonymization techniques.
    We first investigate the types of support required by many data anonymization approaches, which often demand multiple read and write operations. We argue that existing approaches fail to provide support for caching intermediate data in memory, which is found to contribute to performance degradation. To address this problem, we propose a Resilient Distributed Dataset (RDD) based data anonymization model that avoids expensive disk I/O. We also argue that many existing methods do not provide support for the iteration-intensive operations that appear in many data anonymization techniques, such as subtree generalization. We propose a generic approach for implementing subtree-based data anonymization techniques on Spark that provides more effective support for iteration-intensive operations. Extending this, we also provide a novel hybrid approach that can more effectively apply different data anonymization techniques to multi-dimensional data. We argue that our hybrid approach offers much better control over the expensive RDD creation and the size of the partitions attached to each RDD, which is much better suited to reducing overheads such as those involved in re-computation, shuffle operations, message exchange, and cache management. The experimental studies confirm that our novel privacy-preserving models implemented on both MapReduce and Spark provide high performance and scalability while supporting high levels of data privacy and utility.
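
To illustrate why the abstract describes subtree generalization as iteration-intensive, the following is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation. The taxonomy `PARENT`, the helper names, and the stopping rule (every distinct value must occur at least `k` times, a k-anonymity-style threshold) are all assumptions made for illustration. Each pass recounts the data and rewrites it, which is the kind of repeated read-and-write pattern that benefits from keeping intermediate results in memory.

```python
from collections import Counter

# Hypothetical taxonomy for a single attribute: child -> parent.
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}


def is_in_subtree(value, node):
    """Return True if `node` is `value` itself or one of its ancestors."""
    while value is not None:
        if value == node:
            return True
        value = PARENT.get(value)
    return False


def subtree_generalize(values, k):
    """Generalize values upward until each distinct value occurs >= k times.

    Assumed stopping rule (not taken from the thesis): a value is "rare"
    if it occurs fewer than k times and still has a parent to move to.
    """
    values = list(values)  # intermediate state is kept in memory between passes
    while True:
        counts = Counter(values)
        rare = [v for v, c in counts.items() if c < k and v in PARENT]
        if not rare:
            return values
        # Subtree rule: lifting one value to its parent also lifts every
        # other value that sits underneath that parent.
        target = PARENT[rare[0]]
        values = [target if is_in_subtree(v, target) else v for v in values]


jobs = ["Engineer", "Engineer", "Lawyer", "Dancer", "Dancer", "Writer", "Writer"]
print(subtree_generalize(jobs, k=2))
```

With `k=2`, only the lone `Lawyer` is rare, so the whole `Professional` subtree is generalized while `Dancer` and `Writer` are left unchanged. On a platform such as Spark, each pass of the loop would correspond to a transformation over the dataset, which is where caching the intermediate data would matter.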