Scalable, high-performance, and generalized subtree data anonymization approach for Apache Spark

Bazai SU; Jang-Jaccard J; Alavizadeh H

doi:10.3390/electronics10050589

dc.citation.issue	5
dc.citation.volume	10
dc.contributor.author	Bazai SU
dc.contributor.author	Jang-Jaccard J
dc.contributor.author	Alavizadeh H
dc.contributor.editor	Guitart J
dc.date.accessioned	2023-11-14T23:52:42Z
dc.date.accessioned	2023-11-20T01:37:56Z
dc.date.available	2021-03-03
dc.date.available	2023-11-14T23:52:42Z
dc.date.available	2023-11-20T01:37:56Z
dc.date.issued	2021-03-03
dc.description.abstract	Data anonymization strategies such as subtree generalization have been hailed as techniques that provide a more efficient generalization strategy than their full-tree generalization counterparts. Many subtree-based generalization strategies (e.g., top-down, bottom-up, and hybrid) have been implemented on the MapReduce platform to take advantage of its scalability and parallelism. However, MapReduce inherently lacks support for iteration-intensive algorithms such as subtree generalization. This paper proposes a Resilient Distributed Dataset (RDD)-based implementation of a subtree-based data anonymization technique for Apache Spark to address the issues associated with MapReduce-based counterparts. We describe our RDD-based approach, which offers effective partition management, improved memory usage through caching of frequently referenced intermediate values, and enhanced iteration support. Our experimental results show high performance compared to existing state-of-the-art privacy-preserving approaches while ensuring the data utility and privacy levels required of any competitive data anonymization technique.
dc.description.confidential	false
dc.edition.edition	March 2021
dc.format.pagination	1-28
dc.identifier.author-url	http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000628013800001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=c5bb3b2499afac691c2e3c1a83ef6fef
dc.identifier.citation	Bazai SU, Jang-Jaccard J, Alavizadeh H. (2021). Scalable, high-performance, and generalized subtree data anonymization approach for Apache Spark. Electronics (Switzerland). 10(5). (pp. 1-28).
dc.identifier.doi	10.3390/electronics10050589
dc.identifier.eissn	2079-9292
dc.identifier.elements-type	journal-article
dc.identifier.number	ARTN 589
dc.identifier.uri	https://mro.massey.ac.nz/handle/10179/69139
dc.language	English
dc.publisher	MDPI (Basel, Switzerland)
dc.publisher.uri	https://www.mdpi.com/2079-9292/10/5/589
dc.relation.isPartOf	Electronics (Switzerland)
dc.rights	(c) 2021 The Author/s
dc.rights	CC BY
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Spark
dc.subject	subtree generalization
dc.subject	privacy
dc.subject	data anonymization
dc.subject	Resilient Distributed Dataset (RDD)
dc.title	Scalable, high-performance, and generalized subtree data anonymization approach for Apache Spark
dc.type	Journal article
pubs.elements-id	441754
pubs.organisational-group	Other

Files

Original bundle

Name: Published version.pdf
Size: 721.49 KB
Format: Adobe Portable Document Format

Collections

Journal Articles