Scalable, high-performance, and generalized subtree data anonymization approach for Apache Spark

dc.citation.issue: 5
dc.citation.volume: 10
dc.contributor.author: Bazai SU
dc.contributor.author: Jang-Jaccard J
dc.contributor.author: Alavizadeh H
dc.contributor.editor: Guitart J
dc.date.accessioned: 2023-11-14T23:52:42Z
dc.date.accessioned: 2023-11-20T01:37:56Z
dc.date.available: 2021-03-03
dc.date.available: 2023-11-14T23:52:42Z
dc.date.available: 2023-11-20T01:37:56Z
dc.date.issued: 2021-03-03
dc.description.abstract: Data anonymization strategies such as subtree generalization have been hailed as more efficient than their full-tree generalization counterparts. Many subtree-based generalization strategies (e.g., top-down, bottom-up, and hybrid) have been implemented on the MapReduce platform to take advantage of its scalability and parallelism. However, MapReduce inherently lacks support for iteration-intensive algorithms such as subtree generalization. This paper proposes a Resilient Distributed Dataset (RDD)-based implementation of a subtree-based data anonymization technique for Apache Spark to address the issues associated with MapReduce-based counterparts. We describe our RDD-based approach, which offers effective partition management, improved memory usage through caching of frequently referenced intermediate values, and enhanced iteration support. Our experimental results show high performance compared to existing state-of-the-art privacy-preserving approaches while ensuring the data utility and privacy levels required of any competitive data anonymization technique.
dc.description.confidential: false
dc.edition.edition: March 2021
dc.format.pagination: 1-28
dc.identifier.author-url: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000628013800001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=c5bb3b2499afac691c2e3c1a83ef6fef
dc.identifier.citation: Bazai SU, Jang-Jaccard J, Alavizadeh H. (2021). Scalable, high-performance, and generalized subtree data anonymization approach for apache spark. Electronics (Switzerland). 10. 5. (pp. 1-28).
dc.identifier.doi: 10.3390/electronics10050589
dc.identifier.eissn: 2079-9292
dc.identifier.elements-type: journal-article
dc.identifier.number: ARTN 589
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/69139
dc.language: English
dc.publisher: MDPI (Basel, Switzerland)
dc.publisher.uri: https://www.mdpi.com/2079-9292/10/5/589
dc.relation.isPartOf: Electronics (Switzerland)
dc.rights: (c) 2021 The Author/s
dc.rights: CC BY
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Spark
dc.subject: subtree generalization
dc.subject: privacy
dc.subject: data anonymization
dc.subject: Resilient Distributed Dataset (RDD)
dc.title: Scalable, high-performance, and generalized subtree data anonymization approach for Apache Spark
dc.type: Journal article
pubs.elements-id: 441754
pubs.organisational-group: Other
Files
Original bundle
Name: Published version
Size: 721.49 KB
Format: Adobe Portable Document Format