Maximising the effectiveness of threat responses using data mining : a piracy case study : this thesis presented in partial fulfillment of the requirements for the degree of Master of Information Sciences in Information Technology, School of Engineering and Advanced Technology at Massey University, Albany, Auckland, New Zealand

Massey University
The Author
Companies with limited budgets must decide how best to defend against threats. This thesis develops a robust approach to grouping together the threats that present the highest (and lowest) risk, using film piracy as a case study. Techniques such as cluster analysis can be used effectively to group sites based on a wide range of attributes, such as income earned per day and estimated worth. The attributes of high-earning and low-earning websites could also give useful insight into policy options that might be effective in reducing the earnings of pirate websites. For instance, are all low-value sites based in a country with effective internet controls? A practical data mining technique such as a decision tree (classification tree) could help rightsholders to interpret these attributes. The data were analysed to answer three main research questions. It was found that, as predicted, the most complained-about sites fell into two natural clusters (high income and low income). This means that rightsholders should focus their efforts and resources on the high-income sites only, and ignore the others. It was also found that the key variables separating high-income from low-income rogue websites included daily page views, the number of internal and external links, social media shares (i.e. social network engagement) and elements of the page structure, including HTML page size and JavaScript size. Further research should investigate why these factors were important in driving website revenue higher. For instance, why is high revenue associated with smaller HTML pages and less JavaScript? Is it because the pages are simply faster to load? A similar pattern is observed with the number of links. These results could form the basis of a study into what attributes make e-commerce successful more broadly.
It is important to note that this was a preliminary study looking only at the top 20 rogue websites drawn from the Google Transparency Report (2015). Whilst these account for the majority of complaints, a different picture may emerge if more sites were analysed, and/or if they were selected using different criteria, such as the time period, geographic location, content category (software versus movies, for example), and so on. Future research should also extend the clustering technique to other security domains.
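The two-cluster grouping described in the abstract can be illustrated with a minimal sketch. It is not the method or data used in the thesis: the site names and figures are invented, the choice of k-means (Lloyd's algorithm) with k = 2 is an assumption, and only two of the attributes mentioned above (daily income and daily page views) are used.

```python
from math import dist

# Hypothetical (daily_income_usd, daily_page_views) per site.
# These values are invented for illustration and are not from the thesis.
SITES = {
    "site_a": (5200.0, 910_000),
    "site_b": (4800.0, 870_000),
    "site_c": (6100.0, 1_050_000),
    "site_d": (310.0, 42_000),
    "site_e": (150.0, 18_000),
    "site_f": (420.0, 61_000),
}


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single attribute dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def two_means(rows, iterations=20):
    """k-means with k=2, seeded with the rows having the lowest and highest first attribute."""
    ordered = sorted(rows)
    centroids = [ordered[0], ordered[-1]]  # cluster 0 starts low, cluster 1 starts high
    labels = []
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [min((0, 1), key=lambda c: dist(row, centroids[c])) for row in rows]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


names = list(SITES)
labels = two_means(min_max_scale([SITES[n] for n in names]))
clusters = {"low_income": [], "high_income": []}
for name, lab in zip(names, labels):
    clusters["high_income" if lab == 1 else "low_income"].append(name)
print(clusters)
```

With real data, the attributes would need the same kind of rescaling, and the resulting cluster labels could then serve as the target for a classification tree to see which attributes separate the two groups.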
Computer security, Computer networks, Security measures, Internet, Data mining