Browsing by Author "Susnjak T"
- A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench (BioMed Central Ltd, 2020-12-14). Ahmed N; Barczak ALC; Susnjak T; Rashid MA. Big Data analytics for storing, processing, and analyzing large-scale datasets has become an essential tool for industry. The advent of distributed computing frameworks such as Hadoop and Spark offers efficient solutions for analyzing vast amounts of data. Owing to its application programming interface (API) availability and its performance, Spark has become very popular, even more so than the MapReduce framework. Both frameworks have more than 150 parameters, and the combination of these parameters has a massive impact on cluster performance. The default system parameters let system administrators deploy their applications without much effort, and they can measure their specific cluster performance with factory-set parameters. However, an open question remains: can a new parameter selection improve cluster performance for large datasets? To this end, this study investigates the most impactful parameters, covering resource utilization, input splits, and shuffle, to compare the performance of Hadoop and Spark on a cluster implemented in our laboratory. We used a trial-and-error approach for tuning these parameters, based on a large number of experiments. For the comparative analysis, we selected two workloads: WordCount and TeraSort. Performance was measured on three criteria: execution time, throughput, and speedup. Our experimental results revealed that the performance of both systems depends heavily on input data size and correct parameter selection. The analysis shows that Spark performs better than Hadoop when data sets are small, achieving up to a two-fold speedup in WordCount workloads and up to a 14-fold speedup in TeraSort workloads when the default parameter values are reconfigured.
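The three criteria in the study above are simple ratios over measured execution times. As a rough sketch (the function names and sample figures below are illustrative only, not measurements from the paper), they can be computed as:

```python
def throughput_mb_per_s(input_size_mb, execution_time_s):
    """Throughput: volume of data processed per unit of time."""
    return input_size_mb / execution_time_s

def speedup(baseline_time_s, tuned_time_s):
    """Speedup of a tuned configuration over the default-parameter baseline."""
    return baseline_time_s / tuned_time_s

# Hypothetical TeraSort run: default parameters vs. a tuned configuration.
default_time, tuned_time = 840.0, 60.0   # seconds, illustrative only
print(speedup(default_time, tuned_time))  # 14.0
```

A 14-fold speedup of this order is what the abstract reports for TeraSort after reconfiguring default parameter values.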
- Assessment of the local Tchebichef moments method for texture classification by fine-tuning extraction parameters (arXiv, 2019-06-01). Barczak A; Reyes N; Susnjak T.
- Data Quality Challenges in Educational Process Mining: Building Process-Oriented Event Logs from Process-Unaware Online Learning Systems (Inderscience, 2022-05-04). Umer R; Susnjak T; Mathrani A; Suriadi S. Educational process mining utilizes process-oriented event logs to enable the discovery of learning practices that can be used to the learner's advantage. However, learning platforms are often process-unaware and therefore do not accurately reflect ongoing learner interactions. We demonstrate how contextually relevant process models can be constructed from process-unaware systems. Using a popular learning management system (Moodle), we extracted stand-alone activities from the underlying database and formatted them to link the learners' data explicitly to process instances (cases). With a running example that describes quiz-taking activities undertaken by students, we describe how learner interactions can be captured to build process-oriented event logs. This article contributes to the fields of learning analytics and educational process mining by providing lessons learned on the extraction and conversion of process-unaware data to event logs for the purpose of analysing online education data.
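The core conversion described in the event-log work above, turning flat, process-unaware activity records into time-ordered process instances, can be sketched as follows. The field names (`student`, `quiz`, `action`, `timestamp`) are illustrative stand-ins, not Moodle's actual database schema:

```python
from collections import defaultdict

def build_event_log(records):
    """Group flat activity records into process instances (cases).

    Each record is a dict with 'student', 'quiz', 'action' and 'timestamp'.
    A case is one student's interaction with one quiz; events within a
    case are ordered by timestamp, as a process-mining tool would expect.
    """
    cases = defaultdict(list)
    for r in records:
        case_id = f"{r['student']}-{r['quiz']}"   # explicit case notion
        cases[case_id].append((r['timestamp'], r['action']))
    return {cid: [a for _, a in sorted(evs)] for cid, evs in cases.items()}

# Stand-alone activities arrive unordered from the database.
log = build_event_log([
    {'student': 's1', 'quiz': 'q1', 'action': 'view',    'timestamp': 1},
    {'student': 's1', 'quiz': 'q1', 'action': 'submit',  'timestamp': 3},
    {'student': 's1', 'quiz': 'q1', 'action': 'attempt', 'timestamp': 2},
])
# log['s1-q1'] == ['view', 'attempt', 'submit']
```

The key design point is making the case identifier explicit: process-unaware systems store each click as an isolated row, and it is the analyst who must decide what constitutes a process instance.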
- Fat stigma and body objectification: A text analysis approach using social media content (SAGE Publications, 2022-08-15). Wanniarachchi V; Scogings C; Susnjak T; Mathrani A. This study investigates how the female and male genders are positioned in fat-stigmatising discourses conducted over social media. A corpus of weight-based linguistic data, extracted from three popular social media (SM) outlets (Twitter, YouTube and Reddit), was examined for fat-stigmatising content. A mixed-method analysis comprising sentiment analysis, word co-occurrences and qualitative analysis assisted our investigation of the corpus for body objectification themes and gender-based differences. Objectification theory provided the underlying framework to examine the experiential consequences of being fat across both genders. Five objectifying themes emerged from the analysis: attractiveness, physical appearance, lifestyle choices, health, and psychological well-being. A deeper investigation into further facets of the social interaction data revealed overall positive and negative attitudes towards obesity, which informed existing notions of gendered body objectification and weight/fat stigmatisation. Our findings provide a holistic outlook on the weight/fat-stigmatising content posted online, which can further inform policymakers in planning suitable supports to facilitate more inclusive SM spaces. This study showcases how lexical analytics can be conducted by combining a variety of data mining methods to draw out insightful subject-related themes that add to the existing knowledge base; it therefore has both practical and theoretical implications.
- Hate Speech Patterns in Social Media: A Methodological Framework and Fat Stigma Investigation Incorporating Sentiment Analysis, Topic Modelling and Discourse Analysis (Australasian Association for Information Systems and Australian Computer Society, 2023-02-08). Wanniarachchi V; Scogings C; Susnjak T; Mathrani A. Social media offers users an online platform to express themselves freely; however, when users post opinionated and offensive comments that target certain individuals or communities, this can instigate animosity towards them. Widespread condemnation of obesity (fatness) has led to much fat-stigmatizing content being posted online. We propose a methodological framework that uses a novel mixed-method approach for unearthing hate speech patterns from large text-based corpora gathered from social media. We explain the use of computer-mediated quantitative methods comprising natural language processing techniques such as sentiment analysis, emotion analysis and topic modelling, along with qualitative discourse analysis. We then apply the framework to a corpus of gendered and weight-based texts extracted from Twitter and Reddit. This assisted in detecting the different emotions being expressed, the composition of word frequency patterns, and the broader fat-based themes underpinning the hateful content posted online. The framework provides a synthesis of quantitative and qualitative methods that draws on social science and data mining techniques to build real-world knowledge in hate speech detection. Current information systems research is limited in its use of mixed analytic approaches for studying hate speech in social media. Our study therefore contributes to future research by establishing a roadmap for conducting mixed-method analyses for a better comprehension of hate speech patterns.
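The quantitative leg of frameworks like the two above typically starts with sentiment scoring of each post. A minimal lexicon-based sketch is shown below; the tiny lexicon is a made-up placeholder, whereas real studies would use a validated resource (e.g. a tool such as VADER) rather than this toy:

```python
# Toy polarity lexicon -- illustrative only, not a validated resource.
LEXICON = {'love': 1, 'great': 1, 'support': 1,
           'hate': -1, 'ugly': -1, 'lazy': -1}

def sentiment_score(text):
    """Mean polarity of lexicon words in a post; 0.0 when none are present.

    Scores in [-1, 1] let posts be bucketed into negative, neutral and
    positive groups before deeper qualitative discourse analysis.
    """
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment_score("great support all round"))  # 1.0
```

In a mixed-method pipeline, such scores only triage the corpus; the stigmatising themes themselves still come from topic modelling and manual discourse analysis.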
- Learning analytics dashboard: a tool for providing actionable insights to learners (Springer Open, 2022-02-14). Susnjak T; Ramaswami G; Mathrani A. This study investigates current approaches to learning analytics (LA) dashboarding while highlighting challenges faced by education providers in their operationalization. We analyze recent dashboards for their ability to provide actionable insights that promote informed responses by learners in adjusting their learning habits. Our study finds that most LA dashboards merely employ surface-level descriptive analytics, while only a few go beyond this to use predictive analytics. In response to the identified gaps in recently published dashboards, we propose a state-of-the-art dashboard that not only leverages descriptive analytics components but also integrates machine learning in a way that enables both predictive and prescriptive analytics. We demonstrate how emerging analytics tools can be used to enable learners to adequately interpret predictive model behavior, and more specifically to understand how a predictive model arrives at a given prediction. We highlight how these capabilities build trust and satisfy emerging regulatory requirements surrounding predictive analytics. Additionally, we show how data-driven prescriptive analytics can be deployed within dashboards to provide concrete advice to learners, thereby increasing the likelihood of triggering behavioral changes. Our proposed dashboard is the first of its kind in terms of the breadth of analytics that it integrates, and it is currently deployed for trials at a higher education institution.
- Masquerade Attacks Against Security Software Exclusion Lists (AJIIPS, 2019). McIntosh T; Jang-Jaccard J; Watters P; Susnjak T. Security software, commonly known as antivirus, has evolved from simple virus scanners into multi-functional security suites. To combat ever-growing malware threats, modern security software utilizes both static and dynamic analysis to assess malware threats, inevitably leading to occasional false positive and false negative reports. To mitigate this, existing state-of-the-art security software offers an Exclusion List feature that allows users to exclude specified files and folders from being scanned or monitored. Through rigorous evaluation, however, we found that some such products stored their Exclusion Lists as unencrypted cleartext in known or predictable locations. In this paper we empirically demonstrate how easy it is to exploit these Exclusion Lists by launching masquerade attacks. We argue that Exclusion Lists should be implemented more robustly, for example through application whitelisting, with their contents better safeguarded and readable only by authorized entities within a strong access control scheme.
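The weakness exploited above is easy to probe for: if an unprivileged process can read the exclusion list, an attacker can learn which folders are unmonitored and drop payloads there. A minimal check is sketched below; the file path and contents are simulated stand-ins, not any real product's configuration:

```python
import os
import tempfile

def exclusion_list_exposed(path):
    """Return True if the file exists and is readable by the current process.

    A cleartext exclusion list at a predictable path fails this check:
    any local process can enumerate the excluded folders and masquerade
    malware inside them. (The path handling here is illustrative.)
    """
    return os.path.isfile(path) and os.access(path, os.R_OK)

# Simulate a product writing its exclusion list as unencrypted cleartext.
with tempfile.NamedTemporaryFile('w', suffix='.cfg', delete=False) as f:
    f.write("exclude=C:\\Games\\\n")
    predictable_path = f.name

print(exclusion_list_exposed(predictable_path))  # True
```

The remediation the paper argues for, strong access control plus safeguarded contents, amounts to making this check return False for every entity except the security product itself.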
- Methodological Aspects in Study of Fat Stigma in Social Media Contexts: A Systematic Literature Review (MDPI (Basel, Switzerland), 2022-05-17). Wanniarachchi VU; Mathrani A; Susnjak T; Scogings C; Moreno A. With increased obesity rates worldwide and the rising popularity of social media, we have witnessed a growth in hate speech towards fat/obese people. The severity of this hate content has prompted researchers to study the public perceptions that give rise to fat stigma in social media discourses. This article presents a systematic literature review of recent work in this domain to gauge the current state of research and identify possible research gaps. We examined existing research (i.e., peer-reviewed articles that were systematically included using the EBSCO discovery service) to study its methodological aspects, reviewing context, domain, analytical methods, techniques, tools, features and limitations. Our findings reveal that while recent studies have explored fat stigma content in social media, most applied manual analytical methods, despite the availability of mature machine learning, natural language processing and deep learning methods. Although fat stigma in social media has gained enormous attention in current socio-psychological research, a gap exists between how such research is conducted and what technologies are being applied, which limits in-depth investigations of fat stigma discussions.
- On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining (MDPI, 2022-01-07). Ramaswami G; Susnjak T; Mathrani A; Cowling M; Jha M. Poor academic performance of students is a concern in the educational sector, especially when it leads to students being unable to meet minimum course requirements. With timely prediction of students' performance, however, educators can detect at-risk students, enabling early interventions to support these students in overcoming their learning difficulties. The majority of studies have taken the approach of developing individual prediction models that each target a single course. These models are tailored to the specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, the strategy has limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models, one per course, is a significant overhead. This issue can be tackled by developing a generic, course-agnostic predictive model that captures more abstract patterns and can operate across all courses, irrespective of their differences. This study demonstrates how such a generic predictive model can be developed to identify at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing effective accuracy. The findings showed that the CatBoost algorithm performed best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; it is therefore an excellent candidate algorithm for providing solutions in this domain, given its ability to seamlessly handle the categorical and missing data that frequently feature in educational datasets.
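The AUC metric used to compare the course-agnostic models above has a simple probabilistic reading: the chance that a randomly chosen at-risk student receives a higher risk score than a randomly chosen on-track student. A small self-contained sketch (the sample labels and scores are made up for illustration):

```python
def auc(labels, scores):
    """AUC via the rank statistic: the probability that a random positive
    outranks a random negative, with ties counting half. This is the
    quantity the area under a ROC curve estimates."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical at-risk scores pooled across several courses:
# every at-risk student (label 1) outranks every on-track student (label 0).
print(auc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))  # 1.0
```

Because it depends only on score rankings, AUC is well suited to comparing models trained on pooled, heterogeneous course data, where raw score scales may differ.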
- Perspectives on the challenges of generalizability, transparency and ethics in predictive learning analytics (Elsevier Ltd., 2021-11-20). Mathrani A; Susnjak T; Ramaswami G; Barczak A. Educational institutions need to formulate a well-established, data-driven plan to get long-term value from their learning analytics (LA) strategy. By tracking learners' digital traces and measuring learners' performance, institutions can discern consequential learning trends through predictive models and so enhance their instructional services. However, questions remain about whether a proposed LA system is suitable, meaningful, and justifiable. In this concept paper, we examine the generalizability and transparency of the internals of predictive models, alongside the ethical challenges of using learners' data to build predictive capabilities. Model generalizability, or transferability, is hindered by inadequate feature representation, small and imbalanced datasets, concept drift, and contextually unrelated domains. Additional challenges relate to the trustworthiness and social acceptance of these models, since algorithm-driven models are difficult to interpret by themselves. Further, ethical dilemmas arise in engaging with learners' data while developing and deploying LA systems at an institutional level. We propose methodologies for addressing these challenges by establishing efforts for managing transferability and transparency, and by assessing the ethical standing of justifiable use of the LA strategy. This study showcases the underlying relationships that exist between constructs pertaining to learners' data and the predictive model. We suggest the use of appropriate evaluation techniques and the establishment of research ethics protocols, since without proper controls in place, the model outcome would not be portable, transferable, trustworthy, or admissible as a responsible outcome. This concept paper has theoretical and practical implications for future inquiry in the burgeoning field of learning analytics.
- Supporting Students' Academic Performance Using Explainable Machine Learning with Automated Prescriptive Analytics (MDPI, 2022-10-08). Ramaswami G; Susnjak T; Mathrani A. Learning analytics (LA) refers to the use of students' interaction data within educational environments to enhance teaching and learning. To date, the major focus in LA has been on descriptive and predictive analytics; prescriptive analytics, however, is now seen as a future area of development. Prescriptive analytics is the next step towards increasing LA maturity, leading to proactive decision-making for improving students' performance. It aims to provide data-driven suggestions to students who are at risk of non-completion or other sub-optimal outcomes. These suggestions are based on what-if modeling, which leverages machine learning to identify the minimal changes to a student's behavioral and performance patterns that would be required to realize a more desirable outcome. The results of the what-if modeling lead to precise suggestions that can be converted into evidence-based advice for students. Existing studies in the educational domain have, until now, predicted students' performance without taking the further steps of explaining the predictive decisions or exploring prescriptive modeling. Our proposed method extends much of the work performed in this field to date. Firstly, we demonstrate the use of model explainability using anchors to provide the reasons and reasoning behind predictive decisions, enabling the transparency of predictive models. Secondly, we show how prescriptive analytics based on what-if counterfactuals can be used to automate student feedback.
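The what-if counterfactual idea in the prescriptive-analytics work above can be sketched with a deliberately naive search: nudge one behavioral feature until a model's prediction flips from at-risk to on-track. The toy model, feature name and threshold below are invented for illustration; real systems use trained models and dedicated counterfactual-explanation libraries:

```python
def whatif_counterfactual(predict, features, key, step=1.0, max_steps=50):
    """Naive what-if search over a single feature.

    Increases `features[key]` by `step` until `predict` flips from
    at-risk (0) to on-track (1), returning the modified feature dict,
    or None if no flip occurs within the search budget.
    """
    trial = dict(features)
    for _ in range(max_steps):
        if predict(trial) == 1:
            return trial  # minimal change found at this step granularity
        trial[key] += step
    return None

# Toy stand-in model: a student is on track once weekly LMS logins reach 5.
model = lambda f: 1 if f['logins_per_week'] >= 5 else 0

advice = whatif_counterfactual(model, {'logins_per_week': 2}, 'logins_per_week')
# advice == {'logins_per_week': 5.0}
```

The returned counterfactual translates directly into the kind of evidence-based advice the abstract describes, e.g. "log in about three more times per week".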
- Use of Predictive Analytics within Learning Analytics Dashboards: A Review of Case Studies (Springer, 2023-09-01). Ramaswami G; Susnjak T; Mathrani A; Umer R. Learning analytics dashboards (LADs) provide educators and students with a comprehensive snapshot of the learning domain. Visualizations showcasing students' learning behavioral patterns can help students gain greater self-awareness of their learning progression, and at the same time assist educators in identifying students who may be facing learning difficulties. While LADs have gained popularity, existing LADs still lag when it comes to incorporating predictive analytics into their designs. Our systematic literature review reveals limitations in the utilization of predictive analytics tools among existing LADs. We find that studies leveraging predictive analytics only go as far as identifying at-risk students and do not employ model interpretation or explainability capabilities. This limits the ability of LADs to offer data-driven prescriptive advice that guides students towards appropriate learning adjustments. Further, published studies have mostly described LADs that are still at the prototype stage; hence, robust evaluations of how LADs affect student outcomes have not yet been conducted. Evaluations to date have been limited to LAD functionality and usability rather than effectiveness as a pedagogical treatment. We conclude by making recommendations for the design of advanced dashboards that take fuller advantage of machine learning technologies while using suitable visualizations to project only relevant information. Finally, we stress the importance of developing dashboards that are ultimately evaluated for their effectiveness.