THE LYME DISEASE CONTROVERSY: AN AI-DRIVEN DISCOURSE ANALYSIS OF A QUARTER CENTURY OF ACADEMIC DEBATE AND DIVIDES

SUBMITTED FOR REVIEW

Teo Susnjak ∗, Cole Palffy, Tatiana Zimina, Nazgul Altynbekova
School of Mathematical and Computational Sciences, Massey University, Albany, New Zealand

Kunal Garg, Leona Gilbert
Tezted Oy, Jyväskylä, Finland

April 4, 2025

ABSTRACT

The scientific discourse surrounding Chronic Lyme Disease (CLD) and Post-Treatment Lyme Disease Syndrome (PTLDS) has evolved over the past twenty-five years into a complex and polarised debate, shaped by shifting research priorities, institutional influences, and competing explanatory models. This study presents the first large-scale, systematic examination of this discourse using an innovative hybrid AI-driven methodology, combining large language models with structured human validation to analyse thousands of scholarly abstracts spanning 25 years. By integrating computational techniques with expert oversight, we developed a quantitative framework for tracking epistemic shifts in contested medical fields, with applications to other content analysis domains. Our analysis revealed a progressive transition from infection-based models of Lyme disease to immune-mediated explanations for persistent symptoms, a shift that has been particularly pronounced in high-impact clinical and immunology journals. At the same time, research supporting CLD has remained largely confined to hypothesis-driven publications, indicating a persistent asymmetry in how competing perspectives are disseminated and legitimised. The investigation into thematic trends further highlighted the enduring complexity of Lyme disease diagnostics and evolving research focus on therapeutic controversies, even as institutional alignment with PTLDS perspectives continues to grow.
This study offers new empirical insights into the structural and epistemic forces shaping Lyme disease research, providing a scalable and replicable methodology for analysing discourse. The findings have implications for policymakers, clinicians, and communication strategists, emphasising the need for more equitable research funding, standardised diagnostic criteria, and improved patient-centred care models. This research also underscores the value of AI-assisted methodologies in social science and medical research by systematically quantifying discourse evolution, offering a foundation for future studies examining other contested conditions and controversies.

Keywords Lyme disease controversy · Chronic Lyme Disease (CLD) · Post-Treatment Lyme Disease Syndrome (PTLDS) · Medical controversy · AI in medical research · Large Language Models in academic analysis · Stance detection in medical literature · Lyme disease academic discourse · Science and Technology Studies · Social construction of knowledge

∗Corresponding author: t.susnjak@massey.ac.nz

ORCID: https://orcid.org/0000-0001-9416-1435 https://orcid.org/0009-0007-7499-3892 https://orcid.org/0009-0008-3141-443X https://orcid.org/0009-0002-1393-8682 https://orcid.org/0000-0003-4346-027X https://orcid.org/0000-0002-7470-5770

medRxiv preprint doi: https://doi.org/10.1101/2025.04.03.25325216; this version posted April 4, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

1 Introduction

It has been estimated that every year, Lyme disease affects hundreds of thousands in North America and Europe [1, 2], while for over a quarter of a century, the medical and scientific communities have been sharply divided over the persistent effects of Lyme disease on patients after standard antibiotic treatments [3–9]. Although most patients recover fully, approximately 25% of patients [10–14] continue to experience symptoms like fatigue, pain, and cognitive difficulties, sparking a debate about the nature of these persistent and debilitating health issues [15–19]. This discourse has polarised into two major schools of thought: one asserts that these symptoms are the result of a post-infectious syndrome that does not involve ongoing bacterial infection [20, 21], while the other posits that a subset of Lyme disease cases may become chronic [3–6], and might require prolonged antibiotic treatment due to signs of a persistent infection [10, 22]. Research indicates that how Lyme disease is perceived is influenced by power dynamics in healthcare, patient advocacy, and media discussions, highlighting conflicts between medical experts and public knowledge [23–25]. These conflicting viewpoints have resulted in a substantial but contentious body of academic work, with researchers, healthcare providers, and patient advocacy groups frequently taking opposing positions [26–29]. Mainstream medical bodies, like the Infectious Diseases Society of America (IDSA), argue that post-treatment symptoms experienced by a subset of patients after completing standard antibiotic therapy can be attributed to what they define as Post-Treatment Lyme Disease Syndrome (PTLDS) [10].
The IDSA suggests that symptoms like fatigue and cognitive issues are probably due to immune responses or tissue damage, not ongoing infection [5]. Conversely, organisations like the International Lyme and Associated Diseases Society (ILADS) advocate for recognising chronic Lyme disease (CLD), contending that ongoing infection may be responsible for these symptoms [30]. ILADS recommends extended antibiotic regimens, pointing to contested evidence of patient improvement [31]. While deeply rooted in scientific inquiry, this debate has also been shaped by patient experiences, public advocacy, and extensive media attention, further complicating efforts to reach a consensus [27, 32–34]. To that end, this scientific controversy is also intricately linked to sociopolitical and economic influences, encompassing insurance reimbursement systems, the regulation of alternative medicine, and the stigmatisation of disputed illnesses [35, 36]. At the heart of it, the Lyme disease controversy exemplifies the sociology of medical knowledge, where grassroots patient movements contest biological authority, promoting alternative diagnosis and treatment approaches [37, 38]. Internet and social media have significantly shaped the narrative around Lyme disease [39–42]. Patients who feel unheard by traditional medicine have discovered online communities to share experiences and explore alternative treatment options. These patients, who perceive conventional healthcare as dismissive, have increasingly sought refuge in online groups, where accounts of medical neglect are validated, and alternative illness models proliferate [43, 44]. These digital platforms serve as counter-publics that challenge prevailing scientific narratives, exemplifying what researchers in Science and Technology Studies (STS) call “epistemic resistance” to dominant biomedical paradigms [45, 46]. 
The proliferation of conflicting medical assertions inside online Lyme disease forums has underscored the influence of digital platforms on health beliefs and patient choices [47, 48]. These platforms have amplified the voices of those advocating for chronic Lyme disease [49]. Still, according to opposing voices, they have also facilitated the spread of misinformation, further complicating and sharpening the discourse [49]. As a result, the controversy surrounding Lyme disease has extended beyond medical journals into mainstream media [39, 41], shaping public perception and influencing policy decisions. To appreciate the current state of the discourse surrounding this controversy, it is useful to chart its history. Figure 1 captures the main themes over the decades. During the mid-to-late 1970s, an atypical outbreak of arthritis in youngsters from rural Connecticut resulted in the early identification of what would later be termed Lyme disease [52]. Shortly after its formal diagnosis, clinicians noted that several patients exhibited persistent symptoms—such as arthralgia, tiredness, and neurological complications—despite receiving prescribed antibiotic treatment [53]. Medical researchers determined that Lyme disease is an illness triggered by the bacterium Borrelia burgdorferi, transmitted through the bites of infected ticks, which acquire it from animals like mice and deer they feed on [54]. A hallmark of the disease in humans is a bullseye rash known as erythema migrans. Without treatment, the disease can lead to a variety of symptoms, such as joint pain and swelling, meningitis, partial facial paralysis, cognitive impairments, fatigue, headaches, and heart complications [55, 56]. In the subsequent period from the 1990s to the early 2000s, differing scientific perspectives emerged and led to opposing factions about disease chronicity and treatment, coinciding with the rise of patient advocacy and online communities that enriched public discourse. 
During the period from the mid-2000s to the 2010s, tensions escalated as researchers, clinicians, and patient organisations became increasingly polarised, while media coverage intensified sensationalism surrounding the chronic Lyme disease controversy [20, 26, 50, 51]. At this time, patient organisations began to assert that the treatment protocols established by the IDSA had been employed to refuse medical insurance for those seeking extended antibiotic courses and to penalise physicians who prescribed these treatments [26]. The discourse became inflammatory, with terms like “Axis of Evil” used to describe physicians prescribing prolonged antibiotics, “specialty laboratories” offering alternative tests, and the internet’s role in promoting “Lyme hysteria” [50], reflecting the intensifying acrimony and the frustration within the mainstream medical community. While it has been suggested [51] that Lyme disease was receiving disproportionate exposure in media coverage, the claim was that the coverage often perpetuated this perspective, portraying those advocating for CLD as misguided or promoting unscientific practices.

Figure 1: A broad and consolidated outline of the discourse themes and tensions on the Lyme disease controversy in media and academia over time as reported in literature [20, 26, 50, 51].
The cooling in tensions began to emerge only from the late 2010s, when the focus shifted to diagnostic problems and patient-centred perspectives, with social media playing a substantial role in shaping public understanding, activism, and patient support [57]. As the debate has continued, there has been a further shift in focus in recent years, emphasising the complexities of Lyme disease diagnostics and patients’ subjective experiences. Studies now point out the challenges with current blood tests, especially in the advanced stages of the disease [54]. Researchers acknowledge the need for improved diagnostic tools to address potential underdiagnosis, particularly in cases with atypical symptoms or less common Borrelia burgdorferi species [58]. Qualitative studies using interviews and ethnographic methods offer valuable insights into the experiences of individuals with persistent symptoms attributed to Lyme disease [15]. These studies explore the challenges of navigating a complex medical system while facing scepticism and dismissal from some healthcare providers. What began as a relatively straightforward debate over diagnosis and treatment has expanded into a complex and often polarised controversy that touches on issues of medical authority, patient autonomy, and the role of the media in shaping public understanding of health [59], and it deserves comprehensive investigation. This persistent discourse on CLD and PTLDS is crucial to health communication, as it highlights the downstream effects of academic research on media narratives, which in turn affect public attitudes and, ultimately, health-related behaviours.
It is equally important to consider the public’s trust in scientific expertise and healthcare institutions in the face of stark disagreements, since an erosion of their credibility can result in various health behaviours, including non-compliance with standard treatments, the pursuit of unverified or potentially harmful remedies, and withdrawal from conventional healthcare systems [60]. Such medical conflicts seldom exist solely within the realm of scientific discourse; they are instead socially produced through the interplay of scientific institutions, policymakers, media, and the public [54, 55, 61, 62]. Lyme disease is therefore a prime example of a contested illness, illustrating how medical ambiguity generates competing knowledge claims and varied treatment paradigms [24]. The controversy thus extends beyond scientific inquiry, shaping public discourse and healthcare behaviours. Media narratives, spanning news reports, documentaries, and digital platforms, often amplify conflicting perspectives, influencing public attitudes toward diagnosis and treatment [29, 30]. As a result, patients navigating uncertainty may turn to alternative sources when they perceive mainstream reporting as dismissive, potentially leading to non-adherence to conventional medical guidance or the pursuit of extended antibiotic regimens [63, 64]. A critical dimension of this debate is the erosion of trust in scientific expertise and healthcare institutions. Conflicting guidelines from organisations such as IDSA and ILADS contribute to uncertainty, influencing patient behaviours that range from scepticism toward standard treatments to reliance on unverified or experimental therapies [28, 60]. This complex interplay between medical uncertainty, media influence, and public trust underscores the need for strategic health communication approaches that integrate evidence-based insights with patient experiences.

Contribution and novelty

Despite the wealth of research and debate, no comprehensive and systematic synthesis has been conducted to map the evolution of academic discourse on Lyme disease over the past 25 years [65]. The goal of this study has been to fill this gap, provide practical insights into the development of the scientific discourse on this controversy, and identify overarching trends. The key contribution of this study lies in examining over a thousand relevant academic studies spanning the past quarter-century. The novelty lies in leveraging recent advances in AI, employing state-of-the-art large language models (LLMs) to detect the stances and viewpoints expressed within these studies, which allowed the extraction of both explicit positions and more nuanced sentiments in these texts concerning the controversy over CLD and PTLDS. Furthermore, we developed a novel hybrid AI-driven methodology that enabled us to automate the processing of large volumes of text to map the discourse of this controversy and its evolution over time.
We not only analysed how viewpoints have shifted over time and how different journals have platformed differing positions, but also extracted major themes and charted their development through to the present state of the debate. This timely integration of computational intelligence with medical research provides unprecedented insights into this long-standing controversy and establishes a robust AI-driven methodology to address complex debates in healthcare literature at scale.

2 Related Works

In recent years, research on Lyme disease has examined its clinical, epidemiological, and sociocultural dimensions. These studies have ranged from broad narrative and scoping reviews investigating overarching challenges to systematic reviews offering targeted insights into specific aspects of diagnosis, treatment, and disease mechanisms. The literature covered several key areas: treatment and management, pathology and disease mechanisms, diagnostic controversies and challenges, epidemiology and public health, and the sociological perspective.

2.1 Treatment and Management

The efficacy of antibiotic treatment for PTLDS has been a persistent issue, with early research questioning the rationale for prolonged antimicrobial therapy. Lantos [66] conducted a systematic review that evaluated the role of chronic co-infections such as Babesia, Anaplasma, and Bartonella, concluding that no compelling evidence supported their role in PTLDS or CLD. This finding challenged earlier assertions that lingering symptoms were due to persistent infections requiring long-term antibiotic regimens. Subsequent studies reinforced these findings. Rebman and Aucott [67] reviewed PTLDS from a mechanistic perspective and argued that persistent symptoms were more likely linked to immune dysfunction and neural sensitisation than to ongoing infection, suggesting a shift from pathogen-focused treatments. Dersch et al.
[68] further challenged the antibiotic paradigm, finding no statistically significant benefit of antimicrobial therapy on quality of life, cognition, or depression while reporting an increased incidence of adverse effects. Beyond clinical efficacy, the risks of overtreatment became more prominent in other reviews. Sébastien et al. [69] conducted a systematic review and found that misdiagnosis of PTLDS was widespread, with between 80% and 100% of suspected cases incorrectly classified, leading to unnecessary and sometimes harmful antibiotic treatment. In addition, Mattingly and Shere-Wolfe [70] evaluated the economic burden of Lyme disease and revealed substantial healthcare costs associated with misdiagnosis and overtreatment. At the same time, Van Hout [71] highlighted that despite the increasing global incidence of Lyme disease, pharmaceutical investment in its treatment was lacking. The same work claimed that the evidence and international guidelines for managing CLD remained conflicting and controversial, posing challenges to public health policy and clinical practice.

2.2 Pathology and Disease Mechanisms

The biological mechanisms underlying PTLDS and CLD have also remained an area of significant debate over the past few decades. Marques [11] provided an early framework for distinguishing between different patient groups diagnosed with CLD, recognising that many individuals lacked objective evidence of active Borrelia burgdorferi infection. This categorisation laid the groundwork for later investigations into the nature of persistent symptoms. Borgermans et al. [72] expanded upon the early framework of Marques [11], exploring CLD as a multifaceted clinical entity. Their review suggested that CLD remained poorly understood, with ongoing debates regarding its definition, diagnosis, and treatment. Mac et al. [16] further contributed to this discussion by conducting a systematic review that documented the long-term effects of Lyme disease, reporting that patients frequently experienced fatigue, musculoskeletal pain, and cognitive impairment, though the exact mechanisms remained uncertain. Beyond symptom classification, recent works have examined the complex biological interactions underlying Borrelia burgdorferi persistence and disease progression. Bamm et al. [73] provided an integrative review of Borrelia burgdorferi biology, host-pathogen interactions, and immune evasion strategies, emphasising that the spirochete’s ability to modulate host responses may contribute to prolonged symptoms even after standard treatment. Their findings reinforced earlier hypotheses that PTLDS symptoms could stem from immune dysregulation and chronic inflammation rather than ongoing infection. Focusing on the neuropsychiatric dimensions of Lyme disease, Brackett et al. [74] linked infection to increased risks of cognitive decline, anxiety, and depression. Recently, Bobe et al. [58] reviewed the progress and understanding of Lyme disease in the five years preceding their study and explored the role of immune dysregulation and potential autoimmune triggers, arguing that PTLDS symptoms were more likely driven by sustained inflammation than by persistent infection.
2.3 Diagnostic Controversies and Challenges

A longstanding source of clinical and research debate has also surrounded the challenges in diagnosing Lyme disease. Brunton et al. [57] systematically reviewed stakeholder perspectives and found widespread dissatisfaction with existing diagnostic tools. Their study highlighted the disconnect between clinician scepticism and patient experiences, which, according to the authors, frequently led to diagnostic uncertainty and strained doctor-patient relationships. Studies have investigated the specific limitations of current diagnostic methods, and the diagnosis of Lyme disease remains contentious, marked by significant challenges in clinical practice and research. Diagnostic uncertainty frequently arises from dissatisfaction with existing testing methods, highlighting marked discrepancies between clinical and patient experiences [57]. Conventional diagnostic tests, primarily the two-tiered approach of an enzyme-linked immunosorbent assay (ELISA) followed by immunoblotting, have notable limitations, especially in early disease stages, leading to delayed treatments and misdiagnoses [75, 76]. Additionally, regional variation in Borrelia genospecies poses substantial obstacles, as standard assays often fail to detect less common strains, further complicating diagnosis [76]. These diagnostic challenges extend into the debate surrounding PTLDS and CLD. Divergent perspectives among medical communities exacerbate the controversy, notably seen in conflicting guidelines from influential organisations such as the IDSA and ILADS. Such conflicts can result in misdiagnoses, inappropriate therapies, and decreased trust in healthcare institutions [59, 77, 78]. Furthermore, geographical variations in disease presentation, including Lyme-like illnesses without confirmed local Borrelia infections, add layers of complexity to accurate disease identification and management [77].
Studies have noted that resolving these diagnostic and therapeutic challenges necessitates more accurate biomarkers and standardised diagnostic protocols to improve early detection and patient outcomes [75].

2.4 Epidemiology and Public Health

Other reviews have focused on the environmental drivers of Lyme disease. Stone et al. [79] took an ecological perspective, identifying climate change and tick habitat expansion as key factors influencing Lyme disease incidence. Dong et al. [80] conducted a global meta-analysis estimating that 14.5% of the population had been exposed to Borrelia burgdorferi, with the highest seroprevalence in Central Europe (20.7%) and Eastern Asia (15.9%). This study built on earlier epidemiological assessments, such as those by Van Hout [59], who also identified climate change and tick population dynamics as key factors in the disease’s increasing range. Meanwhile, other studies [81] focused on the economic burden imposed by Lyme disease, concluding that it is significant, particularly in the US, and therefore justifies further research into disease control and management. Mattingly and Shere-Wolfe [70] also evaluated the financial burden of Lyme disease, highlighting significant healthcare costs and productivity losses. Bobe et al. [58] noted that US federal funding for Lyme disease research remained disproportionately low relative to its public health impact, with increasing reliance on private philanthropic contributions.

2.5 Sociological Perspective

Peretti-Watel et al. [8] examined Lyme disease as a case study in medical controversy, showing how public mistrust in health authorities fuelled competing narratives about the disease’s prevalence and management, a situation exacerbated by ongoing conflicts between the IDSA and ILADS, which have created ambiguity concerning optimal practice guidelines [10, 28]. Olechnowicz et al.
[82] introduced a sociological perspective by examining how perceptions of vulnerability influence individual behaviours and trust in information sources regarding Lyme disease. Their study validated a novel vulnerability scale, demonstrating that emotional discomfort and perceived susceptibility shape engagement in preventive behaviours, highlighting the role of psychology and public trust in shaping disease management strategies. Studies by Pascal et al. [49] and Uzzell et al. [83] emphasised the impact of media and public health communication on societal attitudes, portraying Lyme disease as either a disregarded epidemic or an exaggerated controversy, highlighting its depiction as a socially constructed phenomenon shaped by media narratives, patient advocacy, medical uncertainty, and institutional prejudices. Puppo et al. [84] and Hinds and Sutcliffe [85] brought into focus the distinction between conventional medical institutions and alternative “Lyme-literate” viewpoints, highlighting the knowledge disparity between orthodox and heterodox discourses, while Rebman et al. [15] and Baarsma et al. [86] conducted concurrent research on patient experiences, highlighting issues of medical legitimacy, diagnostic ambiguity, and the psychological ramifications of disputed illness status.
The increasing confluence of medical authority, patient empowerment, and online activism, as examined in [87–89], demonstrated how self-advocacy organisations challenge institutional authority while promoting scientific fragmentation. Meanwhile, Bloor et al. [90] demonstrated the influence of scientific uncertainty and policy inconsistency on institutional decision-making and advocacy around Lyme disease. Together, these studies support the view that Lyme disease is not merely a medical ailment but a politicised and socially contentious illness, highlighting the broader conflicts among research, politics, and patient-centred healthcare. To that end, it has been postulated [54, 55, 61, 62] that medical conflicts seldom exist solely within the realm of scientific discourse; instead, they are socially produced through the interplay of scientific institutions, policymakers, media, and the public. Lyme disease is thus a case in point, illustrating how medical ambiguity generates competing knowledge claims and varied treatment paradigms [24]. Pascal et al. [49] demonstrated how media discourse has significantly contributed to the perception of Lyme disease as a societal issue, bolstering biomedical scepticism and patient activism. Puppo et al. [84] illustrated that Lyme-literate medical professionals had established alternative epistemic networks that challenge prevailing medical paradigms and promote unconventional treatment protocols. Moreover, internet platforms have revolutionised the discourse surrounding Lyme disease, serving as “knowledge enclaves” as characterised by Brown [91], where scientific credibility is reinterpreted through collective patient experiences rather than peer-reviewed research [92]. This corresponds with extensive sociological research on the dissemination of health-related misinformation in digital contexts, which reinforces health attitudes that deviate from conventional medical guidelines [48, 93].
2.6 Study Research Questions

While previous reviews have addressed the clinical, epidemiological, and sociocultural dimensions of Lyme disease, they have not systematically synthesised how the academic discourse on CLD and PTLDS has evolved. This study fills that gap by employing a large-scale, AI-driven approach to map thematic trends and stance distributions. It sought to identify and track shifts in scholarly perspectives, leading to the following key research questions:

• (RQ1) How has the academic discourse on CLD and PTLDS evolved over the past 25 years regarding research volume, thematic focus, and stance distribution?
• (RQ2) How do journal specialisation and editorial focus influence the representation of perspectives on CLD and PTLDS in the peer-reviewed literature?
• (RQ3) What are the dominant thematic structures within the Lyme disease debate, and how do they correspond to competing explanatory models and levels of scientific consensus?

3 Theoretical and Methodological Grounding in Science and Technology Studies (STS) Framework

This study employed a Science and Technology Studies (STS) framework [94] in conjunction with computational AI models to analyse the Lyme disease controversy. STS offers a critical lens for understanding how social, cultural, and historical factors shape scientific knowledge, particularly in biomedicine and contested illnesses [46, 95–97]. For complex and contested illnesses like Lyme disease, an STS framework is particularly valuable in enabling the examination of the interplay of social dynamics, power relations, and the very construction of medical knowledge itself. We paired this approach with recent AI automation and reasoning technologies for information extraction at scale.
Figure 2: Overview of the steps comprising the proposed hybrid AI-driven content analysis methodology.

Key principles and concepts from STS that guided our overall approach in formulating the technical aspects of our methodology in Section 4 include:

• Selective Social Construction (STS Principle): STS highlights that social shaping varies across scientific fields. While physics is empirically constrained, biomedicine, addressing complex human systems, is more open to social interpretation. Even within biomedicine, diagnostic categories (PTLDS vs. CLD) are more socially negotiated than, for example, the molecular biology of Borrelia burgdorferi.
• Empirical Constraint and Real Phenomena (STS Tenet): Our STS framework acknowledges that social construction does not equate to extreme relativism or a dismissal of empirical reality. Empirical observation, clinical data, and technological applications do constrain scientific theories. In the Lyme context, STS helps us analyse how the interpretation of patient symptoms is contested while acknowledging the underlying reality of patient suffering and the biological basis of Lyme infection as legitimate areas of scientific inquiry.
• Social Values and Epistemic Commitments in Contested Fields (STS Focus): Lyme disease controversies highlight paradigm clashes and differing epistemic commitments, not just factual disagreements [98].
Competing values, clinical priorities, economic motivations and institutional affiliations influence research agendas and data interpretation, especially in areas lacking scientific consensus.
• Discourse Analysis as an STS Methodology (STS Method): Computationally enhanced discourse and framing analysis were essential to reveal social influences in our study. By systematically analysing language, framing, and thematic patterns in Lyme disease literature, guided by an STS framework, we sought to expose the social processes that have historically shaped the scientific conversation and contributed to the enduring Lyme disease controversy.

The adoption of the STS framework ensured that the analysis of the Lyme disease debate was theoretically grounded and methodologically rigorous. The STS framework informed both our technical methodological implementation (in aspects such as the prompt engineering outlined in Section 4.2 and the formulation of the thematic analysis in Section 4.4) and the interpretation of results, allowing us to analyse the Lyme controversy as a socially constituted and knowledge-producing phenomenon, beyond purely biomedical or clinical perspectives. The integration of STS with medical sociology [99], framing theory [100], and patient experience research [15] offered a rich social science lens for understanding this complex and contested illness.

4 Methodology

The technical aspects of our methodology combined the social science framework with cutting-edge AI technologies, specifically focusing on LLMs and their emerging reasoning capabilities that venture well beyond classical machine learning approaches [101]. This methodological choice was motivated by the need to analyse a vast and complex body of text, exceeding the capacity of traditional qualitative methods alone.
While qualitative approaches are crucial for in-depth analysis of social meaning, LLMs and their sophisticated reasoning abilities offer a scalable and systematic way to identify semantic patterns and trends across a large corpus of scientific abstracts, enabling analyses of shifts in thematic focus, stance distribution, and journal-level preferences on specific topics that would be challenging through manual review [102]. Increasingly, LLMs are being deployed in research for these kinds of tasks [103]. Since LLMs exceed the capabilities of traditional NLP techniques, their value is found in their ability to surface implicit assumptions and underlying frames within the discourse, which can contribute to a deeper understanding of the social construction of knowledge in this contested medical field [104]. We acknowledge the methodological challenges of using AI in social science research [105]; therefore, we prioritise transparency² and validation throughout our methodology, aiming to complement and provide new approaches for scalable research rather than replace established qualitative approaches to discourse analysis.

Our entire methodological approach can be summarised in four key steps that include data acquisition (Step 1), abstract classification using automated approaches (Step 2), validation of the automation process via human experts (Step 3) and finally, the thematic analysis of the dataset including the human-in-the-loop verification (Step 4).
These four stages and the sub-steps are visualised in Figure 2, according to which the remainder of this section is organised.

4.1 Dataset acquisition - Step 1

Step 1 in Figure 2 represents the entire data acquisition process in detail. The data was collected for the period from 2000 to 2024 using the Publish or Perish (PoP) [106] software for paper search³. The academic databases searched were Google Scholar, Scopus, PubMed, CrossRef, Web of Science, and Semantic Scholar. Search keyword combinations focused on topics related to chronic Lyme disease and post-treatment Lyme disease syndrome, and each search was restricted to a single year because of the large number of returned results. The search term lyme was used for paper titles, while various combinations of the terms disease, borreliosis, borrelia, chronic, controversy, post-treatment, PTLDS, acute, syndrome, and post-Lyme were used for matches across the paper documents. The initial search produced a dataset of 84,140 papers covering a 25-year timeframe; however, a large proportion of abstracts were missing from this initial data collection process. Notably, Google Scholar contributed 41% of all the retrieved records. Figure 3 summarises the proportion of papers acquired from each database. The acquired dataset from PoP comprised the following fields for each paper:
• publication: the name of the publication source.
• authors: the list of authors who contributed to the article.
• year: the year the article was published, from 2000 to 2024.
• type: paper type, i.e. article or review etc.
• abstracts: the text of the abstract for each article - mostly unpopulated from the PoP search results.
• cites: the number of times that the paper has been cited.

Figure 3: Proportion of retrieved articles per database

Data screening and filtering

Next, the PRISMA flow diagram in Figure 4 details the pre-processing, screening and filtering process.
Records with missing or duplicated DOIs (digital object identifiers) were excluded from the dataset. More than 38,000 records were removed, leaving 45,271 records for screening. Subsequently, approximately 27,180 records were excluded due to missing publication names, article titles or abstracts. Python scripts were written to automatically retrieve missing abstracts where possible using APIs and DOIs⁴, resulting in 7,360 previously missing abstracts. A further 7,528 records were excluded because their abstracts could not be retrieved or were too short to analyse, and 1,734 more were excluded as non-English or lacking relevant search terms (Figure 4). Consequently, the resulting dataset comprised 8829 potentially relevant abstracts, requiring more detailed screening and analysis for relevance.

² The datasets, LLM prompts and outputs can be found at https://github.com/teosusnjak/Lyme-disease-controversy
³ This data collection builds on and extends the earlier work [107].
⁴ Scopus API; Springer Nature API

Figure 4: PRISMA flow diagram for new systematic reviews, which included searches of databases and registers only.

Identification
• Records identified from databases (n = 84140): CrossRef (n = 17756), Google Scholar (n = 36990), PubMed (n = 13320), Scopus (n = 11162), Web of Science Starter (n = 1648), Semantic Scholar (n = 3264)
• Search terms: ‘Lyme Disease’ (‘Lyme’ for CrossRef), ‘borrelia’, ‘borreliosis’, lyme and (chronic, controversy, post treatment, PTLDS, acute, Syndrome, Post Lyme) (for Scopus, CrossRef, Web of Science Starter, Semantic Scholar)
• Records removed before screening: records without a DOI (n = 30011); duplicate records from same database (n = 8858)

Screening
• Records screened (n = 45271)
• Records excluded: non journal articles (n = 16471); missing journal or article names (n = 39); truncated journal names (n = 577); duplicated records when merged (n = 10093)
• Abstracts retrieved with record (n = 10731)
• Abstracts sought for retrieval (n = 7360)
• Abstracts not successfully retrieved (n = 7528; see figure note 1)
• Abstracts assessed for eligibility (n = 10563)
• Abstracts excluded: non-English abstracts (n = 158); lacking relevant terms (n = 1576; see figure note 2)

Included
• Abstracts included (n = 8829): Potentially Related to CLD/PTLDS (n = 4359); Definitely Unrelated (n = 3274); Animal Study (n = 1196)

Figure note 1: This number includes 2088 abstracts with less than 300 characters in length which were rejected due to the information content being too low for analysis.
Figure note 2: Lyme, Borrelia*, burgdorferi, Ixodes, Erythema, migrans, tick-borne, tickborne, tick borne.
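The mechanical screening rules summarised in Figure 4 can be illustrated with a short sketch. This is not the authors' script: the record keys `doi` and `abstract`, the function name `screen`, and the choice to match terms against the abstract text are assumptions made for illustration. The 300-character threshold and the term list are taken from the figure notes. Non-English detection is omitted because it would need a third-party library.

```python
import re
from collections import Counter

# Terms from the Figure 4 note; matching them against the abstract is an assumption.
RELEVANT_TERMS = re.compile(
    r"lyme|borrelia|burgdorferi|ixodes|erythema|migrans|tick[- ]?borne",
    re.IGNORECASE,
)
MIN_ABSTRACT_CHARS = 300  # shorter abstracts were rejected as too uninformative


def screen(records):
    """Return (kept_records, exclusion_counts) for a list of record dicts.

    Each record is a dict with the hypothetical keys "doi" and "abstract".
    """
    seen_dois = set()
    kept = []
    excluded = Counter()
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        abstract = (rec.get("abstract") or "").strip()
        if not doi:
            excluded["missing_doi"] += 1
            continue
        if doi in seen_dois:
            excluded["duplicate_doi"] += 1
            continue
        seen_dois.add(doi)  # the first record seen for a DOI is the one considered
        if len(abstract) < MIN_ABSTRACT_CHARS:
            excluded["abstract_too_short"] += 1
        elif not RELEVANT_TERMS.search(abstract):
            excluded["no_relevant_terms"] += 1
        else:
            kept.append(rec)
    return kept, excluded
```

Returning the exclusion counts alongside the retained records makes it straightforward to reproduce the per-reason tallies that a PRISMA diagram reports.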
4.2 Abstract Classification - Step 2

Following the initial screening and filtering process, which relied on a straightforward mechanical application of inclusion and exclusion rules, 8829 records remained; these required a deeper semantic analysis of their content to establish relevance to the PTLDS/CLD controversy. Three sub-steps of increasing sophistication were applied.

Initially, an automated pre-screening classification (Step 2a) was applied, leveraging a reasoning LLM (OpenAI’s GPT-4o-mini). The purpose of this task was to eliminate the most obviously irrelevant abstracts, for which a very high level of confidence in their rejection was achieved, while retaining others for a deeper analysis. The aim was to remove all abstracts with a clear focus on animal studies and those that focus on Lyme disease but hold no relevance to the PTLDS/CLD discourse. Therefore, each abstract was classified into one of three predefined classes: Potentially Related to CLD/PTLDS, Definitely Unrelated, or Animal Study. This approach is supported by literature, where machine learning has been successfully used to classify medical texts for some time [108]. As more sophisticated models have emerged, medical text classification has improved considerably [109], and LLMs have thus been applied to aid in systematic reviews of the literature [110, 111]. Notably, in addition to their advantages in speed and cost, LLMs have even been found to be more accurate overall than expert annotators at classifying texts [112]. Each abstract was processed individually via API calls using the classification prompt in Appendix A. The model generated a JSON output for every input containing the abstract’s index, a classification, and a confidence score (High, Medium, or Low). Abstracts classified as Potentially Related to CLD/PTLDS were retained for further, more refined processing and classification.
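As a hedged illustration of how the per-abstract JSON output of Step 2a might be checked and routed, consider the sketch below. The key names `index`, `classification` and `confidence`, and the function names, are assumptions, since the exact output schema is defined by the prompt in Appendix A. The routing follows the rule stated in this subsection: Potentially Related abstracts are retained, low-confidence rejections are flagged for additional validation, and the remaining rejections are eliminated.

```python
import json

# The three class labels and three confidence levels named in the text.
CLASSES = {
    "Potentially Related to CLD/PTLDS",
    "Definitely Unrelated",
    "Animal Study",
}
CONFIDENCES = {"High", "Medium", "Low"}


def parse_prescreen(raw):
    """Parse one model reply and reject labels outside the predefined sets.

    The JSON key names used here are hypothetical.
    """
    data = json.loads(raw)
    label = data.get("classification")
    confidence = data.get("confidence")
    if label not in CLASSES:
        raise ValueError(f"unexpected classification: {label!r}")
    if confidence not in CONFIDENCES:
        raise ValueError(f"unexpected confidence: {confidence!r}")
    return {"index": data.get("index"), "classification": label, "confidence": confidence}


def route(result):
    """Decide what happens to an abstract after pre-screening."""
    if result["classification"] == "Potentially Related to CLD/PTLDS":
        return "retain"
    if result["confidence"] == "Low":
        return "flag_for_validation"  # low-confidence rejection is re-checked
    return "exclude"  # Medium or High confidence rejection
```

Validating labels against a closed set guards against the model returning a paraphrased or misspelled category, which would otherwise silently fall through the routing logic.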
Low-confidence non-Potentially Related to CLD/PTLDS classifications were flagged for additional classification validation to ensure comprehensive coverage and minimise false exclusions. This low-resolution pre-screening classification step effectively categorised the 8,630 abstracts as follows: 4,160 (48.2%) Potentially Related to CLD/PTLDS, 3,274 (37.9%) Definitely Unrelated, and 1,196 (13.9%) Animal Study. Thus, all Animal Study and Definitely Unrelated abstracts receiving a Medium to High classification confidence level were eliminated from further analysis.

Stance-Framing Classification - Step 2b

In Step 2b, the actual classification of the remaining 4,160 abstracts into target categories that support the study’s goals commenced. In this step, the authors’ implicit or explicitly stated position on the controversy needed to be identified, requiring both sophisticated reasoning and domain knowledge.

From Stance to Frame Detection

Sentiment analysis or opinion mining is often used to interpret an author’s feelings on a subject [113]. Frequently, the sentiment is determined by the number and strength of positive and negative words and which parts of speech are used. However, traditional sentiment analysis cannot recognise implicit meaning in text, frequently resulting in incorrect interpretations [113].
Sentiment analysis is also ill-suited to the task of this study because academic articles are expected to employ near-neutral expressions. Moreover, domain-specific terms such as ‘disease’ are neutral and objective in this context but are interpreted negatively by standard sentiment analysis models, which skews results for this frequently appearing term. Stance detection, on the other hand, aims to use machine learning models to automatically determine the position or attitude expressed in a text towards a specific target concept, event, or entity and seeks to identify whether the text favours, is against, or is neutral towards the target [114]. Recent studies have demonstrated the effectiveness of LLMs in stance detection tasks with high reliability and accuracy [115]. Therefore, this research also leveraged LLMs for this task.

Moreover, in Step 2b, our study conceptualised ‘stance’ as reflecting a more complex orientation towards the Lyme disease controversy. Drawing on framing theory [100] and discourse analysis [116], we defined ‘stance’ more broadly as the underlying perspective or interpretive frame adopted by authors in relation to the CLD/PTLDS debate, rather than merely positive or negative sentiment. Therefore, this included explicit agreement or disagreement with particular positions as well as implicit assumptions, preferred modes of reasoning, and the values and priorities emphasised in the authors’ arguments. For example, a ‘PTLDS-supporting stance’ might be characterised by framing persistent symptoms as primarily immune-mediated, relying on epidemiological evidence, and prioritising mainstream clinical guidelines. Conversely, a ‘CLD-supporting stance’ might frame persistent symptoms as indicative of ongoing infection, emphasise patient narratives and anecdotal evidence, and prioritise alternative treatment approaches.
Therefore, by using LLMs for stance or frame detection, we aim to identify these cues and more nuanced framing patterns within the academic discourse, revealing the underlying epistemological and ideological dimensions of the Lyme disease controversy. To accomplish this, we engaged with subject experts in an iterative refinement process (highlighted in Step 2b in Figure 2) to design an LLM prompt capable of performing this complex task, which also considers our STS framework. The final classification prompt, shown in Appendix A, sets out definitions, classification criteria and few-shot in-context examples, and was used to classify the 4,160 abstracts into the following categories: “Supports PTLDS”, “Supports CLD”, “Neutral”, “Unrelated”, or “Animal Study”. For each classification, the LLM needed to provide an accompanying justification text explaining the reasoning for its classification decision and a confidence level. An example of two abstracts, together with classification and confidence categories, as well as the justification texts, can be seen in Appendix B (Figures 10 and 11). The results from this classification step yielded a total of 1,033 abstracts of interest that fell into the target categories of “Supports PTLDS”, “Supports CLD” or “Neutral”.

Self-Reflection Classification - Step 2c

The classifications and the justifications from Step 2b were then reassessed using automated methods and corrected where necessary to improve accuracy. To achieve this, a self-reflection prompting technique was used [117]. Self-reflection prompting is a technique where LLMs re-assess and refine their initial outputs to identify and correct errors, improving overall reasoning and decision-making capabilities. This method has significantly improved problem-solving performance in LLMs [117]. After executing this step, we found that only 5.3% (229 out of 4359) of the classifications changed.
The largest categories to undergo re-classifications were “Neutral” (98), where 60% flipped to “Supports PTLDS” upon revision, and the “Unrelated” category (64), where 89% subsequently became “Neutral”. Only two abstracts underwent a significant revision from initially “Supports CLD” to subsequently “Supports PTLDS”, thus confirming the stability and consistency of the classifications and the approach used. Several examples of abstract classifications and their adjustments in both the final label and justification texts can be seen in Appendix A.1.

4.3 Human Validation - Step 3

Following the final classifications, we assessed their reliability and their overall alignment with human expert judgments. To accomplish this, we conducted a comprehensive Inter-Rater Reliability (IRR) analysis [118, 119]. We sought to establish the degree of agreement (1) between the classifications of our chosen LLM and those of two subject-expert raters, (2) between the two human raters, and (3) between all classifications, including those of six additional LLMs. Therefore, this evaluation included both pairwise agreement (using Cohen’s Kappa [118]) and multi-rater agreement (using Fleiss’ Kappa [119]). Ultimately, the goal was to determine whether the LLM-based classifications exhibited reliability comparable to human expert judgement, which would validate the methodology and the experimental results.
The six additional LLMs included in this validation, among the most advanced available at the time, were Google’s Gemini 2.0 Flash (and the same model in Thinking Mode), Anthropic’s Claude 3.5 Sonnet, DeepSeek’s R1 model, Alibaba’s Qwen 2.5 Max, and xAI’s Grok-3. The key metric used to measure inter-rater agreement between two raters while adjusting for chance agreement was Cohen’s Kappa [118], defined as:

$$\kappa = \frac{P_o - P_e}{1 - P_e} \tag{1}$$

where $P_o$ is the observed agreement, and $P_e$ is the expected agreement by chance. It ranges from -1 (complete disagreement) to 1 (perfect agreement), with standard interpretation thresholds [120] from poor (κ < 0.00) to almost perfect (κ ≥ 0.80). Fleiss’ Kappa generalises Cohen’s Kappa to multiple raters, providing a single measure of agreement across all LLMs and human raters. Higher kappa values (above 0.8) indicate strong agreement, reinforcing the validity of automated classification. Low values (< 0.4) suggest significant inconsistencies, highlighting areas of marked divergence.

Agreement Between Human Raters and LLMs

To establish a baseline for IRR, we first assessed agreement between two domain-expert human raters from the author team. Prior to classification, both raters underwent training to ensure consistency in applying the classification framework (see stance-framing classification prompt in Appendix A). This entailed meetings and the provision of a classification training pack. The human raters were provided with the same prompts, criteria, and assumptions used by the LLMs, along with examples of their classifications for reference. A custom web application was developed for independent annotation (see Appendix B, Figures 10 and 11), which enabled raters to classify a random sample of 150 abstracts. For each abstract, the raters selected a classification and chose the more suitable of two possible justification texts for their decision.
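To make the definition of Cohen’s Kappa concrete, the following minimal sketch computes it for two raters’ label lists using only the Python standard library. It is an illustration, not the authors’ analysis code, and the function name `cohens_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the two raters chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy example with six abstracts (labels are illustrative only).
rater_1 = ["Supports PTLDS", "Supports PTLDS", "Supports CLD", "Neutral", "Neutral", "Neutral"]
rater_2 = ["Supports PTLDS", "Neutral", "Supports CLD", "Neutral", "Neutral", "Supports PTLDS"]
kappa = cohens_kappa(rater_1, rater_2)
```

In the toy example the raters agree on four of six items (observed agreement 24/36 when expressed over the same denominator), chance agreement is 14/36, and kappa is therefore 10/22, roughly 0.45, which would fall in the moderate range.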
Cohen’s Kappa (κ = 0.501) indicated moderate agreement, aligning with established IRR benchmarks in literature [121–123] and prior studies in subjective classification tasks such as qualitative content analysis, medical diagnoses, and thematic coding [124]. Perfect agreement is rarely expected in complex classification tasks due to interpretive differences and variations in emphasis on textual elements.

Next, we evaluated the agreement between the final revised classifications (produced via GPT-4o-mini’s self-reflection process, referred to below as ‘GPT’) and a set of alternative LLM-generated classifications (Gemini, Gemini-Thinking, Claude, DeepSeek, Qwen, Grok-3), alongside the original LLM classification and those of the two human raters. Table 1 presents Cohen’s Kappa for each pairwise comparison.

| Comparison | Cohen’s Kappa |
|---|---|
| GPT vs. Original GPT Classification | 0.767 |
| GPT vs. Gemini | 0.717 |
| GPT vs. Gemini-Thinking | 0.608 |
| GPT vs. Qwen | 0.600 |
| GPT vs. Claude | 0.592 |
| GPT vs. Human Interrater 1 | 0.583 |
| GPT vs. Human Interrater 2 | 0.508 |
| GPT vs. DeepSeek | 0.475 |
| GPT vs. Grok-3 | 0.458 |
| Human-Human Agreement | 0.501 |

Table 1: Cohen’s Kappa agreement between GPT revised classifications after self-reflection and all other raters/classifications

Key observations emerge from validation results in Table 1:
• The strongest agreement occurred between the original and GPT’s revised classifications (κ = 0.767), indicating that the LLM self-reflection process refined classifications to a limited degree but did not fundamentally alter them. This suggests that initial classifications were internally consistent, requiring only minor adjustments. This, therefore, supports the claim that there is evidence of an underlying stability in the approach used, confirming both the leveraging of LLMs for this task and the suitability of the designed prompts.
• Human vs. GPT’s Revised Classification: The agreement between GPT’s revised classifications and human raters (κ = 0.583 for Interrater 1, κ = 0.508 for Interrater 2) is comparable to human-human agreement (κ = 0.501). This means that the GPT’s revised classifications are at least as consistent with human judgment as human raters are with each other, thus reinforcing the validity of LLM’s classifications. Additionally, on the subset of abstracts where both human raters agree, their IRR with GPT’s classification was very high at κ = 0.709. Notably, with respect to LLM model choices, we found that the highest IRR values for both human raters were with GPT and Original GPT classifications (κ = 0.583 and κ = 0.538), thus also confirming the suitability of the chosen type of LLM for the classification tasks.
• The highest agreement with an alternative model occurred with Gemini (κ = 0.717), followed by Qwen (κ = 0.600). This suggests that these models exhibit classification patterns closest to the GPT’s revised outputs, likely due to similarities in training data or reasoning heuristics.
• The lowest agreement is observed with Grok-3 (κ = 0.458) and DeepSeek (κ = 0.475), indicating notable divergences in their classification outputs. These discrepancies may reflect different conceptual representations of Lyme disease controversies across models and differences in the underlying datasets used for their training, suggesting that these models are not the best candidates for this task.

For completeness, to assess global agreement among all human and automated classifiers, we computed Fleiss’ Kappa (κ = 0.537). This result indicates moderate agreement, aligning with the baseline human-human agreement levels and again reinforcing the consistency of the classification framework as a whole. Fleiss’ Kappa further validates the LLM-assisted classification methodology by showing that the aggregate classification structure remains coherent despite inherent variability across all models. This aligns with prior research in natural language processing and thematic coding, where moderate agreement is a reasonable outcome in subjective classification tasks [123, 124].

Given that the IRR values between the human raters and GPT’s classifications were comparable to, or higher than, the agreement between the human experts themselves, the IRR analysis confirmed the reliability and methodological rigour of our classification process and the underlying prompts. The alignment between human experts, LLM classifiers, and self-refined classifications demonstrates that LLM-assisted classification of stance or framing in the abstracts is systematic and replicable.

Finally, we also examined the validity of the justification texts generated by the LLM to provide a rationale for the classifications given to each abstract. The analysis found that the human raters achieved a Cohen’s Kappa of κ = 0.61 with each other, while each human rater scored κ = 0.71 against the LLM, again demonstrating that the human raters agreed to a higher degree with the LLM’s outputs than with each other’s choices.
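For readers who wish to see how a multi-rater statistic such as Fleiss’ Kappa is obtained, the sketch below implements the standard formula with the Python standard library. It is an assumption-laden illustration rather than the authors’ code: the function name `fleiss_kappa` and the input layout (one list of labels per abstract, one label per rater) are hypothetical, and every item must be labelled by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`, a list of per-item lists of categorical labels."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Per-item agreement: fraction of rater pairs that chose the same label.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the pooled label proportions across all items and raters.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        return 1.0  # degenerate case: a single label was used everywhere
    return (p_bar - p_e) / (1 - p_e)
```

With two raters the statistic is closely related to, but not identical with, Cohen’s Kappa, because Fleiss’ version pools the label proportions across raters instead of using each rater’s own marginals.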
4.4 Thematic Analysis - Step 4

Identification and Classification of Overarching Themes

After all the abstracts were classified into predefined stance categories (Supports PTLDS, Supports CLD, and Neutral), resulting in 1,033 abstracts, a hybrid computational and theoretically informed thematic analysis overseen by subject experts was conducted to extract deeper conceptual insights into the dominant lines of reasoning and underlying social dynamics within the Lyme disease controversy. This analysis aimed to identify recurrent patterns within the justifications for classification and the abstracts themselves, ensuring a systematic, reproducible, and theoretically grounded approach to the study of contested medical narratives consistent with established social science methodologies for discourse analysis.

Step 4 in Figure 2 depicts the multi-stage, iterative thematic identification process we employed, which combines automated reasoning using multiple LLMs with structured reconciliation and expert human validation to balance computational scalability with interpretive depth. The hybrid approach enabled us to leverage both the pattern recognition and qualitative interpretation that LLMs provide at scale and the critical review of subject experts, while integrating refinement and validation in the process to ensure social science validity and conceptual richness. This approach acknowledges that while LLMs can efficiently process large volumes of text, human expertise remains crucial for interpreting social meaning and ensuring theoretical coherence within complex, contested domains like medical controversies.

Methodological Approach Description

The thematic identification process proceeded in four structured and iterative phases, aiming to ensure methodological rigour, transparency, and conceptual validity, drawing upon established qualitative thematic analysis principles [125, 126]:

1. Multiple-Thematic Identification (Step 4a):
• A textual dataset was compiled to automate the theme identification and extraction from the corpus, comprising justification texts from the abstract classification phase in Steps 2b and 2c. Due to context window limitations inherent in the LLMs, the dataset was reduced to a random sample of 800 (out of 1,033) justification texts. This sample size was the maximum feasible for effective LLM processing within the technical constraints while providing a substantial and representative corpus for thematic exploration.
• Multiple diverse and powerful reasoning LLMs were used for theme identification. This ensured independence in theme identification and mitigated potential biases inherent in any single LLM. Three advanced reasoning LLMs (GPT preview-o1, Gemini 2.0 Flash, and DeepSeek R1) were each separately tasked with identifying overarching themes within the dataset.
• Each LLM was instructed to identify overarching themes without being constrained to a predefined number of themes to extract. This gave the LLMs freedom for thematic structures to emerge organically from the data. LLMs were merely instructed to identify “half a dozen or more overarching themes” based on clustering semantic patterns and recurrent arguments within the text, mimicking a human researcher’s inductive thematic coding process.
• This process produced three independent sets of themes, one per LLM, which required reconciliation.

2. Thematic Consolidation (Step 4b):
• Remarkably, despite operating independently and with different underlying architectures, all three models converged on exactly eight overarching themes (Table 2), suggesting internal consistency in the underlying thematic structures of the Lyme disease discourse and reinforcing the robustness and potential validity of the results beyond any single model’s idiosyncrasies.

| GPT preview-o1 | Gemini 2.0 | DeepSeek R1 |
|---|---|---|
| Persistence vs. Resolution of Infection | Persistence vs. Resolution of Infection | Etiological Mechanisms: Persistent Infection vs. Post-Infectious Immune Responses |
| Diagnostic Uncertainty and Misdiagnosis | Diagnostic Uncertainty and Misdiagnosis | Diagnostic Complexity and Biomarker Development |
| Effectiveness of Antibiotic Therapy | Effectiveness of Antibiotic Therapy | Clinical Management Controversies |
| Role of Immune Dysregulation | Immune Dysregulation | Autoimmune Pathways and Residual Antigenic Debris |
| Psychological vs. Biological Basis | Neurocognitive and Neuropsychiatric Manifestations | Long-Term Outcomes and Symptom Heterogeneity |
| Subjectivity vs. Objectivity of Symptoms | Patient-Centered Experiences | Advocacy and Psychosocial Burden |
| Sociocultural and Ethical Factors | Sociocultural and Ethical Factors | Sociocultural and Institutional Influences |
| Mechanisms of Pathogen Persistence | Mechanisms of Pathogen Persistence | Bacterial Pathogenesis and Host Interactions |

Table 2: Summary of themes independently derived from three LLMs and their mapping.

• To establish face validity, two co-authors, possessing subject-expertise in Lyme disease controversies and social science discourse analysis, independently reviewed the initial LLM-generated themes. This review phase aimed to ensure the themes resonated with existing knowledge of the Lyme debate and relevant social science concepts.
The reviewers investigated conceptual overlaps and divergences across the three models, noting areas of agreement and disagreement in thematic identification. Consideration was given to comparing emergent themes against existing scholarly literature on medical controversies, contested illnesses, and the specific dynamics of the Lyme disease debate, ensuring alignment with established knowledge. Also, an evaluation was made as to whether themes were mutually exclusive, conceptually distinct, and analytically useful for capturing the key dimensions of the Lyme disease discourse from a social science perspective.
• Next, the reconciliation process systematically mapped all significant conceptual domains identified by the initial LLM-generated themes. The results can also be seen in Table 2, which shows the alignment of all themes across the outputs of each LLM. The cross-model thematic reconciliation was also performed via an iterative inductive-deductive hybrid approach [125, 126], blending automated pattern recognition with expert human judgment. This consisted of two stages, where we first used the advanced GPT preview-o1 LLM to generate a consolidated thematic structure across the three models. This LLM-assisted step provided an initial synthesis, highlighting commonalities and potential redundancies for human review. Next, the LLM-reconciled output was manually examined and validated by the same two subject-expert co-authors.
This involved discussions and critical evaluation to ensure conceptual clarity, eliminate redundancies, and, crucially, refine the themes to align with established discourse frameworks in the social sciences, particularly STS [98].

• The final task in Step 4b was to develop a consolidated thematic framework, presented in Table 3, which defines each theme alongside its theoretical grounding in social science. This framework aimed to integrate insights from STS, illustrating how social, cultural, and political dynamics shape scientific knowledge production and interpretation beyond purely biomedical perspectives. By embedding established social science perspectives on Lyme disease discourse, the framework sought to reflect the layered complexity of the controversy, its epistemological tensions, and its broader societal implications, in line with existing literature on medical controversies and contested illnesses.

To support methodological rigour and social science validity, the thematic classification process was explicitly grounded in discourse analysis, the sociology of medical knowledge, and medical controversy research [95, 127, 128]. The process systematically integrated LLM-based pattern recognition with expert qualitative interpretation, so that biomedical and sociocultural narratives were meaningfully captured while conceptual coherence was maintained. The hybrid approach preserved the interpretability of computational classifications within broader social and epistemological contexts and kept the automated analyses aligned with expert human judgment in the study of contested medical knowledge.

3. Abstract Thematic Labelling (Step 4c): With the themes identified, the final classification task assigned all 1,033 abstracts, together with their classification justifications, to the derived themes in order to support deeper analysis. This step was again automated.
GPT-o1-mini was tasked with assigning each abstract/justification pair the two most suitable thematic categories. This methodological choice recognises the multidimensional nature of Lyme disease discourse and the potential for a single abstract to address several thematic dimensions simultaneously.

4. Expert Validation (Step 4d): Finally, we selected a random sample of 50 abstract/justification pairs, comprising 100 thematic classifications from Step 4c, and asked subject experts to validate the theme assignments. This was the last human-in-the-loop verification of the thematic classification before we proceeded to analysis. In this step, the human validators assessed the classifications for errors. Despite the inherent subjectivity of the task and interpretative overlaps, the human evaluators agreed with 96% of thematic assignments, supporting the validity of the process as a whole.

5 Results

Figure 5 visualises the yearly distribution of stance classifications, confirming a notable increase in publication volume over time, particularly after 2014, with perceivable surges in 2015, 2019, and 2021. This trend, viewed through an STS lens, underscores the growing societal and academic relevance of the Lyme disease controversy, signalling its transformation into a more debated area within biomedical and public health discourse. The predominance of the Neutral stance, consistently the largest category, suggests general epistemic caution or perhaps strategic neutrality among researchers. Given the persistent diagnostic and therapeutic uncertainties, and possibly concerns about the publication prospects of the studies, this may reflect a field-wide hesitancy to endorse polarised positions. While fewer studies explicitly endorse CLD, their consistent presence indicates a sustained, albeit marginalised, counter-narrative challenging mainstream PTLDS frameworks.
The increase in abstracts supporting the PTLDS perspective, especially post-2010, reveals a dynamic evolution of the scientific dialogue, with a shifting centre of gravity towards the PTLDS framework, even as CLD-aligned discourse persists as a significant dissenting voice. Figure 5 thus highlights the entrenched polarisation and the evolving and contested nature of scientific knowledge production within the Lyme debate.

Figure 6 confirms these trends by presenting the overall stance distribution across the 25-year dataset. The Neutral stance indeed constitutes the largest proportion (42%), followed by Supports PTLDS (34%) and Supports CLD (24%). Sociologically interpreted, this aggregate distribution reveals a key structural feature: a substantial portion of the literature strategically navigates the epistemic fault lines of the controversy without fully committing to either pole. However, the combined proportion of PTLDS and CLD-supporting studies (58%) underscores that the majority of research nonetheless remains deeply polarised, reflecting the enduring scientific and clinical divides within the Lyme disease landscape.

| Final Consolidated Theme | Description | Social Science Rationale/Relevance |
|---|---|---|
| Active Infection vs. Post-Infectious Immune Activity | Examines whether persistent symptoms are due to an ongoing Borrelia burgdorferi infection or a post-infectious immune response, a central debate in Lyme disease. | Epistemological Divide & Paradigm Clash: Reflects the fundamental epistemological divide in the Lyme controversy, highlighting competing paradigms of disease causation (biological vs. immunological). Connects to STS concepts of scientific controversy, paradigm clashes, and the social construction of scientific facts. |
| Diagnostic Complexity and Uncertainty | Investigates challenges in Lyme disease diagnosis, including limitations of serological testing, potential misdiagnosis, and the absence of definitive biomarkers. | Social Construction of Diagnosis & Medical Uncertainty: Illustrates the social construction of diagnostic categories, the inherent uncertainty in medical knowledge, and the limitations of biomedical reductionism in complex illnesses. Relevant to medical sociology's focus on the patients' experience of diagnostic ambiguity, patient navigation of complex medical systems, and the social impact of contested diagnoses. |
| Therapeutic Controversies and Antibiotic Efficacy | Addresses the contentious issue of prolonged antibiotic therapy, conflicting treatment guidelines, and the clinical efficacy of alternative interventions. | Bioethics & Medical Pluralism: Highlights the social and ethical dimensions of treatment decisions in contested illnesses, including the balance between evidence-based medicine, patient autonomy, and the influence of advocacy groups on treatment choices. Relates to bioethics, the sociology of medical practice, and the study of medical pluralism and treatment-seeking behaviours. |
| Immune Dysregulation and Autoimmune Mechanisms | Explores the hypothesis that post-treatment Lyme disease symptoms may stem from immune dysfunction, autoimmunity, or persistent inflammation rather than active infection. | Biomedical Framing & Shifting Paradigms: Represents a biomedical framing of persistent symptoms within established immunological paradigms, potentially reflecting a shift away from infection-centric models. Connects to STS analysis of how biomedical frameworks shape research agendas and the legitimation of certain types of medical knowledge over others. |
| Neurocognitive and Neuropsychiatric Manifestations | Focuses on neurological and psychiatric sequelae linked to Lyme disease, such as cognitive impairment, neuroinflammation, and psychiatric symptoms. | Psychosocial Impact of Contested Illness & Stigma: Underscores the psychosocial impact of Lyme disease, including cognitive and mental health challenges, and how these are understood and contested within the controversy. Relevant to medical sociology and medical anthropology's interest in the patient experience of chronic illness, stigma, and the social construction of mental health in contested medical conditions. |
| Patient-Centred Experiences and Advocacy | Highlights patient narratives, diagnostic challenges, disparities in medical recognition, and the broader psychosocial context of Lyme disease. | Patient Agency & Challenging Medical Authority: Centres on patient perspectives, highlighting patient agency in navigating contested diagnoses, challenging medical authority, and advocating for recognition and alternative treatments. Connects to patient advocacy studies, sociology of patient experience, and the role of online health communities in shaping health discourses. |
| Sociocultural and Ethical Factors | Examines the role of advocacy groups, public discourse, legal frameworks, and media representations in shaping Lyme disease perceptions and policies. | Social Construction of Health Policy & Media Influence: Explicitly addresses the broader sociocultural and ethical dimensions of the Lyme controversy, examining the influence of advocacy groups, media framing, and legal frameworks on shaping health policy and public understanding. Directly relevant to SSM's focus on the social determinants of health, health policy analysis, and media studies of health controversies. |
| Mechanisms of Pathogen Persistence and Biofilm Formation | Investigates microbial survival mechanisms, including persister cells and biofilms, which may contribute to treatment resistance and chronic symptoms. | Marginalised Biomedical Research & Alternative Paradigms: Represents a more marginalised biomedical research perspective, often associated with CLD advocacy, focusing on mechanisms that challenge mainstream views of pathogen eradication and treatment failure. Connects to STS analysis of scientific marginalisation, the sociology of scientific knowledge production in contested fields, and the dynamics of alternative medical paradigms. |

Table 3: Key overarching themes extracted from the dataset of abstracts, their classifications, and their definition and grounding in the Science and Technology Studies framework.

Figure 5: Number of relevant studies by year (2000-2025) and their classification. Classifications consisted of Neutral (blue), Supports PTLDS (orange), and Supports CLD (green).

Figure 6: Percentage distribution of classifications. Neutral, Supports PTLDS, and Supports CLD are depicted in blue, orange and green, respectively.

Figure 7 provides a more granular and deconstructed view of stance distribution, explicitly showing yearly publication counts per classification and confirming earlier observations.
However, Figure 7 notably reveals that Neutral and Supports PTLDS studies have driven this growth, while Supports CLD publications have remained comparatively flat, even declining recently. This differential growth, viewed through an STS lens, underscores a shifting centre of gravity in the discourse: as the academic conversation expands, it has increasingly centred on Neutral or PTLDS-aligned perspectives.

The sustained prominence of the Neutral category across years, now visually explicit in Figure 7, further highlights strategic epistemic caution within Lyme disease research. Yearly data show Neutral studies consistently as the largest single category, often exceeding the combined count of polarised stances. This sustained neutrality, particularly amidst intense debate, likely reflects methodological conservatism, with researchers prioritising cautious, evidence-based approaches and avoiding definitive stances given persistent diagnostic and therapeutic uncertainties.

Conversely, the relatively flat trajectory of Supports CLD publications, now clearly differentiated, underscores the persistent marginalisation of this perspective. Despite overall research growth, CLD-advocating publications have not seen comparable increases and have recently declined. This visual trend reinforces the interpretation of a field where CLD viewpoints, though consistently present, remain a minority and have failed to gain broader traction in mainstream academic discourse.

In contrast, Supports PTLDS publications show a pronounced upward trend with a distinct acceleration, especially post-2014, suggesting the institutionalisation of the PTLDS framework. This trend indicates a strengthening consensus around immune-mediated explanations and increasing alignment with mainstream medical guidelines.

Figure 7: Study classifications on Lyme disease from 2000 to 2025. The yearly count of abstracts on Lyme disease from 2000 to 2025 was divided into three classifications: Neutral, Supports PTLDS, and Supports CLD, which are depicted as blue, orange, and green, respectively.

Figure 8: Smoothed trends in classification percentages by year using the Savitzky-Golay [129] filter. Lines represent smoothed trends for the Neutral classification and for the difference between the Supports PTLDS and Supports CLD classifications over the period 2000 to 2025.

Figure 8, presenting smoothed trends (see footnote 5) in classification percentages, offers a longitudinal perspective on stance evolution. The solid line confirms the enduring prominence of neutrality as a defining feature of the discourse. Its upward trajectory, particularly in recent years, reinforces the interpretation of field-wide epistemic caution, reflecting researchers' strategic navigation of persistent uncertainties. The dashed line represents the smoothed difference in percentage points between the Supports PTLDS and Supports CLD classifications. Values above the zero threshold indicate a predominance of PTLDS-supporting studies over CLD-supporting studies, and negative values indicate the reverse. This line reveals a more dynamic, fluctuating pattern indicative of the shifting balance of power between competing paradigms. The negative dip in the early to mid-2000s, where CLD-supporting studies held a relative majority, suggests a historically contingent phase of alternative viewpoints gaining temporary traction, potentially fuelled by early patient activism challenging established biomedical narratives.
However, the subsequent and sustained shift into positive territory, accelerating in recent years, again empirically substantiates the institutionalisation of the PTLDS framework as the increasingly dominant paradigm. This longitudinal shift signifies a strengthening alignment with mainstream medical consensus and biomedical authority. While fluctuations persist within the PTLDS-CLD difference, the overall trend underscores the dynamic and continuously evolving nature of the Lyme disease controversy, reflected in shifts in research conclusions and publication volume. Figure 8 thus provides a macro-level view of these long-term shifts, complementing the granular yearly data presented in Figure 7 and offering a broader temporal context for understanding the evolving stance landscape.

Footnote 5: We applied the Savitzky-Golay [129] filter to reduce noise while preserving the underlying signal characteristics, using a second-order polynomial and a window size of 10.

Shifting the focus to potentially latent journal-level biases, Figure 9 reveals a stratified epistemic landscape shaped by publication venue. The figure displays the difference in percentage points between Supports PTLDS and Supports CLD classifications across the top 20 journals by publication volume within our target dataset.
Positive values indicate a higher proportion of PTLDS-supporting studies, while negative values indicate a greater presence of CLD-supporting studies in a given journal's output. This journal-centric view, analysed through STS and specifically Bourdieu's field theory [130], underscores how a journal's specialisation structures the Lyme controversy, channelling and differentiating the representation of competing knowledge claims. Leading infectious disease and clinical medicine journals (Clinical Infectious Diseases, The Journal of Infectious Diseases, and The American Journal of Medicine) predominantly publish PTLDS-supporting studies (positive values) within our dataset. Sociologically interpreted, this proclivity reflects these field-defining venues reinforcing mainstream medical consensus and legitimising the PTLDS framework. Aligned with established clinical guidelines and institutional authorities like the IDSA, these journals can be seen as functioning as epistemic gatekeepers, preferentially disseminating research congruent with dominant biomedical narratives.

Figure 9: Top 20 journals by volume in our dataset and their potential bias. Values are the difference in percentage points between published papers classified as supporting PTLDS and those classified as supporting CLD. Positive values indicate a higher proportion of PTLDS-supporting studies for a journal, and negative values a higher proportion of CLD-supporting studies.

Conversely, journals with broader, hypothesis-driven scopes (Medical Hypotheses and Antibiotics (Basel, Switzerland)) exhibit a contrasting bias towards CLD-supporting studies (negative values). This suggests that CLD-aligned research, challenging mainstream paradigms, often finds outlets outside core infectious disease venues. While offering platforms for heterodox perspectives, these journals occupy a more marginalised position within the biomedical field, potentially limiting the broader impact of CLD-supporting research on clinical practice.
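The percentage-point difference plotted in Figure 9 (and, per year, in Figure 8) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the journal names, and the counts are hypothetical, and the sketch assumes that each share is taken over all classified abstracts in a group, Neutral included, since the text does not state the denominator.

```python
from collections import Counter, defaultdict

PTLDS, CLD, NEUTRAL = "Supports PTLDS", "Supports CLD", "Neutral"


def stance_gap(stances):
    """PTLDS share minus CLD share, in percentage points.

    Assumption: shares are computed over all classified abstracts in the
    group (Neutral included); the source does not state the denominator.
    """
    counts = Counter(stances)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group contains no classified abstracts")
    return 100.0 * (counts[PTLDS] - counts[CLD]) / total


def gap_by_group(records):
    """Map each group key (e.g. a journal name or a year) to its stance gap."""
    grouped = defaultdict(list)
    for group, stance in records:
        grouped[group].append(stance)
    return {group: stance_gap(stances) for group, stances in grouped.items()}


# Hypothetical records, invented purely for illustration.
records = (
    [("Journal A", PTLDS)] * 6
    + [("Journal A", CLD)] * 1
    + [("Journal A", NEUTRAL)] * 3
    + [("Journal B", PTLDS)] * 1
    + [("Journal B", CLD)] * 3
    + [("Journal B", NEUTRAL)] * 1
)
gaps = gap_by_group(records)
for journal, gap in sorted(gaps.items()):
    print(f"{journal}: {gap:+.1f} percentage points")
```

Under this assumption, a positive value marks a venue that leans towards PTLDS-supporting studies and a negative value one that leans towards CLD-supporting studies. Grouping by year instead of by journal gives the unsmoothed series behind the dashed line in Figure 8.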
The PTLDS preference in immunology and neurology-focused journals (The Journal of Immunology and The European Journal of Neurology) further highlights disciplinary influences on potential journal bias, reflecting a research emphasis on immune and neurological dysfunction. Meanwhile, the slightly more balanced stance in Ticks and Tick-Borne Diseases and BMC Infectious Diseases likely stems from their broader scope beyond Lyme-specific controversies. Figure 9 thus reveals a segmented epistemic landscape, where journal specialisation and editorial practices actively shape and reinforce distinct perspectives, contributing to the enduring polarisation of the Lyme disease controversy.

To further probe the epistemic influence of the publications in Figure 9, we complement it with an analysis of citation impact data. Our analysis shows that the top 20 most-cited abstracts, a mere 2% of the dataset, account for 45% of all citations, highlighting a skewed distribution of epistemic power towards a small, highly influential subset of publications. This concentration, viewed through STS, underscores how a limited number of studies disproportionately shape the discourse's trajectory and impact. Analysing stance distribution within these top 20 papers reveals a clear bias: Supports PTLDS studies dominate (13 of 20), eclipsing Neutral (4) and Supports CLD (3) papers. Sociologically interpreted, this citation bias suggests a reinforcement of the PTLDS framework as the most impactful paradigm. The concentration of citations within PTLDS-supporting studies amplifies their visibility, credibility, and perceived scientific authority. This pattern also extends to the overall citation share across the whole dataset. While PTLDS-supporting papers constitute 34% of studies in our dataset, they accrue a disproportionate 52% of citations.
Conversely, Neutral studies, the largest share of abstracts (42%), garner a smaller citation share (26%), and CLD-supporting papers, representing 24% of abstracts, receive the smallest share (22%). This discrepancy between publication volume and citation impact, analysed through the prism of Bourdieu [130], reveals an epistemic hierarchy. While Neutral studies represent the largest research output, and CLD-supporting studies maintain a consistent presence, the PTLDS framework commands greater epistemic influence, attracting disproportionate scholarly attention and citations. Citation analysis thus further substantiates the institutionalisation of the PTLDS framework as the dominant paradigm, shaping not only publication trends but also the perceived impact and influence of Lyme disease research.

5.1 Thematic Analysis

To systematically assess the dominant conceptual dimensions in the Lyme disease debate, we categorised each abstract (see Step 4 in Figure 2) according to overarching themes (defined in Table 3) that capture distinct aspects of the controversy. Table 4 presents the thematic distribution of the 1,033 classified papers, giving their absolute counts and proportional representation. Additionally, it shows how each theme aligns with the broader debate by indicating the proportion of papers classified as Neutral, Supports PTLDS, or Supports CLD.
This analysis provides a structured view of how different scientific perspectives are distributed across key areas of contention. Because each abstract was assigned two themes in Step 4c, the theme counts sum to 2,066 (twice the 1,033 abstracts) and the theme shares sum to 200%.

| Theme | Papers | Share of abstracts (%) | Neutral (%) | Supports PTLDS (%) | Supports CLD (%) |
|---|---|---|---|---|---|
| Active Infection vs. Post-Infectious Immune Activity | 579 | 56.1 | 11.7 | 49.4 | 38.9 |
| Diagnostic Complexity and Uncertainty | 530 | 51.3 | 77.0 | 16.2 | 6.8 |
| Therapeutic Controversies and Antibiotic Efficacy | 365 | 35.3 | 20.8 | 40.5 | 38.6 |
| Neurocognitive and Neuropsychiatric Manifestations | 196 | 19.0 | 61.7 | 21.4 | 16.8 |
| Immune Dysregulation and Autoimmune Mechanisms | 192 | 18.6 | 28.1 | 62.5 | 9.4 |
| Patient-Centered Experiences and Advocacy | 149 | 14.4 | 82.6 | 8.1 | 9.4 |
| Mechanisms of Pathogen Persistence and Biofilm Formation | 30 | 2.9 | 10.0 | 0.0 | 90.0 |
| Sociocultural and Ethical Factors | 25 | 2.4 | 76.0 | 24.0 | 0.0 |

Table 4: Distribution of studies by theme and classifications supporting PTLDS/CLD or neutrality.

Among the themes in Table 4, Active Infection vs. Post-Infectious Immune Activity emerged as the dominant topic, associated with more than half of the studies. This theme reflects the fundamental divide in Lyme disease research, where one faction attributes persistent symptoms to immune dysfunction (Supports PTLDS, 49.4%), while the other endorses the hypothesis of bacterial persistence (Supports CLD, 38.9%). The relatively low proportion of neutral papers (11.7%) compared to the overall percentage (42%; see Figure 6) underscores the polarising nature of this theme, as most studies align explicitly with one of the two competing explanatory models.

Closely related to this core controversy is the Diagnostic Complexity and Uncertainty theme, which also appears in roughly half of the target studies.
In contrast to the previous theme, studies in this category are overwhelmingly neutral (77%), highlighting the ongoing challenges in establishing clear diagnostic criteria over the last quarter of a century, a problem exacerbated by the limitations of serological tests and the absence of universally accepted biomarkers [57, 75]. Despite this neutrality, which has sustained the controversy amidst diagnostic ambiguity, a larger proportion of studies lean towards PTLDS (16.2%) than towards CLD (6.8%). Thus, while uncertainty dominates, the prevailing inclination, although modest, favours an immune-mediated rather than infection-driven explanation for persistent symptoms.

Discussions surrounding treatment strategies are equally contentious, as reflected in the Therapeutic Controversies and Antibiotic Efficacy theme, which accounts for around a third of the studies. The classification breakdown reveals a near-even split between PTLDS-supporting (40.5%) and CLD-supporting (38.6%) papers, with a neutral share (20.8%) well below the overall average, illustrating the ongoing debate over whether extended antibiotic regimens provide a therapeutic benefit or pose unnecessary risks. This division again mirrors the long-standing disagreements between major health organisations regarding treatment guidelines, further reinforcing the unresolved nature of this issue.

Beyond these core debates, several secondary themes provide insights into the broader implications of Lyme disease research and discourse. The theme of Immune Dysregulation and Autoimmune Mechanisms, identified in about a fifth of studies, is predominantly associated with PTLDS (62.5%), suggesting a growing recognition of immune dysfunction as a plausible driver of persistent symptoms.
Conversely, bacterial persistence receives significantly less support within this thematic category (9.4%), confirming that research into immune dysregulation tends to align more closely with the PTLDS framework than the CLD perspective.

A related but distinct theme, Neurocognitive and Neuropsychiatric Manifestations, also emerges in about a fifth of the studies. A significant majority of these, however, remain neutral (61.7%), reflecting the ambiguity surrounding whether neurocognitive symptoms arise from residual infection, immune-mediated mechanisms, or other secondary effects. While PTLDS-supporting papers (21.4%) slightly outnumber CLD-supporting papers (16.8%), the relatively balanced distribution suggests that both explanatory models remain viable considerations in this domain, where uncertainty predominates.

Outside of the biomedical discourse, the role of patient experiences and advocacy remains a notable but underexplored dimension. The theme of Patient-Centered Experiences and Advocacy accounts for approximately 14% of all studies and is characterised by overwhelming neutrality (82.6%). This high neutrality suggests that, while patient narratives are acknowledged, few studies explicitly frame them within either the PTLDS or CLD paradigm. Nonetheless, similarly small proportions of studies support PTLDS (8.1%) or CLD (9.4%), reflecting the residual tensions between scientific discourse and patient advocacy efforts.
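The figures discussed so far can be cross-checked directly against Table 4. The following Python sketch is illustrative only and is not part of the study's pipeline: it transcribes the table as reported, confirms that the theme counts are consistent with two themes per abstract, and derives two summary indicators that are hypothetical conveniences introduced here (the non-neutral share and the PTLDS minus CLD gap per theme).

```python
# Table 4 as reported:
# theme -> (papers, neutral %, supports PTLDS %, supports CLD %)
TABLE_4 = {
    "Active Infection vs. Post-Infectious Immune Activity": (579, 11.7, 49.4, 38.9),
    "Diagnostic Complexity and Uncertainty": (530, 77.0, 16.2, 6.8),
    "Therapeutic Controversies and Antibiotic Efficacy": (365, 20.8, 40.5, 38.6),
    "Neurocognitive and Neuropsychiatric Manifestations": (196, 61.7, 21.4, 16.8),
    "Immune Dysregulation and Autoimmune Mechanisms": (192, 28.1, 62.5, 9.4),
    "Patient-Centered Experiences and Advocacy": (149, 82.6, 8.1, 9.4),
    "Mechanisms of Pathogen Persistence and Biofilm Formation": (30, 10.0, 0.0, 90.0),
    "Sociocultural and Ethical Factors": (25, 76.0, 24.0, 0.0),
}
N_ABSTRACTS = 1033        # abstracts classified in Step 4c
THEMES_PER_ABSTRACT = 2   # each abstract received two themes


def summarise(table):
    """Derive per-theme indicators from the reported percentages."""
    rows = []
    for theme, (papers, neutral, ptlds, cld) in table.items():
        rows.append({
            "theme": theme,
            # Should reproduce the table's share-of-abstracts column.
            "share_of_abstracts": round(100 * papers / N_ABSTRACTS, 1),
            # Hypothetical indicator: share of papers taking either side.
            "non_neutral": round(100 - neutral, 1),
            # Hypothetical indicator: positive values lean PTLDS.
            "ptlds_minus_cld": round(ptlds - cld, 1),
        })
    return rows


total_assignments = sum(papers for papers, *_ in TABLE_4.values())
rows = summarise(TABLE_4)

# Theme counts should equal two assignments per abstract.
assert total_assignments == N_ABSTRACTS * THEMES_PER_ABSTRACT

for row in sorted(rows, key=lambda r: r["non_neutral"], reverse=True):
    print(f'{row["theme"]}: non-neutral {row["non_neutral"]}%, '
          f'PTLDS-CLD gap {row["ptlds_minus_cld"]:+} pp')
```

Sorting by the non-neutral share is one simple way to rank themes from most to least polarised. It is a descriptive convenience rather than a measure defined in the study.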
Although the theme of Mechanisms of Pathogen Persistence and Biofilm Formation is among the least represented (2.9% of studies), it stands out for its strong association with CLD (90.0%). This overwhelming alignment suggests that research on bacterial persistence may primarily be conducted within the CLD framework, reinforcing its position as the central scientific rationale for the chronic infection hypothesis. However, the limited number of papers in this category indicates that, despite its prominence in CLD discourse, empirical investigations into biofilms and persister cells remain relatively scarce.

Lastly, Sociocultural and Ethical Factors constitutes the least explored theme, appearing in only 2.4% of the studies. The high proportion of neutral papers (76.0%) suggests that while sociopolitical influences are acknowledged, they are seldom the primary focus of scientific inquiry in this controversy. However, within this small subset of studies, PTLDS receives more explicit support (24.0%) than CLD (0%), potentially reflecting how institutional and ethical discussions more frequently align with the mainstream medical consensus than with the alternative chronic Lyme paradigm.

Table 5 recasts the above analysis into a temporal frame, revealing shifts in the thematic focus of Lyme disease research and discourse across the past three decades. Percentages are normalised within each decade to account for the incompleteness of the current decade. The most studied theme, Active Infection vs. Post-Infectious Immune Activity, has declined from 33% in the 2000s to 24% in the 2020s, reflecting a broader transition from infection-centric to immune-mediated explanations. Meanwhile, Diagnostic Complexity and Uncertainty has grown (21% → 25% → 29%), pointing toward continued and unresolved challenges in establishing reliable biomarkers and diagnostic criteria.
Similarly, Therapeutic Controversies and Antibiotic Efficacy, though declining in prevalence (20% → 20% → 14%), remains one of the most contested areas, as discussed previously.

| Theme | 2000s (%) | 2010s (%) | 2020s (%) |
|---|---|---|---|
| Active Infection vs. Post-Infectious Immune Activity | 33 | 28 | 24 |
| Diagnostic Complexity and Uncertainty | 21 | 25 | 29 |
| Therapeutic Controversies and Antibiotic Efficacy | 20 | 20 | 14 |
| Neurocognitive and Neuropsychiatric Manifestations | 11 | 8 | 11 |
| Immune Dysregulation and Autoimmune Mechanisms | 10 | 8 | 10 |
| Patient-Centered Experiences and Advocacy | 3 | 8 | 10 |
| Sociocultural and Ethical Factors | 1 | 1 | 2 |
| Mechanisms of Pathogen Persistence and Biofilm Formation | 1 | 2 | 1 |

Table 5: Shifts in the percentage of studies by thematic focus across decades.

These findings highlight distinct thematic patterns in Lyme disease research, reinforcing the entrenched divisions that define the controversy. The central debate over persistent infection versus immune-mediated pathology remains a dominant axis of discourse, yet its proportional representation has declined as research has increasingly shifted toward diagnostic complexity and uncertainty. Studies focusing on diagnostic complexity have steadily increased over the decades and now show the largest gain in share of any theme, underscoring the persistent difficulty in establishing definitive diagnostic criteria. Treatment strategies remain a significant point of contention, though research on antibiotic efficacy has proportionally declined, suggesting a decreased emphasis on therapeutic debates over time. While immune dysregulation and neurocognitive symptoms continue to be actively explored within the PTLDS framework, research on bacterial persistence and biofilm formation remains confined to a niche subset of CLD-focused studies, reflecting its limited expansion within mainstream biomedical discourse. The high neutrality observed in patient-centred and sociocultural discussions underscores the ma