What level of automation is “good enough”? A benchmark of large language models for meta-analysis data extraction


Publisher

Cambridge University Press

Rights

(c) The author/s
CC BY 4.0

Abstract

Automating data extraction from full-text randomized controlled trials for meta-analysis remains a significant challenge. This study evaluates the practical performance of three large language models (LLMs), Gemini-2.0-flash, Grok-3, and GPT-4o-mini, across tasks involving statistical results, risk-of-bias assessments, and study-level characteristics in three medical domains: hypertension, diabetes, and orthopaedics. We tested four distinct prompting strategies (basic prompting, self-reflective prompting, model ensemble, and customized prompts) to determine how to improve extraction quality. All models demonstrated high precision but consistently suffered from poor recall, omitting key information. We found that customized prompts were the most effective, boosting recall by up to 15%. Based on this analysis, we propose a three-tiered set of guidelines for using LLMs in data extraction, matching data types to appropriate levels of automation based on task complexity and risk. Our study offers practical advice for automating data extraction in real-world meta-analyses, balancing LLM efficiency with expert oversight through targeted, task-specific automation.

Citation

Li L, Mathrani A, Susnjak T. (2026). What level of automation is “good enough”? A benchmark of large language models for meta-analysis data extraction. Research Synthesis Methods. Online first.


Creative Commons license

Except where otherwise noted, this item's license is described as (c) The author/s