Leveraging Large Language Models for Translational Research Classification
Zheng, Zhejun
Permalink
https://hdl.handle.net/2142/133001
Description
- Title
- Leveraging Large Language Models for Translational Research Classification
- Author(s)
- Zheng, Zhejun
- Issue Date
- 2026-03-12
- Keyword(s)
- Translational medicine
- Large language models
- Text classification
- Prompt engineering
- Abstract
- Introduction. Classifying biomedical publications along the translational research spectrum (T0–T4) is essential for research evaluation, yet remains challenging due to inconsistent stage definitions and the labor-intensive nature of manual annotation. Although Surkis (2016) developed a 34-item checklist and trained machine learning classifiers, such statistical models are susceptible to model drift and cannot explicitly encode expert-defined classification rules. Large Language Models (LLMs) present a compelling alternative by enabling rule-based classification through prompt engineering. Method. We transformed the 34-item checklist into structured prompt templates corresponding to five translational categories. Seven LLMs (gpt-oss_20b, glm-4.5, deepseek-reasoner, and qwen3 variants) were evaluated on 296 expert-annotated publications using zero-shot, one-shot, and three-shot prompting strategies. Performance was assessed across three binary classification tasks (T0, T1/T2, T3/T4) using precision, recall, F1-score, and AUC. Results. DeepSeek-Reasoner achieved the highest F1-scores for T0 (0.888) and T1/T2 (0.886), while GLM-4.5 performed best on T3/T4 (0.729). The top-performing models exceeded the original baselines, attaining AUCs of 0.987 (T0) and 0.946 (T1/T2) versus the previously reported 0.94 and 0.84. However, 40–60% of publications received either multiple labels or no label because each category was prompted independently. Conclusion. LLM-based classification effectively operationalizes expert-defined rules and outperforms traditional machine learning approaches for early translational stages.
- Publisher
- iSchools
- Series/Report Name or Number
- iConference 2026 Proceedings
- Type of Resource
- Other
- Genre of Resource
- Conference Poster
- Language
- eng
- Copyright and License Information
- Copyright 2026 is held by Zhejun Zheng. Copyright permissions, when appropriate, must be obtained directly from the authors.
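The abstract attributes the multi-label/no-label problem (40–60% of publications) to prompting each translational category independently. A minimal sketch of that aggregation scheme, assuming a per-category yes/no LLM judgment; `ask_llm` is a hypothetical stand-in, stubbed here with a toy keyword rule so the logic is runnable:

```python
# Independent binary prompting per translational category, as described
# in the abstract. All names below are illustrative assumptions, not the
# authors' actual implementation.

CATEGORIES = ["T0", "T1", "T2", "T3", "T4"]

def ask_llm(abstract: str, category: str) -> bool:
    """Stub for a yes/no LLM judgment built from checklist-derived prompts.
    A real implementation would call a model API with the category's
    structured prompt template."""
    # Toy rule purely for demonstration: match the category tag in the text.
    return category in abstract

def classify(abstract: str) -> list[str]:
    """Query each category independently and collect all 'yes' answers.
    Because the prompts are independent, a paper can receive several
    labels or none -- the failure mode the abstract reports."""
    return [c for c in CATEGORIES if ask_llm(abstract, c)]

print(classify("A T0 basic-science study with T1 implications"))  # ['T0', 'T1']
print(classify("An unrelated abstract"))                          # []
```

Forcing a single label would instead require a mutually exclusive prompt (or a post-hoc tie-breaking rule), which the independent design deliberately avoids.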
Owning Collections
iConference 2026 Posters
Posters presented at the 2026 iConference: https://www.ischools.org/iconference