Publications
2025
BioResearcher: Automated Biomedical Research System
From Intention to Implementation: Automating Biomedical Research via LLMs
SCIENCE CHINA Information Sciences
We develop BioResearcher, an end-to-end automated biomedical research system. The system adopts a modular design and a hierarchical learning approach to automate the entire research workflow. Given a research objective, BioResearcher autonomously executes the full sequence of research processes: literature and dataset retrieval, extraction and analysis of relevant experimental reports, and experimental design together with its programming implementation.
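The staged workflow described above can be sketched as a chain of modules. Everything here is a hypothetical placeholder (function names, return values, the objective string); in the real system each stage is backed by LLM agents rather than stub functions.

```python
# Schematic sketch of an end-to-end modular research pipeline in the
# spirit of BioResearcher. Each stage is a toy stand-in for an LLM agent.

def retrieve_literature(objective):
    """Stage 1a: fetch papers relevant to the research objective."""
    return [f"paper relevant to: {objective}"]

def retrieve_datasets(objective):
    """Stage 1b: fetch candidate datasets."""
    return [f"dataset relevant to: {objective}"]

def analyze_reports(papers):
    """Stage 2: extract experimental protocols from retrieved reports."""
    return {"protocols": [f"protocol extracted from {p}" for p in papers]}

def design_experiment(analysis, datasets):
    """Stage 3: draft an experiment plan grounded in protocols and data."""
    return {"plan": "experiment plan", "uses": datasets, **analysis}

def implement(design):
    """Stage 4: turn the design into runnable analysis code."""
    return f"generated analysis code for: {design['plan']}"

def run_pipeline(objective):
    papers = retrieve_literature(objective)
    datasets = retrieve_datasets(objective)
    analysis = analyze_reports(papers)
    design = design_experiment(analysis, datasets)
    return implement(design)

print(run_pipeline("identify candidate biomarkers"))
```

The point of the modular decomposition is that each stage can be improved or swapped independently while the overall objective-to-implementation contract stays fixed.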
DQABench: Database Question-Answering Benchmark and Evaluation
Revolutionizing Database Q&A with Large Language Models
KDD 2025
We design a comprehensive evaluation benchmark for database QA and an LLM-based database QA system. The benchmark includes a dataset, a GPT-4-based benchmark construction method, and an evaluation framework designed to assess LLMs' expertise, retrieval capabilities, and tool utilization skills in the database domain. The complete QA system consists of: a) a carefully pre-trained and fine-tuned LLM, b) a question classification routing module, c) a prompt template engineering (PTE) module, d) a retrieval-augmented generation (RAG) module, and e) a tool invocation module. The benchmark thoroughly quantifies the database QA capabilities of nine different LLMs in both Chinese and English. The full QA system improves the database QA performance of a 13B-parameter LLM by 44%, achieving performance close to, and in certain domains exceeding, that of advanced models like GPT-4.
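To make the modular design concrete, here is a minimal sketch of how modules b) and c) could interact. The keyword-based classifier and the template strings are illustrative stand-ins; the actual system uses a trained router and a fine-tuned LLM behind each route.

```python
# Toy sketch of question-classification routing (module b) feeding
# prompt template engineering (module c). Routes and rules are invented
# for illustration only.

def classify(question):
    """Route a question to the module best suited to answer it."""
    q = question.lower()
    if "slow" in q or "optimize" in q:
        return "tool"    # diagnosis/tuning via tool invocation (module e)
    if "version" in q or "release" in q:
        return "rag"     # version-specific facts via retrieval (module d)
    return "direct"      # general expertise from the fine-tuned LLM (module a)

def build_prompt(question, route):
    """Fill a route-specific prompt template (module c)."""
    templates = {
        "direct": "Answer the database question: {q}",
        "rag": "Using the retrieved documents, answer: {q}",
        "tool": "Interpret the tool output and answer: {q}",
    }
    return templates[route].format(q=question)

question = "Which release added logical replication?"
route = classify(question)
print(route, "->", build_prompt(question, route))
```

Routing before prompting is what lets one system combine a fine-tuned model, retrieval, and tools without forcing every question through every component.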
2024
Guide-Align: Safe Outputs via a Guideline Library
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach
NAACL 2024
We propose Guide-Align, an innovative approach to address the significant risks of bias, privacy leakage, and harmful content generation in LLMs. Existing rule-based alignment techniques often suffer from three key limitations: imprecise rule definitions, insufficient coverage, and limited risk perception capabilities. Guide-Align overcomes these challenges through three key steps: 1) A safety-trained language model analyzes input data for potential risks and generates corresponding safety guidelines, building a comprehensive guideline library; 2) A retrieval model is trained to match input content with corresponding guidelines; 3) During inference, the retrieval model extracts applicable rules from the library to steer LLM outputs toward safe, high-quality responses. Furthermore, we leverage open datasets to generate alignment data for fine-tuning, yielding our final model, Labrador.
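Step 3 above, retrieving applicable guidelines at inference time, can be sketched as follows. The guideline library contents and the word-overlap "retriever" are toy assumptions; the paper trains a dedicated retrieval model over a library built by a safety-trained LLM.

```python
# Minimal sketch of Guide-Align's inference step: retrieve the most
# relevant guideline and prepend it to steer the LLM's response.
# Library entries and the lexical matcher are illustrative only.

GUIDELINE_LIBRARY = {
    "medical dosage advice": "Do not give specific dosages; refer to a professional.",
    "personal data request": "Refuse to reveal private or identifying information.",
    "general knowledge question": "Answer helpfully, factually, and concisely.",
}

def retrieve_guideline(user_input):
    """Toy retriever: pick the guideline whose key overlaps the input most."""
    words = set(user_input.lower().split())
    def overlap(key):
        key_words = set(key.split())
        return len(words & key_words) / len(words | key_words)
    best_key = max(GUIDELINE_LIBRARY, key=overlap)
    return GUIDELINE_LIBRARY[best_key]

def steer_prompt(user_input):
    """Place the retrieved guideline before the input so it guides generation."""
    return f"Guideline: {retrieve_guideline(user_input)}\nUser: {user_input}"

print(steer_prompt("a request about personal data"))
```

Because the guidelines live in an external library rather than in the model weights, coverage can grow by adding entries, without retraining the generator.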
Iter-CoT: Iterative Bootstrapping for Chain-of-Thought Enhancement
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping
NAACL 2024 Findings
We propose Iter-CoT, a novel iterative self-correction framework. The method enables the model to autonomously revise its reasoning chains, producing rationales that are more accurate and complete. Additionally, Iter-CoT selectively incorporates challenging yet solvable questions as exemplars, enhancing the model's generalization across problems of varying difficulty.
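The bootstrapping loop can be sketched as below. `toy_model` is a stub standing in for an LLM: it is deliberately wrong on its first attempt at the harder question and corrects itself when given feedback. Keeping only questions solved after revision mirrors the "challenging yet solvable" exemplar selection; all names and data are invented for illustration.

```python
# Toy sketch of Iter-CoT-style iterative bootstrapping: retry wrong
# reasoning chains with feedback, then keep challenging-yet-solvable
# questions (solved, but not on the first try) as exemplars.

def toy_model(question, hint=None):
    """Stand-in LLM returning a (reasoning chain, answer) pair."""
    if question == "17 + 26" and hint is None:
        return ("17 + 26 = 33", 33)   # deliberate first-attempt mistake
    chains = {"2 + 3": ("2 + 3 = 5", 5), "17 + 26": ("17 + 26 = 43", 43)}
    return chains[question]

def bootstrap(dataset, max_iters=3):
    exemplars = []
    for question, gold in dataset:
        chain, answer = toy_model(question)
        attempts = 1
        while answer != gold and attempts < max_iters:
            # Feed back that the chain was wrong and ask for a revision.
            chain, answer = toy_model(question, hint="your answer was wrong")
            attempts += 1
        # Easy questions (solved immediately) are skipped; unsolved ones too.
        if answer == gold and attempts > 1:
            exemplars.append((question, chain))
    return exemplars

print(bootstrap([("2 + 3", 5), ("17 + 26", 43)]))
```

The selection rule is the key idea: exemplars that required self-correction carry more instructive reasoning than ones the model found trivial.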