Ph.D. @ UIUC · Language-model post-training

Teaching language models to learn, reason, and discover.

I study what actually makes language models generalize — and turn that understanding into better post-training data, algorithms, and agents that can push past what they were taught.

I'm Dylan Zhang, a CS Ph.D. student at the University of Illinois Urbana-Champaign 🌽, advised by Prof. Hao Peng in the ALTA lab.

Dylan Zhang
Open to Summer 2026 research internships
Research vision

I want language models that don't just recall what we know, but help discover what we don't.

My research is on the post-training of language models — the data and algorithms that turn a pretrained model into a capable, reliable reasoner and agent. I work from a data-centric view: much of what looks like a modeling problem is really a question of which experience a model learns from, and how.

That lens runs through my work. I've shown that instruction diversity, not sheer volume, drives generalization; that the best supervised fine-tuning prepares a model for reinforcement learning rather than merely teaching it to imitate demonstrations; and that self-improving agents can quietly corrupt their own memory as they accumulate experience. The connecting thread is understanding the mechanisms behind generalization well enough to engineer it.

Looking forward, I'm most excited about agents that move from recalling knowledge to discovering it. Three directions stand out: offline-to-online RL, incentivizing proactive reasoning for knowledge discovery, and foundation-model agents that can be dropped into a novel environment and learn it by experiment. The next challenge for post-training, I believe, is building models that extend the frontier of human knowledge, not just compress it.

Post-training data & algorithms

What data and objectives actually make models generalize — instruction diversity, SFT-for-RL, data selection & reweighting.

RL & reasoning

Offline-to-online reinforcement learning and incentivizing proactive, verifiable reasoning behaviors.

Self-improving agents

How agents learn from their own experience — and the failure modes when memory is continually rewritten.

AI for scientific discovery

Agents that experiment, form hypotheses, and recover mechanisms — a step toward AI scientists.

Recent news

What's new

Jun 2026
Writing · New interactive write-up: "CausaLab: Can LLM Agents Discover Causal Mechanisms by Experiment?" It puts agents in a synthetic lab to see whether they can intervene, observe, and revise like scientists.
May 2026
Writing · New interactive write-up: "Useful Memories Become Faulty When Continuously Updated by LLMs", on why agents that compress experience into text can end up worse off than agents with no memory at all.
Feb 2026
Sep 2025
NeurIPS 2025 Spotlight · GRAPE was accepted to NeurIPS 2025 as a Spotlight! 🎉
May 2025
Started my Student Researcher journey with Google. 🚀
Selected works

Publications

2026
Junlin Yang*, Dylan Zhang*, Xiangchen Song, Qirun Dai, Xiao Liu, Yuen Chen, Aniket Vashishtha, Jing Shi, Chenhao Tan, Hao Peng (*equal contribution, project leads)
ICML 2026
Dylan Zhang, Yufeng Xu, Haojin Wang, Qingzhi Chen, Hao Peng
2025
NeurIPS 2025 · Spotlight
Dylan Zhang, et al.
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai, Dylan Zhang, Jiaqi W. Ma, Hao Peng
Entropy-Regularized Process Reward Model
Hanning Zhang*, Pengcheng Wang*, Shizhe Diao*, Yong Lin, Rui Pan, Dylan Zhang, Pavlo Molchanov, Tong Zhang
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Dylan Zhang*, Rui Pan*, Hanning Zhang*, Xingyuan Pan*, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang (*equal contribution)
2024
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization
Dylan Zhang, Justin Wang, Francois Charton
SciCode: A Research Coding Benchmark Curated by Scientists
Minyang Tian*, Luyu Gao*, Dylan Zhang, Xinan Chen, … (multi-institution collaboration)
2023
Making Large Language Models Better Reasoners with Step-Aware Verifier
Yifei Li, Zeqi Lin, Dylan Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen
2021
Pre-training Co-evolutionary Protein Representation via a Pairwise Masked Language Model
Liang He, Dylan Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, …, Tie-Yan Liu
Experience

Where I've worked

Student Researcher
Google
May 2025 – Present
Mountain View, CA
Research Intern
Microsoft Research
May 2024 – Aug 2024
Redmond, WA
Research Intern
Microsoft Research
May 2023 – Aug 2023
Redmond, WA
Education
🎓
Ph.D. in Computer Science
University of Illinois Urbana-Champaign · Advisor: Prof. Hao Peng (ALTA)
2022 – Present
🌽 Champaign, IL