Yaqi Duan
Welcome to my homepage! I am an Assistant Professor in the Department of Technology, Operations, and Statistics at Stern School of Business at New York University. My primary research interests sit at the intersection of statistical machine learning and operations, centered on data-driven sequential decision making. Recently, I have been studying reinforcement learning for post-training large language models, exploring how data collection, exploration, and optimization dynamics shape learning.
I received my Ph.D. from the Department of Operations Research and Financial Engineering at Princeton University in 2022. From 2022 to 2023, I was a postdoctoral researcher at the Laboratory for Information & Decision Systems at the Massachusetts Institute of Technology, working with Professor Martin J. Wainwright. Prior to my doctoral studies, I received a B.S. in Mathematics from Peking University.
If you are interested in working together, please feel free to reach out!
📧 yaqi.duan [At] stern [Dot] nyu [Dot] edu
📬 KMC 8-54, 44 West 4th Street, New York, NY 10012
News
| Date | News |
|---|---|
| Jan 2026 | Paper Stability through curvature: A framework for fast convergence in reinforcement learning has received a Minor Revision decision from Operations Research after the first round of review. |
| Dec 2025 | New paper Ask, clarify, optimize: Human-LLM agent collaboration for smarter inventory control posted on arXiv. We position LLMs as intelligent collaborators rather than replacements for operations research, enabling effective human–LLM coordination for inventory control through structured agentic design. |
| Oct 2025 | Talk at the 2025 INFORMS Annual Meeting. |
| Oct 2025 | New paper Don’t waste mistakes: Leveraging negative RL-groups via confidence reweighting posted on arXiv. We show that negative groups in RLVR can be exploited without extra supervision by reinterpreting MLE as a policy gradient with confidence-weighted penalties, leading to the LENS algorithm. |
| Oct 2025 | New paper On the optimization dynamics of RLVR: Gradient gap and step size thresholds posted on arXiv. We develop a theoretical understanding of RLVR optimization dynamics by introducing the Gradient Gap, which characterizes convergence directions and sharp step-size thresholds underlying stability and failure. |
| Aug 2025 | Talk at the 2025 Joint Statistical Meetings. |
| Jul 2025 | I am honored to receive the LSE–NYU Research Seed Fund. I am grateful for this opportunity to collaborate with Professor Qiwei Yao from LSE, and I look forward to our work together! |
| Jul 2025 | Excited to welcome Joe Suk as a Postdoctoral Researcher and look forward to our collaboration! |
| Jul 2025 | Talk at the 2025 INFORMS Applied Probability Conference. |
| May 2025 | Paper PILAF: Optimal human preference sampling for reward modeling accepted by ICML 2025. |
| Apr 2025 | Talk at the Optimization and Statistical Learning Workshop, Columbia University. |
| Feb 2025 | Talk at the CILVR Seminar. |
| Feb 2025 | New paper PILAF: Optimal human preference sampling for reward modeling posted on arXiv! We introduce PILAF, a simple yet effective algorithm for data collection in RLHF, showing its efficiency both theoretically and empirically. |
| Dec 2024 | New paper Localized exploration in contextual dynamic pricing achieves dimension-free regret posted on arXiv. |
| Dec 2024 | Talk at the RL Theory Seminar. |
| Oct 2024 | Talk at the Department of Statistics, Rutgers University. |
| Sep 2024 | Paper Taming “data-hungry” reinforcement learning? Stability in continuous state-action spaces accepted by NeurIPS 2024. |
| Sep 2024 | Talk at the S. S. Wilks Memorial Seminar in Statistics, Princeton University. |
| Aug 2024 | I am honored to receive my first NSF grant. Grateful for this opportunity! |
| May 2024 | Paper Optimal policy evaluation using kernel-based temporal difference methods accepted by the Annals of Statistics. |
| Feb 2024 | Talk at the Math & Data (MaD) Seminar, New York University. |
| Jan 2024 | New paper Taming “data-hungry” reinforcement learning? Stability in continuous state-action spaces posted on arXiv. |
| Dec 2023 | Paper Adaptive and robust multi-task learning accepted by the Annals of Statistics. |
| Aug 2023 | I’ve joined NYU Stern School of Business as an Assistant Professor in the Department of Technology, Operations, and Statistics. Thrilled to embark on this new academic journey! |
| Nov 2022 | New paper Policy evaluation from a single path: Multi-step methods, mixing and mis-specification posted on arXiv. |
| Oct 2022 | I am honored to receive the 2023 IMS Lawrence D. Brown Ph.D. Student Award. |
Selected publications
- Stability through curvature: A framework for fast convergence in reinforcement learning. Minor revision (first round), Operations Research, 2025