Yaqi Duan

arXiv:2502.04270

February 7, 2025

New paper "PILAF: Optimal human preference sampling for reward modeling" posted on arXiv!
We introduce PILAF, a simple yet effective algorithm for preference data collection in reinforcement learning from human feedback (RLHF), and demonstrate its efficiency both theoretically and empirically.

© Copyright 2025 Yaqi Duan.