Arxiv_2502_04270
New paper, "PILAF: Optimal Human Preference Sampling for Reward Modeling", posted on arXiv!
We introduce PILAF, a simple yet effective algorithm for data collection in reinforcement learning from human feedback (RLHF), and show its efficiency both theoretically and empirically.