Fine-grained Natural Feedback

How do we build better rewards, i.e., proxy models of human preferences?

Currently, the prevailing form of human preference data collection is as follows: humans are presented with two or more outputs and asked to select one or rank them. This signal is then used to train a reward model, which computes a single scalar reward for each LM-generated sequence. The LM is then trained with RL to optimize the reward it receives from the reward model. Such a reward provides a relatively sparse training signal, especially for tasks that require generating long-form text, making RLHF in such domains unreliable. This project focuses on what it would take to move toward more natural (language-based), fine-grained rewards along multiple dimensions.
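A minimal sketch of the contrast, assuming a per-sentence segmentation and a small set of feedback dimensions (the dimension names, weights, and function names below are illustrative, not the implementation from the referenced work): instead of one scalar for the whole sequence, each segment receives a weighted combination of rewards from dimension-specific models, yielding a denser training signal.

```python
"""Illustrative sketch: holistic vs. fine-grained rewards.
All reward models and dimensions here are hypothetical stand-ins."""

from typing import Callable, Dict, List


def holistic_reward(sequence: str, reward_model: Callable[[str], float]) -> float:
    """Conventional RLHF: a single scalar for the entire generated sequence."""
    return reward_model(sequence)


def fine_grained_rewards(
    segments: List[str],
    reward_models: Dict[str, Callable[[str], float]],
    weights: Dict[str, float],
) -> List[float]:
    """Fine-grained variant: a weighted reward per segment (e.g. per sentence),
    combining several feedback dimensions such as relevance or factuality."""
    return [
        sum(weights[dim] * rm(seg) for dim, rm in reward_models.items())
        for seg in segments
    ]


if __name__ == "__main__":
    # Toy stand-ins for learned reward models (hypothetical).
    toy_models = {
        "relevance": lambda seg: 1.0 if "reward" in seg else 0.0,
        "factuality": lambda seg: 0.5,
    }
    dim_weights = {"relevance": 0.7, "factuality": 0.3}

    segments = [
        "Fine-grained feedback assigns a reward to each segment.",
        "A holistic signal scores only the full sequence.",
    ]
    print(holistic_reward(" ".join(segments), lambda s: 1.0))  # one scalar
    print(fine_grained_rewards(segments, toy_models, dim_weights))  # one per segment
```

The per-segment, per-dimension rewards can then be summed (or otherwise aggregated) into the return used by the RL objective, so credit assignment happens at the level of individual spans rather than the whole output.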

References

2023

  1. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
    Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, and Hannaneh Hajishirzi
    In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

2022

  1. INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions
    Zeqiu Wu, Ryu Parish, Hao Cheng, Sewon Min, Prithviraj Ammanabrolu, Mari Ostendorf, and Hannaneh Hajishirzi
    Transactions of the Association for Computational Linguistics (TACL), 2022