How do we build better rewards, i.e., proxy models of human preferences?
Currently, the prevailing form of human preference data collection is as follows: humans are presented with two or more model outputs and asked to select one or rank them. This signal is then used to train a reward model, which computes a single scalar reward for each LM-generated sequence. The LM is then trained with RL to maximize the reward it receives from the reward model. Such a reward provides a relatively sparse training signal, especially for tasks that require generating long-form text, which makes RLHF unreliable in those domains. This project focuses on what it would take to move toward more natural (e.g., natural-language) and fine-grained rewards along various dimensions.
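To make the standard setup concrete, below is a minimal sketch of the pairwise (Bradley-Terry style) reward-modelling step described above. It is not the implementation from the referenced work: the `RewardModel` class is a hypothetical toy stand-in for an LM-based scorer, and the key point is that it produces a single scalar for an entire sequence, which is precisely the sparsity this project aims to move beyond.

```python
# Minimal sketch of scalar reward modelling from pairwise human preferences.
# Assumes PyTorch; RewardModel is a toy stand-in, not a real LM-based scorer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Toy reward model: embeds tokens, mean-pools, and projects to
    one scalar score per sequence (the 'sparse' sequence-level reward)."""

    def __init__(self, vocab_size: int = 32000, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch,) scalar rewards
        pooled = self.embed(token_ids).mean(dim=1)
        return self.score(pooled).squeeze(-1)


def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the reward of the human-preferred
    sequence above the reward of the dispreferred one."""
    r_chosen = model(chosen)
    r_rejected = model(rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    model = RewardModel()
    chosen = torch.randint(0, 32000, (4, 64))    # preferred outputs
    rejected = torch.randint(0, 32000, (4, 64))  # dispreferred outputs
    loss = preference_loss(model, chosen, rejected)
    loss.backward()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

A fine-grained variant would instead score spans or sentences within a sequence, along separate dimensions (e.g., factuality, relevance), so the RL step receives a denser, more informative signal than one number per generation.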
References
2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, and Hannaneh Hajishirzi
In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023