Multimodal Language Grounding

Grounding language to other modalities

How can we make learning from language more effective by grounding it in other modalities, such as vision and motor control?

References

2025

  1. Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
    Isadora White, Kolby Nottingham, Ayush Maniar, Max Robinson, Hansen Lillemark, Mehul Maheshwari, Lianhui Qin, and Prithviraj Ammanabrolu
    arXiv preprint arXiv:2504.17950, 2025

2023

  1. Multimodal Knowledge Alignment with Reinforcement Learning
    Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, JaeSung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, and Yejin Choi
    In Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  2. Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling
    Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox
    In International Conference on Machine Learning (ICML), 2023