Tue, Jan 7, 2025 |
Intro
[slides]
|
|
Thu, Jan 9, 2025 |
What are Agents?
[slides]
|
Stuart Russell and Peter Norvig,
AIMA Chapter 1
|
Tue, Jan 14, 2025 |
Simulated Environments and Reality
[slides]
|
Peter A. Jansen,
A Systematic Survey of Text Worlds as Embodied Natural Language Environments
Jared Sorensen,
Action Castle
|
Thu, Jan 16, 2025 |
How to Make a Simulation?
[slides]
|
Fares Alaboud,
Intro to PDDL
Graham Nelson,
Intro to Inform7
Andrew Plotkin,
The Visible Zorker
Optional:
Jason Scott,
GET LAMP: The Text Adventure Documentary
|
Tue, Jan 21, 2025 |
HW0 Due
|
|
Tue, Jan 21, 2025 |
Search for Planning in Simulations
[slides]
|
Kory Becker,
Intro to STRIPS
Stuart Russell and Peter Norvig,
AIMA Chapter 11
Rich Sutton and Andrew Barto,
RL Book Chapter 8.1, 8.9-8.11
|
Wed, Jan 22, 2025 |
Finalize Project Groups
|
|
Thu, Jan 23, 2025 |
Classical Control, Pre-Deep Learning
[slides]
|
Rich Sutton and Andrew Barto,
RL Book Chapter 4, 5
|
Tue, Jan 28, 2025 |
Deep Reinforcement Learning, Pre-LLMs
[slides]
|
Rich Sutton and Andrew Barto,
RL Book Chapter 6.1-6.5
Mnih et al. 2013,
Playing Atari with Deep Reinforcement Learning
Optional:
Mnih et al. 2015,
(Nature version) Human-level control through deep reinforcement learning
|
Thu, Jan 30, 2025 |
Project Workshopping
|
|
Tue, Feb 4, 2025 |
Project Pitches
|
|
Thu, Feb 6, 2025 |
HW1 Due
|
|
Fri, Feb 7, 2025 |
Project Pitch Slide Decks Due
|
|
Thu, Feb 6, 2025 |
Reinforcement Learning and Search Combined
[slides]
|
Silver et al. 2017,
(Alpha Zero) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Ammanabrolu et al. 2019,
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Optional:
Schrittwieser et al. 2019,
(Mu Zero) Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
|
Tue, Feb 11, 2025 |
Attention and Language Modeling
[slides]
|
Vaswani et al.,
Attention is All You Need
Jay Alammar,
The Illustrated Transformer
Brown et al.,
Language Models are Few-Shot Learners
|
Thu, Feb 13, 2025 |
RL for Language Agents Pt 1 (Online RL for NLP, RLHF)
[slides]
|
Rich Sutton and Andrew Barto,
RL Book Chapter 13
Ramamurthy*, Ammanabrolu* et al.,
Is Reinforcement Learning (Not) for Natural Language Processing? Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Ouyang et al.,
Training Language Models to Follow Instructions with Human Feedback
|
Tue, Feb 18, 2025 |
RL for Language Agents Pt 2 (Rewards and Closed Form Methods)
[slides]
|
Lightman et al.,
Let's Verify Step by Step
Wu et al.,
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Rafailov et al.,
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
|
Thu, Feb 20, 2025 |
HW2 Due
|
|
Thu, Feb 20, 2025 |
Prompt Optimization
[slides]
|
Yao et al.,
ReAct: Synergizing Reasoning and Acting in Language Models
Lewis et al.,
(RAG) Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Anthropic,
Building Effective Agents
Optional:
Khattab et al.,
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
|
Tue, Feb 25, 2025 |
Neurosymbolic Tool Use Methods and Agent Reasoning
[slides]
|
Wang et al.,
Behavior Cloned Transformers are Neurosymbolic Reasoners
Liu et al.,
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Patil et al.,
Gorilla: Large Language Model Connected with Massive APIs
Valmeekam et al.,
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
|
Thu, Feb 27, 2025 |
Agent Reasoning and Inference Time Methods
[slides]
|
Zelikman et al.,
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Snell et al.,
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Gandhi et al.,
Stream of Search (SoS): Learning to Search in Language
|
Tue, Mar 4, 2025 |
Multi-Agent Systems and Human-AI Collaboration
[slides]
|
Urbanek et al.,
LIGHT - A large-scale crowdsourced fantasy text game
Park et al.,
Generative Agents: Interactive Simulacra of Human Behavior
Vats et al.,
A Survey on Human-AI Teaming with Large Pre-Trained Models
|
Thu, Mar 6, 2025 |
HW3 Due
|
|
Thu, Mar 6, 2025 |
Societal Impacts and Agent Safety
[slides]
|
Tamkin et al.,
Evaluating and Mitigating Discrimination in Language Model Decisions
Labunets et al.,
Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API
Bowman et al.,
Measuring Progress on Scalable Oversight for Large Language Models
Optional:
Chakrabarty et al.,
Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers
Optional:
Ammanabrolu et al.,
Aligning to Social Norms and Values in Interactive Narratives
|
Tue, Mar 11, 2025 |
Final Presentations Pt 1
|
|
Thu, Mar 13, 2025 |
Final Presentations Pt 2
|
|
Mon, Mar 17, 2025 |
Final Project Writeups Due
|
|