COLUMBIA UNIVERSITY COMS 6113

Overview

LLMs have opened new possibilities for automated agents that plan and complete tasks on the user’s behalf. Such agents have the potential to usher in a new industrial revolution by automating organizational processes. However, agents are currently limited to soft-edge tasks that tolerate a large margin of error, and are too unreliable for hard-edge tasks, such as those in healthcare or enterprise settings, where accuracy and reliability are paramount. In short, what does it take for agents to be used in enterprises?

This graduate-level course will cut across the technology stack to examine the research questions that must be answered before agents can be used for real tasks that matter. Each session will review 1-3 papers or systems and discuss research opportunities that arise from the gap between existing research and enterprise requirements. Topics will span systems (data systems and ML systems), AI (LLMs, agent-based planning), HCI, and theory (reinforcement learning, markets).

Broad questions include

Class Structure

Recent Announcements

Tentative Schedule

What We Want

1/21: Introduction & a quick history of agents - Eugene & Kostis

1/23: Tutorial: Agents Overview - Xiao Yu, Columbia

1/28: Tutorial: Agent Planning - Xiao Yu, Columbia

WHERE WE ARE

1/30: Now: SWEBench - John Yang, Stanford

2/4: Now: Agents at Google - Fatma Ozcan, Google Research

2/6: Use Case: Bureaucracy - Jeffrey Schlegelmilch, National Center for Disaster Preparedness

2/13: Use Case: Agents in Systems Optimization - Shreya Shankar PhD, UC Berkeley

2/18: Now: Agent Frameworks - Phil Calçado, Outropy

2/20: Now: Simulation for embodied agents - Yunzhu

2/25: Now: Servings - Kostis Kaffes

2/27: Use Case: TBA

WHAT WE WANT: RELIABILITY VIA SIMULATION

3/4: TBA

3/6: Models: Neurosymbolic training - Baishakhi Ray, Columbia

3/11: HAI: Hand-offs with humans and context - Lydia Chilton, Columbia

3/13: Use Case: TBA

3/25: Models: Planning - Shipra Agrawal, Columbia

WHAT WE WANT: SAFEGUARDS AND USABILITY

3/27: Systems: Lineage and Data-flow policies - Eugene Wu, Columbia

4/1: Use Case: Coding (AutoCodeRover) - Yuntong Zhang, NUS

4/3: HAI: Evaluating agent outcomes - TBA

4/8: HAI: Schema and Process Induction - TBA

4/10: Systems: ML for systems configuration - TBA

4/15: Systems: Performance Hints - TBA

4/17: Models: Long context LLM - Kuntai Du, UChicago

4/22: Systems: Monitoring - TBA

4/24: TBA

WHAT YOU DID

4/29: Presentations

5/1: Presentations