COLUMBIA UNIVERSITY COMS 6113

Information

Staff

Prereqs

Grading

Overview

LLMs have opened new possibilities for automated agents that plan and complete tasks on the user’s behalf. Such agents have the potential to usher in a new industrial revolution by automating organizational processes. However, agents are currently limited to soft-edge tasks that tolerate large errors, and they are too unreliable for hard-edge tasks, such as those in healthcare or enterprise settings, where accuracy and reliability are paramount. In short, what does it take for agents to be used in enterprises?

This graduate-level course will cut across the technology stack to examine the research questions that need to be answered for agents to be viable for real tasks that matter. Each session will review 1-3 papers or systems and discuss research opportunities that arise from the gap between existing research and enterprise requirements. Topics will span systems (data systems and ML systems), AI (LLMs, agent-based planning), HCI, and theory (reinforcement learning, markets).

Broad questions include

Class Structure

Project ideas

For 6113 Students Only: Google doc with suggestions. You are welcome to add your own ideas!

Tentative Schedule

01/21 Introduction & a quick history of agents - Eugene & Kostis

WHERE WE ARE

01/23 Tutorial: Agents Overview - Xiao Yu, Columbia

01/28 Tutorial: Agent Planning - Xiao Yu, Columbia

01/30 Background: SWE-bench - John Yang, Stanford

02/04 Programming Foundation Models - Thomas Joshi, Columbia

02/06 Use Case: Navigating US Disaster Recovery Bureaucracy - Jeffrey Schlegelmilch, National Center for Disaster Preparedness

02/11 No Lecture: Project Proposal Submission - Instructors will be around to discuss and provide feedback

02/13 Use Case: Agents in Systems Optimization - Shreya Shankar PhD, UC Berkeley

02/18 Background: Agent Frameworks - Phil Calçado, Outropy

02/20 Background: Foundation Models for Embodied/Physical Agents - Yunzhu Li, Columbia

Foundation models, such as GPT-4 Vision, have marked significant achievements in the fields of natural language and vision, demonstrating exceptional abilities to adapt to new tasks and scenarios. However, physical/embodied interaction—such as cooking, cleaning, or caregiving—remains a frontier where foundation models and robotic systems have yet to achieve the desired level of adaptability and generalization.

In this talk, I will discuss the opportunities for incorporating foundation models into classic robotic pipelines to endow robots/physical agents with capabilities beyond those achievable with traditional robotic tools. The talk will focus on three key improvements in (1) task specification, (2) low-level scene modeling, and (3) high-level scene modeling. The central idea behind this research is to translate the commonsense knowledge embedded in foundation models into structural priors that can be integrated into robot learning systems. This approach leverages the strengths of different modules (e.g., VLM for task interpretation and constrained optimization for motion planning), achieving the best of both worlds. I will demonstrate how such integration enables robots to interpret instructions provided in free-form natural language, and how foundation models can be augmented with additional memory mechanisms, such as an action-conditioned scene graph, to handle a wide range of real-world manipulation tasks.

Toward the end of the talk, I will discuss the limitations of current foundation models, the challenges that still lie ahead, and potential avenues to address these challenges.

02/25 Agent-Ready Systems - Jerry/Nikos/Peter/Eugene/Kostis, Columbia

02/27 Systems: Lineage and Data-flow policies - Eugene Wu, Columbia

WHAT WE WANT

03/04

03/06 Models: Neurosymbolic training - Baishakhi Ray, Columbia

03/11 TBA - Danielle Perszyk, Amazon

03/13 No Class! Attend the Agents for Work workshop on 3/12 (link in Slack)

03/25 Models: Planning - Shipra Agrawal, Columbia

03/27 Use Case: Financial Products - Raman Jatkar, Intellect Design

04/01 HAI: Process Mining and Agents - Wil van der Aalst (the father of process mining), RWTH Aachen University

04/03 Use Case: Coding (AutoCodeRover) - Yuntong Zhang, NUS

04/08 TBA - Shankar Bhargava, WalmartLabs

04/10 Models: TBA - Shunyu Yao, OpenAI

04/15 Agents at Google - Fatma Ozcan, Google Research

04/17 Systems: Security - Weiliang Zhao, Columbia

04/22 Models: Long context LLM - Kuntai Du, UChicago

04/24 Model Context Protocol - David Soria Parra and Elie Schoppik, Anthropic

WHAT YOU DID

04/29 Presentations

05/01 Presentations