Agentic AI for Visual Media

Location

Mile High 1EF, Denver, Colorado

Date and time

Wednesday, June 3, 2026

Admission

In conjunction with CVPR 2026

Workshop on Agentic AI for Visual Media

About Workshop

Recent advances in image and video processing, editing, and generation have transformed how visual content is created and consumed. While individual models excel at specific tasks, real-world workflows often require flexible composition, adaptive reasoning, and multi-step decision-making—capabilities that remain largely human-driven.

The rise of large language models (LLMs), multimodal LLMs, and agentic AI systems has made it possible to develop autonomous, tool-using agents that can orchestrate perception, generation, editing, and evaluation in unified pipelines. This paradigm shift opens new opportunities for creative media, industrial vision applications, and interactive systems.

The Workshop on Agentic AI for Visual Media will bring together researchers and practitioners at the intersection of computer vision, generative modeling, multimodal AI, and agent systems. Our goal is to explore cutting-edge research, practical deployments, evaluation methods, and future challenges in building next-generation vision agents.

Timeline

2026-06-03

Workshop Date. Wednesday, 3 June 2026, in conjunction with CVPR 2026 in Denver, Colorado.

2026-05

Keynote Lineup Updated. Seven keynote speakers confirmed (one afternoon slot TBA). See workshop schedule.

2026-03-31

Notification of Acceptance. Authors will be notified of paper decisions.

2026-03-20

Paper Submission Deadline. Last day to submit long papers and extended abstracts.

2026-03

Call for Papers Open. Submissions are open on OpenReview. See submission guidelines.

2026-03

Sponsorship. Our workshop received generous sponsorship from Snap Inc., Adobe, and Tencent.

2026-01

Workshop Website Launched. Official website for the Workshop on Agentic AI for Visual Media is now live.

Keynote Speakers (Tentative)

Location: Mile High 1EF

Prof. Zhengzhong Tu
Assistant Professor, Texas A&M University

Prof. Ranjay Krishna
University of Washington

Prof. Manling Li
Northwestern University

Christine Hu
Co-founder & CEO, Philo Labs

Dr. Jack Parker-Holder
Google DeepMind

Dr. Yeying Jin
Tencent

Prof. Xihui Liu
University of Hong Kong

Workshop Schedule

Full-day program on Wednesday, June 3, 2026 at CVPR 2026 with seven keynote talks and afternoon sessions.

Workshop Day — Wednesday, June 3, 2026

Morning

8:50 AM - 9:00 AM

Opening

Welcome remarks and overview of the workshop program.

9:00 AM - 9:40 AM

From Pixels to Systems: Agentic Visual Intelligence for Real-World Computer Vision

Keynote talk.

Speaker
Prof. Zhengzhong Tu
Assistant Professor, Texas A&M University

9:40 AM - 10:20 AM

From Vision to Action: Extracting Structure and Agency from Flat Pixels

Keynote talk.

Speaker
Prof. Ranjay Krishna
University of Washington

10:20 AM - 10:50 AM

Coffee Break

Refreshments and networking.

10:50 AM - 11:30 AM

Interactive and Multimodal Visual Generation towards World Models

Keynote talk.

Speaker
Prof. Xihui Liu
University of Hong Kong

11:30 AM - 12:10 PM

Agent and World, in One Model

Most of the field treats agent and world model research as two programs. We don't. A world model is an agent whose policy you called dynamics. An agent is a world model you queried for actions. The cut between them is something you draw.

This talk covers:

  • How we're improving VLMs' agentic capabilities in video AI.
  • How we're improving video gen models as environments.
  • What happens at the seam, and why we think this is also the right framing for safety.

Speaker
Christine Hu
Co-founder & CEO, Philo Labs

Afternoon

12:10 PM - 1:45 PM

Lunch

Lunch break on your own or with fellow attendees.

1:45 PM - 2:25 PM

Keynote

Talk details to be announced.

Speaker
Dr. Jack Parker-Holder
Google DeepMind

2:25 PM - 3:05 PM

Game World Model

Keynote talk.

Speaker
Dr. Yeying Jin
Tencent

3:05 PM - 3:45 PM

Failure Modes of VLM Agents: A Reinforcement Learning Perspective

The failures of a reinforcement-learned agent are not random bugs to be patched away; they are systematic signatures of how the agent was trained. This talk reads recent vision-language agents through their failure modes, organized along three axes. The first is reasoning collapse, where optimizing for reward erodes the very multi-step reasoning that made the agent capable. The second is multi-turn instability, where reinforcing world-model reasoning across turns helps yet stays fragile, and where action unfolds over many steps, whether exploring a scene or generating an image stroke by stroke. The third is unsafe planning, where language-driven control introduces systematic safety risks in the physical world. I will close with embodied priors and online reinforcement learning as one route toward agents that fail less by design.

Speaker
Prof. Manling Li
Northwestern University

3:45 PM - 4:15 PM

Coffee Break

Refreshments and networking.

4:15 PM - 5:40 PM

Poster Session

Poster viewing for accepted workshop papers.

5:40 PM - 5:50 PM

Closing

Closing remarks and thank you.

Organizers

Dr. Jinjin Gu

Dr. Lei Sun

Zhendong Li

Dr. Zhenfei Yin

Prof. Anyi Rao

Dr. Yeying Jin

Dr. Jing Shao

Dr. Enze Xie

Dr. He Zhang

Dr. Jian Wang

Dr. Danda Pani Paudel

Prof. Philip Torr

Prof. Luc Van Gool

Sponsors

Snap Inc., Adobe, and Tencent

Contact

Get in touch for any questions about the workshop.

Jinjin Gu

INSAIT

Lei Sun

INSAIT