PhotoFlow: Agentic 3D Virtual Photography Missions

A Director-Reviewer-Reflector agent for language-conditioned virtual photography in controllable Blender scenes.

Jiarui Guo1,2, Haojia Wei3, Yiming Zhang4, Yifei Liu6, Yuning Gong5, Hongjie Zhang6, Xue Yang1, Zhihang Zhong1,*
1Shanghai Jiao Tong University   2Northeastern University   3University of California, Los Angeles
4Cornell University   5Sichuan University   6Shanghai AI Laboratory   *Corresponding author
Figure: PhotoFlow virtual photography teaser.

Virtual Photography as Spatial-Aesthetic Search

PhotoFlow asks an agent to enter a prepared 3D scene with no preselected camera pose, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph.

47 open-license Blender scenes
141 language-conditioned missions
3 mission families
6-round search budget

Method Overview

PhotoFlow treats camera placement as a finite-horizon, feedback-driven search over executable camera states. Each state specifies the camera pose, look-at target, lens, aperture, and aspect ratio.
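As a rough illustration of what an executable camera state could look like, the sketch below bundles the five listed parameters into one record. The class name, field names, units, and default values are assumptions made for this example and are not taken from the PhotoFlow code.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CameraState:
    """Hypothetical container for one candidate shot (all names are assumed)."""
    position: Vec3                   # camera location in scene coordinates
    look_at: Vec3                    # point the camera is aimed at
    focal_length_mm: float = 50.0    # lens
    f_stop: float = 2.8              # aperture
    aspect_ratio: float = 3 / 2      # width / height of the rendered frame

    def view_direction(self) -> Vec3:
        """Unit vector from the camera position toward the look-at target."""
        dx, dy, dz = (t - p for t, p in zip(self.look_at, self.position))
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            raise ValueError("camera position and look-at target coincide")
        return (dx / norm, dy / norm, dz / norm)

    def is_executable(self) -> bool:
        """Basic sanity checks before a state is handed to a renderer."""
        return (
            self.position != self.look_at
            and self.focal_length_mm > 0
            and self.f_stop > 0
            and self.aspect_ratio > 0
        )
```

A state such as `CameraState(position=(0, 0, 0), look_at=(0, 3, 4))` passes the sanity check, while one whose position equals its look-at target does not, because no viewing direction can be derived from it.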

Figure: PhotoFlow method pipeline. The closed-loop Director-Reviewer-Reflector pipeline.
Director: Builds a soft photographic blueprint and proposes diverse candidate camera states.
Reviewer: Scores rendered previews with rule checks, visual critique, and pairwise incumbent selection.
Reflector: Turns failures into region memory, dead-zone suppression, and high-explore relocation.
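The three roles can be read as one bounded search loop. The sketch below is a toy stand-in under strong assumptions: a 2D point replaces the full camera state, a caller-supplied `score` function replaces rendering and critique, and the sampling, dead-zone, and relocation rules are simplifications invented for illustration. None of the names or constants come from the PhotoFlow implementation.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # toy stand-in for a full camera state


def run_search(
    score: Callable[[Point], float],  # assumed: higher is better
    rounds: int = 6,                  # search budget, in rounds
    per_round: int = 8,               # candidates proposed per round
    extent: float = 5.0,              # half-width of the toy search area
    dead_radius: float = 0.5,         # radius suppressed around a failed region
    seed: int = 0,
) -> Tuple[Optional[Point], float, List[Point]]:
    rng = random.Random(seed)
    incumbent: Optional[Point] = None
    incumbent_score = float("-inf")
    dead_zones: List[Point] = []
    center, spread = (0.0, 0.0), extent

    for _round in range(rounds):
        # Director: propose candidates around the current center,
        # rejecting any that fall inside a remembered dead zone.
        candidates: List[Point] = []
        for _attempt in range(per_round * 10):
            if len(candidates) == per_round:
                break
            c = (rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            if all(math.dist(c, z) > dead_radius for z in dead_zones):
                candidates.append(c)

        # Reviewer: score each candidate, then compare the round's best
        # against the incumbent and keep whichever scores higher.
        scored = [(score(c), c) for c in candidates]
        best_score, best = max(scored, default=(float("-inf"), center))
        if best_score > incumbent_score:
            incumbent, incumbent_score = best, best_score
            center, spread = best, max(spread * 0.5, 0.25)  # refine locally
        else:
            # Reflector: remember the failed region, suppress it, and
            # relocate to a fresh area with a wide exploration spread.
            dead_zones.append(center)
            center = (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
            spread = extent

    return incumbent, incumbent_score, dead_zones
```

Calling `run_search(lambda p: -((p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2))` returns the best point found within six rounds, its score, and the list of suppressed regions. The incumbent score never decreases from one round to the next, which is the property the pairwise incumbent comparison is meant to guarantee in this sketch.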

Resources

The paper and benchmark are public. VPhotoBench is hosted on Hugging Face with the scene files, task specifications, asset metadata, licenses, and attribution tables.

VPhotoBench

Hugging Face dataset

Code

PhotoFlow agent code and evaluation scripts will be released in this repository.