PhotoFlow: Agentic 3D Virtual Photography Missions

A Director-Reviewer-Reflector agent for language-conditioned virtual photography in controllable Blender scenes.

Jiarui Guo1,2, Haojia Wei3, Yiming Zhang4, Yifei Liu6, Yuning Gong5, Hongjie Zhang6, Xue Yang1, Zhihang Zhong1,*
1Shanghai Jiao Tong University   2Northeastern University   3University of California, Los Angeles
4Cornell University   5Sichuan University   6Shanghai AI Laboratory   *Corresponding author

Virtual Photography as Spatial-Aesthetic Search

PhotoFlow asks an agent to enter a prepared 3D scene with no preselected camera pose, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph.

47 open-license Blender scenes
141 language-conditioned missions
3 mission families
6-round search budget

Method Overview

PhotoFlow treats camera placement as a finite-horizon, feedback-driven search over executable camera states, including pose, look-at target, lens, aperture, and aspect ratio.
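To make the idea of an executable camera state concrete, the following sketch bundles the five quantities named above into one Python record and derives the view direction and horizontal field of view from them. The class name, field names, units, and the 36 mm (full-frame) sensor width are assumptions for illustration, not PhotoFlow's actual representation.

```python
import math
from dataclasses import dataclass


# Hypothetical record for one executable camera state; names and units are
# illustrative assumptions, not PhotoFlow's schema.
@dataclass(frozen=True)
class CameraState:
    position: tuple[float, float, float]  # world-space camera location (m)
    look_at: tuple[float, float, float]   # world-space target point (m)
    focal_length_mm: float                # lens focal length
    f_stop: float                         # aperture, e.g. 2.8
    aspect_ratio: float                   # width / height, e.g. 3 / 2

    def yaw_pitch_deg(self) -> tuple[float, float]:
        """Yaw around +Z and pitch above the XY plane for the view direction."""
        dx, dy, dz = (t - p for t, p in zip(self.look_at, self.position))
        yaw = math.degrees(math.atan2(dy, dx))
        pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
        return yaw, pitch

    def horizontal_fov_deg(self, sensor_width_mm: float = 36.0) -> float:
        """Pinhole horizontal field of view for a sensor of the given width."""
        return math.degrees(2 * math.atan(sensor_width_mm / (2 * self.focal_length_mm)))

    def is_executable(self) -> bool:
        """Basic sanity checks before the state is handed to a renderer."""
        distinct = self.position != self.look_at
        return (distinct and self.focal_length_mm > 0
                and self.f_stop > 0 and self.aspect_ratio > 0)
```

For example, `CameraState((0.0, -5.0, 1.6), (0.0, 0.0, 1.2), 50.0, 4.0, 3 / 2)` describes a 50 mm, f/4 shot from roughly eye height, looking slightly down at a point about 5 m away. Keeping the state frozen makes it hashable, which is convenient if a search wants to remember which states it has already rendered.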

The closed-loop Director-Reviewer-Reflector pipeline.
Director: builds a soft photographic blueprint and proposes diverse candidate camera states.
Reviewer: scores rendered previews with rule checks, visual critique, and pairwise incumbent selection.
Reflector: turns failures into region memory, dead-zone suppression, and high-explore relocation.
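The closed loop can be pictured as a small finite-horizon search. The sketch below is a toy under heavy assumptions: camera states are 2-D ground-plane positions, distance to a hidden target point stands in for the Reviewer's rule checks and visual critique, and widening the sampling radius is a crude stand-in for high-explore relocation. All function names (`score`, `prefer`, `propose`, `search`) are hypothetical; only the six-round budget comes from this page.

```python
import math
import random

# Toy problem (hypothetical): a camera "state" is an (x, y) ground-plane
# position, and the hidden best viewpoint is IDEAL.
IDEAL = (3.0, -2.0)


def score(state):
    """Reviewer stand-in: higher is better (negative distance to IDEAL)."""
    return -math.dist(state, IDEAL)


def prefer(a, b):
    """Pairwise-judge stand-in: return a unless b scores strictly higher."""
    return a if score(a) >= score(b) else b


def propose(rng, center, spread, dead_zones, k=4, max_tries=1000):
    """Director stand-in: sample up to k candidates near center, skipping dead zones."""
    out = []
    for _ in range(max_tries):
        if len(out) == k:
            break
        c = (center[0] + rng.uniform(-spread, spread),
             center[1] + rng.uniform(-spread, spread))
        if all(math.dist(c, z) > r for z, r in dead_zones):
            out.append(c)
    return out or [center]


def search(budget=6, seed=0):
    """Run a fixed number of propose / review / reflect rounds."""
    rng = random.Random(seed)
    center, spread = (0.0, 0.0), 4.0
    incumbent = center
    dead_zones = []  # Reflector memory: (center, radius) regions to avoid
    history = []     # incumbent score after each round
    for _ in range(budget):
        candidates = propose(rng, center, spread, dead_zones)
        best = max(candidates, key=score)  # Reviewer: rank this round's previews
        winner = prefer(incumbent, best)   # pairwise incumbent selection, ties keep incumbent
        if winner is incumbent:
            # Reflector: no progress, so suppress this region and widen the search
            dead_zones.append((center, spread / 2))
            spread *= 1.5
        else:
            # Progress: move to the new incumbent and refine locally
            incumbent, center, spread = winner, winner, spread * 0.6
        history.append(score(incumbent))
    return incumbent, history
```

Calling `search(budget=6, seed=0)` returns the final incumbent and its score after each round. Because the incumbent is replaced only when a challenger wins the pairwise comparison, that score sequence never decreases, so stopping at the end of any round still yields the best state seen so far.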

Public Release Plan

The repository is being organized for a staged release of the agent, benchmark metadata, evaluation scripts, and reproduction materials.

Agent Code

Director-Reviewer-Reflector implementation, prompts, JSON schemas, and run configurations.
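Because the Director's proposals must be turned into renderer calls, it helps to validate them against a schema before execution. The stdlib-only sketch below checks a JSON reply for the camera fields listed in the method overview; the field names, units, and error messages are assumptions, not the schemas planned for the agent code release.

```python
import json

# Hypothetical required fields for one Director proposal.
VECTOR_FIELDS = ("position", "look_at")                        # xyz, world units
SCALAR_FIELDS = ("focal_length_mm", "f_stop", "aspect_ratio")  # all must be > 0


def _is_number(value) -> bool:
    """True for int/float but not bool (bool is a subclass of int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_proposal(text: str) -> dict:
    """Parse a JSON camera proposal; raise ValueError if it is not executable.

    json.JSONDecodeError is a subclass of ValueError, so malformed JSON
    surfaces through the same exception type as schema violations.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("proposal must be a JSON object")
    missing = set(VECTOR_FIELDS + SCALAR_FIELDS) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for key in VECTOR_FIELDS:
        vec = data[key]
        if not (isinstance(vec, list) and len(vec) == 3 and all(map(_is_number, vec))):
            raise ValueError(f"{key} must be a list of three numbers")
    for key in SCALAR_FIELDS:
        if not (_is_number(data[key]) and data[key] > 0):
            raise ValueError(f"{key} must be a positive number")
    return data
```

Rejecting a malformed proposal early lets the agent re-prompt the Director instead of spending one of its limited render rounds on a state the renderer cannot execute.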

VPhotoBench

Scene registry, task specifications, benchmark construction notes, and asset metadata.

Evaluation

External metric aggregation, baseline configs, ablation summaries, and selected logs.