Test-time Scaling of Diffusions with Flow Maps

Amirmojtaba Sabour*,1,2,3, Michael S. Albergo*,4,5,6, Carles Domingo-Enrich*,7, Nicholas M. Boffi8, Sanja Fidler1,2,3, Karsten Kreis1, Eric Vanden-Eijnden9,10 (*Core Contributors)

1 NVIDIA 2 University of Toronto 3 Vector Institute 4 Harvard University 5 Kempner Institute 6 IAIFI 7 Microsoft Research 8 Carnegie Mellon University 9 Courant Institute, New York University 10 Capital Fund Management (CFM)

Test-time scaling of diffusions aims to generate samples that achieve high scores under a user-defined reward. Most existing methods do this by injecting the reward's gradient directly into the diffusion process, but this is ill-posed, since the reward is only well-defined on the clean data distribution at the end of generation. Prior approaches attempt to fix this by using a 1-step denoiser to estimate the final sample, but this is extremely inaccurate early in sampling, leading to unhelpful gradients. We introduce Flow Map Trajectory Tilting (FMTT), a simple alternative that instead uses the diffusion's flow map, which facilitates fast single- or few-step generation, as a look-ahead that is both efficient and accurate. This allows reward signals to be applied consistently throughout the entire generation process. FMTT can be used for both exact sampling via importance weighting and principled search that focuses on high-reward samples.

Introduction

Diffusion and flow-based models have become the backbone of modern generative modeling, producing remarkably realistic images and videos, and have proved to be highly successful tools across computer vision and scientific domains. Sampling from these models can be seen as numerically solving an ordinary or stochastic differential equation (ODE/SDE), the coefficients of which are learned neural networks. Specifically, these methods learn neural networks to estimate a velocity field $b_t(x_t)$ and score function $s_t(x_t)$, and solve the following SDE from $t=0$ (pure noise) to $t=1$ (pure data) to produce new samples: $$ d x_t = [b_t(x_t) + \epsilon_t s_t(x_t)] dt + \sqrt{2\epsilon_t} d W_t, \quad x_0 \sim \mathcal{N}(0, I), $$ where setting $\epsilon_t = 0$ corresponds to the deterministic case, often called the probability flow ODE (PF-ODE).
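To make this concrete, below is a minimal Euler-Maruyama sampler for the SDE above. This is a sketch only: b, s, and eps stand in for the learned velocity, score, and noise schedule, and the uniform step size is illustrative rather than the exact discretization used in practice.

import torch

def sample_sde(b, s, eps, shape, n_steps=100):
    """Euler-Maruyama integration of dx = [b + eps*s] dt + sqrt(2*eps) dW from t=0 to t=1."""
    x = torch.randn(shape)                  # x_0 ~ N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        eps_t = eps(t)
        drift = b(x, t) + eps_t * s(x, t)
        dW = torch.randn_like(x) * dt ** 0.5
        x = x + drift * dt + (2.0 * eps_t) ** 0.5 * dW
    return x                                # approximate sample from the data distribution

# Setting eps(t) = 0 for all t recovers the deterministic probability-flow ODE.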

In practice, to enable controllable generation, it is often the case that we want to generate samples from the model's distribution $p(x)$ that have high scores under some user-defined reward function $r(x)$. Concretely, the goal is to generate samples from a distribution $q(x)$ that maximizes the following objective: $$ \arg\max_{q} \left[\mathbb{E}_{q}[r(x)] - \lambda \times \mathrm{KL}(q \,\|\, p)\right], $$ where $\lambda$ is a hyperparameter that controls the trade-off between maximizing the reward and staying close to the base distribution. This problem has a closed-form solution: $q(x) \propto p(x) \exp(r(x) / \lambda)$, which is referred to as the reward-tilted distribution. An active area of current research is how to best adapt the dynamical equations (SDE/ODE) at inference time to generate samples from this tilted distribution.
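The closed form follows from a standard variational argument (a sketch, using a Lagrange multiplier $\mu$ for the normalization of $q$): $$ \frac{\delta}{\delta q(x)}\left[\int q(x)\,r(x)\,dx \;-\; \lambda \int q(x)\log\frac{q(x)}{p(x)}\,dx \;+\; \mu\Big(\int q(x)\,dx - 1\Big)\right] = r(x) - \lambda\Big(\log\frac{q(x)}{p(x)} + 1\Big) + \mu = 0, $$ which, after normalization, gives $q(x) \propto p(x)\exp(r(x)/\lambda)$.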

Sampling from the Reward-Tilted Distribution

Efforts to align diffusion models for high-reward generation can be broadly divided into training-based and training-free approaches. Training-based methods, often referred to as diffusion alignment techniques, optimize the model parameters so that the retrained model naturally produces higher-reward samples. While effective, these approaches are computationally expensive and must be repeated for each new reward function. Training-free methods, also known as test-time scaling methods, instead modify the sampling process to bias generation toward high-reward regions, making them far more flexible when working with multiple or rapidly changing rewards. In this work, we focus on training-free methods.

A common approach is to apply reward guidance, which injects the gradient of the reward directly into the stochastic generative process to nudge samples toward high-reward regions: $$ dx_t = [b_t(x_t) + \epsilon_t s_t(x_t) + \color{red}{\epsilon_t \nabla r_t(x_t)}] dt + \sqrt{2\epsilon_t} d W_t, \quad x_0 \sim \mathcal{N}(0, I). $$

In practice, however, most rewards (such as "text alignment" or "visual fidelity") are only defined on clean samples $x_1$ at the end of generation. Prior works address this by either (1) ignoring the issue and evaluating the reward directly on the noisy state (e.g., $r_t(x_t) = t\,r(x_t)$), or (2) using the denoiser $D(x_t) = \mathbb{E}[x_1 | x_t]$ to approximate the final image (e.g., $r_t(x_t) = t\,r(D(x_t))$). Both strategies evaluate the reward at states that are not yet fully generated, leading to unreliable gradients, especially in the early stages of sampling. For a visual example, see Figure 1.

Figure 1: Comparison between different look-ahead methods. We visualize corrupted data for different levels of noise t and show the outputs of a 1-step denoiser, 1-step flow map, and a 4-step flow map.
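For both strategies above, the guidance term $\nabla r_t(x_t)$ is typically obtained by automatic differentiation through the chosen proxy. A minimal sketch, assuming a differentiable reward r and, optionally, a differentiable look-ahead such as the 1-step denoiser D (names are illustrative, not the exact implementation):

import torch

def guidance_grad(x, t, r, lookahead=None):
    """Gradient of r_t(x) = t * r(lookahead(x)), or t * r(x) when lookahead is None."""
    x = x.detach().requires_grad_(True)
    estimate = lookahead(x) if lookahead is not None else x
    rt = t * r(estimate).sum()              # sum over the batch; gradients remain per-sample
    (grad,) = torch.autograd.grad(rt, x)
    return grad

# lookahead=None corresponds to strategy (1); lookahead=D (the 1-step denoiser) to strategy (2).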

Flow Map Trajectory Tilting (FMTT) offers a simple and principled alternative. The main idea is to use a flow map $X_{t, s}(x_t)$, a function that integrates the probability flow ODE to map any noisy state $x_t$ to a less noisy state $x_s$. In other words, the flow map allows us to jump along the diffusion trajectory without explicitly simulating every intermediate step, enabling single-step, few-step, and full many-step sampling within the same framework. Moreover, the derivative of the flow map with respect to $s$, evaluated at $s = t$, recovers the instantaneous velocity $b_t(x_t)$ learned by standard flow matching. By using the flow map as a look-ahead that is both fast and accurate, and computing the reward gradient at the estimated final state (e.g., $r_t(x_t) = t\,r(X_{t, 1}(x_t))$), FMTT enables consistent reward guidance throughout the trajectory. Essentially, FMTT steers the stochastic generative process toward high-reward regions using an efficient deterministic look-ahead (see Figure 2 below). We additionally unify sampling and search under one framework, allowing both exact sampling via importance weighting and reward-tilted search that reliably finds high-reward samples. For principled sampling, we rely on Jarzynski's estimator to derive FMTT's time-dependent importance weights, which reduce to $ A_t = \int_{0}^{t} r(X_{s, 1}(x_s))\, ds. $ Please refer to the paper for more details.

Figure 2: By using the flow map $X_{t, 1}(x_t)$ as a look-ahead inside the reward, reward information can be used in a principled way throughout the tilted trajectories (green lines). This enables better ascent on the reward and substantially simplifies the importance weights used for exact sampling.
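To make the full procedure concrete, below is a minimal sketch of an FMTT-style guided sampling loop. It is illustrative only: flow_map(x, t, s, n_steps) stands for a learned few-step flow map from time t to time s, lam is the tilting strength λ, a plain Euler-Maruyama discretization is used, and the weight accumulation follows the $A_t$ formula above with the 1/λ factor written out explicitly.

import torch

def fmtt_sample(b, score, flow_map, r, eps, shape, n_steps=50, lam=1.0, lookahead_steps=4):
    """Sketch of FMTT-style guided sampling: flow-map look-ahead reward guidance,
    plus accumulation of the running importance weight A_t (names are illustrative)."""
    x = torch.randn(shape)                               # x_0 ~ N(0, I)
    A = torch.zeros(shape[0])                            # one running log-weight per sample
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        eps_t = eps(t)

        # Look-ahead: jump from the noisy state x_t to an estimate of x_1.
        x_in = x.detach().requires_grad_(True)
        x1_hat = flow_map(x_in, t, 1.0, n_steps=lookahead_steps)
        rewards = r(x1_hat)                              # reward evaluated on clean estimates
        (grad_r,) = torch.autograd.grad((t * rewards).sum(), x_in)   # grad of r_t(x) = t * r(X_{t,1}(x))

        # Guided SDE step: velocity + score + reward tilt (Euler-Maruyama).
        drift = b(x, t) + eps_t * score(x, t) + eps_t * grad_r / lam
        noise = torch.randn_like(x) * (2.0 * eps_t * dt) ** 0.5
        x = (x + drift * dt + noise).detach()

        # Importance weights: A_t = (1/lam) * \int_0^t r(X_{s,1}(x_s)) ds, discretized.
        A = A + rewards.detach() / lam * dt
    return x, A                                          # samples and their log-weights

# The weights A can be used for importance-weighted sampling (e.g., resampling the batch with
# probabilities proportional to exp(A)) or simply to keep the highest-reward trajectories in a search.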

Text-to-Image Generation Results

We evaluate our FMTT algorithm on text-to-image generation using the flow map from Align Your Flow, which distills the FLUX.1 [dev] model into an efficient flow map. We consider three categories of rewards: (1) human preference rewards, (2) geometric rewards, and (3) VLM-based rewards.

Human Preference Rewards

Using human preference rewards, we can generate images with improved visual fidelity and text alignment. Following prior works, we use a linear combination of PickScore, HPSv2, ImageReward, and CLIP as the reward function and quantitatively evaluate performance on the GenEval benchmark. This commonly used benchmark evaluates text-to-image models on approximately 550 object-centric prompts and measures the quality of the generated images using pre-trained object detectors.
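Such a combined reward can be written as a weighted sum of the individual preference scores. The sketch below is illustrative: each scorer is assumed to be a callable returning one score per image, and the placeholder weights are not the values used in our experiments.

def combined_preference_reward(images, prompts, scorers, weights):
    """Weighted combination of preference scores; each scorer maps (images, prompts) to per-image scores."""
    total = 0.0
    for name, score_fn in scorers.items():
        total = total + weights[name] * score_fn(images, prompts)
    return total

# Placeholder usage (hypothetical wrappers and weights):
# scorers = {"pickscore": pickscore_fn, "hpsv2": hpsv2_fn, "imagereward": imagereward_fn, "clip": clip_fn}
# weights = {"pickscore": 1.0, "hpsv2": 1.0, "imagereward": 1.0, "clip": 1.0}
# reward = combined_preference_reward(images, prompts, scorers, weights)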

A summary of the results is shown in the table below. For more details, please refer to the paper.

| Method | Mean | Single Obj. | Two Obj. | Counting | Colors | Position | Attr. Binding |
|---|---|---|---|---|---|---|---|
| FLUX.1 [dev] | 0.65 | 0.99 | 0.78 | 0.70 | 0.78 | 0.18 | 0.45 |
| FLUX.1 [dev] + Best-of-32 | 0.75 | 0.99 | 0.94 | 0.83 | 0.86 | 0.26 | 0.57 |
| FMTT (1-step denoiser look-ahead) | 0.75 | 0.99 | 0.90 | 0.87 | 0.87 | 0.26 | 0.59 |
| FMTT (4-step diffusion look-ahead) | 0.75 | 0.99 | 0.93 | 0.86 | 0.89 | 0.27 | 0.57 |
| FMTT (4-step flow map look-ahead) | 0.79 | 1.00 | 0.97 | 0.90 | 0.91 | 0.30 | 0.64 |

Geometric Rewards

A second class of rewards we consider, chosen to highlight the power of our method, is challenging geometric rewards, which encourage invariance under simple geometric transformations such as masking, symmetry, or rotation. These objectives make generation more controllable, enabling fine-grained layout adjustments and structured scene composition.
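To make these objectives concrete, here is a minimal sketch of how such rewards could be written as differentiable functions of an image batch of shape (B, C, H, W). The specific penalties are illustrative, not the precise rewards used in the paper.

import torch

def mask_reward(images, mask):
    """Encourage content to stay inside a binary mask by penalizing energy outside it
    (assumes the desired background outside the mask is black)."""
    outside = images * (1.0 - mask)                    # mask broadcasts over (B, C, H, W)
    return -outside.pow(2).flatten(1).mean(dim=1)      # higher is better, one value per image

def symmetry_reward(images):
    """Encourage left-right mirror symmetry."""
    flipped = torch.flip(images, dims=[3])
    return -(images - flipped).pow(2).flatten(1).mean(dim=1)

def rotation_invariance_reward(images):
    """Encourage invariance under a 180-degree rotation."""
    rotated = torch.rot90(images, k=2, dims=[2, 3])
    return -(images - rotated).pow(2).flatten(1).mean(dim=1)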

By guiding the model toward outputs that achieve high geometric rewards, we can position, align, and balance elements in the image more precisely. Below, we show some examples. Note how FLUX.1 [dev] cannot satisfy the desired constraints through prompting alone; our reward guidance is needed to meet the geometric conditions.

Masked Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A tiny desert landscape with sand dunes, a crescent moon above, and a lone camel silhouette, all inside a transparent glass orb located in the top right of the image on a black background"

Note how the glass orb is not located within the masked region (top right) in the original FLUX.1 [dev] generations.

Masked Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A miniature forest with tall pine trees, a glowing campfire, and fireflies drifting in the night sky, all inside a keyhole on a black background"

Note how the scene is not located within the masked region (keyhole) in the original FLUX.1 [dev] generations.

Masked Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A tiny ocean wave curling with a sailboat riding the crest, under a glowing sunset sky, all inside an infinity loop symbol on a black background"

Note how the scene is not located within the masked region (infinity symbol) in the original FLUX.1 [dev] generations.

Symmetric Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A symmetrical image of a miniature cosmic scene with planets orbiting, sealed inside a glass teardrop pendant on a white background"

Note how the image is not quite horizontally symmetric in the original FLUX.1 [dev] generations.

Anti-Symmetric Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "An anti-symmetric black and white cat split down the middle, black on the left and white on the right"

Note how the image is not horizontally anti-symmetric in the original FLUX.1 [dev] generations (both eyes being the same color, pure white background).

Rotation Invariant Generation

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A clean and minimal logo of two koi fish circling each other"

Note how the image is not invariant with respect to a 180° rotation in the original FLUX.1 [dev] generations (koi fish having different colors, or being symmetrically mirrored instead of rotated).

VLM-Based Rewards

Finally, we also explore using pretrained vision-language models (VLMs) to judge our images. Specifically, we provide the generated image and a natural-language yes/no question to the VLM, and use the model's confidence in answering "Yes" as the reward. This makes it possible to express complex objectives entirely through language. For example, asking "Is <PROMPT> a correct caption for the image?" encourages the generation of images that more accurately match the prompt.
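A minimal sketch of this reward is given below, assuming access to the VLM's next-token logits at the answer position (the forward pass itself is abstracted away; yes_id and no_id are the tokenizer ids of "Yes" and "No").

import torch

def vlm_yes_reward(answer_logits, yes_id, no_id):
    """Reward = the VLM's confidence in answering "Yes", renormalized over {Yes, No}.
    answer_logits: (B, vocab_size) next-token logits at the answer position."""
    pair = answer_logits[:, [yes_id, no_id]]           # (B, 2)
    probs = torch.softmax(pair, dim=-1)
    return probs[:, 0]                                  # p("Yes") in [0, 1], one value per image

# In practice, answer_logits comes from a forward pass of the VLM on the generated image
# together with the question, e.g. "Is <PROMPT> a correct caption for the image?".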

We quantitatively evaluate our method on the short English prompts of the UniGenBench++ benchmark, using the Skywork-VL-7B reward model as the reward. This benchmark contains 600 short English prompts and scores the generated images with the UniGenBench evaluation model, a finetuned variant of Qwen2.5-VL-72B, across multiple dimensions such as world knowledge and visual reasoning. We summarize the results as a performance-vs-compute scaling curve in Figure 3. We find that FMTT offers a superior performance-compute trade-off compared to baseline methods. For more details, please refer to the paper.

Figure 3: Scaling results on UniGenBench++ showing overall score versus compute. FMTT significantly outperforms both Best-of-N and Multi-Best-of-N, achieving higher scores at a comparable number of function evaluations. Note that FMTT with a 1-step denoiser look-ahead fails to beat the Best-of-N baseline, demonstrating that reward gradients computed at blurry denoised states are unhelpful.

This approach provides a powerful and flexible way to steer generation toward arbitrary objectives by simply changing the prompt. Below we show several examples.

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A person holding a mirror that reflects three people in it"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A man reading a book that shows a picture of the same man reading the same book"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "an upside down castle"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A bicycle with square wheels"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "a man in the rain holding a closed umbrella"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "a man wearing a fully zipped up leather jacket"

VLM Question: "Is <PROMPT> a correct caption for the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A giant hand made of clouds holding a tiny thunderstorm inside its palm"

VLM Questions: "Is the hand made of clouds?" + "Is there a tiny thunderstorm inside the palm of the hand?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "The phrase 'Flow Map Trajectory Tilting' spelled out using floating feathers suspended in mid-air"

VLM Question: "Is <PROMPT> in the image?"

FLUX.1 [dev]

FLUX + FMTT (Ours)

Prompt: "A coffee cup with a handle on the inside"

VLM Question: "Is the handle of the cup on the inside?"

Additionally, since some VLMs can process multiple images, we can use them to define rewards that compare the generated image with additional reference images, such as style similarity and character consistency.

In these experiments, the examples from the base FLUX.1 [dev] model are generated using only the text prompt without the reference image. For FLUX + FMTT, the reference image and the generated image are fed to the VLM, and the VLM question is used to compute the reward. Below we show some examples.

FLUX.1 [dev]

Reference Image

FLUX + FMTT

Prompt: "a Van Gogh style painting of a dog"

VLM Question: "Do these two images have the same art style?"

Prompt: "a cute grumpy cartoon mushroom character with a large brown mushroom cap walking his dog"

VLM Question: "Are these two images of the exact same character?"

Citation

@article{sabour2025fmtt,
    title={Test-time Scaling of Diffusions with Flow Maps},
    author={Amirmojtaba Sabour and Michael S. Albergo and Carles Domingo-Enrich and Nicholas M. Boffi and Sanja Fidler and Karsten Kreis and Eric Vanden-Eijnden},
    journal={arXiv preprint arXiv:2511.22688},
    year={2025}
}