FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing

Technion – Israel Institute of Technology

Abstract

The remarkable success of diffusion and flow-matching models has ignited a surge of works on adapting them at test time for controlled generation tasks. Examples range from image editing to restoration, compression and personalization. However, due to the iterative nature of the sampling process in those models, it is computationally impractical to use gradient-based optimization to directly control the image generated at the end of the process. As a result, existing methods typically resort to manipulating each timestep separately. In this work we introduce FlowOpt, a zero-order (gradient-free) optimization framework that treats the entire diffusion/flow process as a black box, enabling optimization through the whole process without backpropagation through the model. Our method is highly efficient, and it lets users monitor the intermediate optimization results and perform early stopping if desired. We prove a sufficient condition, in the form of an upper bound on FlowOpt's step-size, under which convergence to the global optimum is guaranteed. We further show how to empirically estimate this upper bound so as to choose an appropriate step-size. We demonstrate the effectiveness of FlowOpt in the context of image editing, showcasing two use cases: (i) inversion (determining the initial noise that generates a given image), and (ii) directly steering the edited image to be similar to the source image while conforming to the target text prompt. In both settings, our method achieves state-of-the-art results while using roughly the same number of neural function evaluations (NFEs) as existing methods.

Overview

FlowOpt is a zero-order (gradient-free) optimization framework that optimizes through whole flow processes in order to perform editing with pre-trained flow models. FlowOpt treats the flow process as a black box function \(f\), as illustrated in the figure above. This function accepts an initial noise map \(\boldsymbol{z}_{1}\) and optionally a text prompt \(c\), and generates the image \(\boldsymbol{z}_{0}\) at the end of the flow path as \(\boldsymbol{z}_{0} = f(\boldsymbol{z}_{1}, c)\).
For any given source image \(\boldsymbol{y}\) one wishes to edit, FlowOpt can be used to solve the optimization problem

\[ \boldsymbol{z}_1^* = \operatorname*{arg\,min}_{\boldsymbol{z}_1} \; \frac{1}{2} \left\lVert f(\boldsymbol{z}_1, c) - \boldsymbol{y} \right\rVert^2 \]

without using the gradients of \(f\). This optimization problem can be used for both inversion (recovering the initial noise \(\boldsymbol{z}_{1}\) that causes the flow process to generate the image \(\boldsymbol{y}\) when the text \(c\) describes that image) and direct editing (generating a modified image that is as similar as possible to \(\boldsymbol{y}\) but conforms to a text prompt \(c\) describing a desired edit). A key advantage of FlowOpt is that, unlike most inversion methods, it can work with a small number of diffusion timesteps. Because only a small number of optimization iterations is needed to minimize the loss, the total number of NFEs is comparable to that of other methods.
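To make the black-box setting concrete, here is a minimal sketch of a gradient-free loop for the problem above. It is an illustration under stated assumptions, not FlowOpt's actual implementation: the update rule (stepping the noise along the negative residual \(f(\boldsymbol{z}_1, c) - \boldsymbol{y}\)) is an assumption chosen because it needs no gradients of \(f\), `toy_flow` is a hypothetical linear stand-in for a real flow model with the prompt \(c\) omitted, and the NFE count assumes each call to \(f\) costs one NFE per timestep. All function and parameter names are made up for this example.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def toy_flow(z1: Vector) -> Vector:
    """Hypothetical stand-in for the black box f(z1, c); NOT a real flow model."""
    scales = [0.5, 1.0, 1.5]
    return [s * z for s, z in zip(scales, z1)]


def rmse(a: Vector, b: Vector) -> float:
    """Root-mean-square error between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def zero_order_invert(
    f: Callable[[Vector], Vector],
    y: Vector,
    z_init: Vector,
    step_size: float,
    max_iters: int = 50,
    tol: float = 1e-6,
    nfe_per_call: int = 1,
) -> Tuple[Vector, List[float], int]:
    """Minimize 0.5 * ||f(z) - y||^2 using only evaluations of f.

    Assumed update (illustrative only): z <- z - step_size * (f(z) - y).
    Returns the final z, the RMSE recorded at each iteration, and the NFE count.
    """
    z = list(z_init)
    history: List[float] = []
    nfes = 0
    for _ in range(max_iters):
        out = f(z)                 # one full pass through the black box
        nfes += nfe_per_call       # assumed cost: one NFE per timestep
        err = rmse(out, y)
        history.append(err)        # intermediate results can be monitored
        if err < tol:              # early stopping once the fit is good enough
            break
        z = [zi - step_size * (oi - yi) for zi, oi, yi in zip(z, out, y)]
    return z, history, nfes


# Usage: recover the noise that produced a known target with the toy model.
target = toy_flow([1.0, -2.0, 0.5])
z_hat, errors, total_nfes = zero_order_invert(
    toy_flow, target, [0.0, 0.0, 0.0], step_size=1.0, nfe_per_call=4
)
print(len(errors), total_nfes, errors[-1])
```

For this linear toy model the iteration contracts only when the step-size is below 2 divided by the largest scale, so `step_size=1.0` converges while `step_size=2.0` diverges. This mirrors, in a much simpler setting, why an upper bound on the step-size matters.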
The following figures show the reconstruction quality of FlowOpt on the inversion task for both FLUX and Stable Diffusion 3 (SD3), compared to other methods, as a function of the number of NFEs. The reported RMSE is averaged over a dataset:

FLUX

SD3

See our paper for more details.


Inversion

Intermediate samples attained during our zero-order optimization. As the iterations progress, the reconstruction converges to the ground truth image.


Real Image Editing

A man jumping in the air

A robot jumping in the air

A man meditating

A man in Anime style meditating

Two penguins

Two penguins made out of lego bricks

Two golden retriever puppies

Two golden retriever puppies made out of crochet

A pig is standing in a grassy field

A bronze sculpture of a pig standing in a grassy field

A coconut shell filled with splashing water

A cup filled with splashing water

A cat sitting on a fridge

A cat made out of origami sitting on a fridge

A cow laying in a grassy field

A bear laying in a grassy field

A vase filled with a bouquet of pink, red and white flowers

A vase filled with a bouquet of blue, purple and white flowers

A cow standing in a grassy field

A cow made out of wooden blocks standing in a grassy field

A woman sitting on a rocky hill

A comics pane of a woman sitting on a rocky hill

A wolf in the snow

A sculpture of a wolf in the snow



Comparisons

FLUX

A white horse running through a grassy field

A zebra running through a grassy field

A large brown bear walking through a stream of water

A large brown bear in Studio Ghibli style walking through a stream of water

A man sitting on the ground

A golden sculpture of Buddha sitting on the ground

A river flowing through a valley, surrounded by a forest

A comics pane of a river flowing through a valley, surrounded by a forest

A small kitten sitting in a grassy field

A small kitten made out of lego bricks is sitting in a grassy field

A large orange lizard sitting on a rock near the ocean

A large dragon sitting on a rock near the ocean

Two golden retriever puppies sitting in a grassy field

Two husky puppies sitting in a grassy field

A small brown and white rabbit sitting in a grassy field

A small brown and white rabbit in Studio Ghibli style sitting in a grassy field

Stable Diffusion 3

A large green hill

A large egyptian pyramid

A young boy running through a grassy field

A sculpture of a young boy running through a grassy field

A man jumping in the air

A man in Pixar style jumping in the air

A white dog sitting on the grass, and a cat laying on the car

A gray wolf sitting on the grass, and a tiger cub laying on the car

A gray cat sitting on a black cloth, wearing a crown

A ginger cat sitting on a black cloth, wearing a crown

A stack of rocks, on a brown and gray beach

A stack of colorful wooden blocks, on a brown and gray beach


Bibtex

@article{TBD }

Acknowledgements

This webpage was originally made by Matan Kleiner with the help of Hila Manor. The code for the original template can be found here.
Icons are taken from Font Awesome or from Academicons.