Editing-as-Reasoning (EAR) recasts visual planning as a single-step image transformation rather than step-by-step generation.
Violation (↓): Percentage of predicted path cells that fall outside the ground-truth (GT) path (%).
Coverage (↑): Percentage of GT path cells that are covered by the predicted path (%).
MSE In (↓): Mean squared error (MSE) computed inside the GT path region.
MSE Out (↓): MSE computed outside the GT path region.
Pass@1 (↑): Percentage of samples for which a single generation yields a valid path.
Pass@5 (↑): Percentage of samples for which at least one of five generations yields a valid path.
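The metric definitions above can be made concrete with a short sketch. It assumes that the predicted path has already been extracted from the edited image as a boolean cell mask (that extraction step is not described here), that MSE compares the generated image with the ground-truth solution image pixel by pixel, and that path validity comes from an external checker whose per-generation verdicts are passed in as booleans. Because the reported Violation and Coverage values do not sum to 100, the sketch normalises Coverage by the number of GT path cells rather than by the number of predicted cells; this is an interpretation, not a confirmed detail. All function names are hypothetical.

```python
import numpy as np


def cell_metrics(pred_path: np.ndarray, gt_path: np.ndarray) -> tuple[float, float]:
    """Return (Violation %, Coverage %) from two boolean cell masks of equal shape.

    Assumption: Violation is normalised by predicted path cells,
    Coverage by GT path cells.
    """
    pred = pred_path.astype(bool)
    gt = gt_path.astype(bool)
    n_pred = pred.sum()
    n_gt = gt.sum()
    # Predicted cells that land off the GT path.
    violation = 100.0 * (pred & ~gt).sum() / n_pred if n_pred else 0.0
    # GT path cells that the prediction actually reaches.
    coverage = 100.0 * (pred & gt).sum() / n_gt if n_gt else 0.0
    return float(violation), float(coverage)


def masked_mse(generated: np.ndarray, target: np.ndarray, region: np.ndarray) -> float:
    """MSE restricted to pixels where `region` is True.

    Pass the GT path mask for "MSE In" and its complement for "MSE Out".
    """
    region = region.astype(bool)
    diff = (generated.astype(np.float64) - target.astype(np.float64))[region]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


def pass_at_k(valid: list[list[bool]], k: int) -> float:
    """Percentage of samples with at least one valid path among their first k generations."""
    hits = sum(any(gens[:k]) for gens in valid)
    return 100.0 * hits / len(valid)


# Usage on a toy 3x3 grid: 5 predicted cells, 4 of them on the GT path.
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 1, 1]])
gt = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
print(cell_metrics(pred, gt))
```

Note that `pass_at_k` is the plain empirical rate over the first k generations, not the unbiased combinatorial estimator often used for code-generation benchmarks; which variant the reported numbers use is not stated here.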
| Model | Violation ↓ | Coverage ↑ | MSE In ↓ | MSE Out ↓ | Pass@1 ↑ | Pass@5 ↑ |
|---|---|---|---|---|---|---|
| closed-source models | | | | | | |
| GPT-image-1 | 62.88 | 58.97 | 41.16 | 52.76 | 5.40 | 6.06 |
| NanoBanana-Pro | 47.76 | 64.21 | 24.20 | 17.21 | 4.82 | 9.28 |
| Seedream-4.5 | 16.90 | 25.67 | 28.82 | 30.96 | 2.14 | 3.21 |
| open-source models | | | | | | |
| Flux-Kontext-Dev | 23.84 | 30.24 | 30.96 | 18.31 | 0.36 | 3.57 |
| Qwen-Image-Edit | 19.37 | 28.51 | 18.82 | 5.70 | 1.43 | 2.14 |
| Bagel (base) | 28.91 | 27.15 | 11.64 | 5.84 | 0.00 | 1.00 |
| Janus-Pro (base) | 5.41 | 1.85 | 57.47 | 76.80 | 0.00 | 0.00 |
| fine-tuned models | | | | | | |
| Bagel (fine-tuned, SFT on 3×3) | 12.21 | 51.02 | 8.66 | 3.07 | 11.54 | 23.64 |
| Janus-Pro (fine-tuned, SFT on 3×3) | 35.60 | 23.33 | 55.99 | 50.94 | 1.43 | 2.22 |