Editing as Reasoning — Amaze-Bench Leaderboard

Editing-as-Reasoning (EAR) turns visual planning from step-by-step generation into a single-step image transformation.

HuggingFace Dataset | GitHub Code | Paper (Coming Soon)

Violation & Coverage

Violation (↓): Percentage of predicted path cells that fall outside the ground-truth (GT) path cells (%).

Coverage (↑): Percentage of predicted path cells that fall inside the GT path cells (%).
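As a minimal sketch of the two definitions above, the snippet below treats a path as a collection of (row, col) grid cells and normalizes both percentages by the number of predicted cells. The function name `violation_and_coverage`, the cell representation, and the choice to return 0 for an empty prediction are assumptions made for illustration, not the benchmark's official evaluation code.

```python
def violation_and_coverage(pred_cells, gt_cells):
    """Return (violation %, coverage %) for one sample.

    Hypothetical helper: cells are (row, col) tuples, and both
    percentages are taken over the predicted path cells.
    """
    pred = set(pred_cells)
    gt = set(gt_cells)
    if not pred:
        # Assumption: an empty prediction scores 0 on both metrics.
        return 0.0, 0.0
    outside = len(pred - gt)  # predicted cells that are not GT cells
    inside = len(pred & gt)   # predicted cells that are GT cells
    return 100.0 * outside / len(pred), 100.0 * inside / len(pred)


# Toy example: two of the four predicted cells lie on the GT path.
pred_path = [(0, 0), (0, 1), (1, 1), (2, 1)]
gt_path = [(0, 0), (0, 1), (0, 2), (1, 2)]
violation, coverage = violation_and_coverage(pred_path, gt_path)
print(violation, coverage)
```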

MSE In & MSE Out

MSE In (↓): Mean squared error (MSE) inside the GT path region.

MSE Out (↓): Mean squared error (MSE) outside the GT path region.
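One way to read these two metrics is as a masked mean squared error between the generated image and the reference image. The sketch below assumes grayscale images stored as nested lists with values in [0, 1] and a binary mask marking the GT path region; the name `masked_mse`, the data layout, and the value range are illustrative assumptions rather than details taken from the benchmark.

```python
def masked_mse(pred, target, mask, inside=True):
    """Mean squared error over pixels selected by the mask.

    Hypothetical helper: `mask` is truthy on the GT path region.
    inside=True gives MSE In, inside=False gives MSE Out.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if bool(m) == inside:
                total += (p - t) ** 2
                count += 1
    # Assumption: an empty region contributes an error of 0.
    return total / count if count else 0.0


# Toy 2x2 example: the left column is the GT path region.
pred_img = [[1.0, 0.5], [0.5, 0.5]]
gt_img = [[1.0, 0.0], [1.0, 0.0]]
path_mask = [[1, 0], [1, 0]]
mse_in = masked_mse(pred_img, gt_img, path_mask, inside=True)
mse_out = masked_mse(pred_img, gt_img, path_mask, inside=False)
print(mse_in, mse_out)
```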

Pass@1 & Pass@5

Pass@1 (↑): Percentage of samples for which a single generation yields a valid path.

Pass@5 (↑): Percentage of samples for which at least one of five generations yields a valid path.
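The pass rates can be sketched as follows, assuming the usual pass@k reading in which a sample counts as passed when at least one of its first k generations is a valid path. The function name `pass_at_k` and the per-sample list of validity flags are hypothetical; how validity itself is checked is not covered here.

```python
def pass_at_k(results, k):
    """Percentage of samples with at least one valid path in the first k generations.

    Hypothetical helper: `results` holds one list of booleans per sample,
    one flag per generation (True means the generated path was valid).
    """
    if not results:
        return 0.0
    passed = sum(1 for generations in results if any(generations[:k]))
    return 100.0 * passed / len(results)


# Toy example with four samples and five generations each.
validity = [
    [True, False, False, False, False],   # valid on the first try
    [False, False, True, False, False],   # valid on the third try
    [False, False, False, False, False],  # never valid
    [False, False, False, False, True],   # valid on the fifth try
]
print(pass_at_k(validity, 1), pass_at_k(validity, 5))
```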

Benchmark: Amaze

Tip: Switch between Maze / Queen tasks, then click table headers to sort within each group.