Diffusion-MPC for Tetris: A Case Study in Discrete Planning Challenges
A new research paper introduces Diffusion-MPC (Model Predictive Control) for discrete combinatorial domains, using the classic game of Tetris as a rigorous testbed. The study, published on arXiv (2603.02348v1), investigates a planner that generates candidate action sequences with a MaskGIT-style discrete denoiser and selects the best actions through reranking. The findings reveal critical structural challenges for diffusion-based planners in discrete environments, particularly around action feasibility and critic alignment, offering practical diagnostics for future development.
Core Methodology: Sampling and Reranking
The Diffusion-MPC framework first samples candidate sequences of Tetris piece placements with a MaskGIT-style denoising model, then applies a reranking step to select the best action from the sampled set. The paper analyzes three factors in this pipeline that determine overall performance and reliability.
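The sample-then-rerank loop can be sketched as follows. This is a minimal illustration, not the paper's implementation; the `denoiser_sample` and `score_fn` interfaces are assumptions standing in for the MaskGIT-style denoiser and the reranking critic:

```python
import numpy as np

def plan_step(denoiser_sample, score_fn, K=32, H=8):
    """Sample K candidate action sequences of horizon H, rerank them,
    and return the first action of the best-scoring sequence.

    denoiser_sample(H) -> list of H discrete actions (hypothetical interface).
    score_fn(seq)      -> scalar estimate of rollout quality (heuristic or critic).
    """
    candidates = [denoiser_sample(H) for _ in range(K)]
    scores = [score_fn(seq) for seq in candidates]
    best = int(np.argmax(scores))
    # MPC convention: execute only the first action, then replan from the new state
    return candidates[best][0]
```

Replanning after every executed action is what makes this model predictive control rather than open-loop plan execution.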
Key Factor 1: The Necessity of Feasibility Masking
In discrete domains like Tetris, a significant portion of sampled actions can be invalid. The study implemented feasibility-constrained sampling via logit masking, which restricts the denoiser to propose only placements that are physically possible on the board. Masking removed 46% of invalid action mass from the sampling distribution and yielded a 6.8% improvement in game score and a 5.6% improvement in survival time over unconstrained sampling, establishing it as a non-negotiable component of effective discrete planning.
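Logit masking itself is simple: set the logits of infeasible actions to negative infinity before the softmax, so they receive zero probability. A minimal sketch, assuming a boolean `feasible` mask over a flat action space (how the paper encodes placements is not specified here):

```python
import numpy as np

def masked_sample(logits, feasible, rng):
    """Sample one discrete action after zeroing out infeasible placements.

    logits:   (A,) unnormalized scores from the denoiser.
    feasible: (A,) boolean mask of physically valid placements.
    """
    masked = np.where(feasible, logits, -np.inf)
    # Stable softmax; exp(-inf) = 0, so infeasible actions get zero probability
    z = masked - masked.max()
    probs = np.exp(z)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

Because the mask acts before normalization, the remaining probability mass is redistributed over feasible actions only, rather than being wasted on placements that would have to be rejected later.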
Key Factor 2: The Pitfalls of Critic Misalignment
The research evaluated several reranking strategies to choose the best action sequence: a simple heuristic score, a pretrained Deep Q-Network (DQN) critic, and a hybrid of the two. A central finding was that the naive use of the DQN critic for reranking was systematically misaligned with actual rollout quality. This misranking led to high decision regret, with a mean regret of 17.6 and a 90th percentile (p90) as high as 36.6. This indicates that a critic's value estimates, even if pretrained, may not reliably rank the quality of *imagined* sequences from a different generative model, creating a major integration challenge.
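Decision regret of the kind reported here can be measured as the gap between the best candidate's true rollout return and the return of the candidate the critic actually picked. A sketch under that definition (the function name and inputs are illustrative, not the paper's code):

```python
import numpy as np

def rerank_regret(critic_scores, true_returns):
    """Regret incurred by picking the candidate with the highest critic score
    instead of the candidate with the highest true rollout return.

    critic_scores: (K,) critic estimates for K candidate sequences.
    true_returns:  (K,) returns obtained by actually rolling out each candidate.
    """
    chosen = int(np.argmax(critic_scores))
    return float(np.max(true_returns) - true_returns[chosen])
```

A well-aligned critic drives this quantity toward zero; the reported mean of 17.6 and p90 of 36.6 indicate the DQN critic frequently ranked poor imagined sequences above good ones.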
Key Factor 3: Compute Scaling and Horizon Effects
The analysis of compute scaling examined the trade-offs between the number of candidates sampled (K) and the planning horizon length (H). The study found that shorter planning horizons outperformed longer ones in Tetris's sparse and delayed reward setting. This suggests that uncertainty compounds in long imagined rollouts, reducing their usefulness. The compute budget dictates the dominant failure mode: a small K limits the quality of the best candidate found, while a larger H amplifies issues of model mismatch and critic misranking.
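Diagnosing which failure mode dominates typically means sweeping K and H jointly and evaluating each setting. A minimal grid-sweep skeleton, with `plan_fn` and `eval_fn` as assumed placeholders for building a planner and scoring it over full games:

```python
def sweep(plan_fn, eval_fn, Ks=(8, 32, 128), Hs=(4, 8, 16)):
    """Evaluate every (K, H) combination and return a dict of scores.

    plan_fn(K, H)    -> a planner configured with K candidates and horizon H.
    eval_fn(planner) -> scalar performance (e.g., mean score over many games).
    """
    return {(K, H): eval_fn(plan_fn(K, H)) for K in Ks for H in Hs}
```

Reading the resulting grid along each axis separates the two failure modes the paper identifies: scores that keep rising with K point to insufficient candidates, while scores that fall with H point to compounding model mismatch and critic misranking over long imagined rollouts.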
Why This Matters: Implications for AI Planning
This case study extends the understanding of diffusion models beyond continuous domains like images into structured, discrete decision-making. The insights are crucial for advancing model-based reinforcement learning and generative planning.
- Feasibility is Foundational: In discrete action spaces, constraining the generative model to valid actions is not just an optimization but a requirement for functional performance.
- Critic Integration is Non-Trivial: Simply using a pretrained value function to score generated plans can lead to significant misalignment and poor decisions, highlighting a key research problem.
- Planning Horizon is Context-Dependent: Longer rollouts are not inherently better; in environments with compounding uncertainty, shorter, more certain horizons can be more effective.
- Provides Practical Diagnostics: The paper offers a clear framework—analyzing failure modes through the lens of K and H—for engineers to debug and improve their own diffusion-based planning systems.
Overall, the research underscores that successfully deploying diffusion planners in combinatorial settings requires careful co-design of the generative model, the feasibility constraints, and the evaluation critic, moving beyond straightforward adaptations from continuous domains.