Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

A new unified mathematical framework for fine-tuning pre-trained diffusion and flow models demonstrates significant improvements in generative AI tasks. The research introduces GRAFT and P-GRAFT methods that shape intermediate distributions, achieving an 8.81% relative improvement in text-to-image generation with Stable Diffusion v2. The framework also corrects learning errors in flow models through inverse noise correction, enhancing output quality while reducing computational costs.

New AI Fine-Tuning Framework Enhances Diffusion and Flow Models

A new research paper introduces a unified mathematical framework for fine-tuning pre-trained diffusion and flow models, demonstrating significant improvements in text-to-image generation, layout design, and molecule synthesis. The work, detailed in the preprint arXiv:2510.02692v3, refines existing fine-tuning methods and proposes novel algorithms to correct learning errors and shape distributions at intermediate noise levels, leading to higher-quality outputs with greater efficiency.

Unifying and Advancing Fine-Tuning with GRAFT and P-GRAFT

The research first unifies existing variants of Rejection sAmpling based Fine-Tuning (RAFT) under a single framework termed GRAFT. The analysis reveals that these methods implicitly perform KL-regularized reward maximization, a process that balances improving output quality against deviating too far from the original model. Building on this insight, the authors introduce P-GRAFT, a novel approach designed to explicitly shape the probability distributions at intermediate stages of the diffusion process.
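The basic rejection-sampling loop underlying RAFT-style methods can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate`, `finetune_on`, and the reward hook are placeholder names, and keeping a fixed top fraction by reward is one common acceptance rule among the variants the framework unifies.

```python
def raft_round(model, reward, n_samples=64, accept_frac=0.25):
    """One round of rejection-sampling fine-tuning (RAFT-style):
    draw samples from the current model, keep the highest-reward
    fraction, and fine-tune on the accepted set."""
    samples = [model.generate() for _ in range(n_samples)]
    ranked = sorted(samples, key=reward, reverse=True)
    accepted = ranked[: max(1, int(accept_frac * n_samples))]
    model.finetune_on(accepted)  # supervised update on accepted samples only
    return accepted
```

Repeating this loop nudges the model toward high-reward outputs while staying anchored to its own samples, which is the implicit KL-regularized reward maximization the analysis identifies.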

This targeted shaping at specific noise levels is shown to be a more effective fine-tuning strategy. The paper explains this efficacy through a bias-variance tradeoff: by intervening during generation, P-GRAFT can reduce errors (bias) without excessively increasing instability (variance), leading to more controlled and higher-quality model outputs.
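A schematic of selection at an intermediate noise level might look like the sketch below. Everything here is an assumption for illustration: `denoise_to`, `predict_x0`, and the idea of scoring each partially denoised latent via its predicted clean sample are hypothetical stand-ins, not the paper's actual P-GRAFT procedure.

```python
def select_intermediate(sampler, reward, t_mid, n_samples=64, accept_frac=0.25):
    """Run the sampler only down to an intermediate step t_mid,
    score each partially denoised latent via a predicted clean
    sample, and keep the top fraction for further fine-tuning."""
    latents = [sampler.denoise_to(t_mid) for _ in range(n_samples)]
    ranked = sorted(latents,
                    key=lambda z: reward(sampler.predict_x0(z, t_mid)),
                    reverse=True)
    return ranked[: max(1, int(accept_frac * n_samples))]
```

Intervening at `t_mid` rather than on fully generated samples is where the bias-variance tradeoff described above comes into play: earlier intervention gives more control over the trajectory at the cost of scoring noisier intermediates.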

Correcting Errors in Flow Models with Inverse Noise Correction

Leveraging the same mathematical principles, the framework extends to correcting learning errors in pre-trained flow models. The authors propose a novel algorithm named inverse noise correction, which improves model quality without requiring an explicit reward function. This method is particularly valuable for refining models where defining a precise reward metric is challenging, allowing for enhancement based on the model's own internal structure and the data distribution.

Empirical Results Show Significant Performance Gains

The proposed methods were rigorously evaluated across multiple generative tasks. In text-to-image (T2I) generation, applying the framework to Stable Diffusion v2 resulted in an 8.81% relative improvement over the base model and outperformed standard policy gradient methods on established benchmarks as measured by VQAScore, a metric that uses a visual-question-answering model to assess text-image alignment.

For unconditional image generation, the inverse noise correction algorithm improved the Fréchet Inception Distance (FID)—a measure of image realism—while requiring lower computational cost (FLOPs) per generated image. Successful applications were also demonstrated in specialized domains like layout generation and molecule generation, underscoring the framework's versatility.
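For reference, FID compares Gaussian fits to the Inception-feature statistics of real and generated images. The sketch below is a direct NumPy/SciPy implementation of the standard formula, not code from the paper:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between Gaussians N(mu1, sigma1)
    and N(mu2, sigma2) fit to feature statistics:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower is better: identical statistics give a distance of zero, so an improved FID at lower FLOPs per image means more realistic outputs for less compute.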

Why This Matters for AI Development

  • Enhances Model Utility: Provides more effective tools for aligning powerful generative models like Stable Diffusion with specific downstream applications or correcting inherent flaws.
  • Improves Efficiency: The inverse noise correction method boosts output quality (better FID scores) while reducing computational overhead, making advanced AI more accessible.
  • Offers a Unified Theory: The GRAFT framework creates a cohesive mathematical understanding of fine-tuning, guiding future research and development in generative AI.
  • Broad Applicability: Proven success across diverse fields—from creative arts to scientific discovery—highlights the framework's potential as a foundational tool for the next generation of AI models.
