ComfyUI HiDiffusion

Diffusion models have become a prevalent approach to generating high-resolution images. However, generating high-resolution images directly from pre-trained diffusion models leads to significant object replication and a considerable increase in generation time.

To address these issues, the HiDiffusion authors propose a tuning-free framework for high-resolution generation. HiDiffusion consists of a Resolution-Aware U-Net (RAU-Net), which dynamically adjusts the feature map size to resolve the object replication problem, and Modified Shifted Window Multi-head Self-Attention (MSW-MSA), which uses optimized window attention to reduce the computational load.
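To give a rough sense of why window attention reduces computation, the sketch below counts the query-key pairs scored by full self-attention and by windowed attention over the same feature map, and shows how a cyclic shift changes which tokens share a window. It is an illustration only, not code from HiDiffusion or from these nodes: the function names are hypothetical, and the window size (half the feature map in each dimension) is an assumption chosen for the example.

```python
def full_attention_pairs(height, width):
    """Query-key pairs scored when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height, width, win_h, win_w):
    """Query-key pairs scored when tokens attend only within their own window."""
    if height % win_h or width % win_w:
        raise ValueError("feature map must divide evenly into windows")
    num_windows = (height // win_h) * (width // win_w)
    tokens_per_window = win_h * win_w
    return num_windows * tokens_per_window ** 2


def shifted_window_ids(height, width, win_h, win_w, shift_h=0, shift_w=0):
    """Window id of each position after cyclically shifting the feature map.

    Positions that receive the same id attend to each other. Changing the
    shift regroups the tokens, which lets information cross window borders.
    """
    windows_per_row = width // win_w
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            sy = (y + shift_h) % height  # cyclic shift, wraps at the border
            sx = (x + shift_w) % width
            row.append((sy // win_h) * windows_per_row + (sx // win_w))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Assumed example: a 128x128 feature map split into four 64x64 windows.
    full = full_attention_pairs(128, 128)
    windowed = window_attention_pairs(128, 128, 64, 64)
    print(full // windowed)  # how many times fewer pairs are scored

    # Tiny 4x4 map with 2x2 windows, without and with a shift of one position.
    print(shifted_window_ids(4, 4, 2, 2)[0])
    print(shifted_window_ids(4, 4, 2, 2, 1, 1)[0])
```

Under these assumptions, splitting the map into four windows scores a quarter of the pairs that full attention does. The count covers only the attention score computation, so it should not be read as a prediction of end-to-end speedup.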

HiDiffusion can be integrated into various pre-trained diffusion models to scale the resolution of generated images up to 4096×4096, while running inference 1.5 to 6 times faster than previous methods. Experiments show that this approach addresses both object replication and heavy computation, achieving state-of-the-art performance on high-resolution image synthesis tasks.

This set of four custom nodes makes HiDiffusion available within the ComfyUI workspace. The exposed settings are kept to a minimum so that results are easy to experiment with; the intention is to develop more specialized nodes in the future that can be integrated into more complex and comprehensive workflows.

Some example images generated with HiDiffusion SDXL

Other projects