Unique Architecture Modules

The forward diffusion process

Our autoencoder uses a raw-feature initialization mechanism, which in our experiments has proven more effective than RGB-based diffusion methods. Initializing from raw features lets the model capture more detailed and nuanced information about the input data, resulting in higher-quality generations.
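The noising step applied on top of these latents follows the standard DDPM forward process. A minimal sketch, assuming the latent comes from a hypothetical raw-feature encoder (the encoder itself is proprietary and not shown):

```python
import numpy as np

def forward_diffuse(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) for a standard DDPM forward process.

    `x0` stands in for a latent produced by the raw-feature autoencoder;
    this sketch only shows the generic noising applied on top of it.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]      # cumulative product up to step t
    eps = np.random.randn(*x0.shape)       # standard Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

# Example: linear beta schedule over 1000 steps, noised at step 500
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.random.randn(4, 64, 64)            # stand-in raw-feature latent
xt, eps = forward_diffuse(x0, t=500, betas=betas)
```

The schedule and shapes here are illustrative defaults, not the production configuration.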

The reverse diffusion process

We use a low-rank approximation denoising method that reduces the dimensionality of the large semantic-feature matrices in the latent space. This low-rank approximation significantly reduces computation cost compared to approaches such as Stable Diffusion, making the model more efficient and cost-effective.
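The core idea of low-rank dimension reduction can be sketched with a truncated SVD; the production denoiser is proprietary, and this only illustrates how a large semantic-feature matrix is compressed to a small rank:

```python
import numpy as np

def low_rank_project(features, rank):
    """Project a semantic-feature matrix onto its top-`rank` singular
    directions (truncated SVD). A generic low-rank approximation used
    here to illustrate the dimension-reduction idea, not the actual
    proprietary denoising algorithm."""
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

F = np.random.randn(256, 768)        # stand-in latent semantic features
F_lr = low_rank_project(F, rank=32)  # same shape, but rank <= 32
```

Operating on the rank-32 factors instead of the full 256x768 matrix is where the computational savings come from.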

Accuracy control for coherent visual generation

For better controllability and consistency of the generated content (especially human characters), we provide an integrated solution that combines our work on diffusion models with experience from past video, 3D-avatar, multiview-stereo, and segmentation projects. Beyond modifying the diffusion methods and the autoencoder and inserting a new CLIP encoder, a variety of proprietary algorithms supply extra conditioning beyond plain images and text. This yields results that are not only visually appealing and diverse, but also coherent and consistent with real-world perception.
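One common way such extra conditioning enters a diffusion model is as additional token sequences alongside the text and image embeddings. A minimal sketch under that assumption; the function name and the pose signal are illustrative, not the production API:

```python
import numpy as np

def combine_conditions(text_emb, image_emb, extra_embs):
    """Concatenate conditioning tokens from several sources into one
    sequence for cross-attention. `extra_embs` stands in for the
    proprietary signals (e.g. pose, identity, or segmentation masks);
    all names here are hypothetical."""
    tokens = [text_emb, image_emb] + list(extra_embs)
    return np.concatenate(tokens, axis=0)   # (total_tokens, dim)

text = np.random.randn(77, 768)    # CLIP-style text tokens
image = np.random.randn(257, 768)  # image tokens
pose = np.random.randn(17, 768)    # hypothetical extra condition
cond = combine_conditions(text, image, [pose])
```

The denoiser then attends over `cond`, so every extra signal can steer generation without changing the image/text pathway.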
