Unique Architecture Modules
Last updated
Our raw-feature-based autoencoder uses a raw-feature initialization mechanism, which has proven more effective than RGB-based (red-green-blue) diffusion methods. Initializing from raw features lets the model capture more detailed and nuanced information about the input data, which results in higher-quality generation.
We use a low-rank approximation denoising method to reduce the dimensionality of the large semantic-feature matrix in the latent space. This significantly reduces computation cost compared with other methods such as Stable Diffusion, making the model more efficient and cost-effective.
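As a rough illustration of low-rank approximation in general, the sketch below denoises a feature matrix with a truncated SVD in NumPy. It is not our production method: the function name `low_rank_denoise`, the matrix shape, the rank, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def low_rank_denoise(features: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of `features` (Frobenius norm).

    Hypothetical helper for illustration: truncated SVD is one standard way
    to build a low-rank approximation, assumed here for the example.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)

# Synthetic stand-in for a semantic-feature matrix: 256 rows x 64 channels,
# built from 4 underlying factors plus small Gaussian noise (assumed values).
clean = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 64))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

denoised = low_rank_denoise(noisy, rank=4)

# Distance from the noise-free matrix before and after the projection.
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(f"error before: {err_noisy:.3f}, after: {err_denoised:.3f}")
```

In this setup the rank-4 factors need 4 × (256 + 64) = 1,280 stored values instead of 256 × 64 = 16,384 for the full matrix, which shows where the savings of a low-rank representation come from.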
For better controllability and consistency of the generated content (especially human characters), we provide an integrated solution that combines our work on diffusion models with our experience from past video, 3D avatar, multiview stereo, and segmentation projects. Besides modifying the diffusion methods and the autoencoder and inserting a new CLIP model, we have implemented a variety of proprietary algorithms that provide extra conditioning beyond plain images and text. This yields results that are not only visually appealing and diverse, but also coherent and consistent with real-world perception.