Stable Diffusion XL 1.0 model
https://stable-diffusion-art.com/sdxl-model/
Stable Diffusion XL (SDXL) is the latest AI image model that can generate realistic people, legible text, and diverse art styles with excellent image composition. It is a larger and better version of the celebrated Stable Diffusion v1.5 model, and hence the name SDXL.
realistic /ˌriːəˈlɪstɪk/ adj. true to life; lifelike; practical; sensible; achievable
legible /ˈledʒəbl/ adj. clear enough to read
composition /ˌkɒmpəˈzɪʃn/ n. the arrangement of elements in a picture; makeup; a written, musical, or artistic work
celebrate /ˈselɪbreɪt/ v. to mark an occasion; to praise (celebrated: widely praised)
As described in the paper “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis” by Podell and coworkers, Stable Diffusion XL improves on the v1.5 model in every respect.
The improvements are:
- Higher quality images
- Follows the prompt more closely
- More fine details
- Larger image size
- Ability to generate legible text
- Ability to produce darker images
1. What is the Stable Diffusion XL model?
The Stable Diffusion XL (SDXL) model is the official upgrade to the v1.5 model. The model is released as open-source software.
1.1. Number of parameters
SDXL is a much larger model, and in machine learning a larger model usually performs better. The total number of parameters of the SDXL model is:
- 3.5 billion (SDXL Base model)
- 6.6 billion (SDXL Base + refiner model)
By comparison, the v1.5 model has 0.98 billion parameters.
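To put these counts in perspective, the short calculation below compares each model with v1.5 and estimates the memory needed just to hold the weights. The figure of 2 bytes per parameter assumes half-precision (fp16) storage. That is an assumption of this sketch, not something stated above, and real memory use is higher once activations are included.

```python
# Parameter counts quoted above, in billions.
PARAMS_BILLION = {
    "v1.5": 0.98,
    "SDXL base": 3.5,
    "SDXL base + refiner": 6.6,
}

BYTES_PER_PARAM = 2  # assumption: weights stored in fp16


def size_ratio(model: str, baseline: str = "v1.5") -> float:
    """How many times more parameters `model` has than `baseline`."""
    return PARAMS_BILLION[model] / PARAMS_BILLION[baseline]


def weight_memory_gb(model: str) -> float:
    """Rough memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return PARAMS_BILLION[model] * 1e9 * BYTES_PER_PARAM / 1e9


if __name__ == "__main__":
    for name in PARAMS_BILLION:
        print(f"{name}: {size_ratio(name):.1f}x v1.5, "
              f"about {weight_memory_gb(name):.1f} GB of weights")
```

Under these assumptions the base model alone is roughly 3.6 times the size of v1.5, which is why SDXL needs noticeably more VRAM.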
1.2. Differences between SDXL and v1.5 models
The SDXL model is, in practice, two models: the base model and the refiner model. You run the base model first, followed by the refiner model. The base model sets the global composition, and the refiner model adds finer details. (You can optionally run the base model alone.)
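The two-stage flow can be sketched with the Hugging Face diffusers library, which is one of several ways to run SDXL and is not mentioned in the original article. The sketch assumes diffusers and PyTorch are installed, a CUDA GPU is available, and the weights are pulled from the Hugging Face Hub. The 40 steps and the 0.8 handoff point are illustrative values, and split_steps is a hypothetical helper that only approximates how the step budget is divided.

```python
def split_steps(total_steps: int, handoff: float) -> tuple[int, int]:
    """Approximate step budget as (base steps, refiner steps)."""
    base_steps = round(total_steps * handoff)
    return base_steps, total_steps - base_steps


def generate(prompt: str, total_steps: int = 40, handoff: float = 0.8):
    """Run the base model, then let the refiner finish the last steps."""
    # Imported lazily so the helper above works without a GPU stack installed.
    import torch
    from diffusers import (
        StableDiffusionXLImg2ImgPipeline,
        StableDiffusionXLPipeline,
    )

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,  # shared with the base model
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    # The base model sets the composition and stops early, returning latents.
    latents = base(
        prompt=prompt,
        num_inference_steps=total_steps,
        denoising_end=handoff,
        output_type="latent",
    ).images
    # The refiner picks up the still-noisy latents and adds fine detail.
    return refiner(
        prompt=prompt,
        num_inference_steps=total_steps,
        denoising_start=handoff,
        image=latents,
    ).images[0]
```

Calling generate("a photo of an astronaut riding a horse") would return a PIL image. To run the base model alone, drop the refiner call and the denoising_end and output_type arguments.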
The language model (the module that understands your prompts) combines the largest OpenCLIP model (ViT-bigG/14) with OpenAI’s CLIP ViT-L. This is a smart choice because Stable Diffusion v2 uses OpenCLIP alone and is hard to prompt. Bringing back OpenAI’s CLIP makes prompting easier, so prompts that work on v1.5 have a good chance of working on SDXL.
The SDXL model adds image-size conditioning, which makes it possible to train on images smaller than 256 × 256 instead of discarding them. This significantly increases the training data, because 39% of the images would otherwise be thrown away.
conditioning /kənˈdɪʃnɪŋ , kənˈdɪʃənɪŋ/ n. training; conditioning (in the psychological sense); shaping influence
The U-Net, the most crucial part of the diffusion model, is now 3 times larger. Together with the larger language model, the SDXL model generates high-quality images matching the prompt closely.
The default image size of SDXL is 1024 × 1024. That is 4 times as many pixels as the v1.5 model’s 512 × 512.
2. Sample images from SDXL
Users overwhelmingly prefer the SDXL model over the 1.5 model.
overwhelmingly /ˌəʊvə(r)'welmɪŋli/ adv. by a very large margin
According to Stability AI’s own study, most users prefer the images from the SDXL model over the v1.5 base model. You will find a series of images generated with the same prompts from the v1.5 and SDXL models. You can decide for yourself.
3. Tips on using SDXL 1.0 model
A Stability AI staff member has shared some tips on using the SDXL 1.0 model. Here’s a summary.
3.1. Image size
The native size is 1024 × 1024. SDXL supports different aspect ratios, but the quality is sensitive to size. Here are the image sizes used in DreamStudio, Stability AI’s official image generator:
- 1:1 - 1024 x 1024
- 5:4 - 1152 x 896
- 3:2 - 1216 x 832
- 16:9 - 1344 x 768
- 21:9 - 1536 x 640
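A quick check of the sizes listed above shows what they have in common. The script below computes each size's exact width-to-height ratio and its pixel count relative to the native 1024 x 1024. The multiple-of-64 test is an observation about these particular numbers, not a rule stated in the article.

```python
from fractions import Fraction

# (label, width, height) as listed above.
SIZES = [
    ("1:1", 1024, 1024),
    ("5:4", 1152, 896),
    ("3:2", 1216, 832),
    ("16:9", 1344, 768),
    ("21:9", 1536, 640),
]

NATIVE_PIXELS = 1024 * 1024


def describe(label: str, width: int, height: int) -> dict:
    """Compare one listed size with the native 1024 x 1024 resolution."""
    num, den = (int(part) for part in label.split(":"))
    return {
        "label": label,
        "exact_ratio": Fraction(width, height),  # reduced width:height
        "nominal_ratio": Fraction(num, den),     # ratio implied by the label
        "pixel_share": width * height / NATIVE_PIXELS,
        "multiple_of_64": width % 64 == 0 and height % 64 == 0,
    }


if __name__ == "__main__":
    for size in SIZES:
        info = describe(*size)
        print(f"{info['label']:>5}: exact {info['exact_ratio']}, "
              f"{info['pixel_share']:.0%} of native pixels, "
              f"multiple of 64: {info['multiple_of_64']}")
```

The labels turn out to be nominal: 1152 x 896 is exactly 9:7 rather than 5:4, for example. What the sizes share is a pixel count close to one megapixel, which fits the advice that quality is sensitive to size.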
Use the Aspect Ratio Selector extension to conveniently switch to these image sizes. Add the following lines to resolutions.txt in the extension’s folder (stable-diffusion-webui\extensions\sd-webui-ar).
Here are the recommended image sizes for different aspect ratios.
XL1:1, 1024, 1024
XL5:4, 1152, 896
XL3:2, 1216, 832
XL16:9, 1344, 768
XL21:9, 1536, 640
Aspect Ratio selector presets for SDXL.
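If you prefer not to edit the file by hand, the preset lines can be generated and appended with a few lines of Python. This is a minimal sketch: preset_lines and append_presets are hypothetical helpers, and the line format is copied from the presets shown above rather than from the extension's documentation.

```python
from pathlib import Path

# (label, width, height) for the SDXL sizes listed in this section.
SDXL_PRESETS = [
    ("1:1", 1024, 1024),
    ("5:4", 1152, 896),
    ("3:2", 1216, 832),
    ("16:9", 1344, 768),
    ("21:9", 1536, 640),
]


def preset_lines(presets) -> list[str]:
    """Format presets as 'XL<label>, <width>, <height>' lines."""
    return [f"XL{label}, {width}, {height}" for label, width, height in presets]


def append_presets(resolutions_file: Path, presets=SDXL_PRESETS) -> int:
    """Append presets that are not in the file yet; return how many were added."""
    text = ""
    if resolutions_file.exists():
        text = resolutions_file.read_text(encoding="utf-8")
    existing = {line.strip() for line in text.splitlines()}
    new_lines = [line for line in preset_lines(presets) if line not in existing]
    if new_lines:
        # Start on a fresh line if the file does not end with a newline.
        separator = "" if text == "" or text.endswith("\n") else "\n"
        with resolutions_file.open("a", encoding="utf-8") as handle:
            handle.write(separator + "\n".join(new_lines) + "\n")
    return len(new_lines)
```

For example, append_presets(Path(r"stable-diffusion-webui\extensions\sd-webui-ar") / "resolutions.txt") would add the five presets once and skip them on later runs.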
3.2. Negative prompt
Negative prompts are not as necessary as they are in the 1.5 and 2.0 models. Many common negative terms, such as “extra fingers”, have little or no effect.
extra /ˈekstrə/ adj. additional; more than usual n. a background actor in a film; something added or charged for separately adv. additionally; especially
3.3. Keyword weight
You don’t need keyword weights as high as those used with the v1 models. A weight of 1.5 is very high for the SDXL model. You may need to reduce the weights when you reuse a prompt from the v1 models. Lowering a weight works better than increasing one.
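When reusing v1 prompts, the weights can be lowered automatically. The sketch below assumes the (keyword:weight) emphasis syntax used by the AUTOMATIC1111 web UI. The cap of 1.2 is an illustrative value, not a recommendation from the article, and cap_weights is a hypothetical helper.

```python
import re

# Matches emphasis terms such as "(freckles:1.5)".
WEIGHTED_TERM = re.compile(r"\(([^():]+):\s*(\d*\.?\d+)\)")


def cap_weights(prompt: str, max_weight: float = 1.2) -> str:
    """Lower any keyword weight above `max_weight`; leave the rest untouched."""

    def _cap(match: re.Match) -> str:
        term = match.group(1)
        weight = min(float(match.group(2)), max_weight)
        return f"({term}:{weight:g})"

    return WEIGHTED_TERM.sub(_cap, prompt)


if __name__ == "__main__":
    print(cap_weights("a portrait, (freckles:1.5), (soft light:1.1)"))
```

Weights at or below the cap are kept, so terms that were deliberately de-emphasized (for example 0.8) stay as they are.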
3.4. Safetensor
Always use the safetensors version of a model, not the checkpoint (.ckpt) version. It is safer because loading it cannot execute code on your machine.
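The reason the format is safer is that a safetensors file contains only data: an 8-byte little-endian header length, a JSON header describing each tensor, and then the raw tensor bytes. A pickle-based checkpoint, by contrast, can run arbitrary code while it is being loaded. The sketch below reads the header with the Python standard library only. The function names and the tiny demo file are made up for illustration.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a safetensors file without loading tensors."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))  # u64, little-endian
        return json.loads(handle.read(header_len).decode("utf-8"))


def list_tensors(path: str) -> dict:
    """Map each tensor name to its (dtype, shape), skipping file metadata."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def demo() -> dict:
    """Build a tiny file by hand and read its header back."""
    header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
    header_bytes = json.dumps(header).encode("utf-8")
    payload = (
        struct.pack("<Q", len(header_bytes))
        + header_bytes
        + struct.pack("<2f", 1.0, 2.0)  # two float32 values = 8 bytes
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny.safetensors")
        with open(path, "wb") as handle:
            handle.write(payload)
        return list_tensors(path)


if __name__ == "__main__":
    print(demo())
```

Nothing in this process evaluates code from the file, which is the property that makes the format safe to download from untrusted sources.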
3.5. Refiner strength
Use a low refiner strength for the best outcome.
3.6. Refiner
Use a noisy image to get the best out of the refiner.
4. Some notes about SDXL
Make sure to use an image size of 1024 × 1024 or similar. 512 × 512 doesn’t work well with SDXL.
You normally don’t use the refiner model with a fine-tuned SDXL model. The style may not be compatible.
References
[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/