Stable Diffusion XL 1.0 model

1. What is the Stable Diffusion XL model?
- 1.1. Number of parameters
- 1.2. Differences between SDXL and v1.5 models
2. Sample images from SDXL
3. Tips on using SDXL 1.0 model
4. Some notes about SDXL
References

https://stable-diffusion-art.com/sdxl-model/

Stable Diffusion XL (SDXL) is the latest AI image model that can generate realistic people, legible text, and diverse art styles with excellent image composition. It is a larger and better version of the celebrated Stable Diffusion v1.5 model, and hence the name SDXL.

As described in the article “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis” by Podell and coworkers, Stable Diffusion XL improves on the v1.5 model in several respects.

The improvements are:

Higher-quality images
Closer adherence to the prompt
Finer details
Larger image size
Ability to generate legible text
Ability to produce darker images

1. What is the Stable Diffusion XL model?

The Stable Diffusion XL (SDXL) model is the official upgrade to the v1.5 model. The model is openly released, with publicly available weights.

1.1. Number of parameters

SDXL is a much larger model. As is often the case in machine learning, the larger model can be expected to perform better. The total number of parameters of the SDXL model is:

3.5 billion (SDXL Base model)
6.6 billion (SDXL Base + refiner model)

This is compared with 0.98 billion for the v1.5 model.
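
To put these numbers side by side, the short sketch below computes how many times larger each SDXL configuration is than v1.5. It uses only the parameter counts quoted above; the dictionary and function names are made up for illustration.

```python
# Parameter counts in billions, as quoted in the text above.
PARAMS_BILLIONS = {
    "v1.5": 0.98,
    "SDXL base": 3.5,
    "SDXL base + refiner": 6.6,
}


def size_ratio(model: str, baseline: str = "v1.5") -> float:
    """Return how many times larger `model` is than `baseline`."""
    return PARAMS_BILLIONS[model] / PARAMS_BILLIONS[baseline]


for name in PARAMS_BILLIONS:
    print(f"{name}: {size_ratio(name):.1f}x the v1.5 parameter count")
```

By this measure, the base model alone is roughly 3.6 times the size of v1.5, and the base plus refiner is roughly 6.7 times.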

1.2. Differences between SDXL and v1.5 models

Figure: The SDXL model consists of two models: the base model and the refiner model.

The SDXL model is, in practice, two models. You run the base model, followed by the refiner model. The base model sets the global composition. The refiner model adds finer details. (You can optionally run the base model alone.)
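
As a rough illustration of the base-then-refiner flow, here is a minimal sketch that assumes the Hugging Face `diffusers` library, a CUDA GPU, and the `stabilityai/stable-diffusion-xl-base-1.0` and `stabilityai/stable-diffusion-xl-refiner-1.0` checkpoints. None of these are named in this article, and the 80/20 split of denoising steps is an illustrative choice, not a recommendation from the source. The helper `split_steps` is a hypothetical name.

```python
def split_steps(total_steps: int, base_fraction: float) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a base-then-refiner run."""
    if not 0.0 < base_fraction <= 1.0:
        raise ValueError("base_fraction must be in (0, 1]")
    base_steps = round(total_steps * base_fraction)
    return base_steps, total_steps - base_steps


def generate(prompt: str, total_steps: int = 40, base_fraction: float = 0.8):
    """Run the base model first, then hand its latents to the refiner."""
    # Imported lazily so split_steps() works without the GPU stack installed.
    import torch
    from diffusers import (
        StableDiffusionXLImg2ImgPipeline,
        StableDiffusionXLPipeline,
    )

    # Assumed model IDs; they are not given in the article.
    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2, vae=base.vae,
        torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    ).to("cuda")

    # The base model sets the global composition and stops early,
    # returning latents that still contain some noise.
    latents = base(
        prompt=prompt, num_inference_steps=total_steps,
        denoising_end=base_fraction, output_type="latent",
    ).images
    # The refiner picks up at the same point and adds the finer details.
    return refiner(
        prompt=prompt, num_inference_steps=total_steps,
        denoising_start=base_fraction, image=latents,
    ).images[0]


print(split_steps(40, 0.8))
```

To run the base model alone, call the base pipeline without `denoising_end` and `output_type="latent"` and skip the refiner.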

The language model (the module that understands your prompts) combines the largest OpenCLIP model (ViT-bigG/14) with OpenAI’s CLIP ViT-L. This is a smart choice because Stable Diffusion v2 uses OpenCLIP alone and is hard to prompt. Bringing back OpenAI’s CLIP makes prompting easier, so prompts that work on v1.5 have a good chance of working on SDXL.

The SDXL model has a new image-size conditioning that makes it possible to train on images smaller than 256 $\times$ 256. This significantly increases the training data, because 39% of the images would otherwise be discarded.
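
The difference between the two data-handling strategies can be sketched with a toy example. The image sizes below are made up, and the function and field names are hypothetical; this only illustrates the idea of keeping small images and telling the model their original size, not SDXL's actual training code.

```python
# Hypothetical (width, height) pairs standing in for a training set.
TOY_SIZES = [(1024, 768), (200, 300), (640, 480), (128, 128), (512, 512)]


def drop_small(sizes, min_side=256):
    """Filtering approach: discard images with a side shorter than min_side."""
    return [(w, h) for w, h in sizes if min(w, h) >= min_side]


def keep_with_size_label(sizes):
    """Conditioning approach: keep every image, tagged with its original size."""
    return [
        {"image_size": (w, h), "condition": {"original_width": w, "original_height": h}}
        for w, h in sizes
    ]


kept = drop_small(TOY_SIZES)
labeled = keep_with_size_label(TOY_SIZES)
print(f"filtering keeps {len(kept)} of {len(TOY_SIZES)} images")
print(f"size conditioning keeps {len(labeled)} of {len(TOY_SIZES)} images")
```

With filtering, the small images never reach the model. With conditioning, they stay in the training set and the model sees their size as an extra input.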

The U-Net, the most crucial part of the diffusion model, is now 3 times larger. Together with the larger language model, the SDXL model generates high-quality images matching the prompt closely.

The default image size of SDXL is 1024 $\times$ 1024. This is 4 times as many pixels as the v1.5 model’s 512 $\times$ 512.

2. Sample images from SDXL

Figure: Users overwhelmingly prefer the SDXL model over the 1.5 model.

According to Stability AI’s own study, most users prefer the images from the SDXL model over the v1.5 base model. You will find a series of images generated with the same prompts from the v1.5 and SDXL models. You can decide for yourself.

3. Tips on using SDXL 1.0 model

A Stability AI staff member has shared some tips on using the SDXL 1.0 model. Here is a summary.

3.1. Image size

The native size is 1024 $\times$ 1024. SDXL supports different aspect ratios, but the quality is sensitive to size. Here are the image sizes used in DreamStudio, Stability AI’s official image generator:

1:1 - 1024 x 1024
5:4 - 1152 x 896
3:2 - 1216 x 832
16:9 - 1344 x 768
21:9 - 1536 x 640
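
A quick check of the sizes listed above shows what they have in common: each stays close to the pixel count of the native 1024 x 1024, and every side is a multiple of 64. The ratio labels are nominal (for example, 1152 x 896 is about 1.29, not exactly 5:4). The names in this sketch are made up for illustration.

```python
# Sizes exactly as listed above: label -> (width, height).
SIZES = {
    "1:1": (1024, 1024),
    "5:4": (1152, 896),
    "3:2": (1216, 832),
    "16:9": (1344, 768),
    "21:9": (1536, 640),
}
NATIVE_PIXELS = 1024 * 1024


def describe(width: int, height: int) -> dict:
    """Summarize a size: actual ratio, share of native pixels, 64-divisibility."""
    return {
        "ratio": width / height,
        "pixel_share": width * height / NATIVE_PIXELS,
        "multiple_of_64": width % 64 == 0 and height % 64 == 0,
    }


for label, (w, h) in SIZES.items():
    info = describe(w, h)
    print(f"{label:>5}: {w}x{h}  ratio={info['ratio']:.2f}  "
          f"pixels={info['pixel_share']:.0%} of 1024x1024")
```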

Use the Aspect Ratio Selector extension to switch between these image sizes conveniently. To add the recommended sizes as presets, add the following lines to resolutions.txt in the extension’s folder (stable-diffusion-webui\extensions\sd-webui-ar):

XL1:1, 1024, 1024
XL5:4, 1152, 896
XL3:2, 1216, 832
XL16:9, 1344, 768
XL21:9, 1536, 640
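
If you prefer to script this step, the sketch below appends the preset lines above to a resolutions.txt file and skips any line that is already there. The function name is hypothetical, and you need to pass the path to your own installation.

```python
from pathlib import Path

# Preset lines exactly as listed above.
SDXL_PRESETS = [
    "XL1:1, 1024, 1024",
    "XL5:4, 1152, 896",
    "XL3:2, 1216, 832",
    "XL16:9, 1344, 768",
    "XL21:9, 1536, 640",
]


def add_presets(resolutions_file: Path, presets=SDXL_PRESETS) -> int:
    """Append presets not already in the file; return how many were added."""
    existing = ""
    if resolutions_file.exists():
        existing = resolutions_file.read_text(encoding="utf-8")
    present = {line.strip() for line in existing.splitlines()}
    missing = [p for p in presets if p not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with resolutions_file.open("a", encoding="utf-8") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return len(missing)
```

For example, `add_presets(Path(r"stable-diffusion-webui\extensions\sd-webui-ar\resolutions.txt"))` would add the five presets on the first run and nothing on later runs.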

Figure: Aspect Ratio Selector presets for SDXL.

3.2. Negative prompt

Negative prompts are not as necessary as they are in the 1.5 and 2.0 models. Many common negative terms, e.g. “extra fingers”, have no useful effect.

3.3. Keyword weight

You don’t need keyword weights as high as those used with the v1 models. A weight of 1.5 is very high for SDXL. You may need to reduce the weights when you reuse a prompt from the v1 models. Lowering a weight works better than increasing one.
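
When reusing v1 prompts, one way to tone down the weights is to cap them automatically. The sketch below assumes the `(keyword:weight)` prompt syntax used by some Stable Diffusion front ends, which this article does not specify. The cap of 1.2 is an arbitrary illustrative value, not a recommendation from the source, and the function name is hypothetical.

```python
import re

# Matches "(keyword:1.5)" style weights; assumes no nested parentheses.
WEIGHT_PATTERN = re.compile(r"\(([^():]+):(\d+(?:\.\d+)?)\)")


def soften_weights(prompt: str, max_weight: float = 1.2) -> str:
    """Clamp every (keyword:weight) above max_weight down to max_weight."""
    def clamp(match: re.Match) -> str:
        keyword, weight = match.group(1), float(match.group(2))
        return f"({keyword}:{min(weight, max_weight):g})"

    return WEIGHT_PATTERN.sub(clamp, prompt)


print(soften_weights("a (castle:1.5) at (sunset:1.1), (fog:0.8)"))
```

Weights at or below the cap, including weights below 1, are left unchanged.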

3.4. Safetensor

Always use the safetensors version of the model, not the checkpoint version. It is safer because loading it won’t execute code on your machine.
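
The risk comes from the file format: older checkpoint files are typically stored with Python's pickle format, and unpickling can call functions chosen by whoever created the file. The harmless sketch below demonstrates that mechanism using only the standard library. The `Payload` class is made up for illustration and merely calls `int("42")`.

```python
import pickle


class Payload:
    """Stand-in for a malicious object hidden in a pickle-based file."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call int('42')".
        # An attacker could name any importable callable here instead.
        return (int, ("42",))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # the call happens here, during loading
print(type(restored).__name__, restored)
```

Loading the bytes does not give back a `Payload` object. It gives back whatever the embedded call returned, which shows that code ran during loading. A safetensors file stores only tensor data and metadata, so there is nothing to execute.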

3.5. Refiner strength

Use a low refiner strength for the best outcome.

3.6. Refiner

Use a noisy image to get the best out of the refiner.

4. Some notes about SDXL

Make sure to use an image size of 1024 $\times$ 1024 or similar. 512 $\times$ 512 doesn’t work well with SDXL.

You normally don’t use the refiner model with a fine-tuned SDXL model. The style may not be compatible.

References

[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/
