1. Environment Dependencies
| Software  | Version              | Required by               |
|-----------|----------------------|---------------------------|
| CANN      | >= 8.0.0             | vllm-ascend and torch-npu |
| torch-npu | >= 2.5.1.dev20250308 | vllm-ascend               |
| torch     | >= 2.5.1             | torch-npu and vllm        |
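Before installing anything, you can check whether an existing environment already satisfies the table above. A minimal sketch; the CANN path below is an assumption (the default ascend-toolkit install location), so adjust it for your setup:

# Check installed torch / torch-npu versions against the table above.
pip show torch torch-npu | grep -E "^(Name|Version)"
# Check the CANN toolkit version (default install path assumed).
cat /usr/local/Ascend/ascend-toolkit/latest/*/ascend_toolkit_install.info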
2. Installing vllm and vllm-ascend
You can install vllm from pre-built wheels, and vllm-ascend likewise (it has not been formally released yet; it can also be built from source):
# Install vllm-project/vllm from pypi
pip install vllm==0.7.3
# Install vllm-project/vllm-ascend from pypi.
pip install vllm-ascend==0.7.3rc1 --extra-index-url https://download.pytorch.org/whl/cpu/
Note: in this environment the vllm install failed with "ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects", while the vllm-ascend install succeeded. A build-from-source workaround is sketched below.
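If the vllm wheel fails to build as above, one workaround is to build vllm from source without compiling any device-specific kernels, since the NPU backend is supplied by the vllm-ascend plugin. A minimal sketch, assuming the v0.7.3 tag to match the versions used here:

# Build vllm v0.7.3 from source instead of installing the pypi wheel.
git clone --depth 1 --branch v0.7.3 https://github.com/vllm-project/vllm
cd vllm
# VLLM_TARGET_DEVICE=empty skips building GPU kernels; the Ascend backend
# is provided separately by the vllm-ascend plugin.
VLLM_TARGET_DEVICE=empty pip install .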
In addition, the current vllm-ascend release depends on an unreleased version of torch-npu, which you need to install manually:
# Once the packages are installed, you need to install `torch-npu` manually,
# because that vllm-ascend relies on an unreleased version of torch-npu.
# This step will be removed in the next vllm-ascend release.
#
# Here we take python 3.10 on aarch64 as an example. Feel free to install the correct version for your environment. See:
#
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py39.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py311.tar.gz
#
mkdir pta
cd pta
wget https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
tar -xvf pytorch_v2.5.1_py310.tar.gz
pip install ./torch_npu-2.5.1.dev20250308-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
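With the wheel installed, load the CANN environment and confirm that torch-npu can see the device. A quick check; the set_env.sh path assumes the default CANN install location:

# Load CANN environment variables (default install path assumed).
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Confirm torch / torch-npu import cleanly and an NPU is visible.
python -c "import torch, torch_npu; print(torch.__version__, torch_npu.__version__, torch.npu.is_available())"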
3. Verification
from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")