vLLM x Ascend Framework

Published: 2025-03-30

Reference: Installation — vllm-ascend

1. Environment Dependencies

Software     Required version       Needed by
CANN         >= 8.0.0               vllm-ascend and torch-npu
torch-npu    >= 2.5.1.dev20250308   vllm-ascend
torch        >= 2.5.1               torch-npu and vllm
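
After installing these, you can sanity-check the versions from Python. A minimal sketch, assuming torch and torch-npu are already installed; CANN is only verified indirectly, since torch.npu.is_available() returns True only when CANN and the NPU driver are correctly set up:

# Sanity-check that the installed versions satisfy the table above.
import torch
import torch_npu  # importing this registers the Ascend NPU backend with torch

print(torch.__version__)         # expect >= 2.5.1
print(torch_npu.__version__)     # expect >= 2.5.1.dev20250308
print(torch.npu.is_available())  # True only if CANN and the NPU driver work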

2. vllm and vllm-ascend Setup

You can install vllm and vllm-ascend from pre-built wheels (for versions not yet released, build from source):

# Install vllm-project/vllm from PyPI
pip install vllm==0.7.3

# Install vllm-project/vllm-ascend from PyPI
pip install vllm-ascend==0.7.3rc1 --extra-index-url https://download.pytorch.org/whl/cpu/

Note: here the vllm install failed with "ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects", while the vllm-ascend install succeeded.
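
To see which of the relevant packages actually landed in the environment, a quick check with the standard library (a sketch; the distribution names below are the ones published on PyPI):

# Print the installed version of each relevant distribution, if present.
import importlib.metadata as md

for pkg in ("vllm", "vllm-ascend", "torch", "torch-npu"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")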

The current release depends on an unreleased version of torch-npu, which you need to install manually:

# Once the packages are installed, you need to install `torch-npu` manually,
# because vllm-ascend relies on an unreleased version of torch-npu.
# This step will be removed in the next vllm-ascend release.
# 
# Here we take python 3.10 on aarch64 as an example. Feel free to install the correct version for your environment. See:
#
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py39.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py311.tar.gz
#
mkdir pta
cd pta
wget https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250308.3/pytorch_v2.5.1_py310.tar.gz
tar -xvf pytorch_v2.5.1_py310.tar.gz
pip install ./torch_npu-2.5.1.dev20250308-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
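
With torch-npu installed, a quick device smoke test is worth running before moving on to vLLM. A minimal sketch, assuming at least one NPU is visible to the driver:

# Move a tensor to the NPU and run a trivial op to confirm the stack works.
import torch
import torch_npu  # importing this registers the Ascend NPU backend

x = torch.randn(2, 3).npu()  # equivalent to .to("npu:0")
print(x.device)              # expect: npu:0
print(x.sum().item())        # a finite float if NPU compute works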

3. Verification

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")