Enabling ACL in onnxruntime to Accelerate Model Inference on Arm

Introduction

onnxruntime official site: Execution Providers | onnxruntime

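In ONNX Runtime, an Execution Provider (EP) is a modular backend that maps an ONNX model's computation onto a specific hardware or software acceleration platform. It bridges the model's generic computation graph and the underlying hardware or optimized library, which is what makes high-performance inference possible. By selecting a particular Execution Provider, you have ONNX Runtime delegate computation to the corresponding hardware accelerator or optimized library. Examples include: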

CPU (provided by default)
CUDA (NVIDIA GPU)
DirectML (DirectX 12 on Windows)
TensorRT (NVIDIA's optimized inference engine)
OpenVINO (Intel hardware acceleration)
CoreML (Apple devices)
and so on

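Among these, the Arm Compute Library (ACL) provides fast compute routines for Arm platforms and can improve model performance. In onnxruntime it is exposed as ACLExecutionProvider, which is disabled by default. This tutorial shows how to enable ACL by configuring and building onnxruntime from source.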

Official tutorial: Arm - ACL | onnxruntime

Test platforms: Ubuntu 20.04.2 (x86 virtual machine) and Debian 10.2.1-6 (RK3568, Arm64)

I. Download onnxruntime

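Release history: https://github.com/microsoft/onnxruntime/releases

This article uses onnxruntime-1.18.1.

After extracting the archive, check which ACL versions the source supports in CMakeLists.txt. The options cover 1902, 1905, 1908, 2002, and 2308, with 1905 being the fallback when none of the other version-specific options is set.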


# ACL
if (onnxruntime_USE_ACL OR onnxruntime_USE_ACL_1902 OR onnxruntime_USE_ACL_1905 OR onnxruntime_USE_ACL_1908 OR onnxruntime_USE_ACL_2002 OR onnxruntime_USE_ACL_2308)
  set(onnxruntime_USE_ACL ON)
  if (onnxruntime_USE_ACL_1902)
    add_definitions(-DACL_1902=1)
  else()
    if (onnxruntime_USE_ACL_1908)
      add_definitions(-DACL_1908=1)
    else()
      if (onnxruntime_USE_ACL_2002)
        add_definitions(-DACL_2002=1)
      else()
        if (onnxruntime_USE_ACL_2308)
          add_definitions(-DACL_2308=1)
        else()
          add_definitions(-DACL_1905=1)
        endif()
      endif()
    endif()
  endif()
  # (the excerpt stops here; the enclosing if() is not closed in this snippet)
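
Rather than reading the file by eye, the version options can also be pulled out with grep. The following is only a minimal sketch: the function name list_acl_versions is made up for illustration, and it assumes the options keep the onnxruntime_USE_ACL_<digits> naming seen in the excerpt above.

```shell
# Hypothetical helper: list the ACL version suffixes a CMakeLists.txt mentions.
# $1: path to the CMakeLists.txt to inspect
list_acl_versions() {
  # Assumes option names of the form onnxruntime_USE_ACL_<digits>
  grep -oE 'onnxruntime_USE_ACL_[0-9]+' "$1" \
    | sed 's/^onnxruntime_USE_ACL_//' \
    | LC_ALL=C sort -u
}
```

For example, `list_acl_versions onnxruntime-1.18.1/cmake/CMakeLists.txt` prints one version suffix per line, assuming the file sits at that path in your checkout.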

II. Build the Arm Compute Library (ACL)

1. Download ACL from GitHub

Pick one of the ACL versions identified earlier (19.02, 19.08, 20.02, or 23.08) and download it.

Release history: https://github.com/ARM-software/ComputeLibrary/releases
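
The ACL release you pick here determines the value later passed to --use_acl when building onnxruntime. As a small sketch of that mapping (the function name acl_flag_for_release is hypothetical, and the accepted list simply mirrors the CMake options shown in section I for onnxruntime-1.18.1):

```shell
# Hypothetical helper: turn an ACL release tag such as "v23.08" or "23.08"
# into the matching --use_acl value, e.g. "ACL_2308".
acl_flag_for_release() {
  # Drop an optional leading "v" and the dot: v23.08 -> 2308
  ver=$(printf '%s' "$1" | sed 's/^v//; s/\.//')
  case "$ver" in
    # Versions taken from the CMake options of onnxruntime-1.18.1
    1902|1905|1908|2002|2308) printf 'ACL_%s\n' "$ver" ;;
    *) echo "unsupported ACL release: $1" >&2; return 1 ;;
  esac
}
```

Calling `acl_flag_for_release v23.08` prints ACL_2308, while a release outside the list makes the function return a non-zero status.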

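2. Download a prebuilt library directly (in which case step 3 can be skipped).

This article uses arm_compute-v23.08-bin-linux-arm64-v8a-neon-cl.tar.gz from the releases page, which already includes both the NEON and OpenCL (cl) backends.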
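
The internal layout of the prebuilt archive is not described here, so after extracting it you may need to locate the directory that actually holds the ACL shared libraries. A rough sketch, assuming the libraries are named libarm_compute*.so (the function name find_acl_lib_dirs is hypothetical):

```shell
# Hypothetical helper: print every directory under $1 that contains
# a file matching libarm_compute*.so* (assumed library naming).
find_acl_lib_dirs() {
  find "$1" -name 'libarm_compute*.so*' -exec dirname {} \; | LC_ALL=C sort -u
}
```

After extracting with `tar -xzf arm_compute-v23.08-bin-linux-arm64-v8a-neon-cl.tar.gz`, run `find_acl_lib_dirs` on the extracted directory; the directory it prints is a candidate for the library path used when building onnxruntime.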

3. Build ACL from source

Skip this step if you downloaded the prebuilt ACL.

For a native build:

a. Install the gcc/g++ compilers and the scons build tool.

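# Most Linux distributions ship prebuilt GCC packages; check whether GCC is already installed
gcc --version
g++ --version

# If it is missing, install it with apt; build-essential includes the GCC C and C++ compilers plus other development tools.
sudo apt update
sudo apt install build-essential

# Install scons
sudo apt-get install scons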

b. Enter the source directory and build with scons.

# Enter the source directory
cd ComputeLibrary

# Build with scons, setting the target platform (e.g. armv8a) and other options
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 os=linux arch=armv8a -j2

For a cross build (here the build host is x86 and the target is arm64):

a. Install the cross-compilation toolchain

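# Install the toolchain that matches the target architecture.
# Package names usually follow the pattern gcc-<arch>-linux-gnu or g++-<arch>-linux-gnu
# 64-bit Arm (AArch64):
sudo apt install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

# 32-bit Arm (e.g. Raspberry Pi):
sudo apt install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf


# For a specific version, download it from the Arm website instead
# https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads
# After downloading, extract it and add its bin directory to PATH
sudo tar -xvf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz -C /opt
ls /opt
# Set the variables for the current shell only
export PATH="$PATH:/opt/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/lib"
# Or set them system-wide: append the two export lines below to /etc/profile, then reload it
# (the directory name must match the toolchain you actually extracted under /opt)
sudo vim /etc/profile
export PATH=$PATH:/opt/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/lib
source /etc/profile

# Check the versions after installation
# (the apt packages install aarch64-linux-gnu-gcc; the Arm download installs aarch64-none-linux-gnu-gcc)
aarch64-none-linux-gnu-gcc --version
aarch64-none-linux-gnu-g++ --version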

b. Enter the source directory and cross-compile with scons.

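# Enter the source directory
cd ComputeLibrary

# Build with scons
# (the aarch64-linux-gnu- prefix matches the apt-installed toolchain; use aarch64-none-linux-gnu- for the Arm download)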
scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 \
os=linux arch=armv8a build=cross_compile \
toolchain_prefix=aarch64-linux-gnu- compiler_prefix=aarch64-linux-gnu- \
extra_link_flags="-static-libstdc++ -static-libgcc" \
-j4

Note: each ACL version requires a suitable compiler version (gcc/g++, or aarch64-none-linux-gnu-gcc/aarch64-none-linux-gnu-g++). Choose the ACL version accordingly, or upgrade or downgrade the compiler.
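
After a cross build it can be worth confirming that the produced libraries really target AArch64 before handing them to onnxruntime. The sketch below reads the e_machine field of the ELF header directly; the function names are hypothetical, it assumes a little-endian host (true for x86 and arm64 Linux), and it does not validate the ELF magic bytes.

```shell
# Hypothetical helper: print the ELF e_machine field (2 bytes at offset 18)
# as a decimal number. 183 = AArch64, 40 = 32-bit Arm, 62 = x86-64.
# Assumes a little-endian host, because od decodes in host byte order.
elf_machine() {
  od -An -tu2 -j18 -N2 "$1" | tr -d ' \n'
}

# Succeeds only when the file's e_machine field says AArch64.
is_aarch64() {
  [ "$(elf_machine "$1")" = "183" ]
}
```

A typical call would be `is_aarch64 build/libarm_compute.so && echo "AArch64 library"`, assuming the scons output landed in build/. Where binutils is installed, `readelf -h` reports the same field in readable form.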

III. Build onnxruntime

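1. To see the available build options, open tools/ci_build/build.py and search for the relevant parameters.

For example, searching for ACL turns up the use_acl parameter, which controls whether ACL is enabled and which ACL version is targeted.

The same approach works for other parameters.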
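
The same search can be done from the command line. This is just a sketch: list_build_flags is a hypothetical helper name, and it assumes the options appear in build.py as literal strings such as "--use_acl".

```shell
# Hypothetical helper: list the distinct command-line flags in a file
# whose names contain a keyword.
# $1: keyword (e.g. acl)   $2: path to build.py
list_build_flags() {
  # "--" stops grep from treating the pattern itself as an option
  grep -oE -- "--[a-z0-9_]*$1[a-z0-9_]*" "$2" | LC_ALL=C sort -u
}
```

For example, `list_build_flags acl tools/ci_build/build.py` lists the ACL-related flags, which can then be looked up in the file for their help text.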

2. Run build.sh from the command line with the desired options.

a. For a native build:

./build.sh \
--allow_running_as_root \
--use_acl "ACL_2308" \
--acl_home "/home/alientek/arm/ComputeLibrary-23.08/" \
--acl_libs "/home/alientek/arm/ComputeLibrary-23.08/build/" \
--config Release \
--build_shared_lib \
--skip_submodule_sync \
--skip_tests \
--parallel 2

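b. For a cross build:

Cross-compiling requires a CMake toolchain file (.cmake).

Here I simply use the one shipped with onnxruntime: onnxruntime-1.18.1/cmake/linux_arm64_crosscompile_toolchain.cmake

Then pass this file to build.sh through its arguments.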

./build.sh \
--allow_running_as_root \
--use_acl "ACL_2308" \
--acl_home "/home/alientek/arm/ComputeLibrary-23.08/" \
--acl_libs "/home/alientek/arm/ComputeLibrary-23.08/build/" \
--config Release \
--build_shared_lib \
--skip_submodule_sync \
--skip_tests \
--parallel 2 \
--cmake_extra_defines "CMAKE_TOOLCHAIN_FILE=/home/alientek/arm/onnxruntime-1.18.1/cmake/linux_arm64_crosscompile_toolchain.cmake"
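
Both build.sh invocations depend on --acl_home and --acl_libs pointing at real ACL files, so a quick check before a long build may save time. The sketch below is an assumption-laden helper (check_acl_paths is a made-up name): it assumes the ACL headers live in an arm_compute/ directory directly under the ACL home and that the library is named libarm_compute.*.

```shell
# Hypothetical pre-flight check for the paths given to --acl_home / --acl_libs.
# $1: ACL home directory   $2: directory holding the built ACL libraries
check_acl_paths() {
  acl_home=$1
  acl_libs=$2
  # Assumption: headers sit in <acl_home>/arm_compute
  if [ ! -d "$acl_home/arm_compute" ]; then
    echo "no arm_compute/ header directory under $acl_home" >&2
    return 1
  fi
  # ls fails when the glob matches nothing
  if ! ls "$acl_libs"/libarm_compute.* >/dev/null 2>&1; then
    echo "no libarm_compute.* under $acl_libs" >&2
    return 1
  fi
  echo "ACL paths look usable"
}
```

With the paths used in this article, the call would be `check_acl_paths /home/alientek/arm/ComputeLibrary-23.08 /home/alientek/arm/ComputeLibrary-23.08/build`.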

IV. Run model inference with onnxruntime

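Copy the library files produced under onnxruntime/build, the headers under onnxruntime/include, and the ACL library files produced under ACL/build into your own inference project, then build and run it. For the detailed steps, see my other tutorial, "Onnx模型部署到Arm64进行推理" (CSDN blog).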
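
The copying step might be scripted roughly as follows. This is only a sketch under assumptions: stage_runtime is a hypothetical helper, the lib/ and include/ layout of the destination project is invented for illustration, and the library names libonnxruntime.so* and libarm_compute*.so* are assumed.

```shell
# Hypothetical helper: gather runtime libraries and headers into a project.
# $1: directory with the built onnxruntime libraries
# $2: onnxruntime include directory
# $3: directory with the built ACL libraries
# $4: destination project directory (lib/ and include/ are created inside it)
stage_runtime() {
  ort_build=$1
  ort_include=$2
  acl_build=$3
  dest=$4
  mkdir -p "$dest/lib" "$dest/include" || return 1
  # cp fails (and we stop) if a glob matches nothing
  cp -a "$ort_build"/libonnxruntime.so* "$dest/lib/" || return 1
  cp -a "$acl_build"/libarm_compute*.so* "$dest/lib/" || return 1
  # Copy the header tree's contents, preserving its sub-directories
  cp -a "$ort_include"/. "$dest/include/" || return 1
}
```

The four arguments depend on where your builds placed their outputs, so check the actual build directories before calling it.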

Additional note: if the onnxruntime build fails, these posts may help:

onnxruntime-1.22.0交叉编译arm64目标平台_onnxruntime 在rk3588上编译 (CSDN blog)

sherpa-onnx AI语音框架添加acl加速库实践 (CSDN blog)
