Lessons learned migrating a YOLO model trained on an RTX 3070 to the NVIDIA Jetson Xavier NX: pitfalls to avoid in real deployment

Published: 2024-10-18

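YOLO environment setup on the NVIDIA Jetson Xavier NX

First, so that the YOLO .pt weights file migrates cleanly, keep the torch and CUDA versions consistent between the training machine and the board.

How do you install torch on the NX?

1. Use the jtop tool to monitor and control the board in real time

Install: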

sudo -H pip3 install jetson-stats

Usage:

sudo jtop

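The point of this step is to confirm that CUDA and cuDNN are already installed on the board.

2. Install PyTorch

PyTorch for Jetson - Jetson & Embedded Systems / Announcements - NVIDIA Developer Forums

On that page, pick the PyTorch version that matches your own architecture.

In the screenshot of that post, the part marked in red is the installation procedure, and the part marked in blue is the version number you need to change.

Also make sure the PyTorch and torchvision versions are matched to each other.

For verification, you can follow the Verification section further down in the same post.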
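As a rough sketch of what such a check might look like (the helper name `describe_install` is hypothetical and is not taken from the forum post), the following reports the installed torch and torchvision versions, whether CUDA is usable, and which CUDA version the torch build targets. It is written to keep working when a package is missing, so it can be run on both the board and the host.

```python
import importlib


def describe_install():
    """Collect version info for torch/torchvision (hypothetical helper)."""
    info = {}
    for name in ("torch", "torchvision"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Package is missing or failed to import in this environment.
            info[name] = None

    info["cuda_available"] = None
    info["cuda_build"] = None
    if info["torch"] is not None:
        import torch

        info["cuda_available"] = torch.cuda.is_available()
        # CUDA version the torch binary was built with (None for CPU-only builds).
        info["cuda_build"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

Running the same script on both machines and comparing the output side by side is one simple way to spot a version mismatch before copying the weights over.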

How do you install the corresponding torch version on the RTX 3070?

I installed:

# CUDA 10.2
pip install torch==1.12.0+cu102 torchvision==0.13.0+cu102 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu102

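Problem: it kept failing with a message that CUDA and PyTorch are incompatible. My host system runs CUDA 12.0, so in principle it should be compatible with 10.2, yet this is what I got: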

(VBee) iusl@iusl-MS-7D04:~/ultralytics-20240713/ultralytics-main$ python
Python 3.8.20 (default, Oct  3 2024, 15:24:27) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchvision
>>> print(torch.cuda.is_available())
True
>>> a = torch.Tensor(5,3)
>>> a=a.cuda()
/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning: 
NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3070 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
>>> print(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/_tensor.py", line 338, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/_tensor_str.py", line 481, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/_tensor_str.py", line 447, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/_tensor_str.py", line 270, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/iusl/anaconda3/envs/VBee/lib/python3.8/site-packages/torch/_tensor_str.py", line 103, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

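Analysis:

PyTorch reports that the installed build only supports the CUDA architectures sm_37, sm_50, sm_60, and sm_70, while the RTX 3070 is sm_86, which is not in that list. So although the GPU is detected (torch.cuda.is_available() returns True), its compute capability cannot actually be used, and the first kernel launch fails.

sm_86 is the CUDA architecture of the RTX 30 series (RTX 3070, 3080, and so on). To support sm_86 you therefore need a PyTorch build made for CUDA 11.x or later.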
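The mismatch described above can also be checked before any tensor is moved to the GPU. The sketch below is only an illustration: `is_arch_supported` is a hypothetical helper, and it assumes a simplified rule (a binary for sm_XY runs on a device with the same major version X and a minor version of at least Y) while ignoring PTX entries such as compute_XY. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the real PyTorch calls that supply the inputs.

```python
def is_arch_supported(capability, arch_list):
    """Return True if a build listing `arch_list` has kernels for `capability`.

    Simplified rule (assumption): an sm_XY binary runs on a device with the
    same major version X and a minor version >= Y. PTX entries are ignored.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        digits = arch[len("sm_"):]
        if not digits.isdigit():
            continue  # skip suffixed entries such as sm_90a
        sm = int(digits)
        if sm // 10 == major and sm % 10 <= minor:
            return True
    return False


if __name__ == "__main__":
    # Values copied from the warning in the session above.
    print(is_arch_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))

    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print(is_arch_supported(capability, torch.cuda.get_arch_list()))
```

With the values from the warning, the first call returns False, since no entry in the list has major version 8.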

Summary:

The torch build I had installed was

torch==1.12.0+cu102

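That is a build for a CUDA version older than 11.x. The pitfall: I had always assumed CUDA was backward compatible, but PyTorch builds for CUDA older than 11 do not support sm_86. The host's CUDA 12.0 does not help here, because the pip wheel ships its own CUDA runtime and kernels precompiled only for the architectures listed in the warning.

So I switched to this PyTorch build: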

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
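To make the cu102 versus cu113 distinction easy to check in a script, here is a small sketch that reads the CUDA version out of a wheel version string. The function name `cuda_version_from_tag` is hypothetical, and the parsing assumes the usual tag convention where the last digit is the minor version and the remaining digits are the major version (cu102 is 10.2, cu113 is 11.3). The threshold of CUDA 11 is the rule of thumb from this post, not an official compatibility table.

```python
import re


def cuda_version_from_tag(version):
    """Extract (major, minor) from a version string such as '1.12.0+cu113'.

    Returns None when the string has no '+cuXYZ' suffix.
    """
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumed convention: last digit is the minor version, the rest is the major.
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    for tag in ("1.12.0+cu102", "1.12.0+cu113"):
        cuda = cuda_version_from_tag(tag)
        ok = cuda is not None and cuda >= (11, 0)
        print(f"{tag}: CUDA {cuda}, CUDA 11 or later: {ok}")
```

On a live install the same information is available directly from `torch.version.cuda`; the string parsing is mainly useful for checking a requirements file before installing anything.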

