1. Environment Setup
All of my training and deployment is done on Linux. Before training, set up the environment: Anaconda, the GPU driver, CUDA, cuDNN, PyTorch, and so on. Training on Linux is recommended because it makes full use of the GPU and speeds up training.
[Environment setup] Setting up a deep-learning environment and common libraries (GPU driver, CUDA, cuDNN, PyTorch, OpenCV, PCL, CMake, …) from scratch on Ubuntu/Debian [continuously maintained]
[YOLO repository on GitHub] https://github.com/ultralytics/ultralytics
[YOLO documentation] https://docs.ultralytics.com/zh/
git clone https://github.com/ultralytics/ultralytics.git
# The Python version is not fixed; I usually pick the second-newest release, e.g. if 3.12 is the latest, use 3.11
conda create -n yolov11 python==3.10 -y
conda activate yolov11
# Replace with the versions that match your machine
#pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
#pip install ultralytics
pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
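A quick sanity check that the environment works (a minimal sketch; run it inside the yolov11 environment — yolo11n.pt is downloaded automatically on first use):
import torch
from ultralytics import YOLO

print(torch.__version__, torch.cuda.is_available())  # True means PyTorch can see the GPU
model = YOLO("yolo11n.pt")  # downloads the nano detection weights on first run
model.info()                # prints the layer/parameter summary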
2. Data Annotation
X-AnyLabeling is strongly recommended here. On Windows you can download and run the .exe directly; on Linux you run the prebuilt executable, which is very convenient. It can also plug in the SAM/SAM2 segmentation models and AI auto-labeling for very fast annotation.
Official repository: https://github.com/CVHub520/X-AnyLabeling
Usage guide: https://blog.csdn.net/CVHub/article/details/144595664?spm=1001.2014.3001.5502
Just click the release download and run it.
If you install from source instead, note that onnxruntime>=1.16.0 is required, and launch it with the command below.
# People often just run python app.py; when running from source you must launch from the parent directory of anylabeling/, as below
python anylabeling/app.py
Data Conversion
After annotation the labels are JSON files and must be converted to YOLO-format txt files. Run the commands below; a YOLO-format dataset named YOLODataset is generated automatically, already split into train/val, with the dataset.yaml config file filled in.
# install first
pip install labelme2yolo
# the data/ folder holds the images and their JSON files together
labelme2yolo --json_dir ./data/ --val_size 0.1 --test_size 0.0
dataset.yaml looks like the following. If the generated path contains no Chinese characters it can be used as-is; otherwise change it to the path of the dataset you just generated. names lists the classes (index and name); the number of classes must match what you annotated, while the names themselves can be anything and are not particularly important.
path: \xxx\YOLODataset
train: images/train
val: images/val
test:
names:
0: yuantong
1: gunzi
2: taotong
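A quick way to confirm the generated split is usable (a small sketch, assuming the default YOLODataset layout produced by the converter and that you run it from the folder containing YOLODataset):
from pathlib import Path
import yaml  # installed together with ultralytics

cfg = yaml.safe_load(open("YOLODataset/dataset.yaml", encoding="utf-8"))
root = Path(cfg["path"])
for split in ("train", "val"):
    files = list((root / cfg[split]).glob("*"))
    print(split, len(files), "images")
print("classes:", cfg["names"])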
Examples: AI auto-labeling and SAM labeling
Once you can use these two features, labeling becomes very convenient; the models first have to be converted to ONNX.
1. AI auto-labeling
The prerequisite is a model already trained on your targets, in ONNX format; load it and it can label everything in one pass. A typical workflow: label a small subset, train a first model, use it to label the rest automatically, then fine-tune the labels. It is also the best option when you add data and retrain.
1. First write a config file, yolov8n_seg.yaml
type: yolov8_seg
name: yolov8n-seg-r20230620
display_name: YOLOv8n-Seg-My-Model Ultralytics # name shown in the UI
#model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov8n-seg.onnx
model_path: /home/xxx/yolov11/yolo_base/model/yolo11n-seg.onnx # the key part is this path
nms_threshold: 0.45
confidence_threshold: 0.25
classes:
- person
- bicycle
- car
- motorcycle
- airplane
- bus
- train
- truck
- boat
2. Load the model, i.e. load this yaml file, and label all files in one pass.
2. SAM labeling
Open the X-AnyLabeling UI and, under Auto, pick the (ViT-Base Quant) model. This needs internet access (possibly through a proxy); the two related ONNX files, an encoder and a decoder, are downloaded automatically.
If you download them manually, write a config file and choose … Load Custom Model, which expects a .yaml file that you write yourself; the main thing to change is the model paths. The yaml looks roughly like this:
type: segment_anything
name: segment_anything_vit_b_quant-r20230520
display_name: Segment Anything (ViT-Base Quant)
# encoder_model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.2.0/sam_vit_b_01ec64.encoder.quant.onnx
# decoder_model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.2.0/sam_vit_b_01ec64.decoder.quant.onnx
encoder_model_path: D:\CvHub_YoLo_obb\sam_vit_h_4b8939.encoder.quant.onnx
decoder_model_path: D:\CvHub_YoLo_obb\sam_vit_h_4b8939.decoder.quant.onnx
input_size: 1024
max_width: 1024
max_height: 682
3. Model Training
Download a pretrained model: see the ultralytics YOLO11 documentation. If you do not specify a variant, YOLO11n is used by default; choose n/s/m/l/x according to your needs.
Key training parameters are listed below; the full set is in ./ultralytics/cfg/default.yaml.
Parameter | Description |
---|---|
data | dataset path, i.e. the config file such as coco128.yaml |
model | model file path, e.g. yolov8n_cls.pt |
epochs | number of training epochs |
batch | images per batch; usually try 1, 2, 4, 8, 16, working from large to small — too large runs out of GPU memory, too small leaves it under-used |
imgsz | input image size, defaulting to the pretrained model's input size; for very large images you can reduce it (e.g. 1920-pixel originals trained at 1024) to speed up inference at a slight cost in accuracy |
lr0 | initial learning rate, default 0.01; lower it if the model does not train well |
device | with a single GPU the default is fine; with several you can specify device=0,1,2,3 |
optimizer | optimizer to use, one of [SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto] |
patience | maximum number of epochs without improvement before early stopping |
workers | number of data-loading worker threads, default 8 |
mosaic | mosaic augmentation; if targets occupy a large fraction of the image, set it low — enabling it does not necessarily improve the model |
cd into the downloaded yolo/ultralytics-main folder and run the training command in a terminal (my preferred way). Detection, segmentation, etc. use almost identical commands; just change the task keyword, e.g. detect -> segment.
yolo detect train data=dataset.yaml model=yolo11n.pt epochs=300 imgsz=1920 amp=False batch=2 lr0=0.001 mosaic=0.05 patience=200
Training results are written to runs/detect. Look at the loss curves to get a rough sense of training quality; if the results are not good enough, run a second round of training starting from the first model, or use more epochs (500/1000).
yolo detect train data=dataset.yaml model=./runs/detect/train/weights/best.pt epochs=300 imgsz=1920 amp=False batch=2 lr0=0.001 mosaic=0.05 patience=200
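The same training can also be driven from Python with the ultralytics API; the sketch below is a rough equivalent of the first CLI command above (paths are placeholders, adjust to your own dataset and weights):
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # or ./runs/detect/train/weights/best.pt for a second round
model.train(
    data="dataset.yaml",
    epochs=300,
    imgsz=1920,
    batch=2,
    lr0=0.001,
    mosaic=0.05,
    patience=200,
    amp=False,
)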
4. Validation and Prediction
Point the command at a folder of test images; results are saved under runs/detect/predict. conf is the confidence threshold and iou is the NMS IoU threshold, set to 0.5 here.
yolo detect predict model=runs/detect/train4/weights/best.pt source=/xxximages/test save=true conf=0.4 iou=0.5
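The equivalent from Python, if you want the detections programmatically (a sketch; the source path is a placeholder):
from ultralytics import YOLO

model = YOLO("runs/detect/train4/weights/best.pt")
results = model.predict(source="/xxximages/test", conf=0.4, iou=0.5, save=True)
for r in results:
    print(r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.boxes.xyxy.tolist())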
5. ONNX Deployment
After training you usually deploy to a PC or an embedded device, so export an ONNX model. opset=17 sets the ONNX opset version; use whichever matches your installed runtime (it is backward compatible). The corresponding best.onnx is generated in the weights folder. In my own tests a higher opset reduced initialization time, though that may not hold in general.
yolo export model=/xxx/yolov11/runs/detect/train4/weights/best.pt format=onnx opset=17 simplify=True
You can inspect the exported ONNX model's inputs and outputs with the online viewer https://netron.app/.
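Besides Netron, you can also query the exported model's input and output shapes directly with onnxruntime (a small sketch; the model path is a placeholder):
import onnxruntime as ort

sess = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
for t in sess.get_inputs():
    print("input :", t.name, t.shape, t.type)
for t in sess.get_outputs():
    print("output:", t.name, t.shape, t.type)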
C++ version
You need the onnxruntime inference library. ONNX Runtime is quite convenient: there is no need to build from source, since the official releases ship prebuilt binaries, which is one reason ONNX is so widely used. On CPU it is generally slower than OpenVINO and on GPU slower than TensorRT, but it is highly portable and relatively simple to deploy.
Download the matching onnxruntime package from https://github.com/microsoft/onnxruntime/releases/. The latest release is 1.22; I use 1.17. Pick whatever fits your setup, but it must be compatible with the model exported above; newer versions are backward compatible.
The code is mainly based on https://github.com/UNeedCryDear/yolov8-opencv-onnxruntime-cpp with small modifications: model initialization is moved into the main program and inference into the detection call, so initialization does not run on every call. Five files are involved: main.cpp, yolov8_utils.h, yolov8_onnx.h, yolov8_utils.cpp, yolov8_onnx.cpp.
yolov8_utils.h mainly defines the result structs, drawing helpers, and so on.
#pragma once
#include<iostream>
#include <numeric>
#include <unistd.h>
#include<opencv2/opencv.hpp>
#define ORT_OLD_VISON 13 // onnxruntime versions before 1.12.0 use the old-style API
struct PoseKeyPoint {
float x = 0;
float y = 0;
float confidence = 0;
};
struct OutputParams {
int id; // class id of the result
float confidence; // confidence of the result
cv::Rect box; // bounding box
cv::RotatedRect rotatedBox; // rotated bounding box for OBB results
cv::Mat boxMask; // mask inside the box, to save memory and speed things up
std::vector<PoseKeyPoint> keyPoints; //pose key points
};
struct MaskParams {
//int segChannels = 32;
//int segWidth = 160;
//int segHeight = 160;
int netWidth = 1920; //640
int netHeight = 1920; //640
float maskThreshold = 0.5;
cv::Size srcImgShape;
cv::Vec4d params;
};
struct PoseParams {
float kptThreshold = 0.5;
int kptRadius = 5;
bool isDrawKptLine = true; //If True, the function will draw lines connecting keypoint for human pose.Default is True.
cv::Scalar personColor = cv::Scalar(0, 0, 255);
std::vector<std::vector<int>>skeleton = {
{16, 14} ,{14, 12},{17, 15},{15, 13},
{12, 13},{6, 12},{7, 13},{6, 7},{6, 8},{7, 9},
{8, 10},{9, 11},{2, 3},{1, 2},{1, 3},{2, 4},
{3, 5},{4, 6},{5, 7}
};
std::vector<cv::Scalar> posePalette =
{
cv::Scalar(255, 128, 0) ,
cv::Scalar(255, 153, 51),
cv::Scalar(255, 178, 102),
cv::Scalar(230, 230, 0),
cv::Scalar(255, 153, 255),
cv::Scalar(153, 204, 255),
cv::Scalar(255, 102, 255),
cv::Scalar(255, 51, 255),
cv::Scalar(102, 178, 255),
cv::Scalar(51, 153, 255),
cv::Scalar(255, 153, 153),
cv::Scalar(255, 102, 102),
cv::Scalar(255, 51, 51),
cv::Scalar(153, 255, 153),
cv::Scalar(102, 255, 102),
cv::Scalar(51, 255, 51),
cv::Scalar(0, 255, 0),
cv::Scalar(0, 0, 255),
cv::Scalar(255, 0, 0),
cv::Scalar(255, 255, 255),
};
std::vector<int> limbColor = { 9, 9, 9, 9, 7, 7, 7, 0, 0, 0, 0, 0, 16, 16, 16, 16, 16, 16, 16 };
std::vector<int> kptColor = { 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 9, 9, 9, 9, 9, 9 };
std::map<unsigned int, std::string> kptBodyNames{
{0,"Nose"},
{1, "left_eye"}, {2, "right_eye"},
{3, "left_ear"}, {4, "right_ear"},
{5, "left_shoulder"}, {6, "right_shoulder"},
{7, "left_elbow"}, {8, "right_elbow"},
{9, "left_wrist"}, {10,"right_wrist"},
{11,"left_hip"}, {12,"right_hip"},
{13,"left_knee"}, {14,"right_knee"},
{15,"left_ankle"}, {16,"right_ankle"}
};
};
bool CheckModelPath(std::string modelPath);
bool CheckParams(int netHeight, int netWidth, const int* netStride, int strideSize);
void DrawPred(cv::Mat& img,
std::vector<OutputParams> result,
std::vector<std::string> classNames,
std::vector<cv::Scalar> color,
bool isVideo = false
);
void DrawPredPose(cv::Mat& img, std::vector<OutputParams> result, PoseParams& poseParams, bool isVideo = false);
void DrawRotatedBox(cv::Mat& srcImg, cv::RotatedRect box, cv::Scalar color, int thinkness);
void LetterBox(const cv::Mat& image, cv::Mat& outImage,
cv::Vec4d& params, //[ratio_x,ratio_y,dw,dh]
const cv::Size& newShape = cv::Size(640, 640),//640,640
bool autoShape = false,
bool scaleFill = false,
bool scaleUp = true,
int stride = 32,
const cv::Scalar& color = cv::Scalar(114, 114, 114));
void GetMask(const cv::Mat& maskProposals, const cv::Mat& maskProtos, std::vector<OutputParams>& output, const MaskParams& maskParams);
void GetMask2(const cv::Mat& maskProposals, const cv::Mat& maskProtos, OutputParams& output, const MaskParams& maskParams);
int BBox2Obb(float centerX, float centerY, float boxW, float boxH, float angle, cv::RotatedRect& rotatedRect);
yolov8_utils.cpp — drawing and display; adapt it to your own style.
#pragma once
#include "yolov8_utils.h"
//using namespace cv;
//using namespace std;
bool CheckParams(int netHeight, int netWidth, const int* netStride, int strideSize) {
if (netHeight % netStride[strideSize - 1] != 0 || netWidth % netStride[strideSize - 1] != 0)
{
std::cout << "Error:_netHeight and _netWidth must be multiple of max stride " << netStride[strideSize - 1] << "!" << std::endl;
return false;
}
return true;
}
bool CheckModelPath(std::string modelPath) {
if (0 != access(modelPath.c_str(), 0)) {
std::cout << "Model path does not exist, please check " << modelPath << std::endl;
return false;
}
else
return true;
}
void LetterBox(const cv::Mat& image, cv::Mat& outImage, cv::Vec4d& params, const cv::Size& newShape,
bool autoShape, bool scaleFill, bool scaleUp, int stride, const cv::Scalar& color)
{
if (false) {
int maxLen = MAX(image.rows, image.cols);
outImage = cv::Mat::zeros(cv::Size(maxLen, maxLen), CV_8UC3);
image.copyTo(outImage(cv::Rect(0, 0, image.cols, image.rows)));
params[0] = 1;
params[1] = 1;
params[3] = 0;
params[2] = 0;
}
cv::Size shape = image.size();
float r = std::min((float)newShape.height / (float)shape.height,
(float)newShape.width / (float)shape.width);
if (!scaleUp)
r = std::min(r, 1.0f);
float ratio[2]{ r, r };
int new_un_pad[2] = { (int)std::round((float)shape.width * r),(int)std::round((float)shape.height * r) };
auto dw = (float)(newShape.width - new_un_pad[0]);
auto dh = (float)(newShape.height - new_un_pad[1]);
if (autoShape)
{
dw = (float)((int)dw % stride);
dh = (float)((int)dh % stride);
}
else if (scaleFill)
{
dw = 0.0f;
dh = 0.0f;
new_un_pad[0] = newShape.width;
new_un_pad[1] = newShape.height;
ratio[0] = (float)newShape.width / (float)shape.width;
ratio[1] = (float)newShape.height / (float)shape.height;
}
dw /= 2.0f;
dh /= 2.0f;
if (shape.width != new_un_pad[0] && shape.height != new_un_pad[1])
{
cv::resize(image, outImage, cv::Size(new_un_pad[0], new_un_pad[1]));
}
else {
outImage = image.clone();
}
int top = int(std::round(dh - 0.1f));
int bottom = int(std::round(dh + 0.1f));
int left = int(std::round(dw - 0.1f));
int right = int(std::round(dw + 0.1f));
params[0] = ratio[0];
params[1] = ratio[1];
params[2] = left;
params[3] = top;
cv::copyMakeBorder(outImage, outImage, top, bottom, left, right, cv::BORDER_CONSTANT, color);
}
void GetMask(const cv::Mat& maskProposals, const cv::Mat& maskProtos, std::vector<OutputParams>& output, const MaskParams& maskParams) {
//std::cout << maskProtos.size << std::endl;
int net_width = maskParams.netWidth;
int net_height = maskParams.netHeight;
int seg_channels = maskProtos.size[1];
int seg_height = maskProtos.size[2];
int seg_width = maskProtos.size[3];
float mask_threshold = maskParams.maskThreshold;
cv::Vec4f params = maskParams.params;
cv::Size src_img_shape = maskParams.srcImgShape;
cv::Mat protos = maskProtos.reshape(0, { seg_channels,seg_width * seg_height });
cv::Mat matmul_res = (maskProposals * protos).t();
cv::Mat masks = matmul_res.reshape(output.size(), { seg_width,seg_height });
std::vector<cv::Mat> maskChannels;
split(masks, maskChannels);
for (int i = 0; i < output.size(); ++i) {
cv::Mat dest, mask;
//sigmoid
cv::exp(-maskChannels[i], dest);
dest = 1.0 / (1.0 + dest);
cv::Rect roi(int(params[2] / net_width * seg_width), int(params[3] / net_height * seg_height), int(seg_width - params[2] / 2), int(seg_height - params[3] / 2));
dest = dest(roi);
resize(dest, mask, src_img_shape, 0, 0, cv::INTER_NEAREST);
//crop
cv::Rect temp_rect = output[i].box;
mask = mask(temp_rect) > mask_threshold;
output[i].boxMask = mask;
}
}
void GetMask2(const cv::Mat& maskProposals, const cv::Mat& maskProtos, OutputParams& output, const MaskParams& maskParams) {
int net_width = maskParams.netWidth;
int net_height = maskParams.netHeight;
int seg_channels = maskProtos.size[1];
int seg_height = maskProtos.size[2];
int seg_width = maskProtos.size[3];
float mask_threshold = maskParams.maskThreshold;
cv::Vec4f params = maskParams.params;
cv::Size src_img_shape = maskParams.srcImgShape;
cv::Rect temp_rect = output.box;
//crop from mask_protos
int rang_x = floor((temp_rect.x * params[0] + params[2]) / net_width * seg_width);
int rang_y = floor((temp_rect.y * params[1] + params[3]) / net_height * seg_height);
int rang_w = ceil(((temp_rect.x + temp_rect.width) * params[0] + params[2]) / net_width * seg_width) - rang_x;
int rang_h = ceil(((temp_rect.y + temp_rect.height) * params[1] + params[3]) / net_height * seg_height) - rang_y;
// If mask_protos(roi_rangs).clone() below throws, the output.box data is wrong or the box is only about 1 pixel in size; the checks below guard against that.
rang_w = MAX(rang_w, 1);
rang_h = MAX(rang_h, 1);
if (rang_x + rang_w > seg_width) {
if (seg_width - rang_x > 0)
rang_w = seg_width - rang_x;
else
rang_x -= 1;
}
if (rang_y + rang_h > seg_height) {
if (seg_height - rang_y > 0)
rang_h = seg_height - rang_y;
else
rang_y -= 1;
}
std::vector<cv::Range> roi_rangs;
roi_rangs.push_back(cv::Range(0, 1));
roi_rangs.push_back(cv::Range::all());
roi_rangs.push_back(cv::Range(rang_y, rang_h + rang_y));
roi_rangs.push_back(cv::Range(rang_x, rang_w + rang_x));
//crop
cv::Mat temp_mask_protos = maskProtos(roi_rangs).clone();
cv::Mat protos = temp_mask_protos.reshape(0, { seg_channels,rang_w * rang_h });
cv::Mat matmul_res = (maskProposals * protos).t();
cv::Mat masks_feature = matmul_res.reshape(1, { rang_h,rang_w });
cv::Mat dest, mask;
//sigmoid
cv::exp(-masks_feature, dest);
dest = 1.0 / (1.0 + dest);
int left = floor((net_width / seg_width * rang_x - params[2]) / params[0]);
int top = floor((net_height / seg_height * rang_y - params[3]) / params[1]);
int width = ceil(net_width / seg_width * rang_w / params[0]);
int height = ceil(net_height / seg_height * rang_h / params[1]);
resize(dest, mask, cv::Size(width, height), 0, 0, cv::INTER_NEAREST);
cv::Rect mask_rect = temp_rect - cv::Point(left, top);
mask_rect &= cv::Rect(0, 0, width, height);
mask = mask(mask_rect) > mask_threshold;
if (mask.rows != temp_rect.height || mask.cols != temp_rect.width) { //https://github.com/UNeedCryDear/yolov8-opencv-onnxruntime-cpp/pull/30
resize(mask, mask, temp_rect.size(), 0, 0, cv::INTER_NEAREST);
}
output.boxMask = mask;
}
int BBox2Obb(float centerX, float centerY, float boxW, float boxH, float angle, cv::RotatedRect& rotatedRect) {
rotatedRect = cv::RotatedRect(cv::Point2f(centerX, centerY), cv::Size2f(boxW, boxH),angle);
return 0;
}
void DrawRotatedBox(cv::Mat &srcImg, cv::RotatedRect box, cv::Scalar color, int thinkness) {
cv::Point2f p[4];
box.points(p);
for (int l = 0; l < 4; ++l) {
line(srcImg, p[l], p[(l + 1) % 4], color, thinkness, 8);
}
}
void DrawPred(cv::Mat& img, std::vector<OutputParams> result, std::vector<std::string> classNames, std::vector<cv::Scalar> color, bool isVideo) {
cv::Mat mask = img.clone();
for (int i = 0; i < result.size(); i++) {
int left=0, top=0;
int color_num = i;
if (result[i].box.area() > 0) {
rectangle(img, result[i].box, color[result[i].id], 2, 8);
left = result[i].box.x;
top = result[i].box.y;
}
if (result[i].rotatedBox.size.width * result[i].rotatedBox.size.height > 0) {
DrawRotatedBox(img, result[i].rotatedBox, color[result[i].id], 2);
left = result[i].rotatedBox.center.x;
top = result[i].rotatedBox.center.y;
}
if (result[i].boxMask.rows && result[i].boxMask.cols > 0)
mask(result[i].box).setTo(color[result[i].id], result[i].boxMask);
std::string label = classNames[result[i].id] + ":" + std::to_string(result[i].confidence);
int baseLine;
cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = MAX(top, labelSize.height);
//rectangle(frame, cv::Point(left, top - int(1.5 * labelSize.height)), cv::Point(left + int(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), FILLED);
putText(img, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 1, color[result[i].id], 2);
}
//std::cout<<"end draw_img"<<std::endl;
cv::addWeighted(img, 0.5, mask, 0.5, 0, img); //add mask to src
cv::imwrite("pre_img.png",img);
cv::imshow("pre_img", img);
// if (!isVideo)
// cv::waitKey();
//destroyAllWindows();
}
void DrawPredPose(cv::Mat& img, std::vector<OutputParams> result, PoseParams& poseParams, bool isVideo) {
for (int i = 0; i < result.size(); i++) {
int left, top;
int color_num = i;
if (result[i].box.area() > 0) {
rectangle(img, result[i].box, poseParams.personColor, 2, 8);
left = result[i].box.x;
top = result[i].box.y;
}
else
continue;
std::string label = "person:" + std::to_string(result[i].confidence);
int baseLine;
cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = MAX(top, labelSize.height);
//rectangle(frame, cv::Point(left, top - int(1.5 * labelSize.height)), cv::Point(left + int(1.5 * labelSize.width), top + baseLine), cv::Scalar(0, 255, 0), FILLED);
putText(img, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 1, poseParams.personColor, 2);
if (result[i].keyPoints.size()!= poseParams.kptBodyNames.size())
continue;
for (int j = 0; j < result[i].keyPoints.size(); ++j) {
PoseKeyPoint kpt = result[i].keyPoints[j];
if (kpt.confidence < poseParams.kptThreshold)
continue;
cv::Scalar kptColor = poseParams.posePalette[poseParams.kptColor[j]];
cv::circle(img, cv::Point(kpt.x, kpt.y), poseParams.kptRadius, kptColor, -1, 8);
}
if (poseParams.isDrawKptLine) {
for (int j = 0; j < poseParams.skeleton.size(); ++j) {
PoseKeyPoint kpt0 = result[i].keyPoints[poseParams.skeleton[j][0]-1];
PoseKeyPoint kpt1 = result[i].keyPoints[poseParams.skeleton[j][1]-1];
if (kpt0.confidence < poseParams.kptThreshold || kpt1.confidence < poseParams.kptThreshold)
continue;
cv::Scalar kptColor= poseParams.posePalette[poseParams.limbColor[j]];
cv::line(img, cv::Point(kpt0.x, kpt0.y),cv::Point(kpt1.x, kpt1.y), kptColor, 2, 8);
}
}
}
cv::imshow("1", img);
if (!isVideo)
cv::waitKey();
//destroyAllWindows();
}
yolov8_onnx.h — be sure to set the model input size (const int _netWidth and _netHeight) to match the size the ONNX model was exported with.
#pragma once
#include <iostream>
#include<memory>
#include <opencv2/opencv.hpp>
#include "yolov8_utils.h"
//#include<onnxruntime_cxx_api.h>
#include <onnxruntime_cxx_api.h>
//#include <tensorrt_provider_factory.h> //if use OrtTensorRTProviderOptionsV2
//#include <onnxruntime_c_api.h>
class Yolov8Onnx {
public:
Yolov8Onnx() :_OrtMemoryInfo(Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtDeviceAllocator, OrtMemType::OrtMemTypeCPUOutput)) {};
~Yolov8Onnx() {
if (_OrtSession != nullptr)
delete _OrtSession;
};// delete _OrtMemoryInfo;
public:
/** \brief Read onnx-model
* \param[in] modelPath:onnx-model path
* \param[in] isCuda:if true,use Ort-GPU,else run it on cpu.
* \param[in] cudaID:if isCuda==true,run Ort-GPU on cudaID.
* \param[in] warmUp:if isCuda==true,warm up GPU-model.
*/
bool ReadModel(const std::string& modelPath, bool isCuda = false, int cudaID = 0, bool warmUp = true);
/** \brief detect.
* \param[in] srcImg:a 3-channels image.
* \param[out] output:detection results of input image.
*/
bool OnnxDetect(cv::Mat& srcImg, std::vector<OutputParams>& output);
/** \brief detect,batch size= _batchSize
* \param[in] srcImg:A batch of images.
* \param[out] output:detection results of input images.
*/
bool OnnxBatchDetect(std::vector<cv::Mat>& srcImg, std::vector<std::vector<OutputParams>>& output);
private:
template <typename T>
T VectorProduct(const std::vector<T>& v)
{
return std::accumulate(v.begin(), v.end(), 1, std::multiplies<T>());
};
int Preprocessing(const std::vector<cv::Mat>& srcImgs, std::vector<cv::Mat>& outSrcImgs, std::vector<cv::Vec4d>& params);
const int _netWidth = 1920; //ONNX-net-input-width 640
const int _netHeight = 1920; //ONNX-net-input-height
int _batchSize = 1; //if multi-batch,set this
bool _isDynamicShape = false;//onnx support dynamic shape
float _classThreshold = 0.3;
float _nmsThreshold = 0.45;
float _maskThreshold = 0.5;
//ONNXRUNTIME
Ort::Env _OrtEnv = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "Yolov8");
Ort::SessionOptions _OrtSessionOptions = Ort::SessionOptions();
Ort::Session* _OrtSession = nullptr;
Ort::MemoryInfo _OrtMemoryInfo;
#if ORT_API_VERSION < ORT_OLD_VISON
char* _inputName, * _output_name0;
#else
std::shared_ptr<char> _inputName, _output_name0;
#endif
std::vector<char*> _inputNodeNames; // input node names
std::vector<char*> _outputNodeNames; // output node names
size_t _inputNodesNum = 0; // number of input nodes
size_t _outputNodesNum = 0; // number of output nodes
ONNXTensorElementDataType _inputNodeDataType; // input data type
ONNXTensorElementDataType _outputNodeDataType;
std::vector<int64_t> _inputTensorShape; // input tensor shape
std::vector<int64_t> _outputTensorShape;
public:
std::vector<std::string> _className = {
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
"fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
"sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
"potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"
};
// std::vector<std::string> _Name = {
// "9.4","9.2","9.3","9.1","5.1","5.2","8.3","4.1","4.2","7.1",
// "3.1","8.1","8.2","1.2","5.3","5.4","1.1","4.5","4.6","3.2","4.3","4.4","2.1","6.3","6.1",
// "6.2","5.5","5.6","2.2"
// };
std::vector<std::string> _Name = {
"0","1","2"};
};
yolov8_onnx.cpp — the core of the inference pipeline.
#include "yolov8_onnx.h"
//using namespace std;
//using namespace cv;
//using namespace cv::dnn;
using namespace Ort;
bool Yolov8Onnx::ReadModel(const std::string& modelPath, bool isCuda, int cudaID, bool warmUp) {
if (_batchSize < 1) _batchSize = 1;
try
{
if (!CheckModelPath(modelPath))
return false;
std::vector<std::string> available_providers = GetAvailableProviders();
auto cuda_available = std::find(available_providers.begin(), available_providers.end(), "CUDAExecutionProvider");
if (isCuda && (cuda_available == available_providers.end()))
{
std::cout << "Your ORT build without GPU. Change to CPU." << std::endl;
std::cout << "************* Infer model on CPU! *************" << std::endl;
}
else if (isCuda && (cuda_available != available_providers.end()))
{
std::cout << "************* Infer model on GPU! *************" << std::endl;
#if ORT_API_VERSION < ORT_OLD_VISON
OrtCUDAProviderOptions cudaOption;
cudaOption.device_id = cudaID;
_OrtSessionOptions.AppendExecutionProvider_CUDA(cudaOption);
#else
OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(_OrtSessionOptions, cudaID);
#endif
}
else
{
std::cout << "************* Infer model on CPU! *************" << std::endl;
}
//
_OrtSessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
#ifdef _WIN32
std::wstring model_path(modelPath.begin(), modelPath.end());
_OrtSession = new Ort::Session(_OrtEnv, model_path.c_str(), _OrtSessionOptions);
#else
_OrtSession = new Ort::Session(_OrtEnv, modelPath.c_str(), _OrtSessionOptions);
#endif
Ort::AllocatorWithDefaultOptions allocator;
//init input
_inputNodesNum = _OrtSession->GetInputCount();
#if ORT_API_VERSION < ORT_OLD_VISON
_inputName = _OrtSession->GetInputName(0, allocator);
_inputNodeNames.push_back(_inputName);
#else
_inputName = std::move(_OrtSession->GetInputNameAllocated(0, allocator));
_inputNodeNames.push_back(_inputName.get());
#endif
//std::cout << _inputNodeNames[0] << std::endl;
Ort::TypeInfo inputTypeInfo = _OrtSession->GetInputTypeInfo(0);
auto input_tensor_info = inputTypeInfo.GetTensorTypeAndShapeInfo();
_inputNodeDataType = input_tensor_info.GetElementType();
_inputTensorShape = input_tensor_info.GetShape();
if (_inputTensorShape[0] == -1)
{
_isDynamicShape = true;
_inputTensorShape[0] = _batchSize;
}
if (_inputTensorShape[2] == -1 || _inputTensorShape[3] == -1) {
_isDynamicShape = true;
_inputTensorShape[2] = _netHeight;
_inputTensorShape[3] = _netWidth;
}
//init output
_outputNodesNum = _OrtSession->GetOutputCount();
#if ORT_API_VERSION < ORT_OLD_VISON
_output_name0 = _OrtSession->GetOutputName(0, allocator);
_outputNodeNames.push_back(_output_name0);
#else
_output_name0 = std::move(_OrtSession->GetOutputNameAllocated(0, allocator));
_outputNodeNames.push_back(_output_name0.get());
#endif
Ort::TypeInfo type_info_output0(nullptr);
type_info_output0 = _OrtSession->GetOutputTypeInfo(0); //output0
auto tensor_info_output0 = type_info_output0.GetTensorTypeAndShapeInfo();
_outputNodeDataType = tensor_info_output0.GetElementType();
_outputTensorShape = tensor_info_output0.GetShape();
//_outputMaskNodeDataType = tensor_info_output1.GetElementType(); //the same as output0
//_outputMaskTensorShape = tensor_info_output1.GetShape();
//if (_outputTensorShape[0] == -1)
//{
// _outputTensorShape[0] = _batchSize;
// _outputMaskTensorShape[0] = _batchSize;
//}
//if (_outputMaskTensorShape[2] == -1) {
// //size_t ouput_rows = 0;
// //for (int i = 0; i < _strideSize; ++i) {
// // ouput_rows += 3 * (_netWidth / _netStride[i]) * _netHeight / _netStride[i];
// //}
// //_outputTensorShape[1] = ouput_rows;
// _outputMaskTensorShape[2] = _segHeight;
// _outputMaskTensorShape[3] = _segWidth;
//}
//warm up
if (isCuda && warmUp) {
//draw run
std::cout << "Start warming up" << std::endl;
size_t input_tensor_length = VectorProduct(_inputTensorShape);
float* temp = new float[input_tensor_length];
std::vector<Ort::Value> input_tensors;
std::vector<Ort::Value> output_tensors;
input_tensors.push_back(Ort::Value::CreateTensor<float>(
_OrtMemoryInfo, temp, input_tensor_length, _inputTensorShape.data(),
_inputTensorShape.size()));
for (int i = 0; i < 3; ++i) {
output_tensors = _OrtSession->Run(Ort::RunOptions{ nullptr },
_inputNodeNames.data(),
input_tensors.data(),
_inputNodeNames.size(),
_outputNodeNames.data(),
_outputNodeNames.size());
}
delete[]temp;
}
}
catch (const std::exception&) {
return false;
}
return true;
}
int Yolov8Onnx::Preprocessing(const std::vector<cv::Mat>& srcImgs, std::vector<cv::Mat>& outSrcImgs, std::vector<cv::Vec4d>& params) {
outSrcImgs.clear();
cv::Size input_size = cv::Size(_netWidth, _netHeight);
for (int i = 0; i < srcImgs.size(); ++i) {
cv::Mat temp_img = srcImgs[i];
cv::Vec4d temp_param = {1,1,0,0};
if (temp_img.size() != input_size) {
cv::Mat borderImg;
LetterBox(temp_img, borderImg, temp_param, input_size, false, false, true, 32);
//std::cout << borderImg.size() << std::endl;
outSrcImgs.push_back(borderImg);
params.push_back(temp_param);
}
else {
outSrcImgs.push_back(temp_img);
params.push_back(temp_param);
}
}
int lack_num = _batchSize- srcImgs.size();
if (lack_num > 0) {
for (int i = 0; i < lack_num; ++i) {
cv::Mat temp_img = cv::Mat::zeros(input_size, CV_8UC3);
cv::Vec4d temp_param = { 1,1,0,0 };
outSrcImgs.push_back(temp_img);
params.push_back(temp_param);
}
}
return 0;
}
bool Yolov8Onnx::OnnxDetect(cv::Mat& srcImg, std::vector<OutputParams>& output) {
std::vector<cv::Mat> input_data = { srcImg };
std::vector<std::vector<OutputParams>> tenp_output;
if (OnnxBatchDetect(input_data, tenp_output)) {
output = tenp_output[0];
return true;
}
else return false;
}
bool Yolov8Onnx::OnnxBatchDetect(std::vector<cv::Mat>& srcImgs, std::vector<std::vector<OutputParams>>& output) {
std::vector<cv::Vec4d> params;
std::vector<cv::Mat> input_images;
cv::Size input_size(_netWidth, _netHeight);
//preprocessing
Preprocessing(srcImgs, input_images, params);
cv::Mat blob = cv::dnn::blobFromImages(input_images, 1 / 255.0, input_size, cv::Scalar(0, 0, 0), true, false);
int64_t input_tensor_length = VectorProduct(_inputTensorShape);
std::vector<Ort::Value> input_tensors;
std::vector<Ort::Value> output_tensors;
input_tensors.push_back(Ort::Value::CreateTensor<float>(_OrtMemoryInfo, (float*)blob.data, input_tensor_length, _inputTensorShape.data(), _inputTensorShape.size()));
output_tensors = _OrtSession->Run(Ort::RunOptions{ nullptr },
_inputNodeNames.data(),
input_tensors.data(),
_inputNodeNames.size(),
_outputNodeNames.data(),
_outputNodeNames.size()
);
//post-process
float* all_data = output_tensors[0].GetTensorMutableData<float>();
_outputTensorShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
int net_width = _outputTensorShape[1];
int socre_array_length = net_width - 4;
int64_t one_output_length = VectorProduct(_outputTensorShape) / _outputTensorShape[0];
for (int img_index = 0; img_index < srcImgs.size(); ++img_index) {
cv::Mat output0 = cv::Mat(cv::Size((int)_outputTensorShape[2], (int)_outputTensorShape[1]), CV_32F, all_data).t(); //[bs,116,8400]=>[bs,8400,116]
all_data += one_output_length;
float* pdata = (float*)output0.data;
int rows = output0.rows;
std::vector<int> class_ids; // class id of each result
std::vector<float> confidences; // confidence of each result
std::vector<cv::Rect> boxes; // bounding box of each result
for (int r = 0; r < rows; ++r) { //stride
cv::Mat scores(1, socre_array_length, CV_32F, pdata + 4);
cv::Point classIdPoint;
double max_class_socre;
minMaxLoc(scores, 0, &max_class_socre, 0, &classIdPoint);
max_class_socre = (float)max_class_socre;
if (max_class_socre >= _classThreshold) {
//rect [x,y,w,h]
float x = (pdata[0] - params[img_index][2]) / params[img_index][0]; //x
float y = (pdata[1] - params[img_index][3]) / params[img_index][1]; //y
float w = pdata[2] / params[img_index][0]; //w
float h = pdata[3] / params[img_index][1]; //h
int left = MAX(int(x - 0.5 * w + 0.5), 0);
int top = MAX(int(y - 0.5 * h + 0.5), 0);
class_ids.push_back(classIdPoint.x);
confidences.push_back(max_class_socre);
boxes.push_back(cv::Rect(left, top, int(w + 0.5), int(h + 0.5)));
}
pdata += net_width; // next row
}
std::vector<int> nms_result;
cv::dnn::NMSBoxes(boxes, confidences, _classThreshold, _nmsThreshold, nms_result);
std::vector<std::vector<float>> temp_mask_proposals;
cv::Rect holeImgRect(0, 0, srcImgs[img_index].cols, srcImgs[img_index].rows);
std::vector<OutputParams> temp_output;
for (int i = 0; i < nms_result.size(); ++i) {
int idx = nms_result[i];
OutputParams result;
result.id = class_ids[idx];
result.confidence = confidences[idx];
result.box = boxes[idx] & holeImgRect;
temp_output.push_back(result);
}
output.push_back(temp_output);
}
if (output.size())
return true;
else
return false;
}
main.cpp — how everything is used.
#include <iostream>
#include<opencv2/opencv.hpp>
#include<math.h>
#include "yolov8_onnx.h"
#include<time.h>
#include<chrono>
//#define VIDEO_OPENCV //if define, use opencv for video.
using namespace std;
using namespace cv;
using namespace dnn;
template<typename _Tp>
std::vector<OutputParams> yolov8_onnx(_Tp& task, cv::Mat& img, std::string& model_path)
{
// if (task.ReadModel(model_path, false,0,true)) {
// std::cout << "read net ok!" << std::endl;
// }
// generate random colors
std::vector<cv::Scalar> color;
srand(time(0));
for (int i = 0; i < 80; i++) {
int b = rand() % 256;
int g = rand() % 256;
int r = rand() % 256;
color.push_back(cv::Scalar(b, g, r));
}
std::vector<OutputParams> result;
if (task.OnnxDetect(img, result)) {
std::cout<<"111"<<std::endl;
// the drawing routine turns out to take quite a bit of time
//DrawPred(img, result, task._Name, color,false);
}
else {
std::cout << "Detect Failed!" << std::endl;
}
//system("pause");
return result;
}
int main() {
std::string img_path = "../images/_20250609_144509.bmp";
std::string model_path_detect = "../models/20250612.onnx";
cv::Mat src = imread(img_path);
cv::Mat img = src.clone();
Yolov8Onnx task_detect_ort;
if (task_detect_ort.ReadModel(model_path_detect, false,0,true)) {
std::cout << "read net ok!" << std::endl;
}
std::vector<OutputParams> results_detect;
long long startTime = std::chrono::system_clock::now().time_since_epoch().count(); //ns
results_detect=yolov8_onnx(task_detect_ort, img, model_path_detect);
long long timeNow = std::chrono::system_clock::now().time_since_epoch().count();
double timeuse = (timeNow - startTime) * 0.000001;
//std::cout<<"end detect"<<endl;
std::cout << (timeNow - startTime) * 0.000001 << "ms\n";
std::cout<<"num: "<<results_detect.size()<<endl;
OutputParams out_result;
for (int i = 0; i < results_detect.size(); i++) {
cout<<results_detect[i].id<<" "<<task_detect_ort._Name[results_detect[i].id]<<" conf: "<<results_detect[i].confidence<<" rect: "<< results_detect[i].box<<endl;
}
cv::waitKey(0);
return 0;
}
A bonus CMakeLists.txt:
CMAKE_MINIMUM_REQUIRED(VERSION 3.0.0)
project(YOLOv8)
SET (ONNXRUNTIME_DIR /home/xxx/src/onnxruntime-linux-x64-gpu-1.17.0)
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
# print OpenCV info
message(STATUS "OpenCV library status:")
message(STATUS " config: ${OpenCV_DIR}")
message(STATUS " version: ${OpenCV_VERSION}")
message(STATUS " libraries: ${OpenCV_LIBS}")
message(STATUS " include path: ${OpenCV_INCLUDE_DIRS}")
# add the PCL environment
find_package(PCL REQUIRED)
add_definitions(${PCL_DEFINITIONS})
include_directories(${PCL_INCLUDE_DIRS})
link_directories(${PCL_LIBRARY_DIRS})
find_package(VTK REQUIRED)
ADD_EXECUTABLE(YOLOv8 yolov8_utils.h yolov8_onnx.h
main.cpp yolov8_utils.cpp yolov8_onnx.cpp )
SET(CMAKE_CXX_STANDARD 14)
SET(CMAKE_CXX_STANDARD_REQUIRED ON)
TARGET_INCLUDE_DIRECTORIES(YOLOv8 PRIVATE "${ONNXRUNTIME_DIR}/include")
TARGET_COMPILE_FEATURES(YOLOv8 PRIVATE cxx_std_14)
TARGET_LINK_LIBRARIES(YOLOv8 ${OpenCV_LIBS}
)
# TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so")
# TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so.1.13.1")
if (WIN32)
TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/onnxruntime.lib")
endif(WIN32)
if (UNIX)
TARGET_LINK_LIBRARIES(YOLOv8 "${ONNXRUNTIME_DIR}/lib/libonnxruntime.so")
endif(UNIX)
Change the image and model paths, build with cmake and make to produce the executable, then run it from the executable's directory in a Linux terminal: ./YOLOv8
Python version
Python deployment is comparatively simple: change the model and image paths below and run the script directly.
import cv2
import onnxruntime as ort
from PIL import Image
import numpy as np
# confidence threshold
confidence_thres = 0.35
# IoU threshold for NMS
iou_thres = 0.5
# class names (truncated here; list all the classes of your own model)
classes = {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus'}  # ...
# a random color per class
color_palette = np.random.uniform(100, 255, size=(len(classes), 3))
# choose GPU or CPU
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,  # GPU device ID, in case you have several GPUs
    }),
    'CPUExecutionProvider',  # CPU as a fallback
]
def calculate_iou(box, other_boxes):
    """
    Compute the IoU between one bounding box and a set of other boxes.

    Args:
    - box: a single box, [x1, y1, width, height].
    - other_boxes: array of boxes, each also [x1, y1, width, height].

    Returns:
    - iou: array of IoU values between the given box and each of the other boxes.
    """
    # top-left corner of the intersection
    x1 = np.maximum(box[0], np.array(other_boxes)[:, 0])
    y1 = np.maximum(box[1], np.array(other_boxes)[:, 1])
    # bottom-right corner of the intersection
    x2 = np.minimum(box[0] + box[2], np.array(other_boxes)[:, 0] + np.array(other_boxes)[:, 2])
    y2 = np.minimum(box[1] + box[3], np.array(other_boxes)[:, 1] + np.array(other_boxes)[:, 3])
    # intersection area
    intersection_area = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    # area of the given box
    box_area = box[2] * box[3]
    # areas of the other boxes
    other_boxes_area = np.array(other_boxes)[:, 2] * np.array(other_boxes)[:, 3]
    # IoU
    iou = intersection_area / (box_area + other_boxes_area - intersection_area)
    return iou
def custom_NMSBoxes(boxes, scores, confidence_threshold, iou_threshold):
    # nothing to do if there are no boxes
    if len(boxes) == 0:
        return []
    # convert scores and boxes to NumPy arrays
    scores = np.array(scores)
    boxes = np.array(boxes)
    # filter boxes by the confidence threshold (>= so the indices stay aligned with the caller's lists)
    mask = scores >= confidence_threshold
    filtered_boxes = boxes[mask]
    filtered_scores = scores[mask]
    # if nothing survives the filter, return an empty list
    if len(filtered_boxes) == 0:
        return []
    # sort the boxes by confidence, highest first
    sorted_indices = np.argsort(filtered_scores)[::-1]
    # indices of the boxes that are kept
    indices = []
    # loop while there are still boxes to process
    while len(sorted_indices) > 0:
        # pick the box with the highest score
        current_index = sorted_indices[0]
        indices.append(current_index)
        # stop if only one box is left
        if len(sorted_indices) == 1:
            break
        # current box vs. the remaining boxes
        current_box = filtered_boxes[current_index]
        other_boxes = filtered_boxes[sorted_indices[1:]]
        # IoU of the current box against the rest
        iou = calculate_iou(current_box, other_boxes)
        # keep only the boxes that do not overlap the current one too much
        non_overlapping_indices = np.where(iou <= iou_threshold)[0]
        # restrict sorted_indices to those non-overlapping boxes
        sorted_indices = sorted_indices[non_overlapping_indices + 1]
    # return the kept indices
    return indices
def draw_detections(img, box, score, class_id):
    """
    Draw the bounding box and label for one detection on the input image.

    Args:
        img: image to draw on.
        box: detected bounding box [x1, y1, w, h].
        score: detection score.
        class_id: class ID of the detected object.

    Returns:
        None
    """
    # unpack the box coordinates
    x1, y1, w, h = box
    # look up the color for this class
    color = color_palette[class_id]
    # draw the bounding box
    cv2.rectangle(img, (int(x1), int(y1)), (int(x1 + w), int(y1 + h)), color, 2)
    # label text: class name plus score
    label = f'{classes[class_id]}: {score:.2f}'
    # size of the label text
    (label_width, label_height), _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    # position of the label text
    label_x = x1
    label_y = y1 - 10 if y1 - 10 > label_height else y1 + 10
    # filled rectangle as background for the label
    cv2.rectangle(img, (label_x, label_y - label_height), (label_x + label_width, label_y + label_height), color, cv2.FILLED)
    # draw the label text
    cv2.putText(img, label, (label_x, label_y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA)
def preprocess(img, input_width, input_height):
    """
    Preprocess the input image before running inference.

    Returns:
        image_data: preprocessed image data ready for inference.
    """
    # original image height and width
    img_height, img_width = img.shape[:2]
    # convert from BGR to RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # resize to the model input size
    img = cv2.resize(img, (input_width, input_height))
    # normalize to [0, 1]
    image_data = np.array(img) / 255.0
    # move the channel dimension first (HWC -> CHW)
    image_data = np.transpose(image_data, (2, 0, 1))
    # add a batch dimension and cast to float32
    image_data = np.expand_dims(image_data, axis=0).astype(np.float32)
    # return the preprocessed data plus the original image size
    return image_data, img_height, img_width
def postprocess(input_image, output, input_width, input_height, img_width, img_height):
    """
    Post-process the model output: extract boxes, scores and class IDs.

    Args:
        input_image (numpy.ndarray): input image.
        output (numpy.ndarray): raw model output.
        input_width (int): model input width.
        input_height (int): model input height.
        img_width (int): original image width.
        img_height (int): original image height.

    Returns:
        numpy.ndarray: input image with the detections drawn on it.
    """
    # transpose and squeeze the output to the expected shape
    outputs = np.transpose(np.squeeze(output[0]))
    # number of rows in the output array
    rows = outputs.shape[0]
    # lists for the detected boxes, scores and class IDs
    boxes = []
    scores = []
    class_ids = []
    # scale factors from model input size back to the original image
    x_factor = img_width / input_width
    y_factor = img_height / input_height
    # iterate over the rows of the output
    for i in range(rows):
        # class scores of the current row
        classes_scores = outputs[i][4:]
        # highest class score
        max_score = np.amax(classes_scores)
        # keep the detection if it is above the confidence threshold
        if max_score >= confidence_thres:
            # class ID with the highest score
            class_id = np.argmax(classes_scores)
            # box coordinates from the current row
            x, y, w, h = outputs[i][0], outputs[i][1], outputs[i][2], outputs[i][3]
            # rescale the box to the original image
            left = int((x - w / 2) * x_factor)
            top = int((y - h / 2) * y_factor)
            width = int(w * x_factor)
            height = int(h * y_factor)
            # store class ID, score and box
            class_ids.append(class_id)
            scores.append(max_score)
            boxes.append([left, top, width, height])
    # apply non-maximum suppression to remove overlapping boxes
    indices = custom_NMSBoxes(boxes, scores, confidence_thres, iou_thres)
    # iterate over the kept indices
    for i in indices:
        # fetch box, score and class ID for this index
        box = boxes[i]
        score = scores[i]
        class_id = class_ids[i]
        # draw the detection on the input image
        draw_detections(input_image, box, score, class_id)
    # return the annotated image
    return input_image
def init_detect_model(model_path):
    # create an inference session from the ONNX model, with the chosen execution providers
    session = ort.InferenceSession(model_path, providers=providers)
    # model input information
    model_inputs = session.get_inputs()
    # input shape, NCHW
    input_shape = model_inputs[0].shape
    # input height is dimension 2
    input_height = input_shape[2]
    # input width is dimension 3
    input_width = input_shape[3]
    # return the session, input info, and input size
    return session, model_inputs, input_width, input_height
def detect_object(image, session, model_inputs, input_width, input_height):
    # if the input is a PIL image, convert it to a NumPy array
    if isinstance(image, Image.Image):
        result_image = np.array(image)
    else:
        # otherwise assume it is already a NumPy array
        result_image = image
    # preprocess: resize, normalize, etc.
    img_data, img_height, img_width = preprocess(result_image, input_width, input_height)
    # run inference on the preprocessed data
    outputs = session.run(None, {model_inputs[0].name: img_data})
    # post-process: decode boxes, filter low-confidence detections, draw results
    output_image = postprocess(result_image, outputs, input_width, input_height, img_width, img_height)
    return output_image
if __name__ == '__main__':
    model_path = "yolov8n.onnx"
    session, model_inputs, input_width, input_height = init_detect_model(model_path)
    image_data = cv2.imread("test.png")
    result_image = detect_object(image_data, session, model_inputs, input_width, input_height)
    cv2.imwrite("result_image.png", result_image)
    cv2.imshow('Output', result_image)
    cv2.waitKey(0)