基于YOLO11实例分割与奥比中光相机的快递包裹抓取点检测-EW帮帮网

本博客来源于CSDN机器鱼，未同意任何人转载。

更多内容，欢迎点击本专栏，查看更多内容。

0 引言

1 奥比中光相机测试

1.1 SDK相机初体验

1.2 将SDK的Example改成所需要程序

2 YOLO11S-SEG的onnxruntime推理

3 结语

0 引言

项目采用六轴机械臂搭配末端真空吸盘，从无序包裹中抓取想要的包裹。AI算法需要提供各包裹的抓取点的3D坐标与3D姿态。由于快递包裹含有多个面，且大多为倾斜状态，为了顺利抓取，我们的算法需要如下几个步骤：

①从彩色图的包裹堆里面识别、分割出每个单独的包裹，得到2D掩码Mask1；

②从对齐的深度图中提取每个包裹Mask1对应的深度掩码Mask2；

③根据相机参数，计算Mask2对应的三维点云DisMask；

④将DisMask中的异常点云值剔除后做平面分割，只要含点云数量最多的那个平面；

⑤求平面的质心作为抓取点，求平面的法向量与x、y、z轴的角度为末端抓取姿态。

最终得到6个值返回给下位机做机械臂的控制。

本次是在windows上进行开发，我用到的软件与SDK有：cmak-3.28.1，vs2019，opencv-4.6.1，open3d-0.18.0，orbbecsdk1.10.18、cuda-11.6、cudnn-8.5.0、onnxruntime-gpu-1.14.0（原本依旧用tensorrt的毕竟快，但是我之前已经出过yolov8-seg的tensorrt推理博客了，而yolov11-seg的可以直接用，再写一遍没啥意思，所以改成onnxruntime-gpu进行推理，实测比tensorrt差不到很多，还可以省掉onnx转tensorrt模型的步骤，巴适的板。我的PC显卡是3060，yolo11s-seg的推理速度是11ms，而onnxruntime的速度是24ms左右）。

模型用的ultralytics最新版训练的yolo11s-seg，训练好之后转好的onnx模型与训练数据可以在【这里】下载，训练方法网上到处都有，不是此博客的重点。

1 奥比中光相机测试

1.1 SDK相机初体验

以前做这种对精度要求不那么高项目用的都是intel的realsense系列，但价格越来越贵了，加上奥博中光天天网上吹牛批，所以某鱼买了一款二手的GeminiPro相机，是奥比中光与轮趣科技一同研发的。

首先配置奥比中光相机的驱动与SDK，必须要装驱动才能使用，下载地址：地址1。因为我采用的win11，所以下载的win_x64这个版本。

解压后的目录为D:/Project/OrbbecSDK/（这是我放置的目录），也可以换其他路径存放。有个driver的文件夹，里面的exe无脑安装即可。

安装完成后，打开CMAKE_GUI与VS2019，CMAKE我采用的版本是3.28.1。首先打开CMakeLists.txt将OpenCV_DIR的路径改成自己的实际路径，关于opencv的安装编译网上有很多，这里就不赘述了。

https://github.com/orbbec/OrbbecSDK/releases然后打开cmakegui按照下列步骤配置并生成vs的工程文件，生成好之后打开，最后ALL_BUILD。

ALL_BUILD右键生成之后会在bin目录下生成很多exe文件，插上相机、运行color_viewer有画面就代表已经可以正确使用SDK了，下面我们对其进行改写，得到我们想要的程序。

1.2 将SDK的Example改成所需要程序

在我的案例中，对于SDK需要有以下几个功能：

①设定参数与初始化相机；②获取彩色图与深度图；③对彩色图与深度图进行对齐；④传入检测得到的box与mask计算3d点云；⑤点云处理得到三维抓取点与三维姿态。

首先建立一个camera.h，定义需要的函数，代码如下：

#ifndef CAMERAUTIL_H
#define CAMERAUTIL_H

#include "libobsensor/ObSensor.hpp"
#include "libobsensor/hpp/Pipeline.hpp"
#include <Eigen/Dense>
#include <open3d/Open3D.h>
#include "opencv2/opencv.hpp"
#include <fstream>
#include <iostream>
#include <cmath>
#include <vector>
class Orbbec {
public:
	bool Init(void);//初始化相机
	bool GetRgbAndDepth(cv::Mat& src, cv::Mat& dep);//获取当前帧，并进行深度图与彩色图的对齐
	void GetDistance(int x, int y);//打印提取某个点的x、y、z坐标
	void GetDistance(int box[4], cv::Mat& mask, cv::Mat& dis_mask);//提取某个box中mask对应的所有点的三维点云，并组成dis_mask
	void ProcessPointcloud(cv::Mat& dis_mask, cv::Mat imgcrop, std::vector<float>& result, float *ylength, float *xlength);//对点云进行后处理得到抓取坐标与抓取姿态
	void Postprocess(int box[4], cv::Mat& mask, cv::Mat srcimg, float result[10]);//对提取的dis_mask进行后处理,GetDistance与ProcessPointcloud
	//都是后处理需要用到的函数
	void close(void);
private:
	ob::Pipeline pipe;
	OBCameraParam cameraParam;
	cv::Mat depth;
	float scale;
	ob::Align* align=nullptr;
};
#endif

然后新建一个camera.cpp，orbbec的函数进行实现，代码如下：

#include "camera.h"

bool Orbbec::Init(void) {
    // Query the list of connected devices
    // Create a Context.
    ob::Context ctx;
    auto devList = ctx.queryDeviceList();
    // Get the number of connected devices
    if (devList->deviceCount() == 0) {
        std::cerr << "Device not found!" << std::endl;
        return false;
    }
    
    // Configure which streams to enable or disable for the Pipeline by creating a Config
    std::shared_ptr<ob::Config> config = std::make_shared<ob::Config>();

    // Get all stream profiles of the color camera, including stream resolution, frame rate, and frame format
    auto colorProfiles = pipe.getStreamProfileList(OB_SENSOR_COLOR);
    // Get the default color profile
    //auto colorProfile = colorProfiles->getProfile(OB_PROFILE_DEFAULT);
    std::shared_ptr<ob::VideoStreamProfile> colorProfile = nullptr;
    colorProfile = colorProfiles->getVideoStreamProfile(640, 480, OB_FORMAT_RGB, 30);
    config->enableStream(colorProfile);

    // Get all stream profiles of the depth camera, including stream resolution, frame rate, and frame format
    auto depthProfiles = pipe.getStreamProfileList(OB_SENSOR_DEPTH);
    // Get the default depth profile 
     //auto depthProfile = depthProfiles->getProfile(OB_PROFILE_DEFAULT);
    std::shared_ptr<ob::VideoStreamProfile> depthProfile = nullptr;
    depthProfile = depthProfiles->getVideoStreamProfile(640, OB_HEIGHT_ANY, OB_FORMAT_Y11, 30);
    config->enableStream(depthProfile);

    // Configure the alignment mode as hardware D2C alignment
    config->setAlignMode(ALIGN_D2C_HW_MODE);

    align = new ob::Align(OB_STREAM_COLOR);


    try {
        pipe.start(config);
        cameraParam = pipe.getCameraParam();
    }
    catch (ob::Error& e) {
        std::cerr << "function:" << e.getName() << "\nargs:" << e.getArgs() << "\nmessage:" << e.getMessage() << "\ntype:" << e.getExceptionType() << std::endl;
        return false;
    }
    return true;
};

bool Orbbec::GetRgbAndDepth(cv::Mat& src, cv::Mat& dep) {
    auto frameset = pipe.waitForFrames(100);
    if (frameset == nullptr || frameset->depthFrame() == nullptr || frameset->colorFrame() == nullptr) {
        return false;
    }
    auto newFrame = align->process(frameset);
    auto newFrameSet = newFrame->as<ob::FrameSet>();
    auto colorFrame = newFrameSet->colorFrame();
    auto depthFrame = newFrameSet->depthFrame();
    uint32_t  width = colorFrame->width();
    uint32_t  height = colorFrame->height();
    cv::Mat rawMat(height, width, CV_8UC3, (uint16_t*)colorFrame->data(), cv::Mat::AUTO_STEP);
    cv::cvtColor(rawMat, src, cv::COLOR_RGB2BGR);

    width = depthFrame->width();
    height = depthFrame->height();
    scale = depthFrame->getValueScale();
    uint16_t* data = (uint16_t*)depthFrame->data();
    cv::Mat depimg(height, width, CV_16U, data, cv::Mat::AUTO_STEP);
    depimg.copyTo(depth);
    // 将深度图转换为8位图像，进行可视化
    cv::Mat depth_image_8bit;
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(depimg, &minVal, &maxVal, &minLoc, &maxLoc);
    cv::convertScaleAbs(depimg, depth_image_8bit, 255.0 / maxVal);
    cv::applyColorMap(depth_image_8bit, dep, cv::COLORMAP_JET);
    //float centerDistance = data[width * height / 2 + width / 2] * scale;
    return true;
};
void Orbbec::GetDistance(int x,int y) {
    float Z = (float)depth.at<uint16_t>(y,x) * scale;
    float X= (x - cameraParam.depthIntrinsic.cx) * Z / cameraParam.depthIntrinsic.fx;
    float Y= (y - cameraParam.depthIntrinsic.cy) * Z / cameraParam.depthIntrinsic.fy;

    printf("X,Y,Z:%f,%f.%f\n", X, Y, Z);
}

void Orbbec::GetDistance(int box[4], cv::Mat& mask, cv::Mat& dis_mask) {
    for (int i = box[0]; i < box[2]; i++) {
        for (int j = box[1]; j < box[3]; j++) {
            if (mask.at<uchar>(j - box[1], i - box[0]) != 0){
                float upoint[3];
                upoint[2] = (float)depth.at<uint16_t>(j, i)* scale/1000.f;//转换成米
                upoint[0] = (i - cameraParam.depthIntrinsic.cx) * upoint[2] / cameraParam.depthIntrinsic.fx;
                upoint[1] = (j - cameraParam.depthIntrinsic.cy) * upoint[2] / cameraParam.depthIntrinsic.fy;
                dis_mask.at<cv::Vec3f>(j - box[1], i - box[0]) = cv::Vec3f(upoint[0], upoint[1], upoint[2]);
                if (upoint[2] == 0.0f)
                {
                    mask.at<uchar>(j - box[1], i - box[0]) = 0; // 深度值为0的地方 mask也改成0
                }
            }
        }
    }
}

void Orbbec::Postprocess(int box[4], cv::Mat& mask, cv::Mat srcimg,float result[10]) {
    cv::Rect r = cv::Rect(box[0], box[1], box[2] - box[0], box[3] - box[1]);
    cv::Mat imgcrop = srcimg(r).clone();
    mask = mask(r).clone();
    cv::Mat dis_mask = cv::Mat::zeros(r.height,r.width, CV_32FC3);
    GetDistance(box, mask, dis_mask);
    //cv::Mat dis_show;
    //cv::convertScaleAbs(dis_mask, dis_show, 255.0 / 1000);
    //cv::applyColorMap(dis_show, dis_show, cv::COLORMAP_JET);
    //cv::imshow("dismask", dis_show);
    
    // 利用open3d进行点云的后处理
    // segment plane分割平面，因为快递分割结果里面会含有侧面，因此先分割，取最大的那个面，然后提取这个面的中点作抓取点
    // 以及该面的法向量
    float ylength = 0, xlength = 0;
    std::vector<float> Normals_and_Axis;
    ProcessPointcloud(dis_mask, imgcrop, Normals_and_Axis, &ylength, &xlength);
    
    // 通过抓取点反算彩色图上的抓取点，用来观察是否抓对点
    float CX = Normals_and_Axis[3] * cameraParam.depthIntrinsic.fx / Normals_and_Axis[5] + cameraParam.depthIntrinsic.cx;
    float CY = Normals_and_Axis[4] * cameraParam.depthIntrinsic.fy / Normals_and_Axis[5] + cameraParam.depthIntrinsic.cy;
    // 通过点云算出来的包裹长宽来反算彩色图中的2D最小包围框,用于观察是否抓准平面
    float Width = 0, Height = 0;
    if (Normals_and_Axis[5] > 0) {//只计算有合理深度值的
        //根据xlength与ylength，反算框的wh，与XY形成抓取面的2d最小包围框
        Width = xlength * cameraParam.depthIntrinsic.fx / Normals_and_Axis[5];
        Height = ylength * cameraParam.depthIntrinsic.fy / Normals_and_Axis[5];
    }
    for (int i = 0; i < 6; i++) {
        result[i] = Normals_and_Axis[i];
    }
    result[6] = CX;
    result[7] = CY;
    result[8] = Width;
    result[9] = Height;



}
void Orbbec::ProcessPointcloud(cv::Mat& dis_mask, cv::Mat image, std::vector<float>& result, float* ylength, float* xlength)
{
	// //dis_mask to pointcloud
	auto outputPC = std::make_shared<open3d::geometry::PointCloud>();
	int num_points = dis_mask.cols * dis_mask.rows;
	for (int i = 0; i < dis_mask.rows; i++)
	{
		for (int j = 0; j < dis_mask.cols; j++)
		{
			Eigen::Vector3d pc = { dis_mask.at<cv::Vec3f>(i, j)[0], dis_mask.at<cv::Vec3f>(i, j)[1], dis_mask.at<cv::Vec3f>(i, j)[2] };
			Eigen::Vector3d clr = { (float)image.at<cv::Vec3b>(i,j)[2] / 255.0,
								   (float)image.at<cv::Vec3b>(i,j)[1] / 255.0,
								   (float)image.at<cv::Vec3b>(i,j)[0] / 255.0 };
			outputPC->points_.push_back(pc);
			outputPC->colors_.push_back(clr);
		}
	}
	//open3d::visualization::DrawGeometries({ outputPC}, "outputPC", 800, 450);
	//异常值剔除，剔除z距离为0的无效点云
	std::vector<size_t> indices;
	for (size_t i = 0; i < outputPC->points_.size(); i++) {
		if (outputPC->points_[i][2] == 0.0) {
			indices.push_back(i);
		}
	}
	outputPC = outputPC->SelectByIndex(indices, true);
	if (outputPC->points_.size() <= 3) {
		result = { 0,0,0,0,0,0,0 };
		printf("points size too small\n");
		return;
	}
    // 平面分割 取出来最大的平面
	auto start = std::chrono::system_clock::now();
	double tDis = 0.01; // 邻域距离 
	int minNum = 3;    // 最小点云数
	int numIter = 200; // 迭代数  
	std::tuple<Eigen::Vector4d, std::vector<size_t>> vRes = outputPC->SegmentPlane(tDis, minNum, numIter);
	// 平面参数
	// AX+BY+CZ+D=0  ==> Z=-(AX+BY+D)/C
	Eigen::Vector4d para = std::get<0>(vRes);
	//printf("Segment plane:%.3fX+%.3fY+%.3fZ+%.3f=0\n",para[0],para[1],para[2],para[3]);
	// 内点索引
	std::vector<size_t> selectedIndex = std::get<1>(vRes);
	std::shared_ptr<open3d::geometry::PointCloud> inPC = outputPC->SelectByIndex(selectedIndex, false);
	//open3d::visualization::DrawGeometries({ inPC}, "inPC", 800, 450);

	// 点云质心作为距离坐标
	Eigen::Vector3d Axis;
	Axis = inPC->GetCenter();
	// 计算法向量z夹角
	Eigen::Vector3d A(para[0], para[1], para[2]), X(-1, 0, 0), Y(0, -1, 0), Z(0, 0, 1);
	double angleX0, angleX, angleY, angleZ;
	double PI = 3.1415926;
	angleZ = std::acos(A.dot(Z) / (A.norm() * Z.norm())) * 180 / PI;
	angleX0 = std::acos(A.dot(X) / (A.norm() * X.norm())) * 180 / PI;
	angleY = std::acos(A.dot(Y) / (A.norm() * Y.norm())) * 180 / PI;
	angleX = std::atan2(A[1], A[0]) * 180 / PI;
	if (angleZ > 90) {

		angleZ = 180 - angleZ;
		X = { 1, 0, 0 };
		Y = { 0, 1, 0 };
		angleX0 = std::acos(A.dot(X) / (A.norm() * X.norm())) * 180 / PI;
		angleY = std::acos(A.dot(Y) / (A.norm() * Y.norm())) * 180 / PI;
		angleX = std::atan2(-A[1], A[0]) * 180 / PI;
		
	}
	//printf("angle:%f,%f,%f,xplane:%f\n", angleX0, angleY, angleZ, angleX);
	//angleX的角度是与法向量投影到xoy平面的并于x负方向的夹角,顺时针是0-180，逆时针是0-负180
    //
	
	result.push_back(angleX0);
	result.push_back(angleY);
	result.push_back(angleZ);
	result.push_back(Axis[0]);
	result.push_back(Axis[1]);
	result.push_back(Axis[2]);
	
	auto end = std::chrono::system_clock::now();
	//std::cout << "open3d process time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
	// // 绘制分割出来的平面模型
	Eigen::Vector3d min_pt, max_pt;
	min_pt = inPC->GetMinBound();
	max_pt = inPC->GetMaxBound();
	double Xmin = min_pt[0], Ymin = min_pt[1], Zmin = min_pt[2];
	double Xmax = max_pt[0], Ymax = max_pt[1], Zmax = max_pt[2];
	*ylength = (float)Ymax - (float)Ymin;
	*xlength = (float)Xmax - (float)Xmin;
	//展示结果
	if (0) {
		// 外点   绿色
		std::shared_ptr<open3d::geometry::PointCloud> outPC = outputPC->SelectByIndex(selectedIndex, true);
		const Eigen::Vector3d colorOut = { 0,1,0 };
		outPC->PaintUniformColor(colorOut);
		// open3d::visualization::DrawGeometries({ outPC}, "outPC", 800, 450);

		auto Plane = std::make_shared<open3d::geometry::PointCloud>();
		int width = 20, height = 20;
		for (int i = 0; i < width; i++) {
			double x = Xmin + (Xmax - Xmin) / width * i;
			for (int j = 0; j < height; j++) {
				double y = Ymin + (Ymax - Ymin) / height * j;
				double z = -(para[0] * x + para[1] * y + para[3]) / para[2];
				Eigen::Vector3d pc = { x, y, z };
				Plane->points_.push_back(pc);
			}
		}

		const Eigen::Vector3d color = { 1,0,1 };
		Plane->PaintUniformColor(color);

		// 夹爪方向
		auto lineCloud = std::make_shared<open3d::geometry::PointCloud>();
		lineCloud->points_.resize(20);
		for (size_t i = 0; i < lineCloud->points_.size(); ++i)
		{
			lineCloud->points_[i] = Axis - i * 0.01 * A;
		}
		lineCloud->PaintUniformColor(color);

		// 抓取点
		auto mesh = open3d::geometry::TriangleMesh::CreateSphere(0.01);
		mesh->ComputeVertexNormals();
		mesh->PaintUniformColor(Eigen::Vector3d(0.1, 0.1, 0.1));
		mesh->Translate(Axis);
		//圆心坐标
		auto mesh1 = open3d::geometry::TriangleMesh::CreateCoordinateFrame(0.1);
		mesh1->ComputeVertexNormals();
		//旋转到相机方向 方便我们观看
		Eigen::Matrix4d transform;
		transform << 1.0, 0.0, 0.0, 0.0,
			0.0, -1.0, 0.0, 0.0,
			0.0, 0.0, -1.0, 0.0,
			0.0, 0.0, 0.0, 1.0;
		inPC->Transform(transform);
		outPC->Transform(transform);
		lineCloud->Transform(transform);
		Plane->Transform(transform);

		mesh->Transform(transform);
		mesh1->Transform(transform);

		// open3d::io::WritePointCloudToPCD("inPC.pcd", *inPC,false);
		// open3d::io::WritePointCloudToPCD("outPC.pcd", *outPC,false);
		// open3d::io::WritePointCloudToPCD("lineCloud.pcd", *lineCloud,false);
		// open3d::io::WritePointCloudToPCD("Plane.pcd", *Plane,false);

		open3d::visualization::DrawGeometries({ inPC,outPC,lineCloud,Plane,mesh,mesh1 }, "Result", 800, 450);
	}
}


void Orbbec::close(void) {
    pipe.stop();
};

再新建一个主函数，用来调用上面的函数，名字就叫main1_testcamera.cpp写，主体代码如下所示：

#include "camera.h"
// #include "opencv2/opencv.hpp"
#include <fstream>
#include <iostream>
#include <cmath>
#include <conio.h>
using namespace std;

int main(int argc, char** argv) try {
    Orbbec camera;
    camera.Init();
    while (true)  {
        cv::Mat src, dep;
        if (!camera.GetRgbAndDepth(src, dep)) {
            continue;
        }
        cv::Mat show=src.clone();
        //cv::addWeighted(src,0.5,dep,0.8,0.5,show);
        int box[4] = { 280,200,360,280};//检测框的xmin,ymin,xmax,ymax
        cv::Mat mask = cv::Mat::ones(cv::Size(src.cols, src.rows), CV_8UC1);
        float result[10] = {0};
        camera.Postprocess(box, mask, src, result);
        printf("ax,ay,az,x,y,z,:%f,%f,%f,%f,%f,%f\n", result[0], result[1], result[2], result[3], result[4], result[5]);
        
        cv::Point p1((int)(0.5 * box[0] + 0.5 * box[2]), (int)(0.5 * box[1] + 0.5 * box[3]));
        cv::circle(show, p1, 3, cv::Scalar(0, 255, 255), 3);//center point
        cv::Point p2((int)result[6], (int)result[7]);
        cv::circle(show, p2, 3, cv::Scalar(0, 0, 255), 3);//pick point   
        cv::Rect rdraw0 = cv::Rect((int)box[0], (int)box[1], (int)(box[2]-box[0]), (int)(box[3] - box[1]));
        cv::rectangle(show, rdraw0, cv::Scalar(0, 255, 255), 2);//detect box
        cv::Rect rdraw1 = cv::Rect((int)(result[6] - 0.5 * result[8]),(int)(result[7] - 0.5 * result[9]),
            (int)result[8], (int)result[9]);
        cv::rectangle(show, rdraw1, cv::Scalar(0, 0, 255), 2);//postprocess box
        printf("-------------------------------------------------\n");
        cv::imshow("show", show);
        char c = cv::waitKey(1);
        if (c == 27) {
            cv::destroyAllWindows();
            break;
        }
    }
    camera.close();
    return 0;
}
catch (ob::Error& e) {
    std::cerr << "function:" << e.getName() << "\nargs:" << e.getArgs() << "\nmessage:" << e.getMessage() << "\ntype:" << e.getExceptionType() << std::endl;
    exit(EXIT_FAILURE);
}

最后，新建一个CMakeLists.txt配置我们的环境，如下所示，需要根据实际情况修改OpenCV_DIR、OrbbecSDK_ROOT_DIR、Open3D_DIR，

cmake_minimum_required(VERSION 3.10)

project(orbbec)

add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/bin)

include_directories(include/)
link_directories(lib/) #添加特定的库文件搜索路径 相当于g++的 -L参数


include_directories(${PROJECT_SOURCE_DIR}/src/)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp)

#opencv
Set(OpenCV_DIR D:/Project/opencv/build/)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${OpenCV_DIR}/include/)
link_directories(${OpenCV_DIR}/x64/vc15/lib/)

#orbbec
set(OrbbecSDK_ROOT_DIR D:/Project/OrbbecSDK/SDK)
set(OrbbecSDK_LIBRARY_DIRS ${OrbbecSDK_ROOT_DIR}/lib)
set(OrbbecSDK_INCLUDE_DIR ${OrbbecSDK_ROOT_DIR}/include)
include_directories(${OrbbecSDK_INCLUDE_DIR})
link_directories(${OrbbecSDK_LIBRARY_DIRS})

#open3d
Set(Open3D_DIR D:/Project/open3d-0.18.0/CMake/)
find_package( Open3D  REQUIRED)
include_directories(${Open3D_INCLUDE_DIRS})
link_directories(${Open3D_LIBRARY_DIRS})

add_executable(test main1_test_camera.cpp ${SRCS})
target_link_libraries(test OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(test ${Open3D_LIBRARIES})
target_include_directories(test PUBLIC ${Open3D_INCLUDE_DIRS})

最后，利用Cmake-gui与vs2019生成可执行文件即可，运行程序时候将这3个dll文件放在test.exe的同级目录即可。

2 YOLO11S-SEG的onnxruntime推理

直接上代码，首先建立Segment.h，用来定义推理用到的函数，如下

#ifndef SEGMENT_H
#define SEGMENT_H
#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
//#include <cuda_provider_factory.h>
#include <onnxruntime_cxx_api.h>
#include <chrono>
using namespace std;
using namespace Ort;
struct BoxInfo
{
	float x;
	float y;
	float w;
	float h;
	float score;
	int label;
	float mask[32];
};

struct Net_config
{
	float confThreshold; // Confidence threshold
	float nmsThreshold;  // Non-maximum suppression threshold
	std::string modelpath;
	std::string labelpath;
};


constexpr static int num_class = 1;//我们只有一类就是1，多类就要改这儿
const static int kOutputSize = num_class + 4 + 32;
class YOLOV8
{
public:
	YOLOV8(Net_config config);//加载模型
	void preprocess_img(cv::Mat& img, cv::Mat& dis, int input_w, int input_h);//将图像letterbox resize到640*640
	void normalize_(cv::Mat img);//除以255，归一化，同时将rgbrgbrgb转成rrrgggbbb
	void detect(cv::Mat& frame, std::vector<BoxInfo>& generate_boxes, std::vector<cv::Mat>& masks);//输入图像进行检测与后处理
	//得到每个目标的检测框与每个目标的mask1
	float iou(float lbox[4], float rbox[4]);//iou计算
	void nms(vector<BoxInfo>& input_boxes);//nms

	std::vector<cv::Mat> process_mask(const float* proto, int proto_size, std::vector<BoxInfo>& dets, cv::Mat img);//后处理得到mask
	cv::Rect get_downscale_rect(float bbox[4], float scale);//将640*640输入的box尺寸下采样到160*160的特征图上
	cv::Mat scale_mask(cv::Mat mask, cv::Mat img); // 将640 * 640输入得到的掩码还原到原图上, 很好理解，就跟processimg反着操作
	cv::Rect get_rect(cv::Mat& img, float bbox[4]);// 将640 * 640输入得到的box还原到原图上, 很好理解，就跟processimg反着操作
	
	

private:
	int inpWidth;
	int inpHeight;
	int nout;
	int num_proposal;
	std::vector<string> class_names;
	float confThreshold;
	float nmsThreshold;
	std::vector<float> input_image_;//ONNXRUMTIME接收的数据是一个vector
	
	Env env = Env(ORT_LOGGING_LEVEL_ERROR, "YOLOV8");
	Ort::Session* ort_session = nullptr;
	SessionOptions sessionOptions = SessionOptions();
	std::vector<char*> input_names;
	std::vector<char*> output_names;
	std::vector<vector<int64_t>> input_node_dims; // >=1 outputs
	std::vector<vector<int64_t>> output_node_dims; // >=1 outputs
	const std::vector<uint32_t> colors = { 0xFF3838, 0xFF9D97, 0xFF701F, 0xFFB21D, 0xCFD231, 0x48F90A,
										0x92CC17, 0x3DDB86, 0x1A9334, 0x00D4BB, 0x2C99A8, 0x00C2FF,
										0x344593, 0x6473FF, 0x0018EC, 0x8438FF, 0x520085, 0xCB38FF,
										0xFF95C8, 0xFF37C7 };
};
#endif

然后新建Segment.cpp开具体实现这些函数，代码如下：

#include "Segment.h"

YOLOV8::YOLOV8(Net_config config)
{
	confThreshold = config.confThreshold;
	nmsThreshold = config.nmsThreshold;
	string classesFile = config.labelpath;
	string model_path = config.modelpath;
	std::wstring widestr = std::wstring(model_path.begin(), model_path.end());
	OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0);
	sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);
	ort_session = new Session(env, widestr.c_str(), sessionOptions);
	size_t numInputNodes = ort_session->GetInputCount();
	size_t numOutputNodes = ort_session->GetOutputCount();

	AllocatorWithDefaultOptions allocator;
	for (int i = 0; i < numInputNodes; i++)
	{
		AllocatedStringPtr input_name_Ptr = ort_session->GetInputNameAllocated(i, allocator);
		input_names.push_back(input_name_Ptr.get());

		Ort::TypeInfo input_type_info = ort_session->GetInputTypeInfo(i);
		auto input_tensor_info = input_type_info.GetTensorTypeAndShapeInfo();
		auto input_dims = input_tensor_info.GetShape();
		input_node_dims.push_back(input_dims);
	}
	for (int i = 0; i < numOutputNodes; i++)
	{
		AllocatedStringPtr output_name_Ptr = ort_session->GetOutputNameAllocated(i, allocator);
		output_names.push_back(output_name_Ptr.get());

		Ort::TypeInfo output_type_info = ort_session->GetOutputTypeInfo(i);
		auto output_tensor_info = output_type_info.GetTensorTypeAndShapeInfo();
		auto output_dims = output_tensor_info.GetShape();
		output_node_dims.push_back(output_dims);
	}
	inpHeight = input_node_dims[0][2];
	inpWidth = input_node_dims[0][3];
	nout = output_node_dims[0][1];
	num_proposal = output_node_dims[0][2];

	ifstream ifs(classesFile.c_str());
	string line;
	while (getline(ifs, line))  class_names.push_back(line);
	printf("classsize is:%d\n", num_class);
	assert(num_class = class_names.size());
}
cv::Rect YOLOV8::get_rect(cv::Mat& img, float bbox[4]) {
	float l, r, t, b;
	float r_w = inpWidth / (img.cols * 1.0);
	float r_h = inpHeight / (img.rows * 1.0);//inpHeight inpWidth
	if (r_h > r_w) {
		l = bbox[0] - bbox[2] / 2.f;
		r = bbox[0] + bbox[2] / 2.f;
		t = bbox[1] - bbox[3] / 2.f - (inpHeight - r_w * img.rows) / 2;
		b = bbox[1] + bbox[3] / 2.f - (inpHeight - r_w * img.rows) / 2;
		l = l / r_w;
		r = r / r_w;
		t = t / r_w;
		b = b / r_w;
	}
	else {
		l = bbox[0] - bbox[2] / 2.f - (inpWidth - r_h * img.cols) / 2;
		r = bbox[0] + bbox[2] / 2.f - (inpWidth - r_h * img.cols) / 2;
		t = bbox[1] - bbox[3] / 2.f;
		b = bbox[1] + bbox[3] / 2.f;
		l = l / r_h;
		r = r / r_h;
		t = t / r_h;
		b = b / r_h;
	}
	return cv::Rect(round(l), round(t), round(r - l), round(b - t));
}

cv::Rect YOLOV8::get_downscale_rect(float bbox[4], float scale) {
	float left = bbox[0] - bbox[2] / 2;
	float top = bbox[1] - bbox[3] / 2;
	float right = bbox[0] + bbox[2] / 2;
	float bottom = bbox[1] + bbox[3] / 2;
	left /= scale;
	top /= scale;
	right /= scale;
	bottom /= scale;
	return cv::Rect(round(left), round(top), round(right - left), round(bottom - top));
}

cv::Mat YOLOV8::scale_mask(cv::Mat mask, cv::Mat img) {
	int x, y, w, h;
	float r_w = inpWidth / (img.cols * 1.0);
	float r_h = inpHeight / (img.rows * 1.0);
	if (r_h > r_w) {
		w = inpWidth;
		h = r_w * img.rows;
		x = 0;
		y = (inpHeight - h) / 2;
	}
	else {
		w = r_h * img.cols;
		h = inpHeight;
		x = (inpWidth - w) / 2;
		y = 0;
	}
	cv::Rect r(x, y, w, h);
	cv::Mat res;
	cv::resize(mask(r), res, img.size());

	return res;
}

std::vector<cv::Mat> YOLOV8::process_mask(const float* proto, int proto_size, std::vector<BoxInfo>& dets, cv::Mat img) {
	std::vector<cv::Mat> masks;
	//proto 32x(H/4)x(W/4)
	//auto start = std::chrono::system_clock::now();
	for (size_t i = 0; i < dets.size(); i++) {
		cv::Mat mask_mat = cv::Mat::zeros(inpHeight / 4, inpWidth / 4, CV_8UC1);//160*160
		float bbox[4] = { dets[i].x, dets[i].y, dets[i].w, dets[i].h };
		auto r = get_downscale_rect(bbox, 4);//将640*640输入的box尺寸下采样到160*160的特征图上
		for (int x = r.x; x < r.x + r.width; x++) {
			for (int y = r.y; y < r.y + r.height; y++) {
				float e = 0.0f;
				for (int j = 0; j < 32; j++) {
					//1x32 x 1*32*160*160 =1*160*160 可以理解成160*160特征图上的每一个像素有32个值，要与mask的32个值分别相乘再相加
					//而1*32*160*160在内存中的存储方式是160*160 160*160 160*160 .... 因此每次乘法操作的时候都要跳过160*160个内存空间

					e += dets[i].mask[j] * proto[j * proto_size / 32 + y * mask_mat.cols + x];
				}
				e = 1.0f / (1.0f + expf(-e));
				if (e > 0.5) {
					mask_mat.at<uchar>(y, x) = 1;
				}
			}
		}
		//160*160的掩码resize到640*640上
		cv::resize(mask_mat, mask_mat, cv::Size(inpWidth, inpHeight), 0, 0, 1);
		//再将640*640的掩码还原到原图上,很好理解，就跟processimg反着操作
		cv::Mat img_mask = scale_mask(mask_mat, img);
		//将640*640的box还原到原图上,很好理解，就跟processimg反着操作
		r = get_rect(img, bbox);
		cv::Rect holeImgRect(0, 0, img.cols, img.rows);
		r = r & holeImgRect;
		dets[i].x = r.x;
		dets[i].y = r.y;
		dets[i].w = r.width;
		dets[i].h = r.height;
		masks.push_back(img_mask);


	}
	return masks;
}
void YOLOV8::normalize_(cv::Mat img)
{
	int row = img.rows;
	int col = img.cols;
	input_image_.resize(row * col * img.channels());
	// bgrbgrbgr to rrrgggbbb
	for (int c = 0; c < 3; c++)
	{
		for (int i = 0; i < row; i++)
		{
			for (int j = 0; j < col; j++)
			{
				float pix = img.ptr<uchar>(i)[j * 3 + 2 - c];
				input_image_[c * row * col + i * col + j] = pix / 255.0;
			}
		}
	}
}
float YOLOV8::iou(float lbox[4], float rbox[4]) {
	float interBox[] = {
	  (std::max)(lbox[0] - lbox[2] / 2.f , rbox[0] - rbox[2] / 2.f), //left
	  (std::min)(lbox[0] + lbox[2] / 2.f , rbox[0] + rbox[2] / 2.f), //right
	  (std::max)(lbox[1] - lbox[3] / 2.f , rbox[1] - rbox[3] / 2.f), //top
	  (std::min)(lbox[1] + lbox[3] / 2.f , rbox[1] + rbox[3] / 2.f), //bottom
	};

	if (interBox[2] > interBox[3] || interBox[0] > interBox[1])
		return 0.0f;

	float interBoxS = (interBox[1] - interBox[0]) * (interBox[3] - interBox[2]);
	return interBoxS / (lbox[2] * lbox[3] + rbox[2] * rbox[3] - interBoxS);
}
void YOLOV8::nms(vector<BoxInfo>& input_boxes)
{
	sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) { return a.score > b.score; });

	vector<bool> isSuppressed(input_boxes.size(), false);
	for (int i = 0; i < int(input_boxes.size()); ++i)
	{
		if (isSuppressed[i]) { continue; }
		for (int j = i + 1; j < int(input_boxes.size()); ++j)
		{
			if (isSuppressed[j]) { continue; }
			float lbox[4] = { input_boxes.at(i).x,input_boxes.at(i).y,input_boxes.at(i).w,input_boxes.at(i).h };
			float rbox[4] = { input_boxes.at(j).x,input_boxes.at(j).y,input_boxes.at(j).w,input_boxes.at(j).h };
			float ovr = iou(lbox, rbox);
			if (ovr >= nmsThreshold)
			{
				isSuppressed[j] = true;
			}
		}
	}
	int idx_t = 0;
	input_boxes.erase(remove_if(input_boxes.begin(), input_boxes.end(), [&idx_t, &isSuppressed](const BoxInfo& f) { return isSuppressed[idx_t++]; }), input_boxes.end());
}
void YOLOV8::preprocess_img(cv::Mat& img, cv::Mat& dis, int input_w, int input_h) {
	int w, h, x, y;
	float r_w = input_w / (img.cols * 1.0);
	float r_h = input_h / (img.rows * 1.0);
	if (r_h > r_w) {//宽大于高
		w = input_w;
		h = r_w * img.rows;
		x = 0;
		y = (input_h - h) / 2;
	}
	else {
		w = r_h * img.cols;
		h = input_h;
		x = (input_w - w) / 2;
		y = 0;
	}
	cv::Mat re(h, w, CV_8UC3);
	cv::resize(img, re, re.size(), 0, 0, cv::INTER_LINEAR);
	cv::Mat out(input_h, input_w, CV_8UC3, cv::Scalar(128, 128, 128));
	re.copyTo(out(cv::Rect(x, y, re.cols, re.rows)));
	out.copyTo(dis);

}

void YOLOV8::detect(cv::Mat& frame, vector<BoxInfo>& generate_boxes,std::vector<cv::Mat>& masks)
{
	cv::Mat dstimg;
	preprocess_img(frame, dstimg, inpWidth, inpHeight);
	normalize_(dstimg);
	array<int64_t, 4> input_shape_{ 1, 3,  inpHeight,  inpWidth };
	auto allocator_info = MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
	Value input_tensor_ = Value::CreateTensor<float>(allocator_info, input_image_.data(), input_image_.size(), input_shape_.data(), input_shape_.size());
	// 开始推理
	const std::array<const char*, 1> inputNames = { "images" };
	const std::array<const char*, 2> outNames = { "output0","output1" };
	vector<Value> ort_outputs = ort_session->Run(Ort::RunOptions{ nullptr }, inputNames.data(), &input_tensor_, 1, outNames.data(), outNames.size());
	/generate proposals
	//vector<BoxInfo> generate_boxes;
	int n = 0, k = 0;
	const float* pdata = ort_outputs[0].GetTensorMutableData<float>();// 1 * 116 *8400
	const float* pdata1 = ort_outputs[1].GetTensorMutableData<float>();//1*32*160*160 mask propsal
	for (n = 0; n < num_proposal; n++)
	{
		float out[kOutputSize];//4+class+32
		float mask[32];
		for (int j = 0; j < nout; j++) {
			out[j] = pdata[n + num_proposal * j];
		}
		memcpy(&mask, &out[4 + num_class], 32 * sizeof(float));
		int max_ind = 0;
		float max_class_socre = 0;
		// 计算最大的置信度得分
		for (k = 0; k < num_class; k++)
		{
			if (out[k + 4] > max_class_socre)
			{
				max_class_socre = out[k + 4];
				max_ind = k;
			}
		}

		if (max_class_socre > confThreshold)
		{
			float cx = out[0];
			float cy = out[1];
			float w = out[2];
			float h = out[3];
			BoxInfo det;
			det.x = cx;
			det.y = cy;
			det.w = w;
			det.h = h;
			det.score = max_class_socre;
			det.label = max_ind;
			for (int jj = 0; jj < 32; jj++) {
				det.mask[jj] = mask[jj];
			}

			generate_boxes.push_back(det);
		}
	}
	nms(generate_boxes);
	int kOutputSize2 = 32 * (inpHeight / 4) * (inpWidth / 4);
	masks = process_mask(pdata1, kOutputSize2, generate_boxes, frame);


	// draw at original frame

	for (size_t i = 0; i < generate_boxes.size(); i++) {


		cv::Mat img_mask = masks[i];
		auto color = colors[(int)generate_boxes[i].label % colors.size()];
		auto bgr = cv::Scalar(color & 0xFF, color >> 8 & 0xFF, color >> 16 & 0xFF);
		cv::Rect r = cv::Rect((int)generate_boxes[i].x, (int)generate_boxes[i].y, (int)generate_boxes[i].w, (int)generate_boxes[i].h);
		cv::rectangle(frame, r, bgr, 2);
		// mask 画原图上
		for (int x = r.x; x < r.x + r.width; x++) {
			for (int y = r.y; y < r.y + r.height; y++) {
				float val = img_mask.at<uchar>(y, x);
				if (val == 0) continue;
				frame.at<cv::Vec3b>(y, x)[0] = frame.at<cv::Vec3b>(y, x)[0] / 2 + bgr[0] / 2;
				frame.at<cv::Vec3b>(y, x)[1] = frame.at<cv::Vec3b>(y, x)[1] / 2 + bgr[1] / 2;
				frame.at<cv::Vec3b>(y, x)[2] = frame.at<cv::Vec3b>(y, x)[2] / 2 + bgr[2] / 2;
			}
		}

		char label[100];
		sprintf(label, "%s:%.2f", class_names[generate_boxes[i].label], generate_boxes[i].score);
		cv::putText(frame, label, cv::Point(r.x, r.y - 4), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(128, 128, 255), 2);

	}
}

然后再建立一个主程序，叫main.cpp用来调用这些程序，代码如下：


#include <iostream>
#include <chrono>
#include <cmath>
#include <assert.h>
#include "Segment.h"
#include "camera.h"
#include <stdio.h>
using namespace std;

int main(int argc, char** argv) {
    Net_config YOLOV8_nets = { 0.9, 0.5,
            "D:/Project/inference/Orbbec_example/model/kuaidi.onnx" ,
            "D:/Project/inference/Orbbec_example/model/kuaidi.names" };
    YOLOV8 net(YOLOV8_nets);
    Orbbec camera;
    camera.Init();

    while (1){
        cv::Mat src, dep;
        if (!camera.GetRgbAndDepth(src, dep)) {
            continue;
        }
        cv::Mat show = src.clone();
        // Run inference
        auto start = std::chrono::system_clock::now();
        std::vector<BoxInfo> res;
        std::vector<cv::Mat> masks;
        net.detect(src, res, masks);
        auto end = std::chrono::system_clock::now();
        std::cout << "inference time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        if (res.size()==0){printf("no target detect\n");}
        cv::imshow("reult", src);
        char c = cv::waitKey(1);


        for (int i = 0; i < res.size(); i++){
            int box[4] = { (int)res[i].x, (int)res[i].y,(int)(res[i].x + res[i].w),(int)(res[i].y + res[i].h) };
            cv::Mat mask = masks[i].clone();
            float result[10] = { 0 };
            camera.Postprocess(box, mask, src, result);
            cv::Point p1((int)(0.5 * box[0] + 0.5 * box[2]), (int)(0.5 * box[1] + 0.5 * box[3]));
            cv::circle(show, p1, 3, cv::Scalar(0, 255, 255), 3);//center point
            cv::Point p2((int)result[6], (int)result[7]);
            cv::circle(show, p2, 3, cv::Scalar(0, 0, 255), 3);//pick point   
            cv::Rect rdraw0 = cv::Rect((int)box[0], (int)box[1], (int)(box[2] - box[0]), (int)(box[3] - box[1]));
            cv::rectangle(show, rdraw0, cv::Scalar(0, 255, 255), 2);//detect box
            cv::Rect rdraw1 = cv::Rect((int)(result[6] - 0.5 * result[8]), (int)(result[7] - 0.5 * result[9]),
                (int)result[8], (int)result[9]);
            cv::rectangle(show, rdraw1, cv::Scalar(0, 0, 255), 2);//postprocess box
        }
        printf("-------------------------------------------------\n");
        cv::imshow("show", show);
        c = cv::waitKey(1);
        if (c == 27) {
            cv::destroyAllWindows();
            break;
        }
    
    }
  
  camera.close(); 
  printf("exit prgram\n");
  return 0;
}

改写以下CMakeLists.txt，使其可以同时使用1.2与这一章节，如下：

cmake_minimum_required(VERSION 3.10)

project(orbbec)

add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/bin)



include_directories(include/)
link_directories(lib/) #添加特定的库文件搜索路径 相当于g++的 -L参数


include_directories(${PROJECT_SOURCE_DIR}/src/)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp)

#opencv
Set(OpenCV_DIR D:/Project/opencv/build/)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${OpenCV_DIR}/include/)
link_directories(${OpenCV_DIR}/x64/vc15/lib/)

#orbbec
set(OrbbecSDK_ROOT_DIR D:/Project/OrbbecSDK/SDK)
set(OrbbecSDK_LIBRARY_DIRS ${OrbbecSDK_ROOT_DIR}/lib)
set(OrbbecSDK_INCLUDE_DIR ${OrbbecSDK_ROOT_DIR}/include)
include_directories(${OrbbecSDK_INCLUDE_DIR})
link_directories(${OrbbecSDK_LIBRARY_DIRS})
#open3d
Set(Open3D_DIR D:/Project/open3d-0.18.0/CMake/)
find_package( Open3D  REQUIRED)
include_directories(${Open3D_INCLUDE_DIRS})
link_directories(${Open3D_LIBRARY_DIRS})

#onnxruntinme
Set(ONNXRUNTIME_ROOT_DIR D:/Project/onnxruntime-win-x64-gpu-1.14.0/)
include_directories(${ONNXRUNTIME_ROOT_DIR}/include/)
link_directories(${ONNXRUNTIME_ROOT_DIR}/lib/) 


add_executable(test main1_test_camera.cpp ${SRCS})
target_link_libraries(test OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(test ${Open3D_LIBRARIES})
target_include_directories(test PUBLIC ${Open3D_INCLUDE_DIRS})
target_link_libraries(test onnxruntime)


add_executable(main main.cpp ${SRCS})
target_link_libraries(main OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(main ${Open3D_LIBRARIES})
target_include_directories(main PUBLIC ${Open3D_INCLUDE_DIRS})
target_link_libraries(main onnxruntime)

依旧基于cmake-gui与vs，生成对应的exe文件，运行exe的时候记得将下面几个dll拷贝到同一个目录下，不然会报错。