OpenVINO Computer Vision Model Acceleration

OpenVINO Overview

  • A computer vision deployment framework that supports a wide range of edge hardware platforms
  • A computer vision toolkit developed and open-sourced by Intel
  • Ships with quick demos for many common vision task scenarios

The toolkit consists of four main modules.

1. Setting Up the Development Environment

Install cmake, Miniconda3, Notepad++, PyCharm, and Visual Studio 2019.

Note: when installing Miniconda3, be sure to let the installer add it to the environment variables automatically. Five variables need to be added, and adding them by hand makes it easy to miss one, which is hard to troubleshoot.

Download and install OpenVINO: [Download Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download-previous-versions.html?operatingsystem=window&distributions=webdownload&version=2021 4.2 LTS&options=offline)

After installation, run the bundled demo. If it completes successfully, the installation and configuration are correct.

Add the OpenVINO environment variables.

Configure the Visual Studio include directories, library directories, and additional dependencies.

A small script can list the required .lib files automatically; add its output as the project's additional dependencies.

With that, the development environment is ready!

2. SDK Overview and Development Workflow

inference_engine.dll is the inference engine. It depends on inference_engine_transformations.dll, tbb.dll, tbbmalloc.dll, and ngraph.dll.

All of these DLLs must be discoverable at runtime (for example, copied into C:/Windows/System32 or placed on the PATH), or OpenVINO programs will fail to start.

Commonly used InferenceEngine API types:

  • InferenceEngine::Core
  • InferenceEngine::Blob, InferenceEngine::TBlob, InferenceEngine::NV12Blob
  • InferenceEngine::BlobMap
  • InferenceEngine::InputsDataMap, InferenceEngine::InputInfo
  • InferenceEngine::OutputsDataMap
  • Wrapper classes around the core library:
    • InferenceEngine::CNNNetwork
    • InferenceEngine::ExecutableNetwork
    • InferenceEngine::InferRequest

Code implementation

#include <inference_engine.hpp>
#include <iostream>

using namespace InferenceEngine;

int main(int argc, char** argv) {

    InferenceEngine::Core ie;  // use the inference engine to query the available devices and the full CPU name
    std::vector<std::string> devices = ie.GetAvailableDevices();
    for (std::string name : devices) {
        std::cout << "device name: " << name << std::endl;
    }
    std::string cpuName = ie.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
    std::cout << "cpu full name: " << cpuName << std::endl;
    return 0;
}

Result: the available device names and the full CPU name are printed to the console.

3. Image Classification with ResNet18

Pretrained model: ResNet18

  • Image preprocessing
  • mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]; scale the image to [0, 1], then subtract the channel means and divide by the channel standard deviations
  • Input: NCHW = 1 * 3 * 224 * 224 (num, channels, height, width)
  • Output format: 1 * 1000
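The normalization and HWC-to-NCHW conversion above can be sketched as a standalone helper on a plain pixel array (an illustrative sketch only; `preprocessHWC` and the raw-array layout are assumptions made here, while the sample in this section does the same thing with cv::Mat):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert an interleaved HWC uint8 image to a planar NCHW float tensor,
// applying the ImageNet mean/std normalization described above.
std::vector<float> preprocessHWC(const unsigned char* hwc, int h, int w) {
    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float stddev[3] = {0.229f, 0.224f, 0.225f};
    std::vector<float> nchw(3 * h * w);
    for (int row = 0; row < h; row++) {
        for (int col = 0; col < w; col++) {
            for (int ch = 0; ch < 3; ch++) {
                float v = hwc[(row * w + col) * 3 + ch] / 255.0f;  // scale to [0, 1]
                nchw[ch * h * w + row * w + col] = (v - mean[ch]) / stddev[ch];
            }
        }
    }
    return nchw;
}
```

Note that each output plane holds one channel contiguously, which is exactly the layout the network's NCHW input blob expects.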

Overall steps

  • Initialize Core ie
  • ie.ReadNetwork
  • Query the input/output info and set precisions
  • Load the network onto a device: auto executable_network = ie.LoadNetwork(network, "CPU");
  • Create an inference request: auto infer_request = executable_network.CreateInferRequest();
  • Fill the input blob (image preprocessing)
  • Run inference and parse the output
Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // fstream for file I/O, iostream for console I/O

using namespace InferenceEngine;
std::string labels_txt_file = "D:/projects/models/resnet18_ir/imagenet_classes.txt";
std::vector<std::string> readClassNames();

int main(int argc, char** argv) {

    InferenceEngine::Core ie;
    std::vector<std::string> devices = ie.GetAvailableDevices();
    for (std::string name : devices) {
        std::cout << "device name: " << name << std::endl;
    }

    std::string xml = "D:/projects/models/resnet18_ir/resnet18.xml";
    std::string bin = "D:/projects/models/resnet18_ir/resnet18.bin";
    std::vector<std::string> labels = readClassNames();  // read the labels
    cv::Mat src = cv::imread("D:/images/messi.jpg");  // read the image
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the resnet18 network

    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    for (auto item : inputs) {  // auto deduces the variable type
        input_name = item.first;  // first is the name, second holds the structure; set precision and layout on it
        auto input_data = item.second;
        input_data->setPrecision(Precision::FP32);
        input_data->setLayout(Layout::NCHW);
        input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);
        // note: do not set a layout on the output
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");  // link the network to the hardware
    auto infer_request = executable_network.CreateInferRequest();  // create the inference request

    auto input = infer_request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];  // size_t is an unsigned integer type large enough for any object size
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(src, blob_image, cv::Size(w, h));  // resize the input image to the network input size
    blob_image.convertTo(blob_image, CV_32F);  // convert the input image to float
    blob_image = blob_image / 255.0;
    cv::subtract(blob_image, cv::Scalar(0.485, 0.456, 0.406), blob_image);
    cv::divide(blob_image, cv::Scalar(0.229, 0.224, 0.225), blob_image);

    // HWC -> NCHW: copy the image into the input buffer in planar layout
    float* data = static_cast<float*>(input->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3f>(row, col)[ch];
            }
        }
    }

    infer_request.Infer();
    auto output = infer_request.GetBlob(output_name);
    // convert the output data
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();  // output dims: 1x1000
    float max_prob = probs[0];
    int max_index = 0;
    for (int i = 1; i < outputDims[1]; i++) {  // argmax over the 1000 class scores
        if (probs[i] > max_prob) {
            max_prob = probs[i];
            max_index = i;
        }
    }
    std::cout << "class: " << labels[max_index] << std::endl;
    cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    cv::namedWindow("out", cv::WINDOW_FREERATIO);
    cv::imshow("out", src);
    cv::waitKey(0);
    return 0;
}

std::vector<std::string> readClassNames() {  // read the label file
    std::vector<std::string> classNames;
    std::ifstream fp(labels_txt_file);
    if (!fp.is_open()) {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof()) {  // eof() checks for the end of the file
        std::getline(fp, name);  // read line by line
        if (name.length()) {
            classNames.push_back(name);
        }
    }
    fp.close();
    return classNames;
}

Result: the top-1 ImageNet label is printed and drawn on the input image.

4. Vehicle Detection and License Plate Recognition

Model introduction

  • vehicle-license-plate-detection-barrier-0106
  • Trained on the BIT-Vehicle dataset
  • Input: 1 * 3 * 300 * 300 = NCHW
  • Output format: [1, 1, N, 7]
  • The seven values are [image_id, label, conf, x_min, y_min, x_max, y_max]
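The [1, 1, N, 7] layout can be decoded with a small helper like the following (an illustrative sketch; `Detection` and `decodeSSD` are names invented here, not part of the OpenVINO API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Detection { int label; float conf; float xmin, ymin, xmax, ymax; };

// Decode an SSD-style [1, 1, N, 7] output buffer into detections,
// scaling the normalized coordinates back to the source image size.
std::vector<Detection> decodeSSD(const float* out, int n, int im_w, int im_h, float thresh) {
    std::vector<Detection> dets;
    for (int i = 0; i < n; i++) {
        const float* row = out + i * 7;  // [image_id, label, conf, x_min, y_min, x_max, y_max]
        if (row[0] < 0) break;           // a negative image_id marks the end of valid detections
        if (row[2] < thresh) continue;   // drop low-confidence rows
        dets.push_back({static_cast<int>(row[1]), row[2],
                        row[3] * im_w, row[4] * im_h, row[5] * im_w, row[6] * im_h});
    }
    return dets;
}
```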

Workflow

  • Load the model
  • Set up inputs and outputs
  • Build the input
  • Run inference
  • Parse the output
  • Display the result

Downloading the vehicle and plate detection model

# run cmd as administrator and switch to the downloader folder
cd C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\open_model_zoo\tools\downloader

# run the downloader script in that folder to fetch the model
python downloader.py --name vehicle-license-plate-detection-barrier-0106

A success message confirms the download. Move the downloaded model files into your models folder.

Vehicle and plate detection code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // fstream for file I/O, iostream for console I/O

using namespace InferenceEngine;

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
    std::string bin = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
    cv::Mat src = cv::imread("D:/images/car_1.bmp");  // read the image
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the detection network

    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
        //input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  // BGR is already the default
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
        // note: do not set a layout on the output
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto infer_request = executable_network.CreateInferRequest();

    auto input = infer_request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(src, blob_image, cv::Size(w, h));  // resize the input image to the network input size

    // HWC -> NCHW
    unsigned char* data = static_cast<unsigned char*>(input->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    infer_request.Infer();
    auto output = infer_request.GetBlob(output_name);
    // convert the output data
    // output: [1, 1, N, 7] with [image_id, label, conf, x_min, y_min, x_max, y_max]
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int max_count = outputDims[2];
    const int object_size = outputDims[3];
    for (int n = 0; n < max_count; n++) {
        const float* object = detection_out + n * object_size;
        float label = object[1];
        float confidence = object[2];
        float xmin = object[3] * im_w;
        float ymin = object[4] * im_h;
        float xmax = object[5] * im_w;
        float ymax = object[6] * im_h;
        if (confidence > 0.5) {
            printf("label id: %d \n", static_cast<int>(label));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);
            cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
            // box.tl() is the top-left corner of the rectangle
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
    }

    cv::namedWindow("out", cv::WINDOW_FREERATIO);
    cv::imshow("out", src);
    cv::waitKey(0);
    return 0;
}

Result: detected vehicles and plates are drawn on the image with their confidence scores.

License plate recognition

  • Model name: license-plate-recognition-barrier-0001
  • Input format: BGR
  • Two inputs: an image of 1 * 3 * 24 * 94, and a sequence input of 88 * 1 filled with [0, 1, 1, 1, ..., 1]
  • Output format: 1 * 88 * 1 * 1

Download the model (license-plate-recognition-barrier-0001) the same way as above. Implementation outline: (1) initialize the plate recognition network and keep its input/output names in a wider scope; (2) run the vehicle and plate detection model to find plates; (3) pass each detected plate region to the recognition function, which uses the stored input/output names to run recognition and output the plate text.
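Decoding the 1 * 88 * 1 * 1 output can be sketched as follows (an illustrative helper; `decodePlate` is a name invented here, and the convention that -1 terminates the sequence follows the Open Model Zoo demos):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Decode the plate-recognition output: each float is an index into the
// character table; a value of -1 marks the end of the sequence.
std::string decodePlate(const float* data, int max_len, const std::vector<std::string>& table) {
    std::string result;
    for (int i = 0; i < max_len; i++) {
        int idx = static_cast<int>(data[i]);
        if (idx == -1) break;                             // end of sequence
        if (idx < 0 || idx >= (int)table.size()) continue;  // skip out-of-range indices
        result += table[idx];
    }
    return result;
}
```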

License plate recognition code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>

using namespace InferenceEngine;
// character table; the 34 empty entries stand for Chinese province
// abbreviations whose characters were lost in extraction
static std::vector<std::string> items = {
    "0","1","2","3","4","5","6","7","8","9",
    "","","","",
    "","","","",
    "","","","",
    "","","","",
    "","","","",
    "","","","",
    "","","","",
    "","","","",
    "","",
    "A","B","C","D","E","F","G","H","I","J",
    "K","L","M","N","O","P","Q","R","S","T",
    "U","V","W","X","Y","Z"
};

InferenceEngine::InferRequest plate_request;
std::string plate_input_name1;
std::string plate_input_name2;
std::string plate_output_name;

void load_plate_recog_model();
void fetch_plate_text(cv::Mat &image, cv::Mat &plateROI);

int main(int argc, char** argv) {

    InferenceEngine::Core ie;
    load_plate_recog_model();  // load the plate recognition model; its info is kept in plate_input_name1/name2/output_name

    std::string xml = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.xml";
    std::string bin = "D:/projects/models/vehicle-license-plate-detection-barrier-0106/FP32/vehicle-license-plate-detection-barrier-0106.bin";
    cv::Mat src = cv::imread("D:/images/car_1.bmp");  // read the image
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the detection network

    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto infer_request = executable_network.CreateInferRequest();

    auto input = infer_request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(src, blob_image, cv::Size(w, h));  // resize the input image to the network input size

    // HWC -> NCHW
    unsigned char* data = static_cast<unsigned char*>(input->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    infer_request.Infer();
    auto output = infer_request.GetBlob(output_name);
    // output: [1, 1, N, 7] with [image_id, label, conf, x_min, y_min, x_max, y_max]
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int max_count = outputDims[2];
    const int object_size = outputDims[3];
    for (int n = 0; n < max_count; n++) {
        const float* object = detection_out + n * object_size;
        float label = object[1];
        float confidence = object[2];
        float xmin = object[3] * im_w;
        float ymin = object[4] * im_h;
        float xmax = object[5] * im_w;
        float ymax = object[6] * im_h;
        if (confidence > 0.5) {
            printf("label id: %d \n", static_cast<int>(label));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);

            if (label == 2) {  // plates are drawn in green
                cv::rectangle(src, box, cv::Scalar(0, 255, 0), 2, 8);
                // recognize plate: expand the ROI slightly around the detection
                cv::Rect plate_roi;
                plate_roi.x = box.x - 5;
                plate_roi.y = box.y - 5;
                plate_roi.width = box.width + 10;
                plate_roi.height = box.height + 10;
                cv::Mat roi = src(plate_roi);
                // call the plate recognition method
                fetch_plate_text(src, roi);
            }
            else {
                cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
            }

            // box.tl() is the top-left corner of the rectangle
            cv::putText(src, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
    }

    cv::namedWindow("out", cv::WINDOW_FREERATIO);
    cv::imshow("out", src);
    cv::waitKey(0);
    return 0;
}

void load_plate_recog_model() {
    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.xml";
    std::string bin = "D:/projects/models/license-plate-recognition-barrier-0001/FP32/license-plate-recognition-barrier-0001.bin";

    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the network
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();

    int cnt = 0;
    for (auto item : inputs) {
        if (cnt == 0) {
            plate_input_name1 = item.first;  // first input: the plate image
            auto input_data = item.second;
            input_data->setPrecision(Precision::U8);
            input_data->setLayout(Layout::NCHW);
        }
        else if (cnt == 1) {
            plate_input_name2 = item.first;  // second input: the sequence indicator
            auto input_data = item.second;
            input_data->setPrecision(Precision::FP32);
        }
        cnt++;
    }
    for (auto item : outputs) {
        plate_output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    plate_request = executable_network.CreateInferRequest();
}

void fetch_plate_text(cv::Mat &image, cv::Mat &plateROI) {
    // fill the image input
    auto input1 = plate_request.GetBlob(plate_input_name1);
    size_t num_channels = input1->getTensorDesc().getDims()[1];
    size_t h = input1->getTensorDesc().getDims()[2];
    size_t w = input1->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(plateROI, blob_image, cv::Size(94, 24));  // resize the ROI to the network input size

    // HWC -> NCHW
    unsigned char* data = static_cast<unsigned char*>(input1->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    // fill the sequence input with [0, 1, 1, ..., 1]
    auto input2 = plate_request.GetBlob(plate_input_name2);
    int max_sequence = input2->getTensorDesc().getDims()[0];  // output character length
    float* blob2 = input2->buffer().as<float*>();
    blob2[0] = 0.0f;
    std::fill(blob2 + 1, blob2 + max_sequence, 1.0f);  // fill the rest of the range with 1.0

    plate_request.Infer();  // run inference
    auto output = plate_request.GetBlob(plate_output_name);  // fetch the result
    const float* plate_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    std::string result;
    for (int i = 0; i < max_sequence; i++) {
        int index = static_cast<int>(plate_data[i]);
        if (index == -1) break;  // end of sequence
        result += items[index];
    }
    std::cout << "plate text: " << result << std::endl;
    cv::putText(image, result.c_str(), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2, 8);
}

Result: vehicles and plates are boxed on the image, and the recognized plate text is printed and drawn.

5. Pedestrian Detection, Face Detection, and Emotion Recognition

Pedestrian detection

Model introduction

  • pedestrian-detection-adas-0002
  • SSD MobileNetv1
  • Input format: [1 * 3 * 384 * 672]
  • Output format: [1, 1, N, 7]

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // fstream for file I/O, iostream for console I/O

using namespace InferenceEngine;
void infer_process(cv::Mat &frame, InferenceEngine::InferRequest &request, std::string &input_name, std::string &output_name);

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.xml";
    std::string bin = "D:/projects/models/pedestrian-detection-adas-0002/FP32/pedestrian-detection-adas-0002.bin";
    cv::Mat src = cv::imread("D:/images/pedestrians_test.jpg");  // read the image
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  // read the pedestrian detection network

    // query and configure network inputs and outputs
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        // A->B accesses member B of A
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto infer_request = executable_network.CreateInferRequest();

    infer_process(src, infer_request, input_name, output_name);
    cv::namedWindow("out", cv::WINDOW_FREERATIO);
    cv::imshow("out", src);
    cv::waitKey(0);
    return 0;
}

void infer_process(cv::Mat &frame, InferenceEngine::InferRequest &request, std::string &input_name, std::string &output_name) {
    int im_h = frame.rows;
    int im_w = frame.cols;
    auto input = request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(frame, blob_image, cv::Size(w, h));  // resize the input image to the network input size

    // HWC -> NCHW
    unsigned char* data = static_cast<unsigned char*>(input->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    request.Infer();
    auto output = request.GetBlob(output_name);
    // output: [1, 1, N, 7] with [image_id, label, conf, x_min, y_min, x_max, y_max]
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();
    const int max_count = outputDims[2];
    const int object_size = outputDims[3];
    for (int n = 0; n < max_count; n++) {
        const float* object = detection_out + n * object_size;
        float label = object[1];
        float confidence = object[2];
        float xmin = object[3] * im_w;
        float ymin = object[4] * im_h;
        float xmax = object[5] * im_w;
        float ymax = object[6] * im_h;
        if (confidence > 0.9) {
            printf("label id: %d \n", static_cast<int>(label));
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);

            if (label == 2) {  // draw different classes in different colors
                cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2, 8);
            }
            else {
                cv::rectangle(frame, box, cv::Scalar(0, 0, 255), 2, 8);
            }

            // box.tl() is the top-left corner of the rectangle
            cv::putText(frame, cv::format("%.2f", confidence), box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
    }
}

Result: pedestrians with confidence above 0.9 are boxed in the image.

Real-time face detection with asynchronous inference

Model introduction

  • Face detection: face-detection-0202, SSD-MobileNetv2
  • Input format: 1 * 3 * 384 * 384
  • Output format: [1, 1, N, 7]
  • OpenVINO ships a family of face detection models, 0202 through 0206

Synchronous vs. asynchronous execution
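The double-buffered pattern used below can be sketched without OpenVINO at all (a toy illustration; `ToyRequest` and `pipeline` are names invented here, standing in for InferenceEngine::InferRequest and the video loop):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A toy "infer request": StartAsync begins the work and Wait collects the
// result (here the work is just done eagerly for illustration).
struct ToyRequest {
    int frame = -1;
    std::string result;
    void StartAsync() { result = "det-" + std::to_string(frame); }  // pretend inference
    std::string Wait() { return result; }
};

// Double-buffered loop: while the current frame's result is consumed,
// the next frame's request has already been started.
std::vector<std::string> pipeline(const std::vector<int>& frames) {
    std::vector<std::string> results;
    if (frames.empty()) return results;
    ToyRequest curr, next;
    curr.frame = frames[0];
    curr.StartAsync();
    for (size_t i = 1; i < frames.size(); i++) {
        next.frame = frames[i];
        next.StartAsync();               // kick off the next frame before waiting
        results.push_back(curr.Wait());  // consume the current frame's result
        std::swap(curr, next);           // next becomes current
    }
    results.push_back(curr.Wait());
    return results;
}
```

With real requests the two StartAsync calls overlap inference with decoding and drawing, which is where the throughput gain comes from.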

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // fstream for file I/O, iostream for console I/O

using namespace InferenceEngine;

// image preprocessing: copy a U8 HWC cv::Mat into an NCHW blob
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob expects a blob inherited from MemoryBlob";
    }
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
    // image preprocessing; -> accesses members through the pointer
    InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  // fetch the network's input blob
    // matU8ToBlob is a template, so the element type must be given explicitly
    matU8ToBlob<uchar>(frame, input);
}

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.xml";
    std::string bin = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);

    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();
    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto curr_infer_request = executable_network.CreateInferRequestPtr();  // two requests for double buffering
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    cv::VideoCapture capture("D:/images/video/face_detect.mp4");
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);  // read one frame as the current frame
    int im_h = curr_frame.rows;
    int im_w = curr_frame.cols;
    frameToBlob(curr_infer_request, curr_frame, input_name);
    bool first_frame = true;  // two flags control when requests are started
    bool last_frame = false;
    // curr is parsed and displayed while next is preprocessed and inferred
    while (true) {
        int64 start = cv::getTickCount();  // timing
        bool ret = capture.read(next_frame);  // read one frame as the next frame
        if (!ret) {
            last_frame = true;  // the next frame is empty: this is the last iteration
        }
        if (!last_frame) {
            frameToBlob(next_infer_request, next_frame, input_name);
        }
        if (first_frame) {  // on the first iteration start both requests, then clear the flag
            curr_infer_request->StartAsync();
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else {  // afterwards only the next request needs to be started
            if (!last_frame) {
                next_infer_request->StartAsync();
            }
        }
        // wait until the current request has finished
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
            auto output = curr_infer_request->GetBlob(output_name);
            // output: [1, 1, N, 7] with [image_id, label, conf, x_min, y_min, x_max, y_max]
            const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
            const SizeVector outputDims = output->getTensorDesc().getDims();
            const int max_count = outputDims[2];
            const int object_size = outputDims[3];
            for (int n = 0; n < max_count; n++) {
                const float* object = detection_out + n * object_size;
                float confidence = object[2];
                if (confidence > 0.5) {
                    cv::Rect box;
                    box.x = static_cast<int>(object[3] * im_w);
                    box.y = static_cast<int>(object[4] * im_h);
                    box.width = static_cast<int>(object[5] * im_w) - box.x;
                    box.height = static_cast<int>(object[6] * im_h) - box.y;
                    cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);
                }
            }
            // getTickCount() differences give elapsed clock cycles; getTickFrequency() is cycles per second
            float fps = static_cast<float>(cv::getTickFrequency()) / (cv::getTickCount() - start);
            cv::putText(curr_frame, cv::format("FPS: %.2f", fps), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        }
        cv::imshow("face detection (async)", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27) {  // ESC
            break;
        }
        if (last_frame) {  // the next frame was empty: stop
            break;
        }

        // swap buffers: the next frame becomes the current frame, and the requests are exchanged
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);  // swap works because these are smart pointers
    }
    return 0;
}

Result: faces are detected in the video in real time, with the FPS overlaid.

Real-time facial emotion recognition

Model introduction

  • Face detection: face-detection-0202, SSD-MobileNetv2
  • Input format: 1 * 3 * 384 * 384
  • Output format: [1, 1, N, 7]
  • Emotion recognition: emotions-recognition-retail-0003
  • Input: 1 * 3 * 64 * 64
  • Output: [1, 5, 1, 1] - ('neutral', 'happy', 'sad', 'surprise', 'anger')
  • Download emotions-recognition-retail-0003 the same way as before
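Picking the label from the [1, 5, 1, 1] output is a plain argmax over the five probabilities, which can be sketched as (illustrative helper; `topEmotion` is a name invented here):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static const char* const emotions[5] = {"neutral", "happy", "sad", "surprise", "anger"};

// Return the emotion with the highest probability from the 1x5x1x1 output.
const char* topEmotion(const float* probs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < 5; i++) {
        if (probs[i] > probs[best]) best = i;
    }
    return emotions[best];
}
```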

Synchronous vs. asynchronous execution (as in the previous section)

Code implementation

#include <inference_engine.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>  // fstream for file I/O, iostream for console I/O

using namespace InferenceEngine;

static const char *const items[] = {
    "neutral","happy","sad","surprise","anger"
};

// image preprocessing: copy a U8 HWC cv::Mat into an NCHW blob
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob expects a blob inherited from MemoryBlob";
    }
    auto mblobHolder = mblob->wmap();

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

void fetch_emotion(cv::Mat& image, InferenceEngine::InferRequest& request, cv::Rect& face_roi, std::string& e_input, std::string& e_output);

void frameToBlob(std::shared_ptr<InferenceEngine::InferRequest>& request, cv::Mat& frame, std::string& input_name) {
    // image preprocessing; -> accesses members through the pointer
    InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  // fetch the network's input blob
    // matU8ToBlob is a template, so the element type must be given explicitly
    matU8ToBlob<uchar>(frame, input);
}

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    // load the face detection model
    std::string xml = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.xml";
    std::string bin = "D:/projects/models/face-detection-0202/FP32/face-detection-0202.bin";
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);
    // query and configure network inputs and outputs
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();

    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        // A->B accesses member B of A
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // the output stays floating point
    }
    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto curr_infer_request = executable_network.CreateInferRequestPtr();  // two requests for async double buffering
    auto next_infer_request = executable_network.CreateInferRequestPtr();

    // load the emotion recognition model
    std::string em_xml = "D:/projects/models/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.xml";
    std::string em_bin = "D:/projects/models/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.bin";
    InferenceEngine::CNNNetwork em_network = ie.ReadNetwork(em_xml, em_bin);
    InferenceEngine::InputsDataMap em_inputs = em_network.getInputsInfo();
    InferenceEngine::OutputsDataMap em_outputs = em_network.getOutputsInfo();
    std::string em_input_name = "";
    for (auto item : em_inputs) {
        em_input_name = item.first;
        auto em_input_data = item.second;
        em_input_data->setPrecision(Precision::U8);
        em_input_data->setLayout(Layout::NCHW);
    }
    std::string em_output_name = "";
    for (auto item : em_outputs) {
        em_output_name = item.first;
        auto em_output_data = item.second;
        em_output_data->setPrecision(Precision::FP32);  // the output stays floating point
    }
    auto executable_em_network = ie.LoadNetwork(em_network, "CPU");  // choose the target device
    auto em_request = executable_em_network.CreateInferRequest();  // create the emotion inference request

    cv::VideoCapture capture("D:/images/video/face_detect.mp4");
    cv::Mat curr_frame;
    cv::Mat next_frame;
    capture.read(curr_frame);  //先读取一帧作为当前帧
    int im_h = curr_frame.rows;
    int im_w = curr_frame.cols;
    frameToBlob(curr_infer_request, curr_frame, input_name);
    bool first_frame = true;  //设置两个bool变量控制线程开启
    bool last_frame = false;
    //开启两个线程,curr转换显示结果,next预处理图像,预处理后交换给curr
    while (true) {
        int64 start = cv::getTickCount();  //计时
        bool ret = capture.read(next_frame);  //读取一帧作为下一帧
        if (!ret) {
            last_frame = true;  //如果下一帧为空,则last_frame为true
        }
        if (!last_frame) {  //如果last_frame为false则预处理下一帧图像
            frameToBlob(next_infer_request, next_frame, input_name);
        }
        if (first_frame) {  //如果first_frame为true则开启两个线程,同时修改first_frame为false,避免多次开启线程
            curr_infer_request->StartAsync();  //开启线程
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else {  //如果first_frame与last_frame同为false表示只有下一帧不为空,则开启一个next线程
            if (!last_frame) {
                next_infer_request->StartAsync();
            }
        }
        //判断当前请求是否预处理完毕
        // wait until the current request has finished
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
            auto output = curr_infer_request->GetBlob(output_name);
            // output: [1, 1, N, 7] with [image_id, label, conf, x_min, y_min, x_max, y_max]
            const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
            const SizeVector outputDims = output->getTensorDesc().getDims();
            const int max_count = outputDims[2];
            const int object_size = outputDims[3];
            for (int n = 0; n < max_count; n++) {
                const float* object = detection_out + n * object_size;
                float label = object[1];
                float confidence = object[2];
                float xmin = object[3] * im_w;
                float ymin = object[4] * im_h;
                float xmax = object[5] * im_w;
                float ymax = object[6] * im_h;
                if (confidence > 0.5) {
                    printf("label id: %d \n", static_cast<int>(label));
                    cv::Rect box;
                    xmin = xmin < 0 ? 0 : xmin;  // clamp to avoid leaving the image
                    ymin = ymin < 0 ? 0 : ymin;
                    xmax = xmax > im_w ? im_w : xmax;
                    ymax = ymax > im_h ? im_h : ymax;
                    box.x = static_cast<int>(xmin);
                    box.y = static_cast<int>(ymin);
                    box.width = static_cast<int>(xmax - xmin);
                    box.height = static_cast<int>(ymax - ymin);

                    cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);
                    fetch_emotion(curr_frame, em_request, box, em_input_name, em_output_name);  // classify this face's emotion

                    float fps = static_cast<float>(cv::getTickFrequency()) / (cv::getTickCount() - start);
                    cv::putText(curr_frame, cv::format("FPS:%.2f", fps), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
                }
            }
        }
        // show the result
        cv::imshow("emotion recognition (async)", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27) {  // ESC
            break;
        }
        if (last_frame) {  // the next frame was empty: stop
            break;
        }

        // swap buffers: the next frame becomes the current frame, and the requests are exchanged
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);  // swap works because these are smart pointers
    }

    cv::waitKey(0);
    return 0;
}

// classify the emotion of a detected face
void fetch_emotion(cv::Mat& image, InferenceEngine::InferRequest& request, cv::Rect& face_roi, std::string& e_input, std::string& e_output) {

    cv::Mat faceROI = image(face_roi);  // crop the face region
    // preprocess the face ROI into the emotion network's input blob
    auto blob = request.GetBlob(e_input);
    matU8ToBlob<uchar>(faceROI, blob);

    request.Infer();  // run inference

    auto output = request.GetBlob(e_output);
    // output: [1, 5, 1, 1] probabilities
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();
    float max_prob = probs[0];
    int max_index = 0;
    for (int i = 1; i < outputDims[1]; i++) {  // argmax over the five emotions
        if (probs[i] > max_prob) {
            max_prob = probs[i];
            max_index = i;
        }
    }
    cv::putText(image, items[max_index], face_roi.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2, 8);
}

Result: each detected face is labeled with its predicted emotion in real time.

Face landmark detection

Model introduction

  • face-detection-0202 - face detection
  • facial-landmarks-35-adas-0002 - landmark extraction
  • Input format: [1 * 3 * 60 * 60]
  • Output format: [1, 70]
  • Outputs 35 facial landmarks as floating point coordinates
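The 1 * 70 output stores normalized (x, y) pairs relative to the face ROI, so mapping them back into image coordinates is a simple scale-and-offset (illustrative helper; `Point2f` and `decodeLandmarks` are names invented here):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2f { float x, y; };

// Scale normalized (x0, y0, ..., x34, y34) landmark pairs from ROI-relative
// coordinates back into full-image coordinates.
std::vector<Point2f> decodeLandmarks(const float* out, int num_points,
                                     float roi_x, float roi_y, float roi_w, float roi_h) {
    std::vector<Point2f> pts;
    for (int i = 0; i < num_points * 2; i += 2) {
        pts.push_back({roi_x + out[i] * roi_w, roi_y + out[i + 1] * roi_h});
    }
    return pts;
}
```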

Program flow: detect faces with the asynchronous pipeline above, then run the landmark network on each detected face ROI.

代码实现

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

using namespace InferenceEngine;

//图像预处理函数
template 
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<:memoryblob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION wmap();

    T* blob_data = mblobHolder.as();

    cv::Mat resized_image(orig_image);
    if (static_cast(width) != orig_image.size().width ||
        static_cast(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    for (size_t c = 0; c (h, w)[c];
            }
        }
    }
}

void frameToBlob(std::shared_ptr<:inferrequest>& request, cv::Mat& frame, std::string& input_name) {
    //图像预处理,输入数据 ->指针获取成员方法
    InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  //获取网络输入图像信息
    //该函数template模板类型,需要指定具体类型
    matU8ToBlob(frame, input);  //使用该函数处理输入数据
}

InferenceEngine::InferRequest landmark_request;  //提高推理请求作用域
void loadLandmarksRequest(Core& ie, std::string& land_input_name, std::string& land_output_name);
int main(int argc, char** argv) {

    InferenceEngine::Core ie;
    std::vector<:string> devices = ie.GetAvailableDevices();
    for (std::string name : devices) {
        std::cout ();
    std::cout B 表示提取A中的成员B
        input_data->setPrecision(Precision::U8);  //默认为unsigned char对应U8
        input_data->setLayout(Layout::NCHW);
        //input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  默认就是BGR
        std::cout setPrecision(Precision::FP32);  //输出还是浮点数
        //注意:output_data不要设置结构
        std::cout StartAsync();  //开启线程
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else {  //如果first_frame与last_frame同为false表示只有下一帧不为空,则开启一个next线程
            if (!last_frame) {
                next_infer_request->StartAsync();
            }
        }
        //判断当前请求是否预处理完毕
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
            auto output = curr_infer_request->GetBlob(output_name);
            //转换输出数据
            const float* detection_out = static_cast::value_type*>(output->buffer());
            //output:[1, 1, N, 7]
            //七个参数为:[image_id, label, conf, x_min, y_min, x_max, y_max]
            const SizeVector outputDims = output->getTensorDesc().getDims();  //获取输出维度信息 1*1000
            std::cout  0.5) {
                    printf("label id: %d n", static_cast(label));
                    cv::Rect box;

                    float x1 = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));  //防止目标区域越界
                    float y1 = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
                    float x2 = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
                    float y2 = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));

                    box.x = static_cast<int>(x1);
                    box.y = static_cast<int>(y1);
                    box.width = static_cast<int>(x2 - x1);
                    box.height = static_cast<int>(y2 - y1);

                    cv::Mat face_roi = curr_frame(box);
                    auto face_input_blob = landmark_request.GetBlob(land_input_name);
                    matU8ToBlob<uchar>(face_roi, face_input_blob);
                    landmark_request.Infer();  //执行推理获取目标区域面部特征点

                    auto land_output = landmark_request.GetBlob(land_output_name);
                    const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(land_output->buffer());
                    const SizeVector land_dims = land_output->getTensorDesc().getDims();
                    const int b = land_dims[0];
                    const int cc = land_dims[1];

                    //共70个特征值(x0, y0, x1, y1, ..., x34, y34),即35个特征点,所以每次 +2
                    for (int i = 0; i < cc; i += 2) {
                        float x = blob_out[i] * box.width + box.x;       //输出为相对目标区域的比例坐标
                        float y = blob_out[i + 1] * box.height + box.y;
                        cv::circle(curr_frame, cv::Point(static_cast<int>(x), static_cast<int>(y)), 2, cv::Scalar(255, 0, 0), 2, 8, 0);
                    }
                    cv::rectangle(curr_frame, box, cv::Scalar(0, 0, 255), 2, 8);
                }
            }
        }
        float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
                    std::cout setPrecision(Precision::U8);  //默认为unsigned char对应U8
        input_data->setLayout(Layout::NCHW);
    }
    for (auto item : outputs) {  //auto可以自动推断变量类型
        land_output_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  //输出还是浮点数
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");  //设置运行的设备
    landmark_request = executable_network.CreateInferRequest();  //设置推理请求
}

效果:

OpenVINO计算机视觉模型加速插图26

6、图像语义分割与实例分割

实时道路语义分割

  • 识别道路、背景、路边、标志线四个类别

道路分割模型介绍

  • 模型:road-segmentation-adas-0001
  • 输入格式:[B, C=3, H=512, W=896], BGR
  • 输出格式:[B, C=4, H=512, W=896]
  • 四个类别:BG, road, curb, mark

程序流程

OpenVINO计算机视觉模型加速插图27

代码实现

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

using namespace InferenceEngine;

//图像预处理函数
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob只支持MemoryBlob类型的blob";
    }
    auto mblobHolder = mblob->wmap();  //获取blob的可写内存

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    //HWC => NCHW,逐通道、逐像素拷贝
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

void frameToBlob(InferenceEngine::InferRequest::Ptr& request, cv::Mat& frame, std::string& input_name) {
    //图像预处理:通过请求获取输入blob,再把Mat数据写入blob
    InferenceEngine::Blob::Ptr input = request->GetBlob(input_name);  //获取网络输入图像信息
    //matU8ToBlob是模板函数,需要指定具体类型
    matU8ToBlob<uchar>(frame, input);  //使用该函数处理输入数据
}

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.xml";
    std::string bin = "D:/projects/models/road-segmentation-adas-0001/FP32/road-segmentation-adas-0001.bin";

    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //读取车辆检测网络

    //获取网络输入输出信息并设置
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一个Mat数组
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一个Mat数组
    std::string input_name = "";
    for (auto item : inputs) {  //auto可以自动推断变量类型
        input_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
        auto input_data = item.second;
        // A->B 表示提取A中的成员B
        input_data->setPrecision(Precision::U8);  //默认为unsigned char对应U8
        input_data->setLayout(Layout::NCHW);
        //input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  默认就是BGR
        std::cout setPrecision(Precision::FP32);  //输出还是浮点数
        //注意:output_data不要设置结构
        std::cout  color_tab;  //设置分割输出图像中的不同颜色代表不同分类
    color_tab.push_back(cv::Vec3b(0, 0, 0));  //背景
    color_tab.push_back(cv::Vec3b(255, 0, 0));  //道路
    color_tab.push_back(cv::Vec3b(0, 0, 255));  //路边
    color_tab.push_back(cv::Vec3b(0, 255, 255));  //路标

    //开启两个线程,curr转换显示结果,next预处理图像,预处理后交换给curr
    while (true) {
        int64 start = cv::getTickCount();  //计时
        bool ret = capture.read(next_frame);  //读取一帧作为下一帧
        if (!ret) {
            last_frame = true;  //如果下一帧为空,则last_frame为true
        }
        if (!last_frame) {  //如果last_frame为false则预处理下一帧图像
            frameToBlob(next_infer_request, next_frame, input_name);
        }
        if (first_frame) {  //如果first_frame为true则开启两个线程,同时修改first_frame为false,避免多次开启线程
            curr_infer_request->StartAsync();  //开启线程
            next_infer_request->StartAsync();
            first_frame = false;
        }
        else {  //如果first_frame与last_frame同为false表示只有下一帧不为空,则开启一个next线程
            if (!last_frame) {
                next_infer_request->StartAsync();
            }
        }
        //判断当前帧的推理是否完成
        if (InferenceEngine::OK == curr_infer_request->Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY)) {
            auto output = curr_infer_request->GetBlob(output_name);
            //转换输出数据
            const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());

            //output:[B, C, H, W]
            const SizeVector outputDims = output->getTensorDesc().getDims();  //获取输出维度信息 [1, 4, 512, 896]

            //每个像素针对每种分类分别有一个识别结果数值,数值最大的为该像素的分类
            //结果矩阵格式为:每种分类各有一个输出图像大小的矩阵,每个像素位置对应其在该分类的可能性
            const int out_c = outputDims[1];  //分割识别的类型个数,此处为4
            const int out_h = outputDims[2];  //分割网络输出图像的高
            const int out_w = outputDims[3];  //分割网络输出图像的宽
            cv::Mat result = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8UC3);
            int step = out_h * out_w;
            for (int row = 0; row < out_h; row++) {
                for (int col = 0; col < out_w; col++) {
                    int max_index = 0;
                    float max_prob = detection_out[row * out_w + col];
                    for (int cn = 1; cn < out_c; cn++) {
                        float prob = detection_out[cn * step + row * out_w + col];
                        if (prob > max_prob) {
                            max_prob = prob;
                            max_index = cn;
                        }
                    }
                    //在结果矩阵中对应像素位置保存该像素分类对应的颜色
                    result.at<cv::Vec3b>(row, col) = color_tab[max_index];
                }
            }
            //先初始化一个网络输出结果大小的矩阵保存每个像素点对应的颜色,再将结果矩阵恢复到原图大小,以便最终结果显示
            cv::resize(result, result, cv::Size(im_w, im_h));
            //在输入图像中对应位置按比例增加结果矩阵中对应的颜色
            cv::addWeighted(curr_frame, 0.5, result, 0.5, 0, curr_frame);
        }
        //getTickCount()相减得到cpu走过的时钟周期数,getTickFrequency()得到cpu一秒走过的时钟周期数
        float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
        cv::putText(curr_frame, cv::format("FPS: %.2f", 1.0 / t), cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
        //显示结果
        cv::imshow("道路分割异步显示", curr_frame);
        char c = cv::waitKey(1);
        if (c == 27) {  //ESC
            break;
        }
        if (last_frame) {  //如果last_frame为true表示下一帧为空,则跳出循环
            break;
        }

        //异步交换,下一帧复制到当前帧,当前帧请求与下一帧请求交换
        next_frame.copyTo(curr_frame);
        curr_infer_request.swap(next_infer_request);  //指针可以使用swap方法,否则不行
    }

    cv::waitKey(0);
    return 0;
}

效果:

黄色为路面标志,红色为路边,蓝色为道路,其余部分为背景

OpenVINO计算机视觉模型加速插图28

实例分割

实例分割模型介绍(Mask R-CNN)

  • instance-segmentation-security-0050
  • 有两个输入层:
  • im_data: [1 * 3 * 480 * 480],图像数据 N * C * H * W(num、channels、height、width)
  • im_info: [1 * 3],图像信息,宽、高和scale
  • 输出格式:
  • classes: [100, ],最多100个实例,属于不超过80个分类
  • scores: [100, ],每个检测到对象不是背景的概率
  • Boxes: [100, 4],每个检测到的对象的位置(左上角及右下角坐标)
  • raw_masks: [100, 81, 28, 28],对每个实例、81个类别(80个类别+背景)各输出一个28 * 28大小的mask概率矩阵
  • 实际内存中的结果矩阵是 100 * 81 * 28 * 28

代码实现

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

using namespace InferenceEngine;
/*
void read_coco_labels(std::vector<std::string>& labels) {
    std::string label_file = "D:/projects/models/coco_labels.txt";
    std::ifstream fp(label_file);
    if (!fp.is_open())
    {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof())
    {
        std::getline(fp, name);
        if (name.length())
            labels.push_back(name);
    }
    fp.close();
}
*/

//图像预处理函数
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob只支持MemoryBlob类型的blob";
    }
    auto mblobHolder = mblob->wmap();  //获取blob的可写内存

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    //HWC => NCHW,逐通道、逐像素拷贝
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

int main(int argc, char** argv) {

    InferenceEngine::Core ie;
    std::vector<std::string> coco_labels;
    //read_coco_labels(coco_labels);
    cv::RNG rng(12345);

    std::string xml = "D:/projects/models/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.xml";
    std::string bin = "D:/projects/models/instance-segmentation-security-0050/FP32/instance-segmentation-security-0050.bin";
    cv::Mat src = cv::imread("D:/images/instance_segmentation.jpg");  //读取图像
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //读取车辆检测网络

    //获取网络输入输出信息
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一个Mat数组
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一个Mat数组
    std::string image_input_name = "";
    std::string image_info_name = "";
    int in_index = 0;

    //设置两个网络输入数据的参数
    for (auto item : inputs) {  //auto可以自动推断变量类型
        if (in_index == 0) {
            image_input_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
            auto input_data = item.second;
            // A->B 表示提取A中的成员B
            input_data->setPrecision(Precision::U8);  //默认为unsigned char对应U8,浮点类型则为FP32
            input_data->setLayout(Layout::NCHW);
        }
        else {
            image_info_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
            auto input_data = item.second;
            // A->B 表示提取A中的成员B
            input_data->setPrecision(Precision::FP32);  //默认为unsigned char对应U8,浮点类型则为FP32
        }
        in_index++;
    }

    for (auto item : outputs) {  //auto可以自动推断变量类型
        std::string output_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  //输出还是浮点数
        //注意:output_data不要设置结构
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");  //设置运行的设备
    auto infer_request = executable_network.CreateInferRequest();  //设置推理请求

    //图像预处理
    auto input = infer_request.GetBlob(image_input_name);  //获取网络输入图像信息
    matU8ToBlob<uchar>(src, input);  //将输入图像转换为网络的输入格式

    //设置网络的第二个输入
    auto input2 = infer_request.GetBlob(image_info_name);
    auto imInforDim = inputs.find(image_info_name)->second->getTensorDesc().getDims()[1];
    InferenceEngine::MemoryBlob::Ptr minput2 = InferenceEngine::as<InferenceEngine::MemoryBlob>(input2);
    auto minput2Holder = minput2->wmap();
    float* p = minput2Holder.as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type*>();
    p[0] = static_cast<float>(inputs[image_input_name]->getTensorDesc().getDims()[2]);  //输入图像的高
    p[1] = static_cast<float>(inputs[image_input_name]->getTensorDesc().getDims()[3]);  //输入图像的宽
    p[2] = 1.0f;  //scale,前面图像已经转换为480*480,这里保持为1.0就可以

    infer_request.Infer();

    float w_rate = static_cast<float>(im_w) / 480.0f;  //用于通过网络输出中的坐标获取原图的坐标
    float h_rate = static_cast<float>(im_h) / 480.0f;

    auto scores = infer_request.GetBlob("scores");  //获取网络输出中的信息
    auto boxes = infer_request.GetBlob("boxes");
    auto classes = infer_request.GetBlob("classes");
    auto raw_masks = infer_request.GetBlob("raw_masks");
    //转换输出数据
    const float* scores_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(scores->buffer());  //强制转换数据类型为浮点型
    const float* boxes_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(boxes->buffer());
    const float* classes_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(classes->buffer());
    const auto raw_masks_data = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(raw_masks->buffer());
    const SizeVector scores_outputDims = scores->getTensorDesc().getDims();  //[100]
    const SizeVector boxes_outputDims = boxes->getTensorDesc().getDims();  //[100, 4]
    const SizeVector raw_masks_outputDims = raw_masks->getTensorDesc().getDims();  //[100, 81, 28, 28]
    const int max_count = scores_outputDims[0];  //识别出的对象个数
    const int object_size = boxes_outputDims[1];  //获取对象信息的个数,此处为4个
    printf("mask NCHW=[%d, %d, %d, %d]\n", static_cast<int>(raw_masks_outputDims[0]), static_cast<int>(raw_masks_outputDims[1]), static_cast<int>(raw_masks_outputDims[2]), static_cast<int>(raw_masks_outputDims[3]));
    int mask_h = raw_masks_outputDims[2];
    int mask_w = raw_masks_outputDims[3];
    size_t box_stride = mask_h * mask_w * raw_masks_outputDims[1];  //两个mask之间的距离
    for (int n = 0; n < max_count; n++) {
        float conf = scores_data[n];
        if (conf > 0.5) {  //置信度阈值
            float xmin = boxes_data[n * object_size] * w_rate;      //映射回原图坐标
            float ymin = boxes_data[n * object_size + 1] * h_rate;
            float xmax = boxes_data[n * object_size + 2] * w_rate;
            float ymax = boxes_data[n * object_size + 3] * h_rate;
            cv::Scalar color(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255));
            cv::Rect box;
            float x1 = std::min(std::max(0.0f, xmin), static_cast<float>(im_w));  //避免越界
            float y1 = std::min(std::max(0.0f, ymin), static_cast<float>(im_h));
            float x2 = std::min(std::max(0.0f, xmax), static_cast<float>(im_w));
            float y2 = std::min(std::max(0.0f, ymax), static_cast<float>(im_h));
            box.x = static_cast<int>(x1);
            box.y = static_cast<int>(y1);
            box.width = static_cast<int>(x2 - x1);
            box.height = static_cast<int>(y2 - y1);
            int label = static_cast<int>(classes_data[n]);  //类别标签

效果:

OpenVINO计算机视觉模型加速插图29

7、场景文字检测与识别

场景文字检测

模型介绍

  • text-detection-0003
  • PixelLink模型库,BGR顺序
  • 1个输入层:[B, C, H, W] [1 * 3 * 768 * 1280]
  • 2个输出层:
  • model/link_logits_/add:[1x16x192x320] - 像素与周围像素的联系
  • model/segm_logits/add:[1x2x192x320] - 每个像素所属分类(文本/非文本),只要解析第二个输出就可以获取文本区域

代码实现

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

using namespace InferenceEngine;

//图像预处理函数
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob只支持MemoryBlob类型的blob";
    }
    auto mblobHolder = mblob->wmap();  //获取blob的可写内存

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    //HWC => NCHW,逐通道、逐像素拷贝
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.xml";
    std::string bin = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.bin";
    cv::Mat src = cv::imread("D:/images/text_detection.png");  //读取图像
    cv::imshow("input", src);
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //读取车辆检测网络

    //获取网络输入输出信息
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一个Mat数组
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一个Mat数组
    std::string image_input_name = "";

    //设置两个网络输入数据的参数
    for (auto item : inputs) {  //auto可以自动推断变量类型
        image_input_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
        auto input_data = item.second;
        // A->B 表示提取A中的成员B
        input_data->setPrecision(Precision::U8);  //默认为unsigned char对应U8,浮点类型则为FP32
        input_data->setLayout(Layout::NCHW);
    }
    std::string output_name1 = "";
    std::string output_name2 = "";
    int out_index = 0;
    for (auto item : outputs) {  //auto可以自动推断变量类型
        if (out_index == 1) {
            output_name2 = item.first;
        }
        else {
            output_name1 = item.first;
        }
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  //输出还是浮点数
        out_index++;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");  //设置运行的设备
    auto infer_request = executable_network.CreateInferRequest();  //设置推理请求

    //图像预处理
    auto input = infer_request.GetBlob(image_input_name);  //获取网络输入图像信息
    //将输入图像转换为网络的输入格式
    matU8ToBlob<uchar>(src, input);

    infer_request.Infer();

    auto output = infer_request.GetBlob(output_name2);  //只解析第二个输出即可
    //转换输出数据
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());

    //output:[B, C, H, W] [1, 2, 192, 320]
    const SizeVector outputDims = output->getTensorDesc().getDims();  //获取输出维度信息 [1, 2, 192, 320]

    //每个像素针对每种分类分别有一个识别结果数值,数值最大的为该像素的分类
    //结果矩阵格式为:每种分类各有一个输出图像大小的矩阵,每个像素位置对应其在该分类的可能性
    const int out_c = outputDims[1];  //分割识别的类型个数,此处为2
    const int out_h = outputDims[2];  //分割网络输出图像的高
    const int out_w = outputDims[3];  //分割网络输出图像的宽
    cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_32F);
    int step = out_h * out_w;
    for (int row = 0; row < out_h; row++) {
        for (int col = 0; col < out_w; col++) {
            float p2 = detection_out[step + row * out_w + col];  //第二个通道为文本类得分
            mask.at<float>(row, col) = p2;
        }
    }
    //先初始化一个网络输出结果大小的矩阵保存每个像素点对应的颜色,再将结果矩阵恢复到原图大小,以便最终结果显示
    cv::resize(mask, mask, cv::Size(im_w, im_h));
    mask = mask * 255;
    mask.convertTo(mask, CV_8U);  //把mask从浮点数转换为整数,并将范围转换为0-255
    cv::threshold(mask, mask, 100, 255, cv::THRESH_BINARY);  //将mask按指定范围进行二值化分割
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (size_t t = 0; t < contours.size(); t++) {
        cv::Rect box = cv::boundingRect(contours[t]);  //取每个文本区域的外接矩形
        cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
    }
    cv::imshow("场景文字检测", src);
    cv::waitKey(0);
    return 0;
}

效果:

OpenVINO计算机视觉模型加速插图30

场景文字识别

模型介绍

  • 模型名称:text-recognition-0012
  • 输入格式 - BCHW = [1 * 1 * 32 * 120],输入的是单通道灰度图
  • 输出层 - WBL = [30, 1, 37],W表示序列长度,共30个时间步,每个时间步对应一行;每行37列,对应37种候选字符的得分
  • 其中37个候选字符为:0123456789abcdefghijklmnopqrstuvwxyz#
  • #表示CTC解码时的空白字符:解码时连续重复的字符会被合并,真实的重复字符之间必须由空白字符隔开,可参见以下博客:超详细讲解CTC理论和实战 - 简书 (jianshu.com)

代码实现

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

using namespace InferenceEngine;

//图像预处理函数
template <typename T>
void matU8ToBlob(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    if (!mblob) {
        THROW_IE_EXCEPTION << "matU8ToBlob只支持MemoryBlob类型的blob";
    }
    auto mblobHolder = mblob->wmap();  //获取blob的可写内存

    T* blob_data = mblobHolder.as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    int batchOffset = batchIndex * width * height * channels;

    //HWC => NCHW,逐通道、逐像素拷贝
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}

//文本识别预处理
void loadTextRecogRequest(Core& ie, std::string& reco_input_name, std::string& reco_output_name);
std::string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz#";  //用于匹配的字符表
std::string ctc_decode(const float* blob_out, int seq_w, int seq_l);  //CTC字符匹配函数
InferenceEngine::InferRequest reco_request;
int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.xml";
    std::string bin = "D:/projects/models/text-detection-0003/FP32/text-detection-0003.bin";
    cv::Mat src = cv::imread("D:/images/text_detection02.png");  //读取图像
    cv::imshow("input", src);
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //读取车辆检测网络

    //获取网络输入输出信息
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  //DataMap是一个Mat数组
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  //DataMap是一个Mat数组
    std::string image_input_name = "";

    //设置两个网络输入数据的参数
    for (auto item : inputs) {  //auto可以自动推断变量类型
        image_input_name = item.first;  //第一个参数是name,第二个参数是结构,第二个参数设置精度与结构
        auto input_data = item.second;
        // A->B 表示提取A中的成员B
        input_data->setPrecision(Precision::U8);  //默认为unsigned char对应U8,浮点类型则为FP32
        input_data->setLayout(Layout::NCHW);
    }
    std::string output_name1 = "";
    std::string output_name2 = "";
    int out_index = 0;
    for (auto item : outputs) {  //auto可以自动推断变量类型
        if (out_index == 1) {
            output_name2 = item.first;
        }
        else {
            output_name1 = item.first;
        }
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  //输出还是浮点数
        out_index++;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");  //设置运行的设备
    auto infer_request = executable_network.CreateInferRequest();  //设置推理请求

    //图像预处理
    auto input = infer_request.GetBlob(image_input_name);  //获取网络输入图像信息
    //将输入图像转换为网络的输入格式
    matU8ToBlob<uchar>(src, input);

    infer_request.Infer();

    auto output = infer_request.GetBlob(output_name2);  //只解析第二个输出即可
    //转换输出数据
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());

    //output:[B, C, H, W] [1, 2, 192, 320]
    const SizeVector outputDims = output->getTensorDesc().getDims();  //获取输出维度信息 [1, 2, 192, 320]

    //每个像素针对每种分类分别有一个识别结果数值,数值最大的为该像素的分类
    //结果矩阵格式为:每种分类各有一个输出图像大小的矩阵,每个像素位置对应其在该分类的可能性
    const int out_c = outputDims[1];  //分割识别的类型个数,此处为2
    const int out_h = outputDims[2];  //分割网络输出图像的高
    const int out_w = outputDims[3];  //分割网络输出图像的宽
    cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8U);
    int step = out_h * out_w;
    for (int row = 0; row < out_h; row++) {
        for (int col = 0; col < out_w; col++) {
            float p2 = detection_out[step + row * out_w + col];  //第二个通道为文本类得分
            if (p2 >= 1.0f) {
                mask.at<uchar>(row, col) = 255;
            }
        }
    }
    //先初始化一个网络输出结果大小的矩阵保存每个像素点对应的颜色,再将结果矩阵恢复到原图大小,以便最终结果显示
    cv::resize(mask, mask, cv::Size(im_w, im_h));

    std::vector<std::vector<cv::Point>> contours;  //初始化一个容器保存轮廓点集
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    cv::Mat gray;
    cv::cvtColor(src, gray, cv::COLOR_BGR2GRAY);
    std::string reco_input_name = "";
    std::string reco_output_name = "";
    loadTextRecogRequest(ie, reco_input_name, reco_output_name);
    std::cout << "text recognition input: " << reco_input_name << ", output: " << reco_output_name << std::endl;

    for (size_t t = 0; t < contours.size(); t++) {
        cv::Rect box = cv::boundingRect(contours[t]);  //取文本区域的外接矩形
        cv::Mat roi = gray(box);                       //从灰度图中截取文本区域
        auto reco_input_blob = reco_request.GetBlob(reco_input_name);
        size_t num_channels = reco_input_blob->getTensorDesc().getDims()[1];
        size_t h = reco_input_blob->getTensorDesc().getDims()[2];
        size_t w = reco_input_blob->getTensorDesc().getDims()[3];
        size_t image_size = h * w;
        cv::Mat blob_image;
        cv::resize(roi, blob_image, cv::Size(w, h));  //转换图像为网络输入大小

        //HWC => NCHW,单通道灰度图逐像素拷贝
        unsigned char* data = static_cast<unsigned char*>(reco_input_blob->buffer());
        for (size_t row = 0; row < h; row++) {
            for (size_t col = 0; col < w; col++) {
                data[row * w + col] = blob_image.at<uchar>(row, col);  //uchar类型无符号 0-255
            }
        }
        reco_request.Infer();

        auto reco_output = reco_request.GetBlob(reco_output_name);
        //获取输出数据的指针
        const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(reco_output->buffer());
        const SizeVector reco_dims = reco_output->getTensorDesc().getDims();
        const int RW = reco_dims[0];  //30
        const int RB = reco_dims[1];  //1
        const int RL = reco_dims[2];  //37
        //通过CTC解码来处理网络输出的数据
        std::string ocr_txt = ctc_decode(blob_out, RW, RL);  //识别输出的数据为字符
        std::cout setPrecision(Precision::U8);
        input_data->setLayout(Layout::NCHW);
    }
    for (auto item : outputs) {
        reco_output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);
    }

    auto exec_network = ie.LoadNetwork(network, "CPU");
    reco_request = exec_network.CreateInferRequest();
}

std::string ctc_decode(const float* blob_out, int seq_w, int seq_l) {
    printf("seq width:%d, seq length:%d\n", seq_w, seq_l);
    std::string res = "";
    bool prev_pad = false;
    const int num_classes = alphabet.length();
    int seq_len = seq_w * seq_l;
    for (int i = 0; i < seq_w; i++) {
        int argmax = 0;
        float max_prob = blob_out[i * seq_l];
        for (int j = 1; j < seq_l; j++) {
            if (blob_out[i * seq_l + j] > max_prob) {
                max_prob = blob_out[i * seq_l + j];
                argmax = j;
            }
        }
        auto symbol = alphabet[argmax];  //遍历查找每个字符的最大可能字符
        if (symbol == '#') {  //去除字符串中的空字符
            //通过prev_pad来控制空字符之后的字符一定会添加到结果字符串中,而两个连续相同字符的第二个不会被添加到结果字符串中
            prev_pad = true;
        }
        else {
            if (res.empty() || prev_pad || (!res.empty() && symbol != res.back())) {  //back()方法获取字符串最后一个字符;front()获取第一个字符
                prev_pad = false;
                res += symbol;  //字符串拼接
            }
        }
    }
    return res;
}

效果:

OpenVINO计算机视觉模型加速插图31

8、模型转换与部署

pytorch模型转换与部署

  • ONNX转换与支持
  • 首先需要保存pth文件,然后转换为ONNX格式文件
  • OpenVINO支持直接读取ONNX格式文件解析
  • ONNX转换为IR文件

pytorch模型转换为onnx模型

从pytorch官网安装:Start Locally | PyTorch

import torch
import torchvision

def main():
    model = torchvision.models.resnet18(pretrained=True).eval()  #设置为推理模式
    dummy_input = torch.randn((1, 3, 224, 224))  #随机输入张量,此处模型输入为3通道、224*224大小的图像
    torch.onnx.export(model, dummy_input, "resnet18.onnx")

if __name__ == '__main__':
    main()

运行后获取的onnx模型文件:

OpenVINO计算机视觉模型加速插图32

onnx模型转换为IR模型

  1. 进入OpenVINO安装路径下的model_optimizer文件夹,路径如下:C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\model_optimizer
  2. 可以通过运行该文件夹中的install_prerequisites文件夹中的bat脚本来安装onnx及tensorflow环境,也可手动根据requirements_onnx.txt文件中的环境要求安装,安装完环境后,以管理员身份运行cmd命令提示符并进入到model_optimizer文件夹下

OpenVINO计算机视觉模型加速插图33

  3. 运行model_optimizer文件夹下mo_onnx.py脚本将onnx模型转换为IR模型,运行后该文件夹下会生成xml及bin两个文件

OpenVINO计算机视觉模型加速插图34

OpenVINO计算机视觉模型加速插图35

运行脚本如下:

python mo_onnx.py --input_model D:/projects/models/resnet18_ir/resnet18.onnx

转换获得的onnx模型及IR模型测试代码

#include 
#include 
#include 

using namespace InferenceEngine;
std::string labels_txt_file = "D:/projects/models/resnet18_ir/imagenet_classes.txt";
std::vector<:string> readClassNames();
int main(int argc, char** argv) {
    InferenceEngine::Core ie;
    std::vector<std::string> devices = ie.GetAvailableDevices();
    for (std::string name : devices) {
        std::cout << "device name: " << name << std::endl;
    }
    std::string onnx = "D:/projects/models/resnet18_ir/resnet18.onnx";
    std::vector<std::string> labels = readClassNames();
    cv::Mat src = cv::imread("D:/images/messi02.jpg");

    //IR和ONNX格式的模型都可以被InferenceEngine读取
    // InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(onnx);
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();

    std::string input_name = "";
    for (auto item : inputs) {
        input_name = item.first;
        auto input_data = item.second;
        input_data->setPrecision(Precision::FP32);
        input_data->setLayout(Layout::NCHW);
        input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);
        std::cout << "output name: " << output_name << std::endl;
    }

    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto infer_request = executable_network.CreateInferRequest();

    auto input = infer_request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(src, blob_image, cv::Size(w, h));
    cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);
    blob_image.convertTo(blob_image, CV_32F);
    blob_image = blob_image / 255.0;
    cv::subtract(blob_image, cv::Scalar(0.485, 0.456, 0.406), blob_image);
    cv::divide(blob_image, cv::Scalar(0.229, 0.224, 0.225), blob_image);

    // HWC => NCHW
    float* data = static_cast<float*>(input->buffer());
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3f>(row, col)[ch];
            }
        }
    }

    infer_request.Infer();

    auto output = infer_request.GetBlob(output_name);
    const float* probs = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    const SizeVector outputDims = output->getTensorDesc().getDims();  //[1, 1000]
    float max_prob = probs[0];
    int max_index = 0;
    for (int i = 1; i < static_cast<int>(outputDims[1]); i++) {  //在1000个类别得分中找最大值
        if (probs[i] > max_prob) {
            max_prob = probs[i];
            max_index = i;
        }
    }
    std::cout << "class index: " << max_index << ", class name: " << labels[max_index] << std::endl;
    cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    cv::imshow("分类结果", src);
    cv::waitKey(0);
    return 0;
}

std::vector<std::string> readClassNames()
{
    std::vector<std::string> classNames;

    std::ifstream fp(labels_txt_file);
    if (!fp.is_open())
    {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof())
    {
        std::getline(fp, name);
        if (name.length())
            classNames.push_back(name);
    }
    fp.close();
    return classNames;
}

效果:

OpenVINO计算机视觉模型加速插图36

tensorflow模型转换与部署

  • 通用参数设置
    • --input_model
    • --transformations_config
    • --tensorflow_object_detection_api_pipeline_config
    • --input_shape
    • --reverse_input_channels(将rgb通道反序转换为bgr方便opencv后续操作)
  • 版本信息要求
  • tensorflow:required:>=1.15.2
  • numpy:required:
  • pip install tensorflow-gpu==1.15.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
  • pip install tensorflow-gpu==1.15.2 -i https://pypi.doubanio.com/simple/
  • networkx>=1.11
  • numpy>=1.14.0,
  • test-generator==0.1.1
  • defusedxml>=0.5.0

获取tensorflow预训练模型及查看OpenVINO模型转换文档

使用mobilenetv2版本pb转换为IR并调用推理

COCO-trained models链接:models/tf1_detection_zoo.md at master · tensorflow/models · GitHub

OpenVINO计算机视觉模型加速插图37

OpenVINO中的tensorflow模型转换链接:https://docs.openvino.ai/2021.2/openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html

获取的预训练模型文件夹中的pipeline.config文件可以对模型进行配置。例如 image_resizer 参数:既可以保持固定的输入图像大小300 * 300,也可以设置为保持原图比例、并把缩放后的图像尺寸限制在一个范围内
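image_resizer 的两种写法大致如下(假设的示意片段,min_dimension、max_dimension 的取值为举例,具体字段以 TF Object Detection API 的 pipeline.config 为准):

```
# 固定输入尺寸:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}

# 或保持原图比例,把尺寸限制在一个范围内:
# image_resizer {
#   keep_aspect_ratio_resizer {
#     min_dimension: 600
#     max_dimension: 1024
#   }
# }
```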

OpenVINO计算机视觉模型加速插图38

pb模型转换为IR模型代码:

python mo_tf.py --input_model=D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --transformations_config extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --input_shape [1,300,300,3]

tensorflow模型转换环境搭建及运行

python版本必须为3.8以下才能安装tensorflow 1.15.2,同时1.0版的tensorflow分cpu版与gpu版两种;

本机安装的python版本是3.8,所以使用conda创建python版本为3.6的虚拟环境用于模型转换:

conda create -n py36 python==3.6.5
conda activate py36
pip install tensorflow==1.15.2 -i https://pypi.doubanio.com/simple/
pip install tensorflow-gpu==1.15.2 -i https://pypi.doubanio.com/simple/
pip install networkx==1.11
pip install numpy==1.18.4
pip install test-generator==0.1.1
pip install defusedxml==0.5.0
cd C:\Program Files (x86)\Intel\openvino_2021.2.185\deployment_tools\model_optimizer
python mo_tf.py --input_model=D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb --transformations_config extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config D:/tensorflow/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --reverse_input_channels --input_shape [1,300,300,3]

模型转换成功及转换得到的xml与bin文件

OpenVINO计算机视觉模型加速插图39

OpenVINO计算机视觉模型加速插图40

模型转换后的IR模型测试代码

#include 
#include 
#include   //fstream文件读写操作,iostream为控制台操作

void read_coco_labels(std::vector<std::string>& labels) {
    std::string label_file = "D:/projects/models/object_detection_classes_coco.txt";
    std::ifstream fp(label_file);
    if (!fp.is_open())
    {
        printf("could not open file...\n");
        exit(-1);
    }
    std::string name;
    while (!fp.eof())
    {
        std::getline(fp, name);
        if (name.length())
            labels.push_back(name);
    }
    fp.close();
}

using namespace InferenceEngine;

int main(int argc, char** argv) {

    InferenceEngine::Core ie;

    std::string xml = "D:/projects/models/tf_ssdv2_ir/frozen_inference_graph.xml";
    std::string bin = "D:/projects/models/tf_ssdv2_ir/frozen_inference_graph.bin";

    std::vector<std::string> coco_labels;
    read_coco_labels(coco_labels);

    cv::Mat src = cv::imread("D:/images/dog_bike_car.jpg");  //读取图像
    int im_h = src.rows;
    int im_w = src.cols;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork(xml, bin);  //读取车辆检测网络

    // query the network's input and output info
    InferenceEngine::InputsDataMap inputs = network.getInputsInfo();  // map of input name -> input info
    InferenceEngine::OutputsDataMap outputs = network.getOutputsInfo();  // map of output name -> output info
    std::string input_name = "";
    for (auto item : inputs) {  // auto deduces the variable type
        input_name = item.first;  // first is the name; second holds the structure, where precision and layout are set
        auto input_data = item.second;
        // A->B accesses member B of A
        input_data->setPrecision(Precision::U8);  // unsigned char maps to U8
        input_data->setLayout(Layout::NCHW);
        //input_data->getPreProcess().setColorFormat(ColorFormat::BGR);  // BGR is already the default
        std::cout << "input name: " << input_name << std::endl;
    }
    std::string output_name = "";
    for (auto item : outputs) {
        output_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);  // outputs stay floating point
        // note: do not set a layout on output_data
        std::cout << "output name: " << output_name << std::endl;
    }

    // load the network onto the device and create an inference request
    auto executable_network = ie.LoadNetwork(network, "CPU");
    auto infer_request = executable_network.CreateInferRequest();

    // set the input data
    auto input = infer_request.GetBlob(input_name);
    size_t num_channels = input->getTensorDesc().getDims()[1];  // size_t is an unsigned integer large enough for any object size in C
    size_t h = input->getTensorDesc().getDims()[2];
    size_t w = input->getTensorDesc().getDims()[3];
    size_t image_size = h * w;
    cv::Mat blob_image;
    cv::resize(src, blob_image, cv::Size(w, h));  // resize the image to the network input size
    //cv::cvtColor(blob_image, blob_image, cv::COLOR_BGR2RGB);  // color-space conversion

    // HWC => NCHW: repack the input image from HWC to NCHW
    unsigned char* data = static_cast<unsigned char*>(input->buffer());  // copy the image into the input buffer
    for (size_t row = 0; row < h; row++) {
        for (size_t col = 0; col < w; col++) {
            for (size_t ch = 0; ch < num_channels; ch++) {
                data[image_size * ch + row * w + col] = blob_image.at<cv::Vec3b>(row, col)[ch];
            }
        }
    }

    infer_request.Infer();
    auto output = infer_request.GetBlob(output_name);
    // interpret the output data
    const float* detection_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
    //output: [1, 1, N, 7]
    //the seven values are: [image_id, label, conf, x_min, y_min, x_max, y_max]
    const SizeVector outputDims = output->getTensorDesc().getDims();  // query the output dimensions
    std::cout << outputDims[2] << "x" << outputDims[3] << std::endl;
    const int max_count = outputDims[2];    // maximum number of detections
    const int object_size = outputDims[3];  // values per detection
    for (int n = 0; n < max_count; n++) {
        float label = detection_out[n * object_size + 1];
        float confidence = detection_out[n * object_size + 2];
        float xmin = detection_out[n * object_size + 3] * im_w;
        float ymin = detection_out[n * object_size + 4] * im_h;
        float xmax = detection_out[n * object_size + 5] * im_w;
        float ymax = detection_out[n * object_size + 6] * im_h;
        if (confidence > 0.7) {
            printf("label id: %d, label name: %s \n", static_cast<int>(label), coco_labels[static_cast<int>(label)].c_str());
            cv::Rect box;
            box.x = static_cast<int>(xmin);
            box.y = static_cast<int>(ymin);
            box.width = static_cast<int>(xmax - xmin);
            box.height = static_cast<int>(ymax - ymin);

            cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2, 8);
            //box.tl() returns the top-left corner of the rectangle
            cv::putText(src, coco_labels[static_cast<int>(label)], box.tl(), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 255, 0), 2, 8);
        }
    }

    //cv::putText(src, labels[max_index], cv::Point(50, 50), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255), 2, 8);
    cv::namedWindow("out", cv::WINDOW_FREERATIO);
    cv::imshow("out", src);
    cv::waitKey(0);
    return 0;
}

Result:

OpenVINO计算机视觉模型加速插图41

9. YOLOv5 Model Deployment and Inference

  • Installing and configuring the PyTorch version of YOLOv5
  • Exporting YOLOv5 to ONNX format
  • Deployment support in OpenVINO

YOLOv5 installation and configuration

It is strongly recommended to install the environment from PyCharm's built-in terminal: it is fast and far less likely to fail.

  • Install the PyTorch trio (CUDA 11.6)
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

OpenVINO计算机视觉模型加速插图42

Download the YOLOv5 project: GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

From the command line, enter the unpacked YOLOv5 project folder, install the dependencies, and test (downloading the pretrained models may require a proxy):

pip install -r requirements.txt  # install everything listed in requirements.txt; on failure, install the failing package manually, then rerun until it succeeds
python detect.py --source data/images --weights yolov5s.pt --conf 0.25  # image detection with the latest yolov5 model
python detect.py --source D:/images/video/Boogie_Up.mp4 --weights yolov5s.pt --conf 0.25  # video detection with the latest yolov5 model

OpenVINO计算机视觉模型加速插图43

Result:

1. Image

OpenVINO计算机视觉模型加速插图44

2. Video

OpenVINO计算机视觉模型加速插图45

Converting YOLOv5 to ONNX

  • Conversion command
python export.py --weights yolov5s.pt --img 640 --batch 1 --include onnx  # the include parameter selects the target model format
  • Converting to IR files works the same way as the ONNX-to-IR step in the PyTorch model conversion
python mo_onnx.py --input_model D:/python/yolov5/yolov5s.onnx

OpenVINO计算机视觉模型加速插图46

OpenVINO计算机视觉模型加速插图47

OpenVINO + YOLOv5 deployment

Overview of how YOLO detection works:

1. The YOLO detection principle

YOLO divides the image into S*S equally sized grid cells, and each of the S*S cells predicts B bounding boxes. Every predicted bounding box carries five values: the object center (x, y), the object height h and width w, and the confidence conf of this prediction (the confidence that the cell contains an object at all). Besides its B bounding boxes, each cell also predicts the class of the object it contains: for C object classes there are C class confidences (the confidence that the object belongs to each class). A single prediction therefore produces S*S*(5*B+C) values.
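The prediction count above can be checked with a quick sketch (the S=7, B=2, C=20 values here are the classic YOLOv1 defaults, used purely as an illustration):

```python
def yolo_output_size(s, b, c):
    # S*S grid cells, each predicting B boxes with 5 values (x, y, w, h, conf)
    # plus C class confidences shared by the cell
    return s * s * (5 * b + c)

# classic YOLOv1 setup: 7x7 grid, 2 boxes per cell, 20 VOC classes
print(yolo_output_size(7, 2, 20))  # 7*7*(5*2+20) = 1470
```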

2. Non-maximum suppression (NMS)
Scheme 1: keep only the prediction with the highest objectness confidence (the confidence that the cell contains an object) and delete all the others. Scheme 2: take the bounding box with the highest objectness confidence as the reference box and compute its IoU with the boxes from the other cells; if the IoU exceeds a threshold, e.g. 0.5, the two boxes are considered to predict the same object, and the one with the lower confidence is deleted.
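Scheme 2 can be sketched in a few lines of plain Python (a minimal illustration, not the OpenCV implementation used in the code later):

```python
def iou(a, b):
    # boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # keep the highest-confidence box, drop any remaining box
    # whose IoU with it exceeds the threshold; repeat
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one
```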

YOLOv5s architecture diagram:

OpenVINO计算机视觉模型加速插图48

  • Load the IR/ONNX model file
  • Input format: RGB, 640 * 640
  • YOLOv5 has three output layers, with downsampling ratios of 32, 16 and 8
  • For a 640 * 640 input, the three output layers have sizes 20, 40 and 80, and each layer uses three anchor (prior box) scales, as listed below
Scale   Anchors (x * y)
80      10 * 13    16 * 30    33 * 23
40      30 * 61    62 * 45    59 * 119
20      116 * 90   156 * 198  373 * 326

OpenVINO计算机视觉模型加速插图49

The model predicts three anchor boxes at every feature point of the 20 * 20, 40 * 40 and 80 * 80 output layers, and each box predicts a class. Each box (not the output tensor) has the dimensions: cx, cy, w, h, conf (the confidence that the box contains a foreground object and fits it well; when several boxes detect the same overlapping object, this value decides which box to keep) + number of classes (the probability of the boxed object being each of the 80 classes in COCO_labels), i.e. 5 + 80 values, as illustrated below (here Hout and Wout can be read as the height and width of each layer):
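Adding up the three output layers gives the familiar YOLOv5 candidate count; a quick arithmetic check of the numbers above:

```python
# three output layers at strides 8, 16, 32 for a 640x640 input
layers = [80, 40, 20]
num_anchors = 3
per_box = 5 + 80  # cx, cy, w, h, conf + 80 COCO classes

total_boxes = sum(num_anchors * s * s for s in layers)
print(total_boxes)            # 25200 candidate boxes per image
print(total_boxes * per_box)  # 2142000 output values per image
```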

OpenVINO计算机视觉模型加速插图50

When parsing the output: iterate anchor0 over every feature point first, then anchor1 over every feature point, and so on. In other words, hold the anchor box fixed, sweep all feature points, and only after all feature points are done move on to the next anchor box and sweep the feature points again.

Reference blog (CSDN, in Chinese): 目标检测之详解yolo的anchor、置信度和类别概率_专注于计算机视觉的AndyJiang的博客-CSDN博客_yolo置信度计算

Output format: [1, 3, 80, 80, 85], [1, 3, 40, 40, 85], [1, 3, 20, 20, 85], where 85 is the (cx, cy, w, h, conf + number of class) described above.

For each layer, every feature point carries 85 values * 3 anchor boxes. Memory is laid out anchor-first: all feature points of the first anchor box are stored consecutively, followed by all feature points of the second anchor box, and so on.
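With that memory layout, the offset of any record can be computed directly; a small sketch (85 values per record, as above):

```python
def box_offset(anchor, row, col, side, per_box=85):
    # all feature points of anchor 0 are stored first, then anchor 1, etc.;
    # within one anchor, feature points are stored row by row
    side_square = side * side
    return anchor * side_square * per_box + (row * side + col) * per_box

# 20x20 layer: the record for anchor 1 at cell (row=0, col=1) starts
# after the 400 records of anchor 0 plus one record of anchor 1
print(box_offset(1, 0, 1, 20))  # (400 + 1) * 85 = 34085
```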

Code implementation

YOLOv5 v6.0

#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <inference_engine.hpp>
#include <fstream>

using namespace std;
using namespace cv;
using namespace cv::dnn;
using namespace InferenceEngine;

class YOLOObjectDetection {
public:
    void detect(std::string xml, std::string bin, std::string filePath, int camera_index);
private:
    void inferAndOutput(cv::Mat &frame, InferenceEngine::InferRequest &request, InferenceEngine::InputsDataMap & input_info,
        InferenceEngine::OutputsDataMap &output_info, float sx, float sy);
};

/*====================================================================*/

#include "yolo_object_detection.h"  // header declaring YOLOObjectDetection (file name assumed)

using namespace std;
using namespace cv;

int main(int argc, char** argv) {
    std::string xml = "D:/python/yolov5/yolov5s.xml";
    std::string bin = "D:/python/yolov5/yolov5s.bin";
    std::string onnx_yolov5s = "D:/python/yolov5/yolov5s.onnx";
    std::string image_file = "D:/python/yolov5/data/images/zidane.jpg";
    std::string video_file = "D:/images/video/Boogie_Up.mp4";
    YOLOObjectDetection yolo_detector;
    yolo_detector.detect(xml, bin, video_file, 1);
    return 0;
}

/*====================================================================*/

#include "yolo_object_detection.h"  // header declaring YOLOObjectDetection (file name assumed)

using namespace std;
using namespace cv;
using namespace cv::dnn;
using namespace InferenceEngine;

std::vector<float> anchors = {
    10,13, 16,30, 33,23,
    30,61, 62,45, 59,119,
    116,90, 156,198, 373,326
};

int get_anchor_index(int scale_w, int scale_h) {
    if (scale_w == 20) {
        return 12;
    }
    if (scale_w == 40) {
        return 6;
    }
    if (scale_w == 80) {
        return 0;
    }
    return -1;
}

float get_stride(int scale_w, int scale_h) {
    if (scale_w == 20) {
        return 32.0;
    }
    if (scale_w == 40) {
        return 16.0;
    }
    if (scale_w == 80) {
        return 8.0;
    }
    return -1;
}

float sigmoid_function(float a)
{
    float b = 1. / (1. + exp(-a));
    return b;
}

void YOLOObjectDetection::detect(std::string xml, std::string bin, std::string filePath, int camera_index) {
    VideoCapture cap;
    Mat frame;
    if (camera_index == 0) {
        cap.open(0);
    }
    if (camera_index == 1) {
        cap.open(filePath);
    }
    if (camera_index == -1) {
        frame = imread(filePath);
    }
    if (frame.empty()) {
        cap.read(frame);
    }
    int image_height = frame.rows;
    int image_width = frame.cols;

    // create the IE core and query the supported devices
    Core ie;
    vector<string> availableDevices = ie.GetAvailableDevices();
    for (int i = 0; i < availableDevices.size(); i++) {
        printf("supported device name: %s \n", availableDevices[i].c_str());
    }

    // load the detection model
    auto network = ie.ReadNetwork(xml, bin);

    // query the input and output info
    InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
    InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());

    // configure the input format
    for (auto &item : input_info) {
        auto input_data = item.second;
        input_data->setPrecision(Precision::FP32);
        input_data->setLayout(Layout::NCHW);
        input_data->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR);
        input_data->getPreProcess().setColorFormat(ColorFormat::RGB);
    }

    // configure the output format
    for (auto &item : output_info) {
        auto input_name = item.first;
        auto output_data = item.second;
        output_data->setPrecision(Precision::FP32);
    }
    auto executable_network = ie.LoadNetwork(network, "CPU");

    // create the inference request
    auto infer_request = executable_network.CreateInferRequest();
    float scale_x = image_width / 640.0;
    float scale_y = image_height / 640.0;

    if (camera_index == -1) {
        inferAndOutput(frame, infer_request, input_info, output_info, scale_x, scale_y);
        cv::imshow("OpenVINO2021R2+YOLOv5对象检测", frame);
    }
    else {
        while (true) {
            bool ret = cap.read(frame);
            if (frame.empty()) {
                break;
            }
            inferAndOutput(frame, infer_request, input_info, output_info, scale_x, scale_y);
            cv::imshow("YOLOv5s+OpenVINO2021R02+Demo", frame);
            char c = cv::waitKey(1);
            if (c == 27) {
                break;
            }
        }
    }
    waitKey(0);
    destroyAllWindows();
}

void YOLOObjectDetection::inferAndOutput(cv::Mat &frame, InferenceEngine::InferRequest &infer_request,
    InferenceEngine::InputsDataMap & input_info, InferenceEngine::OutputsDataMap &output_info, float sx, float sy) {
    int64 start = getTickCount();

    // three containers for the detected boxes, class ids and confidences
    vector<Rect> boxes;
    vector<int> classIds;
    vector<float> confidences;
    /** Iterating over all input blobs **/
    for (auto & item : input_info) {
        auto input_name = item.first;

        /** Getting input blob **/
        auto input = infer_request.GetBlob(input_name);
        size_t num_channels = input->getTensorDesc().getDims()[1];
        size_t h = input->getTensorDesc().getDims()[2];
        size_t w = input->getTensorDesc().getDims()[3];
        size_t image_size = h * w;
        Mat blob_image;
        resize(frame, blob_image, Size(w, h));
        cvtColor(blob_image, blob_image, COLOR_BGR2RGB);

        // HWC => NCHW, normalized to 0~1
        float* data = static_cast<float*>(input->buffer());
        for (size_t row = 0; row < h; row++) {
            for (size_t col = 0; col < w; col++) {
                for (size_t ch = 0; ch < num_channels; ch++) {
                    data[image_size * ch + row * w + col] = float(blob_image.at<Vec3b>(row, col)[ch]) / 255.0;
                }
            }
        }
    }

    // run inference
    infer_request.Infer();

    // loop over the three output layers
    for (auto &item : output_info) {
        auto output_name = item.first;
        auto output = infer_request.GetBlob(output_name);

        const float* output_blob = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output->buffer());
        const SizeVector outputDims = output->getTensorDesc().getDims();
        const int out_n = outputDims[0];      // one image
        const int out_c = outputDims[1];      // three anchor boxes
        const int side_h = outputDims[2];     // height of this layer
        const int side_w = outputDims[3];     // width of this layer
        const int side_data = outputDims[4];  // num of class + 5 = 85

        float stride = get_stride(side_h, side_h);            // downsampling ratio of this layer
        int anchor_index = get_anchor_index(side_h, side_h);
        int side_square = side_h * side_w;                    // area of this layer
        int side_data_square = side_square * side_data;       // each anchor box carries 85 values (cx, cy, w, h, conf + number of class)
        int side_data_w = side_w * side_data;

        // every feature point of every layer has three anchor boxes, each with 85 values
        for (int i = 0; i < side_square; ++i) {
            for (int c = 0; c < out_c; c++) {
                int row = i / side_h;
                int col = i % side_h;
                int object_index = c * side_data_square + row * side_data_w + col * side_data;

                // objectness confidence
                float conf = sigmoid_function(output_blob[object_index + 4]);
                if (conf < 0.25) {
                    continue;
                }

                // decode the box relative to grid cell and anchor
                float x = (sigmoid_function(output_blob[object_index]) * 2 - 0.5 + col) * stride;
                float y = (sigmoid_function(output_blob[object_index + 1]) * 2 - 0.5 + row) * stride;
                float w = pow(sigmoid_function(output_blob[object_index + 2]) * 2, 2) * anchors[anchor_index + c * 2];
                float h = pow(sigmoid_function(output_blob[object_index + 3]) * 2, 2) * anchors[anchor_index + c * 2 + 1];

                // pick the best class
                float max_prob = -1;
                int class_index = -1;
                for (int d = 5; d < side_data; d++) {
                    float prob = sigmoid_function(output_blob[object_index + d]);
                    if (prob > max_prob) {
                        max_prob = prob;
                        class_index = d - 5;
                    }
                }

                // convert to top-left, bottom-right coordinates
                int x1 = saturate_cast<int>((x - w / 2) * sx);  // top left x
                int y1 = saturate_cast<int>((y - h / 2) * sy);  // top left y
                int x2 = saturate_cast<int>((x + w / 2) * sx);  // bottom right x
                int y2 = saturate_cast<int>((y + h / 2) * sy);  // bottom right y

                // collect the results
                classIds.push_back(class_index);
                confidences.push_back((float)conf);
                boxes.push_back(Rect(x1, y1, x2 - x1, y2 - y1));
            }
        }
    }

    vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, 0.25, 0.5, indices);  // non-maximum suppression removes duplicate boxes on the same object
    for (size_t i = 0; i < indices.size(); ++i) {
        int index = indices[i];
        Rect box = boxes[index];
        rectangle(frame, box, Scalar(140, 199, 0), 4, 8, 0);
    }
    float fps = getTickFrequency() / (getTickCount() - start);
    putText(frame, format("FPS: %.2f", fps), Point(20, 40), FONT_HERSHEY_SIMPLEX, 1.0, Scalar(0, 0, 255), 2, 8);
}

YOLOv5 v7.0

#include <fstream>                   // C++ file I/O
#include <iostream>                  // C++ input & output streams
#include <sstream>                   // C++ string streams: read/write in-memory string objects
#include <opencv2/opencv.hpp>        // OpenCV header

#include <openvino/openvino.hpp>     // OpenVINO >= 2022.1

using namespace std;
using namespace ov;
using namespace cv;
// COCO dataset labels
vector<string> class_names = { "person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light","fire hydrant",
"stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe","backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove","skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange","broccoli", "carrot",
"hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse","remote",
"keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" };
// path to the OpenVINO IR model file
string ir_filename = "D:/yolov5/yolov5s.xml";

// @brief Fill an image-input node of the network with picture data
// @param input_tensor tensor of the input node
// @param input_image  input image data
void fill_tensor_data_image(ov::Tensor& input_tensor, const cv::Mat& input_image) {
    // query the input size the node expects
    ov::Shape tensor_shape = input_tensor.get_shape();
    const size_t width = tensor_shape[3];    // expected image width
    const size_t height = tensor_shape[2];   // expected image height
    const size_t channels = tensor_shape[1]; // expected number of channels
    // pointer to the node's data memory
    float* input_tensor_data = input_tensor.data<float>();
    // copy the image data into the network input
    // the image is in H, W, C order; the input expects C, H, W
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                input_tensor_data[c * width * height + h * width + w] = input_image.at<cv::Vec3f>(h, w)[c];
            }
        }
    }
}

int main(int argc, char** argv) {

    // create the OpenVINO Core
    Core core;
    CompiledModel compiled_model = core.compile_model(ir_filename, "AUTO");
    InferRequest infer_request = compiled_model.create_infer_request();

    // preprocess the input - formatting
    VideoCapture cap;
    cap.open(0);
    if (!cap.isOpened()) {
        cout << "could not open the camera..." << endl;
        return -1;
    }

    // query the input tensor ("images") for the expected size
    Tensor input_image_tensor = infer_request.get_tensor("images");
    int input_h = input_image_tensor.get_shape()[2];
    int input_w = input_image_tensor.get_shape()[3];

    Mat frame;
    while (true) {
        cap >> frame;
        if (frame.empty()) {
            break;
        }

        int64 start = cv::getTickCount();
        int w = frame.cols;
        int h = frame.rows;
        int _max = std::max(h, w);
        // letterbox: pad the frame into a square before resizing
        cv::Mat image = cv::Mat::zeros(cv::Size(_max, _max), CV_8UC3);
        cv::Rect roi(0, 0, w, h);
        frame.copyTo(image(roi));

        float x_factor = image.cols / (float)input_w;
        float y_factor = image.rows / (float)input_h;

        cv::Mat blob_image;
        resize(image, blob_image, cv::Size(input_w, input_h));
        blob_image.convertTo(blob_image, CV_32F);
        blob_image = blob_image / 255.0;

        // fill the image data into the tensor memory
        fill_tensor_data_image(input_image_tensor, blob_image);

        // run inference
        infer_request.infer();

        // fetch the inference result
        const ov::Tensor& output_tensor = infer_request.get_tensor("output");
        ov::Shape output_shape = output_tensor.get_shape();  // [1, 25200, 85]
        const int out_rows = output_shape[1];
        const int out_cols = output_shape[2];

        // parse the result; YOLOv5 output format per row: cx, cy, w, h, conf + class scores
        cv::Mat det_output(out_rows, out_cols, CV_32F, (float*)output_tensor.data());

        std::vector<cv::Rect> boxes;
        std::vector<int> classIds;
        std::vector<float> confidences;

        for (int i = 0; i < det_output.rows; i++) {
            float confidence = det_output.at<float>(i, 4);
            if (confidence < 0.4) {
                continue;
            }
            // class scores are in columns 5..84
            cv::Mat classes_scores = det_output.row(i).colRange(5, 85);
            cv::Point classIdPoint;
            double score;
            cv::minMaxLoc(classes_scores, 0, &score, 0, &classIdPoint);

            // class score between 0 and 1
            if (score > 0.5)
            {
                float cx = det_output.at<float>(i, 0);
                float cy = det_output.at<float>(i, 1);
                float ow = det_output.at<float>(i, 2);
                float oh = det_output.at<float>(i, 3);
                int x = static_cast<int>((cx - 0.5 * ow) * x_factor);
                int y = static_cast<int>((cy - 0.5 * oh) * y_factor);
                int width = static_cast<int>(ow * x_factor);
                int height = static_cast<int>(oh * y_factor);
                cv::Rect box;
                box.x = x;
                box.y = y;
                box.width = width;
                box.height = height;

                boxes.push_back(box);
                classIds.push_back(classIdPoint.x);
                confidences.push_back(score);
            }
        }
        // NMS
        std::vector<int> indexes;
        cv::dnn::NMSBoxes(boxes, confidences, 0.25, 0.45, indexes);
        for (size_t i = 0; i < indexes.size(); i++) {
            int index = indexes[i];
            int idx = classIds[index];
            cv::rectangle(frame, boxes[index], cv::Scalar(0, 0, 255), 2, 8);
            cv::rectangle(frame, cv::Point(boxes[index].tl().x, boxes[index].tl().y - 20),
                cv::Point(boxes[index].br().x, boxes[index].tl().y), cv::Scalar(0, 255, 255), -1);
            cv::putText(frame, class_names[idx], cv::Point(boxes[index].tl().x, boxes[index].tl().y - 10), cv::FONT_HERSHEY_SIMPLEX, .5, cv::Scalar(0, 0, 0));
        }

        // compute and display FPS
        float t = (cv::getTickCount() - start) / static_cast<float>(cv::getTickFrequency());
        cout << "Infer time(ms): " << t * 1000 << " ms" << endl;
        cv::putText(frame, cv::format("FPS: %.2f", 1.0 / t), cv::Point(20, 40), cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(255, 0, 0), 2, 8);
        cv::imshow("YOLOv5-7.0 + OpenVINO2022", frame);

        char c = cv::waitKey(1);
        if (c == 27) {  // ESC
            break;
        }
    }
    cv::destroyAllWindows();
    return 0;
}

10. Python SDK Configuration and YOLOv5 Deployment

Python environment configuration

  • Environment variables and DLL loading support
  • VS2019 support
  • PYTHONPATH support and configuration for Python

OpenVINO计算机视觉模型加速插图51

  • Test the installation and configuration
  • In a console Python session, import openvino to test:
from openvino.inference_engine import IECore

OpenVINO计算机视觉模型加速插图52

  • PyCharm test:
from openvino.inference_engine import IECore

ie = IECore()
devices = ie.available_devices

for device in devices:
    print(device)

OpenVINO计算机视觉模型加速插图53

ResNet18 image classification deployment and inference

Code implementation

from openvino.inference_engine import IECore
import numpy as np
import cv2 as cv

ie = IECore()
for device in ie.available_devices:
    print(device)

with open('imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

model_xml = "resnet18.xml"
model_bin = "resnet18.bin"

net = ie.read_network(model=model_xml, weights= model_bin)
input_blob = next(iter(net.input_info))
out_blob = next(iter(net.outputs))

n, c, h, w = net.input_info[input_blob].input_data.shape
print(n, c, h, w)

src = cv.imread("D:/images/messi.jpg")
image = cv.resize(src, (w, h))
image = np.float32(image) / 255.0
image[:, :, ] -= (np.float32(0.485), np.float32(0.456), np.float32(0.406))
image[:, :, ] /= (np.float32(0.229), np.float32(0.224), np.float32(0.225))
image = image.transpose(2, 0, 1)

exec_net = ie.load_network(network=net, device_name="CPU")
res = exec_net.infer(inputs={input_blob:[image]})

res = res[out_blob]
print(res.shape)
label_index = np.argmax(res, 1)[0]
print(label_index, labels[label_index])
cv.putText(src, labels[label_index], (50, 50), cv.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2, 8)
cv.namedWindow("image classification", cv.WINDOW_FREERATIO)
cv.imshow("image classification", src)
cv.waitKey(0)

Result:

OpenVINO计算机视觉模型加速插图54

Python YOLOv5 deployment and inference

Code implementation:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

import argparse
import os
import platform
import sys
from pathlib import Path

import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (LOGGER, Profile, check_file, check_img_size, check_imshow, check_requirements, colorstr, cv2,
                           increment_path, non_max_suppression, print_args, scale_boxes, strip_optimizer, xyxy2xywh)
from utils.plots import Annotator, colors, save_one_box
from utils.torch_utils import select_device, smart_inference_mode

@smart_inference_mode()
def run(
        weights=ROOT / 'yolov5s.onnx',  # model path or triton URL
        # weights=ROOT / 'yolov5s.pt',  # model path or triton URL
        source= 'D:/images/video/Boogie_Up.mp4',  # file/dir/URL/glob/screen/0(webcam)
        # source=ROOT / 'data/images',  # file/dir/URL/glob/screen/0(webcam)
        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path
        imgsz=(640, 640),  # inference size (height, width)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project=ROOT / 'runs/detect',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        dnn=False,  # use OpenCV DNN for ONNX inference
        vid_stride=1,  # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith('.txt')  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))
    webcam = source.isnumeric() or source.endswith('.streams') or (is_url and not is_file)
    screenshot = source.lower().startswith('screen')
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(), Profile(), Profile())
    for path, im, im0s, vid_cap, s in dataset:
        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            pred = model(im, augment=augment, visualize=visualize)

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f'{i}: '
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt
            s += '%gx%g ' % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f'{txt_path}.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

            # Stream results
            im0 = annotator.result()
            if view_img:
                if platform.system() == 'Linux' and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

    # Print results
    t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image
    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)

def parse_opt():  # set the relevant options here
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.onnx', help='model path or triton URL')
    # parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path or triton URL')
    parser.add_argument('--source', type=str, default='D:/images/video/Boogie_Up.mp4', help='file/dir/URL/glob/screen/0(webcam)')
    # parser.add_argument('--source', type=str, default=ROOT / 'data/images', help='file/dir/URL/glob/screen/0(webcam)')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    # parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    parser.add_argument('--vid-stride', type=int, default=1, help='video frame-rate stride')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt

def main(opt):
    check_requirements(exclude=('tensorboard', 'thop'))
    run(**vars(opt))

if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

Result:

OpenVINO计算机视觉模型加速插图55

Summary mind map:

OpenVINO计算机视觉模型加速插图56

Article source (from the Internet): OpenVINO计算机视觉模型加速
