Testing and demonstrating the functionality of the TensorRT framework in a C++ application. In particular, I show how the NMS plugin can be used.
The goal of this repo is to show how one can implement deep-learning-based detection inference workloads (for example, those using YOLO). The main library is largely based on cyrusbehr's repo, but I removed the engine-building functions; use trtexec to build the engine instead.
This repo aims to provide three things:
- Lay out how to prepare an ONNX-based detection model and attach the TensorRT NMS plugin: `add_nms_to_graph.py` (you should edit this file to fit your model)
- Provide a C++ TensorRT execution library: `trt_infer_engine.h`, `trt_infer_engine.cpp`
- Demonstrate how this library can be used in a real scenario (WIP)
In `add_nms_to_graph.py`, onnxsim is used to simplify the graph and apply some basic operation fusion (Conv+BN -> Conv, trimming out constants). Although running onnxsim doesn't actually improve the resulting TensorRT engine, it makes visualization much simpler, so I consider it best practice.
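The simplification step itself is just a call to onnxsim. A minimal sketch follows; the input and output paths are placeholders, not the exact names used in this repo:

```python
import onnx
from onnxsim import simplify

# Load the exported detector ONNX, fold constants and fuse Conv+BN,
# then save the simplified graph (mainly for easier visualization).
model = onnx.load("detector.onnx")  # placeholder path
model_simp, ok = simplify(model)
assert ok, "onnxsim could not validate the simplified model"
onnx.save(model_simp, "detector_simplified.onnx")  # placeholder path
```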
Then, the EfficientNMS plugin is attached to the appropriate output nodes. TensorRT plugins such as BatchedNMS and EfficientNMS enable NMS inside the TensorRT engine. Unfortunately, other popular NMS variants such as Matrix NMS and Soft-NMS do not have plugins yet, as far as I know. Since Soft-NMS is widely used in industry, implementing a Soft-NMS plugin could be a worthwhile exercise.
Although the developers state that the EfficientNMS plugin will be deprecated and users should use INMSLayer instead (source), it is far simpler to just use the EfficientNMS plugin, so I use it here.
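For reference, attaching EfficientNMS_TRT is typically done with onnx_graphsurgeon, roughly as sketched below. The output tensor order, thresholds, max_output_boxes, and box coding are assumptions; `add_nms_to_graph.py` must match your model's actual output heads.

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

# Sketch: append an EfficientNMS_TRT node to the graph outputs.
# Assumes the graph already exposes two outputs in this order:
#   boxes  [batch, num_boxes, 4]  and  scores [batch, num_boxes, num_classes].
graph = gs.import_onnx(onnx.load("detector_simplified.onnx"))  # placeholder path
boxes, scores = graph.outputs

max_dets = 100  # example value
nms_outputs = [
    gs.Variable("num_detections", dtype=np.int32, shape=["batch", 1]),
    gs.Variable("detection_boxes", dtype=np.float32, shape=["batch", max_dets, 4]),
    gs.Variable("detection_scores", dtype=np.float32, shape=["batch", max_dets]),
    gs.Variable("detection_classes", dtype=np.int32, shape=["batch", max_dets]),
]
nms = gs.Node(
    op="EfficientNMS_TRT",
    name="efficient_nms",
    attrs={
        "plugin_version": "1",
        "background_class": -1,       # no background class
        "max_output_boxes": max_dets,
        "score_threshold": 0.25,      # example threshold
        "iou_threshold": 0.5,         # example threshold
        "score_activation": 0,        # scores are already probabilities
        "box_coding": 0,              # boxes given as [x1, y1, x2, y2]
    },
    inputs=[boxes, scores],
    outputs=nms_outputs,
)
graph.nodes.append(nms)
graph.outputs = nms_outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "detector_w_trt_nms.onnx")  # placeholder path
```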
An example trtexec command for building the engine would look something like this:
trtexec --onnx=ppyoloe_plus_crn_l_80e_coco_w_trt_nms.onnx --saveEngine=ppyoloe_plus_crn_l_80e_coco_w_trt_nms.trt --fp16 --infStreams=1 --memPoolSize=workspace:2048 --iterations=100
You are encouraged to try multiple values of infStreams and memPoolSize to see which combination maximizes inference speed on your system.
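If you want to sweep these settings systematically, a small driver script can build several engines in one go. The sketch below reuses the flags from the command above; the ONNX path and the sweep values are arbitrary examples:

```python
import itertools
import subprocess

# Build (and benchmark) engines for a few trtexec configurations.
onnx_path = "ppyoloe_plus_crn_l_80e_coco_w_trt_nms.onnx"
for streams, workspace_mib in itertools.product([1, 2, 4], [1024, 2048, 4096]):
    engine_path = f"engine_s{streams}_w{workspace_mib}.trt"
    subprocess.run(
        [
            "trtexec",
            f"--onnx={onnx_path}",
            f"--saveEngine={engine_path}",
            "--fp16",
            f"--infStreams={streams}",
            f"--memPoolSize=workspace:{workspace_mib}",
            "--iterations=100",
        ],
        check=True,
    )
```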