
Deploying SSD mobileNet V2 on the NVIDIA Jetson and Nano platforms


For one of our clients we were asked to port an object detection neural network to an NVIDIA based mobile platform (Jetson and Nano based). The neural network, created in TensorFlow, was based on the SSD-MobileNet V2 network, but had a number of customizations to make it more suitable to the particular problem the client faced. During the course of this project we realized that the available open-source resources had several problems for which there were no clear solutions. The problems are discussed in various places, such as GitHub issues against the TensorRT and TensorFlow models repositories, on the NVIDIA developer forums, and on StackOverflow. In this post we cover all the problems we faced and the solutions we found, in the hope that it helps others deploy their solutions on these mobile devices.

Object detection in action, bounding boxes around common objects

Training and Conversion Process

The conversion from a TensorFlow checkpoint to an optimized deployment binary requires the following steps:

  1. Download and setup the TensorFlow Object Detection API
  2. Download a trained checkpoint from the TensorFlow detection model zoo (for this post we focus on ssd_mobilenet_v2_coco).
  3. Train the network on new data, starting from the downloaded checkpoint. When using your own training data you often change the number of classes and the resolution; for this example we use the following settings:
    ● 6 object classes.
    ● An image resolution of 450 by 450 pixels.
  4. Export the trained snapshot to a frozen inference graph using the supplied export tool.
  5. Convert the frozen graph to a UFF file via TensorRT.
  6. Load the UFF file and build a TensorRT execution engine.

For steps 1 to 4 we use the tools supplied by the Google TensorFlow team. As it turns out, the latest versions of those tools are incompatible with the publicly released tools and configuration files that are required for steps 5 and 6. Since steps 5 and 6 are required to create a high-performance, standalone inference engine for our target devices, we are in a bit of a pickle. One common solution is to not use the latest version of the Object Detection tools, but rather a version nearly two years old (early 2018). Although that works, it is not preferable, as we lose out on two years of developments, features, bug fixes and optimizations. We therefore set out to fix the reported and observed problems in order to complete all the above-mentioned steps using the latest (at the time of publishing) version of the TensorFlow Object Detection tool-set.

In the next section we cover the problems we have observed and the fixes we have found in order to solve the problems. By using all the below fixes we have been able to successfully (re)train MobileNet V2 (with different feature extraction back-ends), convert it to UFF and build a TensorRT execution engine.

Software used:

TensorRT conversion process

To convert a frozen graph to UFF you can use the convert-to-uff conversion tool (which is a wrapper around the uff.from_tensorflow API) and a configuration file. The latter contains references to the plugins (including their settings) and the graph modification operations required to convert the graph into a set of operations that is supported by TensorRT. Once a UFF file is generated you have to load it into the TensorRT UFF parser in order to build an execution engine. These steps can either be integrated into a single program, as done in this example, or performed separately. In this post we use the single-program setup, and the TRT_object_detection repository as the base for the experiments.
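When the two stages are performed separately, the first stage is a single invocation of the conversion tool. A sketch of such an invocation (the file names are ours, not the repository's; the NMS output node matches the plugin mapping discussed below):

```shell
# Convert the frozen graph to UFF, applying the plugin mapping in config.py.
# -t additionally writes a human-readable .pbtxt next to the .uff, which is
# useful for inspecting node inputs later on.
convert-to-uff frozen_inference_graph.pb -o frozen_inference_graph.uff \
    -O NMS -p config.py -t
```
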

The NVIDIA Jetson Nano target platform

When a correct configuration is used, the frozen graph is converted into a UFF file, which is then loaded by the parser to create a network. Finally, this network is used to build and optimize an execution engine for the target platform. If the configuration file used does not exactly match the settings used in the frozen graph then any of the above steps can fail. In the next section we list a number of the commonly observed problems and the steps you have to take to fix them.

Observed conversion problems

We can split the observed problems into two categories:

  • Errors related to changes of the network for fine-tuning and retraining (number of classes, resolution).
  • Errors related to newer software versions (conversion errors, missing definitions, renamed classes and options, etc.).

Fine-tuning problems and solutions

Changed number of training classes

When you change the number of training classes you will have to update the number of classes that are configured for the “NMS_TRT” plugin. If you do not change the “numClasses” parameter you would get the following error:

python: nmsPlugin.cpp:140: virtual void nvinfer1::plugin::DetectionOutput::configureWithFormat(
const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, nvinfer1::DataType,
nvinfer1::PluginFormat, int):
Assertion `numPriors * param.numClasses == inputDims[param.inputOrder[1]].d[0]' failed.

For the configuration, the correct number of classes to configure is one greater than the number of classes you have defined since you must include the background class. So for our example of 6 object classes we have to set:
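In the conversion configuration this means setting numClasses on the NMS plugin node to 7. A sketch based on the TRT_object_detection config (the other parameters are left at their usual values and omitted here):

```python
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
        numClasses=7,  # 6 object classes + 1 background class
        # ... other parameters unchanged ...
        )
```
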


Changed resolution

When the resolution of the input images is changed this affects the location and scaling of the grid anchors that are used by the feature extractor. Without changing the featureMapShapes parameter of the TensorRT GridAnchor plugin you will get the following error:

python: nmsPlugin.cpp:139: virtual void nvinfer1::plugin::DetectionOutput::configureWithFormat(
const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, nvinfer1::DataType,
nvinfer1::PluginFormat, int): 
Assertion `numPriors * numLocClasses * 4 == inputDims[param.inputOrder[0]].d[0]' failed.

To solve this problem, you have to determine the correct sizes of the feature maps. You can get those sizes by inspecting the generated graph (e.g. in TensorBoard) or using the tool below (trimmed to fit this page), which is based on this StackOverflow post.

import sys
import tensorflow as tf
from object_detection.anchor_generators.multiple_grid_anchor_generator import create_ssd_anchors
from object_detection.models.ssd_mobilenet_v2_feature_extractor_test import SsdMobilenetV2FeatureExtractorTest

feature_extractor = SsdMobilenetV2FeatureExtractorTest()._create_feature_extractor(
    depth_multiplier=1, pad_to_multiple=1)
image_batch_tensor = tf.zeros([1, int(sys.argv[1]), int(sys.argv[2]), 1])
print([tuple(feature_map.get_shape().as_list()[1:3])
       for feature_map in feature_extractor.extract_features(image_batch_tensor)])

Using this program we get the following for an image resolution of 450px:

$ python model/ 450 450
[(29, 29), (15, 15), (8, 8), (4, 4), (2, 2), (1, 1)]
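These numbers can also be cross-checked without TensorFlow: with SAME padding every stride-2 layer performs a ceiling division, the first SSD feature map sits at stride 16 of the input, and each later map halves the previous one. A minimal sketch (our own helper, not part of the toolchain; it assumes the standard six-layer SSD head):

```python
import math

def ssd_feature_map_shapes(resolution, num_layers=6, first_stride=16):
    """Approximate SSD feature-map sizes for a square input resolution.

    With SAME padding, each stride-2 reduction is a ceiling division; the
    first feature map is at `first_stride`, each following map halves again.
    """
    size = resolution
    for _ in range(int(math.log2(first_stride))):
        size = math.ceil(size / 2)
    shapes = [size]
    for _ in range(num_layers - 1):
        size = math.ceil(size / 2)
        shapes.append(size)
    return shapes

print(ssd_feature_map_shapes(450))  # [29, 15, 8, 4, 2, 1]
print(ssd_feature_map_shapes(300))  # [19, 10, 5, 3, 2, 1]
```

The second call reproduces the default 300×300 shapes from the original config, which is a useful sanity check before trusting the numbers for a new resolution.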

Next, you take the first number of each tuple and add them to the GridAnchor configuration in your TensorRT configuration file, see for example the below settings:

PriorBox = gs.create_plugin_node(name="GridAnchor", op="GridAnchor_TRT",
        # ... other parameters unchanged ...
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        # featureMapShapes=[19, 10, 5, 3, 2, 1], # Resolution 300
        featureMapShapes=[29, 15, 8, 4, 2, 1])   # Resolution 450

Problems and solutions related to newer software versions

Input Order

For networks that use the “NMS_TRT” plugin (e.g. MobileNet and other object detection networks) you have to specify the inputOrder parameter. This is a list of three integers (a permutation of 0, 1 and 2), where each position refers to the matching input of the node as defined in the network. If this order is incorrect, parsing crashes with a size mismatch. This is the same error message that results from an incorrect feature map shape when you have changed the resolution, so be careful how you interpret that error!

python: nmsPlugin.cpp:139: virtual void nvinfer1::plugin::DetectionOutput::configureWithFormat(
const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, nvinfer1::DataType,
nvinfer1::PluginFormat, int): 
Assertion `numPriors * numLocClasses * 4 == inputDims[param.inputOrder[0]].d[0]' failed.

To solve this problem, you can either try out all 6 combinations, or inspect the text version of the UFF. Here we do the latter. Open the UFF.pbtxt file, browse to the NMS node, and inspect the order of the ‘inputs’, for example:

graphs {
  id: "main"
  nodes {
    id: "NMS"
    inputs: "Squeeze"
    inputs: "concat_priorbox"
    inputs: "concat_box_conf"
    operation: "_NMS_TRT"

Here we see that the inputs are in the following order: Squeeze, concat_priorbox, concat_box_conf.

The NMS plugin requires them in the following order: Squeeze, concat_box_conf, concat_priorbox.

So we have to remap the inputs so that the order is correct. To do this, use the following parameter value: inputOrder=[0, 2, 1]
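The remapping can also be computed mechanically: for each input in the order the plugin expects, look up its position in the UFF node's input list. A small sketch using the names from the pbtxt above:

```python
# Inputs as they appear on the NMS node in the generated UFF.pbtxt
uff_inputs = ["Squeeze", "concat_priorbox", "concat_box_conf"]
# Order in which the NMS_TRT plugin expects those inputs
plugin_order = ["Squeeze", "concat_box_conf", "concat_priorbox"]

input_order = [uff_inputs.index(name) for name in plugin_order]
print(input_order)  # [0, 2, 1]
```
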

Unsupported operation _Cast

Newer versions of the Object Detection toolkit use a different name for the input operation. The original MobileNet config is set up to look for the ‘ToFloat’ operation, but this operation has since been renamed to ‘Cast’ (which the UFF converter reports as ‘_Cast’). Hence you will have to change/add the mapping in the namespace_plugin_map as follows:

namespace_plugin_map = {
    "MultipleGridAnchorGenerator": PriorBox,
    "Postprocessor": NMS,
    "Preprocessor": Input,
    "Cast": Input,
    "image_tensor": Input,
}

This solves the following error:

[TensorRT] ERROR: UffParser: Validator error: Cast: Unsupported operation _Cast
[TensorRT] ERROR: Network must have at least one output

Problems parsing GridAnchor

The final problem that we will address in this post is related to the GridAnchor. When using the latest versions of the Object Detection API and the UFF converter the resulting UFF file is missing an input element for the GridAnchor node. This results in a parsing failure with the following error:

[TensorRT] VERBOSE: UFFParser: Parsing GridAnchor[Op: _GridAnchor_TRT]. 
[libprotobuf FATAL /externals/protobuf/x86_64/10.0/include/google/protobuf/repeated_field.h:1408]
CHECK failed: (index) < (current_size_):

To work around the problem of the missing input node you can manually define a constant input tensor and set that as the input for the GridAnchor node. In this example the values and dimensions of the constant tensor are based on the version that is used in the older versions of the object detection API, namely: [1, 1]. To create and add the node we use the graphsurgeon Python library that is already used to remove and rename nodes during the UFF conversion process. The snippet below shows these steps.

 (... original code ...)
# Create a constant tensor and register it as the missing input of GridAnchor_TRT
data = np.array([1, 1], dtype=np.float32)
anchor_input = gs.create_node("AnchorInput", "Const", value=data)
graph.append(anchor_input)  # the node must be added to the graph before it can be referenced
graph.find_nodes_by_op("GridAnchor_TRT")[0].input.insert(0, "AnchorInput")

return graph

And with the above we accomplished our goals


The above steps show an example of the problems one can encounter when trying to use a combination of closed- and open-source software whose supported versions are not always kept in sync. We hope that others can benefit from the exercise we went through to debug and fix all the problems reported above.

We have tested the above on the following pre-trained, and then re-trained by us, networks:

  • ssd_mobilenet_v2_coco_2018_03_29
  • ssd_inception_v2_coco_2018_01_28

The full configuration file that we used can be found here (note that it uses the default settings for a network trained on the COCO dataset: 90 classes, 300×300 pixel resolution). This configuration file can be used in combination with the parse-and-build code in this repository.


The following configuration files contain all the above described fixes:


