DeepStream Tracker#

This notebook is a continuation of the intro 👈🏼👈🏼 and the pipeline 👈🏼👈🏼… The tracker concept is the one I mentioned being embarrassed about in the intro 🤦🏻‍♂️ Well, here is a self-assignment to build a simple tracker, so I don't feel the pain of embarrassment anymore 😉.

Some prerequisites: download the NVIDIA pretrained PeopleNet detection model from NGC:

mkdir -p models/PeopleNet
cd models/PeopleNet
wget --no-check-certificate --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/deployable_quantized_onnx_v2.6.2/zip -O peoplenet_deployable_quantized_onnx_v2.6.2.zip
unzip peoplenet_deployable_quantized_onnx_v2.6.2.zip

housekeeping stuff that matters 🤦🏻‍♂️#

import os

# Set the input video path to an environment variable
os.environ['TARGET_VIDEO_PATH']='input_720p.h264'
os.environ['TARGET_VIDEO_PATH_MP4']='input.mp4'

target_video_path=os.environ['TARGET_VIDEO_PATH']
target_video_path_mp4=os.environ['TARGET_VIDEO_PATH_MP4']

# Analyze video
!ffprobe -i $TARGET_VIDEO_PATH \
         -hide_banner
Input #0, h264, from 'input_720p.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn, 60 tbc

we will use this sample single-view input video from a department store#

from IPython.display import Video
Video("input.mp4", width=720)

Initialize GStreamer and create pipeline#

# Import necessary GStreamer libraries and DeepStream python bindings
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst, GLib
import pyds

# Initialize GStreamer
Gst.init(None)

# Create Pipeline element that will form a connection of other elements
pipeline=Gst.Pipeline()
print('Created pipeline')
Created pipeline

Here is the shape of our pipeline#

[diagram: tracker_pipeline]

  1. We will start with h264parse 👈🏼, which takes the H.264 stream, aligns frames, and adds metadata, producing a parsed H.264 stream as input to nvv4l2decoder 👈🏼, which decodes the video to raw frames, i.e. video/x-raw(memory:NVMM), using NVIDIA hardware

  2. Multiple streams (indicated by batch-size) of raw video in NVMM memory format, i.e. video/x-raw(memory:NVMM), are fed into nvstreammux 👈🏼, which combines multiple input streams into a single batched buffer, producing video/x-raw(memory:NVMM), format=NV12. ⚠️👉🏼 Note: in our example we only have a single stream

  3. nvinfer 👈🏼 takes the batched buffer of raw frames and performs inference using the model we provide. In our case, we will use the PeopleNet model to detect people in the frame. The output is the same as the input, i.e. video/x-raw(memory:NVMM), format=NV12, but with NvDsObjectMeta metadata containing bounding boxes, confidence scores, and class IDs. Since batch-size is 1 in the streammux config and absent in the config_infer_primary.txt config, inference defaults to a single frame at a time

  4. nvtracker 👈🏼 takes the output from nvinfer and performs tracking. The output is the same as the input, i.e. video/x-raw(memory:NVMM), format=NV12, but the tracker adds a unique tracking ID (object_id) to each NvDsObjectMeta alongside the bounding boxes, confidence scores, and class IDs. It continuously updates object positions across frames

  5. Lastly ❤️ nvinfer, nvtracker, and other DeepStream plugins generate NvDsObjectMeta metadata (bounding boxes, class labels, confidence scores, tracking IDs) but they don't draw anything on the video. nvdsosd is what takes that metadata and visually overlays it on the frames

    • This is where we will add the probe function to make the bounding boxes a bit fancy and enlarge the labels… (there is also a compact parse_launch sketch of the whole topology right after this list)
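To make that shape concrete before we build it element by element, here is a rough, untested sketch of the same topology as a single Gst.parse_launch string (for intuition only; the config and file paths are the ones used later in this notebook):

# Sketch only: the same pipeline expressed as one parse_launch string.
# We build it element by element in the cells below instead, so we can attach probes.
sketch = Gst.parse_launch(
    "filesrc location=input_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 "
    "nvstreammux name=m batch-size=1 width=1920 height=1080 "
    "! nvinfer config-file-path=/dli/task/config/config_infer_primary.txt "
    "! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so "
    "! nvvideoconvert ! nvdsosd ! nvvideoconvert ! video/x-raw,format=I420 "
    "! avenc_mpeg4 bitrate=2000000 ! filesink location=output_03_encoded.mpeg4"
)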

Create filesrc -> h264parse -> nvv4l2decoder -> nvstreammux elements#

# Create Source element for reading from a file and set the location property
source=Gst.ElementFactory.make("filesrc", "file-source")
source.set_property('location', target_video_path)

# Create H264 Parser with h264parse as the input file is an elementary h264 stream
h264parser=Gst.ElementFactory.make("h264parse", "h264-parser")

# Create Decoder with nvv4l2decoder for accelerated decoding on GPU
decoder=Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")

# Create Streamux with nvstreammux to form batches for one or more sources and set properties
streammux=Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property('width', 1920) 
streammux.set_property('height', 1080) 
streammux.set_property('batch-size', 1)

print('Created elements')
Created elements

PeopleNet model as detector configured in nvinfer#

The PeopleNet model detects persons, bags, and faces in an image/frame. We will configure this model using the nvinfer plugin as our primary inference element.

The nvinfer plugin performs transformation (format conversion and scaling) on the input frame based on network requirements and passes the transformed data to the low-level library. The low-level library pre-processes the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data which are passed to the TensorRT engine for inferencing.
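Concretely, the per-pixel transformation the low-level library applies is y = net-scale-factor * (x - mean). With net-scale-factor = 0.0039215697906911373 (roughly 1/255, see the config below) and no mean offsets configured, this is plain scaling into [0, 1]. A tiny numpy illustration (the pixel values are made up):

import numpy as np

# nvinfer pre-processing, conceptually: y = net_scale_factor * (x - mean)
net_scale_factor = 0.0039215697906911373   # ~1/255, from the config below
mean = 0.0                                 # no offsets configured for PeopleNet

x = np.array([0, 128, 255], dtype=np.float32)   # made-up pixel values
y = net_scale_factor * (x - mean)
print(y)   # [0.0, ~0.502, ~1.0]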

# Create Primary GStreamer Inference Element with nvinfer to run inference on the decoder's output after batching
pgie=Gst.ElementFactory.make("nvinfer", "primary-inference")

# Set the configuration-file-path property for nvinfer
pgie.set_property('config-file-path', '/dli/task/config/config_infer_primary.txt')

here is the config file config_infer_primary.txt

⚠️ 👉🏼👉🏼👉🏼 note: when you download the model, there is no engine file. DeepStream (via TensorRT) will attempt to build the engine file ...onnx_b1_gpu0_int8.engine on first run…

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373

infer-dims=3;544;960
int8-calib-file=../models/PeopleNet/resnet34_peoplenet_int8.txt
labelfile-path=../models/PeopleNet/labels.txt
onnx-file=../models/PeopleNet/resnet34_peoplenet.onnx
model-engine-file=../models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine

...
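Since the engine is built on the first run (see the note above), a quick check like the following tells you whether you will pay that one-time build cost; the path is taken from the config above and assumed to be relative to the notebook:

import os

# Check whether TensorRT has already serialized an engine for this model
engine_path = 'models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine'
if os.path.exists(engine_path):
    print('Engine found - nvinfer will deserialize it (fast startup)')
else:
    print('No engine yet - TensorRT will build it on first run (can take minutes)')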

Ooooh and ahh - the tracker#

The nvtracker plugin is used to track objects across frames. The nvtracker documentation is huge - help yourself ;)

import configparser

tracker = Gst.ElementFactory.make("nvtracker", "tracker")

tracker_config_file = "/dli/task/config/tracker.txt"

# Parse the tracker config file and forward the relevant keys as element properties
config = configparser.ConfigParser()
config.read(tracker_config_file)

for key in config['tracker']:
    if key == 'tracker-width':
        tracker.set_property('tracker-width', config.getint('tracker', key))
    elif key == 'tracker-height':
        tracker.set_property('tracker-height', config.getint('tracker', key))
    elif key == 'gpu-id':
        tracker.set_property('gpu-id', config.getint('tracker', key))
    elif key == 'll-lib-file':
        tracker.set_property('ll-lib-file', config.get('tracker', key))
    elif key == 'll-config-file':
        tracker.set_property('ll-config-file', config.get('tracker', key))

here is the config file tracker.txt

⚠️ 👉🏼👉🏼👉🏼 We will use NvMultiObjectTracker (referenced under ll-lib-file) as the low-level tracker library; it already supports several multi-object tracking algorithms, such as

  • Intersection-Over-Union (IOU) tracker

  • NVIDIA®-enhanced Simple Online and Realtime Tracking (NvSORT)

  • NVIDIA®-enhanced Online and Realtime Tracking with a Deep Association Metric (NvDeepSORT)

  • NvDCF, an online multi-object tracker based on a discriminative correlation filter

The default from DeepStream 6.1 onwards is NvDCF, but we can explicitly supply the backend tracker config under ll-config-file

[tracker]
enable=1
tracker-width=1920
tracker-height=1080
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
gpu-id=0
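If you want to experiment, the DeepStream samples ship ready-made configs for the other trackers listed above; assuming the standard install location, swapping the backend is a single property change, e.g. to the lightweight IOU tracker:

# Swap the low-level tracker config, e.g. to the lightweight IOU tracker
# (assumes the stock sample configs shipped with DeepStream are present)
samples_cfg = "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app"
tracker.set_property('ll-config-file', f"{samples_cfg}/config_tracker_IOU.yml")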

Create the rest of the elements#

  • The nvvideoconvert plugin converts frames from NV12 (YUV) to RGBA as required by nvdsosd. It is also capable of performing scaling, cropping, and rotating on the frames.

  • The nvdsosd plugin draws bounding boxes and texts based on the metadata. It requires RGBA buffer as well as NvDsBatchMeta.

  • The nvvideoconvert plugin converts frames from RGBA to I420 (YUV) as required by avenc_mpeg4.

  • The capsfilter plugin does not modify data as such, but can enforce limitations on the data format. We use it to enforce the video conversion by nvvideoconvert to I420 (YUV) format.

  • The avenc_mpeg4 plugin encodes the I420 formatted frames using the MPEG4 codec.

  • The filesink plugin writes incoming data to a file in the local file system.

More information about the plugins can be found in the DeepStream Plugin Guide and GStreamer Plugin Guide.

# Create Convertor to convert from YUV to RGBA as required by nvdsosd
nvvidconv1=Gst.ElementFactory.make("nvvideoconvert", "convertor1")

# Create OSD with nvdsosd to draw on the converted RGBA buffer
nvosd=Gst.ElementFactory.make("nvdsosd", "onscreendisplay")

# Create Convertor to convert from RGBA to I420 as required by encoder
nvvidconv2=Gst.ElementFactory.make("nvvideoconvert", "convertor2")

# Create Capsfilter to enforce frame image format
capsfilter=Gst.ElementFactory.make("capsfilter", "capsfilter")
caps=Gst.Caps.from_string("video/x-raw, format=I420")
capsfilter.set_property("caps", caps)

# Create Encoder to encode I420 formatted frames using the MPEG4 codec
encoder = Gst.ElementFactory.make("avenc_mpeg4", "encoder")
encoder.set_property("bitrate", 2000000)

# Create Sink and set the location for the output file
filesink=Gst.ElementFactory.make('filesink', 'filesink')
filesink.set_property('location', 'output_03_encoded.mpeg4')
filesink.set_property("sync", 1)
print('Additional Created elements')
Additional Created elements
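
Add and link the elements#

Before running, all elements must be added to the pipeline and linked in stream order. Here is a minimal sketch following the usual deepstream_python_apps pattern; note that nvstreammux exposes request pads, so the decoder has to be linked to sink_0 explicitly.

# Add all elements to the pipeline, then link them in stream order
for elem in [source, h264parser, decoder, streammux, pgie, tracker,
             nvvidconv1, nvosd, nvvidconv2, capsfilter, encoder, filesink]:
    pipeline.add(elem)

source.link(h264parser)
h264parser.link(decoder)

# nvstreammux sink pads are request pads - ask for sink_0 and link the decoder to it
sinkpad = streammux.get_request_pad("sink_0")   # request_pad_simple() on newer GStreamer
srcpad = decoder.get_static_pad("src")
srcpad.link(sinkpad)

streammux.link(pgie)
pgie.link(tracker)
tracker.link(nvvidconv1)
nvvidconv1.link(nvosd)
nvosd.link(nvvidconv2)
nvvidconv2.link(capsfilter)
capsfilter.link(encoder)
encoder.link(filesink)
print('Linked elements')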

Define bus call function#

Each pipeline contains a bus, which handles forwarding messages from streaming threads to the application. A message handler must be set, which will periodically check for new messages and call the callback function when a message is available. We define the callback below to handle end-of-stream (EOS) messages, warnings, and errors.

import gi
import sys
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst
def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        sys.stdout.write("End-of-stream")
        loop.quit()
    elif t==Gst.MessageType.WARNING:
        err, debug = message.parse_warning()
        sys.stderr.write("Warning: %s: %s\n" % (err, debug))
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        sys.stderr.write("Error: %s: %s\n" % (err, debug))
        loop.quit()
    return True

# Create an event loop
loop=GLib.MainLoop()

# Feed GStreamer bus messages to loop
bus=pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", bus_call, loop)
print('Added bus message handler')
Added bus message handler

Probe function#

We will add a callback function on the sink pad of the nvdsosd plugin to access the metadata. Using it, the function will assign a unique color to each tracklet (object being tracked) and enlarge the label text (object ID)…

import pyds
import random

def get_color_from_id(object_id):
    # Generate a consistent color based on object_id
    random.seed(object_id)
    r = random.random()
    g = random.random()
    b = random.random()
    return r, g, b

def osd_sink_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list

    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break

            # Unique, consistent color per tracklet derived from object_id
            r, g, b = get_color_from_id(obj_meta.object_id)

            obj_meta.rect_params.border_color.set(r, g, b, 1.0)
            obj_meta.rect_params.border_width = 5
            
            # Set label font size and color
            obj_meta.text_params.font_params.font_size = 18  # Larger font
            obj_meta.text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)  # White text

            # Set label background to black
            obj_meta.text_params.set_bg_clr = 1  # Enable background color
            obj_meta.text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)  # Black background

            l_obj = l_obj.next
        l_frame = l_frame.next

    return Gst.PadProbeReturn.OK
osd_sink_pad = nvosd.get_static_pad("sink")
osd_sink_pad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)
1

Running the pipeline#

print("Starting pipeline \n")
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass
# cleaning up as the pipeline comes to an end
pipeline.set_state(Gst.State.NULL)
Starting pipeline 

gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
[NvMultiObjectTracker] Initialized
0:00:03.203686321   611      0x4bbecf0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/dli/task/models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine
ERROR: [TRT]: 3: Cannot find binding of given name: output_cov/Sigmoid
0:00:03.258359350   611      0x4bbecf0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: Could not find output layer 'output_cov/Sigmoid' in engine
ERROR: [TRT]: 3: Cannot find binding of given name: output_bbox/BiasAdd
0:00:03.258380058   611      0x4bbecf0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: Could not find output layer 'output_bbox/BiasAdd' in engine
0:00:03.258389086   611      0x4bbecf0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2091> [UID = 1]: Use deserialized engine model: /dli/task/models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine
0:00:03.262912004   611      0x4bbecf0 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:/dli/task/config/config_infer_primary.txt sucessfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1:0       3x544x960       
1   OUTPUT kFLOAT output_cov/Sigmoid:0 3x34x60         
2   OUTPUT kFLOAT output_bbox/BiasAdd:0 12x34x60        

nvstreammux: Successfully handled EOS for source_id=0
End-of-stream[NvMultiObjectTracker] De-initialized
<enum GST_STATE_CHANGE_SUCCESS of type Gst.StateChangeReturn>

⚠️ 👉🏼 The "Cannot find binding" lines above are benign here: the deserialized engine's output tensors carry a ":0" suffix (see the [Implicit Engine Info] block), so nvinfer only warns while matching the configured output names, and the run proceeds to end-of-stream as shown.
# Convert MPEG4 video file to MP4 container file
!ffmpeg -i /dli/task/output_03_encoded.mpeg4 /dli/task/output.mp4 -y -loglevel quiet
from IPython.display import Video
Video("output.mp4", width=720)

nice… we could do all kinds of stuff and customizations with the metadata in the probe function… we could also add secondary detectors/classifiers to add higher-level captions, or extract the cropped images of the detected objects to create embeddings, etc… (see the sketch below)
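
For instance, a secondary nvinfer would slot in right after the tracker. A quick sketch, where config_infer_secondary.txt is a hypothetical classifier config you would supply yourself:

# Hypothetical secondary inference element operating on tracked objects
sgie = Gst.ElementFactory.make("nvinfer", "secondary-inference")
sgie.set_property('config-file-path', '/dli/task/config/config_infer_secondary.txt')  # placeholder config

# It would be linked between the tracker and the first converter:
#   pgie.link(tracker); tracker.link(sgie); sgie.link(nvvidconv1)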