DeepStream Tracker#
This notebook is a continuation of the intro 👆🏼 and the pipeline 👆🏼… The tracker concept is the one I mentioned being embarrassed about in the intro 🤦🏻‍♂️ Well, here is a self-assignment to build a simple tracker, so I don't feel the pain of embarrassment anymore 😊.
Some prerequisites: download the NVIDIA-pretrained PeopleNet detection model from NGC
mkdir -p models/PeopleNet
cd models/PeopleNet
wget --no-check-certificate --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tao/peoplenet/versions/deployable_quantized_onnx_v2.6.2/zip -O peoplenet_deployable_quantized_onnx_v2.6.2.zip
unzip peoplenet_deployable_quantized_onnx_v2.6.2.zip
housekeeping stuff that matters 🤦🏻‍♂️#
import os
# Set the input video path to an environment variable
os.environ['TARGET_VIDEO_PATH']='input_720p.h264'
os.environ['TARGET_VIDEO_PATH_MP4']='input.mp4'
target_video_path=os.environ['TARGET_VIDEO_PATH']
target_video_path_mp4=os.environ['TARGET_VIDEO_PATH_MP4']
# Analyze video
!ffprobe -i $TARGET_VIDEO_PATH \
-hide_banner
Input #0, h264, from 'input_720p.h264':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn, 60 tbc
we will use this sample single-view input video from a department store#
from IPython.display import Video
Video("input.mp4", width=720)
Initialize GStreamer and create pipeline#
# Import necessary GStreamer libraries and DeepStream python bindings
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst, GLib
import pyds
# Initialize GStreamer
Gst.init(None)
# Create Pipeline element that will form a connection of other elements
pipeline=Gst.Pipeline()
print('Created pipeline')
Created pipeline
Here is the shape of our pipeline#
We will start with h264parse 👇🏼, which takes the H.264 elementary stream, aligns frames, and adds metadata, producing a parsed H.264 stream as input to nvv4l2decoder 👇🏼, which decodes the video to raw frames on NVIDIA hardware, i.e. video/x-raw(memory:NVMM).
Multiple streams (indicated by batch-size) of raw video in NVMM memory, i.e. video/x-raw(memory:NVMM), are fed into nvstreammux 👇🏼, which combines the input streams into a single batched buffer, producing video/x-raw(memory:NVMM), format=NV12. ⚠️👇🏼 Note: in our example we only have a single stream.
nvinfer 👇🏼 takes the batched buffer of raw frames and performs inference using the model we provide. In our case, we will use the PeopleNet model to detect people in the frame. The output is the same as the input, i.e. video/x-raw(memory:NVMM), format=NV12, but with metadata NvDsObjectMeta containing bounding boxes, confidence scores, and class IDs. The batch-size is 1 in the streammux config and absent in the config_infer_primary_peoplenet.txt config, so inference defaults to a single frame at a time.
nvtracker 👇🏼 takes the output from nvinfer and performs tracking. The output is again video/x-raw(memory:NVMM), format=NV12 with NvDsObjectMeta metadata (bounding boxes, confidence scores, class IDs), but the tracker adds a unique tracking_id and continuously updates object positions across frames.
Lastly ❤️, nvinfer, nvtracker, and other DeepStream plugins generate metadata NvDsObjectMeta (bounding boxes, class labels, confidence scores, tracking IDs) but they don't draw anything on the video. nvdsosd 👇🏼 is what takes that metadata and visually overlays it on the frames. This is where we will add the probe function to make the bounding boxes a bit fancier and enlarge the labels… The same topology is sketched below as a single launch string.
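To make the shape concrete, here is the whole topology as one gst-launch-style description built with Gst.parse_launch. This is only an illustrative sketch (it reuses the file locations, config paths, and property values from this notebook), not how we will build the pipeline below:

# Sketch: the pipeline topology as a single launch description.
# Paths and property values are the same assumptions used in this notebook.
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
topology_sketch = Gst.parse_launch(
    "filesrc location=input_720p.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=/dli/task/config/config_infer_primary.txt ! "
    "nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so ! "
    "nvvideoconvert ! nvdsosd ! nvvideoconvert ! video/x-raw,format=I420 ! "
    "avenc_mpeg4 bitrate=2000000 ! filesink location=output_03_encoded.mpeg4"
)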
Create filesrc → h264parse → nvv4l2decoder → nvstreammux elements#
# Create Source element for reading from a file and set the location property
source=Gst.ElementFactory.make("filesrc", "file-source")
source.set_property('location', target_video_path)
# Create H264 Parser with h264parse as the input file is an elementary h264 stream
h264parser=Gst.ElementFactory.make("h264parse", "h264-parser")
# Create Decoder with nvv4l2decoder for accelerated decoding on GPU
decoder=Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")
# Create Streamux with nvstreammux to form batches for one or more sources and set properties
streammux=Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', 1)
print('Created elements')
Created elements
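Gst.ElementFactory.make() returns None when a plugin is not available (e.g. the DeepStream elements on a machine without the SDK), and a silent None here only fails later with a confusing error. A minimal sanity check, just a sketch:

# Sketch: verify the elements were actually created before going further.
for name, element in [('filesrc', source), ('h264parse', h264parser),
                      ('nvv4l2decoder', decoder), ('nvstreammux', streammux)]:
    if element is None:
        raise RuntimeError(f'Failed to create GStreamer element: {name}')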
PeopleNet model as detector configured in nvinfer#
The PeopleNet model detects persons, bags, and faces in an image / frame. We will configure this model using the nvinfer plugin as our primary inference element.
The nvinfer plugin performs transformation (format conversion and scaling) on the input frame based on network requirements and passes the transformed data to the low-level library. The low-level library pre-processes the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data which are passed to the TensorRT engine for inferencing.
# Create Primary GStreamer Inference Element with nvinfer to run inference on the decoder's output after batching
pgie=Gst.ElementFactory.make("nvinfer", "primary-inference")
# Set the configuration-file-path property for nvinfer
pgie.set_property('config-file-path', '/dli/task/config/config_infer_primary.txt')
here is the config file config_infer_primary.txt (a short note on net-scale-factor follows it)
⚠️ 👇🏼👇🏼👇🏼 note: when you download the model, there is no engine file. DeepStream (via TensorRT) will attempt to build the engine file
...onnx_b1_gpu0_int8.engine
on first run…
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
infer-dims=3;544;960
int8-calib-file=../models/PeopleNet/resnet34_peoplenet_int8.txt
labelfile-path=../models/PeopleNet/labels.txt
onnx-file=../models/PeopleNet/resnet34_peoplenet.onnx
model-engine-file=../models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine
...
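A quick note on net-scale-factor: 0.0039215697906911373 is 1/255, and nvinfer's pre-processing computes y = net-scale-factor * (x - mean), so with no offsets set in the config above the mean term is 0 and 8-bit pixels are mapped into [0, 1] before the TensorRT engine sees them. A tiny numpy sketch of that arithmetic (illustration only, not DeepStream code):

import numpy as np

# Sketch of nvinfer's normalization: y = net-scale-factor * (x - mean).
net_scale_factor = 0.0039215697906911373  # ~= 1/255
frame = np.random.randint(0, 256, (3, 544, 960), dtype=np.uint8)  # CHW, matching infer-dims
normalized = net_scale_factor * frame.astype(np.float32)  # mean term is 0 here
assert normalized.min() >= 0.0 and normalized.max() <= 1.0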
Ooooh and ahh - the tracker#
The nvtracker plugin is used to track objects across frames. The nvtracker documentation is huge - help yourself ;)
import configparser
tracker = Gst.ElementFactory.make("nvtracker", "tracker")
tracker_config_file = "/dli/task/config/tracker.txt"
# Parse tracker config file and set properties
config = configparser.ConfigParser()
config.read(tracker_config_file)
config.sections()
for key in config['tracker']:
if key == 'tracker-width' :
tracker_width = config.getint('tracker', key)
tracker.set_property('tracker-width', tracker_width)
if key == 'tracker-height' :
tracker_height = config.getint('tracker', key)
tracker.set_property('tracker-height', tracker_height)
if key == 'gpu-id' :
tracker_gpu_id = config.getint('tracker', key)
tracker.set_property('gpu_id', tracker_gpu_id)
if key == 'll-lib-file' :
tracker_ll_lib_file = config.get('tracker', key)
tracker.set_property('ll-lib-file', tracker_ll_lib_file)
if key == 'll-config-file' :
tracker_ll_config_file = config.get('tracker', key)
tracker.set_property('ll-config-file', tracker_ll_config_file)
here is the config file tracker.txt
⚠️ 👇🏼👇🏼👇🏼 We will use NvMultiObjectTracker (referenced under ll-lib-file) as the low-level tracker library, which already supports several multi-object tracking algorithms:
Intersection-Over-Union (IOU) tracker
NVIDIA®-enhanced Simple Online and Realtime Tracking (NvSORT)
NVIDIA®-enhanced Online and Realtime Tracking with a Deep Association Metric (NvDeepSORT)
NvDCF, an online multi-object tracker based on a discriminative correlation filter
The default from DeepStream 6.1 onwards is NvDCF, but we can explicitly supply the backend tracker config under ll-config-file (a one-line swap, sketched after the config below).
[tracker]
enable=1
tracker-width=1920
tracker-height=1080
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
gpu-id=0
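Swapping the backend algorithm really is a one-line change: point ll-config-file at one of the sample tracker YAMLs that ship with DeepStream. A hedged example (the file name is assumed from the standard samples directory):

# Hedged example: switch to the lighter IOU tracker instead of NvDCF.
# The sample config path is assumed from the standard DeepStream install layout.
tracker.set_property(
    'll-config-file',
    '/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml')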
Create rest of the elements#
The nvvideoconvert plugin converts frames from NV12 (YUV) to RGBA as required by nvdsosd. It is also capable of scaling, cropping, and rotating frames.
The nvdsosd plugin draws bounding boxes and text based on the metadata. It requires an RGBA buffer as well as NvDsBatchMeta.
A second nvvideoconvert converts frames from RGBA to I420 (YUV) as required by avenc_mpeg4.
The capsfilter plugin does not modify data as such, but can enforce limitations on the data format. We use it to force the conversion by nvvideoconvert to I420 (YUV).
The avenc_mpeg4 plugin encodes the I420 formatted frames using the MPEG-4 codec.
The filesink plugin writes incoming data to a file in the local file system.
More information about the plugins can be found in the DeepStream Plugin Guide and GStreamer Plugin Guide.
# Create Convertor to convert from YUV to RGBA as required by nvdsosd
nvvidconv1=Gst.ElementFactory.make("nvvideoconvert", "convertor1")
# Create OSD with nvdsosd to draw on the converted RGBA buffer
nvosd=Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
# Create Convertor to convert from RGBA to I420 as required by encoder
nvvidconv2=Gst.ElementFactory.make("nvvideoconvert", "convertor2")
# Create Capsfilter to enforce frame image format
capsfilter=Gst.ElementFactory.make("capsfilter", "capsfilter")
caps=Gst.Caps.from_string("video/x-raw, format=I420")
capsfilter.set_property("caps", caps)
# Create Encoder to encode I420 formatted frames using the MPEG4 codec
encoder = Gst.ElementFactory.make("avenc_mpeg4", "encoder")
encoder.set_property("bitrate", 2000000)
# Create Sink and set the location for the output file
filesink=Gst.ElementFactory.make('filesink', 'filesink')
filesink.set_property('location', 'output_03_encoded.mpeg4')
filesink.set_property("sync", 1)
print('Additional Created elements')
Additional Created elements
Add all elements to pipeline and link them#
# Add elements to pipeline
pipeline.add(source)
pipeline.add(h264parser)
pipeline.add(decoder)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(tracker)
pipeline.add(nvvidconv1)
pipeline.add(nvosd)
pipeline.add(nvvidconv2)
pipeline.add(capsfilter)
pipeline.add(encoder)
pipeline.add(filesink)
print('Added elements to pipeline')
Added elements to pipeline
# Link elements in the pipeline
source.link(h264parser)
h264parser.link(decoder)
# Link decoder source pad to streammux sink pad
decoder_srcpad=decoder.get_static_pad("src")
streammux_sinkpad=streammux.get_request_pad("sink_0")
decoder_srcpad.link(streammux_sinkpad)
streammux.link(pgie)
pgie.link(tracker)
tracker.link(nvvidconv1)
nvvidconv1.link(nvosd)
nvosd.link(nvvidconv2)
nvvidconv2.link(capsfilter)
capsfilter.link(encoder)
encoder.link(filesink)
print('Linked elements in pipeline')
Linked elements in pipeline
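One caveat worth knowing: Element.link() returns a bool and Pad.link() returns a Gst.PadLinkReturn, and the cell above ignores both, so a caps mismatch would only surface later when the pipeline goes to PLAYING. A defensive variant, just a sketch (don't re-run it against the already-linked pipeline):

# Sketch of defensive linking: check what the happy-path cell ignores.
if not streammux.link(pgie):
    raise RuntimeError('streammux -> nvinfer link failed')
if decoder_srcpad.link(streammux_sinkpad) != Gst.PadLinkReturn.OK:
    raise RuntimeError('decoder -> streammux pad link failed')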
Define bus call function#
Each pipeline contains a bus, which handles forwarding messages from streaming threads to the application. A message handler must be set, which will periodically check for new messages and call the callback function when a message is available. We define the callback below to handle end-of-stream (EOS) messages, warnings, and errors.
import gi
import sys
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst
def bus_call(bus, message, loop):
t = message.type
if t == Gst.MessageType.EOS:
sys.stdout.write("End-of-stream")
loop.quit()
elif t==Gst.MessageType.WARNING:
err, debug = message.parse_warning()
sys.stderr.write("Warning: %s: %s\n" % (err, debug))
elif t == Gst.MessageType.ERROR:
err, debug = message.parse_error()
sys.stderr.write("Error: %s: %s\n" % (err, debug))
loop.quit()
return True
# Create an event loop
loop=GLib.MainLoop()
# Feed GStreamer bus messages to loop
bus=pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", bus_call, loop)
print('Added bus message handler')
Added bus message handler
Probe function#
We will add a callback function on the sink pad of the nvdsosd
plugin to access metadata. Using the metadata, the function will assign a unique color to each tracklet (object being tracked) and enlarge the label text (Object ID)…
import pyds
import random
def get_color_from_id(object_id):
# Generate a consistent color based on object_id
random.seed(object_id)
r = random.random()
g = random.random()
b = random.random()
return r, g, b
def osd_sink_pad_buffer_probe(pad, info, u_data):
gst_buffer = info.get_buffer()
if not gst_buffer:
return Gst.PadProbeReturn.OK
batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
l_frame = batch_meta.frame_meta_list
while l_frame is not None:
try:
frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
except StopIteration:
break
l_obj = frame_meta.obj_meta_list
while l_obj is not None:
try:
obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
except StopIteration:
break
# Unique color per object, reusing the helper defined above
r, g, b = get_color_from_id(obj_meta.object_id)
obj_meta.rect_params.border_color.set(r, g, b, 1.0)
obj_meta.rect_params.border_width = 5
# Set label font size and color
obj_meta.text_params.font_params.font_size = 18 # Larger font
obj_meta.text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0) # White text
# Set label background to black
obj_meta.text_params.set_bg_clr = 1 # Enable background color
obj_meta.text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0) # Black background
l_obj = l_obj.next
l_frame = l_frame.next
return Gst.PadProbeReturn.OK
osd_sink_pad = nvosd.get_static_pad("sink")
osd_sink_pad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)
1
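The lone 1 above is the probe id returned by add_probe(). Inside the same probe you could also overlay frame-level information rather than per-object text. A sketch adapted from the standard DeepStream Python samples (the helper below is hypothetical, not part of this notebook) that draws a per-frame object count using display metadata:

# Sketch: overlay a per-frame object count via display metadata.
# This helper is hypothetical; you would call it from the frame loop of
# the probe above, passing the running count of objects in that frame.
def add_frame_count_overlay(batch_meta, frame_meta, num_objects):
    display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
    display_meta.num_labels = 1
    text_params = display_meta.text_params[0]
    text_params.display_text = f"Objects: {num_objects}"
    text_params.x_offset = 10
    text_params.y_offset = 12
    text_params.font_params.font_name = "Serif"
    text_params.font_params.font_size = 14
    text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)
    text_params.set_bg_clr = 1
    text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)
    pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)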
Running the pipeline#
print("Starting pipeline \n")
pipeline.set_state(Gst.State.PLAYING)
try:
loop.run()
except:
pass
# cleaning up as the pipeline comes to an end
pipeline.set_state(Gst.State.NULL)
Starting pipeline
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
[NvMultiObjectTracker] Initialized
0:00:03.203686321 611 0x4bbecf0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/dli/task/models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine
ERROR: [TRT]: 3: Cannot find binding of given name: output_cov/Sigmoid
0:00:03.258359350 611 0x4bbecf0 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: Could not find output layer 'output_cov/Sigmoid' in engine
ERROR: [TRT]: 3: Cannot find binding of given name: output_bbox/BiasAdd
0:00:03.258380058 611 0x4bbecf0 WARN nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: Could not find output layer 'output_bbox/BiasAdd' in engine
0:00:03.258389086 611 0x4bbecf0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2091> [UID = 1]: Use deserialized engine model: /dli/task/models/PeopleNet/resnet34_peoplenet.onnx_b1_gpu0_int8.engine
0:00:03.262912004 611 0x4bbecf0 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:/dli/task/config/config_infer_primary.txt sucessfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1:0 3x544x960
1 OUTPUT kFLOAT output_cov/Sigmoid:0 3x34x60
2 OUTPUT kFLOAT output_bbox/BiasAdd:0 12x34x60
nvstreammux: Successfully handled EOS for source_id=0
End-of-stream[NvMultiObjectTracker] De-initialized
<enum GST_STATE_CHANGE_SUCCESS of type Gst.StateChangeReturn>
# Convert MPEG4 video file to MP4 container file
!ffmpeg -i /dli/task/output_03_encoded.mpeg4 /dli/task/output.mp4 -y -loglevel quiet
from IPython.display import Video
Video("output.mp4", width=720)
nice… we could do all kinds of stuff and customizations with the metadata in the probe function… we could also add secondary detectors to add high-level captions, or extract the cropped images of the detected objects to create embeddings, etc… one last sketch of that idea below 👇🏼
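For instance, the probe already sees every tracked box in every frame, so collecting trajectories for later analysis (re-identification, heatmaps, embeddings) is only a few extra lines. A sketch (the track_log list and log_object helper are hypothetical, not from this notebook):

# Sketch: record (frame, id, box) tuples from inside the probe's object
# loop, e.g. by calling log_object(frame_meta, obj_meta) next to the
# color/label code above.
track_log = []  # (frame_num, tracking_id, left, top, width, height)

def log_object(frame_meta, obj_meta):
    rect = obj_meta.rect_params
    track_log.append((frame_meta.frame_num, obj_meta.object_id,
                      rect.left, rect.top, rect.width, rect.height))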