Vigil #1 - Kicking things off
Introducing Vigil: concept, goals and initial architecture for a traffic accident detection system
Introducing Vigil
As I mentioned in my introductory post, the first project I am tackling in this series is Vigil: a real-time traffic accident detection system.
The gist for the first prototype is to build a system that extracts video frames from a large number of traffic cameras, runs them through a computer vision model that detects car accidents and then notifies a monitoring user interface. This brings together the disciplines of data engineering and machine learning, as it requires building a streaming data pipeline (capable of handling high throughput with low latency) as well as deploying a deep learning model.
Imagine a traffic control centre using Vigil to monitor hundreds of cameras simultaneously, receiving instant alerts when accidents occur, potentially saving crucial minutes in emergency response times.
If you are wondering where I came up with the name Vigil, it is due to one of my recent binges on the history of the Roman Empire. Vigil stands for watchman in Latin, and the Vigiles Urbani were the firefighters and police of ancient Rome. (source)
Future Developments
The use case and functionality of this first version may seem very limited, but I believe it makes for a solid end-to-end challenge. It is also a good foundation for further iterations on both the engineering and machine learning sides:
In the future we could:
Use transfer learning to modify a pre-trained model and retrain only the final layers.
Build an active learning system, where operators can review positive accident classifications and correct them if necessary. This annotated data is then used to further train the model.
Ultimately train my own object detection and classification model from scratch, using publicly available datasets together with data previously classified by Vigil.
Expand classification to infer other insights beyond accident detection.
Store historical data and expand real-time processing to include interval-based metrics.
Build a custom frontend dashboard to display both historical and real-time data.
And probably many more things that will come to mind as we build the project.
Data sources
The main purpose of Vigil is the learning experience, but I thought: why not make it open source and easily pluggable into different camera feed sources? That way, in some improbable future, it could actually be used for something.
After some initial research, I found a good number of openly available traffic cam feeds from all kinds of different places around the world. They largely follow the same model: every feed has an HTTP endpoint that returns a 10-20 second mp4 video, which is refreshed every 5-10 minutes (the refresh rate varies greatly depending on the source). This immediately gave me the idea to also track metrics such as the freshness of the feed information.
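To make that concrete, here is a minimal sketch of how a feed could be polled and its freshness tracked. The URL is hypothetical, and since these endpoints typically expose nothing but the raw mp4, I am assuming refreshes have to be detected by hashing the content:

```python
import hashlib
import time

import requests

FEED_URL = "https://example.org/cams/a1-km42.mp4"  # hypothetical feed endpoint


def poll_feed(url: str, interval_s: int = 60) -> None:
    """Poll a feed endpoint and report how stale its current video is."""
    last_hash = None
    last_refresh = None
    while True:
        video = requests.get(url, timeout=30).content
        digest = hashlib.sha256(video).hexdigest()
        if digest != last_hash:
            last_hash, last_refresh = digest, time.time()
            print(f"new video available ({len(video)} bytes)")
        else:
            print(f"feed unchanged for {time.time() - last_refresh:.0f}s")
        time.sleep(interval_s)
```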
In my searches I also found that the Portuguese authority for roads and transport provides a website with all the available traffic cams in the country. That website is backed by an API that I can leverage to gather all available cameras and their feeds. It seems a perfect data source for the prototype, as it openly provides a good number of feeds that can be queried. I just hope they don’t find this newsletter and put it behind some sort of authentication.
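As a rough sketch of this step (the endpoint and the response shape below are pure guesses on my part; the real API will differ), fetching the camera list and persisting it could look something like:

```python
import requests
import yaml  # pip install pyyaml

CAMERAS_API = "https://example.pt/api/cameras"  # hypothetical endpoint


def fetch_camera_list(path: str = "cameras.yaml") -> list[dict]:
    """Fetch every camera entry and persist the list as a single YAML file."""
    cameras = requests.get(CAMERAS_API, timeout=30).json()
    entries = [
        {
            "id": cam["id"],
            "name": cam["name"],
            "location": cam["location"],
            "video_url": cam["videoUrl"],  # field names are assumptions
        }
        for cam in cameras
    ]
    with open(path, "w") as f:
        yaml.safe_dump(entries, f, allow_unicode=True)
    return entries
```

That same YAML file can then double as the single camera store mentioned later, so no database is needed for now.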
Design
After a bit of thinking I put together the following design:
As you can see, the system has only two main components: the frame extractor and the frame classifier.
Frame Extractor
This service will start by pulling the list of all available camera feeds, including their details (such as name, location, video URL, etc.), and storing it in a single file on remote storage such as S3. It will then spawn concurrent instances that each take one or more feeds, download their respective mp4 files, extract every frame and send it to a ZeroMQ message queue. Each message will contain a base64-encoded frame plus some metadata, such as the camera identifier and the timestamp of capture.
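Here is a minimal sketch of that extract-and-publish loop, assuming OpenCV for frame extraction and a ZeroMQ PUSH socket (PUB/SUB would work just as well); the port and the exact message fields are my own placeholders:

```python
import base64
import time

import cv2  # pip install opencv-python
import zmq  # pip install pyzmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)  # PUSH distributes frames across classifiers
socket.bind("tcp://*:5555")


def extract_and_publish(video_path: str, camera_id: str) -> None:
    """Read every frame of a downloaded mp4 and publish it as a JSON message."""
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)  # compress before base64-encoding
        socket.send_json({
            "camera_id": camera_id,
            "captured_at": time.time(),
            "frame": base64.b64encode(jpeg.tobytes()).decode("ascii"),
        })
    capture.release()
```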
Before processing each video file, the extractor will first check whether the video content has already been processed or the feed has refreshed with a new video. The service will also regularly pull the camera feed data from the public API to keep it up to date.
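How that check works is still open; one simple possibility is to hash each downloaded video and compare it against the last hash seen for that camera:

```python
import hashlib

# Hash of the last processed video per camera. A plain dict is enough for a
# single-process prototype; this state could also live in the metadata file.
processed: dict[str, str] = {}


def is_new_video(camera_id: str, video_bytes: bytes) -> bool:
    """Return True only the first time a given video payload is seen."""
    digest = hashlib.sha256(video_bytes).hexdigest()
    if processed.get(camera_id) == digest:
        return False
    processed[camera_id] = digest
    return True
```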
Frame Classifier
On the other end of the message queue will be multiple Frame Classifier instances subscribing to the frame messages. For this prototype we will grab a publicly available object detection model trained to detect accidents and run inference on each received frame.
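A classifier instance could then look roughly like the sketch below. I have not picked the actual model yet, so the YOLOv5 load here is just a stand-in for whichever pre-trained accident detector ends up being used, and the extractor address is a placeholder:

```python
import base64

import cv2
import numpy as np
import torch
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.connect("tcp://extractor-host:5555")  # hypothetical extractor address

# Stand-in model: the real one will be a pre-trained accident detector.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.eval()

while True:
    msg = socket.recv_json()
    jpeg = base64.b64decode(msg["frame"])
    frame = cv2.imdecode(np.frombuffer(jpeg, dtype=np.uint8), cv2.IMREAD_COLOR)
    with torch.no_grad():
        results = model(frame[:, :, ::-1])  # OpenCV is BGR; the model wants RGB
    # ...inspect `results` and trigger an alert on a positive classification...
```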
Each classifier instance will establish an HTTP connection to a self-hosted Grafana dashboard and send a Server-Sent Event (SSE) whenever a frame is positively classified. The Grafana dashboard will also read from the camera feed metadata file to organise its layout.
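The exact push mechanism is not settled yet. As one possibility, Grafana Live exposes an HTTP push endpoint that accepts Influx line protocol, which would make each alert a single POST; the host, stream id and token below are all placeholders:

```python
import requests

GRAFANA_URL = "http://grafana.local:3000"  # hypothetical self-hosted instance
API_TOKEN = "..."  # Grafana service account token


def send_alert(camera_id: str, captured_at: float) -> None:
    """Push a positive classification to the dashboard as a live measurement."""
    # Influx line protocol: measurement,tags fields timestamp(ns)
    line = f"accidents,camera={camera_id} detected=1i {int(captured_at * 1e9)}"
    requests.post(
        f"{GRAFANA_URL}/api/live/push/vigil",  # "vigil" is the stream id
        data=line.encode(),
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=5,
    )
```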
Choice of technologies
For both the frame extractor and classifier services I will use Python. Why not a more performant and fashionable choice like Go or Rust? Well, for two reasons actually. First, Python has been my daily driver ever since I dedicated myself to machine learning projects, so I am far more productive with it. I don’t want to add the cognitive load of learning or relearning a language on top of the work of building this project. Maybe in the future I’ll rewrite the services and do some performance comparisons... The second reason is that I can easily grab a model already trained with PyTorch and quickly integrate it into the classifier service.
As for the message queue, you may be wondering why I did not go for something with consistency and persistence guarantees, like Kafka. Well, because I don’t really need them. I can afford to lose frames without compromising the classification and alerting mechanism. And since each frame message contains the timestamp of capture, I don’t really care about message ordering either.
I also don’t want to spend time building a front-end interface when there are already plenty of solutions that meet the monitoring requirements. Grafana is a straightforward choice and has a vast ecosystem of extensions, including support for server-sent notifications and websockets.
Finally, a single YAML or JSON file will be enough to store the list of available cameras pulled from the public API. So, there is no need to introduce a data store for now.
Any of these choices may of course be changed as we implement the actual system or as we iterate to add more functionality or improve performance.
Getting my hands dirty
I am now ready to roll up my sleeves and start building the first prototype of Vigil. In the next post I plan to take you through the process of building the first proof of concept (POC). The main goal of this POC will be a very reduced but still end-to-end version of this first design: processing frames and sending alerts from a single camera feed.
See you then!