Skip to main content
Company Blog

With the exponential growth of cameras and visual recordings, it is becoming increasingly important to operationalize and automate the process of video identification and categorization. Applications ranging from identifying the correct cat video to visually categorizing objects are becoming more prevalent.  With millions of users around the world generating and consuming billions of minutes of video daily, you will need the infrastructure to handle this massive scale.

With the complexities of rapidly scalable infrastructure, managing multiple machine learning and deep learning packages, and high-performance mathematical computing, video processing can be complex and confusing.  Data scientists and engineers tasked with this endeavor will continuously encounter a number of architectural questions:

  1. How to scale and how scalable will the infrastructure be when built?
  2. With a heavy data sciences component, how can I integrate, maintain, and optimize the various machine learning and deep learning packages in addition to my Apache Spark infrastructure?
  3. How will the data engineers, data analysts, data scientists, and business stakeholders work together?

Our solution to this problem is the Databricks Unified Analytics Platform which includes the Databricks notebooks, collaboration, and workspace features that allows different personas of your organization to come together and collaborate in a single workspace.  Databricks includes the Databricks Runtime for Machine Learning which is preconfigured and optimized with Machine Learning frameworks, including but not limited to XGBoost, scikit-learn, TensorFlow, Keras, and Horovod. Databricks provides optimized auto-scale clusters for reduced costs as well as GPU support in both AWS and Azure.

In this blog, we will show how you can combine distributed computing with Apache Spark and deep learning pipelines (Keras, TensorFlow, and Spark Deep Learning pipelines) with the Databricks Runtime for Machine Learning to classify and identify suspicious videos.

Classifying Suspicious Videos

In our scenario, we have a set of videos from the EC Funded CAVIAR project/IST 2001 37540 datasets. We are using the Clips from INRIA (1st Set) with six basic scenarios acted out by the CAVIAR team members including:

  • Walking
  • Browsing
  • Resting, slumping or fainting
  • Leaving bags behind
  • People/groups meeting, walking together and splitting up
  • Two people fighting

In this blog post and the associated Identifying Suspicious Behavior in Video Databricks notebooks, we will pre-process, extract image features, and apply our machine learning against these videos.

Source: Reenactment of a fight scene by CAVIAR members - EC Funded CAVIAR project/IST 2001 37540 http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

For example, we will identify suspicious images (such as the one below) extracted from our test dataset (such as above video) by applying a machine learning model trained against a different set of images extracted from our training video dataset.

High-Level Data Flow

The graphic below describes our high-level data flow for processing our source videos to the training and testing of a logistic regression model.

High-Level Data Flow for Processing Source Videos

The high-level data flow we will be performing is:

  • Videos: Utilize the EC Funded CAVIAR project/IST 2001 37540 Clips from INRIA (1st videos) as our set of training and test datasets (i.e. training and test set of videos).
  • Preprocessing: Extract images from those videos to create a set of training and test set of images.
  • DeepImageFeaturizer: Using Spark Deep Learning Pipeline’s DeepImageFeaturizer, create a training and test set of image features.
  • Logistic Regression: We will then train and fit a logistic regression model to classify suspicious vs. not suspicious image features (and ultimately video segments).

The libraries needed to perform this installation:

  • h5py
  • TensorFlow
  • Keras
  • Spark Deep Learning Pipelines
  • TensorFrames
  • OpenCV

With Databricks Runtime for ML, all but the OpenCV is already pre-installed and configured to run your Deep Learning pipelines with Keras, TensorFlow, and Spark Deep Learning pipelines.  With Databricks you also have the benefits of clusters that autoscale, being able to choose multiple cluster types, Databricks workspace environment including collaboration and multi-language support, and the Databricks Unified Analytics Platform to address all your analytics needs end-to-end.

Source Videos

To help jump-start your video processing, we have copied the CAVIAR Clips from INRIA (1st Set) videos [EC Funded CAVIAR project/IST 2001 37540] to /databricks-datasets.

https://www.youtube.com/watch?v=TMyqwGpdIRI

  • Training Videos (srcVideoPath): /databricks-datasets/cctvVideos/train/
  • Test Videos (srcTestVideoPath): /databricks-datasets/cctvVideos/test/
  • Labeled Data (labeledDataPath): /databricks-datasets/cctvVideos/labels/cctvFrames_train_labels.csv

Preprocessing


We will ultimately execute our machine learning models (logistic regression) against the features of individual images from the videos.   The first (preprocessing) step will be to extract individual images from the video. One approach (included in the Databricks notebook) is to use OpenCV to extract the images per second as noted in the following code snippet.

## Extract one video frame per second and save frame as JPG
def extractImages(pathIn):
   count = 0
   srcVideos = "/dbfs" + src + "(.*).mpg"
   p = re.compile(srcVideos)
   vidName = str(p.search(pathIn).group(1))
   vidcap = cv2.VideoCapture(pathIn)
   success,image = vidcap.read()
   success = True
   while success:
      vidcap.set(cv2.CAP_PROP_POS_MSEC,(count*1000))
      success,image = vidcap.read()
      print ('Read a new frame: ', success)
      cv2.imwrite("/dbfs" + tgt + vidName + "frame%04d.jpg" % count, image)     # save frame as JPEG file
      count = count + 1
      print ('Wrote a new frame')    

In this case, we’re extracting the videos from our dbfs location and using OpenCV’s VideoCapture method to create image frames (taken every 1000ms) and saving those images to dbfs.   The full code example can be found in the Identify Suspicious Behavior in Video Databricks notebooks.

Once you have extracted the images, you can read and view the extracted images using the following code snippet:

from pyspark.ml.image import ImageSchema

trainImages = ImageSchema.readImages(targetImgPath)
display(trainImages)

with the output similar to the following screenshot.

 

Note, we will perform this task on both the training and test set of videos.

DeepImageFeaturizer

Databricks Unified Analytics Platform for Video

As noted in A Gentle Introduction to Transfer Learning for Deep Learning, transfer learning is a technique where a model trained on one task (e.g. identifying images of cars) is re-purposed on another related task (e.g. identifying images of trucks).  In our scenario, we will be using Spark Deep Learning Pipelines to perform transfer learning on our images.

Source: Inception in TensorFlow

As noted in the following code snippet, we are using the Inception V3 model (Inception in TensorFlow) within the DeepImageFeaturizer to automatically extract the last layer of a pre-trained neural network to transform these images to numeric features.

# Build featurizer using DeepImageFeaturizer and InceptionV3 
featurizer = DeepImageFeaturizer( \
    inputCol="image", \
    outputCol="features", \ 
    modelName="InceptionV3" \
)

# Transform images to pull out 
#   - image (origin, height, width, nChannels, mode, data) 
#   - and features (udt)
features = featurizer.transform(images)

# Push feature information into Parquet file format
features.select( \
  "Image.origin", "features" \
).coalesce(2).write.mode("overwrite").parquet(filePath)

Both the training and test set of images (sourced from their respective videos) will be processed by the DeepImageFeaturizer and ultimately saved as features stored in Parquet files.

Logistic Regression

In the previous steps, we had gone through the process of converting our source training and test videos into images and then extracted and saved the features in Parquet format using OpenCV and Spark Deep Learning Pipelines DeepImageFeaturizer (with Inception V3).  At this point, we now have a set of numeric features to fit and test our ML model against. Because we have a training and test dataset and we are trying to classify whether an image (and its associated video) are suspicious, we have a classic supervised classification problem where we can give logistic regression a try.

This use case is supervised because included with the source dataset is the labeledDataPath which contains a labeled data CSV file (a mapping of image frame name and suspicious flag).  The following code snippet reads in this hand-labeled data (labels_df) and joins this to the training features Parquet files (featureDF) to create our train dataset.

# Read in hand-labeled data 
from pyspark.sql.functions import expr
labels = spark.read.csv( \
    labeledDataPath, header=True, inferSchema=True \
)
labels_df = labels.withColumn(
    "filePath", expr("concat('" + prefix + "', ImageName)") \
).drop('ImageName')

# Read in features data (saved in Parquet format)
featureDF = spark.read.parquet(imgFeaturesPath)

# Create train-ing dataset by joining labels and features
train = featureDF.join( \
   labels_df, featureDF.origin == labels_df.filePath \
).select("features", "label", featureDF.origin)

We can now fit a logistic regression model (lrModel) against this dataset as noted in the following code snippet.

from pyspark.ml.classification import LogisticRegression

# Fit LogisticRegression Model
lr = LogisticRegression( \
   maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
lrModel = lr.fit(train)

After training our model, we can now generate predictions on our test dataset, i.e. let our LR model predict which test videos are categorized as suspicious. As noted in the following code snippet, we load our test data (featuresTestDF) from Parquet and then generate the predictions on our test data (result) using the previously trained model (lrModel).

from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel

# Load Test Data
featuresTestDF = spark.read.parquet(imgFeaturesTestPath)

# Generate predictions on test data
result = lrModel.transform(featuresTestDF)
result.createOrReplaceTempView("result")

Now that we have the results from our test run, we can also extract out the second element (prob2) of the probability vector so we can sort by it.

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

# Extract first and second elements of the StructType
firstelement=udf(lambda v:float(v[0]),FloatType())
secondelement=udf(lambda v:float(v[1]),FloatType())

# Second element is what we need for probability
predictions = result.withColumn("prob2", secondelement('probability'))
predictions.createOrReplaceTempView("predictions")
In our example, the first row of the predictions DataFrame classifies the image as non-suspicious with prediction = 0. As we’re using binary logistic regression, the probability StructType of (firstelement, secondelement) means (probability of prediction = 0, probability of prediction = 1). Our focus is to review suspicious images hence why order by the second element (prob2).

We can execute the following Spark SQL query to review any suspicious images (where prediction = 1) ordered by prob2.

%sql
select origin, probability, prob2, prediction from predictions where prediction = 1  order by prob2 desc

Based on the above results, we can now view the top three frames that are classified as suspicious.

displayImg("dbfs:/mnt/tardis/videos/cctvFrames/test/Fight_OneManDownframe0024.jpg")

displayImg("dbfs:/mnt/tardis/videos/cctvFrames/test/Fight_OneManDownframe0014.jpg")

displayImg("dbfs:/mnt/tardis/videos/cctvFrames/test/Fight_OneManDownframe0017.jpg")

Based on the results, you can quickly identify the video as noted below.

displayDbfsVid("databricks-datasets/cctvVideos/mp4/test/Fight_OneManDown.mp4")

 

Source: Reenactment of a fight scene by CAVIAR members - EC Funded CAVIAR project/IST 2001 37540 http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

 

Summary

In closing, we demonstrated how to classify and identify suspicious video using the Databricks Unified Analytics Platform: Databricks workspace to allow for collaboration and visualization of ML models, videos, and extracted images, Databricks Runtime for Machine Learning which comes preconfigured with Keras, TensorFlow, TensorFrames, and other machine learning and deep learning libraries to simplify maintenance of these various libraries, and optimized autoscaling of clusters with GPU support to scale up and scale out your high performance numerical computing.  Putting these components together simplifies the data flow and management of video classification (and other machine learning and deep learning problems) for you and your data practitioners. Try out the Identify Suspicious Behavior Databricks notebooks with Databricks Runtime for Machine Learning today.