Videos are an extremely rich source of data. They combine a visual story with the notion of flow in time, which allows the data science and computer vision community to interpret and understand data with a level of nuance that is lost in simple time-series data or still images. In order to take video footage and present it for analysis, there are a number of tools a data scientist must have on their belt. In this post, we give an introduction to video manipulation using a small set of image processing operations in Python, applied to each frame of either a video file or a webcam stream.
The Video Analyst’s Toolkit
While there are numerous tools that could go into the video analyst’s toolkit, two of the most common are OpenCV, which provides a rich library of functions for manipulating and transforming video images, and imutils, which provides convenience functions for basic image manipulation and for interfacing with matplotlib.
Reading the Frames of a Video
Did you ever create an animation flipbook by drawing independent drawings or sketches in a notebook and flipping the pages? Conceptually, this is what occurs when processing and analysing videos: for each frame of the video, do something. To start processing and analysing a video, you first need a video stream, which can either be generated and processed in real-time or from a pre-existing source:
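import cv2
import numpy as np
from imutils.video import VideoStream

# fall back to the default webcam when no video file is supplied
if args_dict.get('video', None) is None:
    video_stream = VideoStream(src=0).start()
else:
    video_stream = cv2.VideoCapture(args_dict['video'])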
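The snippet above relies on an `args_dict` mapping that the post does not define. A minimal sketch of how it might be built with `argparse` follows; the `--video` flag and the `build_args_dict` helper are assumptions made for illustration, not part of the original script.

```python
import argparse


def build_args_dict(argv=None):
    """Parse command-line options into a plain dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Frame-by-frame video processing")
    # Assumed flag: when it is omitted, the script falls back to the webcam.
    parser.add_argument("-v", "--video", default=None,
                        help="path to a video file; omit to use the webcam")
    return vars(parser.parse_args(argv))


# Simulate running the script as: python script.py --video clip.mp4
args_dict = build_args_dict(["--video", "clip.mp4"])
print(args_dict)
```

Calling `build_args_dict()` with no argument reads the real command line, so the same helper works unchanged inside a script.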
With our streaming source established, we simply stream until there is no more video footage available:
while True:
    frame = video_stream.read()
    # VideoStream.read() returns the frame itself, whereas
    # VideoCapture.read() returns a (success, frame) tuple
    frame = frame if args_dict.get("video", None) is None else frame[1]
    # stop frame extraction once you reach the end of the video file
    if frame is None:
        break
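The two sources return frames differently: `cv2.VideoCapture.read()` gives back a `(success, frame)` tuple, while the imutils `VideoStream.read()` gives back the frame directly. One way to hide that difference is a small generator. The sketch below is illustrative only; `iter_frames` and `FakeCapture` are hypothetical names, and the fake reader exists so the example runs without a camera or video file.

```python
def iter_frames(stream, from_file):
    """Yield frames from `stream` until the source is exhausted.

    `from_file` is True for a cv2.VideoCapture-style reader, whose read()
    returns a (success, frame) tuple, and False for an imutils
    VideoStream-style reader, whose read() returns the frame directly.
    """
    while True:
        result = stream.read()
        frame = result[1] if from_file else result
        if frame is None:  # end of file, or the source returned nothing
            break
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a video file."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(iter_frames(FakeCapture(["frame-0", "frame-1"]), from_file=True))
print(len(frames))
```

With a real source, the loop body becomes `for frame in iter_frames(video_stream, from_file=...)`, and the per-frame processing no longer needs to know where the frame came from.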
Each frame of the video is now stored in frame as a BGR array (the convention used by OpenCV), so we are ready to apply image processing operations to each frame and display the results as a collage.
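Because OpenCV orders the channels as blue, green, red, a frame handed to a library that expects RGB (matplotlib, for example) needs its channels reversed first. A minimal NumPy sketch, using a synthetic one-pixel frame rather than a real video frame:

```python
import numpy as np

# A synthetic 1x1 "frame" holding pure blue in OpenCV's BGR order.
frame_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels (BGR -> RGB).
frame_rgb = frame_bgr[..., ::-1]

print(frame_rgb[0, 0].tolist())  # [0, 0, 255]: blue is now the last channel
```

On a real frame, `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` performs the same reordering.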
Manipulating the Frames of a Video
With the video now being streamed and available for processing, you can introduce the functionality to perform the necessary transformation and analysis (e.g. localising faces on the image frame, passing the image through a classifier, or applying a model for detecting a person or object). The path we follow here is to apply a set of image processing operations, mostly colour conversions and edge detections, and merge the results into a collage:
    # (still inside the while loop)
    # re-scale the dimensions of the frame - 50% in each dimension
    img_small = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
    # convert to grayscale
    img_gray = cv2.cvtColor(img_small, cv2.COLOR_BGR2GRAY)
    # invert the gray scale
    img_gray_inv = 255 - img_gray
    # apply gaussian blurring - convolve with gaussian kernel, size (9, 9)
    img_blur = cv2.GaussianBlur(img_gray_inv, (9, 9), sigmaX=0, sigmaY=0)
    # detect edges using the Canny method (needs img_blur, so it comes after the blur)
    img_canny_edge = cv2.Canny(img_blur, 0, 50)
    # generate sketch transformation
    img_sketch = cv2.divide(img_gray, 255 - img_blur, scale=256)
    # highlight edges in black/white space with a threshold
    img_edges = cv2.adaptiveThreshold(img_blur, 255,
                                      cv2.ADAPTIVE_THRESH_MEAN_C,
                                      cv2.THRESH_BINARY, 15, 2)
    # generate collage - single-channel images are converted to BGR so they stack
    upper_row = np.hstack((img_small,
                           cv2.cvtColor(img_gray, cv2.COLOR_GRAY2BGR),
                           cv2.cvtColor(img_gray_inv, cv2.COLOR_GRAY2BGR)))
    lower_row = np.hstack((cv2.cvtColor(img_canny_edge, cv2.COLOR_GRAY2BGR),
                           cv2.cvtColor(img_sketch, cv2.COLOR_GRAY2BGR),
                           cv2.cvtColor(img_edges, cv2.COLOR_GRAY2BGR)))
    # merge rows for collage
    collage = np.vstack((upper_row, lower_row))
    # display the processed frames
    cv2.imshow('Frame', collage)
    # exit the stream if you press "q" on the keyboard
    keyboard_event = cv2.waitKey(1) & 0xFF
    if keyboard_event == ord("q"):
        break
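The sketch transformation above is a "dodge" blend: each grayscale pixel is divided by the inverse of the blurred negative and rescaled by 256. The NumPy sketch below approximates that arithmetic for 8-bit images so the effect can be seen on two hand-picked pixels. The `dodge` helper is illustrative, and its rounding and divide-by-zero handling are assumptions about how `cv2.divide` behaves for integer types rather than a drop-in replacement.

```python
import numpy as np


def dodge(gray, blurred_negative, scale=256):
    """Approximate cv2.divide(gray, 255 - blurred_negative, scale=scale) for uint8 input."""
    numerator = gray.astype(np.float64) * scale
    denominator = 255.0 - blurred_negative.astype(np.float64)
    # Avoid dividing by zero; those pixels are set to 0 (assumed OpenCV behaviour).
    safe = np.where(denominator == 0, 1.0, denominator)
    ratio = np.where(denominator == 0, 0.0, numerator / safe)
    # Round and saturate to the 8-bit range.
    return np.clip(np.rint(ratio), 0, 255).astype(np.uint8)


gray = np.array([[200, 50]], dtype=np.uint8)
# First pixel, flat region: the blurred negative equals the pixel's own negative (255 - 200).
# Second pixel, near an edge: blurring has pulled the negative down from 205 to 150.
blurred_negative = np.array([[55, 150]], dtype=np.uint8)

sketch = dodge(gray, blurred_negative)
print(sketch.tolist())
```

The pixel in the flat region saturates to white, while the pixel next to an edge stays mid-gray. That contrast is what leaves pencil-like strokes only where the image changes.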
A single example picture with the image processing operations is shown, displaying the original and grayscale images in the top row and the edges plus the sketch transformations in the bottom.
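Stacking panels into a collage only works when every panel has the same height, width, and number of channels, which is why the single-channel results are converted back to three channels first. A minimal sketch of that shape bookkeeping, using tiny synthetic arrays in place of real frames:

```python
import numpy as np

height, width = 4, 6  # tiny stand-in for a real frame size

colour = np.zeros((height, width, 3), dtype=np.uint8)  # 3-channel panel
gray = np.full((height, width), 128, dtype=np.uint8)   # 1-channel panel

# Replicate the single channel three times so every panel has shape (h, w, 3).
# cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR) produces the same layout.
gray_3ch = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

upper_row = np.hstack((colour, gray_3ch, gray_3ch))
lower_row = np.hstack((gray_3ch, gray_3ch, gray_3ch))
collage = np.vstack((upper_row, lower_row))

print(collage.shape)  # (8, 18, 3): two rows of three panels
```

Passing a two-dimensional panel to `np.hstack` alongside three-dimensional ones raises a `ValueError`, so the conversion step cannot be skipped.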
The next example highlights a video stream with the same image manipulations applied. In human vision, we are used to seeing the world in colour and, with it, the associated flow of time. But some features are simpler to analyse in other spaces, like the isolation of text, numbers, or object shapes using an edge detector on images, in order to increase the contrast of these regions so that we can, for instance, apply optical character recognition or project the edge silhouette of a human action. By doing this, we can reduce noise and amplify our options for potential classification tools.
Manipulating videos for any kind of image processing or deep learning application starts with one fundamental step: learning how to read the frames of the stream and manipulate them. In this post we demonstrated the basics of reading and processing video streams, with an emphasis on pure image processing operations. In future posts on the topic, we’ll look at applying model inference for object detection or feature extraction. Stay tuned as we continue to add tools to your video processing and manipulation repertoire.