From images to locations

Blog / Luis Torres / December 13, 2017

This post is part of the image processing tools saga. In the first post of this series, we explored the numerical nature of images as arrays and applied some operations to them. In this post we describe how a few of these functions can be used to isolate features in images. In particular, we extract alphanumeric characters from images using OpenCV, an open-source computer vision library with Python bindings.

Bounding The Object

Imagine the following situation: I show you an image containing multiple alphanumeric characters (numbers and letters mixed together) and ask you to draw a rectangle around each character with a pen. You would surely consider such a task quite easy, but the same is not true for a computer. If you do not believe me, check the amount of code involved in any commercial automatic car license plate reader. The task can, of course, be performed manually on a computer by replacing the pen with a mouse and drawing rectangles around the characters by hand, and, bingo, same result. Let's introduce some drama into the story and require the tool that draws the bounding boxes to be autonomous, that is, to work without human intervention (no mouse, no pen). The combination of our brain and eyes is the most powerful image processing tool at our disposal, and we take its capabilities for granted. It is only when we try to replicate this model of vision inside a computer that we realise how complex the process really is. Let's look at some of these issues with an example.

Extracting Features From Images

We are going to implement a bounding box drawing tool step by step using image processing operations. We will look at two cases: an idealised one and a real one. In the former we analyse an ideal, (almost) noise-free, high-quality image, while in the latter we try to cope with a real-world example.

Two examples of these cases are presented in the image below.

Fig.1 – Images with numbers.

The image on the left is an artificial image with numbers that I created for this experiment, while the one on the right is a real marathon bib, containing a race number among other text. In both cases, the goal is to use image processing techniques to generate bounding boxes around each digit.

Bounding the digits

In this section we present a small set of Python statements to implement the generation of the bounding boxes. There are multiple combinations of functions that can achieve this goal, but in this post we take a very straightforward approach. Connecting with the first post in the series, the workflow simply performs the following image processing operations:

  1. Read input image and convert to grayscale
  2. Apply Gaussian blurring
  3. Perform edge detection
  4. Enclose the contours with bounding boxes

Read the image and convert it to grayscale. Note that OpenCV reads images in BGR channel order, hence the cv2.COLOR_BGR2GRAY flag. After this operation the output looks like the images below.

import cv2

# path to the image
path_to_image = 'image.jpg'
img = cv2.imread(path_to_image)
# conversion to grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Fig.2 – Grayscale image.

Apply Gaussian blurring. The parameter of this operation is the size of the convolution kernel: a filter that slides over the image in the vertical and horizontal dimensions to smooth it, in this case with a Gaussian function. We use a kernel of size (5, 5), appropriate for the resolution scale of our image. The third argument is the dispersion (standard deviation) of the Gaussian in the X dimension; setting it to zero tells OpenCV to compute it from the kernel size.

img_blur_gauss = cv2.GaussianBlur(img_gray, (5, 5), 0)
Fig.3 – Gaussian blurred image.
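
To get a feel for the role of the kernel size, a quick, illustrative experiment (the sizes below are arbitrary choices, not tuned values) is to sweep a few kernels and save the results:

# illustrative sweep over kernel sizes: larger kernels wash out
# small-scale noise, and eventually the digits themselves
for ksize in (3, 5, 9):
    img_blurred = cv2.GaussianBlur(img_gray, (ksize, ksize), 0)
    cv2.imwrite('blur_{0}x{0}.jpg'.format(ksize), img_blurred)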

Detect edges and contours. Simply apply the Canny edge detector with two threshold values, in this case set to 200 and 255. If you decrease the lower threshold you detect small-scale edges, like the small fonts below the digits; if you increase the upper value you detect only the most prominent ones, like those defining the digits and the white rectangle around them. These parameters have the greatest influence on the ability to extract digits from the image. The extra flags passed to the contour finder select only the outermost contours (cv2.RETR_EXTERNAL) and approximate each contour with the minimum number of points (cv2.CHAIN_APPROX_SIMPLE).

edges = cv2.Canny(img_blur_gauss, 200, 255)
# contour extraction (OpenCV 3.x returns three values here)
img_mod, contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Fig.4 – Edges and contours image.
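
Similarly, if you want to see how the Canny thresholds behave on your own image, you can sweep a few (lower, upper) pairs and save the resulting edge maps; the pairs below are only examples:

# illustrative sweep over Canny threshold pairs; lower values keep more
# small-scale edges, higher values keep only the most prominent ones
for lower, upper in [(50, 150), (100, 200), (200, 255)]:
    edge_map = cv2.Canny(img_blur_gauss, lower, upper)
    cv2.imwrite('edges_{}_{}.jpg'.format(lower, upper), edge_map)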

Enclose the contours with bounding boxes. Simply iterate over each detected contour in your sample and apply the boundingRect function to generate the coordinates of the top-left corner along with the width and height.

for contour in contours:
    # coordinates of the bounding boxes
    (x, y, w, h) = cv2.boundingRect(contour)
    # draw the bounding rectangle in the original image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
Fig.5 – Bounding boxes image.

So, on the left is our idealised figure with the bounding boxes drawn without using a pen, and on the right is the processed image of the bib number. On the left, everything worked in idealised harmony. On the right, it is time to say: wait! In addition to the digit segmentation we also see bounding boxes not associated with any character: false positive detections. This is the main penalty to be paid for using pure image processing techniques to isolate digits: any feature with a strong signal in the edge detection or contour isolation steps will propagate up to the bounding box generation step. Heuristic techniques can be used to get rid of such detections, for example filtering by the area of the bounding box or by its aspect ratio, as sketched below. In addition, the parameters associated with each function (such as the threshold values for edge detection or the size of the blurring kernel) offer alternatives in how the image can be processed.
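
As an illustration, here is a minimal sketch of such a filter; the area and aspect ratio limits are made-up values that would need tuning for each image:

# keep only boxes whose size and shape are plausible for a digit;
# the limits below are illustrative, not tuned values
MIN_AREA, MAX_AREA = 100, 10000
MIN_RATIO, MAX_RATIO = 0.2, 1.0  # digits are usually taller than wide

for contour in contours:
    (x, y, w, h) = cv2.boundingRect(contour)
    if MIN_AREA <= w * h <= MAX_AREA and MIN_RATIO <= w / float(h) <= MAX_RATIO:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)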

Merging the operations into a single script:

import cv2

# path to the image and read it into an array
path_to_image = 'image.jpg'
img = cv2.imread(path_to_image)
# conversion to grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# gaussian blurring with a 5x5 kernel
img_blur_gauss = cv2.GaussianBlur(img_gray, (5, 5), 0)
# edge detection
edges = cv2.Canny(img_blur_gauss, 200, 255)
# contour detection (OpenCV 3.x returns three values here)
img_mod, contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# iteration over the contours to find bounding boxes
for contour in contours:
    # coordinates of the bounding boxes
    (x, y, w, h) = cv2.boundingRect(contour)
    # draw the bounding rectangle in the original image
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
# saving the processed image to output.jpg
cv2.imwrite('output.jpg', img)

Transition to Artificial Intelligence

It is obvious that the problem at hand has multiple free parameters, which from a scientific perspective is not ideal; a set of parameters that works in one situation may not be the best choice for another. Is there a universal tool that implements character segmentation without this multiple-parameter conundrum? The human brain solves the problem almost automatically, even when the numbers are white and the background is dark, or has an even more complicated colour, shape or orientation. Our visual system already adapts automatically to those situations; just imagine your life if our visual brain system could only read numbers in high-contrast scenarios.

To handle situations where no single set of parameters is optimal, and to reduce the number of false positives, a class of AI models applied to computer vision (based on neural networks in particular) is now being used to classify images [1]. Particular examples of tasks in this area include locating objects [2] and even assigning each pixel of an image to a specific label (known as instance segmentation [3]). A currently hot topic in computer vision is therefore the development of label and box generators for images, which output the coordinates and the label of each detected object. The next post in this series will tackle these topics, so stay tuned.

In the meantime, as a human, it is a pleasure to simply say, see you next time!

References

  1. YOLO: https://pjreddie.com/darknet/yolo/
  2. keras-retinanet: https://github.com/fizyr/keras-retinanet
  3. Mask_RCNN: https://github.com/matterport/Mask_RCNN

Header image courtesy of Amador Loureiro.

Thanks to Elodie Thilliez, Nicola Pastorello and Shannon Pace for proofreading and providing suggestions.