The Basics of Image Processing

Lavanya Gupta
Published in WiCDS · 7 min read · Jan 1, 2021

Empower your deep learning models by harnessing some immensely powerful image processing algorithms.

Motivation

Many deep learning courses start with an introduction to the basic image processing techniques (like resizing, cropping, color-to-grayscale conversion, rotation, etc.) but offer only a cursory glance at these concepts.
Along the way, we often discount some immensely powerful image processing algorithms that can do wonders for our final model predictions.

For folks who are familiar with the general ML paradigm, you know how vital data cleaning and feature engineering are for the success of any model. Image processing (loosely speaking) can be seen as a combination of these two steps in an ML pipeline.

In this post, I will discuss a few selected algorithms that I used in my project of extracting named entities from unstructured PDF documents. As you can imagine, scanned PDFs suffer from a variety of problems: dull backgrounds, indistinct boundaries between text characters and the background, blurred or overlapping text, misaligned page-margin cuts, and so on.

Let’s see how the image processing algorithms can address the above issues.

1. Converting between Colored and Grayscale images

There are several reasons why we favor working with grayscale images when just starting out with any computer vision application. For learning image processing, it is better to master single-channel (grayscale) processing first and then see how it extends to multichannel processing, rather than starting with full-color images and missing the important insights (like edges, contours, and shapes) that single-channel processing reveals.

Colored images usually have 3 channels: Red, Green, and Blue. With NumPy-style indexing, each channel is a 2D slice of the image array:

# image has shape (height, width, 3); the last axis holds the R, G, B channels
red = image[:, :, 0]
green = image[:, :, 1]
blue = image[:, :, 2]

If an image is stored with 8 bits per channel, then each channel's pixel value lies in the range [0, 255], because a pixel value is nothing but an integer.
Similarly, with 4 bits per channel, each pixel value would lie in the range [0, 15].

In a black-and-white image, every pixel value is either 0 or 1.

A grayscale image has only 1 channel and the pixel values are in the range [0, 255]. These values make up the different shades of gray.

A colored image has 3 channels (i.e. R-G-B) and the pixel values in each channel are in the range [0, 255].
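
As a quick illustration, here is a minimal sketch of loading a color image and converting it to grayscale with scikit-image (the file name scanned_page.png is just a placeholder, and the snippet assumes an RGB input without an alpha channel):

from skimage import io, color, img_as_ubyte

# 'scanned_page.png' is a placeholder path; drop the alpha channel with
# image[:, :, :3] if your file has one
image = io.imread("scanned_page.png")

# rgb2gray returns floats in [0, 1]; img_as_ubyte rescales them to [0, 255]
gray = img_as_ubyte(color.rgb2gray(image))

print(image.shape, gray.shape)   # e.g. (1000, 800, 3) and (1000, 800)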

2. Thresholding

Thresholding is a technique for partitioning a grayscale image into foreground and background by converting it into a black-and-white (also called binary) image. To threshold color images, we must first convert them to grayscale.

I like to think of thresholding as an exaggeration of the pixel values. The most basic example uses a threshold of 127: all values on the darker end (< 127) are converted to black (0), and all values on the lighter end (≥ 127) are converted to white (1).

Thresholding is also the most basic form of image segmentation and object detection.

Global and Local thresholding

Global thresholding: If the background is uniform and has high contrast with the foreground, global thresholding works best. It is also faster than local thresholding. We can find the optimal global threshold value for an image using any of the popular thresholding techniques like Otsu, Triangle, Mean, or Yen.

Local (or adaptive) thresholding: If the background is unevenly illuminated or lacks contrast with the foreground, we use local thresholding. Instead of applying a single global threshold to the entire image, we calculate a threshold value for every small region of the image. Each region is defined by the block size, i.e. the number of neighboring pixels considered when calculating the threshold for that region.

Since the black, distorted background is present in only part of the image, local thresholding works better than global thresholding here.
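
Here is a minimal sketch of both approaches with scikit-image, reusing the gray image from the earlier snippet (block_size=35 and offset=10 are only illustrative values, not tuned ones):

from skimage.filters import threshold_otsu, threshold_local

# Global thresholding: one Otsu threshold for the entire grayscale image
t_global = threshold_otsu(gray)
binary_global = gray > t_global

# Local (adaptive) thresholding: a separate threshold for each block_size
# neighborhood; offset shifts the computed thresholds slightly
t_local = threshold_local(gray, block_size=35, offset=10)
binary_local = gray > t_local

In both cases the result is a boolean image: True where a pixel is brighter than its threshold, False otherwise.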

3. Contrast

In simple words, contrast is the difference between the maximum and minimum pixel intensities of an image. It is a measure of the range of an image, i.e. how spread out the pixel intensities are. High contrast is usually desired in images, as it enhances the image features. Ideally, we'd like images to use the entire range of pixel values available to them.

In the example below of an X-ray scan, the high-contrast image on the right helps doctors find relevant details like cancer in the lungs or bone fractures. These might easily get missed in the low-contrast image on the left, where most pixels sit at nearly the same intensity value.

Let's take a look at another example. If you plot the histogram of the pixel intensities of an image, it will typically be:
- Concentrated at the low end of the range: a very dark (or dull) image
- Concentrated at the high end of the range: a very light (or bright) image
- Concentrated around the middle: an image where most pixels are mid-gray

There are 2 commonly used techniques for contrast enhancement:
1. Contrast stretching
2. Histogram equalization

1. Contrast stretching
Stretch the pixel values so that the entire range of pixel values is filled.

Considering an 8-bit (integer) representation of pixel values, the algorithm maps the minimum intensity in the image to 0 and the maximum intensity to 255, and rescales all the intermediate pixel values linearly into the range [0, 255].

So, for every input pixel value P_in, the stretched output value is:

P_out = (P_in - c) * (b - a) / (d - c) + a

where:
a = 0 (target minimum)
b = 2⁸ - 1 = 255 (target maximum)
c = minimum pixel intensity in the image
d = maximum pixel intensity in the image

Think of this as a linear normalization or scaling function applied to all pixel values in the image. Since there is a one-to-one mapping of intensity values between the source image and the transformed image, the original image can easily be restored from the contrast-stretched image.

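Here is a minimal sketch of the stretch in plain NumPy, reusing the gray image from before; skimage.exposure.rescale_intensity offers essentially the same mapping out of the box:

import numpy as np

a, b = 0, 255                   # target range for an 8-bit image
c, d = gray.min(), gray.max()   # current minimum / maximum intensities

# Linear mapping of [c, d] onto [a, b]; assumes the image is not constant (d > c)
stretched = ((gray.astype(float) - c) * (b - a) / (d - c) + a).astype(np.uint8)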

2. Histogram equalization
Spread out the most frequently occurring pixel values using the probability density function of the pixel intensities.

Histogram equalization seeks to flatten the pixel-intensity histogram so that all intensities are more or less equally likely to be encountered in the image.

Unlike contrast stretching, this mapping is not one-to-one, so once histogram equalization is performed there is no way of getting back the original image.

Figure: (b) original image, (d) after histogram equalization, (f) after contrast stretching.
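
For completeness, here is a minimal histogram-equalization sketch with scikit-image, again applied to the gray image (equalize_hist returns floats in [0, 1], so img_as_ubyte maps the result back to the 8-bit range):

from skimage import exposure, img_as_ubyte

# Flatten the intensity histogram; the result is a float image in [0, 1]
equalized = img_as_ubyte(exposure.equalize_hist(gray))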

4. Morphological operations

Morphological image processing is a collection of non-linear operations that process images based on the shape or morphology of features in an image. We probe the image with a small filter/kernel called a structuring element, which defines the region of interest or neighborhood around a pixel. The value of a given pixel in the output image is calculated based on the corresponding pixel in the input image and its neighbors.

Morphological operations alter the intensities of only specific regions of an image. Unlike convolution operations, which affect the entire image, morphological operations isolate certain sections of an image and then expand or contract just those regions to achieve the desired effect. Also, unlike convolutions, the number of rows and columns of the image does not change after applying a morphological operation.

The most basic morphological operations (that I also used for my project) are:
a) Dilation: Adds pixels to the object boundaries
b) Erosion: Removes pixels from the object boundaries
This is a very good article explaining the intuition and working of these algorithms.

The number of pixels added or removed from the objects in an image depends on the size and shape of the structuring element used to process the image.

Dilation and erosion are often used in various combinations to implement other morphological operations. For example, the definition of a morphological opening of an image is an erosion followed by a dilation, using the same structuring element for both operations.
A few other commonly used morphological operations are closing, white tophat, black tophat, skeletonize, convex hull.
To see how easily they can be implemented, check the morphology module in scikit-image.

Structuring element — The structuring element defines the neighborhood of the pixel of interest. A 2D grid of 0s and 1s specifies the configuration of the structuring element. The grid should emulate the shape of the object that we wish to dilate or erode from the image.
NOTE: For those of you familiar with convolutional neural networks in deep learning, the structuring element resembles the kernel/filter that we use in the convolution operations.
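
Here is a minimal sketch with scikit-image's morphology module, applied to the binary_global image from the thresholding snippet (in scikit-image releases before 0.19 the footprint argument was named selem):

import numpy as np
from skimage import morphology

# A 3x3 square structuring element: the neighborhood examined around each pixel
footprint = np.ones((3, 3), dtype=bool)

# Dilation grows the white (foreground) regions; erosion shrinks them
dilated = morphology.binary_dilation(binary_global, footprint)
eroded = morphology.binary_erosion(binary_global, footprint)

# Opening = erosion followed by dilation with the same structuring element
opened = morphology.binary_opening(binary_global, footprint)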
