What can we actually do with Deep Learning in Image Processing?

Computers today can do almost anything we can imagine with images and videos. Deep Learning, a powerful tool from the field of Artificial Intelligence, is a very useful state-of-the-art technique in Image Processing.

Application areas are numerous: from civil and industrial applications such as the mobile phone industry, augmented reality, home automation, gaming, retail and infotainment, to serious surveillance applications. If you have ever wondered what is behind some of the Facebook, Amazon or Pinterest algorithms for image classification and search, it is Deep Learning.

Here are some concrete ideas of what we can actually accomplish with this tool:

Different stages of image and video processing examples

Note: If you are not familiar with the theory behind Deep Learning, read our next text about Deep Learning theory and Convolutional Neural Networks (CNN).

Pre-processing stage

Pre-processing is the stage where we prepare our images for later use: from getting them ready to post on Instagram or show on surveillance monitors, to preparing them for more complex applications such as tracking, video stabilization…

1. Noise Reduction for Image Restoration:

The process can go like this:

  • Prepare a dataset consisting of pairs of clean and corrupted images (used to train a CNN)
  • Train the CNN to map corrupted image patches to clean ones, so that it implicitly captures the characteristic appearance of noise in natural images
  • Given a new noisy image, predict the clean image (using the trained CNN)
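
The dataset preparation step above can be sketched in a few lines. This is only a minimal sketch using NumPy: it assumes additive Gaussian noise on images scaled to [0, 1], and the helper name `make_noisy_pairs` and its parameters are hypothetical. A real pipeline would use genuinely corrupted images, or a noise model matched to the sensor, and would feed the pairs to a CNN.

```python
import numpy as np

def make_noisy_pairs(clean_image, patch_size=8, sigma=0.1, seed=0):
    """Cut a clean image into patches and build (noisy, clean) training pairs.

    Assumption: noise is additive Gaussian with standard deviation `sigma`
    on images scaled to [0, 1]. Real sensor noise may look different.
    """
    rng = np.random.default_rng(seed)
    h, w = clean_image.shape
    clean_patches = []
    # Non-overlapping patches; leftover rows/columns at the border are dropped.
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            clean_patches.append(clean_image[y:y + patch_size, x:x + patch_size])
    clean_patches = np.stack(clean_patches)
    noise = rng.normal(0.0, sigma, size=clean_patches.shape)
    noisy_patches = np.clip(clean_patches + noise, 0.0, 1.0)
    return noisy_patches, clean_patches

# Synthetic stand-in for a clean image: a smooth horizontal gradient.
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
noisy_patches, clean_patches = make_noisy_pairs(clean)
# A denoising network would be trained to reduce this error.
mse_before_training = float(np.mean((noisy_patches - clean_patches) ** 2))
```

Each noisy patch is the network input and the matching clean patch is the target, so the network never needs an explicit description of the noise.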

Here is an example of thermal image denoising:

Noise reduction in thermal image

2. Image dehazing:

  • The presence of haze (dust, smoke, fog, mist, rain, snow…) directly reduces the visibility of the scene by lowering contrast and obscuring objects. In severe haze conditions the image can lose most of its visual information.
  • This is a problem that powerful optics alone cannot solve, so digital image processing is a must.
  • Dehazing based on the haze imaging model involves estimating the haze transmission map, whose effect is then “subtracted” or removed from the hazy image.
Removing haze from image
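
To make the "subtraction" of haze more concrete, here is a minimal sketch. It assumes the widely used haze imaging model I = J * t + A * (1 - t), where I is the hazy image, J the haze-free scene, t the transmission map and A the atmospheric light. It also assumes that t and A are already known; in practice they are exactly what has to be estimated (for example by a CNN). The function name `remove_haze` and the clamp value `t_min` are hypothetical.

```python
import numpy as np

def remove_haze(hazy, transmission, atmospheric_light, t_min=0.1):
    """Invert I = J * t + A * (1 - t) to recover the scene radiance J."""
    # Clamp t to avoid dividing by values close to zero in dense haze.
    t = np.maximum(transmission, t_min)
    recovered = (hazy - atmospheric_light) / t + atmospheric_light
    return np.clip(recovered, 0.0, 1.0)

# Synthetic check: haze a known scene, then undo it with the true t and A.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(4, 4))
t_true = np.full((4, 4), 0.6)
A = 0.9
hazy = scene * t_true + A * (1.0 - t_true)
dehazed = remove_haze(hazy, t_true, A)
```

With the true transmission map the scene is recovered exactly; with an estimated map, the quality of the result depends on the quality of the estimate.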

3. Image Enhancement:

  • Improve visibility in night conditions with CNNs
Improving visibility in night conditions
  • Improve visibility in low-contrast conditions with CNNs
Improve contrast in image for better visibility

4. Artefact Reduction

  • An efficient neural network can be used for seamless attenuation of different compression artefacts
  • Reduce JPEG compression artefacts
  • Reduce Twitter compression artefacts

Image processing stage

This is the stage where we actually use the information in images to generate content artificially and to give meaning to groups of pixels.

5. Boundary Detection

  • Detect object boundaries by using CNNs
Boundary detection

6. Feature learning – context encoders

  • Context encoders are CNNs trained to generate the contents of an arbitrary image region conditioned on its surroundings
  • Context encoders need both to understand the content of the entire image and to produce a plausible hypothesis for the missing part(s).
Retrieval of the missing part of an image
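
The training setup for such a network can be illustrated without any network at all. The sketch below, using NumPy only, hides a central square of an image and measures a reconstruction error only inside the hidden region. The helper names `mask_center` and `reconstruction_loss` are hypothetical, and the mean-fill "prediction" is just a stand-in for the output of a trained CNN.

```python
import numpy as np

def mask_center(image, hole_size):
    """Return the image with a central square zeroed out, plus the hole mask."""
    h, w = image.shape
    top, left = (h - hole_size) // 2, (w - hole_size) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + hole_size, left:left + hole_size] = True
    masked = image.copy()
    masked[mask] = 0.0
    return masked, mask

def reconstruction_loss(prediction, target, mask):
    """Mean squared error measured only inside the missing region."""
    return float(np.mean((prediction[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(16, 16))
masked_input, hole = mask_center(image, hole_size=8)
# Stand-in "prediction": fill the hole with the mean of the visible pixels.
baseline = masked_input.copy()
baseline[hole] = image[~hole].mean()
loss_empty = reconstruction_loss(masked_input, image, hole)
loss_baseline = reconstruction_loss(baseline, image, hole)
```

The masked image is the input, the original image is the target, and a trained network is expected to do far better than this mean-fill baseline by using the surrounding context.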

7. Object Detection

Here we will explain the terms object classification, object localization, object detection and image segmentation:

  • Object detection models identify one or more relevant objects in a single image
  • Unlike image classification, object detection also provides the location of each object
Classification, localization, detection and segmentation explained
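
The difference between the outputs can be shown with plain data structures. In the sketch below, classification yields a single label, while detection yields a label and a bounding box per object. It also computes intersection over union (IoU), a common measure of how well a predicted box matches a ground-truth box. The box format (x1, y1, x2, y2), the labels and the coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Classification: one label for the whole image.
classification = "cat"
# Detection: one (label, box) pair per object.
detections = [("cat", (10, 10, 50, 50)), ("dog", (60, 20, 100, 80))]
# Hypothetical ground-truth box for the cat.
ground_truth_cat = (20, 20, 60, 60)
overlap = iou(detections[0][1], ground_truth_cat)
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap at all.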

Object detection is a VERY popular topic in the scientific community nowadays, and several datasets have been released for object detection challenges:

     – PASCAL Visual Object Classes (PASCAL VOC) dataset

     – ImageNet

     – Common Objects in COntext (COCO) dataset

Researchers publish results of their algorithms applied to these challenges.

Overview of the scores on the 2007, 2010, 2012 PASCAL VOC and 2015, 2016 COCO datasets, by using different networks

8. Image Segmentation

  • The real time demo of Image Segmentation with SegNet can be seen here:


  • SegNet is a deep encoder-decoder architecture for multi-class pixelwise segmentation researched and developed by members of the Computer Vision and Robotics Group at the University of Cambridge, UK.

9. Object recognition

Object recognition is explained with the image below:

Detection, action classification, image captioning explained
  • Convolutional Neural Networks (CNN) have become an important tool for object recognition.
  • A real-time example of Convolutional Neural Networks for visual recognition can be found here:


Video processing

Video processing is essentially image processing applied to a set of frames that form a sequence.

Processing can be offline or online (http://www.smartimagingblog.com/2019/04/07/what-is-real-time-processing-online-vs-offline/).

This means that the sequence can either be processed in real time, frame by frame (in which case we have to monitor the processing time per frame), or be recorded first and processed later, in which case processing time is not a concern.

In surveillance and monitoring, real-time video processing is necessary, which limits the complexity of the algorithm that can be used. On the other hand, when we post-process, for example, our recorded videos, we can afford more complex algorithms.
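
Monitoring the processing time per frame can be sketched as follows. This is a minimal illustration using only the Python standard library: the function `process_stream`, the frame rates, and the two stand-in processing steps are hypothetical, and the "frames" are plain integers rather than real images.

```python
import time

def process_stream(frames, process_frame, fps=25.0):
    """Run process_frame on each frame and flag frames that miss the budget."""
    budget = 1.0 / fps  # seconds available per frame at the given frame rate
    late_frames = []
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        elapsed = time.perf_counter() - start
        if elapsed > budget:
            late_frames.append(index)
    return late_frames

def fast_step(frame):
    return frame  # trivial stand-in for a cheap algorithm

def slow_step(frame):
    time.sleep(0.02)  # stand-in for an algorithm that takes about 20 ms
    return frame

frames = list(range(5))
late_fast = process_stream(frames, fast_step, fps=25.0)   # budget: 40 ms
late_slow = process_stream(frames, slow_step, fps=100.0)  # budget: 10 ms
```

In online processing, frames that exceed the budget have to be dropped or the algorithm simplified; in offline processing the same check is unnecessary.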

10. Object Tracking

  • Object tracking has always been a challenging problem in the field of computer vision.
  • Popular challenges or contests such as the Visual Object Tracking (VOT) challenge and the Visual Object Tracking in Thermal Infrared (VOT-TIR) challenge are proof that object tracking remains a demanding, open problem.
  • Deep Learning methods can also be applied in the task of single or multiple object tracking.

11. Digital Video Stabilization

  • Digital video stabilization is the task of removing jitter and shaking from video caused by unwanted camera motion (for example, when the camera is hand-held or the platform it is mounted on is shaking)
  • It can be performed in real time, on surveillance or monitoring cameras, or offline, when we want to remove hand shake from our pre-recorded videos.
  • Deep Learning frameworks enable faster algorithms, which can be applied successfully in real-time processing.
(a) Shaky video sequence from Matlab examples
(b) Stabilized video sequence
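
The core idea of stabilization can be illustrated with a classical, non-Deep-Learning approach: estimate the camera path, smooth it, and shift each frame by the difference. The sketch below applies a moving-average filter to a synthetic one-dimensional camera path. The function name `smooth_trajectory`, the filter radius and the synthetic motion are assumptions, and a real system would first have to estimate the frame-to-frame motion from the video.

```python
import numpy as np

def smooth_trajectory(trajectory, radius=2):
    """Moving-average filter over a 1-D camera trajectory (edge-padded)."""
    window = 2 * radius + 1
    padded = np.pad(trajectory, radius, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Per-frame horizontal shifts: a slow pan of 1 px/frame plus alternating jitter.
num_frames = 20
jitter = np.where(np.arange(num_frames) % 2 == 0, 3.0, -3.0)
frame_shifts = 1.0 + jitter
# Camera path = cumulative sum of the frame-to-frame shifts.
trajectory = np.cumsum(frame_shifts)
smoothed = smooth_trajectory(trajectory)
# Correction to apply to each frame so that it follows the smoothed path.
correction = smoothed - trajectory
```

The intended pan is preserved while the frame-to-frame jitter is strongly reduced. A moving average over a window centred on the current frame needs future frames, so this form suits offline stabilization; a real-time variant would have to rely on past frames only.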

12. Video enhancement

  • The image enhancement techniques mentioned in the pre-processing section can be applied to a video sequence in the same way.
  • The only concern is the complexity of the algorithm in the case of real-time processing.

That is all for today's post, but if you want to study the topic further, here are some scientific papers that cover the algorithms mentioned above.