Convolutional Neural Networks (CNNs) — And all that.

Surya Maddula
Mar 4, 2022

Images used in my articles are the property of their respective organizations and are used here solely for reference, illustrative, and educational purposes. (Image source: Google, except for some images whose source is specifically mentioned below the image.)

Let’s Start with the Basics.

What is a Neural Network?

Neural Network (Source: IBM)

Here’s another similar image with a fresh style of illustration.

Neural Network

Here’s another one.

Source: V7 Labs

And here’s another, depicting a simpler neural network.

Source: Investopedia

Neural networks are loosely modeled on how neurons in the brain behave. Their key advantage is that they can learn useful features from data automatically, without a programmer having to hand-craft them. A neural network is a machine learning model built from layers of interconnected nodes (artificial neurons), whose connection weights are learned from data. This makes neural networks well suited to problems with very large datasets and complex inputs, such as images.

A neural network has one input layer, one or more ‘hidden’ layers, and one output layer.

Types of Neural Networks.

The chart below depicts most of the common types of neural networks. You can find a Medium article explaining this image here.

A Mostly Complete Chart of Neural Networks

Performance of different Neural Networks against Amount of Data.

As the figure above shows, large neural networks tend to keep improving as the amount of data grows, whereas traditional ML algorithms plateau after a certain ‘saturation’ point, beyond which more data brings little or no improvement.

How Does a Neural Network Work?

  • A neural network is divided into multiple layers, and each layer is made up of several units called nodes (or neurons).
  • Each node computes a value from its inputs and passes the result on to the next layer. The first layer of a neural network is known as the input layer.
  • The input layer’s job is to receive the data and feed it into the network; no computation happens here. After it come the hidden layers.
  • They are called ‘hidden’ layers because their values are intermediate results that the user never sees directly in either the input or the output. Each hidden node computes a weighted sum of the values it receives from the previous layer, adds a bias, and applies a non-linear activation function; these weights and biases are what the network learns during training. Most of the network’s processing happens in these layers.
  • The number of hidden layers, and the number of nodes in each, depends on the complexity of the function the network has to learn.
  • The last hidden layer passes its result to the output layer, which applies one final weighted sum (often followed by an activation such as softmax for classification) and gives the network’s final output to the user.
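To make the steps above concrete, here is a minimal sketch of a forward pass through a tiny network with 3 inputs, one hidden layer of 2 nodes, and 1 output, in plain Python. The weights and biases, and the choice of ReLU for the hidden layer and sigmoid for the single output, are hypothetical assumptions for illustration; in a real network the weights and biases are learned during training.

```python
import math

def relu(v):
    """Rectified linear activation: keeps positive values, zeroes out negatives."""
    return [max(0.0, x) for x in v]

def sigmoid(v):
    """Squashes each value into the range (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def dense(inputs, weights, biases):
    """One layer: each node takes a weighted sum of all its inputs plus a bias.

    weights[j][i] is the weight from input i to node j.
    """
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x):
    # Hypothetical, hand-picked parameters (a real network learns these).
    w_hidden = [[0.5, -0.2, 0.1],
                [-0.3, 0.8, 0.4]]
    b_hidden = [0.0, 0.1]
    w_out = [[1.0, -1.0]]
    b_out = [0.2]

    # The input layer does no computation: x is passed straight in.
    hidden = relu(dense(x, w_hidden, b_hidden))   # hidden layer
    return sigmoid(dense(hidden, w_out, b_out))   # output layer

print(forward([1.0, 2.0, 3.0]))  # approximately [0.119]
```

Each call to `dense` is one layer: a weighted sum per node plus a bias, followed by an activation function.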

Convolutional Neural Networks (CNNs)

Now that we understand neural networks a little better, let’s talk about CNNs.

A CNN Sequence to Classify & Segregate Handwritten Characters

Introduction

CNNs (Convolutional Neural Networks) are a type of deep neural network that can identify and classify key features in images, and they are widely used in image processing and analysis. Photo and video recognition, image classification, medical image analysis, computer vision, and natural language processing (NLP) are a few of their applications.

A CNN is a deep learning model that takes an image as input, assigns importance (through learnable weights and biases) to various aspects or objects in the image, and learns to distinguish one from another.

The word “Convolution” in Convolutional Neural Network refers to the mathematical operation of convolution: a special kind of linear operation that combines two functions to produce a third function, expressing how the shape of one is modified by the other. In simple terms, a small matrix (the filter, or kernel) is slid across the image matrix, and at each position the overlapping values are multiplied element-wise and summed; the resulting values form an output that is used to extract features from the image.
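Written out for a 2D image, the operation described above looks like the following, where I is the input image, K is a small kernel (filter), S is the output, and i, j, m, n are pixel indices; this notation is introduced here for illustration. Many deep learning libraries actually implement the second form, cross-correlation, which skips flipping the kernel, and still call it convolution. Because the kernel values are learned, the difference rarely matters in practice.

```latex
% Discrete 2D convolution (the kernel is flipped)
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i - m,\; j - n)\, K(m, n)

% Cross-correlation (no flip), which most CNN libraries compute
S(i, j) = (I \star K)(i, j) = \sum_{m} \sum_{n} I(i + m,\; j + n)\, K(m, n)
```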

The process of deploying a CNN is shown below.

The image below shows a CNN deployed on a real-life use case, in this case identifying what an image shows. The CNN takes the input image, processes it, and outputs a set of values indicating how confident it is that the image shows each particular object.

For instance, in this case the CNN is 70% sure that this is an image of a car, 20% sure that it is an image of a truck, and 10% sure that it is an image of a bicycle.
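Confidence values like these are commonly produced by applying a softmax function to the raw scores (logits) of the final layer; the article does not say which function this particular CNN uses, so treat that as an assumption. In this minimal sketch, the logits are hypothetical values chosen so that the output reproduces the 70/20/10 split.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["car", "truck", "bicycle"]
# Hypothetical raw outputs of the last layer, picked to reproduce the example above.
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]

probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")
# prints "car: 70%", "truck: 20%", "bicycle: 10%" on separate lines
```

The predicted class is simply the label with the highest probability, here "car".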

Process of Deploying a CNN

You can see the enlarged version of the CNN shown above here.

Layers/Divisions of CNNs

Basic Layers of a CNN

The basic layers of a CNN are the convolutional layer, the pooling layer, and the fully connected layer.

When these layers are stacked together, they form a CNN architecture. Let’s look at each of them in a little more detail.

Convolutional Layer

This is the first layer, and it extracts the various features and characteristics of the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a specific size, MxM.

The filter is slid across the input image, and at each position the dot product is taken between the filter and the MxM patch of the image that it currently covers.

The outcome is called a feature map, and it contains information about the image such as its corners and edges. This feature map is then fed to later layers, which learn a range of other features from the input image.
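The sliding dot product described above can be sketched in plain Python as follows, assuming a square filter, no padding, and a stride of 1. The 5x5 toy image and the filter values (the Sobel kernel for vertical edges) are hand-picked for illustration; in a CNN the filter values are learned.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    At each position, the overlapping values are multiplied element-wise and
    summed, i.e. the dot product between the kernel and the patch it covers.
    Like most deep learning libraries, this does not flip the kernel
    (strictly speaking, it computes a cross-correlation).
    """
    m = len(kernel)                      # kernel is assumed to be square, m x m
    out_h = len(image) - m + 1
    out_w = len(image[0]) - m + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(m):
                for b in range(m):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

# A 5x5 toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Sobel kernel for vertical edges (hand-picked; a CNN would learn its own).
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

for row in conv2d(image, kernel):
    print(row)   # each of the 3 rows prints [4, 4, 0]
```

The feature map responds strongly (4) where the window straddles the dark-to-bright edge and gives 0 over the uniformly bright region, which is the kind of edge information the feature map carries.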

A CNN’s feature maps capture the result of applying its filters to an input image; that is, at each layer, the feature map is the output of that layer. Visualising the feature maps for a specific input image helps us understand which features the CNN detects.

Pooling Layer

A pooling layer is usually applied after a convolutional layer. Its primary goal is to reduce the size of the convolved feature map in order to lower the computational cost.

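It does this by operating independently on each feature map and summarizing small windows of adjacent values, which also reduces the number of connections to later layers. There are various kinds of pooling, depending on how each window is summarized; the most common are max pooling, which keeps the largest value in each window, and average pooling, which keeps their mean.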
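A minimal sketch of max and average pooling in plain Python, assuming the common setting of 2x2 windows with a stride of 2; the 4x4 feature map is made-up example data.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarize each size x size window of a feature map by its max or its mean."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "avg":
                row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 5, 7],
        [1, 1, 8, 2]]

print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 1.0], [1.0, 5.5]]
```

The 4x4 map shrinks to 2x2, so every later layer has a quarter as many values to process from this map.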

Fully Connected Layer

The fully connected (FC) layer consists of neurons together with their weights and biases, and it connects every neuron in one layer to every neuron in the next. FC layers usually form the last few layers of a CNN architecture and are positioned just before the output layer.

In this step, the feature maps produced by the preceding layers are flattened into a single vector and fed to the FC layer. The flattened vector then passes through a few more FC layers, each of which applies a weighted sum and an activation function. This is the stage where classification takes place.
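The flatten-then-connect step can be sketched as below. The two 2x2 feature maps, the weights, and the biases are all hypothetical; a trained CNN would learn the weights, and its final scores would typically go through a softmax to become class probabilities.

```python
def flatten(feature_maps):
    """Concatenate a list of 2D feature maps into one long vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def fully_connected(x, weights, biases):
    """Every output neuron is connected to every input value."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Two hypothetical 2x2 feature maps, e.g. the output of a pooling layer.
feature_maps = [[[6, 2], [2, 8]],
                [[1, 0], [3, 4]]]

x = flatten(feature_maps)            # 2 maps x 2 x 2 = 8 values

# Hypothetical weights: one row of 8 weights per class (e.g. car, truck, bicycle).
weights = [[0.1] * len(x),
           [0.0, 0.2, 0.0, 0.2, 0.0, 0.2, 0.0, 0.2],
           [-0.1] * len(x)]
biases = [0.0, 0.5, 1.0]

scores = fully_connected(x, weights, biases)
print(len(x), scores)   # 8 values in, 3 class scores out (about 2.6, 3.3, -1.6)
```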

Disadvantages of CNNs

The main disadvantages of CNN models are:

  • Classification of Images Captured from Different Positions & Angles
  • Adversarial Examples
  • Coordinate Frame
  • Some Other, Minor Disadvantages

Let’s take a look at each one in a little more detail.

Classification of Images Captured from Different Positions & Angles

One of the many challenges in computer vision is dealing with the variation in real-world data.

The human visual system can identify images irrespective of:

  • Differences in angle
  • Differences in background
  • Differences in lighting, saturation, colour, etc.

A model, however, does not handle these variations as easily. From its perspective, two otherwise identical images, one in black and white and the other in colour, are different inputs, and it may treat them differently.

Let’s understand this a little better.

The Statue of Unity, Gujarat, India

The image above is a collage of five versions of the same image, differing in lighting, colour, angle, shade, perspective, etc.

CNNs perform exceptionally well when classifying images that are similar to those in their training dataset. However, if the images are tilted or rotated, or have different lighting or shading, CNNs often have difficulty classifying them.

This can be mitigated by adding different variations of the images during training, a technique known as data augmentation.

What is Data Augmentation?

Data Augmentation, Source: Analytics India Magazine

Data augmentation is a set of techniques for artificially increasing the amount of data by generating new data points and perspectives from existing data. This includes making slight changes to existing data, or using deep learning models to generate new data points or new perspectives, so that the model can be trained better.
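A minimal sketch of the "slight changes to existing data" kind of augmentation, on a tiny grayscale image stored as nested lists with pixel values in [0, 1]. The particular policy (random flip, random multiple of 90° rotation, random brightness between 0.7x and 1.3x) is a hypothetical example; real pipelines usually rely on a library's transform utilities and use smaller, more varied rotations.

```python
import random

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Scale pixel intensities, clipping them to the [0, 1] range."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def augment(img, rng):
    """Return a randomly transformed copy of `img` (hypothetical simple policy)."""
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    for _ in range(rng.randrange(4)):          # rotate by 0, 90, 180 or 270 degrees
        img = rotate_90(img)
    return adjust_brightness(img, rng.uniform(0.7, 1.3))

image = [[0.0, 0.2, 0.4],
         [0.6, 0.8, 1.0],
         [0.1, 0.3, 0.5]]

rng = random.Random(0)                         # seeded so the run is repeatable
augmented_batch = [augment(image, rng) for _ in range(4)]
for variant in augmented_batch:
    print(variant)
```

Each pass over the training data can draw fresh variants, so the model sees the same object under many orientations and lighting conditions.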

For full articles explaining data augmentation, check out the links provided here and here.

Adversarial Examples

The drawbacks above suggest that CNNs recognize images differently from humans, and that simply training on more augmented data will not fully solve the problem of learning what an object is.

If a CNN is given an image with a small amount of carefully crafted noise added, it may recognize it as a completely different image, whereas the human visual system still identifies it as the same image with some noise. This suggests that CNNs rely on quite different information from the human visual system to recognize images.

Such slightly modified images are known as “adversarial examples”.
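Why a tiny change can flip a prediction is easiest to see on a toy linear classifier rather than a full CNN, which is a simplifying assumption here. The sketch applies the idea behind the fast gradient sign method (FGSM): move every input value by a small amount epsilon in whichever direction hurts the current prediction most. The weights, bias, and 20-pixel "image" are hypothetical.

```python
def score(w, x, b):
    """Toy linear classifier: a positive score means class A, negative means class B."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(w, x, epsilon):
    """Nudge every input by at most `epsilon` in the direction that lowers the score.

    For a linear score, the gradient with respect to x is just w, so stepping
    against sign(w) is the FGSM idea in miniature.
    """
    return [xi - epsilon * sign(wi) for wi, xi in zip(w, x)]

n = 20
w = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]   # hypothetical weights
b = 0.3
x = [0.5] * n                                         # hypothetical 20-pixel "image"

x_adv = fgsm_step(w, x, epsilon=0.02)

print(score(w, x, b))                              # about 0.3: class A
print(score(w, x_adv, b))                          # about -0.1: now class B
print(max(abs(a - c) for a, c in zip(x_adv, x)))   # no pixel moved by more than ~0.02
```

Each pixel changes by only 0.02, but the 20 small changes all push the score the same way and add up. Real images have far more pixels, which is one reason imperceptible perturbations can be enough to fool a trained network.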

Two different Percepts of the same image.

Coordinate Frame

Convolutional networks recognize an image in terms of clusters of pixels arranged in distinct patterns, and do not understand those patterns as components present in the image. As seen by a CNN, an image has no internal representation of its components and their part-whole relationships.

CNNs do not have coordinate frames, which are a basic component of human vision. A coordinate frame is a mental model that keeps track of the orientation and distinctive features of an object. For example, in the following figure, we can tell that the image on the right, turned upside down, gives us the image on the left. Just by mentally adjusting the coordinate frame in our brain, we can see both faces, irrespective of the picture’s orientation.

Minor Disadvantages of CNNs

  • A CNN can be slow, since operations such as convolution and max pooling have to be repeated across every position of every feature map.
  • If the CNN has many layers, training can take a very long time unless the computer has a good GPU.
  • A CNN (ConvNet) typically requires a large dataset to train well.
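A rough sense of why CNNs are computationally heavy comes from counting the multiply-accumulate operations in a single convolutional layer. This sketch uses the standard output-size formula (W - K + 2P) / S + 1 (input width W, kernel size K, padding P, stride S) and a hypothetical layer configuration; the numbers are illustrative and not taken from any particular model.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution: (W - K + 2P) // S + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def conv_macs(height, width, in_channels, out_channels, kernel, padding=0, stride=1):
    """Multiply-accumulate operations for one convolutional layer on one image."""
    out_h = conv_output_size(height, kernel, padding, stride)
    out_w = conv_output_size(width, kernel, padding, stride)
    # Every output value needs kernel * kernel * in_channels multiply-adds.
    return out_h * out_w * out_channels * kernel * kernel * in_channels

# Hypothetical layer: 224x224 RGB input, 64 filters of size 3x3, padding 1, stride 1.
macs = conv_macs(224, 224, in_channels=3, out_channels=64, kernel=3, padding=1)
print(f"{macs:,} multiply-accumulates")   # 86,704,128 multiply-accumulates
```

That is roughly 87 million operations for one layer on one image; a deep CNN stacks many such layers, and training repeats the computation (plus the backward pass) over every image for many epochs, which is why a good GPU matters.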

You can learn a little more about the disadvantages of CNNs here.

Redirections for Further Research

Convolutional Networks Overview by Mr. Sargur Srihari from the University at Buffalo.

Thanks for Reading, Happy Learning!

Surya Maddula

Student Researcher @ Columbia • TKS 23' & 24' • Patented Innovator • National Record Holder • Growth Engineer