AI, Artificial Intelligence, Computer Vision, Computer Vision Algorithms, Facial Recognition, Machine Learning, ML

A Primer Into Computer Vision


In our previous articles, we examined two key areas of AI: Machine Learning and Neural Networks. In this latest post, we take a look at yet another facet of AI – Computer Vision. Essentially, this is the field of study where a computer tries to replicate the vision processing of the human eye.

What Is Computer Vision?

Computer vision can be defined technically as follows:

“It enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs — and take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.”

(Source: 1)

A Computer Vision (CV) system is like the other mechanisms we have reviewed. But instead of taking quantitative data such as large sets of information to create an output, CV takes other forms of qualitative data, as provided in the above definition. From there, it is fed into the system so it can gain an understanding of what it is actually seeing, much like how the human brain processes visual inputs from the external environment.

Importance Of The Visual Cortex

Any CV system is designed to replicate, as much as possible, the processes of the Visual Cortex of the human brain. In the world of neuroscience, this region is often referred to as “V1”. It is found in both cerebral hemispheres, and it is the part of the brain responsible for processing the visual stimuli it receives so they can be understood.

But keep in mind that, just as in a CV system, visual information first needs to reach the brain before any processing can occur. This is where the roles of the Iris and the Retina come into play. Let’s look at both.

1) The Iris:

This is the colored region between the pupil and the white of the eye, also known as the “Sclera”, and this part of the human eye controls the amount of light that enters through the pupil. For example, if there is a lot of light present in the external environment, the Iris constricts the pupil to a certain extent. In contrast, if there is less light, the Iris dilates the pupil so that more light comes into the eye.

This is done so that the visual inputs the human brain receives are not skewed. Of course, if the individual is suffering from an ailment of the Iris, these inputs will be highly irregular. An image of the iris can be seen below:

2) The Retina:

This is a layer of tissue that wraps around roughly 65% of the back of the eye. The Retina’s primary function is to collect incoming visual stimuli and transmit them to the Optic Nerve.

There are two types of photoreceptor cells in the Retina:

  • Rods: These are located in the outer part of the Retina. They are what allow a human being to see visual stimuli under dark conditions.
  • Cones: These are located in the central part of the Retina, and they are what capture the detail of external stimuli. The information collected by the Retina is then sent over the Optic Nerve, which joins the eye at the Optic Disc. This is the point of capture from which all visual stimuli are sent to the human brain for processing.

This is illustrated below, and the Optic Disc can be seen to the right, in the central portion of the image:

Algorithms Used In Computer Vision

Some of the more widely used computer vision algorithms include the following:

1) Image Classification:

Image classification is where visual inputs that are fed into a CV system are placed into appropriate categories. For example, if a computer vision system sees an image of a car, it should be able to automatically place it under the heading of “Automobiles”. 

However, a major roadblock at the present time is that when the CV system is fed an image of a car, there may be no category assigned to it yet. So, how will it know where to place that image? The goal of this algorithm is for the system to learn what the image is and, on its own, automatically create a category label under which the picture of the car can be placed.
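To make the idea of placing images into categories concrete, here is a minimal sketch of a classifier. It uses a nearest-centroid rule over raw pixel values on synthetic data; a production system would instead learn features with a neural network, and all of the images and labels below are invented for illustration.

```python
import numpy as np

# A toy nearest-centroid image classifier: each "image" is a flat array
# of pixel intensities, and each category is represented by the mean of
# its training images. All data here is synthetic.

def train_centroids(images_by_label):
    """Compute one mean-pixel centroid per category."""
    return {label: np.mean(imgs, axis=0) for label, imgs in images_by_label.items()}

def classify(image, centroids):
    """Assign the image to the category with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(image - centroids[label]))

# Synthetic training data: "bright" vs "dark" 2x2 images, flattened.
training = {
    "bright": [np.array([0.9, 0.8, 0.95, 0.85]), np.array([0.7, 0.9, 0.8, 0.9])],
    "dark":   [np.array([0.1, 0.2, 0.05, 0.15]), np.array([0.2, 0.1, 0.3, 0.1])],
}
centroids = train_centroids(training)
print(classify(np.array([0.85, 0.9, 0.8, 0.95]), centroids))  # expected: bright
```

The same interface carries over to real classifiers: train on labeled examples, then map a new image to the closest learned category.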

2) Object Detection:

With object detection algorithms, the goal is for a CV system to capture all of the different objects in an image, create a boxed region around each of them, and assign each one a label. Back to our car example: suppose there is an image of 20 cars, some Hondas and some Toyotas. An object detection CV system should be able to pick out the Hondas first and create a box around each of them.

For example, if the Honda is an Accord model, then a box should be placed around it with the label “Honda Accord”. The same process holds true for the Toyota cars. In this situation, the CV system should be able to discriminate between multiple objects that look similar at the outset.
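A core building block behind those boxed regions is Intersection over Union (IoU), the standard score a detector uses to judge how well a predicted bounding box matches an object. Here is a small sketch; the box coordinates are illustrative.

```python
# Intersection-over-Union (IoU) for axis-aligned bounding boxes.
# Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes do not intersect).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
```

Detectors typically count a predicted box as correct when its IoU with the labeled box exceeds a threshold such as 0.5.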

3) Semantic Segmentation:

With semantic segmentation algorithms, an image is broken down at its most granular level – i.e. the pixel. The goal of semantic segmentation systems is to understand what each pixel means and how it contributes to the overall image. This is useful for determining where the exact boundaries of each object lie in the image, rather than approximating them.
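The output of semantic segmentation is a class label for every pixel. The sketch below illustrates that output format: real systems predict per-pixel class scores with a neural network, but here a simple intensity threshold stands in, labelling each pixel of a made-up grid as foreground (1) or background (0).

```python
import numpy as np

# A minimal per-pixel labelling sketch. The "image" is a synthetic
# grid of intensities; the mask assigns every pixel a class label.

image = np.array([
    [0.1, 0.2, 0.9, 0.8],
    [0.1, 0.9, 0.9, 0.2],
    [0.2, 0.1, 0.8, 0.1],
])

# 1 = foreground, 0 = background, decided pixel by pixel.
mask = (image > 0.5).astype(int)
print(mask)
```

Whatever model produces it, a semantic mask has exactly this shape: one label per pixel, tracing object boundaries exactly rather than approximately.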

4) Instance Segmentation:

Instance segmentation algorithms also look at the entire image, but they are additionally designed to break a particular object down into its most minute details. Going back once again to our car example: suppose the system picks out a Honda Accord as the object; from there, it should be able to identify its color, the type of tires it has, etc.
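The key difference from semantic segmentation is that each individual object gets its own mask. A classic way to separate individual objects in a binary foreground mask is connected-component labelling, sketched below with a simple flood fill; real instance-segmentation models such as Mask R-CNN learn these masks directly, and the grid here is made up.

```python
# Give each connected foreground blob its own instance id
# (4-connectivity flood fill).

def label_instances(mask):
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                next_id += 1
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < rows and 0 <= x < cols and mask[y][x] and not labels[y][x]:
                        labels[y][x] = next_id
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(label_instances(mask))  # two separate blobs -> ids 1 and 2
```

With per-instance masks in hand, downstream steps can then describe each object individually (its color, its parts, and so on).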

Applications of Computer Vision

Let’s take a look at some commonly used computer vision applications:

1) Facial Recognition:

This is a Biometric-based modality that confirms the identity of an individual based upon their facial features. As this technology is still far from perfect, computer vision is being used to augment it so that the identity of the person in question can be confirmed more reliably. This combination is now being used effectively in CCTV camera technology.
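Many facial recognition pipelines work by mapping each face image to an embedding vector, then comparing embeddings: images of the same person should lie close together. Below is a sketch of that comparison step using cosine similarity; the embedding values and the threshold are invented for illustration.

```python
import math

# Compare two hypothetical face embeddings. In a real system these
# vectors would come from a trained face-embedding network.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.9):
    """Declare a match when the embeddings are similar enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold

enrolled = [0.2, 0.8, 0.1, 0.55]    # stored template for a known user
probe    = [0.21, 0.79, 0.12, 0.5]  # new capture from the camera
print(same_person(enrolled, probe))
```

The threshold trades off false accepts against false rejects, which is exactly where the "far from perfect" caveat above comes in.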

2) Healthcare:

The bulk of the information and data that is collected from a patient is visual in nature, such as X-rays, CAT scan images, etc. Computer Vision can be used here to confirm the medical diagnosis formulated by the healthcare provider, which makes it a big advantage in the healthcare industry.

3) Agriculture:

In agriculture applications, images collected from satellites or drones can be used to analyze the health of crops in certain areas of a field. This is particularly useful for early diagnosis, so that the producer can tackle any problems as soon as they emerge rather than waiting until later.
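One common way such imagery is screened for crop health is the Normalized Difference Vegetation Index (NDVI), computed per pixel as (NIR − Red) / (NIR + Red) from the near-infrared and red bands: healthy vegetation reflects strongly in NIR, so values near 1 suggest healthy crops. A sketch, with invented band values:

```python
import numpy as np

# Per-pixel NDVI from hypothetical near-infrared and red reflectance
# bands of an aerial image (values are made up for illustration).

nir = np.array([[0.80, 0.70], [0.30, 0.75]])
red = np.array([[0.10, 0.20], [0.25, 0.10]])

ndvi = (nir - red) / (nir + red)
stressed = ndvi < 0.4  # flag pixels that may need the producer's attention
print(ndvi.round(2))
print(stressed)
```

A low-NDVI patch like the one flagged here is the kind of early signal that lets a producer act before a problem spreads.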

Conclusions on Computer Vision

This article aimed to provide an overview of the importance of the visual cortex and how it impacts computer vision applications. We also looked at the main computer vision algorithms being utilized by today’s AI professionals, and how they work to learn and make sense of the images and objects presented.

The examples we provided on how this technology is being used across industries were to give you an idea of how this form of AI is being used and the advantages it brings, such as faster and simplified processes, improved products and services, as well as all-important cost reduction and improved ROI.

Sources

1) https://www.ibm.com/topics/computer-vision   

Anthony Figueroa

