For a very long time, computer scientists and engineers have been working to make computers perform tasks that humans can do. One field that has brought us
close to achieving this is artificial intelligence. Artificial intelligence is a broad subject that cuts across computer vision, machine learning and natural language processing.
Among these, computer vision is one of the most advanced and has had a great positive impact. So what exactly is computer vision? In its simplest form,
computer vision is a branch of computer science that deals with making computers see or perceive the world the way the human eye does. Computer vision
finds applications in many areas: in medicine, such as quick detection of cancer and of diabetes through retina scans; in robotics, such as self-driving cars,
robot manipulators and autonomous mobile robots; and in agriculture, such as detecting plant diseases and increasing farmers’ output. To achieve this feat, computers are
trained by showing them many different examples of what they need to identify, and the computer learns the features of those examples. Say, for instance,
you want the computer to be able to identify a car. You would show the computer different cars in different colors and orientations so that
when the computer sees a new car that it hasn’t seen before, it is able to identify that it is a car.
But to get started, we need to get some intuition as to how a computer interprets images.
This tutorial and the ones that follow will take you from a gentle introduction to computer vision using OpenCV (an open-source computer vision library) to advanced techniques for computer vision like deep learning using Keras and TensorFlow. Let’s jump right into it.
Computers interpret a colored image as a three-dimensional array of numbers: rows, columns and color channels. Every point (pixel) in each channel holds a color intensity value between 0 and 255.
Within a channel, a value of 0 means that color is absent while a value of 255 means it is at full intensity, so a pixel with every channel at 0 is completely black and a pixel with every channel at 255 is completely white. To a computer, a colored image is a stack of color channels; in OpenCV these are blue, green and red (BGR for short). A colored image is obtained by stacking these layers on top of each other; think of this as mixing primary colors.
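To make the array picture concrete, here is a minimal sketch that builds a tiny 2x2 colored image by hand with NumPy. The variable names are illustrative, and the channel order is assumed to be BGR as in OpenCV.

```python
import numpy as np

# A tiny 2x2 colored image: rows x columns x channels, values from 0 to 255.
# Channel order is assumed to be BGR, as in OpenCV.
image = np.zeros((2, 2, 3), dtype=np.uint8)  # all zeros: every pixel is black

image[0, 0] = (255, 0, 0)      # full blue only
image[0, 1] = (0, 255, 0)      # full green only
image[1, 0] = (0, 0, 255)      # full red only
image[1, 1] = (255, 255, 255)  # every channel at full intensity: white

# Each channel is a 2-D layer; stacking the layers rebuilds the image.
blue, green, red = image[:, :, 0], image[:, :, 1], image[:, :, 2]
restacked = np.dstack([blue, green, red])

print(image.shape)                       # (2, 2, 3)
print(image[1, 1])                       # [255 255 255]
print(np.array_equal(image, restacked))  # True
```

Each of the three layers here is an ordinary 2-D grid of intensities; only when they are stacked does a pixel get a color.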
As humans, our eyes give us a view of our world; for computers, the camera performs this function. For this tutorial, you can make use of your computer’s webcam.
To get started with computer vision, you need to have Python installed. Visit www.python.org/downloads/release/python-382/ to download the release
that matches your operating system. Next, install NumPy and OpenCV:
On Windows:
C:\Users\hp>pip install numpy
C:\Users\hp>pip install opencv-python
On Linux:
root@parallels-vm:~# pip install numpy
root@parallels-vm:~# pip install opencv-python
After the installation is completed, test it by typing python on the command line and then entering import numpy and import cv2.
If both imports finish without an error, you are good to go!
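If you prefer a script over typing the imports by hand, the check can be sketched as below. The helper name `installed` is hypothetical; it only looks the modules up and does not import them, so it reports a missing package instead of crashing.

```python
import importlib.util


def installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The two packages this tutorial relies on.
status = {name: installed(name) for name in ("numpy", "cv2")}

for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing, run pip install again'}")
```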
Open an image using OpenCV
Since we are talking about computer vision, we will be working with images, and one of the first steps in OpenCV is opening an image from a file directory.
Open Python’s IDLE and create a new file. Make sure you have an image in the same directory where you will store the Python code we are about to write. In the
Python script, enter the following:
In line 1, we imported the NumPy library.
In line 2, we imported cv2, which is the OpenCV library.
In line 4, we read the image using the cv2.imread() method. It takes the file name of the image to be loaded, including its type (jpg or png), as a parameter.
And finally, in line 5, we show the image using cv2.imshow(). The function takes two parameters: the name of the window in which to show the image and the variable holding the image.
Loading an image in grayscale
Images contain a lot of information, and sometimes you want to remove what you do not need so that you can start analyzing the image for important features. Say, for instance, you want to identify a face in an image; the color of the image is irrelevant. To load an image in grayscale, we just pass one more value when reading it.
While we might want to load the image directly as grayscale, sometimes we might want to keep the original colored image. In such a case, we run the code below.
Loading a live video from a camera
After exiting the program, you should see the frame arrays printed in the debugger.
In the debugger, inspect the shape of the frame.
The value you see gives the number of rows (480) and columns (640).
frame.ndim gives the number of dimensions of the colored image, which is 3 (rows, columns and color channels), since, as we already discussed, a colored image is a stack of color channels.
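The frame properties described above can be checked without a camera by using a stand-in array. In this sketch `frame` is an all-black NumPy array with the 480 by 640 size mentioned in the text; a real webcam may deliver a different resolution.

```python
import numpy as np

# Stand-in for one webcam frame: 480 rows, 640 columns, 3 color channels.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

rows, columns, channels = frame.shape
print(frame.shape)  # (480, 640, 3)
print(frame.ndim)   # 3

# A single channel has no channel axis, so it is only two-dimensional.
single_channel = frame[:, :, 0]
print(single_channel.shape, single_channel.ndim)  # (480, 640) 2
```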