Computer vision is a field of artificial intelligence that works with images and video to solve real-world visual problems. Its main goal is to give computers the ability to recognize and understand digital images and videos so that visual tasks can be automated.
Humans have no trouble identifying the objects and surroundings around them. For computers, however, distinguishing the various patterns, images, and objects in an environment is not so easy. The difficulty arises because a computer does not perceive a scene the way the human eye and brain do; it only sees numbers, ultimately stored in binary as 0s and 1s. A color image is typically represented as a three-dimensional array (height, width, and three color channels: red, green, and blue), with each value ranging from 0 to 255. Using this array representation, we can write code to identify and recognize images. With advances in machine learning, deep learning, and computer vision, modern projects can solve complicated tasks like image segmentation and classification, object detection, face recognition, and much more.
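To make the array representation concrete, here is a minimal sketch that builds a tiny color image by hand with NumPy. The 2x2 image and the variable names are made up for illustration; a real image loaded from disk has the same layout, only larger.

```python
import numpy as np

# A tiny 2x2 color image: height x width x 3 channels, each value in 0-255.
# The channel order used here is (red, green, blue). Note that OpenCV's
# imread returns channels in (blue, green, red) order instead.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(image.shape)   # (rows, columns, channels)
print(image[0, 0])   # the three channel values of the top-left pixel

# Slicing out one channel gives a 2-D array of that color's intensities.
red_channel = image[:, :, 0]
print(red_channel)
```

Storing the values as `uint8` is what limits each channel to the 0 to 255 range mentioned above.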
We will look at two beginner projects to get started with computer vision, then two intermediate projects to build a more solid foundation in computer vision with machine learning and deep learning. Finally, we will look at one advanced computer vision project that uses deep learning. For each project, we will briefly discuss the related theory and then see how the project can be approached and improved. I will try to provide at least one link to resources that will help you get started with each of these projects.
Beginner level computer vision projects:
1. Color Detection
This is a basic project for beginners getting started with OpenCV, the computer vision library. Here, you learn how to tell the various colors apart from one another. The task is to pick out colors such as red, green, blue, black, and white from a given frame and display only the selected colors. This starter project introduces the concept of masking, which gives you a better understanding of how masks work in more complicated image classification and image segmentation tasks. You can also use it to learn in more detail how images are stored as stacked NumPy arrays of color channels, and how a color image is converted into a grayscale image.
The same idea extends to more complex projects: deep learning models such as U-Net or CANet can solve harder image segmentation and classification tasks by predicting a mask for each image. If you want to learn more, there is a wide range of such projects built on deep learning approaches.
Plenty of free resources are available online to get started with a color detection project of your choice. After looking through the various options, I found the reference below to be a very good one because it has both a YouTube video and a detailed explanation of the code. The starter code and the video demonstration are both provided by its authors.
2. Optical Character Recognition (OCR)
This is another basic project well suited to beginners. Optical character recognition is the conversion of two-dimensional text data, such as a scanned page or a photo of text, into machine-encoded text by an electronic or mechanical device. You use computer vision to read the image files. After reading the images, use the pytesseract module in Python to extract the text from the image or PDF and convert it into a string that can be displayed in Python.
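The flow described above might look like the following sketch. It assumes that Pillow, pytesseract, and the Tesseract engine itself are installed; the function names `extract_text` and `clean_ocr_text` are illustrative, and the clean-up step is an optional addition of mine, not part of pytesseract.

```python
def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image file and return the raw recognized text."""
    # Imported here so the rest of the script still loads if OCR tools are missing.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)

def clean_ocr_text(raw):
    """Strip stray whitespace and drop the empty lines OCR output often contains."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)

if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python ocr_sketch.py scanned_page.png
    if len(sys.argv) > 1:
        print(clean_ocr_text(extract_text(sys.argv[1])))
```

Recognition quality depends heavily on the input image, so preprocessing such as converting to grayscale or thresholding is often worth trying before calling `extract_text`.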
Installing pytesseract can be slightly complicated, because it is a Python wrapper and the Tesseract OCR engine itself has to be installed separately, so refer to a good guide for the installation procedure. The resource link provided below should make the installation easier, and it also builds an intuitive understanding of optical character recognition. Once you understand how OCR works and which tools it needs, you can move on to more complex problems, for example using sequence-to-sequence models with attention to translate the text read by OCR from one language into another.
Here are two links that will help you get started with Google text-to-speech and optical character recognition. The references listed in the optical character recognition link cover more concepts and explain OCR in greater detail.
Intermediate level computer vision projects:
1. Face Recognition using Deep Learning
Face recognition identifies whose face appears in an image, typically by matching it to the name of an authorized user. Face detection is a simpler task that can be considered a beginner level project, and it is one of the steps required for face recognition. Face detection distinguishes a human face from the rest of the body and from the background. The Haar cascade classifier can be used for face detection and can detect multiple faces in a frame. The Haar cascade classifier for frontal faces is usually distributed as an XML file that the OpenCV module loads and then uses to detect faces. Alternatively, a feature descriptor such as the histogram of oriented gradients (HOG), combined with a support vector machine (SVM) trained on labeled data, can perform this task as well.
The most effective approach to face recognition is to use deep neural networks (DNNs). Once the faces have been detected, deep learning can be applied to the recognition step. A large variety of pretrained architectures are available for transfer learning, such as VGG-16, ResNet-50, and FaceNet, which simplify the construction of a deep learning model and let you build high-quality face recognition systems. You can also build a custom deep learning model for the task. Modern face recognition models are highly accurate and can reach accuracies of over 99% on labeled datasets. Face recognition models are used in security systems, surveillance, attendance systems, and a lot more.
Below is an example of a face recognition model I built using VGG-16 transfer learning, with face detection performed first by the Haar cascade classifier. Check it out for a more detailed explanation of how you can build your own face recognition model.
2. Object Detection/Object Tracking
This computer vision project could easily be considered fairly advanced, but so many free tools and resources are available that you can complete it without much difficulty. Object detection is a computer vision technique that identifies and locates objects in an image or video: the model draws a bounding box around each recognized object, assigns it one of a predetermined set of labels, and reports a confidence score for the prediction. With this kind of identification and localization, object detection can be used to count the objects in a scene and determine their precise locations, all while labeling them. Object tracking differs slightly from object detection: you not only detect a particular object but also follow it from frame to frame, keeping the bounding box around it. Examples include following a particular vehicle along a road or tracking the ball in a sport such as golf, cricket, or baseball. Algorithms for these tasks include R-CNNs (region-based convolutional neural networks), SSD (single shot detector), and YOLO (you only look once), among many others.
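To illustrate the difference between detecting and tracking, here is a deliberately simple sketch of a tracker that links bounding boxes across frames by matching box centers. It is not one of the algorithms named above: the `CentroidTracker` class, the greedy nearest-center matching, and the `max_distance` value are assumptions chosen to keep the example short, and the boxes are hard-coded where a real detector's output would go.

```python
import math

def centroid(box):
    """Center point of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

class CentroidTracker:
    """Keeps a stable ID per object by matching each box to the nearest previous center."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # farther than this counts as a new object
        self.next_id = 0
        self.objects = {}                 # object ID -> last known center

    def update(self, boxes):
        """Take this frame's detections and return {object ID: box}."""
        assigned = {}
        unmatched = dict(self.objects)
        for box in boxes:
            cx, cy = centroid(box)
            best_id, best_dist = None, self.max_distance
            for obj_id, (px, py) in unmatched.items():
                dist = math.hypot(cx - px, cy - py)
                if dist <= best_dist:
                    best_id, best_dist = obj_id, dist
            if best_id is None:
                # No previous object is close enough: register a new one.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = box
        # Objects with no match in this frame are dropped immediately.
        self.objects = {obj_id: centroid(b) for obj_id, b in assigned.items()}
        return assigned

tracker = CentroidTracker()
frame1 = tracker.update([(10, 10, 20, 20), (200, 50, 20, 20)])
# In the next frame both objects moved slightly and arrive in a different order.
frame2 = tracker.update([(205, 52, 20, 20), (14, 12, 20, 20)])
print(frame1)
print(frame2)
```

Each object keeps its ID even though the detections arrive in a different order, which is the essence of tracking. Production trackers add motion models and tolerate missed detections, which this sketch does not.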
I am going to mention two of the best resources, by two talented programmers. One is aimed at embedded systems like the Raspberry Pi, and the other at real-time webcam object detection on a PC. The two resources below are some of the best ways to get started with object detection and object tracking, and both have YouTube videos explaining them in detail. Do check them out to gain a better understanding of object detection.
Advanced level computer vision projects:
1. Human Emotion and Gesture Recognition
This project uses computer vision and deep learning to detect faces and classify the emotion shown on each one. The models not only classify emotions but also detect and classify the hand gestures made with the recognized fingers. After the emotion or gesture has been classified, the system gives a spoken response announcing the predicted emotion or gesture. The best part of this project is the wide range of datasets you can choose from.
The link below points to one of my own deep learning projects, built using computer vision, data augmentation, and libraries such as TensorFlow and Keras. I highly recommend the two-part series below for a complete breakdown and analysis of how to approach this advanced computer vision task. Also, refer to the Google text-to-speech link provided in the previous section to understand how the conversion of text to speech works.
Conclusion:
These are five computer vision project ideas across various difficulty levels. For each one, I covered the theory briefly and pointed to some helpful resources. I hope this article helps readers dive into the field of computer vision and explore the variety of projects it offers. If you are interested in learning machine learning, feel free to check out my tutorial series, linked below, which explains the concepts of machine learning from scratch. New parts of the series are added weekly, and sometimes even faster.
Thank you all for sticking around until the end. I hope you enjoyed the read. Have a wonderful day!