
Convolutional Neural Networks (CNNs), Deep Learning, and Computer Vision

Convolutional neural networks (CNNs) are deep learning architectures that are used in various applications, including image and video processing, natural language processing (NLP), and recommendation systems.

CNN Deep Learning Takeaways

  • A CNN model is a type of deep learning algorithm that analyzes and learns features from large amounts of data.

  • Developing and deploying a CNN model is a complex process with three stages: training, optimization, and inference.

  • Computer vision combines hardware and software optimized for CNN operations to accelerate image processing and analysis.

  • Computer vision technologies extract insights from video for use in sectors such as industrial manufacturing, healthcare, and government.

  • Intel's AI developer ecosystem offers software resources, tools, and silicon optimized for CNN operations.


What Are CNNs and Deep Learning?

A convolutional neural network is a type of deep learning algorithm that is most often applied to analyze and learn visual features from large amounts of data. While primarily used for image-related AI applications, CNNs can be used for other AI tasks, including natural language processing and recommendation engines.

AI, Machine Learning, and Deep Learning

Before you dive deeper into how CNNs work, it is important to understand how these deep learning algorithms relate to the broader field of AI and the distinctions between commonly used AI-related key terms.

  • Artificial intelligence: The field of computer science focused on intelligent computer programs that can sense, reason, act, and adapt.
  • Machine learning: A subset of AI in which algorithms can improve in performance over time when exposed to more data.
  • Neural network: A machine learning model made up of layers of interconnected nodes, or neurons, that can recognize patterns and relationships in large quantities of data. Neural networks use a structure inspired by the human brain and are the foundation for deep learning algorithms.
  • Deep learning: A subset of machine learning in which multilayered neural networks learn from vast amounts of data.

How Do CNNs Work?

Convolutional neural networks work by ingesting and processing large amounts of grid-structured data, such as the pixels of an image, and then extracting important granular features for classification and detection. CNNs typically consist of three types of layers: convolutional layers, pooling layers, and fully connected layers. Each type of layer serves a different purpose and performs a different task on the ingested data, and successive layers learn increasingly complex features.
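The convolution step can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a single-channel image stored as a list of lists, one hand-picked 3x3 filter, a stride of 1, and no padding. In a real CNN, the filter values are learned during training rather than chosen by hand. The function name `conv2d` is our own.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the resulting feature map.

    Like most deep learning frameworks, this computes cross-correlation
    (no kernel flip), which is what CNN layers call "convolution".
    Stride 1, no padding ("valid" output size).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the patch beneath it and sum.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 5x6 grid: dark pixels (0) on the left, bright pixels (1) on the right,
# so there is a single vertical edge in the middle.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# A vertical-edge filter: responds where brightness increases left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)  # [0, 3, 3, 0] on each of the 3 output rows
```

The filter responds only where its window straddles the edge and outputs 0 over uniform regions. This is the kind of low-level feature that early convolutional layers pick up.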

CNNs for Video Analytics

To better understand how CNNs work, let's look at an example of CNNs used for video analytics, a process in which CNN-based computer vision models analyze captured video and extract actionable insights. Computer vision is a field of AI that relies on machine learning and deep learning. Modern computer vision systems combine cameras, edge- or cloud-based computing, software, and deep learning models such as CNNs to guide image processing and analysis. Once fully trained, computer vision models can recognize and detect objects and even track movement.

For this video analytics example, let's assume the input data is a set of millions of images of cars.

  • Convolutional layers apply filters to the input data and learn to detect features. A network typically stacks multiple convolutional layers interleaved with pooling layers. The early convolutional layers extract general, low-level features, such as lines and edges, while later layers learn more complex, high-level features, such as car headlights or tires.
  • Pooling layers downsample the feature maps produced by the convolutional layers, reducing computational cost.
  • Fully connected layers combine the high-level features output by the convolutional and pooling layers to learn global patterns, such as the overall appearance of a car. The final layer then typically applies an activation function, such as softmax, to turn these outputs into the network's predictions.
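The following sketch makes the roles of the remaining layer types concrete. It starts from a convolutional feature map, applies 2x2 max pooling, flattens the result, and passes it through one fully connected layer. Softmax then converts the scores into probabilities. The feature map values, the two class labels, and the weights are hypothetical. Max pooling is only one common pooling choice; average pooling is another.

```python
import math


def max_pool(feature_map, size=2):
    """Max pooling with a size x size window and a stride equal to size."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled


def flatten(grid):
    """Turn a 2D grid into a 1D list so a fully connected layer can use it."""
    return [value for row in grid for value in row]


def fully_connected(inputs, weights, biases):
    """Each output is a weighted sum of *all* inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 feature map from a convolutional layer.
feature_map = [
    [1, 3, 0, 2],
    [4, 2, 1, 0],
    [0, 1, 5, 6],
    [2, 0, 7, 1],
]

pooled = max_pool(feature_map)   # 4x4 -> 2x2: 16 values become 4
features = flatten(pooled)

labels = ["car", "not car"]
weights = [[0.5, 0.1, 0.1, 0.4],   # hypothetical learned weights
           [-0.2, 0.3, 0.3, -0.5]]
biases = [0.0, 0.1]

probs = softmax(fully_connected(features, weights, biases))
prediction = labels[probs.index(max(probs))]
print(pooled)      # [[4, 2], [2, 7]]
print(prediction)  # car
```

Pooling cuts the number of values the next layer must process by a factor of four here. The fully connected layer, by contrast, looks at every remaining feature at once.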

How Are CNNs Developed?

CNNs are critical to deep learning and enable diverse use cases across industries around the globe. But to truly grasp their impact, you have to understand how they are developed. CNN development is a time-consuming and complex three-step process: training, optimization, and inference. Intel works directly with developers and data scientists to find new ways to streamline and accelerate this process so new solutions can be up and running more quickly and easily.

Training

Training is typically the most time-consuming and challenging part of creating CNNs for deep learning. During supervised learning, developers teach the network how to perform a specific task, such as image classification. This involves gathering a large data set of thousands or millions of labeled images, feeding the images to the network, and allowing the network to predict what each image represents. When a prediction is wrong, the network's weights are adjusted to reduce the error, using backpropagation and an optimization algorithm such as gradient descent, so that future predictions become more accurate. This process repeats until the developer is satisfied with the network's prediction accuracy. To help shorten this process, Intel created the Intel® oneAPI DL Framework Developer Toolkit, which offers already-optimized building blocks to streamline designing, training, and validating neural networks.
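The predict, measure-error, update loop can be illustrated with a single artificial neuron (logistic regression) standing in for a full CNN, which keeps the example short. This is a simplified sketch. The four two-feature samples are toy data, and the learning rate and epoch count are arbitrary. In a real CNN, backpropagation computes the same kind of gradient for every weight in every layer. The sketch does not show how the Intel toolkit trains networks.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Forward pass: the neuron's estimated probability that x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=200, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]  # small random start
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = predict_proba(weights, bias, x)
            # For sigmoid + binary cross-entropy, the gradient of the loss with
            # respect to the neuron's weighted sum is simply (p - y).
            error = p - y
            # Gradient descent: move each weight against its gradient.
            # A larger error means a larger correction.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(weights, bias, samples, labels):
    correct = sum((predict_proba(weights, bias, x) >= 0.5) == bool(y)
                  for x, y in zip(samples, labels))
    return correct / len(samples)


# Toy labeled data: class 1 when the first feature dominates, class 0 otherwise.
samples = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
labels = [1, 1, 0, 0]

weights, bias = train(samples, labels)
print(accuracy(weights, bias, samples, labels))  # 1.0
```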

Optimization

Many developers optimize their neural networks without realizing that this is a distinct stage of development, called optimization. When done correctly, optimization can drastically simplify the network model and improve inference performance. The Intel® Distribution of OpenVINO™ toolkit allows developers to convert and optimize neural network models developed using popular frameworks like TensorFlow, PyTorch, and Caffe. The toolkit's post-training optimization tool helps reduce model size and improve latency with little degradation in accuracy and without retraining.
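One common post-training optimization is quantization, which stores weights as 8-bit integers instead of 32-bit floats. The sketch below shows generic symmetric, per-tensor int8 quantization in plain Python. It illustrates the general idea only. It is not the OpenVINO toolkit's API or its exact algorithm, and the weight values are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.88, -0.51]  # hypothetical trained weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

fp32_bytes = len(array("f", weights).tobytes())  # 4 bytes per float32 weight
int8_bytes = len(array("b", q).tobytes())        # 1 byte per int8 weight
print(q)
print(f"scale={scale:.5f}  max error={max_error:.5f}  "
      f"size {fp32_bytes} -> {int8_bytes} bytes")
```

Because each value is rounded to the nearest step, the per-weight error is at most half of `scale`. That is why quantization can shrink a model roughly fourfold with only a small loss in accuracy.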

Inference

After a neural network is trained and optimized, it is deployed as a model to run inference: classifying, recognizing, and processing new inputs to make new predictions. With the Intel® Distribution of OpenVINO™ toolkit Inference Engine, developers can tune for performance by compiling the optimized network and managing inference operations on specific devices. The Inference Engine also auto-optimizes through device discovery, load balancing, and inferencing parallelism across CPUs, GPUs, and other Intel® hardware devices.
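At inference time, the learned parameters are frozen and simply applied to new inputs. The sketch below assumes a tiny hypothetical two-class model; the `WEIGHTS`, `BIASES`, and `LABELS` constants are made up. It uses Python's `concurrent.futures.ThreadPoolExecutor` to serve several inference requests concurrently, loosely mirroring the idea of inference parallelism. It does not use or imitate the OpenVINO Inference Engine API.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical frozen parameters produced by an earlier training run.
WEIGHTS = [[2.0, -2.0],
           [-2.0, 2.0]]
BIASES = [0.0, 0.0]
LABELS = ["car", "not car"]


def infer(features):
    """Run one forward pass and return (label, confidence). No weights change."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(WEIGHTS, BIASES)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


new_inputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

# Several requests in flight at once; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(infer, new_inputs))

for features, (label, confidence) in zip(new_inputs, results):
    print(features, label, round(confidence, 3))
```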


Global Uses of CNNs, Deep Learning, and Computer Vision

From auto manufacturers and city governments to airports and retail stores, businesses across all industries are leveraging computer vision models in a variety of ways. The number of use cases for deep learning–based computer vision will only increase as compute technology continues to advance and AI can be accelerated at lower cost. Here are some common ways CNNs, deep learning, and computer vision are being used around the world.

Industrial: Defect Detection

Manual defect detection in industrial manufacturing is expensive and prone to human error. It often takes place in dangerous or harsh environments, and it relies on skilled inspectors who are hard to find. That's why some manufacturers are starting to explore deep learning, inference, machine vision, and computer vision technologies to automate defect detection on assembly lines.

For example, robotic arc welding is essential to modern heavy machinery manufacturing but is prone to defects involving weld porosity. Porous welds are weak and can fail inspection, so they may have to be reworked or the materials scrapped altogether. To help with this persistent problem, Intel created an automated weld-defect detection solution based on ADLINK's EOS vision system with the ADLINK Edge IoT software stack and the Intel® Distribution of OpenVINO™ toolkit action recognition model. At the core of the solution is a neural network–based AI action recognition model trained on welds with and without porosity defects. It detects porosity defects in near-real time so that they can be acted upon immediately. Using this solution can help manufacturers reduce delays, waste, and costs while increasing productivity.1
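A model that scores every video frame will occasionally produce a one-off high score. One simple way to turn per-frame scores into near-real-time alerts is to require several consecutive frames above a threshold. This sketch is a hypothetical post-processing step, not part of the ADLINK or Intel solution. The threshold, the run length, and the frame scores are all assumptions.

```python
def defect_alerts(frame_scores, threshold=0.7, min_consecutive=3):
    """Yield the frame index at which a sustained defect signal is confirmed.

    `frame_scores` is any iterable of per-frame defect probabilities, so this
    also works on a live stream. An alert fires once per run, on the frame
    where the run of above-threshold scores reaches `min_consecutive`.
    """
    run = 0
    for frame_index, score in enumerate(frame_scores):
        run = run + 1 if score >= threshold else 0  # reset on any low score
        if run == min_consecutive:
            yield frame_index


# Hypothetical per-frame porosity scores from a detection model.
scores = [0.1, 0.8, 0.2, 0.75, 0.9, 0.95, 0.97, 0.3, 0.8, 0.85, 0.9]
alerts = list(defect_alerts(scores))
print(alerts)  # [5, 10]
```

The isolated spike at frame 1 is ignored. The two sustained runs trigger exactly one alert each, which trades a few frames of latency for fewer false alarms.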

Worker Safety: PPE Detection

In industries with an increased risk of employee injuries, such as construction, the importance of workers wearing personal protective equipment (PPE), including hard hats, specialized shoes, vests, glasses, and harnesses, cannot be overstated. Yet it is often difficult for site supervisors to check PPE compliance among all workers. By training an object detection CNN model, developers can enable computer vision systems to determine whether workers are following their PPE mandates.

The Intelligent Security Systems (ISS) SecurOS Helmet Detection module helps businesses create safer spaces for their workers. It uses neural network algorithms, the Intel® Distribution of OpenVINO™ toolkit, and computer vision technologies to detect whether workers are wearing protective helmets or hard hats. If a helmet is not detected, the solution sends alerts in near-real time to inform supervisors of the worker's noncompliance.
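Once an object detection model returns bounding boxes for people and helmets, a compliance check still has to match the two. The following sketch uses one hypothetical heuristic and is not the SecurOS implementation: a worker counts as compliant if the center of some helmet box falls within the top quarter of that worker's person box. Boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, with y increasing downward.

```python
def head_region(person, fraction=0.25):
    """Top `fraction` of a person box, where a helmet would be expected."""
    x1, y1, x2, y2 = person
    return (x1, y1, x2, y1 + (y2 - y1) * fraction)


def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(region, point):
    x1, y1, x2, y2 = region
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def noncompliant_workers(persons, helmets):
    """Return indices of person boxes with no helmet centered in the head region."""
    flagged = []
    for idx, person in enumerate(persons):
        head = head_region(person)
        if not any(contains(head, center(h)) for h in helmets):
            flagged.append(idx)
    return flagged


# Hypothetical detector output for one frame.
persons = [(100, 50, 180, 330), (300, 60, 370, 320)]
helmets = [(120, 45, 160, 80)]

flagged = noncompliant_workers(persons, helmets)
for idx in flagged:
    print(f"ALERT: worker {idx} may not be wearing a helmet")
```

A production system would usually also track workers across frames so that a single missed detection does not raise an alert.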

Smart and Secure Cities: Traffic Control and Safety

Keeping citizens and passengers safe and secure is always top of mind for city governments and public transportation officials.

Deep learning–based video systems can provide more granular information that improves safety. For example, with more-detailed traffic data, transit officials in large cities can track trucks hauling hazardous materials and reroute them away from congested, densely populated areas.

Another way smart cities are enhancing public safety, while also increasing sustainability, is through reducing traffic congestion. The major metropolis of Taipei, Taiwan, recently implemented a traffic control solution that helped to decrease traffic congestion by 10 to 15 percent.2 The smart traffic signal solution enables visual machine data collection and inference at the edge—inside traffic control signals—to get real-time traffic insights and help lower infrastructure costs. Optimized with the Intel® Distribution of OpenVINO™ toolkit, the solution uses an embedded Intel® Pentium® processor for machine vision workloads. Additionally, because inference takes place in traffic signal devices, less network infrastructure was required, allowing the city's Traffic Engineering Office to lower network communications costs by 85 percent.
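When inference runs inside the signal device, only compact results need to cross the network, rather than video. The back-of-the-envelope sketch below compares the two under purely hypothetical assumptions: a 4 Mbit/s compressed video stream versus one small JSON summary per second. It illustrates the direction of the savings and does not reproduce the 85 percent figure, which refers to the city's overall network communications costs.

```python
import json

# Hypothetical assumptions, not figures from the Taipei deployment.
VIDEO_BITRATE_BITS_PER_S = 4_000_000  # assumed compressed camera stream
SECONDS_PER_HOUR = 3600


def video_bytes_per_hour(bitrate_bits_per_s=VIDEO_BITRATE_BITS_PER_S):
    """Bytes per hour if the video itself is streamed to a central server."""
    return bitrate_bits_per_s // 8 * SECONDS_PER_HOUR


def summary_message(signal_id, timestamp, counts):
    """A compact, hypothetical message an edge device might send after local inference."""
    payload = {"signal": signal_id, "ts": timestamp, "counts": counts}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


msg = summary_message("A12", 1700000000,
                      {"north": 12, "south": 9, "east": 4, "west": 7})
edge_bytes = len(msg) * SECONDS_PER_HOUR  # one summary per second
cloud_bytes = video_bytes_per_hour()

print(f"streamed video: {cloud_bytes:,} bytes/hour")
print(f"edge summaries: {edge_bytes:,} bytes/hour")
print(f"reduction in transmitted data: {1 - edge_bytes / cloud_bytes:.4%}")
```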

Retail: Shelf Inventory Monitoring

Today's shoppers have high expectations when it comes to finding and purchasing the products they want. Low inventory means lost sales and unhappy customers. Traditionally, managing stock is done manually, a time-consuming task that's subject to human error. Automating shelf inspection with real-time shelf monitoring via AI and computer vision can make inventory management faster and more accurate.

An Intel-enabled solution provider uses deep learning and computer vision for its hybrid retail inventory monitoring solution. The solution uses deep neural networks optimized by the Intel® Distribution of OpenVINO™ toolkit and Intel® DevCloud for the Edge to perform SKU-level product detection via fixed cameras. Edge inferencing is handled by an Intel® Xeon® Scalable processor, and a lightweight PC serves as the gateway. With on-demand and near-real-time detection and notifications, the solution enables retailers to accelerate inventory monitoring and complete it with exceptional granularity and accuracy.
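After the detection model labels each product on a shelf with a SKU, low-stock alerts can be generated by comparing those counts with the expected shelf layout (the planogram). This sketch is a hypothetical illustration, not the provider's solution. The SKU codes, expected counts, and the 50 percent alert threshold are made up.

```python
from collections import Counter


def low_stock_alerts(detections, planogram, threshold=0.5):
    """Flag SKUs whose detected count falls below `threshold` x expected facings.

    `detections` is a list of SKU labels, one per detected product.
    `planogram` maps each SKU to the number of facings expected on the shelf.
    """
    counts = Counter(detections)  # a missing SKU counts as 0
    alerts = []
    for sku, expected in planogram.items():
        seen = counts[sku]
        if seen < expected * threshold:
            alerts.append((sku, seen, expected))
    return alerts


planogram = {"SKU-1001": 6, "SKU-1002": 4, "SKU-1003": 8}
detections = ["SKU-1001"] * 5 + ["SKU-1002"] + ["SKU-1003"] * 8

for sku, seen, expected in low_stock_alerts(detections, planogram):
    print(f"Restock {sku}: {seen} of {expected} facings detected")
```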

Healthcare: Accelerated Medical Imaging

Visual analysis of CT scans and other types of medical imaging can be a time-consuming manual task for radiologists, particularly when patient volume is high. Applying AI algorithms to imaging devices can help flag critical cases and prioritize them for radiologists, potentially expediting time to diagnosis, improving outcomes, and reducing healthcare costs.

The need for AI-powered imaging was never more apparent than at the beginning of the COVID-19 pandemic, when healthcare systems urgently needed fast and effective screening tools to identify infected patients for isolation and treatment. Medical professionals reported that the largest bottlenecks in triage and diagnosis were caused by the scarcity and long processing time of viral tests. To help clinicians detect COVID-19 in patients, DarwinAI developed the COVID-Net CNN architecture with optimizations made using the Intel® Distribution of OpenVINO™ toolkit. When DarwinAI developers tested the neural network model architecture, it achieved an accuracy rate of 98.1 percent while having relatively low architectural and computational complexity.3 This allowed radiologists to diagnose a greater number of patients.
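Flagging and prioritizing critical cases can be pictured as a worklist ordered by a model's risk score. The sketch below uses Python's `heapq` module to put the highest-scoring studies first and to flag those above a threshold. The study IDs, scores, and the 0.8 threshold are hypothetical. The sketch does not describe how COVID-Net or any specific clinical system is deployed.

```python
import heapq


def build_worklist(studies, flag_threshold=0.8):
    """Order (study_id, score) pairs from highest to lowest model risk score.

    Returns (study_id, score, flagged) tuples; ties keep their arrival order.
    """
    heap = []
    for order, (study_id, score) in enumerate(studies):
        # heapq is a min-heap, so negate the score to pop the highest risk first.
        heapq.heappush(heap, (-score, order, study_id))
    worklist = []
    while heap:
        neg_score, _, study_id = heapq.heappop(heap)
        score = -neg_score
        worklist.append((study_id, score, score >= flag_threshold))
    return worklist


# Hypothetical model outputs for incoming imaging studies.
studies = [("CT-001", 0.12), ("CT-002", 0.91), ("CT-003", 0.55), ("CT-004", 0.87)]

for study_id, score, flagged in build_worklist(studies):
    marker = "URGENT" if flagged else "routine"
    print(f"{study_id}  score={score:.2f}  {marker}")
```

The model does not replace the radiologist here. It only changes the order in which studies are read, so likely positives are reviewed sooner.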

Why Choose Intel for AI

From AI frameworks to optimized neural network models, development tools for deep learning inference, and accelerators and storage infrastructure optimized for AI, Intel's end-to-end portfolio includes everything developers and businesses need to build and deploy AI applications at scale.


Fueling the Future with CNNs and Deep Learning

Deep learning and CNNs will continue to be some of the most powerful AI tools for developers and businesses well into the future. Businesses will always be driven to find new, innovative solutions to help address their unique challenges. Many will turn to technologies that are built upon deep learning and CNNs—such as computer vision, AI, augmented reality, and virtual reality—as solutions. As AI advances, Intel is committed to making it as seamless as possible for developers, data scientists, researchers, and data engineers to prepare, build, deploy, and scale their AI solutions.

Frequently Asked Questions

What is a convolutional neural network?

A convolutional neural network is a type of deep learning algorithm that is most often applied to analyze and learn visual features from large amounts of data. While primarily used for image-related AI applications, CNNs can be used for other AI tasks, including natural language processing and recommendation engines.

How do CNNs work?

Convolutional neural networks work by ingesting and processing large amounts of visual data in a grid format and then extracting important granular features for classification and detection. CNNs typically consist of three types of layers: convolutional layers, pooling layers, and fully connected layers. Each type of layer serves a different purpose, and successive layers learn increasingly complex features.