A couple of years back, I noticed that whenever I took pictures, my phone recognized faces and drew a little green box around them. I thought that was a great idea because it raised my skills as a casual photographer from capturing friends and family with the notorious slight blur to decent pictures with crisp contours and in-focus smiles. Since then, my phone has continued to improve dramatically: it now recognizes multiple faces with ease and even detects scenery and other details reliably.
How my phone does this is directly related to deep learning (DL), one of the fastest-growing disciplines in the field of machine learning. Citing Wikipedia, “Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.”
So What Is Machine Learning?
The kind of machine learning discussed here is based on digital neural networks (DNN), also called artificial neural networks, which resemble a network of biological neurons. The building blocks of the DNN are perceptrons, the equivalent of neurons. Perceptrons receive inputs from an external source or from perceptrons of a lower layer. The information is processed by a transfer function, which sums up the weighted input signals, and an activation function, which decides whether the output is active or not. Different perceptron behavior is achieved by adjusting the weight factors of the input signals; the weights determine which inputs are relevant and so produce a stimulus-dependent output.
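The perceptron described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names, the step-shaped activation, the bias term, and the example weights are all assumptions chosen for the sketch.

```python
def transfer(inputs, weights, bias):
    """Transfer function: sum of the input signals, each scaled by its weight.

    The bias term is an assumption of this sketch; it shifts the point at
    which the perceptron becomes active.
    """
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def activation(total):
    """Step activation: the output is active (1) only if the sum is positive."""
    return 1 if total > 0 else 0


def perceptron(inputs, weights, bias):
    """A single perceptron: transfer function followed by activation function."""
    return activation(transfer(inputs, weights, bias))


# Hypothetical weights: the first input is relevant, the second is ignored
# because its weight is zero.
weights = [1.0, 0.0]
bias = -0.5

# Try all four combinations of two binary inputs.
outputs = [perceptron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 1, 1]
```

With these weights the output simply follows the first input, which shows how a weight of zero makes an input signal irrelevant to the result.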
How Do Artificial Neural Networks Learn?
Instead of being programmed with explicit rules for making decisions, digital neural networks are trained with input sample data sets (e.g., actual pictures) and output labels describing the data (e.g., face, tree, cat, dog, etc.). A training algorithm adjusts the values of the internal variables that digitally connect the neurons. On a high level, those values determine how strongly digital neurons interact with each other. For example, a “zero” means two neurons are not connected, while a “one” represents a strong connection between them. Over time, the network “learns” to correctly recognize patterns by calculating the values of these internal parameters. In a sense, the network writes its own implicit rules when provided with input data and output labels. Once the learning phase is completed, the set of values and the digital neural network can be ported to the system that runs the application.
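As a rough sketch of this training process, the following Python example uses the classic perceptron learning rule on a tiny labeled data set. The rule, the function names, the learning rate, and the toy data (a logical AND instead of pictures) are assumptions made for illustration; real deep learning systems use far larger data sets and more elaborate algorithms.

```python
def predict(inputs, weights, bias):
    """Output of a single perceptron with a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def train(samples, labels, epochs=20, learning_rate=0.1):
    """Adjust the weights from labeled examples (perceptron learning rule)."""
    weights = [0.0] * len(samples[0])  # start with no connections at all
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            # The error is zero when the prediction already matches the label.
            error = label - predict(inputs, weights, bias)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy stand-in for "pictures and labels": inputs and the desired outputs.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]  # logical AND

weights, bias = train(samples, labels)
predictions = [predict(s, weights, bias) for s in samples]
print(predictions)
```

No rule for AND is ever written down; the weights that reproduce the labels are found by the training loop. The resulting `weights` and `bias` are the "set of values" that could then be ported to another system.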
The Transformation to Deep Learning
Because digital neural networks learn slowly and the computations involved are demanding, training has been very time- and resource-intensive. Until recently, practical implementations of digital neural networks were non-hierarchical, and the size of the network was constrained by the available computation resources. Thanks to the recent quantum leap in computation performance from using NVIDIA GPUs for computation tasks, data-crunching performance that was historically available only on supercomputers has become available to a broad field of users. Fueled by this development, a new class of machine learning algorithms, deep learning algorithms, has gained increasing popularity. In contrast to previous machine learning algorithms, deep learning algorithms consist of multiple hierarchical layers of digital neurons, or abstraction layers (e.g., see http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/). “Deep” refers to the large number of hidden layers in the DNN. The benefit of hierarchical neural networks is a drastically improved ability to recognize patterns, resulting in more reliable applications.
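To illustrate why layering helps, here is a small Python sketch of a network with one hidden layer. The weights are hand-picked, hypothetical values rather than trained ones, and the function names are assumptions of the sketch. The network computes XOR ("one input or the other, but not both"), a pattern that a single perceptron with a step activation cannot represent.

```python
def step(total):
    """Step activation: active (1) only if the summed input is positive."""
    return 1 if total > 0 else 0


def layer(inputs, weight_rows, biases):
    """One layer: every neuron receives all outputs of the layer below."""
    return [step(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass a signal through the layers, from the lowest to the highest."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal


# Hand-picked, hypothetical weights (a real network would learn them).
network = [
    # Hidden layer: neuron 1 acts like OR, neuron 2 like NAND.
    ([[1.0, 1.0], [-1.0, -1.0]], [-0.5, 1.5]),
    # Output layer: active only if both hidden neurons are active (AND).
    ([[1.0, 1.0]], [-1.5]),
]

results = [forward([a, b], network)[0] for a in (0, 1) for b in (0, 1)]
print(results)  # [0, 1, 1, 0]
```

The hidden layer builds two intermediate abstractions of the inputs, and the output layer combines them. Deep networks stack many such layers, with many more neurons per layer.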
The training phase of deep neural networks requires significantly more computation resources than running the trained application. Once trained, simple deep-learning-based applications like the facial recognition task of a cellphone camera or basic speech or written text recognition can be transferred to less powerful systems like phones, PCs, etc. Other, more demanding applications, including high-quality natural speech recognition (I’m thinking of Apple’s Siri, Microsoft’s Cortana, or Google’s “Okay Google”), will continue to rely on GPU-accelerated clusters. The same applies to video surveillance implementations and applications in the medical sector like medical image analysis and diagnostic assistance for X-ray and MRI images, to mention only a few.
With the ever-growing performance of CPUs, GPUs and computer systems in general, and as deep learning applications become more and more sophisticated, I am excited to see the next new trick my phone can perform and the increasing variety of deep-learning-based applications and products aimed at improving our lives. I’m talking about self-driving cars, smart homes with interactive technologies like Amazon’s Echo at the helm, near-sentient digital personal assistants, real-time language translators, and improvements in medical diagnostic tools that can save our lives before we even realize there is a problem. There are probably hundreds of other applications I haven’t even thought of that forward-thinking companies are busy working on today. But I have a feeling that when we look back 10 years from now, the things that we have only seen in sci-fi movies will be embedded parts of our daily lives, all thanks to the groundwork laid by current explorations into the power of deep learning techniques.
AMAX Deep Learning Solutions
Our mission at AMAX is to actively participate in the development of deep learning applications by providing workstation- to cluster-level GPU-accelerated compute solutions with industry-leading performance, specifically optimized for deep learning applications.
I am looking forward to hearing about your application and how we at AMAX can help.
Dr. Rene Meyer, PhD, Director of Product Development at AMAX