TechNight: Deep Learning in Computer Vision

  • 22/05/2018
  • 5 minute read

Last month's TechNight was about one of the hottest topics in IT: Deep Learning. Thanks to increased hardware performance and several key improvements in the algorithms, it has become very easy to get started with Artificial Intelligence. But how do these algorithms work?

The subject of the developers.nl TechNight was 'Deep Learning for Computer Vision'. It was presented by Wouter de Winter, who works as an AI consultant and Technical Lead for Artificial Industry. He first explained what Machine Learning is, then went on to explain what Deep Learning is and what kinds of algorithms are used, and gave several interactive demos to illustrate these concepts.

What is deep learning?

To explain deep learning we first have to take a step back and explain the concept it builds on: machine learning (ML). An abstract way to look at machine learning is:

Input -> Blackbox -> Output

You put some data in (e.g. a picture of a cat), the algorithm is the black box and you get some output (e.g. the label 'cat'). There are several branches of machine learning:

  • Supervised learning, where you train the algorithm with training data that's already been categorised.
  • Unsupervised learning, mainly used for clustering data where the input is unlabeled.
  • Reinforcement learning, where an agent is rewarded for the actions it takes and learns to maximize that reward. The example Wouter showed is a machine that learns how to flip pancakes and catch them: https://www.youtube.com/watch?v=W_gxLKSsSIE

Machine learning consists of five steps:

  1. Get the data
  2. Clean the data, e.g. make sure all the input images are the same size
  3. Training, train the ML algorithm with the training data
  4. Testing, a small part of the data should be held out from training and used to validate that the ML algorithm works as intended. If you train on the entire dataset, you run the risk of overfitting going unnoticed: your ML algorithm only understands the training data and not any new input.
  5. Improve, improve either the algorithm or the input data. It's likely you need to improve the data first.
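
Holding out test data (step 4) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the helper name `split_dataset`, the 20% test fraction and the toy dataset are all assumptions.

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# Hypothetical dataset of (image_id, label) pairs
dataset = [(i, "cat" if i % 2 == 0 else "dog") for i in range(100)]
train, test = split_dataset(dataset)
print(len(train), len(test))
```

The algorithm only ever sees `train` during training. Its score on `test` then gives an indication of how it handles input it has not seen before.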

One example he gave was the American army, which tried to use machine learning to detect camouflaged tanks. The ML algorithm had a very high success rate during the testing phase, but when they tried to use it in practice it didn't work. It turned out that in the researchers' data set the photos of camouflaged tanks had been taken on cloudy days, while the photos of plain forest had been taken on sunny days. The ML algorithm had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from an empty forest. While this specific story may be an urban legend (https://www.gwern.net/Tanks), it shows that dataset bias can be a real risk.

Deep Neural Network (DNN)

Neural networks can be thought of as a bunch of equations (also called neurons) that are connected to each other. Each neuron takes some numbers in and sends something else out based on its equation. By changing the equations slightly, you can get the output closer to what you want, which is how they 'learn'. The neurons are usually arranged in layers, where each layer takes its input from the previous layer and sends its output to the next layer. A layer that is not directly connected to the input or output data is called a hidden layer, and a network with several hidden layers is called 'deep'. Hidden layers identify significant patterns or features in the output of the previous layer by combining specific outputs with specific weights.
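
As a rough sketch of the idea of neurons arranged in layers, the plain-Python snippet below pushes two input numbers through one hidden layer and one output layer. The weights, biases and helper names are made up for illustration, and ReLU is assumed as the activation function for every neuron.

```python
def relu(x):
    return max(0.0, x)

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, passed through ReLU."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical weights: 2 inputs -> hidden layer of 2 neurons -> 1 output neuron
hidden = layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
output = layer(hidden, [[2.0, 0.5]], [0.1])
print(hidden, output)
```

Training a network means nudging the numbers in the weight and bias lists until `output` gets closer to the desired answer.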

Another key concept is backpropagation. When the neural network outputs a wrong answer, you calculate how much each neuron in the last layer contributed to the error, then do the same for the layer before it, and keep repeating that process until you reach the hidden layer closest to the input layer. The weights are then adjusted slightly to reduce the error.
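
To make this concrete, here is a minimal sketch of backpropagation on the smallest possible network: one hidden neuron and one output neuron. It assumes a squared-error loss, a ReLU hidden neuron, a learning rate of 0.05 and made-up starting weights. None of these details come from the talk.

```python
def forward(x, w1, w2):
    h = max(0.0, w1 * x)   # hidden neuron with ReLU activation
    y = w2 * h             # output neuron (linear)
    return h, y

def backward(x, target, w1, w2):
    """Gradients of the squared error (y - target)**2 with respect to w1 and w2."""
    h, y = forward(x, w1, w2)
    d_y = 2.0 * (y - target)                 # error at the output
    d_w2 = d_y * h                           # output weight's share of the error
    d_h = d_y * w2                           # error passed back to the hidden neuron
    d_w1 = d_h * (x if w1 * x > 0 else 0.0)  # ReLU passes gradient only when active
    return d_w1, d_w2

# Hypothetical starting weights and a single training example
w1, w2, x, target = 0.5, 0.5, 2.0, 3.0
for _ in range(50):
    g1, g2 = backward(x, target, w1, w2)
    w1 -= 0.05 * g1   # small step against the gradient
    w2 -= 0.05 * g2
print(forward(x, w1, w2)[1])
```

After the loop the printed output should be close to the target of 3.0. Real networks apply the same chain of steps to millions of weights at once.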

Other key concepts include activation functions (such as ReLU) and weight initialization schemes (such as Xavier initialization).
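
Both concepts fit in a few lines. The sketch below uses the uniform variant of Xavier (Glorot) initialization, which draws weights from the range ±sqrt(6 / (fan_in + fan_out)). The function names and layer sizes are chosen for illustration only.

```python
import math
import random

def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform initialization for a fan_out x fan_in weight matrix."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

rng = random.Random(0)  # seeded so the example is reproducible
weights = xavier_uniform(fan_in=4, fan_out=3, rng=rng)
print([relu(v) for v in (-2.0, 0.0, 1.5)])  # [0.0, 0.0, 1.5]
```

Scaling the initial weights to the layer size keeps the signal from growing or shrinking too much as it passes through many layers.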

To experiment with DNNs and see how neurons behave with different activation functions and inputs, check out https://playground.tensorflow.org/

Introduction to Convolutional Neural Networks (CNN)

In Convolutional Neural Networks (convolution: filtering and encoding by transformation), every network layer acts as a detection filter for specific features or patterns in the original data. The first layers in a CNN detect small, simple features (such as edges) that can be recognised and interpreted relatively easily. Later layers combine these into increasingly larger and more abstract features. The last layer of the CNN makes a very specific classification by combining all the features that the previous layers detected in the input data.
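
The filtering step itself is simple enough to sketch in plain Python. The example below slides a hand-written 2x2 filter over a tiny made-up image. In a real CNN the filter values are learned during training rather than written by hand. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical 4x4 grayscale image: dark on the left, bright on the right
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-written filter that responds to a dark-to-bright vertical edge
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, exactly where the image changes from dark to bright.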

Wouter showed a simple example of a CNN using Keras in an IPython notebook, which can be found here: https://github.com/wouterdewinter/keras-mnist-cnn

Generative Adversarial Network (GAN)

GANs are a type of unsupervised learning, where two neural networks compete against each other in a zero-sum game framework. One network (the generator) generates candidates and the other network (the discriminator) evaluates them. For example, these networks have been used to generate fake but realistic images of humans.
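
The competition between the two networks can be expressed as two opposing loss functions. The sketch below assumes the common formulation in which the discriminator outputs the probability that an image is real, and the generator uses the so-called non-saturating loss. The function names and scores are illustrative and were not part of the talk.

```python
import math

def discriminator_loss(d_real, d_fake):
    """The discriminator wants d_real close to 1 and d_fake close to 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The generator wants the discriminator to score its fakes close to 1."""
    return -math.log(d_fake)

# Hypothetical discriminator scores (probability that an image is real)
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # low: the discriminator is winning
print(generator_loss(d_fake=0.1))                  # high: the generator is losing
```

During training the two networks are updated in turns, each one lowering its own loss, which pushes the generator towards ever more realistic output.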

[Figure: Example of a GAN]

Case study: Bloomy flower recognition

The next demonstration was the Bloomy app. The idea is that someone can take a photo of a flower bouquet and the app then recognises all the flowers in the bouquet. To do this they first had to gather a large number of photos of several different flowers and preprocess them to make them uniform (e.g. the same size). They used Google Image search to gather around 300 photos for 10 different flowers. Wouter then did a live demo in which the app recognised a rose.

Crowd sourcing image recognition

Finally, Wouter asked the audience to create a classifier for two flowers. Every audience member got to see pictures of two types of flowers and was asked to determine which type each picture showed. After ~200 classifications the classifier was able to correctly identify almost 90% of all the pictures. For the Bloomy app they're using the same approach to let customers identify new flowers themselves. The demo shows that even with a relatively small amount of data you can still get a very good result.

You can create your own crowd-sourced deep convolutional neural network by checking out the code here: https://github.com/wouterdewinter/deep-hive

Conclusion

The ideas behind Artificial Intelligence have been around since the fifties. Thanks to some novel improvements and faster GPU hardware we can now create and train extremely complex neural networks with millions of neurons. While the subject is very complex, even people completely new to it were able to understand at least the basics of how these algorithms work. With higher-level libraries like Keras you can create complex neural networks with only a few lines of code. Wouter concluded that the best time to get started with Artificial Intelligence/Deep Learning is right now!