Deep Learning: The Multilayer Perceptron
Read Time ~ 6 Minutes
In my last article, What is Artificial Intelligence Anyways?, I discussed the difference between AI, ML, and DL and gave a brief overview of all three.
In this article, I want to take a closer look at a personal favorite of mine, deep learning. Specifically, I want to go over one of the most common model architectures you will encounter while studying and applying deep learning: the Multilayer Perceptron. If this article goes well, I plan on doing write-ups for more models in the future. Let’s get started!
Multilayer Perceptron
The Multilayer Perceptron (MLP), sometimes called a Feed-Forward Network (FFN), is one of the most common types of deep learning models you will encounter.
It was actually developed in the late 1950s and early 1960s but didn’t see widespread use until the early 2000s, when cheap compute power became widely available.
To understand the MLP a bit better, let’s see a real-world example of one in action by discussing the MNIST Dataset.
The MNIST Dataset is a large collection of handwritten digits (0-9), each 28x28 pixels in resolution. It is a common dataset to use as your first introduction to deep learning. A sample from the MNIST Dataset looks like this:
A handwritten number 4 from MNIST
Using these images of handwritten digits, we can train an MLP to predict the digit based on the image it is supplied.
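To make that concrete, here is a minimal sketch of what “supplying an image” to the network looks like in practice. It assumes PyTorch and torchvision are installed; nothing about the MLP requires those particular libraries, they are just a convenient way to fetch MNIST and flatten an image into the list of pixel values the network will consume:

```python
# A minimal sketch, assuming PyTorch and torchvision are available.
# It loads one MNIST image and flattens it into the 784 pixel values
# that will feed the input layer of our MLP.
from torchvision import datasets, transforms

# Download MNIST and convert each 28x28 image to a tensor of values in [0, 1]
mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

image, label = mnist[0]   # image shape: (1, 28, 28); label is the digit it shows
flat = image.view(-1)     # flatten to a vector of 784 values

print(flat.shape)         # torch.Size([784])
print(label)              # the digit this image represents
```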
Let’s take a look at the underlying structure of an MLP and see how it works:
Let’s break this picture down piece by piece:
Neuron - The most basic object within a deep learning network, otherwise known as a neural network. These are pictured as the green, yellow, and red circles. These are used to store the data that is passed through the network. In the case of our network, the input neurons store a specific pixel’s value from the image we are passing through it (a grayscale intensity from 0 to 255, often scaled down to the range 0-1). A network can have any number of neurons, although there are some general “best practices” when building your models.
Layer - A layer is just a group of neurons. It’s common to picture them as a vertical stack of neurons. The data within the network is passed from layer to layer as it is trained.
Weight - Weights are what allow our network to “learn”. They are pictured above as the arrows connecting all the neurons together. Think of them as bridges that data crosses on its way from one neuron to another; to cross a bridge, the value being passed must be “influenced” by the weight. In reality, a weight is just a number that the passed value is multiplied by. In our case, if a neuron should affect our network substantially, the incoming data is multiplied by a large number such as 1.5. On the contrary, if a neuron should affect our network only minimally, a small number such as 0.1 is used. The weights are initialized randomly and get updated during the learning process; more on that in a bit.
Bias - The bias is another parameter, added within each individual neuron after the weights are applied. It affects how “active” the neuron is and allows the model to pick up on more complex relationships within the data.
Activation Function - Within each neuron is an activation function, the final step when passing data from one neuron to another. The activation function takes the value that was multiplied by the weight and shifted by the bias and performs one final transformation: it introduces non-linearity. Up until this point the model can be thought of as linear, a straight line. Adding non-linearity lets the model capture relationships a straight line cannot, and the activation functions we choose are differentiable almost everywhere, which is what allows our model to learn through gradient-based training. A small sketch of this whole neuron-level computation follows after this list.
Input Layer - This layer, pictured above as the left-most vertical stack of green neurons, is where our data enters the network. The model shown actually has 784 input neurons, one for each pixel in a 28x28 image (28 x 28 = 784). I just can’t show all of those in a single image, so the little black dots, or ellipses, signify the additional neurons.
Hidden Layer - The hidden layers, represented as the two middle stacks of yellow neurons, are where the learning takes place. As you can see, our input consists of 784 neurons. Those inputs are compressed to 100 neurons when passed to our first hidden layer, and then to 50 neurons when passed to our second hidden layer. The reason for this, other than saving on computational power, is to represent our images in a lower dimensionality by extracting only the most important features during training.
Output Layer - Last but not least, we have the output layer, represented as the right-most stack of red neurons. This is where our model guesses which digit our image shows. We have 10 neurons in our output layer, one for each digit 0-9.
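To make the weight, bias, and activation ideas concrete, here is the tiny sketch promised above of the computation a single neuron performs. The numbers and the choice of a sigmoid activation are purely illustrative, not something the MLP prescribes:

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: a weighted sum of its inputs, plus a bias,
    passed through an activation function (a sigmoid here, one common choice)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid squashes the result into (0, 1)

# Illustrative values only: three incoming pixel values and their weights
inputs  = [0.2, 0.9, 0.4]
weights = [1.5, 0.1, -0.7]  # a large weight amplifies an input, a small one dampens it
bias    = 0.05

print(neuron(inputs, weights, bias))  # a single value between 0 and 1
```

Every neuron in the hidden and output layers repeats this same pattern, just with its own weights and bias.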
I know that was a lot to cover, but you now know the basic building blocks of not only the MLP but also other models that build upon it.
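If you would like to see those building blocks assembled into the 784-100-50-10 network pictured above, below is one way it might be written. Treat it as a sketch under the assumption that you are using PyTorch; the layer sizes match the diagram, but details such as the ReLU activation are my own choices:

```python
# A minimal sketch of the 784-100-50-10 MLP, assuming PyTorch is installed.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 100),  # input layer -> first hidden layer
            nn.ReLU(),            # non-linear activation
            nn.Linear(100, 50),   # first hidden layer -> second hidden layer
            nn.ReLU(),
            nn.Linear(50, 10),    # second hidden layer -> output layer (10 digits)
        )

    def forward(self, x):
        return self.layers(x)  # one score per digit 0-9

model = MLP()
fake_image = torch.rand(1, 784)  # a stand-in for one flattened MNIST image
scores = model(fake_image)
print(scores.argmax(dim=1))      # the digit this (untrained) model would guess
```

An untrained model like this will guess more or less at random; turning those random guesses into accurate predictions is the learning process I mention next.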
Sadly, I haven’t talked about how the model actually learns, as it would take a full article just to explain and this one is getting a bit too long as is. I plan to cover that in a future article though, so stay tuned!
Wrapping it up
Deep learning has got to be my favorite subject and I can’t wait to write more about it in the future! Thank you for reading!
I hope you enjoyed this edition of AI insights.
Until next time.
Andrew-
Have something you want me to write about?
Head over to the contact page and drop me a message. I will be more than happy to read it over and see if I can provide any insights!