Building Powerful CNNs: Best Practices and Techniques
Read Time ~ 7 Minutes
In a previous article, Deep Learning: The Convolutional Network, I went over the basics of what a CNN is. In this article I wanted to take a deeper dive into building effective CNNs, since their effectiveness is largely tied to how well the architecture is laid out, and it's not all that obvious how a CNN should be structured. Before we hop into the actual structure of the network, let's go over some of the different "parts" of a CNN, which are technically hyperparameters. However, unlike traditional machine learning, where default hyperparameters are fine to start out with, a CNN's hyperparameters must be given a lot more attention before running the network.
Hyperparameters
While going over these hyperparameters it is sometimes easier to conceptualize them through an example, so let's take the one from the previous CNN article I wrote. If you haven't read it, that's okay: our example is classifying images of cats and dogs. Our images are 256x256 pixels with three channels (RGB).
Filters - Filters are the actual "windows" that convolve over the input to capture features. You can think of each one as a small grid of m x n rows and columns. Each cell within the filter is associated with a weight that is tuned by the network during backpropagation, and those weights determine what type of feature the filter picks up from the input. So in our example, one filter might pick up horizontal edges while another might pick up vertical ones. The network learns which features best distinguish images of cats from images of dogs. The actual hyperparameter here is deciding how many filters to use, which I'll go over once we discuss the other hyperparameters.
Kernel Size - This is sometimes called filter size because that is exactly what it sets: the size of each filter, expressed as m x n. An example of a kernel size would be 3x3, where we have 3 rows and 3 columns making up one square filter that convolves over the image. Generally, it's a good idea to use odd sizes such as 3x3 or 5x5 for a few reasons, such as better pixel alignment, no ambiguity about the center pixel, and mathematical convenience.
Stride - This hyperparameter dictates how many pixels to move the filter by as it convolves over our image. The default is 1x1, meaning we shift the filter over one pixel at a time. Using a 1x1 stride means no information is lost, but our output dimensionality stays roughly the same, which can be good or bad depending on how much compute we want to use. On the other hand, a 2x2 or 3x3 stride means we skip that many pixels with each step, so while we might lose a bit of information, our output space will be smaller, making our network faster. It is best practice to leave this hyperparameter alone (using the default 1x1 stride) and use a different technique to reduce our output dimensionality, which I will discuss in a bit.
Padding - As our filter convolves over our image we have a choice to make when it gets to the edge. Do we throw out the positions where the filter hangs off the edge and doesn't see complete information, or do we keep them? If we want to keep them we need to "pad" the image by adding a border of zeros so the filter always sees a complete window. This affects the output size as we move through our network: if we want to keep the output space the same, use padding; if not, then don't. The code sketch below shows how these hyperparameters map onto an actual layer.
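To make these concrete, here is a minimal sketch of a single convolutional layer. I'm assuming TensorFlow/Keras here purely for illustration; the same ideas apply in any framework. The comments show how the hyperparameters above map onto the layer's arguments and how the output size works out.

```python
import tensorflow as tf

# One convolutional layer for our 256x256 RGB example.
# Output size per dimension: (input - kernel + 2 * padding) / stride + 1.
conv = tf.keras.layers.Conv2D(
    filters=32,          # how many filters (feature detectors) to learn
    kernel_size=(3, 3),  # the size of each filter window
    strides=(1, 1),      # move the window one pixel at a time
    padding="same",      # pad with zeros so the output stays 256x256
    input_shape=(256, 256, 3),
)

# With padding="valid" (no padding) the output would shrink to
# (256 - 3) / 1 + 1 = 254 pixels per side, and a stride of 2 would
# roughly halve it instead.
```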
This is by no means an exhaustive list of hyperparameters, but it covers the most common ones to tweak while building your network.
A Closer Look
Now that we have gone over some of the hyperparameters in a bit more detail, let's take a look at the example network from the previous article. So as not to get bogged down last time, I left a lot of things out, but let's zoom in now that we are more familiar with some of the options we have when building CNNs.
The above picture shows what a typical CNN consists of in a bit more detail. Let's break down these layers one at a time.
The first layer is our convolutional layer. We start with 32 filters and a small kernel size of 3x3. In general it is good to start with small kernel sizes, as we want to collect as much local information as possible. We keep the stride at its default so we don't lose any information, and we use padding so the output stays the same size as the input.
The second layer is one we have not discussed: a normalization layer applied to the output of our convolutional layer. Its job is to normalize the data being passed to it by transforming it to a mean of about zero and a standard deviation of about one. This makes training more stable and typically lets the network converge in fewer steps, which reduces the overall training workload.
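As a rough sketch of what this layer is doing, here is the core transformation in plain NumPy. Note that a real normalization layer (batch normalization, for example) works per batch and per channel and also learns a scale and shift on top of this; the snippet below is just the idea of shifting to mean zero and standard deviation one.

```python
import numpy as np

def normalize(x, eps=1e-5):
    # Shift to mean ~0 and scale to standard deviation ~1.
    return (x - x.mean()) / (x.std() + eps)

activations = np.random.uniform(0, 255, size=(8, 8))
normalized = normalize(activations)
print(normalized.mean(), normalized.std())  # roughly 0.0 and 1.0
```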
Next we have our activation layer to add non-linearity into our network for training. If you are unsure what this does, I have another article, Neural Networks: How Do They Learn?, that goes over it in a bit more detail.
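The usual choice here in CNNs is ReLU (an assumption on my part; any non-linear activation works), and it is about as simple as it sounds: negative values become zero and positive values pass through unchanged.

```python
import numpy as np

def relu(x):
    # Keep positive values, replace negatives with zero.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```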
Finally, we have a Max Pool layer with a pool size of 2x2. What this does is look at each 2x2 window of the output and keep only the highest value, forming a smaller map made up of those maxima. This down-samples our image while keeping only the most important information; a toy example follows below.
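Here is that idea on a toy 4x4 input, a minimal sketch of 2x2 max pooling with NumPy:

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [5, 2, 9, 7],
    [1, 0, 3, 8],
])

# Take the maximum of each non-overlapping 2x2 window.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [5 9]]
```

Each 2x2 block of the input collapses to a single value, so a 4x4 map becomes 2x2, and in our example a 256x256 feature map becomes 128x128.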
From here we repeat steps 1-4 with a little twist. As you can see, by applying a max pool we have effectively cut our image size in half while retaining only the most important features. We make use of this down-sampled image by increasing the number of filters we use, as well as the kernel size. What this means is that our network starts to build a "broader" understanding of the image. Depending on how deep we want our network, we can keep repeating this process until the image resolution is very small and we have extracted as much information as possible from it. After this we can apply a flatten layer and then some dense layers to compute our classification; a sketch of the full stack follows below.
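Putting it all together, here is a minimal Keras sketch of the kind of stack described above. The exact filter counts, number of blocks, and head sizes are my own illustrative choices rather than a prescription, and I've kept 3x3 kernels throughout for simplicity (you could also grow the kernel size as described above).

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(256, 256, 3)),

    # Block 1: convolve, normalize, activate, down-sample.
    layers.Conv2D(32, (3, 3), padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),   # 256x256 -> 128x128

    # Block 2: more filters now that the spatial size is smaller.
    layers.Conv2D(64, (3, 3), padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),   # 128x128 -> 64x64

    # Block 3: broader features at an even smaller resolution.
    layers.Conv2D(128, (3, 3), padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),   # 64x64 -> 32x32

    # Classification head.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # cat vs. dog
])

model.summary()
```

Each block halves the spatial resolution while increasing the number of filters, which is the broad-to-narrow pattern described above.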
Wrapping it up
While this is a common way to build a CNN it is by no means the only way. There are many different variations being used and I encourage you to seek them out and experiment!
I hope you enjoyed this edition of AI insights.
Until next time.
Andrew-
Have something you want me to write about?
Head over to the contact page and drop me a message. I will be more than happy to read it over and see if I can provide any insights!