AlexNet is an important model for image classification. It takes an image belonging to one of 1,000 different classes (for example, species of animals) and produces a vector of 1,000 numbers as its output. The ith element of the output vector is the probability that the input image belongs to the ith class. As a result, the elements of the output vector sum to 1.
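As a rough illustration of why the output sums to 1, here is a minimal softmax sketch in plain Python. The raw scores and the choice of class 42 are made-up values for illustration only; they do not come from a trained network.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum score improves numerical stability
    # and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a 1000-class problem (illustrative values only).
scores = [0.0] * 1000
scores[42] = 5.0  # pretend class 42 received the highest score

probs = softmax(scores)
# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, sum(probs))
```

The predicted label is simply the index of the largest entry in the probability vector.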

  • AlexNet takes a 256×256 RGB image as its input. All images in the training set and all test images must be 256×256 pixels in size.

If an input image isn’t 256×256, it must be converted to that size before it can be used to train the network. To do so, the shorter side is rescaled to 256 pixels, and the resulting image is then cropped around its center to produce a 256×256 image.
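The geometry of this preprocessing step can be sketched without any image library. The function name `resize_and_center_crop_box` is hypothetical, and the sketch assumes the crop is taken from the center of the rescaled image; it only computes the sizes and the crop box, not the pixel resampling itself.

```python
def resize_and_center_crop_box(width, height, target=256):
    """Return the rescaled size and the center-crop box for a square target."""
    # Scale so that the shorter side becomes exactly `target` pixels.
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    # Center crop: trim the excess equally from both ends of the longer side.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

# A 1024x768 landscape photo: the height is the shorter side.
new_size, box = resize_and_center_crop_box(1024, 768)
print(new_size, box)
```

The box is given as (left, top, right, bottom), which is the convention most image libraries use for cropping.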

AlexNet architecture

AlexNet was far larger than the Convolutional Neural Network architectures previously used for computer vision problems. It contains 650,000 neurons, and it took five to six days to train on two GTX 580 3GB GPUs. There are now far more complex CNNs that run very efficiently on faster GPUs, even on very large datasets.

  • The AlexNet model is composed of 5 convolutional layers and 3 fully connected layers

Each convolutional layer generally contains many kernels of the same size. A kernel’s width and height are normally equal, and its depth is equal to the number of channels in its input.

Each of the first two convolutional layers is followed by an Overlapping Max Pooling layer. The 3rd, 4th, and 5th convolutional layers are connected directly to one another. The output of the 5th convolutional layer passes through an Overlapping Max Pooling layer and is then fed into a sequence of two fully connected layers. The second fully connected layer feeds into a softmax classifier with 1000 class labels.
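To make the layer sequence concrete, the sketch below traces the feature-map size through the convolutional and pooling layers. The text above does not give kernel sizes, strides, padding, or kernel counts; the values used here are the ones commonly cited for AlexNet and should be read as assumptions. The 227×227 input is also an assumption, chosen because it is the crop size for which this arithmetic works out cleanly. Pooling with a 3×3 window and stride 2 is "overlapping" because the stride is smaller than the window.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kind, kernel, stride, pad, output channels): assumed values.
LAYERS = [
    ("conv1", "conv", 11, 4, 0, 96),
    ("pool1", "pool", 3, 2, 0, None),   # 3x3 window, stride 2: overlapping
    ("conv2", "conv", 5, 1, 2, 256),
    ("pool2", "pool", 3, 2, 0, None),
    ("conv3", "conv", 3, 1, 1, 384),
    ("conv4", "conv", 3, 1, 1, 384),
    ("conv5", "conv", 3, 1, 1, 256),
    ("pool5", "pool", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, kind, kernel, stride, pad, out_ch in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if kind == "conv":
            channels = out_ch  # pooling leaves the channel count unchanged
        shapes.append((name, size, size, channels))
    return shapes

shapes = trace_shapes()
for name, h, w, c in shapes:
    print(f"{name}: {h}x{w}x{c}")

# Number of values handed to the first fully connected layer.
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
```

Under these assumptions, the flattened output of the last pooling layer is what the first fully connected layer receives.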

ReLU nonlinearity is applied after every convolutional and fully connected layer. In the first and second convolutional layers, the ReLU is followed by a local normalization step before pooling. Researchers later found, however, that this normalization is not very useful.


The use of ReLU is a key component of AlexNet. The traditional approach to training a neural network model was to use sigmoid or tanh activation functions. AlexNet demonstrated that deep CNNs can be trained significantly faster with ReLU nonlinearity than with saturating activation functions such as tanh or sigmoid.

  • The AlexNet authors showed that a network using ReLUs reached a 25% training error rate six times faster than an equivalent network using tanh.
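One way to see why saturating functions slow training down is to compare gradients. The sketch below is an illustration only, not the experiment from the bullet above: it shows that the derivative of tanh shrinks toward zero for large inputs, while the ReLU derivative stays at 1 for any positive input.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise.
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which shrinks toward 0 as |x| grows.
    return 1.0 - math.tanh(x) ** 2

for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

A small gradient means small weight updates, so units stuck in the flat part of tanh learn slowly.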


The size of a Neural Network determines its capacity to learn, but if you are not careful, the network will simply memorize the examples in the training data without grasping the underlying concept. As a result, the Neural Network performs admirably on the training data but has not learned the real concept, so it fails on new, previously unseen test data. This is referred to as overfitting.

  • Dropout – In dropout, each neuron is dropped from the network with a probability of 0.5. A neuron that has been dropped contributes to neither forward nor backward propagation. As a result, each input passes through a different network architecture.

As a result, the learned weight parameters are more robust and less prone to overfitting. There is no dropout during testing and the entire network is used, but the output is scaled by a factor of 0.5 to compensate for the neurons that were dropped during training. Dropout roughly doubles the number of iterations required to reach convergence, but AlexNet would overfit significantly without it.
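The train-time and test-time behavior of dropout can be sketched in a few lines. The function names `dropout_train` and `dropout_test` are hypothetical, and the all-ones activations are toy values chosen so that the averages are easy to read.

```python
import random

def dropout_train(activations, p_drop=0.5, rng=random):
    """Training: zero each activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]

def dropout_test(activations, p_drop=0.5):
    """Testing: keep every neuron, but scale outputs by the keep probability."""
    keep = 1.0 - p_drop
    return [a * keep for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000  # toy activations

train_out = dropout_train(activations, rng=rng)
test_out = dropout_test(activations)

# On average the two agree: about half of the training outputs survive,
# and the test outputs are all scaled by 0.5.
train_mean = sum(train_out) / len(train_out)
test_mean = sum(test_out) / len(test_out)
print(train_mean, test_mean)
```

The scaling at test time keeps the expected input to the next layer the same as it was during training.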

  • Data augmentation – Overfitting can be reduced by showing a Neural Net several variations of the same image, which makes it harder for the network to memorize individual examples. It is often possible to generate extra data for free from existing data.

Simply mirroring or cropping the original image at random produces new data that is a slightly shifted or flipped version of the original.
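A minimal sketch of random cropping and mirroring is shown below, using nested lists in place of a real image. The 8×8 image and the 6×6 crop size are toy values for illustration, and the function names are hypothetical.

```python
import random

def random_crop(image, size, rng):
    """Cut a size x size patch at a random position from a 2-D image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)    # randint is inclusive at both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

def random_mirror(image, rng):
    """Flip the image left to right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image

def augment(image, size, rng):
    """Apply a random crop followed by a random horizontal mirror."""
    return random_mirror(random_crop(image, size, rng), rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Toy 8x8 single-channel "image"; each pixel stores its own (row, col) index.
image = [[(r, c) for c in range(8)] for r in range(8)]
samples = [augment(image, 6, rng) for _ in range(5)]
```

Each call to `augment` yields a different view of the same image, so a single labeled example can serve as many training samples.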