1. AlexNet Overview

AlexNet is the winning network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, lifting classification accuracy from the roughly 70% of traditional methods to over 80% (traditional methods had hit a bottleneck at the time, so a jump of this size was remarkable). It was designed by Hinton and his student Alex Krizhevsky, and deep learning models began to develop rapidly after that year. The figure below is the network structure diagram taken from the original AlexNet paper.
Original paper: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
The figure has an upper and a lower half because the authors trained the network on two GPUs in parallel; the two halves are structurally identical, so it is enough to read only the lower half.
Next, the highlights of this network:
(1) It was the first to use GPUs to accelerate network training.
(2) It used the ReLU activation function instead of the traditional Sigmoid and Tanh activations.
(3) It used LRN (Local Response Normalization).
(4) It applied Dropout in the first two fully connected layers, randomly deactivating neurons at a fixed rate to reduce overfitting (see the sketch after this list).
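These highlights all map to standard PyTorch modules. A minimal sketch (the LRN and dropout hyperparameters below follow the original paper, not anything stated in this article):

```python
import torch.nn as nn

relu = nn.ReLU(inplace=True)                                      # (2) ReLU activation
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)  # (3) local response normalization
dropout = nn.Dropout(p=0.5)                                       # (4) random neuron deactivation
```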
Next, the formula for the output size of a convolution or pooling layer:

N = (W - F + 2P) / S + 1

where W is the input size, F is the convolution or pooling kernel size, P is the number of padding pixels, and S is the stride.
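The formula can be wrapped in a small helper to verify the numbers that follow; floor division matches how the frameworks round non-integer results (a minimal sketch):

```python
def output_size(w, f, p, s):
    """N = (W - F + 2P) / S + 1, rounded down."""
    return (w - f + 2 * p) // s + 1

print(output_size(224, 11, 2, 4))  # Conv1 -> 55
print(output_size(55, 3, 0, 2))    # Maxpool1 -> 27
```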
Each layer is analyzed in detail below.
2. Model Structure and Parameter Analysis

Convolutional layer 1
Because two GPUs were used, the number of kernels is multiplied by 2:
```
Conv1:
    input_size: [224, 224, 3] -> output: (224 - 11 + (1 + 2)) / 4 + 1 = 55 -> (55, 55, 96)
    kernels: 48 * 2
    kernel_size: 11
    padding: [1, 2]
    stride: 4
```
Conv1: kernels=48 × 2 = 96, kernel_size=11, padding=[1, 2], stride=4  
So the total number of kernels is 96. kernel_size is the size of each kernel, padding is the number of zero-padding pixels added around the feature map, and stride is the step size.
The input image shape is [224, 224, 3], and the output size is computed as (224 - 11 + (1 + 2)) / 4 + 1 = 55, so the output shape is [55, 55, 96].
Implementation of Conv1 (the two GPUs perform identical computations, so the kernels are configured for a single GPU):
```python
self.conv1 = nn.Conv2d(in_channels=3, kernel_size=11, out_channels=48, padding=2, stride=4)  # PyTorch
x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)               # TensorFlow
```
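A quick shape check (a sketch; note that PyTorch's symmetric padding=2 still yields 55 because the division is floored, reproducing the result of the asymmetric [1, 2] padding):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, stride=4, padding=2)
x = torch.randn(1, 3, 224, 224)  # dummy input batch
print(conv1(x).shape)            # torch.Size([1, 48, 55, 55]); floor((224 + 4 - 11) / 4) + 1 = 55
```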
Max-pooling downsampling layer 1
```
Maxpool1:
    input_size: (55, 55, 96)
    kernel_size: 3
    padding: 0
    stride: 2
    output_size: (27, 27, 96)
```
Maxpool1: kernel_size=3, padding=0, stride=2 
kernel_size is the pooling kernel size, padding is the number of zero-padding pixels around the feature map, and stride is the step size.
The input feature map shape is [55, 55, 96] and the output shape is [27, 27, 96].
Size calculation: (W - F + 2P) / S + 1 = (55 - 3 + 2*0) / 2 + 1 = 27
Implementation of Maxpool1:
```python
self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)  # PyTorch
x = layers.MaxPool2D(pool_size=3, strides=2)(x)           # TensorFlow
```
Convolutional layer 2
```
Conv2:
    input_size: [27, 27, 96]
    kernels: 128 * 2
    kernel_size: 5
    padding: 2
    stride: 1
    output_size: [27, 27, 256]
```
Conv2: kernels=128 × 2 = 256, kernel_size=5, padding=2, stride=1
The input feature map shape is [27, 27, 96], and the output size is (27 - 5 + 2*2) / 1 + 1 = 27, so the output shape is [27, 27, 256].
Implementation of Conv2:
```python
self.conv2 = nn.Conv2d(in_channels=48, kernel_size=5, out_channels=128, padding=2, stride=1)    # PyTorch
x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)  # TensorFlow; "same" is equivalent to padding=2 here
```
Max-pooling downsampling layer 2
```
Maxpool2:
    input_size: [27, 27, 256]
    kernel_size: 3
    padding: 0
    stride: 2
    output_size: [13, 13, 256]
```
Maxpool2: kernel_size=3, padding=0, stride=2 
The input feature map shape is [27, 27, 256] and the output shape is [13, 13, 256].
Size calculation: (W - F + 2P) / S + 1 = (27 - 3 + 2*0) / 2 + 1 = 13
Implementation of Maxpool2:
```python
self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)  # PyTorch
x = layers.MaxPool2D(pool_size=3, strides=2)(x)           # TensorFlow
```
Convolutional layer 3
```
Conv3:
    input_size: [13, 13, 256]
    kernels: 192 * 2 = 384
    kernel_size: 3
    padding: 1
    stride: 1
    output_size: [13, 13, 384]
```
Conv3: kernels=192 × 2 = 384, kernel_size=3, padding=1, stride=1
The input feature map shape is [13, 13, 256], and the output size is (13 - 3 + 2*1) / 1 + 1 = 13, so the output shape is [13, 13, 384].
Implementation of Conv3:
```python
self.conv3 = nn.Conv2d(in_channels=128, kernel_size=3, out_channels=192, padding=1, stride=1)   # PyTorch
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)  # TensorFlow
```
Convolutional layer 4
```
Conv4:
    input_size: [13, 13, 384]
    kernels: 192 * 2 = 384
    kernel_size: 3
    padding: 1
    stride: 1
    output_size: [13, 13, 384]
```
Conv4: kernels=192 × 2 = 384, kernel_size=3, padding=1, stride=1
The input feature map shape is [13, 13, 384], and the output size is (13 - 3 + 2*1) / 1 + 1 = 13, so the output shape is [13, 13, 384].
Implementation of Conv4:
```python
self.conv4 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=192, padding=1, stride=1)   # PyTorch
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)  # TensorFlow
```
Convolutional layer 5
```
Conv5:
    input_size: [13, 13, 384]
    kernels: 128 * 2 = 256
    kernel_size: 3
    padding: 1
    stride: 1
    output_size: [13, 13, 256]
```
The input feature map shape is [13, 13, 384], and the output size is (13 - 3 + 2*1) / 1 + 1 = 13, so the output shape is [13, 13, 256].
Implementation of Conv5:
```python
self.conv5 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=128, padding=1, stride=1)   # PyTorch
x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)  # TensorFlow
```
Max-pooling downsampling layer 3
```
Maxpool3:
    input_size: [13, 13, 256]
    kernel_size: 3
    padding: 0
    stride: 2
    output_size: [6, 6, 256]
```
The input feature map shape is [13, 13, 256] and the output shape is [6, 6, 256].
Size calculation: (W - F + 2P) / S + 1 = (13 - 3 + 2*0) / 2 + 1 = 6
Implementation of Maxpool3:
```python
self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)  # PyTorch
x = layers.MaxPool2D(pool_size=3, strides=2)(x)           # TensorFlow
```
Fully connected layer 1

unit_size: 4096. unit_size is the number of nodes in the fully connected layer; it is 2048 per GPU, doubled because two GPUs are used.

Fully connected layer 2

unit_size: 4096, same as above.

Fully connected layer 3

unit_size: 1000. This is the output layer, and the number of output nodes equals the number of classes in the classification task.
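The walkthrough gives no code for the fully connected part, so here is a minimal sketch of these three layers for the single-GPU variant used in the code below: Maxpool3's output flattens to 128 * 6 * 6 = 4608 features, each hidden FC layer uses 2048 nodes (half of 4096), and num_classes is a placeholder for the task's class count:

```python
import torch.nn as nn

num_classes = 1000  # placeholder: set to the number of classes in your task
fc_layers = nn.Sequential(
    nn.Dropout(p=0.5),             # dropout before FC1, per highlight (4)
    nn.Linear(128 * 6 * 6, 2048),  # FC1
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),             # dropout before FC2
    nn.Linear(2048, 2048),         # FC2
    nn.ReLU(inplace=True),
    nn.Linear(2048, num_classes),  # FC3: output layer
)
```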
3. Parameter Table
| Name | Input_size | Kernel_size | Kernel_num | Padding | Stride | Output_size | Size calculation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Conv1 | (224, 224, 3) | 11 | 48*2 | [1, 2] | 4 | (55, 55, 96) | (224-11+(1+2))/4+1=55 |
| Maxpooling1 | (55, 55, 96) | 3 | - | 0 | 2 | (27, 27, 96) | (55-3+2*0)/2+1=27 |
| Conv2 | (27, 27, 96) | 5 | 128*2 | 2 | 1 | (27, 27, 256) | (27-5+2*2)/1+1=27 |
| Maxpooling2 | (27, 27, 256) | 3 | - | 0 | 2 | (13, 13, 256) | (27-3+2*0)/2+1=13 |
| Conv3 | (13, 13, 256) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv4 | (13, 13, 384) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv5 | (13, 13, 384) | 3 | 128*2 | 1 | 1 | (13, 13, 256) | (13-3+2*1)/1+1=13 |
| Maxpooling3 | (13, 13, 256) | 3 | - | 0 | 2 | (6, 6, 256) | (13-3+2*0)/2+1=6 |
| FC1 | - | - | - | - | - | 2048 | - |
| FC2 | - | - | - | - | - | 2048 | - |
| FC3 | - | - | - | - | - | 1000 | - |

The FC node counts are listed per GPU (2048 × 2 = 4096 in total); FC3 has 1000 output nodes for the 1000 ImageNet classes.
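The size-calculation column can be reproduced with the formula from Section 1; a small sketch, where the padding column is folded into a total padding value (1 + 2 = 3 for Conv1, 2P elsewhere):

```python
# (name, input width, kernel size, total padding, stride)
cfg = [
    ("Conv1",       224, 11, 3, 4),
    ("Maxpooling1",  55,  3, 0, 2),
    ("Conv2",        27,  5, 4, 1),
    ("Maxpooling2",  27,  3, 0, 2),
    ("Conv3",        13,  3, 2, 1),
    ("Conv4",        13,  3, 2, 1),
    ("Conv5",        13,  3, 2, 1),
    ("Maxpooling3",  13,  3, 0, 2),
]
for name, w, f, pad_total, s in cfg:
    print(name, (w - f + pad_total) // s + 1)  # 55, 27, 27, 13, 13, 13, 13, 6
```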
 
 
 
4. Code Implementation

1. PyTorch implementation

```python
"""
# @File       : model_alexnet.py
# @version    : python 3.9
# @Software   : PyCharm
# @Description: AlexNet with single-GPU channel counts (half of the paper's)
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, kernel_size=11, out_channels=48, padding=2, stride=4)
        self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv2 = nn.Conv2d(in_channels=48, kernel_size=5, out_channels=128, padding=2, stride=1)
        self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv3 = nn.Conv2d(in_channels=128, kernel_size=3, out_channels=192, padding=1, stride=1)
        self.conv4 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=192, padding=1, stride=1)
        self.conv5 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=128, padding=1, stride=1)
        self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.fc1 = nn.Linear(in_features=128 * 6 * 6, out_features=2048)
        self.fc2 = nn.Linear(in_features=2048, out_features=2048)
        self.fc3 = nn.Linear(in_features=2048, out_features=5)  # 5 output classes in this example

    def forward(self, x):
        # ReLU after every convolution
        x = F.relu(self.conv1(x))
        x = self.maxpooling1(x)
        x = F.relu(self.conv2(x))
        x = self.maxpooling2(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv5(x))
        x = self.maxpooling3(x)
        x = x.view(-1, 128 * 6 * 6)  # flatten: 128 channels * 6 * 6
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
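A quick sanity check of the PyTorch model (assumed usage; the dummy batch and prints are illustrative):

```python
model = AlexNet()
x = torch.randn(8, 3, 224, 224)  # dummy batch of 8 RGB images
print(model(x).shape)            # torch.Size([8, 5])
print(sum(p.numel() for p in model.parameters()))  # total parameter count
```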
2. TensorFlow implementation

```python
"""
# @File       : model_alexnet.py
# @Author     : 0399
# @version    : python 3.9
# @Software   : PyCharm
# @Description: AlexNet in Keras, functional (v1) and subclassed (v2) styles
"""
from tensorflow.keras import layers, models, Model, Sequential
import tensorflow as tf


def AlexNet_v1(im_height=224, im_width=224, num_classes=1000):
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    # pad 224x224 to 227x227 so that (227 - 11) / 4 + 1 = 55
    x = layers.ZeroPadding2D(((1, 2), (1, 2)))(input_image)
    x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)
    x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dense(num_classes)(x)
    predict = layers.Softmax()(x)
    model = models.Model(inputs=input_image, outputs=predict)
    return model


class AlexNet_v2(Model):
    def __init__(self, num_classes=1000):
        super(AlexNet_v2, self).__init__()
        self.features = Sequential([
            layers.ZeroPadding2D(((1, 2), (1, 2))),
            layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2),
            layers.Conv2D(filters=128, kernel_size=5, padding="same", activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2),
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2)])
        self.flatten = layers.Flatten()
        # note: this variant uses a smaller classifier head (1024/128) than AlexNet_v1
        self.classifier = Sequential([
            layers.Dropout(0.2),
            layers.Dense(1024, activation="relu"),
            layers.Dropout(0.2),
            layers.Dense(128, activation="relu"),
            layers.Dense(num_classes),
            layers.Softmax()
        ])

    def call(self, x):
        x = self.features(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x


if __name__ == "__main__":
    # smoke test: build the model and run a dummy batch
    inputs = tf.random.uniform(shape=(8, 224, 224, 3))
    model = AlexNet_v1(224, 224, 5)
    print(model(inputs).shape)  # (8, 5)
```
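model.summary() offers a quick inspection of either variant (assumed usage; the subclassed AlexNet_v2 must be built by a forward pass before summary() works):

```python
model_v1 = AlexNet_v1(im_height=224, im_width=224, num_classes=5)
model_v1.summary()  # functional model: summary works immediately

model_v2 = AlexNet_v2(num_classes=5)
_ = model_v2(tf.random.uniform((1, 224, 224, 3)))  # build the subclassed model first
model_v2.summary()
```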