1. Introduction to AlexNet

AlexNet is the winning network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition; it raised classification accuracy from the roughly 70% of traditional methods to over 80% (traditional methods had already hit a bottleneck at the time, so a jump this large was remarkable). It was designed by Hinton and his student Alex Krizhevsky, and deep learning models began to develop rapidly after that year. The figure below is the network structure diagram taken from the original AlexNet paper.
Original AlexNet paper: "ImageNet Classification with Deep Convolutional Neural Networks"
The figure has an upper and a lower part because the authors trained the network on two GPUs in parallel; the two branches are structurally identical, so we only need to look at the lower one.
Now for the highlights of this network:

(1) It was the first to use GPUs to accelerate network training.
(2) It uses the ReLU activation function instead of the traditional sigmoid and tanh activations.
(3) It uses LRN (Local Response Normalization).
(4) Dropout is applied in the first two fully connected layers, randomly deactivating neurons at a fixed rate to reduce overfitting (a short sketch of these building blocks follows this list).
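Points (2)-(4) all correspond to standard PyTorch modules. A minimal sketch, assuming the LRN hyperparameters reported in the AlexNet paper (n=5, alpha=1e-4, beta=0.75, k=2):

```python
import torch.nn as nn

relu = nn.ReLU(inplace=True)              # (2) ReLU instead of sigmoid/tanh
lrn = nn.LocalResponseNorm(size=5,        # (3) LRN with the paper's hyperparameters
                           alpha=1e-4, beta=0.75, k=2.0)
dropout = nn.Dropout(p=0.5)               # (4) randomly zero 50% of activations during training
```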
Next, the formula for the output size of a matrix after a convolution or pooling layer:

N = (W - F + 2P) / S + 1

where W is the input size, F is the convolution or pooling kernel size, P is the number of padding pixels, and S is the stride.
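For example, Conv1 below gives (224 - 11 + (1 + 2)) / 4 + 1 = 55; note that its padding is asymmetric ([1, 2]), so the 2P term is replaced by the total padding 1 + 2 = 3. A tiny helper (hypothetical, just for checking the table in Section 3):

```python
def conv_output_size(w, f, p, s):
    """N = (W - F + 2P) / S + 1, with symmetric padding p on each side."""
    return (w - f + 2 * p) // s + 1

print((224 - 11 + (1 + 2)) // 4 + 1)     # Conv1 (asymmetric padding): 55
print(conv_output_size(55, 3, 0, 2))     # Maxpool1: 27
print(conv_output_size(27, 5, 2, 1))     # Conv2:    27
```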
Now let's analyze each layer in detail.
2. Dissecting the model's structure and parameters

Convolution layer 1 (Conv1)

Since two GPUs are used, the kernel count must be multiplied by 2:

```
Conv1:
    input_size:  [224, 224, 3]
    kernels:     48 * 2 = 96
    kernel_size: 11
    padding:     [1, 2]
    stride:      4
    output_size: (224 - 11 + (1 + 2)) / 4 + 1 = 55  ->  [55, 55, 96]
```

So there are 96 kernels in total. kernel_size is the spatial size of each kernel, padding is the number of zero-padding pixels around the feature map (here 1 on the top/left and 2 on the bottom/right), and stride is the step size.

The input image has shape [224, 224, 3], and the output size is (224 - 11 + (1 + 2)) / 4 + 1 = 55, so the output shape is [55, 55, 96].

Implementation of Conv1 (the two GPU branches compute exactly the same thing, so the code builds the kernels for a single GPU, i.e. 48):
```python
# PyTorch
self.conv1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, padding=2, stride=4)
# TensorFlow (Keras) — the asymmetric [1, 2] padding is applied by a ZeroPadding2D layer beforehand
x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)
```
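As a quick sanity check: with symmetric padding=2, PyTorch computes (224 - 11 + 2*2)/4 + 1 = 55.25 and floors the fractional part, so the output is still 55×55:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, padding=2, stride=4)
x = torch.randn(1, 3, 224, 224)
print(conv1(x).shape)   # torch.Size([1, 48, 55, 55])
```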
Max pooling (downsampling) layer 1

```
Maxpool1:
    input_size:  [55, 55, 96]
    kernel_size: 3
    padding:     0
    stride:      2
    output_size: [27, 27, 96]
```

kernel_size is the pooling window size, padding is the number of zero-padding pixels around the feature map, and stride is the step size.

The input feature map has shape [55, 55, 96] and the output feature map has shape [27, 27, 96].

Shape calculation: (W - F + 2P)/S + 1 = (55 - 3 + 2*0)/2 + 1 = 27

Implementation of Maxpool1:
```python
# PyTorch
self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)
# TensorFlow (Keras)
x = layers.MaxPool2D(pool_size=3, strides=2)(x)
```
Convolution layer 2 (Conv2)

```
Conv2:
    input_size:  [27, 27, 96]
    kernels:     128 * 2 = 256
    kernel_size: 5
    padding:     2
    stride:      1
    output_size: [27, 27, 256]
```

The input feature map has shape [27, 27, 96], and the output size is (27 - 5 + 2*2)/1 + 1 = 27, so the output feature map has shape [27, 27, 256].

Implementation of Conv2:
```python
# PyTorch
self.conv2 = nn.Conv2d(in_channels=48, out_channels=128, kernel_size=5, padding=2, stride=1)
# TensorFlow (Keras)
x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)
```
Max pooling (downsampling) layer 2

```
Maxpool2:
    input_size:  [27, 27, 256]
    kernel_size: 3
    padding:     0
    stride:      2
    output_size: [13, 13, 256]
```

kernel_size is the pooling window size, padding is the number of zero-padding pixels around the feature map, and stride is the step size.

The input feature map has shape [27, 27, 256] and the output feature map has shape [13, 13, 256].

Shape calculation: (W - F + 2P)/S + 1 = (27 - 3 + 2*0)/2 + 1 = 13

Implementation of Maxpool2:
```python
# PyTorch
self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)
# TensorFlow (Keras)
x = layers.MaxPool2D(pool_size=3, strides=2)(x)
```
Convolution layer 3 (Conv3)

```
Conv3:
    input_size:  [13, 13, 256]
    kernels:     192 * 2 = 384
    kernel_size: 3
    padding:     1
    stride:      1
    output_size: [13, 13, 384]
```

The input feature map has shape [13, 13, 256], and the output size is (13 - 3 + 2*1)/1 + 1 = 13, so the output feature map has shape [13, 13, 384].

Implementation of Conv3:
```python
# PyTorch
self.conv3 = nn.Conv2d(in_channels=128, out_channels=192, kernel_size=3, padding=1, stride=1)
# TensorFlow (Keras)
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
```
Convolution layer 4 (Conv4)

```
Conv4:
    input_size:  [13, 13, 384]
    kernels:     192 * 2 = 384
    kernel_size: 3
    padding:     1
    stride:      1
    output_size: [13, 13, 384]
```

The input feature map has shape [13, 13, 384], and the output size is (13 - 3 + 2*1)/1 + 1 = 13, so the output feature map has shape [13, 13, 384].

Implementation of Conv4:
```python
# PyTorch
self.conv4 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, padding=1, stride=1)
# TensorFlow (Keras)
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
```
Convolution layer 5 (Conv5)

```
Conv5:
    input_size:  [13, 13, 384]
    kernels:     128 * 2 = 256
    kernel_size: 3
    padding:     1
    stride:      1
    output_size: [13, 13, 256]
```

The input feature map has shape [13, 13, 384], and the output size is (13 - 3 + 2*1)/1 + 1 = 13, so the output feature map has shape [13, 13, 256].

Implementation of Conv5:
```python
# PyTorch
self.conv5 = nn.Conv2d(in_channels=192, out_channels=128, kernel_size=3, padding=1, stride=1)
# TensorFlow (Keras)
x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)
```
Max pooling (downsampling) layer 3

```
Maxpool3:
    input_size:  [13, 13, 256]
    kernel_size: 3
    padding:     0
    stride:      2
    output_size: [6, 6, 256]
```

The input feature map has shape [13, 13, 256] and the output feature map has shape [6, 6, 256].

Shape calculation: (W - F + 2P)/S + 1 = (13 - 3 + 2*0)/2 + 1 = 6

Implementation of Maxpool3:
```python
# PyTorch
self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)
# TensorFlow (Keras)
x = layers.MaxPool2D(pool_size=3, strides=2)(x)
```
Fully connected layer 1 (FC1)

unit_size: 2048 × 2 = 4096. unit_size is the number of nodes in the fully connected layer; as with the kernel counts, each GPU holds 2048 nodes, doubled across the two GPUs.

Fully connected layer 2 (FC2)

unit_size: 2048 × 2 = 4096, doubled across the two GPUs in the same way.

Fully connected layer 3 (FC3)

unit_size: 1000. This is the output layer; its number of nodes equals the number of classes in the classification task.
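The three fully connected layers form the classifier head on top of the flattened Maxpool3 output. A minimal sketch in the same single-GPU-branch style as the layers above (so conv5 yields 128 channels, hence in_features = 128 * 6 * 6, and the hidden layers use 2048 units rather than the full 4096; num_classes is a placeholder):

```python
# PyTorch — classifier head for the single-GPU branch
self.fc1 = nn.Linear(in_features=128 * 6 * 6, out_features=2048)   # flattened [6, 6, 128] input
self.fc2 = nn.Linear(in_features=2048, out_features=2048)
self.fc3 = nn.Linear(in_features=2048, out_features=num_classes)   # output layer: one node per class

# TensorFlow (Keras)
x = layers.Flatten()(x)
x = layers.Dense(2048, activation="relu")(x)
x = layers.Dense(2048, activation="relu")(x)
x = layers.Dense(num_classes)(x)
```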
3. Parameter table

| Name | Input_size | Kernel_size | Kernel_num | Padding | Stride | Output_size | Size calculation |
|------|------------|-------------|------------|---------|--------|-------------|------------------|
| Conv1 | (224, 224, 3) | 11 | 48*2 | [1, 2] | 4 | (55, 55, 96) | (224-11+(1+2))/4+1=55 |
| Maxpooling1 | (55, 55, 96) | 3 | — | 0 | 2 | (27, 27, 96) | (55-3+2*0)/2+1=27 |
| Conv2 | (27, 27, 96) | 5 | 128*2 | 2 | 1 | (27, 27, 256) | (27-5+2*2)/1+1=27 |
| Maxpooling2 | (27, 27, 256) | 3 | — | 0 | 2 | (13, 13, 256) | (27-3+2*0)/2+1=13 |
| Conv3 | (13, 13, 256) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv4 | (13, 13, 384) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv5 | (13, 13, 384) | 3 | 128*2 | 1 | 1 | (13, 13, 256) | (13-3+2*1)/1+1=13 |
| Maxpooling3 | (13, 13, 256) | 3 | — | 0 | 2 | (6, 6, 256) | (13-3+2*0)/2+1=6 |
| FC1 | 6*6*256 | — | — | — | — | 2048 | — |
| FC2 | 2048 | — | — | — | — | 2048 | — |
| FC3 | 2048 | — | — | — | — | 1000 | — |

For FC1/FC2, 2048 is the node count of the single-GPU branch used in the code below (2048 × 2 = 4096 across both GPUs).
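The size-calculation column can be checked mechanically with the formula from Section 1. A small sketch (the total padding is written out explicitly so Conv1's asymmetric [1, 2] padding fits the same loop):

```python
def out_size(w, f, pad_total, s):
    # N = (W - F + total_padding) / S + 1
    return (w - f + pad_total) // s + 1

specs = [("Conv1", 224, 11, 1 + 2, 4), ("Maxpooling1", 55, 3, 0, 2),
         ("Conv2", 27, 5, 2 * 2, 1), ("Maxpooling2", 27, 3, 0, 2),
         ("Conv3", 13, 3, 2 * 1, 1), ("Conv4", 13, 3, 2 * 1, 1),
         ("Conv5", 13, 3, 2 * 1, 1), ("Maxpooling3", 13, 3, 0, 2)]
for name, w, f, p, s in specs:
    print(name, out_size(w, f, p, s))   # 55, 27, 27, 13, 13, 13, 13, 6
```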
4. Code implementation

1. PyTorch implementation

```python
"""
# @File       : model_alexnet.py
# @version    : python 3.9
# @Software   : PyCharm
# @Description: AlexNet, single-GPU branch (half the kernel counts of the two-GPU original)
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        # Feature extractor: shapes follow the table in Section 3 (single branch)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, padding=2, stride=4)
        self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv2 = nn.Conv2d(in_channels=48, out_channels=128, kernel_size=5, padding=2, stride=1)
        self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv3 = nn.Conv2d(in_channels=128, out_channels=192, kernel_size=3, padding=1, stride=1)
        self.conv4 = nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, padding=1, stride=1)
        self.conv5 = nn.Conv2d(in_channels=192, out_channels=128, kernel_size=3, padding=1, stride=1)
        self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)
        # Classifier head; out_features=5 because this model targets a 5-class task
        self.fc1 = nn.Linear(in_features=128 * 6 * 6, out_features=2048)
        self.fc2 = nn.Linear(in_features=2048, out_features=2048)
        self.fc3 = nn.Linear(in_features=2048, out_features=5)

    def forward(self, x):
        x = F.relu(self.conv1(x))     # ReLU after each conv, matching the Keras version below
        x = self.maxpooling1(x)
        x = F.relu(self.conv2(x))
        x = self.maxpooling2(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv5(x))
        x = self.maxpooling3(x)
        x = x.view(-1, 128 * 6 * 6)   # flatten [N, 128, 6, 6] -> [N, 4608]
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)               # raw logits; pair with nn.CrossEntropyLoss for training
        return x
```
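A quick way to check the model is to push a random batch through it and confirm the output shape (assuming the AlexNet class defined above; the 224×224 input size and 5 output classes match that code):

```python
if __name__ == "__main__":
    model = AlexNet()
    dummy = torch.randn(8, 3, 224, 224)   # batch of 8 RGB images
    print(model(dummy).shape)             # torch.Size([8, 5])
```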
2. TensorFlow implementation

```python
"""
# @File       : model_alexnet.py
# @version    : python 3.9
# @Software   : PyCharm
# @Description: AlexNet in Keras, as a functional model (v1) and a subclassed model (v2)
"""
from tensorflow.keras import layers, models, Model, Sequential
import tensorflow as tf


def AlexNet_v1(im_height=224, im_width=224, num_classes=1000):
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    # Asymmetric zero padding ((top, bottom), (left, right)) = ((1, 2), (1, 2)): 224 -> 227
    x = layers.ZeroPadding2D(((1, 2), (1, 2)))(input_image)
    x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)  # -> (55, 55, 48)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                                 # -> (27, 27, 48)
    x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                                 # -> (13, 13, 128)
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                                 # -> (6, 6, 128)

    x = layers.Flatten()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dense(num_classes)(x)
    predict = layers.Softmax()(x)

    model = models.Model(inputs=input_image, outputs=predict)
    return model


class AlexNet_v2(Model):
    def __init__(self, num_classes=1000):
        super(AlexNet_v2, self).__init__()
        self.features = Sequential([
            layers.ZeroPadding2D(((1, 2), (1, 2))),
            layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2),
            layers.Conv2D(filters=128, kernel_size=5, padding="same", activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2),
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu"),
            layers.MaxPool2D(pool_size=3, strides=2)])

        self.flatten = layers.Flatten()
        # Note: this classifier head is narrower (1024/128) than the 2048/2048 head of AlexNet_v1
        self.classifier = Sequential([
            layers.Dropout(0.2),
            layers.Dense(1024, activation="relu"),
            layers.Dropout(0.2),
            layers.Dense(128, activation="relu"),
            layers.Dense(num_classes),
            layers.Softmax()
        ])

    def call(self, x):
        x = self.features(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x
```
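Building the functional version and printing its summary is an easy way to compare the per-layer output shapes against the table in Section 3 (remember the code builds the single-GPU branch, so channel counts are halved):

```python
if __name__ == "__main__":
    model = AlexNet_v1(im_height=224, im_width=224, num_classes=5)
    model.summary()                                    # per-layer shapes: (55,55,48), (27,27,48), ...
    inputs = tf.random.uniform(shape=(8, 224, 224, 3))
    print(model(inputs).shape)                         # (8, 5)
```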