1. Introduction to AlexNet

AlexNet was the winning network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, lifting classification accuracy from the roughly 70% of traditional methods to over 80% (traditional methods had hit a bottleneck at the time, so a jump this large was remarkable). It was designed by Hinton and his student Alex Krizhevsky, and deep learning models began to develop rapidly from that year on. The figure below is the network architecture diagram taken from the original AlexNet paper.

Original AlexNet paper: AlexNet

[Figure: AlexNet network architecture, from the original paper]

The diagram is split into an upper and a lower part because the authors trained the network in parallel on two GPUs; the two halves have exactly the same structure, so it is enough to follow just the lower half.

Key innovations of this network:

(1) First use of GPUs to accelerate network training

(2) Use of the ReLU activation function instead of the traditional Sigmoid and Tanh activations

(3) Use of LRN (Local Response Normalization); see the sketch after this list

(4) Dropout applied in the first two fully connected layers, randomly deactivating neurons at a fixed rate to reduce overfitting
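
As a quick illustration of points (2) and (3), here is a minimal PyTorch sketch combining ReLU with nn.LocalResponseNorm. The hyperparameters size=5, alpha=1e-4, beta=0.75, k=2 are the values from the paper; the conv layer shown is the single-GPU-width Conv1 used throughout this post:

# minimal sketch: ReLU followed by Local Response Normalization (paper hyperparameters)
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, stride=4, padding=2)
relu = nn.ReLU(inplace=True)
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.rand(1, 3, 224, 224)
out = lrn(relu(conv(x)))  # ReLU first, then local response normalization
print(out.shape)  # torch.Size([1, 48, 55, 55])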

Next, the formula for the output size of a feature map after convolution or pooling:

N = (W - F + 2P)/S + 1

where W is the input size, F is the convolution or pooling kernel size, P is the number of padding pixels, and S is the stride.
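
A quick sanity check of the formula, as a minimal Python helper; floor division matches the default rounding behavior of PyTorch's Conv2d and MaxPool2d:

# output_size implements N = (W - F + 2P)/S + 1 with floor division
def output_size(W, F, P, S):
    return (W - F + 2 * P) // S + 1

print(output_size(224, 11, 2, 4))  # Conv1: 55
print(output_size(55, 3, 0, 2))    # Maxpool1: 27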

The following analyzes each layer in detail.

2. Analysis of the Model Structure and Parameters

Convolutional Layer 1

Since two GPUs are used, the number of kernels is multiplied by 2:

Conv1:
input_size: [224, 224, 3] -> output: (224 - 11 + (1 + 2))/4 + 1 = 55 -> (55, 55, 96)
kernels: 48 * 2
kernel_size: 11
padding: [1, 2]
stride: 4

Conv1: kernels=48 × 2 = 96, kernel_size=11, padding=[1, 2], stride=4

So the total number of kernels is 96. kernel_size is the kernel size, padding is the number of zero-padding pixels around the feature map ([1, 2] means 1 pixel on the top/left and 2 on the bottom/right), and stride is the step size.

The input image shape is [224, 224, 3], and the output size is computed as (224 - 11 + (1 + 2))/4 + 1 = 55.

So the output shape is [55, 55, 96].

Implementation of Conv1 (the two GPUs perform identical computations, so the kernels are set up for a single GPU):

# pytorch
self.conv1 = nn.Conv2d(in_channels=3, kernel_size=11, out_channels=48, padding=2, stride=4)
# tensorflow (preceded by layers.ZeroPadding2D(((1, 2), (1, 2))) in the full model; see Section 4)
x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)
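
A quick way to verify the output shape (single-GPU version with 48 kernels; PyTorch's symmetric padding=2 gives the same 55×55 result after flooring):

# shape check for Conv1: (224 - 11 + 2*2)//4 + 1 = 55
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=48, kernel_size=11, stride=4, padding=2)
print(conv1(torch.rand(1, 3, 224, 224)).shape)  # torch.Size([1, 48, 55, 55])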

Max-Pooling (Downsampling) Layer 1

maxpooling1:
input_size: (55, 55, 96)
kernel_size: 3
padding: 0
stride: 2
output_size: (27, 27, 96)

Maxpool1: kernel_size=3, padding=0, stride=2

kernel_size is the pooling window size, padding is the number of zero-padding pixels around the feature map, and stride is the step size.

Input feature map shape = [55, 55, 96]; output feature map shape = [27, 27, 96].

Shape calculation: (W - F + 2P)/S + 1 = (55 - 3 + 2*0)/2 + 1 = 27

Implementation of Maxpool1:

# pytorch
self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)
# tensorflow
x = layers.MaxPool2D(pool_size=3, strides=2)(x)  # [None, 27, 27, 48]

Convolutional Layer 2

Conv2:
input_size: [27, 27, 96]
kernel_size: 5
kernels: 128 * 2
padding: 2
stride: 1
output_size: [27, 27, 256]

Conv2: kernels=128 × 2 = 256, kernel_size=5, padding=2, stride=1

The input feature map shape is [27, 27, 96], and the output size is computed as (27 - 5 + 2*2)/1 + 1 = 27.

So the output feature map shape is [27, 27, 256].

Implementation of Conv2:

# pytorch
self.conv2 = nn.Conv2d(in_channels=48, kernel_size=5, out_channels=128, padding=2, stride=1)

# tensorflow
# with stride=1 and padding="same", the output size equals the input size -> [None, 27, 27, 128]
x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)

Max-Pooling (Downsampling) Layer 2

maxpooling2:
input_size: [27, 27, 256]
kernel_size: 3
padding: 0
stride: 2
output_size: [13, 13, 256]

Maxpool2: kernel_size=3, padding=0, stride=2

kernel_size is the pooling window size, padding is the number of zero-padding pixels around the feature map, and stride is the step size.

Input feature map shape = [27, 27, 256]; output feature map shape = [13, 13, 256].

Shape calculation: (W - F + 2P)/S + 1 = (27 - 3 + 2*0)/2 + 1 = 13

Implementation of Maxpool2:

# pytorch
self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)
# tensorflow
# [None, 27, 27, 128] -> [None, 13, 13, 128]
x = layers.MaxPool2D(pool_size=3, strides=2)(x)

Convolutional Layer 3

Conv3:
input_size: [13, 13, 256]
kernels: 192 * 2 = 384
kernel_size: 3
padding: 1
stride: 1
output_size: [13, 13, 384]

Conv3: kernels=192 × 2 = 384, kernel_size=3, padding=1, stride=1

The input feature map shape is [13, 13, 256], and the output size is computed as (13 - 3 + 2*1)/1 + 1 = 13.

So the output feature map shape is [13, 13, 384].

Implementation of Conv3:

# pytorch
self.conv3 = nn.Conv2d(in_channels=128, kernel_size=3, out_channels=192, padding=1, stride=1)

# tensorflow
# stride=1, padding="same": size unchanged -> [None, 13, 13, 192]
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)

Convolutional Layer 4

Conv4:
input_size: [13, 13, 384]
kernels: 192 * 2 = 384
kernel_size: 3
padding: 1
stride: 1
output_size: [13, 13, 384]

Conv4: kernels=192 × 2 = 384, kernel_size=3, padding=1, stride=1

The input feature map shape is [13, 13, 384], and the output size is computed as (13 - 3 + 2*1)/1 + 1 = 13.

So the output feature map shape is [13, 13, 384].

Implementation of Conv4:

# pytorch
self.conv4 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=192, padding=1, stride=1)

# tensorflow
# -> [None, 13, 13, 192]
x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)

Convolutional Layer 5

Conv5:
input_size: [13, 13, 384]
kernels: 128 * 2 = 256
kernel_size: 3
padding: 1
stride: 1
output_size: [13, 13, 256]

Conv5: kernels=128 × 2 = 256, kernel_size=3, padding=1, stride=1

The input feature map shape is [13, 13, 384], and the output size is computed as (13 - 3 + 2*1)/1 + 1 = 13.

So the output feature map shape is [13, 13, 256].

Implementation of Conv5:

# pytorch
self.conv5 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=128, padding=1, stride=1)
# tensorflow
# -> [None, 13, 13, 128]
x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)

Max-Pooling (Downsampling) Layer 3

maxpool3:
input_size: [13, 13, 256]
kernel_size: 3
padding: 0
stride: 2
output_size: [6, 6, 256]

Maxpool3: kernel_size=3, padding=0, stride=2

Input feature map shape = [13, 13, 256]; output feature map shape = [6, 6, 256].

Shape calculation: (W - F + 2P)/S + 1 = (13 - 3 + 2*0)/2 + 1 = 6

Implementation of Maxpool3:

# pytorch
self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)
# tensorflow
# -> [None, 6, 6, 128]
x = layers.MaxPool2D(pool_size=3, strides=2)(x)
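
With all convolution and pooling layers covered, a short sanity check (a sketch chaining the single-GPU settings collected above) confirms that a 224×224 input reaches the fully connected layers as a [6, 6, 128] feature map:

# chain the single-GPU layers and verify the feature-extractor output shape
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(48, 128, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(128, 192, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
print(features(torch.rand(1, 3, 224, 224)).shape)  # torch.Size([1, 128, 6, 6])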

Fully Connected Layer 1

unit_size: 4096. unit_size is the number of nodes in the fully connected layer; 2048 per GPU, doubled across the two GPUs.

Fully Connected Layer 2

unit_size: 4096, again 2048 per GPU, doubled across the two GPUs.

Fully Connected Layer 3

unit_size: 1000. This is the output layer; its number of nodes equals the number of classes in the classification task.
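
The fully connected layers have no dedicated snippet above, so here is a minimal PyTorch sketch at the single-GPU widths used by the full implementation in Section 4 (the flattened Maxpool3 output has 128 * 6 * 6 features):

import torch.nn as nn

# single-GPU widths; the two-GPU network doubles FC1/FC2 to 4096
fc1 = nn.Linear(in_features=128 * 6 * 6, out_features=2048)  # [N, 4608] -> [N, 2048]
fc2 = nn.Linear(in_features=2048, out_features=2048)
fc3 = nn.Linear(in_features=2048, out_features=1000)  # output layer: one node per class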

3. Parameter Table

| Name | Input_size | Kernel_size | Kernel_num | Padding | Stride | Output_size | Size calculation |
|------|------------|-------------|------------|---------|--------|-------------|------------------|
| Conv1 | (224, 224, 3) | 11 | 48*2 | [1, 2] | 4 | (55, 55, 96) | (224-11+(1+2))/4+1=55 |
| Maxpooling1 | (55, 55, 96) | 3 | - | 0 | 2 | (27, 27, 96) | (55-3+2*0)/2+1=27 |
| Conv2 | (27, 27, 96) | 5 | 128*2 | 2 | 1 | (27, 27, 256) | (27-5+2*2)/1+1=27 |
| Maxpooling2 | (27, 27, 256) | 3 | - | 0 | 2 | (13, 13, 256) | (27-3+2*0)/2+1=13 |
| Conv3 | (13, 13, 256) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv4 | (13, 13, 384) | 3 | 192*2 | 1 | 1 | (13, 13, 384) | (13-3+2*1)/1+1=13 |
| Conv5 | (13, 13, 384) | 3 | 128*2 | 1 | 1 | (13, 13, 256) | (13-3+2*1)/1+1=13 |
| Maxpooling3 | (13, 13, 256) | 3 | - | 0 | 2 | (6, 6, 256) | (13-3+2*0)/2+1=6 |
| FC1 | - | - | 2048 units | - | - | - | - |
| FC2 | - | - | 2048 units | - | - | - | - |
| FC3 | - | - | 1000 units | - | - | - | - |

Note: FC1/FC2 are listed at the single-GPU width (2048) used in the code below; the full two-GPU network doubles them to 4096.
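
To make the table concrete, here is a small sketch (the helper names are mine, not from the source) that computes a layer's trainable parameter count from the table's columns, using the single-GPU channel counts of the implementation below:

# parameter counts: weights plus one bias per output channel / output node
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

print(conv_params(3, 48, 11))        # Conv1: 17472
print(fc_params(128 * 6 * 6, 2048))  # FC1: 9439232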

4. Code Implementation

1. PyTorch implementation
# !/usr/bin/env python
# -*- coding: utf-8 -*-
"""
# @File : model_alexnet.py
# @Time :
# @Author :
# @version : python 3.9
# @Software : PyCharm
# @Description: single-GPU-width AlexNet (half the channels of the two-GPU original)
"""
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, kernel_size=11, out_channels=48, padding=2, stride=4)
        self.maxpooling1 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv2 = nn.Conv2d(in_channels=48, kernel_size=5, out_channels=128, padding=2, stride=1)
        self.maxpooling2 = nn.MaxPool2d(kernel_size=3, stride=2)
        self.conv3 = nn.Conv2d(in_channels=128, kernel_size=3, out_channels=192, padding=1, stride=1)
        self.conv4 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=192, padding=1, stride=1)
        self.conv5 = nn.Conv2d(in_channels=192, kernel_size=3, out_channels=128, padding=1, stride=1)
        self.maxpooling3 = nn.MaxPool2d(kernel_size=3, stride=2)
        # dropout before the first two FC layers, per highlight (4); 0.5 is the paper's rate
        self.dropout = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(in_features=128 * 6 * 6, out_features=2048)
        self.fc2 = nn.Linear(in_features=2048, out_features=2048)
        self.fc3 = nn.Linear(in_features=2048, out_features=5)  # 5 output classes

    def forward(self, x):
        # ReLU after every conv layer, matching the TensorFlow version below
        x = F.relu(self.conv1(x))
        x = self.maxpooling1(x)
        x = F.relu(self.conv2(x))
        x = self.maxpooling2(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.relu(self.conv5(x))
        x = self.maxpooling3(x)
        x = x.view(-1, 128 * 6 * 6)  # flatten [N, 128, 6, 6] -> [N, 4608]
        x = F.relu(self.fc1(self.dropout(x)))
        x = F.relu(self.fc2(self.dropout(x)))
        x = self.fc3(x)
        return x


# input = torch.rand([32, 3, 224, 224])
# alexnet = AlexNet()
# print(alexnet)
# output = alexnet(input)
# print(output)

2. TensorFlow implementation
# !/usr/bin/env python
# -*- coding: utf-8 -*-
"""
# @File : model_alexnet.py
# @Time :
# @Author :0399
# @version : python 3.9
# @Software : PyCharm
# @Description:
"""
from tensorflow.keras import layers, models, Model, Sequential
import tensorflow as tf


def AlexNet_v1(im_height=224, im_width=224, num_classes=1000):
    # TensorFlow uses NHWC channel order
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")  # [None, 224, 224, 3]
    # pad 1 pixel on the top/left and 2 on the bottom/right -> [None, 227, 227, 3]
    x = layers.ZeroPadding2D(((1, 2), (1, 2)))(input_image)
    # (227 - 11 + 2*0) / 4 + 1 = 55 -> [None, 55, 55, 48]
    x = layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu")(x)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)  # [None, 27, 27, 48]
    # with stride=1 and padding="same", the output size equals the input size -> [None, 27, 27, 128]
    x = layers.Conv2D(filters=128, kernel_size=5, padding="same", strides=1, activation="relu")(x)
    # [None, 27, 27, 128] -> [None, 13, 13, 128]
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)
    # stride=1, padding="same": size unchanged -> [None, 13, 13, 192]
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    # -> [None, 13, 13, 192]
    x = layers.Conv2D(filters=192, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    # -> [None, 13, 13, 128]
    x = layers.Conv2D(filters=128, kernel_size=3, padding="same", strides=1, activation="relu")(x)
    # -> [None, 6, 6, 128]
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)

    x = layers.Flatten()(x)  # [None, 128*6*6]
    x = layers.Dropout(0.2)(x)
    # [None, 2048]
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    # [None, 2048]
    x = layers.Dense(2048, activation="relu")(x)
    x = layers.Dense(num_classes)(x)

    predict = layers.Softmax()(x)

    model = models.Model(inputs=input_image, outputs=predict)
    return model


class AlexNet_v2(Model):
    def __init__(self, num_classes=1000):
        super(AlexNet_v2, self).__init__()
        self.features = Sequential([
            # [None, 224, 224, 3] -> [None, 227, 227, 3]
            layers.ZeroPadding2D(((1, 2), (1, 2))),
            # padding="valid" means no padding, output size is floored: (227 - 11)/4 + 1 = 55
            # [None, 227, 227, 3] -> [None, 55, 55, 48]
            layers.Conv2D(filters=48, kernel_size=11, strides=4, activation="relu"),
            # [None, 55, 55, 48] -> [None, 27, 27, 48]
            layers.MaxPool2D(pool_size=3, strides=2),
            # stride=1, padding="same": size unchanged, [None, 27, 27, 48] -> [None, 27, 27, 128]
            layers.Conv2D(filters=128, kernel_size=5, padding="same", activation="relu"),
            # [None, 27, 27, 128] -> [None, 13, 13, 128]
            layers.MaxPool2D(pool_size=3, strides=2),
            # [None, 13, 13, 128] -> [None, 13, 13, 192]
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            # [None, 13, 13, 192] -> [None, 13, 13, 192]
            layers.Conv2D(filters=192, kernel_size=3, padding="same", activation="relu"),
            # [None, 13, 13, 192] -> [None, 13, 13, 128]
            layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu"),
            # [None, 13, 13, 128] -> [None, 6, 6, 128]
            layers.MaxPool2D(pool_size=3, strides=2)])

        # [None, 128*6*6]
        self.flatten = layers.Flatten()
        self.classifier = Sequential([
            layers.Dropout(0.2),
            layers.Dense(2048, activation="relu"),  # matching the 2048-unit FC layers of v1
            layers.Dropout(0.2),
            layers.Dense(2048, activation="relu"),
            layers.Dense(num_classes),
            layers.Softmax()
        ])

    def call(self, x):
        x = self.features(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x


input = tf.random.uniform(shape=(8, 224, 224, 3))
# alexnet = AlexNet_v2(num_classes=5)
# print(alexnet)
# print(alexnet.call(input))
AlexNet_v1(224, 224, 5)
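
A brief usage note: Keras's built-in summary() prints each layer's output shape and parameter count, which is handy for cross-checking the table in Section 3:

# inspect the functional-API model returned by AlexNet_v1
model = AlexNet_v1(im_height=224, im_width=224, num_classes=5)
model.summary()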