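1. Introduction

GoogLeNet (written GoogleNet in the code below) was proposed by a team at Google in 2014 and won first place in the classification task of that year's ImageNet competition (ILSVRC 2014).

Original paper: Going Deeper with Convolutions (Szegedy et al.)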


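The main highlights of this network are:

  • It introduces the Inception module, which fuses feature information at different scales.

  • It uses 1×1 convolutions for dimensionality reduction and channel projection.

  • It adds two auxiliary classifiers to help training.

  • It drops the large fully connected layers in favor of global average pooling. This greatly reduces the number of parameters: excluding the two auxiliary classifiers, the model is only about 1/20 the size of VGG.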
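The sketch below shows why dropping the large fully connected layers matters so much. It counts the parameters of two classifier heads placed on GoogLeNet's final 7×7×1024 feature map. The VGG-style head (flatten, then 4096, 4096 and 1000 units) is a hypothetical comparison and is not part of GoogLeNet. Biases are included in both counts.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


C, H, W = 1024, 7, 7  # GoogLeNet's final feature map (channels, height, width)
NUM_CLASSES = 1000

# Hypothetical VGG-style head: flatten -> FC 4096 -> FC 4096 -> FC 1000
vgg_style_head = (
    linear_params(C * H * W, 4096)
    + linear_params(4096, 4096)
    + linear_params(4096, NUM_CLASSES)
)

# Global average pooling has no parameters; only the final linear layer counts
gap_head = linear_params(C, NUM_CLASSES)

print(f"VGG-style head:    {vgg_style_head:,} parameters")
print(f"GAP + linear head: {gap_head:,} parameters")
```

Global average pooling itself has no weights, so the head shrinks to a single 1024×1000 linear layer. The hypothetical flattened head would need over 200 million parameters.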

Next, let's analyze the Inception module:

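[Figure: the Inception module, naive version (left) and version with dimensionality reduction (right)]

The left figure shows the original Inception module proposed in the paper. The right figure shows the Inception module with dimensionality reduction added.

Start with the left figure. The Inception module has four branches. The input feature map passes through them in parallel, producing four outputs, which are then concatenated along the depth (channel) dimension to give the final output. (Note: for the outputs to be concatenated along the depth dimension, the feature maps produced by the four branches must all have the same height and width.)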

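Branch 1 is a convolutional layer with 1×1 kernels, stride=1.

Branch 2 is a convolutional layer with 3×3 kernels, stride=1, padding=1 (so that the output height and width equal those of the input).

Branch 3 is a convolutional layer with 5×5 kernels, stride=1, padding=2 (so that the output height and width equal those of the input).

Branch 4 is a max pooling layer with a 3×3 window, stride=1, padding=1 (so that the output height and width equal those of the input; with stride 1 it does not downsample).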
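All four padding choices follow from the usual output-size formula, out = ⌊(in + 2·padding − kernel) / stride⌋ + 1. With stride 1, choosing padding = (kernel − 1) / 2 leaves the size unchanged. Here is a minimal check in plain Python, using a 28×28 input as an example:

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution or pooling layer (rounding down)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of each branch of the naive Inception module
BRANCHES = {
    "branch1: 1x1 conv": (1, 1, 0),
    "branch2: 3x3 conv": (3, 1, 1),
    "branch3: 5x5 conv": (5, 1, 2),
    "branch4: 3x3 max pool": (3, 1, 1),
}

size_in = 28  # example input size (what inception3a sees)
sizes_out = {name: out_size(size_in, *cfg) for name, cfg in BRANCHES.items()}
for name, size in sizes_out.items():
    print(f"{name}: {size_in} -> {size}")

# Concatenation along channels only works if all branches agree on H and W
assert len(set(sizes_out.values())) == 1
```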

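Now look at the right figure. Compared with the left one, it adds 1×1 convolutions to branches 2, 3 and 4: before the 3×3 and 5×5 convolutions in branches 2 and 3, and after the max pooling in branch 4. Their purpose is dimensionality reduction, which lowers both the number of trainable parameters and the amount of computation.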

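Here is how 1×1 kernels reduce the parameter count. Take the same task in both cases: convolving a feature map of depth 512 with 64 kernels of size 5×5. Without 1×1 reduction this takes 819,200 parameters. If 24 1×1 kernels first reduce the depth to 24, it takes only 50,688 parameters (biases ignored in both cases), roughly 16 times fewer.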

[Figure: parameter count with and without 1×1 dimensionality reduction]
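Both figures can be reproduced with a few lines of arithmetic, again ignoring biases. The 24 reduction channels are the value implied by the 50,688 total, and the helper name is only illustrative.

```python
def conv_weights(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights of a convolution layer with square kernels (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


IN_CH, OUT_CH = 512, 64
REDUCE_CH = 24  # number of 1x1 kernels used to reduce the depth first

direct = conv_weights(IN_CH, OUT_CH, 5)
reduced = conv_weights(IN_CH, REDUCE_CH, 1) + conv_weights(REDUCE_CH, OUT_CH, 5)

print(direct, reduced)  # 819200 50688
print(f"about {direct / reduced:.1f}x fewer weights")
```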

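How is the number of kernels in each convolution chosen? The paper provides a parameter table. An Inception module needs six of its parameters: #1x1, #3x3reduce, #3x3, #5x5reduce, #5x5 and pool proj. Each gives the number of kernels of one convolution.

The figure below marks each parameter on its branch:
#1x1 is the number of 1x1 kernels in branch 1.
#3x3reduce is the number of 1x1 kernels in branch 2.
#3x3 is the number of 3x3 kernels in branch 2.
#5x5reduce is the number of 1x1 kernels in branch 3.
#5x5 is the number of 5x5 kernels in branch 3.
pool proj is the number of 1x1 kernels in branch 4.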

[Figure: the paper's parameter table, with each parameter marked on its Inception branch]
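The branches are concatenated along the channel dimension, so a module's output depth is #1x1 + #3x3 + #5x5 + pool proj. The two reduce values only set the input depth of the 3×3 and 5×5 convolutions. The sketch below applies this rule to the values from the paper's table, which are the same numbers passed to `Inception(...)` in the PyTorch code later on. It recovers the input depth of every following module.

```python
# (#1x1, #3x3reduce, #3x3, #5x5reduce, #5x5, pool proj) for each module, from the paper's table
INCEPTION_TABLE = {
    "3a": (64, 96, 128, 16, 32, 32),
    "3b": (128, 128, 192, 32, 96, 64),
    "4a": (192, 96, 208, 16, 48, 64),
    "4b": (160, 112, 224, 24, 64, 64),
    "4c": (128, 128, 256, 24, 64, 64),
    "4d": (112, 144, 288, 32, 64, 64),
    "4e": (256, 160, 320, 32, 128, 128),
    "5a": (256, 160, 320, 32, 128, 128),
    "5b": (384, 192, 384, 48, 128, 128),
}


def inception_out_channels(ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
    """Depth of the concatenated output; the reduce layers only feed the 3x3/5x5 convs."""
    return ch1x1 + ch3x3 + ch5x5 + pool_proj


channels = {}
in_ch = 192  # depth of the feature map entering inception3a
for name, cfg in INCEPTION_TABLE.items():
    out_ch = inception_out_channels(*cfg)
    channels[name] = (in_ch, out_ch)
    print(f"inception{name}: {in_ch} -> {out_ch}")
    in_ch = out_ch  # max pooling between stages does not change the depth
```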

Next, the auxiliary classifiers. The two auxiliary classifiers in the network have exactly the same structure, as shown below:

[Figure: structure of the auxiliary classifier]

The inputs of the two auxiliary classifiers are the outputs of Inception (4a) and Inception (4d), respectively.

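The first layer is an average pooling layer with a 5x5 window and stride=3.

The second layer is a convolutional layer with 128 kernels of size 1x1 and stride=1, followed by ReLU.

The third layer is a fully connected layer with 1024 units, followed by ReLU.

The fourth layer is a fully connected layer with 1000 units (the number of classes in the classification task).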
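Both auxiliary classifiers receive 14×14 feature maps: 512 channels from Inception (4a) and 528 from Inception (4d). A small shape trace shows where the 2048 input features of the first fully connected layer come from: ⌊(14 − 5) / 3⌋ + 1 = 4, and 128 × 4 × 4 = 2048. The helper names are illustrative only.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution or pooling layer (rounding down)."""
    return (size + 2 * padding - kernel) // stride + 1


def aux_shapes(in_channels: int, num_classes: int = 1000, size: int = 14) -> dict:
    """Shape after each layer of the auxiliary classifier, as (C, H, W) or a feature count."""
    s = out_size(size, kernel=5, stride=3)  # 5x5 average pooling, stride 3
    return {
        "avgpool": (in_channels, s, s),
        "conv1x1": (128, s, s),  # 128 kernels of size 1x1
        "flatten": 128 * s * s,  # input features of the first FC layer
        "fc1": 1024,
        "fc2": num_classes,
    }


for name, ch in (("aux1 (after inception4a)", 512), ("aux2 (after inception4d)", 528)):
    print(name, aux_shapes(ch))
```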

The full GoogLeNet architecture is shown below:

[Figure: GoogLeNet network architecture]
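The feature-map sizes in the diagram can be checked layer by layer. The sketch follows the layer sizes in the paper's table. For the details the table leaves open, it assumes PyTorch-style settings: padding 3 for the 7×7 convolution, padding 1 for the 3×3 convolution, and ceil_mode=True for the 3×3/2 max pooling layers. Inception modules keep the spatial size, as explained above.

```python
import math


def out_size(size, kernel, stride=1, padding=0, ceil_mode=False):
    """Output size of a conv/pool layer; ceil_mode rounds the division up."""
    span = size + 2 * padding - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1


# (name, layer settings); None marks Inception stages, which keep the spatial size
LAYERS = [
    ("conv1 7x7/2", dict(kernel=7, stride=2, padding=3)),
    ("maxpool1 3x3/2", dict(kernel=3, stride=2, ceil_mode=True)),
    ("conv2 1x1", dict(kernel=1)),
    ("conv3 3x3", dict(kernel=3, padding=1)),
    ("maxpool2 3x3/2", dict(kernel=3, stride=2, ceil_mode=True)),
    ("inception3a-3b", None),
    ("maxpool3 3x3/2", dict(kernel=3, stride=2, ceil_mode=True)),
    ("inception4a-4e", None),
    ("maxpool4 3x3/2", dict(kernel=3, stride=2, ceil_mode=True)),
    ("inception5a-5b", None),
    ("avgpool 7x7", dict(kernel=7)),
]

size = 224
trace = [size]
for name, cfg in LAYERS:
    if cfg is not None:
        size = out_size(size, **cfg)
    trace.append(size)
    print(f"{name:>16}: {size}x{size}")
```

Without ceil_mode=True, the 3×3/2 pooling after conv3 would give ⌊53/2⌋ + 1 = 27 instead of 28. This simple formula ignores PyTorch's extra ceil_mode rule, which drops a window that would start beyond the input. That rule does not change any of these sizes.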

2. Code implementation

1. PyTorch implementation
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
# @File : model_googlenet.py
# @Time :
# @Author :
# @version :python 3.9
# @Software : PyCharm
# @Description:
"""
# ================【Function:】====================
import torch
import torch.nn as nn
import torch.nn.functional as F
# Official PyTorch reference implementation: https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py


class GoogleNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogleNet, self).__init__()
        self.aux_logits = aux_logits
        # (224 - 7 + 2 * 3) // 2 + 1 = 112: (3, 224, 224) -> (64, 112, 112)
        self.conv1 = BaseConv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        # ceil((112 - 3) / 2) + 1 = 56: (64, 112, 112) -> (64, 56, 56)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
        # 1x1 convolution, size unchanged: (64, 56, 56) -> (64, 56, 56)
        self.conv2 = BaseConv2d(in_channels=64, out_channels=64, kernel_size=1)
        # (56 - 3 + 2 * 1) // 1 + 1 = 56: (64, 56, 56) -> (192, 56, 56)
        self.conv3 = BaseConv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1)
        # ceil((56 - 3) / 2) + 1 = 28: (192, 56, 56) -> (192, 28, 28)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # Inception 3a; the parameters come from the paper's table
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)

        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            # input: output of inception4a, 512 channels
            self.aux1 = InceptionAux(512, num_classes)
            # input: output of inception4d, 528 channels
            self.aux2 = InceptionAux(528, num_classes)

        # adaptive pooling to a fixed 1x1 output size
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # [N, 3, 224, 224] -> [N, 64, 112, 112]
        x = self.conv1(x)
        # [N, 64, 112, 112] -> [N, 64, 56, 56]
        x = self.maxpool1(x)
        # [N, 64, 56, 56] -> [N, 64, 56, 56]
        x = self.conv2(x)
        # [N, 64, 56, 56] -> [N, 192, 56, 56]
        x = self.conv3(x)
        # [N, 192, 56, 56] -> [N, 192, 28, 28]
        x = self.maxpool2(x)

        # [N, 192, 28, 28] -> [N, 256, 28, 28]
        x = self.inception3a(x)
        # [N, 256, 28, 28] -> [N, 480, 28, 28]
        x = self.inception3b(x)
        # [N, 480, 28, 28] -> [N, 480, 14, 14]
        x = self.maxpool3(x)

        # [N, 480, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4a(x)
        if self.training and self.aux_logits:
            aux1 = self.aux1(x)
        # [N, 512, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4b(x)
        # [N, 512, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4c(x)
        # [N, 512, 14, 14] -> [N, 528, 14, 14]
        x = self.inception4d(x)
        if self.training and self.aux_logits:
            aux2 = self.aux2(x)

        # [N, 528, 14, 14] -> [N, 832, 14, 14]
        x = self.inception4e(x)
        # [N, 832, 14, 14] -> [N, 832, 7, 7]
        x = self.maxpool4(x)

        # [N, 832, 7, 7] -> [N, 832, 7, 7]
        x = self.inception5a(x)
        # [N, 832, 7, 7] -> [N, 1024, 7, 7]
        x = self.inception5b(x)

        # [N, 1024, 7, 7] -> [N, 1024, 1, 1]
        x = self.avgpool(x)
        # [N, 1024, 1, 1] -> [N, 1024]
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        # [N, 1024] -> [N, num_classes]
        x = self.fc(x)
        if self.training and self.aux_logits:
            # training: main output plus the two auxiliary outputs
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class Inception(nn.Module):
    """
    ch1x1: number of 1x1 kernels in branch 1
    ch3x3red: #3x3reduce, number of 1x1 kernels before the 3x3 convolution in branch 2
    ch3x3: number of 3x3 kernels in branch 2
    ch5x5red: #5x5reduce, number of 1x1 kernels before the 5x5 convolution in branch 3
    ch5x5: number of 5x5 kernels in branch 3
    pool_proj: number of 1x1 kernels after max pooling in branch 4
    """

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BaseConv2d(in_channels, ch1x1, kernel_size=1, stride=1)
        self.branch2 = nn.Sequential(
            BaseConv2d(in_channels=in_channels, out_channels=ch3x3red, kernel_size=1),
            # padding=1 keeps the output size equal to the input size
            BaseConv2d(in_channels=ch3x3red, out_channels=ch3x3, kernel_size=3, padding=1)
        )
        self.branch3 = nn.Sequential(
            BaseConv2d(in_channels=in_channels, out_channels=ch5x5red, kernel_size=1),
            # padding=2 keeps the output size equal to the input size
            BaseConv2d(in_channels=ch5x5red, out_channels=ch5x5, kernel_size=5, padding=2)
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BaseConv2d(in_channels=in_channels, out_channels=pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        # [batch, channel, h, w]: torch.cat(outputs, 1) concatenates along the channel dimension
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagepool = nn.AvgPool2d(kernel_size=5, stride=3)
        # output [batch, 128, 4, 4]
        self.conv = BaseConv2d(in_channels, 128, kernel_size=1)

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: [N, 512, 14, 14], aux2: [N, 528, 14, 14]
        x = self.averagepool(x)
        # aux1: [N, 512, 4, 4], aux2: [N, 528, 4, 4]
        x = self.conv(x)
        # [N, 128, 4, 4]
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # [N, 2048]
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # [N, 1024]
        x = self.fc2(x)
        # [N, num_classes]
        return x


class BaseConv2d(nn.Module):
    """Conv2d followed by ReLU."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BaseConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

# input = torch.rand((16, 3, 224, 224))
# googlenet = GoogleNet(num_classes=5, aux_logits=False, init_weights=True)
# print(googlenet)
# output = googlenet(input)
# print(output)
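In training mode with aux_logits=True, the PyTorch model above returns (x, aux2, aux1). According to the paper, the auxiliary losses are added to the total loss with a discount weight of 0.3, and the auxiliary classifiers are discarded at inference. The sketch below shows that weighting for a single sample in plain Python. cross_entropy and googlenet_loss are hypothetical helpers; with real tensors one would typically apply nn.CrossEntropyLoss to each of the three outputs instead.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy of one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def googlenet_loss(main_logits, aux2_logits, aux1_logits, target, aux_weight=0.3):
    """Main loss plus the discounted auxiliary losses.

    Argument order follows the (x, aux2, aux1) tuple returned in training mode.
    """
    main = cross_entropy(main_logits, target)
    aux = cross_entropy(aux1_logits, target) + cross_entropy(aux2_logits, target)
    return main + aux_weight * aux


# Made-up logits for a single sample of a 5-class problem
loss = googlenet_loss(
    main_logits=[2.0, 0.5, 0.1, -1.0, 0.0],
    aux2_logits=[1.0, 0.2, 0.0, 0.0, 0.0],
    aux1_logits=[0.5, 0.5, 0.5, 0.5, 0.5],
    target=0,
)
print(f"total loss: {loss:.4f}")
```

At evaluation time (model.eval()), the model returns only the main output, so only the main loss or prediction is used.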
2. TensorFlow implementation
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
# @File : model_googlenet.py
# @Time :
# @Author :0399
# @version :python 3.9
# @Software : PyCharm
# @Description:
"""
# ================【Function:】====================
from tensorflow.keras import layers, models, Model, Sequential
import tensorflow as tf


def GoogleNet(im_height=224, im_width=224, class_num=1000, aux_logits=False):
    # TensorFlow uses NHWC channel order
    # [None, 224, 224, 3]
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    # [None, 224, 224, 3] -> [None, 112, 112, 64]
    x = layers.Conv2D(filters=64, kernel_size=7, strides=2, padding="SAME",
                      activation="relu", name="conv2d_1")(input_image)
    # [None, 112, 112, 64] -> [None, 56, 56, 64]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_1")(x)
    # [None, 56, 56, 64] -> [None, 56, 56, 64]
    x = layers.Conv2D(filters=64, kernel_size=1, strides=1, activation="relu", name="conv2d_2")(x)
    # [None, 56, 56, 64] -> [None, 56, 56, 192]  (56 - 3 + 2 * 1)/1 + 1 = 56
    x = layers.Conv2D(filters=192, kernel_size=3, strides=1, padding="same", activation="relu", name="conv2d_3")(x)
    # [None, 56, 56, 192] -> [None, 28, 28, 192]
    # padding="SAME" gives ceil(56 / 2) = 28; the default "valid" padding would give 27
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME")(x)

    # [None, 28, 28, 192] -> [None, 28, 28, 256]
    x = Inception(64, 96, 128, 16, 32, 32, name="inception3a")(x)
    # [None, 28, 28, 256] -> [None, 28, 28, 480]
    x = Inception(128, 128, 192, 32, 96, 64, name="inception3b")(x)
    # [None, 28, 28, 480] -> [None, 14, 14, 480]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_2")(x)

    # [None, 14, 14, 480] -> [None, 14, 14, 512]
    x = Inception(192, 96, 208, 16, 48, 64, name="inception4a")(x)

    if aux_logits:
        aux1 = InceptionAux(class_num, name="aux1")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 512]
    x = Inception(160, 112, 224, 24, 64, 64, name="inception4b")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 512]
    x = Inception(128, 128, 256, 24, 64, 64, name="inception4c")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 528]
    x = Inception(112, 144, 288, 32, 64, 64, name="inception4d")(x)
    if aux_logits:
        aux2 = InceptionAux(class_num, name="aux2")(x)

    # [None, 14, 14, 528] -> [None, 14, 14, 832]
    x = Inception(256, 160, 320, 32, 128, 128, name="inception4e")(x)
    # [None, 14, 14, 832] -> [None, 7, 7, 832]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_3")(x)
    # [None, 7, 7, 832] -> [None, 7, 7, 832]
    x = Inception(256, 160, 320, 32, 128, 128, name="inception5a")(x)
    # [None, 7, 7, 832] -> [None, 7, 7, 1024]
    x = Inception(384, 192, 384, 48, 128, 128, name="inception5b")(x)
    # [None, 7, 7, 1024] -> [None, 1, 1, 1024]
    x = layers.AvgPool2D(pool_size=7, strides=1, name="avgpool_1")(x)
    # [None, 1, 1, 1024] -> [None, 1024]
    x = layers.Flatten(name="output_flatten")(x)
    # dropout (rate 0.4) before the main classifier
    x = layers.Dropout(rate=0.4, name="output_dropout")(x)
    # [None, 1024] -> [None, class_num]
    x = layers.Dense(class_num, name="output_dense")(x)
    # softmax probabilities of the main classifier
    aux3 = layers.Softmax(name="aux_3")(x)
    if aux_logits:
        model = models.Model(inputs=input_image, outputs=[aux1, aux2, aux3])
    else:
        model = models.Model(inputs=input_image, outputs=aux3)
    return model


class Inception(layers.Layer):
    def __init__(self, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj, **kwargs):
        # forward **kwargs so that name="inception3a" etc. is applied to the layer
        super(Inception, self).__init__(**kwargs)
        self.branch1 = layers.Conv2D(filters=ch1x1, kernel_size=1, activation="relu")

        self.branch2 = Sequential([
            layers.Conv2D(filters=ch3x3red, kernel_size=1, activation="relu"),
            layers.Conv2D(filters=ch3x3, kernel_size=3, padding="SAME", activation="relu")])

        self.branch3 = Sequential([
            layers.Conv2D(filters=ch5x5red, kernel_size=1, activation="relu"),
            # 5x5 convolution; "SAME" padding keeps the spatial size
            layers.Conv2D(filters=ch5x5, kernel_size=5, padding="SAME", activation="relu")])

        self.branch4 = Sequential([
            # caution: strides defaults to pool_size, so stride 1 must be set explicitly
            layers.MaxPool2D(pool_size=3, strides=1, padding="SAME"),
            layers.Conv2D(filters=pool_proj, kernel_size=1, activation="relu")
        ])

    def call(self, inputs, **kwargs):
        branch1 = self.branch1(inputs)
        branch2 = self.branch2(inputs)
        branch3 = self.branch3(inputs)
        branch4 = self.branch4(inputs)
        # concatenate along the last (channel) axis in NHWC
        outputs = tf.concat([branch1, branch2, branch3, branch4], axis=-1)
        return outputs


class InceptionAux(layers.Layer):
    def __init__(self, num_classes, **kwargs):
        # forward **kwargs so that name="aux1" / name="aux2" is applied to the layer
        super(InceptionAux, self).__init__(**kwargs)
        self.avgpool = layers.AvgPool2D(pool_size=5, strides=3)
        self.conv = layers.Conv2D(128, kernel_size=1, strides=1, activation="relu")
        self.flatten = layers.Flatten()
        self.dropout1 = layers.Dropout(rate=0.5)
        self.fc1 = layers.Dense(units=1024, activation="relu")
        self.dropout2 = layers.Dropout(rate=0.5)
        self.fc2 = layers.Dense(units=num_classes)
        self.softmax = layers.Softmax()

    def call(self, inputs, training=None, **kwargs):
        # aux1: [None, 14, 14, 512] -> [None, 4, 4, 512]  (14 - 5)/3 + 1 = 4
        # aux2: [None, 14, 14, 528] -> [None, 4, 4, 528]  (14 - 5)/3 + 1 = 4
        x = self.avgpool(inputs)
        # [None, 4, 4, 512 or 528] -> [None, 4, 4, 128]
        x = self.conv(x)
        # [None, 4, 4, 128] -> [None, 2048]
        x = self.flatten(x)
        x = self.dropout1(x, training=training)
        # [None, 2048] -> [None, 1024]
        x = self.fc1(x)
        x = self.dropout2(x, training=training)
        # [None, 1024] -> [None, num_classes]
        x = self.fc2(x)
        x = self.softmax(x)
        return x


# input = tf.random.uniform((16, 224, 224, 3))
# googlenet = GoogleNet(class_num=5, aux_logits=False)
# output = googlenet(input)
# print(output)
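Keras layers take a padding mode rather than a padding amount. Under TensorFlow's usual rules, the output size is ⌈in / stride⌉ for "same" and ⌈(in − kernel + 1) / stride⌉ for "valid", which is the default for MaxPool2D and AvgPool2D. The hypothetical helper below (not part of Keras) shows why the max pooling layer after conv2d_3 needs padding="SAME" to go from 56 to 28. It also shows why the auxiliary classifier's 5×5/3 average pooling turns 14×14 into 4×4.

```python
import math


def keras_out_size(size: int, kernel: int, stride: int, padding: str) -> int:
    """Output size along one spatial axis for Keras Conv2D/MaxPool2D/AvgPool2D."""
    padding = padding.lower()
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError(f"unsupported padding: {padding!r}")


print(keras_out_size(56, 3, 2, "valid"))  # 27, default padding
print(keras_out_size(56, 3, 2, "SAME"))   # 28
print(keras_out_size(14, 5, 3, "valid"))  # 4, auxiliary classifier pooling
```

The Keras model also ends in Softmax layers, so its outputs are probabilities rather than logits. A loss such as SparseCategoricalCrossentropy should therefore be used with its default from_logits=False.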