GoogLeNet Explained

1. Introduction

GoogLeNet was proposed by a team at Google in 2014 and took first place in the classification task of that year's ImageNet competition.

First, the highlights of the network:

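Introduces the Inception structure (fuses feature information at different scales)
Uses 1×1 convolution kernels for dimensionality reduction and projection
Adds two auxiliary classifiers to help training
Drops the fully connected layers in favor of an average pooling layer (this greatly reduces the parameter count; excluding the two auxiliary classifiers, the network is only about 1/20 the size of VGG)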
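
To make the last highlight concrete, here is a rough sketch of how many parameters a classifier head needs with and without global average pooling. The 7×7×1024 feature map and the 1000 classes are assumptions taken from the shape comments in the code further down; `linear_params` is a hypothetical helper, not part of any library.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of learnable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Assumed final feature map: 1024 channels, 7x7 spatial size, 1000 classes.
channels, height, width, num_classes = 1024, 7, 7, 1000

# Option A: flatten the whole feature map and feed it to a fully connected layer.
flatten_head = linear_params(channels * height * width, num_classes)

# Option B: global average pooling (no parameters) first reduces each channel
# to a single value, so the classifier only sees 1024 inputs.
gap_head = linear_params(channels, num_classes)

print(flatten_head)  # 50177000
print(gap_head)      # 1025000
```

Under these assumptions the pooled head is roughly 49 times smaller, which is where most of the savings in the classifier come from.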

Next, let's look at the Inception structure:

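The left figure is the original Inception structure proposed in the paper; the right figure is the Inception structure with dimensionality reduction added.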

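Start with the left figure. Inception has 4 branches: the input feature map passes through these 4 branches in parallel to produce four outputs, which are then concatenated along the depth (channel) dimension to form the final output. (Note: for the four outputs to be concatenated along the depth dimension, every branch must produce a feature map with the same height and width.)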

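Branch 1 is a convolution layer with a 1×1 kernel, stride=1.

Branch 2 is a convolution layer with a 3×3 kernel, stride=1, padding=1 (so the output height and width equal those of the input).

Branch 3 is a convolution layer with a 5×5 kernel, stride=1, padding=2 (so the output height and width equal those of the input).

Branch 4 is a max pooling layer with a 3×3 kernel, stride=1, padding=1 (so the output height and width equal those of the input).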
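
The kernel, stride, and padding choices above can be checked with the usual output-size formula, (n + 2p - k) / s + 1 rounded down. The sketch below is only an illustration: `out_size` is a hypothetical helper and the 28×28 input is an assumed example size.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1

n = 28  # assumed input height/width, for illustration only
branches = {
    "branch1 (1x1 conv)": out_size(n, kernel=1, stride=1, padding=0),
    "branch2 (3x3 conv)": out_size(n, kernel=3, stride=1, padding=1),
    "branch3 (5x5 conv)": out_size(n, kernel=5, stride=1, padding=2),
    "branch4 (3x3 max pool)": out_size(n, kernel=3, stride=1, padding=1),
}
for name, size in branches.items():
    print(name, size)  # every branch keeps the size at 28
```

Because all four branches keep the spatial size unchanged, their outputs can be concatenated along the channel dimension.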

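Now look at the right figure. Compared with the left one, it adds 1×1 convolution layers to branches 2, 3, and 4 (before the 3×3 and 5×5 convolutions, and after the max pooling). Their purpose is dimensionality reduction, which cuts the number of trainable parameters and the amount of computation.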

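Now let's see how 1×1 kernels reduce the parameter count. Take a feature map of depth 512 convolved with 64 kernels of size 5×5. Without 1×1 reduction this needs 5×5×512×64 = 819,200 parameters. If 1×1 kernels first reduce the depth (to 24 channels, the value these totals imply), it needs only 1×1×512×24 + 5×5×24×64 = 50,688 parameters, a clear reduction. Biases are ignored in both counts.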
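
The two totals can be reproduced with a few lines of arithmetic. This is a sketch, not code from the original article: `conv_weights` is a hypothetical helper, biases are ignored, and the 24 reduction channels are an assumption inferred from the stated total of 50688.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Weight count of a conv layer with square kernels, ignoring biases."""
    return kernel * kernel * in_channels * out_channels

# Direct 5x5 convolution: depth 512 -> 64 kernels.
direct = conv_weights(512, 64, kernel=5)

# 1x1 reduction to 24 channels (assumed), then the 5x5 convolution.
reduced = conv_weights(512, 24, kernel=1) + conv_weights(24, 64, kernel=5)

print(direct)   # 819200
print(reduced)  # 50688
```

The 5×5 layer dominates the cost, so shrinking its input depth from 512 to 24 is what brings the total down by more than a factor of 16.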

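How is the number of kernels in each convolution layer determined? The original paper gives a parameter table. For the Inception module we build, the parameters needed are #1x1, #3x3reduce, #3x3, #5x5reduce, #5x5, and poolproj; each is the number of kernels in the corresponding convolution layer.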

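The parameters are annotated on each branch of the Inception module below: #1x1 is the number of 1x1 kernels in branch 1; #3x3reduce is the number of 1x1 kernels in branch 2; #3x3 is the number of 3x3 kernels in branch 2; #5x5reduce is the number of 1x1 kernels in branch 3; #5x5 is the number of 5x5 kernels in branch 3; poolproj is the number of 1x1 kernels in branch 4.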
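
One consequence of this mapping is that a module's output depth is #1x1 + #3x3 + #5x5 + poolproj, since the two reduce layers are internal to their branches. The sketch below checks that rule against the values passed to `Inception(...)` in the PyTorch code later in this article; the dictionary and helper names are hypothetical.

```python
# (in_channels, #1x1, #3x3reduce, #3x3, #5x5reduce, #5x5, poolproj)
INCEPTION_CONFIGS = {
    "3a": (192, 64, 96, 128, 16, 32, 32),
    "3b": (256, 128, 128, 192, 32, 96, 64),
    "4a": (480, 192, 96, 208, 16, 48, 64),
    "4b": (512, 160, 112, 224, 24, 64, 64),
    "4c": (512, 128, 128, 256, 24, 64, 64),
    "4d": (512, 112, 144, 288, 32, 64, 64),
    "4e": (528, 256, 160, 320, 32, 128, 128),
    "5a": (832, 256, 160, 320, 32, 128, 128),
    "5b": (832, 384, 192, 384, 48, 128, 128),
}

def inception_out_channels(ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
    """Only the last layer of each branch contributes to the concatenated depth."""
    return ch1x1 + ch3x3 + ch5x5 + pool_proj

names = list(INCEPTION_CONFIGS)
for current, following in zip(names, names[1:]):
    out_channels = inception_out_channels(*INCEPTION_CONFIGS[current][1:])
    # Max pooling between stages changes height/width only, so the depth
    # must equal the next module's in_channels.
    assert out_channels == INCEPTION_CONFIGS[following][0]
    print(current, out_channels)

print("5b", inception_out_channels(*INCEPTION_CONFIGS["5b"][1:]))  # 5b 1024
```

The final value, 1024, matches the input size of the last fully connected layer.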

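Next comes the auxiliary classifier. The two auxiliary classifiers in the network have exactly the same structure, as shown in the figure below: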

Their inputs come from Inception(4a) and Inception(4d) respectively.

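The first layer of the auxiliary classifier is an average pooling layer with a 5x5 kernel and stride=3.

The second layer is a convolution layer with a 1x1 kernel, stride=1, and 128 kernels.

The third layer is a fully connected layer with 1024 nodes.

The fourth layer is a fully connected layer with 1000 nodes (the number of classes in the classification task).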
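
The four layers above can be traced shape by shape. This sketch assumes a 14×14 input feature map (the size noted in the code comments later on) and channels-first shapes; `out_size` and `aux_classifier_shapes` are hypothetical helpers.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1

def aux_classifier_shapes(in_channels, size=14, num_classes=1000):
    """List (layer, output shape) pairs for one auxiliary classifier."""
    pooled = out_size(size, kernel=5, stride=3)  # (14 - 5) // 3 + 1 = 4
    return [
        ("input", (in_channels, size, size)),
        ("avgpool 5x5, stride 3", (in_channels, pooled, pooled)),
        ("conv 1x1, 128 kernels", (128, pooled, pooled)),
        ("flatten", (128 * pooled * pooled,)),
        ("fc1", (1024,)),
        ("fc2", (num_classes,)),
    ]

for name, shape in aux_classifier_shapes(in_channels=512):
    print(name, shape)
```

The flattened size, 128 × 4 × 4 = 2048, is why the first fully connected layer in the implementation takes 2048 inputs regardless of whether the classifier is attached to a 512-channel or a 528-channel feature map.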

The full GoogLeNet architecture is shown below.

2. Code implementation

1. PyTorch implementation

# !/usr/bin/env python
# -*-coding:utf-8 -*-
"""
# @File       : model_googlenet.py
# @Time       :
# @Author     :
# @version    : python 3.9
# @Software   : PyCharm
# @Description:
"""
# ================ [Function: ] ====================
import torch.nn as nn
import torch
import torch.nn.functional as F
# Official PyTorch reference code: https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py

class GoogleNet(nn.Module):
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogleNet, self).__init__()
        self.aux_logits = aux_logits
        # (224 - 7 + 2 * 3)/2 + 1 = 112 (rounded down): (3, 224, 224) -> (64, 112, 112)
        self.conv1 = BaseConv2d(in_channels=3, out_channels=64, kernel_size=7, padding=3, stride=2)
        # ceil_mode=True rounds up: ceil((112 - 3)/2) + 1 = 56, (64, 112, 112) -> (64, 56, 56)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
        # 1x1 conv: (56 - 1 + 2 * 0)/1 + 1 = 56, (64, 56, 56) -> (64, 56, 56)
        self.conv2 = BaseConv2d(in_channels=64, out_channels=64, kernel_size=1, stride=1)
        # (56 - 3 + 2 * 1)/1 + 1 = 56, (64, 56, 56) -> (192, 56, 56)
        self.conv3 = BaseConv2d(in_channels=64, out_channels=192, kernel_size=3, stride=1, padding=1)

        # ceil((56 - 3)/2) + 1 = 28, (192, 56, 56) -> (192, 28, 28)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # Inception3a; see the parameter table for the per-branch kernel counts
        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)

        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)


        if self.aux_logits:
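            # aux1 takes the output of inception4a (512 channels)
            self.aux1 = InceptionAux(512, num_classes)
            # aux2 takes the output of inception4d (528 channels)
            self.aux2 = InceptionAux(528, num_classes)
        # adaptive pooling always yields a 1x1 output, whatever the input size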
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()


    def forward(self, x):
        # [N, 3, 224, 224] -> [N, 64, 112, 112]
        x = self.conv1(x)
        # [N, 64, 112, 112] -> [N, 64, 56, 56]
        x = self.maxpool1(x)
        # [N, 64, 56, 56] -> [N, 64, 56, 56]
        x = self.conv2(x)
        # [N, 64, 56, 56] -> [N, 192, 56, 56]
        x = self.conv3(x)
        # [N, 192, 56, 56] -> [N, 192, 28, 28]
        x = self.maxpool2(x)

        # [N, 192, 28, 28] -> [N, 256, 28, 28]
        x = self.inception3a(x)
        # [N, 256, 28, 28] -> [N, 480, 28, 28]
        x = self.inception3b(x)
        # [N, 480, 28, 28] -> [N, 480, 14, 14]
        x = self.maxpool3(x)

        # [N, 480, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4a(x)
        if self.training and self.aux_logits:
            aux1 = self.aux1(x)
        # [N, 512, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4b(x)
        # [N, 512, 14, 14] -> [N, 512, 14, 14]
        x = self.inception4c(x)
        # [N, 512, 14, 14] -> [N, 528, 14, 14]
        x = self.inception4d(x)
        if self.training and self.aux_logits:
            aux2 = self.aux2(x)

        # [N, 528, 14, 14] -> [N, 832, 14, 14]
        x = self.inception4e(x)
        # [N, 832, 14, 14] -> [N, 832, 7, 7]
        x = self.maxpool4(x)

        # [N, 832, 7, 7] -> [N, 832, 7, 7]
        x = self.inception5a(x)
        # [N, 832, 7, 7] -> [N, 1024, 7, 7]
        x = self.inception5b(x)

        # [N, 1024, 7, 7] -> [N, 1024, 1, 1]
        x = self.avgpool(x)
        # [N, 1024, 1, 1] -> [N, 1024]
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        # [N, 1024] -> [N, num_classes]
        x = self.fc(x)
        if self.training and self.aux_logits:
            return x, aux2, aux1
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            # this branch pairs with the isinstance check above, not with the bias check
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
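        """
        ch1x1: number of 1x1 kernels in branch 1
        ch3x3red: number of 1x1 reduction kernels in branch 2 (#3x3reduce)
        ch3x3: number of 3x3 kernels in branch 2
        ch5x5red: number of 1x1 reduction kernels in branch 3 (#5x5reduce)
        ch5x5: number of 5x5 kernels in branch 3
        pool_proj: number of 1x1 kernels after the max pooling in branch 4
        """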
        self.branch1 = BaseConv2d(in_channels, ch1x1, kernel_size=1, stride=1)
        self.branch2 = nn.Sequential(
            BaseConv2d(in_channels=in_channels, out_channels=ch3x3red, kernel_size=1),
            # padding=1 keeps the output size equal to the input size
            BaseConv2d(in_channels=ch3x3red, out_channels=ch3x3, kernel_size=3, padding=1)
        )
        self.branch3 = nn.Sequential(
            BaseConv2d(in_channels=in_channels, out_channels=ch5x5red, kernel_size=1),
            # padding=2 keeps the output size equal to the input size
            BaseConv2d(in_channels=ch5x5red, out_channels=ch5x5, kernel_size=5, padding=2)
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BaseConv2d(in_channels=in_channels, out_channels=pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        # tensors are [batch, channel, h, w]; torch.cat(outputs, 1) concatenates along the channel dimension
        return torch.cat(outputs, 1)


class InceptionAux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.averagepool = nn.AvgPool2d(kernel_size=5, stride=3)
        # output [batch, 128, 4, 4]
        self.conv = BaseConv2d(in_channels, 128, kernel_size=1)

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: [N, 512, 14, 14] aux2: [N, 528, 14, 14]
        x = self.averagepool(x)
        # aux1: [N, 512, 4, 4], aux2: [N, 528, 4, 4]
        x = self.conv(x)
        # [N, 128, 4, 4]
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # [N, 2048]
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # [N, 1024]
        x = self.fc2(x)
        # [N, num_classes]
        return x

class BaseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BaseConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

# input = torch.rand((16, 3, 224, 224))
# googlenet = GoogleNet(num_classes=5, aux_logits=False, init_weights=True)
# print(googlenet)
# output = googlenet(input)
# print(output)
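
The spatial sizes in the shape comments of the PyTorch model (224 → 112 → 56 → 28 → 14 → 7) can be checked without running the network. The sketch below is plain Python and rests on stated assumptions: a 7×7 stem convolution with padding 3, a 1×1 convolution followed by a padded 3×3 convolution, and 3×3 stride-2 pooling that rounds up (`ceil_mode=True`). `conv_out` and `pool_out` are hypothetical helpers that apply a simplified form of the size rule.

```python
import math

def conv_out(n, kernel, stride=1, padding=0):
    """Conv output size: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, kernel, stride, padding=0, ceil_mode=False):
    """Simplified pooling size rule. PyTorch also drops a last window that
    would start past the input; that case does not occur for these sizes."""
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((n + 2 * padding - kernel) / stride) + 1

trace = [224]
trace.append(conv_out(trace[-1], kernel=7, stride=2, padding=3))       # conv1
trace.append(pool_out(trace[-1], kernel=3, stride=2, ceil_mode=True))  # maxpool1
trace.append(conv_out(trace[-1], kernel=1))                            # conv2
trace.append(conv_out(trace[-1], kernel=3, padding=1))                 # conv3
trace.append(pool_out(trace[-1], kernel=3, stride=2, ceil_mode=True))  # maxpool2
trace.append(pool_out(trace[-1], kernel=3, stride=2, ceil_mode=True))  # maxpool3
trace.append(pool_out(trace[-1], kernel=3, stride=2, ceil_mode=True))  # maxpool4
print(trace)  # [224, 112, 56, 56, 56, 28, 14, 7]

# Without ceil_mode the 56x56 map would shrink to 27 instead of 28.
print(pool_out(56, kernel=3, stride=2))  # 27
```

The Inception modules are left out of the trace because every branch preserves height and width.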

2. TensorFlow implementation

# !/usr/bin/env python
# -*-coding:utf-8 -*-
"""
# @File       : model_googlenet.py
# @Time       :
# @Author     : 0399
# @version    : python 3.9
# @Software   : PyCharm
# @Description:
"""
# ================ [Function: ] ====================
from tensorflow.keras import layers, models, Model, Sequential
import tensorflow as tf


def GoogleNet(im_height=224, im_width=224, class_num=1000, aux_logits=False):
    # TensorFlow uses channels-last (NHWC) ordering
    # [None, 224, 224, 3]
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    # [None, 224, 224, 3] -> [None, 112, 112, 64]
    x = layers.Conv2D(filters=64, kernel_size=7, strides=2, padding="SAME",
                      activation="relu", name="conv2d_1")(input_image)
    # [None, 112, 112, 64] -> [None, 56, 56, 64]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_1")(x)
    #  [None, 56, 56, 64] ->  [None, 56, 56, 64]
    x = layers.Conv2D(filters=64, kernel_size=1, strides=1, activation="relu", name="conv2d_2")(x)
    #  [None, 56, 56, 64] ->  [None, 56, 56, 192]  (56 - 3 + 2 * 1)/1 + 1 = 56
    x = layers.Conv2D(filters=192, kernel_size=3, strides=1, padding="same", activation="relu", name="conv2d_3")(x)
    #  [None, 56, 56, 192] ->  [None, 28, 28, 192]  (padding="SAME" gives ceil(56 / 2) = 28; the default "valid" would give 27)
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME")(x)
    #  [None, 28, 28, 192] ->  [None, 28, 28, 256]
    x = Inception(64, 96, 128, 16, 32, 32, name="inception3a")(x)
    # [None, 28, 28, 256] -> [None, 28, 28, 480]
    x = Inception(128, 128, 192, 32, 96, 64, name="inception3b")(x)
    # [None, 28, 28, 480] -> [None, 14, 14, 480]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_2")(x)

    # [None, 14, 14, 480] -> [None, 14, 14, 512]
    x = Inception(192, 96, 208, 16, 48, 64, name="inception4a")(x)

    if aux_logits:
        aux1 = InceptionAux(class_num, name="aux1")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 512]
    x = Inception(160, 112, 224, 24, 64, 64, name="inception4b")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 512]
    x = Inception(128, 128, 256, 24, 64, 64, name="inception4c")(x)
    # [None, 14, 14, 512] -> [None, 14, 14, 528]
    x = Inception(112, 144, 288, 32, 64, 64, name="inception4d")(x)
    if aux_logits:
        aux2 = InceptionAux(class_num, name="aux2")(x)

    # [None, 14, 14, 528] -> [None, 14, 14, 832]
    x = Inception(256, 160, 320, 32, 128, 128, name="inception4e")(x)
    # [None, 14, 14, 832] -> [None, 7, 7, 832]
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME", name="maxpool_3")(x)
    # [None, 7, 7, 832] -> [None, 7, 7, 832]
    x = Inception(256, 160, 320, 32, 128, 128, name="inception5a")(x)
    # [None, 7, 7, 832] -> [None, 7, 7, 1024]
    x = Inception(384, 192, 384, 48, 128, 128, name="inception5b")(x)
    # [None, 7, 7, 1024] -> [None, 1, 1, 1024]
    x = layers.AvgPool2D(pool_size=7, strides=1, name="avgpool_1")(x)
    # [None, 1, 1, 1024] -> [None, 1024*1*1]
    x = layers.Flatten(name="output_flatten")(x)
    #
    x = layers.Dropout(rate=0.4, name="output_dropout")(x)
    # [None, class_num]
    x = layers.Dense(class_num, name="output_dense")(x)

    aux3 = layers.Softmax(name="aux_3")(x)
    if aux_logits:
        model = models.Model(inputs=input_image, outputs=[aux1, aux2, aux3])
    else:
        model = models.Model(inputs=input_image, outputs=aux3)
    return model


class Inception(layers.Layer):
    def __init__(self, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj, **kwargs):
        # forward **kwargs (e.g. name="inception3a") to the base Layer
        super(Inception, self).__init__(**kwargs)
        self.branch1 = layers.Conv2D(filters=ch1x1, kernel_size=1, activation="relu")

        self.branch2 = Sequential([
            layers.Conv2D(filters=ch3x3red, kernel_size=1, activation="relu"),
            layers.Conv2D(filters=ch3x3, kernel_size=3, padding="SAME", activation="relu")])

        self.branch3 = Sequential([
            layers.Conv2D(filters=ch5x5red, kernel_size=1, activation="relu"),
            layers.Conv2D(filters=ch5x5, kernel_size=5, padding="SAME", activation="relu")])

        self.branch4 = Sequential([
            # caution: default stride=pool_size
            layers.MaxPool2D(pool_size=3, strides=1, padding="SAME"),
            layers.Conv2D(filters=pool_proj, kernel_size=1, activation="relu")
        ])

    def call(self, inputs, **kwargs):
        # "inputs" avoids shadowing the built-in input()
        branch1 = self.branch1(inputs)
        branch2 = self.branch2(inputs)
        branch3 = self.branch3(inputs)
        branch4 = self.branch4(inputs)
        outputs = layers.concatenate([branch1, branch2, branch3, branch4])
        return outputs


class InceptionAux(layers.Layer):
    def __init__(self, num_classes, **kwargs):
        # forward **kwargs (e.g. name="aux1") to the base Layer
        super(InceptionAux, self).__init__(**kwargs)
        self.avgpool = layers.AvgPool2D(pool_size=5, strides=3)
        self.conv = layers.Conv2D(128, kernel_size=1, strides=1, activation="relu")

        # sub-layers are created once here rather than on every call
        self.flatten = layers.Flatten()
        self.dropout1 = layers.Dropout(rate=0.5)
        self.fc1 = layers.Dense(units=1024, activation="relu")
        self.dropout2 = layers.Dropout(rate=0.5)
        self.fc2 = layers.Dense(units=num_classes)
        self.softmax = layers.Softmax()

    def call(self, inputs, training=None, **kwargs):
        # aux1: [None, 14, 14, 512] -> [None, 4, 4, 512]  (14 - 5)/3 + 1 = 4
        # aux2: [None, 14, 14, 528] -> [None, 4, 4, 528]  (14 - 5)/3 + 1 = 4
        x = self.avgpool(inputs)
        # [None, 4, 4, 512 or 528] -> [None, 4, 4, 128]
        x = self.conv(x)
        # [None, 4, 4, 128] -> [None, 2048]
        x = self.flatten(x)
        # same order as the PyTorch version: dropout, fc1, dropout, fc2
        x = self.dropout1(x, training=training)
        # [None, 2048] -> [None, 1024]
        x = self.fc1(x)
        x = self.dropout2(x, training=training)
        # [None, 1024] -> [None, num_classes]; no dropout on the logits
        x = self.fc2(x)
        x = self.softmax(x)
        return x


# input = tf.random.uniform((16, 224, 224, 3))
# googlenet = GoogleNet(class_num=5, aux_logits=False)
# output = googlenet(input)
# print(output)
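
The Keras version relies on `padding="SAME"` where the PyTorch version relies on explicit padding and `ceil_mode`. A minimal sketch of the two size rules, using hypothetical helpers `same_out` and `valid_out`, shows why the choice matters for the 3×3 stride-2 pooling layers:

```python
import math

def same_out(n, stride):
    """Output size with padding="same": ceil(n / stride), independent of kernel size."""
    return math.ceil(n / stride)

def valid_out(n, kernel, stride):
    """Output size with padding="valid": floor((n - kernel) / stride) + 1."""
    return (n - kernel) // stride + 1

for n in (224, 112, 56, 28, 14):
    # Compare a stride-2 layer with a 3x3 window under both padding modes.
    print(n, same_out(n, stride=2), valid_out(n, kernel=3, stride=2))
# 224 112 111
# 112 56 55
# 56 28 27
# 28 14 13
# 14 7 6
```

With "same" padding each stride-2 layer halves the size exactly, which keeps the shapes in line with the comments in the model; with "valid" padding every stage would come out one pixel short.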