1. Introduction

ResNet was proposed by Microsoft Research in 2015. It won first place in both the classification and detection tasks of that year's ImageNet competition, as well as first place in detection and segmentation on the COCO dataset.

The figure below is a simplified diagram of ResNet-34.

[Figure: simplified ResNet-34 architecture]

Highlights of the network:

  • Very deep architecture (demonstrated beyond 1,000 layers)

  • The residual module (residual structure)

  • Batch normalization to accelerate training (dropout is no longer used)

Before ResNet was proposed, traditional convolutional neural networks were built by stacking convolutional layers and downsampling layers, but once a network is stacked to a certain depth, two problems appear:

  1. Vanishing or exploding gradients
  2. The degradation problem

The ResNet paper states that vanishing and exploding gradients can be handled by data preprocessing and by inserting BN (batch normalization) layers into the network, but that there was no good remedy for the degradation problem (performance getting worse as more layers are added).

[Figure: with plain networks, the deeper model performs worse (degradation)]

So the ResNet paper proposed the residual structure to mitigate the degradation problem. The figure below shows a convolutional network built with residual structures: as the network keeps getting deeper, performance no longer degrades; it actually improves.

[Figure: error curves of residual networks of increasing depth]
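The idea is compact enough to write down directly. Below is a minimal sketch in PyTorch (names are mine, not from the paper): the block learns a residual mapping F(x) and adds the unmodified input back before the final activation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyResidualBlock(nn.Module):
    # Learns a residual F(x); the shortcut adds x back before the final ReLU
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut: x and F(x) must share a shape

x = torch.randn(1, 64, 56, 56)
print(TinyResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])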

Now let's analyze the residual structures from the paper. The figure below shows the two variants it gives: the left one is for networks with fewer layers, such as ResNet-18 and ResNet-34, while the right one is for deeper networks, such as ResNet-101 and ResNet-152.

Why do deep networks use the right-hand structure? Because it reduces the parameter count and the amount of computation. For an input feature map with 256 channels, the left structure needs about 1,179,648 parameters, while the right one needs only 69,632, so the right-hand structure is the better choice when building deep networks.

[Figure: the two residual structures from the paper — basic block (left) and bottleneck (right)]
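The two counts quoted above can be checked by hand (convolutions are bias-free as in the paper; BN parameters ignored):

# Left (basic) structure, 256 input/output channels: two 3x3 convs
basic = 3 * 3 * 256 * 256 * 2
# Right (bottleneck): 1x1 squeeze to 64, 3x3 at 64, 1x1 expand back to 256
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256
print(basic, bottleneck)  # 1179648 69632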

Let's first analyze the left structure (used by ResNet-18/34), shown below. Its main branch consists of two 3x3 convolutional layers, and the line on the right side of the structure is the shortcut branch (note: for the main-branch output to be added to the shortcut-branch output, the two output feature maps must have the same shape).

Looking closely at the ResNet-34 architecture, you can see that some residual structures in the diagram are drawn with dashed lines; the original paper only briefly says that these dashed structures perform downsampling. The right side of the figure below shows the dashed structure in detail. Note the stride of each convolutional layer and the number of kernels in the shortcut-branch convolution (the same as the number of kernels on the main branch).

[Figure: solid (left) and dashed (right) basic residual structures]
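Putting the figure into code, here is a sketch of one dashed basic block (channel numbers taken from the conv3_1 block of ResNet-34, 64 → 128 channels; the strides follow the figure):

import torch
import torch.nn as nn

# Dashed basic block: first 3x3 conv has stride 2, shortcut is a 1x1 stride-2 conv
main = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128))
shortcut = nn.Sequential(  # same output shape as the main branch, so they can be added
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128))

x = torch.randn(1, 64, 56, 56)
print((main(x) + shortcut(x)).shape)  # torch.Size([1, 128, 28, 28])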

Next, let's analyze the residual structure used by ResNet-50/101/152, shown below. Its main branch uses three convolutional layers: the first is a 1x1 conv that squeezes the channel dimension, the second is a 3x3 conv, and the third is a 1x1 conv that restores the channel dimension (note that the first and second layers on the main branch use the same number of kernels, and the third uses 4 times as many as the first). The corresponding dashed residual structure is shown on the right of the figure; its shortcut branch likewise contains a 1x1 convolutional layer whose kernel count equals that of the third layer on the main branch. Pay attention to the stride of each convolutional layer. (Note: in the original paper, on the main branch of the dashed structure on the right, the first 1x1 conv has stride 2 and the second 3x3 conv has stride 1. In the official PyTorch implementation, the first 1x1 conv has stride 1 and the 3x3 conv has stride 2, which improves top-1 accuracy by about 0.5%; see ResNet v1.5, https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.)

[Figure: solid (left) and dashed (right) bottleneck residual structures]

PyTorch's official note:

[Figure: PyTorch's note on ResNet v1.5]
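The only difference is where the stride-2 convolution sits on the dashed bottleneck's main branch. A minimal sketch (channel numbers taken from conv3_1 of ResNet-50; shortcut, BN, and ReLU omitted for brevity):

import torch.nn as nn

def bottleneck_main(in_c, mid_c, s1x1, s3x3):
    # 1x1 (reduce) -> 3x3 -> 1x1 (expand); exactly one of the two strides is 2
    return nn.Sequential(
        nn.Conv2d(in_c, mid_c, kernel_size=1, stride=s1x1, bias=False),
        nn.Conv2d(mid_c, mid_c, kernel_size=3, stride=s3x3, padding=1, bias=False),
        nn.Conv2d(mid_c, mid_c * 4, kernel_size=1, stride=1, bias=False))

v1   = bottleneck_main(256, 128, s1x1=2, s3x3=1)  # original paper
v1_5 = bottleneck_main(256, 128, s1x1=1, s3x3=2)  # official PyTorch / ResNet v1.5

In v1 the stride-2 1x1 conv discards three out of every four spatial positions before the 3x3 conv ever sees them; v1.5 lets the 3x3 conv look at the full-resolution map first, which is where the accuracy gain comes from.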

The table below, from the original paper, gives the configurations of ResNets of different depths. Note that the residual-structure entries list the kernel sizes and kernel counts on the main branch, and xN means that residual structure is repeated N times.

[Table: ResNet configurations of different depths, from the original paper]
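For reference when reading the code below, the xN repeat counts from the table, per stage (conv2_x through conv5_x):

# Number of residual structures per stage, read off the table
blocks_per_depth = {
    18:  [2, 2, 2, 2],   # BasicBlock
    34:  [3, 4, 6, 3],   # BasicBlock
    50:  [3, 4, 6, 3],   # Bottleneck
    101: [3, 4, 23, 3],  # Bottleneck
    152: [3, 8, 36, 3],  # Bottleneck
}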

So which residual structures are the dashed ones?

For ResNet-18/34/50/101/152, the first residual structure in each of the series corresponding to conv3_x, conv4_x, and conv5_x in the table is a dashed one, because the first structure of each series has the job of reshaping the input feature map (halving its height and width, and adjusting its channel count to what the next residual structures require).

To make this easier to follow, the ResNet-34 architecture is shown below with some simple annotations.

[Figure: annotated ResNet-34 architecture]

For ResNet-50/101/152, the first residual structure of the conv2_x series is in fact also a dashed one, because it needs to adjust the channel count of the input feature map. From the table, the feature map coming out of the 3x3 max-pool has shape [56, 56, 64], but the solid residual structures in the conv2_x series expect an input of shape [56, 56, 256] (only then are the input and output shapes equal, so the shortcut-branch output can be added to the main-branch output). The first residual structure therefore has to turn [56, 56, 64] into [56, 56, 256]. Note that here only the channel dimension is adjusted; the height and width stay the same (whereas the first dashed structure of the conv3_x, conv4_x, and conv5_x series adjusts the channels and also halves the height and width).
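A sketch of what that first conv2_x shortcut has to do for ResNet-50: a 1x1 convolution with stride 1, so only the channel count changes:

import torch
import torch.nn as nn

proj = nn.Sequential(  # shortcut of the first conv2_x block in ResNet-50
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256))

x = torch.randn(1, 64, 56, 56)   # output of the 3x3 max-pool
print(proj(x).shape)             # torch.Size([1, 256, 56, 56])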

2. Implementation

1. PyTorch implementation
# !/usr/bin/env python
# -*- coding:utf-8 -*-
"""
# @File : model_resnet.py
# @Time :
# @Author :
# @version : python 3.9
# @Software : PyCharm
# @Description:
"""
# ================ [Function] ====================
import torch.nn as nn
import torch


class BasicBlock(nn.Module):
    """Basic residual structure used by ResNet-18/34 (two 3x3 convs)."""
    expansion = 1  # last conv of the block has the same channel count as the first

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # not None only for dashed (downsampling) blocks

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dashed residual structure,
    the first 1x1 conv has stride 2 and the 3x3 conv has stride 1.
    The official PyTorch implementation puts stride 1 on the first 1x1 conv and
    stride 2 on the 3x3 conv instead, which improves top-1 accuracy by about 0.5%.
    """
    expansion = 4  # the third conv has 4x the channels of the first two

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()
        width = int(out_channel * (width_per_group / 64.)) * groups

        # 1x1 conv to squeeze channels (groups apply only to the 3x3 conv,
        # matching the torchvision ResNeXt layout)
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width,
                               groups=groups, kernel_size=3, stride=stride,
                               bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------------
        # 1x1 conv to restore channels (4x the block's nominal channel count)
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):
    def __init__(self, block,
                 block_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel,
                               kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(block, 64, block_num[0])             # conv2_x
        self.layer2 = self._make_layer(block, 128, block_num[1], stride=2)  # conv3_x
        self.layer3 = self._make_layer(block, 256, block_num[2], stride=2)  # conv4_x
        self.layer4 = self._make_layer(block, 512, block_num[3], stride=2)  # conv5_x
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # A projection shortcut is needed whenever the spatial size or the
        # channel count changes, i.e. for the dashed residual structures
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


if __name__ == '__main__':
    inputs = torch.rand((4, 3, 224, 224))
    resnet = resnet101(num_classes=5)
    out = resnet(inputs)
    print(out)
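If you want to start from the torchvision weights referenced in the comments above, torch.hub can fetch them. A sketch, assuming the URL is still live and the default 1000-class head:

from torch.hub import load_state_dict_from_url

model = resnet34()  # the head must keep 1000 classes for the weights to fit
url = "https://download.pytorch.org/models/resnet34-333f7ec4.pth"
model.load_state_dict(load_state_dict_from_url(url))
model.eval()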

2. TensorFlow implementation
# !/usr/bin/env python
# -*- coding:utf-8 -*-
"""
# @File : model_resnet.py
# @Time :
# @Author :0399
# @version : python 3.9
# @Software : PyCharm
# @Description:
"""
# ================ [Function] ====================
import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential


class BasicBlock(layers.Layer):
    expansion = 1

    def __init__(self, out_channel, strides=1, downsample=None, **kwargs):
        # Forward **kwargs so the layer name set in _make_layer is applied
        super(BasicBlock, self).__init__(**kwargs)
        self.conv1 = layers.Conv2D(out_channel, kernel_size=3, strides=strides,
                                   padding="SAME", use_bias=False)
        self.bn1 = layers.BatchNormalization(momentum=0.9, epsilon=1e-5)
        # ----------------------------------------
        self.conv2 = layers.Conv2D(out_channel, kernel_size=3, strides=1,
                                   padding="SAME", use_bias=False)
        self.bn2 = layers.BatchNormalization(momentum=0.9, epsilon=1e-5)
        # ----------------------------------------
        self.downsample = downsample
        self.relu = layers.ReLU()
        self.add = layers.Add()

    def call(self, inputs, training=False):
        identity = inputs
        if self.downsample is not None:
            identity = self.downsample(inputs)

        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x, training=training)

        x = self.add([x, identity])
        x = self.relu(x)
        return x


class Bottleneck(layers.Layer):
    expansion = 4

    def __init__(self, out_channel, strides=1, downsample=None, **kwargs):
        super(Bottleneck, self).__init__(**kwargs)
        self.conv1 = layers.Conv2D(out_channel, kernel_size=1, use_bias=False, name="conv1")
        self.bn1 = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="conv1/BatchNorm")
        # ------------------------------------
        self.conv2 = layers.Conv2D(out_channel, kernel_size=3, use_bias=False,
                                   strides=strides, padding="SAME", name="conv2")
        self.bn2 = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="conv2/BatchNorm")
        # ------------------------------------
        self.conv3 = layers.Conv2D(out_channel * self.expansion, kernel_size=1, use_bias=False, name="conv3")
        self.bn3 = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="conv3/BatchNorm")
        # ------------------------------------
        self.relu = layers.ReLU()
        self.downsample = downsample
        self.add = layers.Add()

    def call(self, inputs, training=False):
        identity = inputs
        if self.downsample is not None:
            identity = self.downsample(inputs)

        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = self.relu(x)

        x = self.conv3(x)
        x = self.bn3(x, training=training)

        x = self.add([identity, x])
        x = self.relu(x)
        return x


def _make_layer(block, in_channel, channel, block_num, name, strides=1):
    downsample = None
    # Projection shortcut for the first (dashed) residual structure of the stage
    if strides != 1 or in_channel != channel * block.expansion:
        downsample = Sequential([
            layers.Conv2D(channel * block.expansion, kernel_size=1, strides=strides,
                          use_bias=False, name="conv1"),
            layers.BatchNormalization(momentum=0.9, epsilon=1.001e-5, name="BatchNorm")
        ], name="shortcut")

    layers_list = []
    layers_list.append(block(channel, downsample=downsample, strides=strides, name="unit_1"))
    for index in range(1, block_num):
        layers_list.append(block(channel, name="unit_" + str(index + 1)))
    return Sequential(layers_list, name=name)


def _resnet(block, block_num, im_width, im_height, num_classes=1000, include_top=True):
    # TensorFlow tensors use the NHWC channel ordering
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    x = layers.Conv2D(filters=64, kernel_size=7, strides=2, padding="SAME",
                      use_bias=False, name="conv1")(input_image)
    x = layers.BatchNormalization(momentum=0.9, epsilon=1e-5, name="conv1/BatchNorm")(x)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D(pool_size=3, strides=2, padding="SAME")(x)

    x = _make_layer(block, x.shape[-1], 64, block_num[0], name="block1")(x)
    x = _make_layer(block, x.shape[-1], 128, block_num[1], strides=2, name="block2")(x)
    x = _make_layer(block, x.shape[-1], 256, block_num[2], strides=2, name="block3")(x)
    x = _make_layer(block, x.shape[-1], 512, block_num[3], strides=2, name="block4")(x)

    if include_top:
        x = layers.GlobalAvgPool2D()(x)
        x = layers.Dense(num_classes, name="logits")(x)
        predict = layers.Softmax()(x)
    else:
        predict = x

    model = Model(inputs=input_image, outputs=predict)
    return model


def resnet34(im_width=224, im_height=224, num_classes=1000, include_top=True):
    return _resnet(BasicBlock, [3, 4, 6, 3], im_width, im_height, num_classes, include_top)


def resnet50(im_width=224, im_height=224, num_classes=1000, include_top=True):
    return _resnet(Bottleneck, [3, 4, 6, 3], im_width, im_height, num_classes, include_top)


def resnet101(im_width=224, im_height=224, num_classes=1000, include_top=True):
    return _resnet(Bottleneck, [3, 4, 23, 3], im_width, im_height, num_classes, include_top)


if __name__ == '__main__':
    inputs = tf.random.uniform((8, 224, 224, 3))
    model = resnet34()
    print(model(inputs))
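As a quick sanity check against the configuration table above, the Keras model can print its per-layer output shapes and parameter counts:

model = resnet50(num_classes=5)
model.summary()  # per-layer output shapes and parameter counts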