1×1 Convolutions and GoogLeNet

1×1 Convolution

A 1×1 convolution is simply a convolution whose kernel has spatial size 1×1; at each spatial position it computes a weighted combination of the input channels.

What is a 1×1 convolution good for?

1. Channel mixing and feature transformation

  • Background: in a convolutional neural network the input usually has multiple channels (an RGB image has 3, and the channel count typically grows after convolutional layers), and different channels of a feature map can carry different semantic information.

  • Role: a 1×1 convolution computes a weighted sum, i.e. a linear combination, of the input feature map along the channel dimension, mixing information across channels and producing a new feature representation. If the input has C channels, a 1×1 convolution can map it to C' channels, where C' may be larger than, smaller than, or equal to C. This transformation changes how features are expressed and lets the network learn richer relationships between channels; a minimal sketch follows this item.
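
A minimal PyTorch sketch of channel mixing with a 1×1 convolution (the channel counts 64 and 128 are arbitrary values chosen only for illustration):

import torch
import torch.nn as nn

# A 1x1 convolution acts as a per-pixel linear layer over channels:
# each output channel is a learned linear combination of the 64 input channels.
x = torch.randn(1, 64, 28, 28)                                    # N x C x H x W
mix = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1)  # C = 64 -> C' = 128
y = mix(x)
print(y.shape)  # torch.Size([1, 128, 28, 28]); the spatial size is unchanged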

2. Reducing the number of parameters and the amount of computation

  • Background: in deep convolutional networks, especially with large kernels and many channels, the parameter count and computation grow rapidly. For example, a 3×3 convolution mapping C input channels to C' output channels has 3×3×C×C' weights.

  • Role: a 1×1 convolution is an efficient alternative. It operates only along the channel dimension, with no spatial extent, so the same C-to-C' mapping needs just 1×1×C×C' weights. Used in the right places, 1×1 convolutions reduce a model's parameters and computation, and therefore speed up training and inference, without a significant loss in accuracy; the comparison below makes the difference concrete.
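
A quick parameter-count comparison (the 256 input and output channels are arbitrary example values):

import torch.nn as nn

conv3x3 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
conv1x1 = nn.Conv2d(256, 256, kernel_size=1)

def count_params(module):
    return sum(p.numel() for p in module.parameters())

print(count_params(conv3x3))  # 3*3*256*256 + 256 bias = 590,080
print(count_params(conv1x1))  # 1*1*256*256 + 256 bias =  65,792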

3. Building the Inception module

  • Background: the Inception module is a classic convolutional building block whose core idea is to capture features at multiple scales with parallel convolutions of different kernel sizes; applying several large kernels directly, however, would greatly increase the parameter count and computation.

  • Role: 1×1 convolutions play a key role in the Inception module as a dimensionality-reduction step: the channel count is reduced before the large-kernel convolutions are applied. In a typical Inception module, a 1×1 convolution first reduces the input from C to C' channels (C' < C), and only then are the 3×3 and 5×5 convolutions applied. This preserves the multi-scale feature extraction while greatly reducing computation, as the back-of-the-envelope calculation below shows.
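
A back-of-the-envelope multiply-accumulate count showing the saving. The numbers match the 5×5 branch of Inception block 3a in the implementation below (28×28 feature map, 192 input channels, a 1×1 reduction to 16 channels, 32 output channels):

# Multiply-accumulate cost of a convolution: H * W * k * k * C_in * C_out
H = W = 28

direct = H * W * 5 * 5 * 192 * 32                             # 5x5 conv applied directly
reduced = H * W * 1 * 1 * 192 * 16 + H * W * 5 * 5 * 16 * 32  # 1x1 reduce, then 5x5

print(direct)   # 120,422,400
print(reduced)  # 12,443,648 -- roughly a 10x saving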

4. Building the bottleneck block in ResNet

  • Background: ResNet (residual network) uses residual learning to address the vanishing- and exploding-gradient problems of training very deep networks. Its commonly used bottleneck block reduces the number of channels in the middle of the block to lower the computational cost.

  • Role: 1×1 convolutions perform the dimensionality reduction and expansion in the bottleneck. In a typical bottleneck block, a 1×1 convolution first reduces the channel count from C to C/4, a 3×3 convolution then extracts features, and a final 1×1 convolution restores the channel count to C. This structure cuts the computation substantially while preserving representational power; a sketch follows this item.
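
A minimal sketch of a ResNet-style bottleneck built from 1×1 convolutions (batch normalization and the projection shortcut are omitted for brevity, so this is illustrative rather than a faithful ResNet block):

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels):
        super(Bottleneck, self).__init__()
        reduced = channels // 4
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1)             # 1x1: C -> C/4
        self.conv3x3 = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1)  # feature extraction
        self.expand = nn.Conv2d(reduced, channels, kernel_size=1)             # 1x1: C/4 -> C
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv3x3(out))
        out = self.expand(out)
        return self.relu(out + x)  # residual connection

print(Bottleneck(256)(torch.randn(1, 256, 14, 14)).shape)  # torch.Size([1, 256, 14, 14])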

5. Implementing attention mechanisms

  • Background: attention mechanisms mimic human visual attention, letting the model focus on the most important parts of the input.

  • Role: 1×1 convolutions can be used to implement channel attention. In SENet (Squeeze-and-Excitation Networks), for example, global average pooling first squeezes away the spatial dimensions, and 1×1 convolutions then produce per-channel attention weights; these weights rescale the channels of the input feature map, strengthening important channels and suppressing unimportant ones. A minimal sketch follows this item.
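
A minimal sketch of an SE-style channel attention block built from 1×1 convolutions (a simplification of the SENet design; the reduction ratio of 16 is a commonly used default):

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super(SEBlock, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pooling: N x C x 1 x 1
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.excite(self.squeeze(x))  # per-channel weights in (0, 1)
        return x * w                      # rescale each channel of the input

print(SEBlock(64)(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 64, 28, 28])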

6. Building lightweight networks

  • Background: on mobile devices and in other resource-constrained environments, lightweight convolutional networks are needed to meet real-time and low-power requirements.

  • Role: with its small parameter count and low computational cost, the 1×1 convolution is a natural building block for lightweight networks. MobileNet, for example, combines depthwise separable convolutions with 1×1 (pointwise) convolutions, drastically reducing parameters and computation while retaining good accuracy; see the sketch after this item.
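
A minimal sketch of a MobileNet-style depthwise separable convolution: a per-channel 3×3 (depthwise) convolution followed by a 1×1 (pointwise) convolution (channel counts are arbitrary example values; batch normalization is omitted):

import torch
import torch.nn as nn

def depthwise_separable(in_channels, out_channels):
    return nn.Sequential(
        # depthwise: one 3x3 filter per input channel (groups=in_channels)
        nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels),
        nn.ReLU(inplace=True),
        # pointwise: a 1x1 convolution mixes the channels
        nn.Conv2d(in_channels, out_channels, kernel_size=1),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable(64, 128)
print(block(torch.randn(1, 64, 28, 28)).shape)     # torch.Size([1, 128, 28, 28])
print(sum(p.numel() for p in block.parameters()))  # 8,960 vs 73,856 for a plain 3x3 conv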

GoogLeNet

GoogLeNet (also known as InceptionNet) won the 2014 ImageNet classification challenge. It has far fewer parameters than the VGG family while surpassing VGG in both accuracy and speed.

The basic unit: the Inception module

Inception module that uses 1×1 convolutions to adjust the channel count

Full model configuration

GoogLeNet model architecture

Training details

Two auxiliary classifier branches attached at different depths are used during training, which makes the network easier to optimize (auxiliary losses); a sketch of how they are combined with the main loss follows.
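
A minimal training-step sketch of how the two auxiliary losses are typically combined with the main loss (the 0.3 weight follows the original GoogLeNet paper; model, images and labels are placeholders for the Inception_V1 implementation reproduced below, which returns softmax probabilities, hence the log + NLLLoss combination):

import torch
import torch.nn as nn

criterion = nn.NLLLoss()

def training_step(model, images, labels):
    main_out, aux2_out, aux1_out = model(images)  # the forward pass returns (main, aux2, aux1)
    loss = criterion(torch.log(main_out), labels)
    loss = loss + 0.3 * (criterion(torch.log(aux1_out), labels) +
                         criterion(torch.log(aux2_out), labels))
    return loss  # at inference time only main_out is used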

Reproducing GoogLeNet

# coding:utf8

# Copyright 2023 longpeng2008. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# If you find any problem,please contact us
#
#     longpeng2008to2012@gmail.com
#
# or create issues
# =============================================================================
import torch
import torch.nn as nn


# Convolution block: Conv2d followed by ReLU
class BasicConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=(1, 1), padding=(0, 0)):
        super(BasicConv, self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                              stride=stride, padding=padding)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x


# Auxiliary classifier (side branch that provides an extra loss during training)
class SideBranch(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(SideBranch, self).__init__()
        self.avg_pool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv1x1 = BasicConv(in_channels=in_channels, out_channels=128, kernel_size=1)
        self.fc_1 = nn.Linear(in_features=2048, out_features=1024)
        self.relu = nn.ReLU(inplace=True)
        self.fc_2 = nn.Linear(in_features=1024, out_features=num_classes)

    def forward(self, x):
        x = self.avg_pool(x)
        x = self.conv1x1(x)
        x = torch.flatten(x, 1)
        x = self.fc_1(x)
        x = self.relu(x)
        x = torch.dropout(x, 0.7, train=self.training)  # only apply dropout in training mode
        x = self.fc_2(x)
        return x


# Inception module: four parallel branches concatenated along the channel dimension
class InceptionBlock(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3reduce, ch3x3, ch5x5reduce, ch5x5, chpool):
        super(InceptionBlock, self).__init__()
        self.branch_1 = BasicConv(in_channels=in_channels, out_channels=ch1x1, kernel_size=1)
        self.branch_2 = nn.Sequential(
            BasicConv(in_channels=in_channels, out_channels=ch3x3reduce, kernel_size=1),
            BasicConv(in_channels=ch3x3reduce, out_channels=ch3x3, kernel_size=3, padding=1)
        )
        self.branch_3 = nn.Sequential(
            BasicConv(in_channels=in_channels, out_channels=ch5x5reduce, kernel_size=1),
            BasicConv(in_channels=ch5x5reduce, out_channels=ch5x5, kernel_size=5, padding=2)
        )
        self.branch_4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, padding=1, stride=1, ceil_mode=True),
            BasicConv(in_channels=in_channels, out_channels=chpool, kernel_size=1)
        )

    def forward(self, x):
        x_1 = self.branch_1(x)
        x_2 = self.branch_2(x)
        x_3 = self.branch_3(x)
        x_4 = self.branch_4(x)
        x = torch.cat([x_1, x_2, x_3, x_4], dim=1)
        return x


# GoogLeNet (Inception v1) model
class Inception_V1(nn.Module):
    def __init__(self, num_classes):
        super(Inception_V1, self).__init__()
        self.BasicConv_1 = BasicConv(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.max_pool_1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)  # ceil_mode keeps a partial window at the border as its own pooling region
        self.lrn_1 = nn.LocalResponseNorm(2)

        self.conv_1x1 = BasicConv(in_channels=64, out_channels=64, kernel_size=1)
        self.conv_3x3 = BasicConv(in_channels=64, out_channels=192, kernel_size=3, padding=1)
        self.lrn_2 = nn.LocalResponseNorm(2)
        self.max_pool_2 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        # in_channels, ch1x1, ch3x3reduce, ch3x3, ch5x5reduce, ch5x5, chpool
        self.InceptionBlock_3a = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
        self.InceptionBlock_3b = InceptionBlock(256, 128, 128, 192, 32, 96, 64)
        self.max_pool_3 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.InceptionBlock_4a = InceptionBlock(480, 192, 96, 208, 16, 48, 64)

        self.SideBranch_1 = SideBranch(512, num_classes)

        self.InceptionBlock_4b = InceptionBlock(512, 160, 112, 224, 24, 64, 64)
        self.InceptionBlock_4c = InceptionBlock(512, 128, 128, 256, 24, 64, 64)
        self.InceptionBlock_4d = InceptionBlock(512, 112, 144, 288, 32, 64, 64)

        self.SideBranch_2 = SideBranch(528, num_classes)

        self.InceptionBlock_4e = InceptionBlock(528, 256, 160, 320, 32, 128, 128)

        self.max_pool_4 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)

        self.InceptionBlock_5a = InceptionBlock(832, 256, 160, 320, 32, 128, 128)
        self.InceptionBlock_5b = InceptionBlock(832, 384, 192, 384, 48, 128, 128)

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(in_features=1024, out_features=num_classes)

    def forward(self, x):
        x = self.BasicConv_1(x)
        x = self.max_pool_1(x)
        x = self.lrn_1(x)

        x = self.conv_1x1(x)
        x = self.conv_3x3(x)
        x = self.lrn_2(x)
        x = self.max_pool_2(x)

        x = self.InceptionBlock_3a(x)
        x = self.InceptionBlock_3b(x)
        x = self.max_pool_3(x)

        x = self.InceptionBlock_4a(x)

        x_1 = self.SideBranch_1(x)

        x = self.InceptionBlock_4b(x)
        x = self.InceptionBlock_4c(x)
        x = self.InceptionBlock_4d(x)

        x_2 = self.SideBranch_2(x)

        x = self.InceptionBlock_4e(x)

        x = self.max_pool_4(x)

        x = self.InceptionBlock_5a(x)
        x = self.InceptionBlock_5b(x)

        x = self.avg_pool(x)
        x = self.flatten(x)
        x = torch.dropout(x, 0.4, train=self.training)  # only apply dropout in training mode
        x = self.fc(x)

        x_1 = torch.softmax(x_1, dim=1)
        x_2 = torch.softmax(x_2, dim=1)
        x_3 = torch.softmax(x, dim=1)

        # output = x_3 + (x_1 + x_2) * 0.3
        return x_3, x_2, x_1


if __name__ == '__main__':
    # Build the model, run a forward pass on a dummy input, and save the model
    input = torch.randn([1, 3, 224, 224])
    model = Inception_V1(num_classes=1000)
    torch.save(model, 'googlenet.pth')

    x_3, x_2, x_1 = model(input)

    # Inspect the outputs; checking that the shapes are as expected is enough
    print(x_1.shape)
    print(x_2.shape)
    print(x_3.shape)

    torch.onnx.export(model, input, 'googlenet.onnx', opset_version=10)