Neural Networks -- Convolutional Layers

Published: 2025-03-16

An introduction to the convolutional layer in neural networks.

This post focuses on Conv2d.
In the simplest case, the output of a layer with input size $(N, C_{in}, H, W)$ and output size $(N, C_{out}, H_{out}, W_{out})$ can be precisely described as:

$$\text{out}(N_i, C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k=0}^{C_{in}-1} \text{weight}(C_{out_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, $H$ is the height of the input planes in pixels, and $W$ is the width in pixels.
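To make the cross-correlation concrete, here is a minimal sketch of my own (not from the original post) that applies torch.nn.functional.conv2d to a tiny single-channel input and reproduces one output element by hand:

import torch
import torch.nn.functional as F

# 1 sample, 1 input channel, 4x4 input; 1 output channel, 3x3 all-ones kernel
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
w = torch.ones(1, 1, 3, 3)

out = F.conv2d(x, w)   # valid 2D cross-correlation, no padding
print(out.shape)       # torch.Size([1, 1, 2, 2])

# out[0, 0, 0, 0] is sum(weight * input) over the top-left 3x3 window,
# exactly the star operation in the formula above
print(out[0, 0, 0, 0].item(), (w[0, 0] * x[0, 0, :3, :3]).sum().item())  # 45.0 45.0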
The Conv2d signature is:

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
                      dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

Each parameter is explained in detail below:

  • in_channels (int) – number of channels in the input image

  • out_channels (int) – number of channels produced by the convolution

  • kernel_size (int or tuple) – size of the convolving kernel (see the sketch after this list)

  • stride (int or tuple, optional) – stride of the convolution. Default: 1

  • padding (int, tuple or str, optional) – padding added to all four sides of the input. Default: 0

  • dilation (int or tuple, optional) – spacing between kernel elements. Default: 1

  • groups (int, optional) – number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – if True, adds a learnable bias to the output. Default: True

  • padding_mode (str, optional) – padding mode, one of 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
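A quick illustration of the int-vs-tuple parameters, as a sketch of my own: an int applies to both spatial dimensions, while a tuple specifies (height, width) separately.

import torch
from torch.nn import Conv2d

# tuple arguments set the height and width behavior independently
conv = Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 5), stride=(2, 1), padding=(1, 2))
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 16, 16, 32])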

What will the output shape be? It is determined by the following formulas.
Given an input of size:

  • $(N, C_{in}, H_{in}, W_{in})$ or $(C_{in}, H_{in}, W_{in})$

the output has size:

  • $(N, C_{out}, H_{out}, W_{out})$ or $(C_{out}, H_{out}, W_{out})$, where

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
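As a sanity check, the formula can be wrapped in a small helper and compared against an actual Conv2d output. This is a sketch of my own; conv_out_size is a hypothetical helper, not part of PyTorch:

import torch
from torch.nn import Conv2d

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # integer form of the H_out / W_out formula above (floor division)
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=2, padding=1, dilation=2)
x = torch.randn(1, 1, 28, 28)
print(conv(x).shape[2])                                                   # 13
print(conv_out_size(28, kernel_size=3, stride=2, padding=1, dilation=2))  # 13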

The weight and bias of the layer are obtained by sampling, as follows:

  • weight (Tensor) – the learnable weights of the module, of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$. The values are sampled from $\mathcal{U}(-k, k)$, where $k = \sqrt{\frac{\text{groups}}{C_{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}}$

  • bias (Tensor) – the learnable bias of the module, of shape $(\text{out\_channels})$. If bias is True, the values are sampled from $\mathcal{U}(-k, k)$ with the same $k$ as above.
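The shapes and the sampling bound can be checked directly; a minimal sketch under the $\mathcal{U}(-k, k)$ description above:

import math
import torch
from torch.nn import Conv2d

conv = Conv2d(in_channels=1, out_channels=3, kernel_size=3)
print(conv.weight.shape)  # torch.Size([3, 1, 3, 3]) = (out_channels, in_channels/groups, 3, 3)
print(conv.bias.shape)    # torch.Size([3]) = (out_channels,)

# k = sqrt(groups / (C_in * kernel_size[0] * kernel_size[1])) = sqrt(1 / 9)
k = math.sqrt(1 / (1 * 3 * 3))
print(conv.weight.abs().max().item() <= k)  # True: every value lies in U(-k, k)
print(conv.bias.abs().max().item() <= k)    # True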

Next, let's run a convolution over the FashionMNIST dataset downloaded earlier and inspect the output:

Note that because FashionMNIST consists of grayscale images, i.e. the channel count is 1, conv1 in my James class below takes an input channel count of 1. If you switch to another dataset, set it according to the images' channel dimension, which you can check with print(img.shape).

dataset = torchvision.datasets.FashionMNIST("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                            download=True)

dataloader = DataLoader(dataset, batch_size=64, shuffle=False)  # batch size of 64


class James(nn.Module):
    def __init__(self):
        super(James, self).__init__()
        self.conv1 = Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


def run_error():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])

        writer.add_images("in_e", imgs, step)
        writer.add_images("out_e", output, step)
        writer.add_images("out_e_plus", output, step)
        # The conv output is logged twice here. Although the convolution is identical,
        # the images shown in TensorBoard have inconsistent colors.
        # From the docs: feature maps produced by a conv layer typically contain negative
        # values or values above 1, while image pixel values must lie in [0, 1] or be
        # integers in [0, 255]. Moreover, these feature maps do not correspond directly
        # to the displayable RGB color space, which causes the color inconsistency.
        epoch = epoch + 1
        step = step + 1

    writer.close()


def run_right():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])

        writer.add_images("in_r", imgs, step)

        # I want to turn torch.Size([64, 3, 26, 26]) -> torch.Size([xxx, 1, 26, 26]),
        # since FashionMNIST is a grayscale dataset with a single channel rather than 3 RGB channels
        # When a dimension is unknown, write -1 and it is inferred from the remaining ones
        output = torch.reshape(output, (-1, 1, 26, 26))
        print("Epoch : {} , reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("out_r", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()

The detailed explanations are in the code comments, so I won't repeat them here. The main points are:

  • The code logs the conv output twice; although the convolution is identical, the images shown in TensorBoard have inconsistent colors, because the feature maps do not correspond directly to the displayable RGB color space (a sketch of one common fix follows this list)
  • If you don't know the exact size of some dimension, set it to -1 and it will be computed automatically, i.e. torch.Size([64, 3, 26, 26]) -> torch.Size([-1, 1, 26, 26])
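One common fix for the color issue, as a sketch of my own (normalize_for_display is a hypothetical helper, not part of the original code): min-max normalize each feature map into [0, 1] before logging, so TensorBoard renders it consistently.

def normalize_for_display(t):
    # scale the tensor into [0, 1]; conv feature maps may contain
    # negative values or values above 1, which TensorBoard clips
    t_min, t_max = t.min(), t.max()
    return (t - t_min) / (t_max - t_min + 1e-8)

# inside the loop, instead of logging the raw conv output:
# writer.add_images("out_e", normalize_for_display(output), step)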

Some of the resulting images are shown below:
[TensorBoard screenshot]

Of course, for datasets downloaded with code like the following:

dataset = torchvision.datasets.XXX("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                            download=True)

(where XXX is the dataset you want to download), this covers most use cases. But when we want to create a dataset of our own, the command above no longer works, since no such ready-made dataset exists online. So we would like to write code that defines our own dataset and then run a similar convolution on it.

Below, using the ants-and-bees dataset we worked with before (download link), we define our own dataset, run a convolution, and output the images to see the effect.
The code is as follows:

class James_bees_ants(nn.Module):
    def __init__(self):
        super(James_bees_ants, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


class CustomAntsAndBeesImageDataset(Dataset):
    def __init__(self, root_dir, target_dir, transform=None):
        self.img_dir = os.path.join(root_dir, target_dir)
        self.transform = transform
        # collect all image filenames in the directory
        self.img_names = [f for f in os.listdir(self.img_dir) if
                          f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif'))]

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):
        img_name = self.img_names[idx]
        img_path = os.path.join(self.img_dir, img_name)
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, img_name


def run_ants_and_bees():
    root_dir = "data/hymenoptera_data/train"
    ants_target_dir = "ants_image"
    bees_target_dir = "bees_image"

    # img_ants_dir = os.path.join(root_dir, ants_target_dir)
    # img_bees_dir = os.path.join(root_dir, bees_target_dir)

    trans_img = transforms.Compose([
        transforms.Resize((256, 256)),  # make sure every image is resized to 256x256
        transforms.ToTensor(),
        # if needed
        # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    ants_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=ants_target_dir, transform=trans_img)
    bees_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=bees_target_dir, transform=trans_img)

    ants_loader = DataLoader(dataset=ants_dataset, batch_size=64, shuffle=True)
    bees_loader = DataLoader(dataset=bees_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # log the original and the convolved images of the ants dataset
    for data in ants_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , ants_output_size = {}".format(epoch, output.shape))

        writer.add_images("ants_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("ants_out", output, step)

        epoch = epoch + 1
        step = step + 1

    for data in bees_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , bees_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , bees_output_size = {}".format(epoch, output.shape))

        writer.add_images("bees_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , bees_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("bees_out", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()

One small adjustment in the code above: since the images come in different sizes, they need to be resized to a common size, which is handled by the Resize transform in the Compose pipeline.
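Also note that the reshape in the loops above does not merge channels: it regroups them, so each 6-channel feature map becomes two 3-channel images and the batch dimension doubles. A quick check of the shape arithmetic:

import torch

output = torch.randn(64, 6, 254, 254)                # one batch of conv outputs
reshaped = torch.reshape(output, (-1, 3, 254, 254))
print(reshaped.shape)                                # torch.Size([128, 3, 254, 254])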

The images are shown below:
[TensorBoard screenshots]
The above covers working with our own custom dataset; in principle, as long as the code is correct, it works for any set of images. That said, while browsing the PyTorch documentation I found that PyTorch also provides a way to build datasets from your own images.

The specific API is torchvision.datasets.ImageFolder.

However, it imposes requirements on your file layout. A common practice is to organize images by class, with each class's images in its own subfolder. For example:

dataset/
    class_1/
        img1.jpg
        img2.jpg
        ...
    class_2/
        img1.jpg
        img2.jpg
        ...

This layout makes it easy for PyTorch to assign labels when loading the dataset, as the sketch below shows.
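With this layout, ImageFolder derives one integer label per subfolder automatically. A minimal sketch, assuming the dataset/ tree shown above:

from torchvision import datasets, transforms

dataset = datasets.ImageFolder(root='dataset', transform=transforms.ToTensor())
print(dataset.classes)       # ['class_1', 'class_2'] -- one class per subfolder
print(dataset.class_to_idx)  # {'class_1': 0, 'class_2': 1}
img, label = dataset[0]      # label is the index of the image's subfolder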

I implemented this with that API as well; my file layout is as follows:

[directory layout screenshot]

The implementation uses the following code:
def use_ImageFolder():

    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
    ])

    # load the dataset
    ants_train_dataset = datasets.ImageFolder(root='data/hymenoptera_data/train', transform=transform)
    ants_train_loader = DataLoader(ants_train_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # log the original and the convolved images from the training set
    for data in ants_train_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , use_ImageFolder_ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , use_ImageFolder_ants_output_size = {}".format(epoch, output.shape))

        # writer.add_images("use_ImageFolder_ants_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , use_ImageFolder_ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("use_ImageFolder_ants_out", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()

The generated results are much the same.

[TensorBoard screenshot]

In fact, if your requirements on the images are not too demanding, you can call torchvision.datasets.ImageFolder directly to load a dataset quickly. That saves a lot of time, which can then go into optimizing or designing algorithms.

Finally, the complete code:

import os

import torch
import torchvision

from torchvision import transforms, datasets
from torch.nn import Conv2d
from torch.utils.data import DataLoader, Dataset
from torch import nn
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

dataset = torchvision.datasets.FashionMNIST("./data", train=False, transform=torchvision.transforms.ToTensor(),
                                            download=True)

dataloader = DataLoader(dataset, batch_size=64, shuffle=False)  # batch size of 64


class James(nn.Module):
    def __init__(self):
        super(James, self).__init__()
        self.conv1 = Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


def run_error():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])

        writer.add_images("in_e", imgs, step)
        writer.add_images("out_e", output, step)
        writer.add_images("out_e_plus", output, step)
        # The conv output is logged twice here. Although the convolution is identical,
        # the images shown in TensorBoard have inconsistent colors.
        # From the docs: feature maps produced by a conv layer typically contain negative
        # values or values above 1, while image pixel values must lie in [0, 1] or be
        # integers in [0, 255]. Moreover, these feature maps do not correspond directly
        # to the displayable RGB color space, which causes the color inconsistency.
        epoch = epoch + 1
        step = step + 1

    writer.close()


def run_right():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])

        writer.add_images("in_r", imgs, step)

        # I want to turn torch.Size([64, 3, 26, 26]) -> torch.Size([xxx, 1, 26, 26]),
        # since FashionMNIST is a grayscale dataset with a single channel rather than 3 RGB channels
        # When a dimension is unknown, write -1 and it is inferred from the remaining ones
        output = torch.reshape(output, (-1, 1, 26, 26))
        print("Epoch : {} , reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("out_r", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()


#  =====================================================================================================================
#  Custom code: run a convolution over the ants-and-bees dataset from before

class James_bees_ants(nn.Module):
    def __init__(self):
        super(James_bees_ants, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


class CustomAntsAndBeesImageDataset(Dataset):
    def __init__(self, root_dir, target_dir, transform=None):
        self.img_dir = os.path.join(root_dir, target_dir)
        self.transform = transform
        # collect all image filenames in the directory
        self.img_names = [f for f in os.listdir(self.img_dir) if
                          f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif'))]

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):
        img_name = self.img_names[idx]
        img_path = os.path.join(self.img_dir, img_name)
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, img_name


def run_ants_and_bees():
    root_dir = "data/hymenoptera_data/train"
    ants_target_dir = "ants_image"
    bees_target_dir = "bees_image"

    # img_ants_dir = os.path.join(root_dir, ants_target_dir)
    # img_bees_dir = os.path.join(root_dir, bees_target_dir)

    trans_img = transforms.Compose([
        transforms.Resize((256, 256)),  # make sure every image is resized to 256x256
        transforms.ToTensor(),
        # if needed
        # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    ants_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=ants_target_dir, transform=trans_img)
    bees_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=bees_target_dir, transform=trans_img)

    ants_loader = DataLoader(dataset=ants_dataset, batch_size=64, shuffle=True)
    bees_loader = DataLoader(dataset=bees_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # log the original and the convolved images of the ants dataset
    for data in ants_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , ants_output_size = {}".format(epoch, output.shape))

        writer.add_images("ants_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("ants_out", output, step)

        epoch = epoch + 1
        step = step + 1

    for data in bees_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , bees_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , bees_output_size = {}".format(epoch, output.shape))

        writer.add_images("bees_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , bees_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("bees_out", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()


#  =====================================================================================================================
#  Use torchvision.datasets.ImageFolder to quickly load a dataset organized as described above
#  define the image transforms
def use_ImageFolder():

    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
    ])

    # load the dataset
    ants_train_dataset = datasets.ImageFolder(root='data/hymenoptera_data/train', transform=transform)
    ants_train_loader = DataLoader(ants_train_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # log the original and the convolved images from the training set
    for data in ants_train_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , use_ImageFolder_ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , use_ImageFolder_ants_output_size = {}".format(epoch, output.shape))

        # writer.add_images("use_ImageFolder_ants_in", imgs, step)

        #  regroup the 6 channels into two 3-channel images (the batch dimension doubles)
        output = torch.reshape(output, (-1, 3, 254, 254))

        print("Epoch : {} , use_ImageFolder_ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("use_ImageFolder_ants_out", output, step)

        epoch = epoch + 1
        step = step + 1

    writer.close()


if __name__ == "__main__":
    # run_right()
    # run_error()
    # run_ants_and_bees()
    use_ImageFolder()