1. Main Content of This Article
This article dissects the differences between YOLOv11 and YOLOv8 along the two dimensions of the YAML file structure: the Backbone and the Head. We will examine YOLOv11's architectural improvements in depth and use side-by-side code comparisons to help readers understand the changes. If you have not yet tried YOLOv11, you can refer to my other blog post, 【YOLOv11训练、推理】手把手教你玩转YOLOv11模型(环境配置-模型训练、推理), to quickly get started with the latest version.
2. Prerequisites
Before diving into the code-level differences between YOLOv11 and YOLOv8, I want to make sure this article is easy to follow. If you are already familiar with the following topics, the reading will go more smoothly.
1. Basics of the YOLO series: a rough understanding of how the YOLO codebase is organized and what each component does.
2. Convolutional neural network (CNN) fundamentals: a basic grasp of concepts such as convolutional layers, pooling layers, and activation functions.
3. YOLOv11 (Improvements over YOLOv8)
This section focuses on YOLOv11's three core improvements: the C3K2 module, the C2PSA module, and the detection head built on depthwise separable convolutions. Each improvement is analyzed along three dimensions, namely the network structure diagram, the technical principle, and the source-code implementation, to give readers a complete picture of YOLOv11's technical upgrades and design logic. (The diagram of YOLOv11's corresponding YAML file structure is shown below.)
3.1 The C3K2 Module (Backbone, Head)
3.1.1 Network Structure Comparison
The C3K2 module is YOLOv11's core improvement in the Backbone and Head, replacing the C2f module used in YOLOv8. Let us first get an intuitive view of its design through the network structure diagram.
3.1.2 Description of the Improvement
As the comparison diagram makes clear, the main difference between YOLOv11's C3K2 module and YOLOv8's C2f module is that C3K2 replaces the original Bottleneck blocks with C3k blocks. Specifically, the C3k block has the same structure as the C3 module in YOLOv5; see the second figure above for its detailed architecture.
3.1.3 Source Code Analysis
As the analysis above shows, the C3k block is where the key difference lies. To understand this improvement thoroughly, we will proceed from the whole to the parts: first the C3K2 source code, then the C3k source code. Below is the C3K2 implementation; note the branch on the c3k flag:
- When c3k=False, self.m is built from Bottleneck blocks, i.e. the module behaves exactly like YOLOv8's original C2f.
- When c3k=True, self.m is built from C3k blocks, YOLOv11's improved variant.
class C3k2(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )
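To make the switch concrete, here is a minimal usage sketch (assuming a recent ultralytics install, where C3k2 lives in ultralytics.nn.modules.block) that instantiates both variants and inspects the inner blocks:

```python
import torch
from ultralytics.nn.modules.block import C3k2  # import path assumed

x = torch.randn(1, 64, 32, 32)

# c3k=False: inner blocks are Bottleneck, equivalent to YOLOv8's C2f
m_c2f_like = C3k2(64, 64, n=1, c3k=False)
# c3k=True: inner blocks are C3k, the YOLOv11 variant
m_c3k = C3k2(64, 64, n=1, c3k=True)

print(m_c2f_like(x).shape)             # torch.Size([1, 64, 32, 32])
print(m_c3k(x).shape)                  # torch.Size([1, 64, 32, 32])
print(type(m_c2f_like.m[0]).__name__)  # Bottleneck
print(type(m_c3k.m[0]).__name__)       # C3k
```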
When c3k=True, the C3k module takes over; its implementation is based on YOLOv5's C3 module, and its core change is the kernel-size parameter k of the Bottleneck blocks: the original C3 module uses fixed (1, 1) and (3, 3) kernels, whereas C3k's k value is specified dynamically through the module's arguments. This design significantly improves the module's flexibility and adaptability.
class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
Having walked through the code, let us consider what C3k's improvement buys: its core lies in the dynamically configurable kernel size (e.g. 3x3, 5x5, or even larger). This not only lets the model flexibly extract multi-scale features but also significantly enlarges the receptive field, which is particularly beneficial for detecting complex scenes and large objects and improves overall detection performance.
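As a quick illustration of that flexibility, the following sketch (import path assumed from a recent ultralytics install) builds C3k with a 3x3 and a 5x5 kernel; both preserve the spatial shape, while the larger kernel trades extra parameters for a wider receptive field:

```python
import torch
from ultralytics.nn.modules.block import C3k  # import path assumed

x = torch.randn(1, 64, 32, 32)
for k in (3, 5):
    m = C3k(64, 64, n=1, k=k)  # k flows into the Bottleneck kernels as (k, k)
    n_params = sum(p.numel() for p in m.parameters())
    print(f"k={k}: out={tuple(m(x).shape)}, params={n_params}")
```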
3.2 The C2PSA Module (Backbone)
3.2.1 Network Structure Diagram
C2PSA is a newly added module in YOLOv11; its structure is shown in the figure below.
3.2.2 Description of the Improvement
As the figure shows, C2PSA builds on the CSP structure from the YOLOv8 series and introduces the PSA attention module. Through multi-head self-attention, the PSA module effectively captures both fine-grained target features and global contextual information, significantly improving the model's detection performance.
3.2.3 Source Code Analysis
The relevant source code is shown below. C2PSA is composed mainly of PSABlock and an FFN (the two Conv modules inside PSA in the figure above). The core of PSABlock is the Attention module, which implements multi-head self-attention, a mechanism originating from the Transformer that computes over Query (Q), Key (K), and Value (V). In this implementation, Q, K, and V are generated by a 1x1 convolution, corresponding to the self.qkv variable in the code. Multi-head self-attention extracts rich global context from the features, improving detection performance. In addition, the Attention module introduces positional encoding via self.pe to strengthen the model's awareness of spatial position information.
class C2PSA(nn.Module):
    """
    C2PSA module with attention mechanism for enhanced feature extraction and processing.

    This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing
    capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.

    Attributes:
        c (int): Number of hidden channels.
        cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c.
        cv2 (Conv): 1x1 convolution layer to reduce the number of output channels to c.
        m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.

    Methods:
        forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.

    Notes:
        This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.

    Examples:
        >>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
        >>> input_tensor = torch.randn(1, 256, 64, 64)
        >>> output_tensor = c2psa(input_tensor)
    """

    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__()
        assert c1 == c2
        self.c = int(c1 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)
        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))

    def forward(self, x):
        """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
        a, b = self.cv1(x).split((self.c, self.c), dim=1)
        b = self.m(b)
        return self.cv2(torch.cat((a, b), 1))
class PSABlock(nn.Module):
    """
    PSABlock class implementing a Position-Sensitive Attention block for neural networks.

    This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers
    with optional shortcut connections.

    Attributes:
        attn (Attention): Multi-head attention module.
        ffn (nn.Sequential): Feed-forward neural network module.
        add (bool): Flag indicating whether to add shortcut connections.

    Methods:
        forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.

    Examples:
        Create a PSABlock and perform a forward pass
        >>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True)
        >>> input_tensor = torch.randn(1, 128, 32, 32)
        >>> output_tensor = psablock(input_tensor)
    """

    def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None:
        """Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction."""
        super().__init__()
        self.attn = Attention(c, attn_ratio=attn_ratio, num_heads=num_heads)
        self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
        self.add = shortcut

    def forward(self, x):
        """Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor."""
        x = x + self.attn(x) if self.add else self.attn(x)
        x = x + self.ffn(x) if self.add else self.ffn(x)
        return x
class Attention(nn.Module):
    """
    Attention module that performs self-attention on the input tensor.

    Args:
        dim (int): The input tensor dimension.
        num_heads (int): The number of attention heads.
        attn_ratio (float): The ratio of the attention key dimension to the head dimension.

    Attributes:
        num_heads (int): The number of attention heads.
        head_dim (int): The dimension of each attention head.
        key_dim (int): The dimension of the attention key.
        scale (float): The scaling factor for the attention scores.
        qkv (Conv): Convolutional layer for computing the query, key, and value.
        proj (Conv): Convolutional layer for projecting the attended values.
        pe (Conv): Convolutional layer for positional encoding.
    """

    def __init__(self, dim, num_heads=8, attn_ratio=0.5):
        """Initializes multi-head attention module with query, key, and value convolutions and positional encoding."""
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.key_dim = int(self.head_dim * attn_ratio)
        self.scale = self.key_dim**-0.5
        nh_kd = self.key_dim * num_heads
        h = dim + nh_kd * 2
        self.qkv = Conv(dim, h, 1, act=False)
        self.proj = Conv(dim, dim, 1, act=False)
        self.pe = Conv(dim, dim, 3, 1, g=dim, act=False)

    def forward(self, x):
        """
        Forward pass of the Attention module.

        Args:
            x (torch.Tensor): The input tensor.

        Returns:
            (torch.Tensor): The output tensor after self-attention.
        """
        B, C, H, W = x.shape
        N = H * W
        qkv = self.qkv(x)
        q, k, v = qkv.view(B, self.num_heads, self.key_dim * 2 + self.head_dim, N).split(
            [self.key_dim, self.key_dim, self.head_dim], dim=2
        )
        attn = (q.transpose(-2, -1) @ k) * self.scale
        attn = attn.softmax(dim=-1)
        x = (v @ attn.transpose(-2, -1)).view(B, C, H, W) + self.pe(v.reshape(B, C, H, W))
        x = self.proj(x)
        return x
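To make the tensor bookkeeping concrete, here is a worked shape example (values chosen for illustration; the import path is assumed from a recent ultralytics install) tracing the qkv split for dim=128, num_heads=4, attn_ratio=0.5:

```python
# head_dim = 128 // 4 = 32
# key_dim  = int(32 * 0.5) = 16
# h        = 128 + 2 * (16 * 4) = 256  -> channels produced by self.qkv
import torch
from ultralytics.nn.modules.block import Attention  # import path assumed

attn = Attention(dim=128, num_heads=4, attn_ratio=0.5)
x = torch.randn(1, 128, 16, 16)  # B, C, H, W; N = 16 * 16 = 256
qkv = attn.qkv(x)                # (1, 256, 16, 16)
q, k, v = qkv.view(1, 4, 16 * 2 + 32, 256).split([16, 16, 32], dim=2)
print(q.shape, k.shape, v.shape)  # q/k carry key_dim=16 channels per head, v carries head_dim=32
print(attn(x).shape)              # (1, 128, 16, 16): the input shape is preserved
```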
3.3 The Depthwise Separable Detection Head (Head)
3.3.1 Description of the Improvement
In the Detect class, YOLOv11 replaces the standard convolutions with depthwise separable convolutions (DWConv). Because the change is simple, no network structure diagram is provided here.
3.3.2 Source Code Analysis
YOLOv11 makes an important change to self.cv3 in the Detect class: when self.legacy=False (the default), the model uses YOLOv11's optimized scheme, replacing the standard convolutions (Conv) in self.cv3 with depthwise convolutions (DWConv) whose input and output channel counts are equal, each paired with a 1x1 Conv to form a depthwise separable convolution; when self.legacy=True, compatibility with YOLOv5, v8, and v9 models is preserved. The implementation is shown below.
class Detect(nn.Module):
    """YOLOv8 Detect head for detection models."""

    dynamic = False  # force grid reconstruction
    export = False  # export mode
    end2end = False  # end2end
    max_det = 300  # max_det
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init
    legacy = False  # backward compatibility for v3/v5/v8/v9 models
    def __init__(self, nc=80, ch=()):
        """Initializes the YOLOv8 detection layer with specified number of classes and channels."""
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100))  # channels
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch
        )
        self.cv3 = (
            nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
            if self.legacy  # legacy branch: the plain-Conv head used by YOLOv5/v8/v9
            else nn.ModuleList(  # YOLOv11 branch: DWConv + 1x1 Conv pairs (depthwise separable)
                nn.Sequential(
                    nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
                    nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
                    nn.Conv2d(c3, self.nc, 1),
                )
                for x in ch
            )
        )
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()
        if self.end2end:
            self.one2one_cv2 = copy.deepcopy(self.cv2)
            self.one2one_cv3 = copy.deepcopy(self.cv3)
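To see why the DWConv-based head is attractive, the following sketch (module paths assumed from a recent ultralytics install) compares the parameter count of a standard 3x3 Conv against the depthwise-plus-pointwise pair used in the non-legacy cv3:

```python
import torch.nn as nn
from ultralytics.nn.modules.conv import Conv, DWConv  # import paths assumed

c = 256
standard = Conv(c, c, 3)                                    # plain 3x3 convolution
separable = nn.Sequential(DWConv(c, c, 3), Conv(c, c, 1))   # depthwise 3x3 + pointwise 1x1

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # the separable pair needs far fewer parameters
```

Roughly speaking, the 3x3 standard convolution costs c*c*9 weights, while the depthwise-plus-pointwise pair costs about c*9 + c*c, which is why the YOLOv11 head is lighter at comparable accuracy.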
4. Summary
This article compared YOLOv11 with YOLOv8 along three dimensions, namely network structure diagrams, analysis of the improvements, and source-code implementation, aiming to help readers quickly grasp YOLOv11's key points. Feel free to discuss in the comments, and please like, bookmark, and follow: more code walkthroughs and paper analyses are on the way!