Dive into Deep Learning: Softmax Regression - EW帮帮网

Uses of Softmax Regression

The Softmax Regression Model

Cross-Entropy Loss

Implementing Softmax Regression from Scratch

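Uses of Softmax Regression

Classification: softmax regression is mainly used for multi-class classification, for example telling whether a picture shows a "cat" or a "dog" and, in general, choosing among more than two classes.
Probability output: besides predicting which class an input belongs to, it also outputs a probability for every class. The model therefore tells you not only the result but also how "confident" it is.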

The Softmax Regression Model

Example: classify photos made of 4 pixels into 3 classes A, B, and C.

Label the three classes with one-hot encoding: $\mathbf{y}_A=\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$, $\mathbf{y}_B=\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$, $\mathbf{y}_C=\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$.
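As a quick illustration of one-hot encoding, here is a minimal PyTorch sketch. It assumes the mapping A → 0, B → 1, C → 2, chosen only for this example; torch.nn.functional.one_hot turns each class index into a vector with a single 1 at that position.

```python
import torch
import torch.nn.functional as F

# Assumed class indices for this example: A -> 0, B -> 1, C -> 2
labels = torch.tensor([0, 1, 2])

# one_hot turns each integer index into a vector with a single 1 at that index
y_onehot = F.one_hot(labels, num_classes=3)
print(y_onehot)
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])
```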

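Let the input feature vector be the row vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}$, the weights be the $4\times 3$ matrix $\mathbf{W}=\begin{bmatrix} w_{11}&w_{12} &w_{13} \\ w_{21}&w_{22} &w_{23} \\ w_{31}&w_{32} &w_{33} \\ w_{41}&w_{42} &w_{43} \end{bmatrix}$ (row $i$ belongs to input feature $x_i$, column $j$ to class $j$), and the bias be $\mathbf{b}=\begin{bmatrix} b_{1} & b_{2} & b_{3} \end{bmatrix}$.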

For each input, compute one score per class, $o_1,o_2,o_3$ (these are not yet probabilities):

$\begin{aligned} o_{1} &= x_{1}w_{11} + x_{2}w_{21} + x_{3}w_{31} + x_{4}w_{41} + b_{1}\\ o_{2} &= x_{1}w_{12} + x_{2}w_{22} + x_{3}w_{32} + x_{4}w_{42} + b_{2}\\ o_{3} &= x_{1}w_{13} + x_{2}w_{23} + x_{3}w_{33} + x_{4}w_{43} + b_{3} \end{aligned}$

For sample $i$, softmax regression can be written in vector form as:

$\mathbf{o}^{(i)}=\mathbf{x}^{(i)}\mathbf{W}+\mathbf{b}$
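The following sketch, using random numbers as a stand-in for real pixels and weights, checks that the element-wise formulas and the matrix form $\mathbf{o}=\mathbf{x}\mathbf{W}+\mathbf{b}$ compute the same scores. Python indices start at 0, so W[i, j] here plays the role of $w_{(i+1)(j+1)}$.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 4)   # one sample with 4 pixel features (row vector)
W = torch.randn(4, 3)   # W[i, j]: weight from input feature i to class j
b = torch.randn(3)      # one bias per class

# Matrix form: o = xW + b, shape (1, 3)
o_matrix = x @ W + b

# Element-wise form: o_j = sum_i x_i * w_ij + b_j
o_loop = torch.stack([
    sum(x[0, i] * W[i, j] for i in range(4)) + b[j]
    for j in range(3)
])

print(o_matrix.shape)                                   # torch.Size([1, 3])
print(torch.allclose(o_matrix[0], o_loop, atol=1e-6))   # True
```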

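The $o_1,o_2,o_3$ obtained above are not real probabilities: they can be greater than 1 or negative, and they need not sum to 1, so their range is not fixed. We therefore normalize the outputs so that they are all non-negative and sum to 1.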

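Normalize with the softmax operation:

$\hat{\mathbf{y}} = \text{softmax}(\mathbf{o})$, where $\hat{y}_{j} = \frac{\exp(o_{j})}{\sum_{k} \exp(o_{k})}$

Because $\exp(\cdot) > 0$, every $\hat{y}_j$ is positive and the $\hat{y}_j$ sum to 1; since $\exp$ is increasing, softmax also preserves the ordering of the $o_j$.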
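A small numeric sketch of the softmax formula, with arbitrary logits picked for illustration: the outputs are non-negative, sum to 1, and the largest logit still gets the largest probability. torch.softmax is used only to cross-check the hand-written formula.

```python
import torch

o = torch.tensor([2.0, 1.0, -1.0])           # arbitrary logits: may be negative or larger than 1
y_hat = torch.exp(o) / torch.exp(o).sum()    # softmax by the formula above

print(y_hat)                                  # every entry is non-negative
print(float(y_hat.sum()))                     # approximately 1
print(int(y_hat.argmax()))                    # 0: the largest logit keeps the largest probability
print(torch.allclose(y_hat, torch.softmax(o, dim=0)))   # True: matches PyTorch's built-in softmax
```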

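Cross-Entropy Loss

Because the outputs are probabilities, mean squared error (MSE) is not a good measure here. Suppose the true class is class 1, i.e. $\mathbf{y}=\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$, and compare the predictions $\hat{y}_1=0.6,\hat{y}_2=0,\hat{y}_3=0.4$ and $\hat{y}_1=0.6,\hat{y}_2=0.2,\hat{y}_3=0.2$. Both give the true class the same probability 0.6 and predict the same class, yet their squared errors differ: $0.4^2+0^2+0.4^2=0.32$ for the first versus $0.4^2+0.2^2+0.2^2=0.24$ for the second, so MSE penalizes the first prediction more even though it is no worse at identifying the true class. The remedy is a function designed to measure the difference between two probability distributions; cross-entropy is a common choice, and it assigns both predictions the same loss $-\log 0.6$.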
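To make the comparison concrete, here is a plain-Python sketch. It assumes the true label is class 1 (one-hot (1, 0, 0)) and uses summed squared error as the MSE-style measure; the cross-entropy helper skips terms with $y_j=0$ so that the zero probability in the first prediction never reaches log(0).

```python
import math

y = [1.0, 0.0, 0.0]        # assumed true label: class 1, one-hot
pred_a = [0.6, 0.0, 0.4]
pred_b = [0.6, 0.2, 0.2]

def sse(y, p):
    # squared error summed over the classes
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))

def cross_entropy(y, p):
    # -sum_j y_j * log(p_j); terms with y_j = 0 contribute nothing, so skip them
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)

print(sse(y, pred_a), sse(y, pred_b))                       # ~0.32 vs ~0.24
print(cross_entropy(y, pred_a), cross_entropy(y, pred_b))   # both -log(0.6), about 0.511
```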

For a label $\mathbf{y}$ and a model prediction $\hat{\mathbf{y}}$ over $q$ classes, the loss function is:

$l(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{j=1}^{q} y_{j} \log \hat{y}_{j}$
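A short autograd sketch (random logits, with class index 1 assumed to be the true class) illustrates two consequences of this formula when $\mathbf{y}$ is one-hot: the sum collapses to $-\log \hat{y}_c$ for the true class $c$, and the gradient of the loss with respect to the logits $\mathbf{o}$ works out to $\text{softmax}(\mathbf{o}) - \mathbf{y}$.

```python
import torch

torch.manual_seed(0)
o = torch.randn(3, requires_grad=True)   # logits for one sample (random)
c = 1                                     # assumed true class index
y = torch.zeros(3)
y[c] = 1.0                                # one-hot label

y_hat = torch.softmax(o, dim=0)
loss = -(y * torch.log(y_hat)).sum()      # l = -sum_j y_j log y_hat_j
loss.backward()

# Only the true-class term survives: l = -log y_hat_c
print(torch.allclose(loss.detach(), -torch.log(y_hat.detach()[c])))   # True
# Gradient with respect to the logits: softmax(o) - y
print(torch.allclose(o.grad, y_hat.detach() - y, atol=1e-6))           # True
```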

Implementing Softmax Regression from Scratch

import torch
from IPython import display
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

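batch_size = 256 sets the batch size for minibatch stochastic gradient descent (minibatch SGD).

This calls load_data_fashion_mnist from the d2l library (the toolkit that accompanies Dive into Deep Learning) to load the Fashion-MNIST dataset.

Fashion-MNIST is a widely used image classification dataset with 10 classes of clothing items (shoes, T-shirts, bags, etc.); each image is a 28×28 grayscale image.
load_data_fashion_mnist(batch_size) downloads and preprocesses the dataset automatically and returns two iterators:
- train_iter: an iterator over the training set (shuffled; each iteration yields batch_size=256 images).
- test_iter: an iterator over the test set (usually not shuffled; used to evaluate model performance).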
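The sketch below uses random tensors as a hypothetical stand-in for Fashion-MNIST (so it runs without downloading anything) to show what such an iterator yields: image batches of shape (256, 1, 28, 28) and label batches of shape (256,), with a smaller last batch when the dataset size is not a multiple of 256. It uses torch.utils.data.DataLoader directly rather than the d2l helper.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in for Fashion-MNIST: 1000 random 1x28x28 "images", labels 0..9
X = torch.rand(1000, 1, 28, 28)
y = torch.randint(0, 10, (1000,))
train_iter = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)

for X_batch, y_batch in train_iter:
    print(X_batch.shape, y_batch.shape)   # torch.Size([256, 1, 28, 28]) torch.Size([256])
    break

print(len(train_iter))   # 4 batches: three of 256 samples and a last one of 232
```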

Initializing Model Parameters

num_inputs = 784
num_outputs = 10

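num_inputs = 784: the number of input features is 784, because each Fashion-MNIST image is 28 × 28, and flattening it gives 784 pixels as features.
num_outputs = 10: the number of output classes is 10, because Fashion-MNIST has 10 clothing classes.

In other words, the input dimension is 784 and the output dimension is 10.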

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)

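This defines the weight matrix W with shape (784, 10).
torch.normal(0, 0.01, ...) initializes the weights randomly from a normal distribution with mean 0 and standard deviation 0.01.
requires_grad=True tells PyTorch to compute gradients for W automatically during training, so that W can be updated by gradient descent.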

b = torch.zeros(num_outputs, requires_grad=True)

This defines the bias vector b with shape (10,), one bias per class.
It is initialized to 0.
requires_grad=True is set here as well, so b is also updated during training.
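A quick sanity check of the initialization, sketched with the same settings as above: it prints the parameter shapes, confirms that both require gradients, and counts the trainable parameters (784 × 10 weights plus 10 biases). The empirical standard deviation is random and will only be close to 0.01.

```python
import torch

num_inputs, num_outputs = 784, 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

print(W.shape, b.shape)                   # torch.Size([784, 10]) torch.Size([10])
print(W.requires_grad, b.requires_grad)   # True True
print(W.numel() + b.numel())              # 7850 trainable parameters in total
print(float(W.std()))                     # close to 0.01 (varies from run to run)
```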

Defining the Softmax Operation

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting is applied here

This computes the softmax $\hat{y}_{j} = \frac{\exp(o_{j})}{\sum_{k} \exp(o_{k})}$ for every row (sample) of X.

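keepdim=True makes the result shape (batch_size, 1), so dividing the (batch_size, num_classes) matrix by it broadcasts across the columns: each row is divided by its own sum. Without keepdim the shape would be (batch_size,), and broadcasting would align it with the last (class) axis instead, which raises an error when batch_size ≠ num_classes and silently gives wrong results when the two happen to be equal.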
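The sketch below shows the shapes involved with and without keepdim on a tiny 2 × 3 batch. It also adds stable_softmax, a hypothetical helper that is not part of the original code: the hand-written softmax overflows for large logits (exp(1000) is inf in float32), and subtracting each row's maximum before exponentiating avoids that without changing the result.

```python
import torch

X = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])          # batch_size = 2, num_classes = 3
X_exp = torch.exp(X)

print(X_exp.sum(1, keepdim=True).shape)      # torch.Size([2, 1]): one sum per row
print(X_exp.sum(1).shape)                    # torch.Size([2])

try:
    X_exp / X_exp.sum(1)                     # (2, 3) / (2,): (2,) is matched against the class axis
except RuntimeError:
    print("shapes (2, 3) and (2,) do not broadcast")

def stable_softmax(X):
    # Subtracting the row max scales numerator and denominator by the same
    # factor exp(-max), so the result is unchanged, but exp() cannot overflow.
    X = X - X.max(dim=1, keepdim=True).values
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

big = torch.tensor([[1000.0, 1001.0]])
naive = torch.exp(big) / torch.exp(big).sum(1, keepdim=True)
print(naive)                                 # tensor([[nan, nan]]): inf / inf
print(stable_softmax(big))                   # finite probabilities that sum to 1
```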

Defining the Model

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

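This implements the forward pass of the softmax regression model:

flatten each input image into a vector (reshape turns a (batch_size, 1, 28, 28) batch into (batch_size, 784));
compute the logits from the weight matrix W and the bias b, $\mathbf{o}^{(i)}=\mathbf{x}^{(i)}\mathbf{W}+\mathbf{b}$;
use softmax to turn the logits into a probability distribution, $\hat{y}_{j} = \frac{\exp(o_{j})}{\sum_{k} \exp(o_{k})}$;
return, for every sample, the probability of each class.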
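A minimal shape check of the forward pass, assuming a batch of two random 1 × 28 × 28 tensors in place of real images: the output should contain one row of 10 probabilities per sample, each row summing to 1. W, b, softmax and net are repeated so the snippet runs on its own.

```python
import torch

num_inputs, num_outputs = 784, 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def softmax(X):
    X_exp = torch.exp(X)
    return X_exp / X_exp.sum(1, keepdim=True)

def net(X):
    # (batch, 1, 28, 28) -> (batch, 784) -> logits (batch, 10) -> probabilities (batch, 10)
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

X = torch.rand(2, 1, 28, 28)      # two random "images" standing in for real data
y_hat = net(X)
print(y_hat.shape)                # torch.Size([2, 10])
print(y_hat.sum(dim=1))           # each row sums to approximately 1
```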

Defining the Loss Function

def cross_entropy(y_hat, y):
    return - torch.log(y_hat[range(len(y_hat)), y])

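y_hat: the model's predictions (the softmax output), with shape (batch_size, num_classes); each row is a probability distribution.
y: the true labels (ground truth), with shape (batch_size,); each value is a class index (0 to num_classes-1).

range(len(y_hat)) generates the row indices of the batch (e.g. [0, 1, 2, ...]).
y gives the true class of each sample.
y_hat[range(len(y_hat)), y] therefore picks, row by row, the predicted probability of the true class.

The core idea of the cross-entropy loss is to maximize the predicted probability of the true class, which is equivalent to minimizing the negative log-likelihood.
Because the label is one-hot, only the true-class term of $-\sum_{j} y_{j} \log \hat{y}_{j}$ is nonzero, so we take the log of the predicted probability of the true class and negate it: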

$l_i = - \log \big( \hat{y}_{i, y_i} \big)$
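The indexing trick is easiest to see on a tiny hand-made batch of two samples and three classes. The sketch also cross-checks it against Tensor.gather and torch.nn.functional.nll_loss (applied to log-probabilities), which select the same entries.

```python
import torch
import torch.nn.functional as F

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])    # predicted probabilities: 2 samples, 3 classes
y = torch.tensor([0, 2])                   # true class indices

picked = y_hat[range(len(y_hat)), y]       # row 0 -> column 0, row 1 -> column 2
print(picked)                              # tensor([0.1000, 0.5000])

# gather along dim 1 selects the same entries
print(torch.equal(picked, y_hat.gather(1, y.unsqueeze(1)).squeeze(1)))   # True

loss = -torch.log(picked)                  # per-sample losses: -log(0.1), -log(0.5)
print(torch.allclose(loss, F.nll_loss(torch.log(y_hat), y, reduction='none')))   # True
```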

Classification Accuracy

def accuracy(y_hat, y):  #@save
    """计算预测正确的数量"""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

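Counts how many predictions equal the true labels. If y_hat holds per-class probabilities, argmax first turns each row into a predicted class index.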
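A usage sketch on a hand-made batch (values invented for illustration). The predicted classes are [2, 2] while the labels are [0, 2], so exactly one prediction is correct; dividing by the number of samples turns the count into an accuracy.

```python
import torch

def accuracy(y_hat, y):
    """Count correct predictions (same logic as the function above)."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)          # probabilities -> predicted class index
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])
print(accuracy(y_hat, y))             # 1.0
print(accuracy(y_hat, y) / len(y))    # 0.5
```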

def evaluate_accuracy(net, data_iter):  #@save
    """计算在指定数据集上模型的精度"""
    if isinstance(net, torch.nn.Module):
        net.eval()  # 将模型设置为评估模式
    metric = Accumulator(2)  # 正确预测数、预测总数
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

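Evaluates the model's accuracy on a whole dataset (such as the test set): it sums the correct predictions and the sample counts over all batches and returns their ratio. torch.no_grad() turns off gradient tracking, since no parameters are updated during evaluation.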

class Accumulator:  #@save
    """在n个变量上累加"""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

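1. Defining the class

class Accumulator: #@save

2. The constructor __init__

def __init__(self, n):
    self.data = [0.0] * n

__init__ is the class constructor; it runs automatically when you create an object.
The parameter n is the number of variables to accumulate.
self.data is a list of length n whose initial value is [0.0, 0.0, ..., 0.0].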

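3. Adding data: add

def add(self, *args):
    self.data = [a + float(b) for a, b in zip(self.data, args)]

*args accepts any number of values, e.g. acc.add(1, 2, 3).
zip(self.data, args) pairs each old running total with the newly passed value.
a is the old value (from self.data) and b is the new value passed in this call (from args).
a + float(b) adds the new value to the old one.
The result is written back to self.data.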

4. Resetting: reset

def reset(self):
    self.data = [0.0] * len(self.data)

Clears the accumulated results back to [0.0, 0.0, ...].

5. Index access: __getitem__

def __getitem__(self, idx):
    return self.data[idx]

This method lets you read the results by index, just like a list (e.g. metric[0]).
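A usage sketch of Accumulator, tracking (correct predictions, total samples) across two hypothetical batches, much as evaluate_accuracy does. The class is repeated so the snippet runs on its own.

```python
class Accumulator:
    """Accumulate sums over n variables (same as the class above)."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

metric = Accumulator(2)
metric.add(200, 256)           # hypothetical batch 1: 200 of 256 correct
metric.add(180, 256)           # hypothetical batch 2: 180 of 256 correct
print(metric[0], metric[1])    # 380.0 512.0
print(metric[0] / metric[1])   # 0.7421875
metric.reset()
print(metric[0], metric[1])    # 0.0 0.0
```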

Training

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """训练模型一个迭代周期（定义见第3章）"""
    # 将模型设置为训练模式
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, number of correct predictions, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Use a custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the average training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

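1. Inputs

net: the neural network model, either a torch.nn.Module (a standard PyTorch model) or a custom function.
train_iter: the training data iterator (usually a DataLoader), which yields batches (X, y), i.e. inputs and labels.
loss: the loss function. It should return one loss value per sample (for example the cross_entropy defined above, or nn.CrossEntropyLoss(reduction='none')), because the code sums these values to accumulate the total training loss.
updater: the optimizer, either a built-in PyTorch optimizer (such as torch.optim.SGD) or a custom update function.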

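2. Setting training mode

if isinstance(net, torch.nn.Module):
    net.train()

net.train() switches the model to training mode (for example, enabling Dropout and using BatchNorm's training behavior). It only has an effect on torch.nn.Module models; the from-scratch net in this post is a plain function, so the isinstance check skips it.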

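3. Recording metrics

metric = Accumulator(3)

Accumulator(3) accumulates three metrics:
1. the sum of the training loss
2. the total number of correct predictions (used for accuracy)
3. the total number of samples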

4. Iterating over batches

for X, y in train_iter:
    y_hat = net(X)  # forward pass: get predictions
    l = loss(y_hat, y)  # compute the loss

For each batch (X, y), first compute the predictions y_hat, then the loss l.

5. Backpropagation and parameter update

if isinstance(updater, torch.optim.Optimizer):
    # Use PyTorch's built-in optimizer and loss function
    updater.zero_grad()
    l.mean().backward()
    updater.step()
else:
    # Use a custom optimizer and loss function
    l.sum().backward()
    updater(X.shape[0])
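The two branches above backpropagate l.mean() and l.sum() respectively, and the custom updater receives the batch size X.shape[0]. A quick numeric check, using an arbitrary toy loss on random data (not the softmax model), that summing and then dividing the gradient by the batch size gives the same gradient as backpropagating the mean:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
X = torch.randn(8, 3)                      # a hypothetical batch of 8 samples

def per_sample_loss():
    # toy per-sample loss, shape (8,), for illustration only
    return (X @ w) ** 2

# Custom-updater route: backprop the SUM, then divide the gradient by the batch size
per_sample_loss().sum().backward()
grad_sum_div = w.grad / X.shape[0]
w.grad = None                              # clear before the second backward pass

# Built-in-optimizer route: backprop the MEAN directly
per_sample_loss().mean().backward()
grad_mean = w.grad.clone()

print(torch.allclose(grad_sum_div, grad_mean))   # True: both are the average gradient
```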

6. Accumulating metrics

metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())

l.sum(): the total loss of this batch
accuracy(y_hat, y): the number of correctly predicted samples
y.numel(): the number of samples in this batch

7. Returning averages

return metric[0] / metric[2], metric[1] / metric[2]

metric[0] / metric[2] → average training loss
metric[1] / metric[2] → average training accuracy

Plotting

class Animator:  #@save
    """在动画中绘制数据"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Incrementally plot multiple lines
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use a lambda function to capture the arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the figure
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

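The core job of the Animator class is to draw line charts dynamically. In deep learning it is commonly used to show, in real time during training:

the training loss decreasing over the epochs
the accuracy increasing over the epochs

The workflow is:

initialize the figure and the line styles;
after each training epoch, call add(x, y) to append the new data points;
the figure is redrawn automatically, producing an animation effect.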

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """训练模型（定义见第3章）"""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

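1. Function definition

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater): #@save

Parameters:

net: the neural network model.
train_iter: the training data iterator.
test_iter: the test data iterator.
loss: the loss function.
num_epochs: the number of training epochs.
updater: the optimizer (either a built-in PyTorch one or a custom one).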

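2. Initializing the plotting tool

animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                    legend=['train loss', 'train acc', 'test acc'])

Creates an Animator instance that will draw three curves dynamically:
- training loss (train loss)
- training accuracy (train acc)
- test accuracy (test acc)
xlabel='epoch' → the x-axis label is epoch.
xlim=[1, num_epochs] → the x-axis runs from 1 to num_epochs.
ylim=[0.3, 0.9] → fixes the y-axis range; since all three curves share one axis, this range has to fit the loss values as well as the accuracies.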

3. Training loop

for epoch in range(num_epochs):
    train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
    test_acc = evaluate_accuracy(net, test_iter)
    animator.add(epoch + 1, train_metrics + (test_acc,))

Steps:

Train for one epoch
- call train_epoch_ch3, which returns (train_loss, train_acc).
Evaluate on the test set
- call evaluate_accuracy to compute the current model's accuracy on test_iter.
Update the plot
- animator.add() adds this epoch's metrics to the chart:
  - x-axis: epoch + 1 (the loop starts at 0, so 1 is added for display)
  - y-axis: (train_loss, train_acc, test_acc)

4. Checks after training

train_loss, train_acc = train_metrics
assert train_loss < 0.5, train_loss
assert train_acc <= 1 and train_acc > 0.7, train_acc
assert test_acc <= 1 and test_acc > 0.7, test_acc

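These assert statements perform a few simple sanity checks:

the training loss should be below 0.5;
the training accuracy should be greater than 0.7 and at most 1;
the test accuracy should be greater than 0.7 and at most 1.

If any condition fails, the program raises an AssertionError, indicating that the model did not reach the expected performance (for example, it learned nothing or training failed).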

lr = 0.1

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)

num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

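Set the learning rate to 0.1;
update the parameters with the custom updater, which wraps d2l.sgd (a hand-written minibatch stochastic gradient descent);
train the model for 10 epochs;
use cross_entropy as the loss; the training loss, training accuracy, and test accuracy curves are plotted in real time during training.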
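For reference, here is a sketch of the minibatch SGD step that updater wraps. It assumes d2l.sgd follows the book's definition (subtract lr × gradient / batch_size from each parameter, then zero its gradient); check the installed d2l version if the exact behavior matters. The usage example uses a made-up gradient.

```python
import torch

def sgd(params, lr, batch_size):
    """Minibatch SGD sketch, assumed to match d2l.sgd:
    param <- param - lr * grad / batch_size, then clear the gradient."""
    with torch.no_grad():                  # the update itself must not be tracked by autograd
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

# Usage: pretend [4, 8] is a gradient summed over a batch of 4 samples
w = torch.tensor([1.0, 2.0], requires_grad=True)
(w * torch.tensor([4.0, 8.0])).sum().backward()
sgd([w], lr=0.1, batch_size=4)
print(w)        # tensor([0.9000, 1.8000], requires_grad=True)
print(w.grad)   # tensor([0., 0.])
```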

Prediction

def predict_ch3(net, test_iter, n=6):  #@save
    """预测标签（定义见第3章）"""
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true +'\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)

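Take one batch of samples from the test set;
get the true labels and the model's predicted labels;
display the first n images, each titled with the true label (top line) and the predicted label (bottom line).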
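d2l.get_fashion_mnist_labels converts numeric labels into class names. The sketch below is a hypothetical re-implementation that assumes the standard Fashion-MNIST class order, used with made-up true and predicted labels to show how the two-line titles are built.

```python
import torch

def get_fashion_mnist_labels(labels):
    """Hypothetical re-implementation: map class indices to Fashion-MNIST names."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

y_true = torch.tensor([9, 2, 1])   # made-up true labels
y_pred = torch.tensor([9, 6, 1])   # made-up predictions (would come from net(X).argmax(axis=1))
titles = [t + '\n' + p for t, p in zip(get_fashion_mnist_labels(y_true),
                                        get_fashion_mnist_labels(y_pred))]
print(titles)   # ['ankle boot\nankle boot', 'pullover\nshirt', 'trouser\ntrouser']
```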

Complete code

import torch
from IPython import display
from d2l import torch as d2l
# Load the training and test sets
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
# Initialize model parameters
num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)
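# Define the softmax function
def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    # keepdim=True keeps partition at shape (batch_size, 1), so the division
    # broadcasts across the columns and each row is divided by its own sum.
    # Without it the shape would be (batch_size,), which does not broadcast
    # correctly against (batch_size, num_classes).
    return X_exp / partition

# Define the model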
def net(X):
    X_reshape = X.reshape((-1,W.shape[0]))
    return softmax(torch.matmul(X_reshape,W)+b)

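# Define the loss function
def cross_entropy(y_hat, y):
    # With a one-hot label only the true class has y_j = 1 (all others are 0), so
    # -sum_j y_j * log(y_hat_j) reduces to -log(y_hat) at the true class.
    # Here y stores that class index directly instead of the one-hot vector.
    return - torch.log(y_hat[range(len(y_hat)), y])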

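# Classification accuracy
def accuracy(y_hat, y):
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        # X.argmax(dim=0) gives the row index of the maximum in each column;
        # X.argmax(dim=1) gives the column index of the maximum in each row.
        # Taking argmax over dim 1 (the class axis) turns each row into an integer class prediction.
        y_hat = y_hat.argmax(axis=1)
    # Cast the predictions to the same dtype as y, then compare with ==:
    # True where the prediction is correct, False otherwise.
    cmp = y_hat.type(y.dtype) == y
    # Return the number of correct predictions as a float.
    return float(cmp.type(y.dtype).sum())

# Alternative accuracy implementation (suggested by GPT):
# def accuracy_count(y_hat, y):
#     if y_hat.ndim > 1:
#         y_hat = y_hat.argmax(dim=1)
#     return (y_hat == y).sum().item()  # number of correct samples (int)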
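# Compute the model's accuracy on a given dataset
def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy of the model on the given dataset"""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Set the model to evaluation mode
    metric = Accumulator(2)  # Number of correct predictions, total number of predictions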
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

class Accumulator:  #@save
    """在n个变量上累加"""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

# Training
def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """训练模型一个迭代周期（定义见第3章）"""
    # 将模型设置为训练模式
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, number of correct predictions, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Use a custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the average training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

# Plotting
class Animator:  #@save
    """在动画中绘制数据"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Incrementally plot multiple lines
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use a lambda function to capture the arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the figure
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

# Training
def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """训练模型（定义见第3章）"""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

lr = 0.1

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)

num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

# Prediction
def predict_ch3(net, test_iter, n=6):  #@save
    """Predict labels (defined in Chapter 3)"""
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true +'\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)

Concise version

# -*- coding: utf-8 -*-
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# ===== 0) Device selection =====
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# ===== 1) Data loading =====
batch_size = 256
transform = transforms.ToTensor()
train_ds = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_ds  = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_ds, batch_size=batch_size, shuffle=False, num_workers=2, pin_memory=True)

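# ===== 2) Model =====
# No explicit softmax layer: nn.CrossEntropyLoss below applies log-softmax to the raw logits itself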
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 10)
).to(device)

# ===== 3) Loss & optimizer =====
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ===== 4) Evaluation function =====
@torch.no_grad()
def evaluate_accuracy(model, loader):
    model.eval()
    correct, total = 0, 0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        pred = model(X).argmax(dim=1)
        correct += (pred == y).sum().item()
        total   += y.size(0)
    return correct / total

# ===== 5) Training loop + logging =====
num_epochs = 10
history = {"train_loss": [], "train_acc": [], "test_acc": []}

for epoch in range(1, num_epochs + 1):
    model.train()
    running_loss, running_correct, total = 0.0, 0, 0

    for X, y in train_loader:
        X, y = X.to(device, non_blocking=True), y.to(device, non_blocking=True)

        logits = model(X)
        loss = criterion(logits, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss   += loss.item() * y.size(0)
        running_correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.size(0)

    train_loss = running_loss / total
    train_acc  = running_correct / total
    test_acc   = evaluate_accuracy(model, test_loader)

    history["train_loss"].append(train_loss)
    history["train_acc"].append(train_acc)
    history["test_acc"].append(test_acc)

    print(f"Epoch {epoch:02d}/{num_epochs} | "
          f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | Test Acc: {test_acc:.4f}")

# ===== 6) Plot curves =====
epochs = range(1, num_epochs + 1)
plt.figure(figsize=(10,4))

plt.subplot(1,2,1)
plt.plot(epochs, history["train_loss"], 'o-', label="Train Loss")
plt.xlabel("Epoch"); plt.ylabel("Loss"); plt.title("Training Loss"); plt.legend()

plt.subplot(1,2,2)
plt.plot(epochs, history["train_acc"], 'o-', label="Train Acc")
plt.plot(epochs, history["test_acc"], 's--', label="Test Acc")
plt.xlabel("Epoch"); plt.ylabel("Accuracy"); plt.title("Accuracy"); plt.legend()

plt.tight_layout()
plt.show()

# ===== 7) Visualize predictions =====
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

@torch.no_grad()
def show_predictions(model, loader, n=6):
    model.eval()
    X, y = next(iter(loader))
    X_show = X[:n]
    X, y = X.to(device), y.to(device)

    preds = model(X[:n]).argmax(dim=1).cpu()

    plt.figure(figsize=(n * 2, 2.5))
    for i in range(n):
        plt.subplot(1, n, i + 1)
        plt.imshow(X_show[i][0], cmap="gray")
        plt.title(f"T:{classes[y[i].item()]}\nP:{classes[preds[i].item()]}", fontsize=9)
        plt.axis("off")
    plt.tight_layout()
    plt.show()

show_predictions(model, test_loader, n=6)
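Unlike the from-scratch version, this model has no softmax layer. The sketch below, using random logits and labels, checks the two facts that choice relies on: nn.CrossEntropyLoss takes raw logits and equals the batch mean of -log(softmax) at the true class, and argmax over logits gives the same predicted class as argmax over probabilities.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)          # stand-in for raw nn.Linear(784, 10) outputs, 4 samples
y = torch.tensor([3, 0, 9, 1])       # arbitrary labels

# nn.CrossEntropyLoss applies log-softmax internally and averages over the batch
ce = nn.CrossEntropyLoss()(logits, y)
manual = -torch.log(torch.softmax(logits, dim=1)[range(4), y]).mean()
print(torch.allclose(ce, manual, atol=1e-6))          # True

# softmax preserves the ordering within each row, so argmax is unchanged
print(torch.equal(logits.argmax(dim=1),
                  torch.softmax(logits, dim=1).argmax(dim=1)))   # True
```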
