Neural Network Study Notes (8): GRU, the Gated Recurrent Unit (Part 2), Detailed PyTorch Code Walkthrough

Previous	Next
GRU (Part 1)	To be written

Code walkthrough

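For backpropagation, readers who want to dig deeper can look at articles on Zhihu or CSDN. If you only want to use GRUs, it is enough to know that training relies on the chain rule and gradient descent.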
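As a minimal sketch of what "chain rule plus gradient descent" looks like in practice, the snippet below runs one training step on a small nn.GRUCell. The toy input, the all-zero target, and the learning rate are assumptions chosen purely for illustration.

```python
import torch

torch.manual_seed(0)

cell = torch.nn.GRUCell(input_size=4, hidden_size=2)
optimizer = torch.optim.SGD(cell.parameters(), lr=0.1)

x = torch.randn(3, 4)        # toy batch: 3 samples, 4 features each
h0 = torch.zeros(3, 2)       # initial hidden state
target = torch.zeros(3, 2)   # arbitrary toy target

weights_before = cell.weight_ih.detach().clone()

loss = torch.nn.functional.mse_loss(cell(x, h0), target)
optimizer.zero_grad()
loss.backward()              # chain rule: fills .grad for every parameter
optimizer.step()             # gradient descent: p <- p - lr * p.grad

grad_is_set = all(p.grad is not None for p in cell.parameters())
weights_changed = not torch.equal(weights_before, cell.weight_ih.detach())
print(grad_is_set, weights_changed)
```

Autograd handles the derivative bookkeeping, so user code only needs the three calls zero_grad, backward, and step.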

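PyTorch provides two main modules for GRUs, nn.GRUCell and nn.GRU; both are explained in detail below.

They are used much like the corresponding $RNN$ and $LSTM$ modules (in code, the calls are almost identical to those for $RNN$).

If the shape settings in the module demos below are unclear, see the earlier post on $RNN$: Neural Network Study Notes (4): RNN Recurrent Neural Networks (Part 2), RNN Code in PyTorch (CSDN blog)

1) PyTorch module usage

① nn.GRUCell (single-step GRU)

Official docs: GRUCell (PyTorch 2.6 documentation)

When using this module, you have to implement the loop over time steps yourself.

Structure diagram: [figure: GRUCell structure]
Module breakdown: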
```
class torch.nn.GRUCell(input_size, hidden_size, bias=True, device=None, dtype=None)
# Instantiation: grucell = torch.nn.GRUCell(10, 20)
```
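- Constructor parameters:
  - input_size (int): number of features in the input $x$, i.e. the dimension of $x$ (the number of elements in the vector $x$).
  - hidden_size (int): number of features in the hidden state $h$, i.e. the dimension of $h$ (the number of elements in the vector $h$).
  - bias (bool): whether to use bias terms; defaults to True. If False, no bias is used.
- Input types and shapes (input, hidden):
  - input: tensor of shape $(N, H_{in})$ or $(H_{in})$, where $N$ is batch_size and $H_{in}$ = input_size.
  - hidden: tensor of shape $(N, H_{out})$ or $(H_{out})$, where $H_{out}$ = hidden_size. Defaults to a zero tensor if not provided.
- Output type and shape (h_1):
  - hidden: tensor of shape $(N, H_{out})$ or $(H_{out})$, where $H_{out}$ = hidden_size. This output is the hidden state for the next time step. Unlike LSTMCell, there is no cell state, so a single tensor is returned.
- The weights $W$ and biases $b$ are initialized automatically and are learnable.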
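To make the shapes above concrete, here is a small sketch that inspects the parameters of an nn.GRUCell and recomputes one step by hand. It assumes the gate layout documented by PyTorch, where the reset, update, and candidate blocks (r, z, n) are stacked in that order along dimension 0 of weight_ih and weight_hh; the variable names are illustrative only.

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, batch_size = 10, 20, 3
cell = torch.nn.GRUCell(input_size, hidden_size)

x = torch.randn(batch_size, input_size)
h = torch.randn(batch_size, hidden_size)

# The three gates are stacked along dim 0, hence the factor of 3.
print(cell.weight_ih.shape)  # torch.Size([60, 10])
print(cell.weight_hh.shape)  # torch.Size([60, 20])

with torch.no_grad():
    gi = x @ cell.weight_ih.T + cell.bias_ih   # input contributions
    gh = h @ cell.weight_hh.T + cell.bias_hh   # hidden contributions
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)

    r = torch.sigmoid(i_r + h_r)               # reset gate
    z = torch.sigmoid(i_z + h_z)               # update gate
    n = torch.tanh(i_n + r * h_n)              # candidate state
    h_manual = (1 - z) * n + z * h             # next hidden state

    h_builtin = cell(x, h)
    h_default = cell(x)                        # hidden omitted
    h_zeros = cell(x, torch.zeros(batch_size, hidden_size))

matches = torch.allclose(h_manual, h_builtin, atol=1e-5)
default_is_zero = torch.allclose(h_default, h_zeros)
print(matches, default_is_zero)
```

The last two checks confirm that the hand-written step agrees with the module and that omitting hidden is equivalent to passing a zero tensor.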

Full usage example (very similar to calling nn.RNNCell):

For nn.RNNCell usage, see: RNN Recurrent Neural Networks (Part 2).

```python
import torch

batch_size = 2
seq_len = 3
input_size = 4
hidden_size = 2

cell = torch.nn.GRUCell(input_size=input_size, hidden_size=hidden_size)  # instantiate

dataset = torch.randn(seq_len, batch_size, input_size)  # toy data in (seq, batch, features) layout
hidden = torch.zeros(batch_size, hidden_size)  # initial hidden state

# Iterating over dim 0 yields one time step of shape (batch, features) at a time.
for idx, x_t in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('input size:', x_t.shape)
    hidden = cell(x_t, hidden)  # the new hidden state feeds the next step
    print('outputs size:', hidden.shape)
    print(hidden)
```

Output:

```
==================== 0 ====================
input size: torch.Size([2, 4])
outputs size: torch.Size([2, 2])
tensor([[-0.0090,  0.8231],
        [-0.0764,  0.2640]], grad_fn=<AddBackward0>)
==================== 1 ====================
input size: torch.Size([2, 4])
outputs size: torch.Size([2, 2])
tensor([[ 0.0568,  0.4827],
        [-0.0359,  0.2657]], grad_fn=<AddBackward0>)
==================== 2 ====================
input size: torch.Size([2, 4])
outputs size: torch.Size([2, 2])
tensor([[ 0.0796,  0.8472],
        [-0.1967,  0.3379]], grad_fn=<AddBackward0>)
```

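② nn.GRU (the main one)

Official docs: GRU (PyTorch 2.6 documentation)

With this module, there is no need to write the time loop by hand.

It can be thought of as a network assembled from multiple nn.GRUCell steps.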
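The "assembled from GRUCell steps" view can be checked numerically. The sketch below copies the layer-0 parameters of a single-layer nn.GRU into an nn.GRUCell, loops over time by hand, and compares the results. The parameter names weight_ih_l0, weight_hh_l0, bias_ih_l0, and bias_hh_l0 are PyTorch's names for the first layer; everything else is an illustrative choice.

```python
import torch

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 3, 2, 4, 2

gru = torch.nn.GRU(input_size, hidden_size, num_layers=1)
cell = torch.nn.GRUCell(input_size, hidden_size)

with torch.no_grad():
    # Give the cell exactly the same parameters as the GRU's only layer.
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

    inputs = torch.randn(seq_len, batch_size, input_size)

    # Manual time loop with the cell.
    h = torch.zeros(batch_size, hidden_size)
    steps = []
    for x_t in inputs:
        h = cell(x_t, h)
        steps.append(h)
    out_loop = torch.stack(steps)      # (seq_len, batch, hidden)

    # Same computation in a single call.
    out_gru, h_n = gru(inputs)

same_output = torch.allclose(out_loop, out_gru, atol=1e-5)
same_final = torch.allclose(h, h_n[0], atol=1e-5)
print(same_output, same_final)
```

This equivalence only covers the single-layer, unidirectional case; stacked or bidirectional GRUs would need one cell per layer and direction.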

Module breakdown:
```
class torch.nn.GRU(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, device=None, dtype=None)
```
- Constructor parameters:
  - input_size (int): number of features in the input $x$, i.e. the dimension of $x$ (the number of elements in the vector $x$).
  - hidden_size (int): number of features in the hidden state $h$, i.e. the dimension of $h$ (the number of elements in the vector $h$).
  - num_layers (int): number of recurrent layers; defaults to 1. This stacks num_layers GRUs on top of each other: the second GRU takes the hidden-state outputs of the first GRU as its input and computes the final result.
  - bias (bool): whether to use bias terms; defaults to True. If False, no bias is used.
  - batch_first (bool): input/output layout; defaults to False. If True, data must be shaped $(batch\_size, seq\_len, input\_size)$; the default layout is $(seq\_len, batch\_size, input\_size)$.
  - dropout (float): dropout probability in $[0, 1]$; defaults to 0. If non-zero, a Dropout layer is applied to the outputs of every GRU layer except the last, randomly zeroing a fraction of those outputs during training with probability equal to dropout. A non-zero value only has an effect when num_layers > 1.
  - bidirectional (bool): whether to use a bidirectional GRU; defaults to False. If True, the module becomes a bidirectional GRU.
- Input types and shapes (input, h_0); $D$ is usually $1$:
  - input: tensor of shape $(L, N, H_{in})$ or $(L, H_{in})$, where $L$ is seq_len, $N$ is batch_size, and $H_{in}$ = input_size. With batch_first=True the shape is $(N, L, H_{in})$.
  - h_0: tensor of shape $(D * num\_layers, N, H_{out})$ or $(D * num\_layers, H_{out})$, where $H_{out}$ = hidden_size and $D = 2$ if bidirectional=True, otherwise $D = 1$. Defaults to a zero tensor if not provided.
- Output types and shapes (output, h_n); $D$ is usually $1$:
  - output: tensor of shape $(L, N, D * H_{out})$ or $(L, D * H_{out})$. With batch_first=True the shape is $(N, L, D * H_{out})$.
  - h_n: tensor of shape $(D * num\_layers, N, H_{out})$ or $(D * num\_layers, H_{out})$. This output is the hidden state at the final time step.
- The weights $W$ and biases $b$ are initialized automatically and are learnable.

In short, output holds the hidden states of the last layer at every time step, while h_n holds the hidden state at the final time step for every layer (which is where the two differ when num_layers > 1).
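The shape rules above can be verified directly. The following sketch uses arbitrary small sizes (N=3, L=4, H_in=4, H_out=5, chosen only for illustration) with two stacked layers, first unidirectional and then bidirectional, both with batch_first=True.

```python
import torch

torch.manual_seed(0)
N, L, H_in, H_out = 3, 4, 4, 5

# Two stacked layers, unidirectional (D = 1), batch-first input.
gru = torch.nn.GRU(H_in, H_out, num_layers=2, batch_first=True)
x = torch.randn(N, L, H_in)
output, h_n = gru(x)
print(output.shape)  # torch.Size([3, 4, 5])  -> (N, L, D * H_out)
print(h_n.shape)     # torch.Size([2, 3, 5])  -> (D * num_layers, N, H_out)

# The last time step of output is the final hidden state of the LAST layer.
last_step_matches = torch.allclose(output[:, -1, :], h_n[-1])

# Same configuration, but bidirectional (D = 2).
bi_gru = torch.nn.GRU(H_in, H_out, num_layers=2, batch_first=True,
                      bidirectional=True)
bi_output, bi_h_n = bi_gru(x)
print(bi_output.shape)  # torch.Size([3, 4, 10])
print(bi_h_n.shape)     # torch.Size([4, 3, 5])
```

Note that batch_first only changes the layout of input and output; h_n keeps the batch dimension in second position either way.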

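Full usage example (very similar to calling nn.RNN):

Here nn.GRU applies the GRU cell once per time step internally, so the number of cell steps in the forward pass equals seq_len.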

```python
import torch

batch_size = 2
seq_len = 3
input_size = 4
hidden_size = 2
num_layers = 1

gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

# Input layout: (seq_len, batch_size, input_size)
inputs = torch.randn(seq_len, batch_size, input_size)
# Initial hidden state layout: (num_layers, batch_size, hidden_size)
hidden = torch.zeros(num_layers, batch_size, hidden_size)
out, hidden = gru(inputs, hidden)  # the whole sequence is processed in one call
print('output size:', out.shape)
print('output:', out)
print('Hidden size:', hidden.shape)
print('Hidden:', hidden)
```

Output:

```
output size: torch.Size([3, 2, 2])
output: tensor([[[-0.2948, -0.4016],
         [-0.0554, -0.2445]],

        [[-0.4926, -0.5537],
         [-0.1792, -0.4067]],

        [[-0.0627, -0.2873],
         [-0.2236, -0.5261]]], grad_fn=<StackBackward0>)
Hidden size: torch.Size([1, 2, 2])
Hidden: tensor([[[-0.0627, -0.2873],
         [-0.2236, -0.5261]]], grad_fn=<StackBackward0>)
```
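Because the returned hidden state is the state after the last time step, it can be passed back in as the initial state for the next stretch of the same sequence. The sketch below, with arbitrary toy sizes, splits a 6-step sequence into two 3-step chunks and checks that carrying the hidden state across chunks reproduces the single-call result.

```python
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(input_size=4, hidden_size=2)
inputs = torch.randn(6, 2, 4)   # (seq_len=6, batch=2, features=4)

with torch.no_grad():
    # Whole sequence in one call (h_0 defaults to zeros).
    out_full, h_full = gru(inputs)

    # Same sequence in two chunks, carrying the hidden state across.
    out_a, h_a = gru(inputs[:3])
    out_b, h_b = gru(inputs[3:], h_a)

chunks_match = torch.allclose(torch.cat([out_a, out_b]), out_full, atol=1e-5)
final_match = torch.allclose(h_b, h_full, atol=1e-5)
print(chunks_match, final_match)
```

If h_a were dropped and the second chunk started from zeros, the outputs for steps 3 to 5 would generally differ from out_full.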

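2) Single-value sequence prediction example

When building the dataset before training, the usual approach is to use the first m values to predict value m+1: value m+1 serves as the ground truth, and the result the model produces from the first m values is the prediction (these m+1 values together form one sample). If batch_size is not 1, then batch_size samples together form the input for one training iteration (one batch).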
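The windowing scheme described above can be sketched on a toy series with m = 3. The helper name make_windows and the numbers are made up for illustration.

```python
import numpy as np


def make_windows(series, m):
    """Split a 1-D series into (first m values, value m+1) samples."""
    X, y = [], []
    for i in range(len(series) - m):
        X.append(series[i:i + m])   # the m input values
        y.append(series[i + m])     # the value to predict
    return np.array(X), np.array(y)


series = np.array([10, 20, 30, 40, 50, 60])
X, y = make_windows(series, m=3)
print(X)
# [[10 20 30]
#  [20 30 40]
#  [30 40 50]]
print(y)  # [40 50 60]
```

A series of length n yields n - m samples, so six values with m = 3 give three samples.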

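Dataset: monthly passenger counts for an international airline, international-airline-passengers.csv. It is available for download on CSDN; do not modify the file after downloading it.

Goal: use 12 months of data to predict the passenger count for the following month.

Full code (using nn.GRU):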

```python
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import TensorDataset, DataLoader

# 1. Data preprocessing
data = pd.read_csv('international-airline-passengers.csv', usecols=['Month', 'Passengers'])
data['Month'] = pd.to_datetime(data['Month'])
data.set_index('Month', inplace=True)

# 2. Train/test split (first 80% for training, in chronological order)
train_size = int(len(data) * 0.8)
train_data = data[:train_size]
test_data = data[train_size:]

# 3. Normalization (the scaler is fitted on the training set only)
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_data)
test_scaled = scaler.transform(test_data)


# 4. Build the sliding-window dataset
def create_sliding_windows(data, window_size):
    X, Y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])   # window_size consecutive values as input
        Y.append(data[i + window_size])     # the next value as the target
    return np.array(X), np.array(Y)


window_size = 12
X_train, y_train = create_sliding_windows(train_scaled, window_size)
X_test, y_test = create_sliding_windows(test_scaled, window_size)

# Convert to PyTorch tensors shaped (batch_size, seq_len, features)
X_train = torch.FloatTensor(X_train).unsqueeze(-1)  # add a trailing feature dimension
X_train = X_train.squeeze(2) if X_train.dim() == 4 else X_train  # drop the extra dim -> [samples, seq_len, 1]
y_train = torch.FloatTensor(y_train)
X_test = torch.FloatTensor(X_test).unsqueeze(-1)
X_test = X_test.squeeze(2) if X_test.dim() == 4 else X_test
y_test = torch.FloatTensor(y_test)


# 5. Build the GRU model
class AirlinePassengerModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=50, output_size=1):
        super().__init__()
        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=1,  # set the number of layers explicitly
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.gru(x)
        out = self.fc(out[:, -1, :])  # use the output at the last time step
        return out


model = AirlinePassengerModel()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# 6. Train the model
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

epochs = 500
train_loss, val_loss = [], []

for epoch in range(epochs):
    model.train()
    batch_loss = 0
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        y_pred = model(X_batch)
        loss = criterion(y_pred, y_batch)
        loss.backward()
        optimizer.step()
        batch_loss += loss.item()
    train_loss.append(batch_loss / len(train_loader))

    # Validation step (the test set doubles as the validation set here)
    model.eval()
    with torch.no_grad():
        y_val_pred = model(X_test)
        loss = criterion(y_val_pred, y_test)
        val_loss.append(loss.item())

    print(f'Epoch {epoch + 1}/{epochs} | Train Loss: {train_loss[-1]:.4f} | Val Loss: {val_loss[-1]:.4f}')

# 7. Prediction and inverse normalization
model.eval()
with torch.no_grad():
    train_pred = model(X_train).numpy()
    test_pred = model(X_test).numpy()

# Map scaled values back to passenger counts
train_pred = scaler.inverse_transform(train_pred)
y_train = scaler.inverse_transform(y_train.numpy().reshape(-1, 1))
test_pred = scaler.inverse_transform(test_pred)
y_test = scaler.inverse_transform(y_test.numpy().reshape(-1, 1))

# 8. Visualization
# Training and validation loss curves
plt.figure(figsize=(12, 5))
plt.plot(range(1, len(train_loss) + 1), train_loss, 'b-', label='Train Loss')
plt.plot(range(1, len(val_loss) + 1), val_loss, 'r--', label='Validation Loss')
plt.title('Training Process Monitoring\n(2025-03-11)', fontsize=14)
plt.xlabel('Epochs', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.xticks(np.arange(0, len(train_loss) + 1, 10))
plt.grid(True, linestyle='--', alpha=0.7)
plt.legend()
plt.tight_layout()
plt.show()

# Combined plot of the predictions
plt.figure(figsize=(14, 6))

# Original data
plt.plot(data.index, data['Passengers'],
         label='Original Data',
         color='gray',
         alpha=0.4)

# Training-set predictions (the first window_size dates have no prediction)
train_pred_dates = train_data.index[window_size:train_size]
plt.plot(train_pred_dates, train_pred,
         label='Train Predictions',
         color='blue',
         linestyle='--')

# Test-set predictions
test_pred_dates = test_data.index[window_size:]
plt.plot(test_pred_dates, test_pred,
         label='Test Predictions',
         color='red',
         linewidth=2)

# Formatting
plt.title('Time Series Prediction Results', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Passengers', fontsize=12)
plt.legend(loc='upper left')
plt.grid(True, linestyle=':', alpha=0.5)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

Training output:

```
Epoch 1/500 | Train Loss: 0.3435 | Val Loss: 1.1062
Epoch 2/500 | Train Loss: 0.2473 | Val Loss: 0.9233
Epoch 3/500 | Train Loss: 0.1949 | Val Loss: 0.7438
Epoch 4/500 | Train Loss: 0.1224 | Val Loss: 0.5649
Epoch 5/500 | Train Loss: 0.0720 | Val Loss: 0.3903
...
...
Epoch 495/500 | Train Loss: 0.0018 | Val Loss: 0.0162
Epoch 496/500 | Train Loss: 0.0020 | Val Loss: 0.0104
Epoch 497/500 | Train Loss: 0.0016 | Val Loss: 0.0101
Epoch 498/500 | Train Loss: 0.0012 | Val Loss: 0.0132
Epoch 499/500 | Train Loss: 0.0011 | Val Loss: 0.0129
Epoch 500/500 | Train Loss: 0.0013 | Val Loss: 0.0141
```
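One plausible reason the validation loss starts well above the training loss is the scaling step: the MinMaxScaler is fitted on the training portion only, so later values that exceed the training maximum are mapped above 1. The toy numbers below are invented to show the effect and are not taken from the airline dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[100.0], [200.0], [300.0]])
test = np.array([[300.0], [400.0]])   # 400 exceeds the training maximum

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)   # learns min=100, max=300
test_scaled = scaler.transform(test)         # reuses the training min/max

print(train_scaled.ravel())  # [0.  0.5 1. ]
print(test_scaled.ravel())   # [1.  1.5]

# inverse_transform undoes the scaling, including for out-of-range values.
restored = scaler.inverse_transform(test_scaled)
print(restored.ravel())
```

Fitting on the training data alone avoids leaking test-set statistics into training, at the cost of scaled test values that can fall outside the range from 0 to 1.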

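Training and prediction visualization:

Loss curves: [figure: training and validation loss over epochs]

Training and prediction results: [figure: original data with training-set and test-set predictions]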
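Besides the plots, a single error figure in original units can be easier to interpret than the MSE on scaled data. The sketch below defines a hypothetical rmse helper (not part of the original script) and applies it to made-up passenger counts.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays of the same length."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy values shaped like the (samples, 1) arrays produced by inverse_transform.
toy_true = np.array([[400.0], [420.0], [450.0]])
toy_pred = np.array([[410.0], [415.0], [440.0]])
toy_rmse = rmse(toy_true, toy_pred)
print(toy_rmse)
```

In the script above, the same helper could be called as rmse(y_test, test_pred) after the inverse normalization step to express the test error in passengers per month.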
