Weekly Study Report, 2025.3.24-2025.3.30

Abstract

The paper read this week proposes a water quality prediction model that combines an improved bidirectional long short-term memory network (BiLSTM) with a dual-stage attention mechanism. The model splits water quality monitoring data by time scale into short-, medium-, and long-interval sub-models, each capturing features over a different time span. BiLSTM processes each sequence in both the forward and backward directions to capture contextual dependencies in the time series, while the dual-stage attention mechanism dynamically focuses on key features and time steps, reducing interference from noise and redundant information and strengthening the capture of both long-term and short-term dependencies. As a result, the model can capture short-term sudden events, medium-term regular fluctuations, and long-term trends at the same time. According to the paper, it applies to several water quality indicators and remains stable under complex water quality fluctuations, which makes it a useful tool for river management and ecological protection.

1. Literature Reading

This week I read the paper "Analysis of River Management Method Based on Improved Bidirectional Long Short‑Term Memory Network for Water Quality Prediction".
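The paper proposes an improved Bi-LSTM model and combines it with a dual-stage attention mechanism (Dual-Stage Attention, DA) to improve the extraction of key features. The model targets the complex temporal characteristics, nonlinear relationships, and long-term dependencies found in water quality monitoring data.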

1.1 Dual-Stage Attention

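The authors incorporate Dual-Stage Attention to improve the extraction of key features. Dual-Stage Attention was first proposed in the paper "A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction" (https://arxiv.org/abs/1704.02971).
In that paper, Dual-Stage Attention is combined with LSTM to improve the model's performance and accuracy. Its structure is shown in the figure below:
[Figure: structure of the dual-stage attention model]

The idea is to add an attention mechanism at two points: on the input, before the encoder, and on the encoder hidden states, before the decoder produces the output. In the first stage, an input attention mechanism computes attention weights over the input features at every time step, so that the features relevant to the output are extracted. Features receive different weights according to their importance, which lets the LSTM concentrate on those that most affect the prediction. In the second stage, a temporal attention mechanism selects the relevant hidden states across all time steps. This helps the model attend selectively to the historical information that matters most for the current prediction when the sequence is long.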
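The first-stage attention mechanism works as follows:
[Figure: input attention mechanism]
Here h_t is the encoder hidden state at time t. In the first stage, the current input x_t and the previous encoder hidden state h_{t-1} are used to compute the current hidden state h_t. The update can be written as h_t = f₁(h_{t-1}, x_t), where f₁ is a non-linear activation function (an LSTM unit in this case).
To select relevant features adaptively (that is, to give each feature its own weight), the authors introduce an attention mechanism at this point. For the input x_t at each time step, every driving factor is assigned an attention weight α^k_t, which measures the importance of the k-th feature at time t. α^k_t is computed as follows.
First, e^k_t is computed with the following formula:
$$e_t^k = V^\top \tanh\left(W\,[h_{t-1}; s_{t-1}] + U x^k\right)$$
That is, the k-th feature series x^k is linearly combined with the previous hidden state h_{t-1} and cell state s_{t-1} and then passed through tanh; V, W, and U are parameters to be learned. The scores e^k_t are then normalized with softmax to obtain α^k_t.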
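The process can be summarized as follows:

The first-stage attention lets the encoder focus on the important input features rather than treating all features equally, which is the essential role of any attention mechanism.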
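To make the first-stage computation concrete, here is a minimal pure-Python sketch of the scoring and softmax steps described above. It is only an illustration under assumed toy shapes: the function names, the matrix sizes, and all numeric values are hypothetical and are not taken from the paper or from the TensorFlow code in this post.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(x_series, h_prev, s_prev, W, U, V):
    """Return the attention weights alpha_t^k, one per feature.

    x_series: n feature series, each of length T (x^k in the text)
    h_prev, s_prev: previous hidden and cell state, each of length m
    W: T x 2m matrix, U: T x T matrix, V: vector of length T
    """
    state = h_prev + s_prev  # list concatenation, i.e. [h_{t-1}; s_{t-1}]
    # The W term does not depend on the feature, so compute it once
    w_state = [sum(w * z for w, z in zip(row, state)) for row in W]
    scores = []
    for x_k in x_series:
        u_x = [sum(u * x for u, x in zip(row, x_k)) for row in U]
        hidden = [math.tanh(a + b) for a, b in zip(w_state, u_x)]
        scores.append(sum(v * h for v, h in zip(V, hidden)))  # e_t^k
    return softmax(scores)


def apply_weights(alpha, x_t):
    """Scale the input at time t feature by feature."""
    return [a * x for a, x in zip(alpha, x_t)]


# Toy example (made-up values): n = 3 features, T = 2 steps, m = 1 hidden unit
x_series = [[0.1, 0.2], [1.0, 1.5], [0.1, 0.2]]
h_prev, s_prev = [0.0], [0.0]
W = [[0.5, 0.5], [0.5, 0.5]]
U = [[1.0, 0.0], [0.0, 1.0]]
V = [1.0, 1.0]
alpha = input_attention(x_series, h_prev, s_prev, W, U, V)
x_t = [series[-1] for series in x_series]  # inputs at the last time step
weighted_x_t = apply_weights(alpha, x_t)
```

With these toy parameters the second feature has the largest values and therefore receives the largest weight, while the two identical features receive equal weights. In a trained model the weights depend on the learned W, U, and V rather than on magnitude alone.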
The encoder stage is implemented as follows:

def _Encode(self, encode_input=None):  # encode_input: [-1, time_step, input_dim]
        x_k = tf.transpose(encode_input, perm=[0, 2, 1], name='Series_of_length_TIME_STEP')#[-1,input_dim,time_step]
        encode_time_step_hidden = []
        for t in range(encode_input.get_shape()[1]):  # [t < time_step]
            e_t = self._attention_layer(_h_t_1 = self.encode_hidden,
                                        _s_t_1 = self.encode_cell,
                                        _x_k   = x_k,
                                        _We    = self._weights['Input_attention_layer_We'],
                                        _Ue    = self._weights['Input_attention_layer_Ue'],
                                        _Ve    = self._weights['Input_attention_layer_Ve'], )
            a_t = tf.nn.softmax(e_t)  # [-1, input_dim]
            tmp = tf.reshape(encode_input[:, t, :], shape=[-1, encode_input.get_shape().as_list()[-1]])
            x_t = tf.multiply(a_t, tmp)
            (self.encode_cell, self.encode_hidden) = self._LSTMCell(h_t_1 = self.encode_hidden,
                                                                    s_t_1 = self.encode_cell,
                                                                    x_t   = x_t,
                                                                    name  = 'Encode_LSTMCell')
            encode_time_step_hidden.append(self.encode_hidden)
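        # Stack on axis 1 so the result is [-1, time_step, cell_dim]; stacking on axis 0 and
        # reshaping would mix time steps from different samples in the batch
        return tf.reshape(tf.stack(encode_time_step_hidden, axis=1), [-1, self.TIME_STEP, self.DECODE_CELL])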

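The second-stage temporal attention mechanism works as follows:
[Figure: temporal attention mechanism]
The decoder attention in the second stage is designed like the attention in a traditional seq2seq model; in other words, temporal attention is essentially the conventional attention mechanism. The process can be summarized as follows: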
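As a rough illustration of the temporal attention step, the sketch below normalizes one relevance score per encoder time step with softmax and uses the resulting weights to form a context vector as a weighted sum of the encoder hidden states. The dot-product scoring function is a stand-in chosen for brevity, not the scoring used in the Dual-Stage Attention paper, and all names and values are hypothetical.

```python
import math


def temporal_context(encoder_states, scores):
    """Return the temporal attention weights and the context vector.

    encoder_states: T hidden states, each a list of floats
    scores: T unnormalized relevance scores, one per time step
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]  # attention weight of each time step
    dim = len(encoder_states[0])
    context = [
        sum(b * h[j] for b, h in zip(beta, encoder_states))
        for j in range(dim)
    ]
    return beta, context


def dot_score(decoder_state, encoder_state):
    """Placeholder scoring function (dot product), used only for illustration."""
    return sum(d * h for d, h in zip(decoder_state, encoder_state))


# Toy example with made-up values: T = 3 time steps, hidden size 2
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
scores = [dot_score(decoder_state, h) for h in encoder_states]
beta, context = temporal_context(encoder_states, scores)
```

The context vector plays the role of c_t in the decoder code: it is concatenated with the previous target value before the decoder LSTM update.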

The decoder stage is implemented as follows:

def _Decode(self, decode_input=None, y_t=None):
        for t in range(decode_input.get_shape()[1]-1):
            l_t = self._attention_layer(_h_t_1 = self.decode_hidden,
                                        _s_t_1 = self.decode_cell,
                                        _x_k   = decode_input,
                                        _We    = self._weights['Temporal_attention_layer_Wd'],
                                        _Ue    = self._weights['Temporal_attention_layer_Ud'],
                                        _Ve    = self._weights['Temporal_attention_layer_Vd'], )
            b_t = tf.reshape(tf.nn.softmax(l_t), shape=[-1, decode_input.get_shape().as_list()[1], 1])  # [-1, time_step, 1]
            c_t = tf.reduce_sum(tf.multiply(b_t, decode_input), axis=1)  # [-1, time_step, 1]*[-1, time_step, cell_dim]
                                                                         # ---> [-1, time_step, cell_dim]-->[-1, cell_dim]
            y_t_ = self._Dense(_input       = tf.concat([c_t, tf.reshape(y_t[:, t], [-1, 1])], axis=1),
                               _weights     = self._weights['Decode_layer_yt_weights'],
                               _bias        = self._weights['Decode_layer_yt_bias'],
                               _activation  = None,
                               _dtype       = tf.float32,
                               _is_bias     = True, )
            (self.decode_cell, self.decode_hidden) = self._LSTMCell(h_t_1 = self.decode_hidden,
                                                                    s_t_1 = self.decode_cell,
                                                                    x_t   = y_t_,
                                                                    name  = 'Decode_LSTMCell')
        pre_y_ = self._Dense(_input       = tf.concat([self.decode_hidden, self.decode_cell], axis=1),
                             _weights     = self._weights['Decode_layer_output_1_weights'],
                             _bias        = self._weights['Decode_layer_output_1_bias'],
                             _activation  = None,
                             _dtype       = tf.float32,
                             _is_bias     = True, )
        pre_y = self._Dense(_input       = pre_y_,
                            _weights     = self._weights['Decode_layer_output_2_weights'],
                            _bias        = self._weights['Decode_layer_output_2_bias'],
                            _activation  = None,
                            _dtype       = tf.float32,
                            _is_bias     = True, )
        return pre_y

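1.2 Overall Structure

The overall architecture of the proposed model is shown below:
[Figure: overall architecture of the proposed model]
The model's first innovation is the fusion of multi-time-scale sub-models. The water quality monitoring data is split by time scale into three sub-models, each handling features over a different time span:
Short-interval sub-model: focuses on short-term trends and features in the water quality data, such as sudden pollution events or changes in water quality after rainfall.
Medium-interval sub-model: analyzes changes at a medium time scale (for example, weekly to monthly trends) and captures regular fluctuations and seasonal features.
Long-interval sub-model: focuses on long-term trends, identifying long-term pollution events and their lasting impact on water quality.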
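One simple way to prepare inputs for the three sub-models is to cut three history windows of different lengths that all end at the same step. The sketch below assumes window lengths of 10, 30, and 90 steps, which match the reshapes in the reproduction code in Section 1.4 but are not confirmed by the paper; the function name and the toy data are hypothetical.

```python
def build_windows(series, end, lengths=(10, 30, 90)):
    """Return the short, medium, and long history windows ending just before `end`.

    series: list of per-step records (each record may hold several features)
    end: index of the step to predict
    lengths: window sizes; the defaults are illustrative assumptions
    """
    longest = max(lengths)
    if end < longest or end > len(series):
        raise ValueError("not enough history for the longest window")
    return tuple(series[end - n:end] for n in lengths)


# Toy series of 100 steps with one feature per step
series = [[float(i)] for i in range(100)]
short_w, medium_w, long_w = build_windows(series, end=95)
```

All three windows end at step 94, so each sub-model sees the same most recent observation but a different amount of history.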
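The features extracted by the three sub-models are fed into the BiLSTM model and passed through the forward and backward LSTM layers to produce each sub-model's output. The outputs of the three sub-models are then combined by weighted fusion to give the final prediction. The fusion formula is shown below:
[Formula: weighted fusion of the three sub-model outputs]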
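Since the fusion formula itself is not reproduced here, the following is only a sketch of what a weighted fusion can look like, assuming a plain weighted sum with fixed, normalized weights. The weight values and the function name are hypothetical. The reproduction code in Section 1.4 takes a different route and learns the fusion with linear layers.

```python
def fuse(short_pred, medium_pred, long_pred, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three sub-model predictions; weights are normalized first."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    w_s, w_m, w_l = (w / total for w in weights)
    return w_s * short_pred + w_m * medium_pred + w_l * long_pred


# 0.5 * 1.0 + 0.3 * 2.0 + 0.2 * 4.0
prediction = fuse(1.0, 2.0, 4.0)
```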

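To overcome BiLSTM's limitations in handling noise, redundant information, and long sequences, the model introduces Dual-Stage Attention, whose structure is shown below:
[Figure: structure of the Dual-Stage Attention module]
It consists of two steps, feature attention and time attention, which correspond to the two attention stages of Dual-Stage Attention. They are explained in Section 1.1, so they are not covered in detail again here.
The workflow of the two attention stages, as described in the paper, is as follows:
feature attention:
[Figure: feature attention workflow]
It assigns a weight to each input feature, highlighting the features that matter for the prediction and reducing interference from redundant information.
time attention:

It dynamically adjusts the weight of each time step in the series to capture long-term and short-term dependencies. By highlighting the key time steps, it makes the model more sensitive to trends in water quality.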

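1.3 Experimental Analysis

(1) Dataset
The experiments use monitoring data from a river covering June 2021 to December 2022. Data from June 2021 to June 2022 serves as the training set, and data from July 2022 to December 2022 serves as the test set. In addition, the data is divided into short-, medium-, and long-term time scales.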
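A chronological split like the one described above can be sketched with the standard library alone. The record layout, the function name, and the sample readings below are made up for illustration; only the July 2022 boundary comes from the text.

```python
from datetime import date


def chronological_split(records, test_start):
    """Split (date, value) records so that everything before test_start is training data."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < test_start]
    test = [r for r in ordered if r[0] >= test_start]
    return train, test


# Hypothetical readings; the values are invented for this example
records = [
    (date(2021, 6, 1), 7.1),
    (date(2022, 6, 1), 6.8),
    (date(2022, 7, 1), 6.9),
    (date(2022, 12, 1), 7.3),
]
train, test = chronological_split(records, test_start=date(2022, 7, 1))
```

Splitting by date rather than at random keeps every test observation later than every training observation, which avoids leaking future information into training.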
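(2) Evaluation metrics
Mean absolute error (MAE): the average absolute deviation between predicted and actual values.
Mean absolute percentage error (MAPE): the relative error of the predictions.
Mean squared error (MSE): emphasizes the effect of larger errors.
Root mean squared error (RMSE): the square root of MSE, expressed in the same unit as the target.
The experiments use a conventional LSTM, IPSO-SVR, and CNN-LSTM as baselines for comparison.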
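The four metrics above follow their standard definitions and can be computed as in this small sketch. The function name and the sample values are hypothetical, and MAPE is reported in percent.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (in percent), MSE, and RMSE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero (division by zero)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse)}


# Errors are -1.0 and 2.0: MAE = 1.5, MAPE = 50.0, MSE = 2.5
metrics = regression_metrics([2.0, 4.0], [1.0, 6.0])
```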
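(3) Results
The experiments predict dissolved oxygen (DO), chemical oxygen demand (COD), pH, and NH3-N.
The pH prediction results are shown below:
[Figure: pH prediction results]
The loss for DO concentration prediction is shown below:

The loss for COD prediction is shown below:
The experiments show that the proposed DA-BiLSTM model, by splitting water quality data into short-, medium-, and long-term sub-models, captures the variation at each time scale effectively. This layered design lets it perform well when predicting short-term sudden events (such as rainfall or pollutant discharge), medium-term regular fluctuations (such as seasonal changes), and long-term trends (such as ecological evolution).
To further validate the model, the authors compare DA-BiLSTM with the three baseline models. Two different sections of the river, A and B, are selected for NH3-N prediction.
Point A focuses on short-term data, mainly reflecting the impact on water quality of unexpected events such as rainfall and sewage discharge. It represents timely monitoring and response and captures the dynamics of rapidly changing water quality. The results are as follows:
[Figure: NH3-N prediction results at point A]
The results show that the improved model has strong predictive ability and captures the variation of NH3-N concentration effectively. This makes it a useful tool for water quality (WQ) monitoring and management, and in practice it can reflect the WQ state more accurately.
Point B focuses on medium- and long-term data, reflecting seasonal fluctuations and gradual changes in environmental conditions, and represents the identification of aquatic ecological evolution and long-term pollution trends. The prediction results are as follows:
[Figure: NH3-N prediction results at point B]
The results show that the improved model identifies the distribution of the NH3-N data well and gives more stable predictions for river sections with large WQ fluctuations.
The remaining experimental results, not shown above, are summarized in the figure below: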

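1.4 Paper Reproduction

Below is the full source code of the experimental model, which reproduces the NH3-N prediction. Besides the main model code, it includes the data loading and model training code, which is not discussed in detail here.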

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

# Feature attention: weights the feature channels at every time step
class FeatureAttention(nn.Module):
    def __init__(self, hidden_dim):
        super(FeatureAttention, self).__init__()
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, hidden_dim)  # one score per feature channel
        )

    def forward(self, lstm_output):  # [batch, time_step, hidden_dim]
        attention_scores = self.attention(lstm_output)
        # Normalize over the feature dimension, keeping the time dimension
        attention_weights = torch.softmax(attention_scores, dim=-1)
        return lstm_output * attention_weights  # [batch, time_step, hidden_dim]

# Time attention: weights the time steps and pools them into one vector
class TimeAttention(nn.Module):
    def __init__(self, hidden_dim):
        super(TimeAttention, self).__init__()
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1)
        )

    def forward(self, feature_attention_output):  # [batch, time_step, hidden_dim]
        attention_scores = self.attention(feature_attention_output)  # [batch, time_step, 1]
        # Normalize over time (dim=1); dim=0 would normalize across the batch
        attention_weights = torch.softmax(attention_scores, dim=1)
        weighted_output = feature_attention_output * attention_weights
        return torch.sum(weighted_output, dim=1)  # [batch, hidden_dim]

# Sub-model: BiLSTM followed by feature attention and time attention
class SubModel(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(SubModel, self).__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.feature_attention = FeatureAttention(hidden_dim * 2)
        self.time_attention = TimeAttention(hidden_dim * 2)
        self.fc = nn.Linear(hidden_dim * 2, 16)  # reduce the output dimension
        self.dropout = nn.Dropout(0.4)
        self.residual = nn.Linear(input_dim, 16)  # residual connection
    
    def forward(self, x):
        lstm_output, _ = self.bilstm(x)
        feature_att_output = self.feature_attention(lstm_output)
        time_att_output = self.time_attention(feature_att_output)
        output = self.fc(time_att_output)
        residual = self.residual(x[:, -1, :])  # use the last time step
        return self.dropout(output + residual)

# Main model: fuses the short, medium, and long sub-models
class MainModel(nn.Module):
    def __init__(self, short_input_dim, medium_input_dim, long_input_dim, hidden_dim):
        super(MainModel, self).__init__()
        self.short_model = SubModel(short_input_dim, hidden_dim)
        self.medium_model = SubModel(medium_input_dim, hidden_dim)
        self.long_model = SubModel(long_input_dim, hidden_dim)
        self.fc = nn.Linear(16 * 3, 32)  # fusion layer
        self.dropout = nn.Dropout(0.5)
        self.output_layer = nn.Linear(32, 1)
    
    def forward(self, short_x, medium_x, long_x):
        short_output = self.short_model(short_x)
        medium_output = self.medium_model(medium_x)
        long_output = self.long_model(long_x)
        concatenated = torch.cat((short_output, medium_output, long_output), dim=1)
        fused = self.fc(concatenated)
        fused = self.dropout(fused)
        output = self.output_layer(fused)
        return output

# Load data
def load_data():
    short_data = pd.read_csv('short_data_new.csv').values.reshape(-1, 10, 5)
    medium_data = pd.read_csv('medium_data_new.csv').values.reshape(-1, 30, 5)
    long_data = pd.read_csv('long_data_new.csv').values.reshape(-1, 90, 5)
    labels = pd.read_csv('labels_new.csv').values
    
    short_data = torch.tensor(short_data, dtype=torch.float32)
    medium_data = torch.tensor(medium_data, dtype=torch.float32)
    long_data = torch.tensor(long_data, dtype=torch.float32)
    labels = torch.tensor(labels, dtype=torch.float32)
    
    return short_data, medium_data, long_data, labels

# Main function
def main():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f'Using device: {device}')

    # Load data
    short_data, medium_data, long_data, labels = load_data()
    dataset = TensorDataset(short_data, medium_data, long_data, labels)

    # K-fold cross-validation
    kfold = KFold(n_splits=3, shuffle=True, random_state=42)
    num_epochs = 300
    best_mae, best_mse, best_rmse = float('inf'), float('inf'), float('inf')
    best_predictions, best_true_labels = None, None

    for fold, (train_idx, val_idx) in enumerate(kfold.split(dataset)):
        print(f'\nFold {fold + 1}/3')
        train_sub = torch.utils.data.Subset(dataset, train_idx)
        val_sub = torch.utils.data.Subset(dataset, val_idx)
        train_loader = DataLoader(train_sub, batch_size=64, shuffle=True)
        val_loader = DataLoader(val_sub, batch_size=64)

        # Create the model
        model = MainModel(short_input_dim=5, medium_input_dim=5, long_input_dim=5, hidden_dim=32).to(device)
        optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)  # stronger regularization
        scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
        criterion = nn.MSELoss()

        # Train the model
        train_losses = []
        val_losses = []
        best_val_loss = float('inf')

        for epoch in range(num_epochs):
            model.train()
            train_loss = 0
            for batch in train_loader:
                short_x, medium_x, long_x, y = [b.to(device) for b in batch]
                optimizer.zero_grad()
                output = model(short_x, medium_x, long_x)
                loss = criterion(output, y)
                loss.backward()
                optimizer.step()
                train_loss += loss.item()
            train_loss /= len(train_loader)
            train_losses.append(train_loss)

            model.eval()
            val_loss = 0
            val_preds, val_true = [], []
            with torch.no_grad():
                for batch in val_loader:
                    short_x, medium_x, long_x, y = [b.to(device) for b in batch]
                    output = model(short_x, medium_x, long_x)
                    val_loss += criterion(output, y).item()
                    val_preds.append(output.cpu().numpy())
                    val_true.append(y.cpu().numpy())
            val_loss /= len(val_loader)
            val_losses.append(val_loss)

            scheduler.step()
            print(f'Epoch {epoch+1}, Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}, LR: {optimizer.param_groups[0]["lr"]:.6f}')

            if val_loss < best_val_loss:
                best_val_loss = val_loss
                torch.save(model.state_dict(), f'best_model_fold{fold}.pth')

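        # Evaluate this fold's validation set with the best checkpoint
        model.load_state_dict(torch.load(f'best_model_fold{fold}.pth', weights_only=True))
        model.eval()
        # Recompute the predictions: the lists filled during training hold the
        # last epoch's outputs, not those of the best checkpoint
        val_preds, val_true = [], []
        with torch.no_grad():
            for batch in val_loader:
                short_x, medium_x, long_x, y = [b.to(device) for b in batch]
                val_preds.append(model(short_x, medium_x, long_x).cpu().numpy())
                val_true.append(y.cpu().numpy())
        val_preds = np.concatenate(val_preds)
        val_true = np.concatenate(val_true)
        mae = mean_absolute_error(val_true, val_preds)
        mse = mean_squared_error(val_true, val_preds)
        rmse = np.sqrt(mse)
        print(f'Fold {fold + 1} validation metrics: MAE: {mae:.4f}, MSE: {mse:.4f}, RMSE: {rmse:.4f}')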

        if mae < best_mae:
            best_mae, best_mse, best_rmse = mae, mse, rmse
            with torch.no_grad():
                predictions = model(short_data.to(device), medium_data.to(device), long_data.to(device)).cpu().numpy()
                best_predictions, best_true_labels = predictions, labels.numpy()

    # Report the best result
    print(f'\nBest validation metrics:')
    print(f'MAE: {best_mae:.4f}')
    print(f'MSE: {best_mse:.4f}')
    print(f'RMSE: {best_rmse:.4f}')

    # Plot the loss curves (last fold)
    plt.figure(figsize=(10, 5), dpi=300)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.title('Training and Validation Loss (Last Fold)', fontsize=16)
    plt.xlabel('Epoch', fontsize=12)
    plt.ylabel('Loss', fontsize=12)
    plt.legend(fontsize=10)
    plt.grid(True)
    plt.savefig('loss_curve.png', dpi=300, bbox_inches='tight')
    plt.show(block=True)

    # Plot the predictions
    plt.figure(figsize=(12, 6), dpi=300)
    plt.plot(best_true_labels[:100], label='Actual', linewidth=1.5)
    plt.plot(best_predictions[:100], label='Predicted', linewidth=1.5)
    plt.title('Actual vs. Predicted (First 100 Samples)', fontsize=16)
    plt.xlabel('Sample', fontsize=12)
    plt.ylabel('Value', fontsize=12)
    plt.legend(fontsize=10)
    plt.grid(True)
    plt.tight_layout()
    plt.savefig('prediction_comparison.png', dpi=300, bbox_inches='tight')
    plt.show(block=True)

if __name__ == '__main__':
    main()