Computational Complexity Analysis of CNN and LSTM

Published: 2025-03-28

Preface: While profiling a model for edge computing today, I found that most of the NPU's compute time was spent on the LSTM: the bidirectional LSTM (Bi-LSTM) accounted for about 98% of the runtime, while the CNN layers took very little time. That made me wonder why the LSTM takes so long.

First, the experimental setup: the input is a vibration signal of length 1024, with a single input channel and a batch size of 1.

1. CNN computational complexity formula

Let the kernel size be K x K, the number of input channels C_in, the number of output channels C_out, and the input size W x H.

Complexity of the convolution operation: O(K*K * C_in * C_out * W * H)

Example: my first convolution layer has 1 input channel and 32 output channels with a 1x3 kernel; to keep the output length equal to the input length, padding = (k-1)/2 = 1.

Input shape: 1*1*1024 (batch size, channels, length)

Output shape: 1*32*1024

Computational complexity: 1*32*3*1024 = 98304
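To make the arithmetic explicit, here is a minimal sketch (not the author's code) that evaluates the formula above for the 1-D case; the function name and argument names are illustrative.

```python
def conv1d_macs(c_in, c_out, kernel_size, length):
    """Multiply-accumulates for a 1-D convolution with stride 1 and 'same' padding:
    K * C_in * C_out * L (the 1-D case of K*K * C_in * C_out * W * H)."""
    return kernel_size * c_in * c_out * length

# First convolution layer of the example: 1 -> 32 channels, kernel 3, length 1024
print(conv1d_macs(1, 32, 3, 1024))  # 98304, i.e. 1*32*3*1024
```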

2. LSTM computational complexity formula

Assume the LSTM hidden size is H, the input size is I, and the number of time steps is T.

The per-time-step complexity is O(I * H + H^2) (matrix multiplications plus activations; the constant factor of 4 from the gates is absorbed into the big-O).

The total LSTM complexity is O(T * (I * H + H^2)).

Example: the input size is the channel count of the previous CNN stage's output, 128; the hidden size is set to 128; the number of time steps is the sequence length, 128.

Complexity: 128 * (128*128 + 128*128) = 4194304
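The same arithmetic for the LSTM, as a minimal sketch under the formula above; the factor of 4 from the input, forget, cell and output gates is deliberately ignored, matching the big-O estimate (the function name is illustrative).

```python
def lstm_macs(input_size, hidden_size, time_steps):
    """MACs for a single-direction LSTM pass per the formula T * (I*H + H^2),
    ignoring the constant factor of 4 from the gates."""
    return time_steps * (input_size * hidden_size + hidden_size * hidden_size)

print(lstm_macs(128, 128, 128))  # 4194304
```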

Ratio: 4194304 / (1*32*3*1024) = 4194304 / 98304 ≈ 43, i.e. one LSTM direction costs roughly 43 times as much as the first convolution layer.

Since this is a bidirectional (two-direction) LSTM, the cost doubles: 43 * 2 ≈ 86 times the first convolution layer, which matches expectations. In practice the LSTM takes even longer than this ratio suggests; my guess is that the NPU's compute optimizations are simply better for CNN-style operators. A quick sanity check of the ratio is sketched below, followed by the complete network structure.
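The ratio check, reusing the two hypothetical helpers sketched above; the exact value is about 85, which is consistent with the rounded 43 * 2 ≈ 86 estimate.

```python
conv1 = conv1d_macs(1, 32, 3, 1024)    # 98304
bilstm = 2 * lstm_macs(128, 128, 128)  # two directions: 8388608
print(bilstm / conv1)                  # ≈ 85.3, in line with 43 * 2
```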

Layer: CNN_LSTM_Model
  Input shapes: [torch.Size([32, 1, 1024])]
  Output shape: torch.Size([32, 10])
Layer: Conv1d
  Input shapes: [torch.Size([32, 1, 1024])]
  Output shape: torch.Size([32, 32, 1024])
Layer: ReLU
  Input shapes: [torch.Size([32, 32, 1024])]
  Output shape: torch.Size([32, 32, 1024])
Layer: Conv1d
  Input shapes: [torch.Size([32, 32, 1024])]
  Output shape: torch.Size([32, 32, 1024])
Layer: ReLU
  Input shapes: [torch.Size([32, 32, 1024])]
  Output shape: torch.Size([32, 32, 1024])
Layer: MaxPool1d
  Input shapes: [torch.Size([32, 32, 1024])]
  Output shape: torch.Size([32, 32, 512])
Layer: Conv1d
  Input shapes: [torch.Size([32, 32, 512])]
  Output shape: torch.Size([32, 64, 512])
Layer: ReLU
  Input shapes: [torch.Size([32, 64, 512])]
  Output shape: torch.Size([32, 64, 512])
Layer: MaxPool1d
  Input shapes: [torch.Size([32, 64, 512])]
  Output shape: torch.Size([32, 64, 256])
Layer: Conv1d
  Input shapes: [torch.Size([32, 64, 256])]
  Output shape: torch.Size([32, 128, 256])
Layer: ReLU
  Input shapes: [torch.Size([32, 128, 256])]
  Output shape: torch.Size([32, 128, 256])
Layer: MaxPool1d
  Input shapes: [torch.Size([32, 128, 256])]
  Output shape: torch.Size([32, 128, 128])
Layer: Sequential
  Input shapes: [torch.Size([32, 1, 1024])]
  Output shape: torch.Size([32, 128, 128])
Layer: LSTM
  Input shapes: [torch.Size([32, 128, 128]), <class 'tuple'>]
  Output shapes: [torch.Size([32, 128, 256]), <class 'tuple'>]
Layer: Linear
  Input shapes: [torch.Size([32, 128, 256])]
  Output shape: torch.Size([32, 128, 256])
Layer: Attention
  Input shapes: [torch.Size([32, 128]), torch.Size([32, 128, 256])]
  Output shape: torch.Size([32, 1, 128])
Layer: LayerNorm
  Input shapes: [torch.Size([32, 256])]
  Output shape: torch.Size([32, 256])
Layer: ResidualConnection
  Input shapes: [torch.Size([32, 256]), <class 'function'>]
  Output shape: torch.Size([32, 256])
Layer: Linear
  Input shapes: [torch.Size([32, 256])]
  Output shape: torch.Size([32, 500])
Layer: ReLU
  Input shapes: [torch.Size([32, 500])]
  Output shape: torch.Size([32, 500])
Layer: Dropout
  Input shapes: [torch.Size([32, 500])]
  Output shape: torch.Size([32, 500])
Layer: Linear
  Input shapes: [torch.Size([32, 500])]
  Output shape: torch.Size([32, 10])
Layer: Sequential
  Input shapes: [torch.Size([32, 256])]
  Output shape: torch.Size([32, 10])
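The per-layer trace above looks like the output of PyTorch forward hooks. Below is a minimal sketch of how such a shape trace can be produced; this is an assumption about the tooling, not the author's actual script, and `register_shape_hooks` is a hypothetical helper (the exact print order of container modules may differ from the trace above).

```python
import torch

def register_shape_hooks(model):
    """Attach forward hooks that print each module's input/output shapes,
    in the same spirit as the trace above (a sketch, not the author's tooling)."""
    def fmt(x):
        # Tensors are reported by shape, anything else (e.g. the LSTM state tuple) by type.
        return x.shape if torch.is_tensor(x) else type(x)

    def hook(module, inputs, output):
        out = [fmt(o) for o in output] if isinstance(output, tuple) else fmt(output)
        print(f"Layer: {module.__class__.__name__}")
        print(f"  Input shapes: {[fmt(i) for i in inputs]}")
        print(f"  Output shape{'s' if isinstance(out, list) else ''}: {out}")

    return [m.register_forward_hook(hook) for m in model.modules()]

# Usage (assuming `model` is the CNN_LSTM_Model instance from the trace):
# handles = register_shape_hooks(model)
# model(torch.randn(32, 1, 1024))
# for h in handles:
#     h.remove()
```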