CNN Project Implementation
Project documentation covering the three core elements: data, model, and training.
1. Project Overview
1.1 Project Name
Fine-grained classification of playing-card suits with a CNN
1.2 Project Description
This project aims to perform fine-grained classification of playing cards with a convolutional neural network (CNN), focusing on fine-grained visual recognition of the different suits. The categories are numerous and visually very close, so traditional image-classification methods struggle to identify them reliably. By introducing a deep-learning CNN model, the project builds a network architecture dedicated to fine-grained card classification, exploiting the local feature extraction of convolutional layers and the high-level feature fusion of deeper networks. The work covers data preprocessing, feature extraction, model training, and evaluation; it trains on a playing-card image dataset and improves classification accuracy through fine-grained feature learning. The project also explores data augmentation and transfer learning to improve generalization and robustness, with the final goal of high-accuracy fine-grained classification of playing-card suits.
2. Data
Public datasets plus data crawled ourselves.
2.1 Public Datasets
Recommended: browse https://www.kaggle.com/datasets
If you really can't find anything there, fall back to Baidu.
2.2 Crawling Your Own Data
Use the requests module, for example:
比如:https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E4%B8%9B%E6%9E%97%E9%BA%BB%E9%9B%80
Then clean and organize the downloaded data; a minimal download sketch follows.
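A minimal sketch of the download step, assuming you already have a list of image URLs; the URL list, header, and save directory here are placeholders, not the exact crawler used in the project:
import os
import requests

def download_images(urls, save_dir='raw_images'):
    os.makedirs(save_dir, exist_ok=True)
    headers = {'User-Agent': 'Mozilla/5.0'}  # many sites reject the default UA
    for i, url in enumerate(urls):
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            resp.raise_for_status()
            with open(os.path.join(save_dir, f'img_{i}.jpg'), 'wb') as f:
                f.write(resp.content)
        except requests.RequestException as e:
            print(f'skip {url}: {e}')  # clean as you crawl: drop failed downloads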
2.3 Data Augmentation
Augmentation improves the model's generalization ability and robustness.
transform = transforms.Compose(
    [
        transforms.RandomHorizontalFlip(),  # random horizontal flip
        transforms.RandomRotation(10),      # random rotation within ±10 degrees
        transforms.RandomResizedCrop(
            32, scale=(0.8, 1.0)
        ),  # random crop to 32x32 with scale between 0.8 and 1.0
        transforms.ColorJitter(
            brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1
        ),  # random brightness, contrast, saturation, and hue jitter
        transforms.ToTensor(),  # convert to a tensor
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # normalize
    ]
)
Data augmentation is very effective against overfitting and clearly improves generalization and robustness. But applying heavy augmentation right from the start of training, or keeping it on for the entire run, has equally obvious downsides: even slightly too much of it, especially noise-like or rotation-style transforms, slows convergence dramatically and hurts training performance. In my experience, the best approach is to train a baseline first, add augmentation in a second training phase, and then switch it off again in the final stage, so the model fits well without overfitting. A sketch of that schedule follows.
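This sketch illustrates the staged schedule described above; the phase boundaries (20% and 90% of the run) are illustrative assumptions, not tuned values:
from torchvision import transforms

base_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
aug_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

def transform_for_epoch(epoch, total_epochs=100):
    if epoch < 0.2 * total_epochs:
        return base_tf   # phase 1: plain baseline training
    elif epoch < 0.9 * total_epochs:
        return aug_tf    # phase 2: augmentation switched on
    return base_tf       # phase 3: augmentation off so the model settles

# Inside the epoch loop: custom_data.transform = transform_for_epoch(epoch)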
2.4 Data Splitting
For splitting, use sklearn's train_test_split, name the files in a loop, and use os to build the paths for saving the split data.
The code looks roughly like this:
from sklearn.model_selection import train_test_split
import os
from PIL import Image

# Assume the dataset lives under a 'dataset' directory, one subdirectory per class
dataset_dir = 'dataset'
categories = os.listdir(dataset_dir)

# Initialize the train/test lists
X_train = []
y_train = []
X_test = []
y_test = []

# Walk every class
for category in categories:
    category_dir = os.path.join(dataset_dir, category)
    images = os.listdir(category_dir)
    # Walk every image in the class
    for image_name in images:
        image_path = os.path.join(category_dir, image_name)
        image = Image.open(image_path)
        # Add the image to the training set first
        X_train.append(image)
        y_train.append(category)
        # Option 1: a fixed split, e.g. every fourth image goes to the test set
        if len(X_train) % 4 == 0:
            X_test.append(image)
            y_test.append(category)
            # Remove it from the training set again
            X_train.pop()
            y_train.pop()

# Option 2: instead of the loop rule above, randomly hold out 25% with train_test_split
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)
# X_train_full/y_train_full are the full training set, X_test/y_test the test set
If you only need to split a single dataset, just apply the tool directly to the training data; a sketch of saving such a split back to disk follows.
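A sketch of persisting a split to disk in the per-class folder layout that ImageFolder expects later; the directory names here are assumptions:
import os
import shutil
from sklearn.model_selection import train_test_split

def split_to_folders(dataset_dir='dataset', out_dir='datasets', test_size=0.25):
    for category in os.listdir(dataset_dir):
        src = os.path.join(dataset_dir, category)
        names = os.listdir(src)
        train_names, test_names = train_test_split(
            names, test_size=test_size, random_state=42)
        for subset, subset_names in (('train', train_names), ('test', test_names)):
            dst = os.path.join(out_dir, subset, category)
            os.makedirs(dst, exist_ok=True)
            for name in subset_names:
                # Copy rather than move, so the raw dataset stays intact
                shutil.copy(os.path.join(src, name), os.path.join(dst, name))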
3. Neural Network
The network used in this experiment is ResNet34. Early on I tried ResNet18, but before I introduced channel attention the models it produced kept overfitting, so I chose the deeper network and added data augmentation.
As for the network parameters, I used a preloaded model for transfer learning, but I chose not to freeze any of its layers. Freezing would make training faster, but it leaves so few trainable parameters that the accuracy stayed very low.
import os
import torch
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights

def pretrained():
    # (weight_path comes from the module-level setup in the full listing)
    pre_model = resnet34(
        weights=ResNet34_Weights.IMAGENET1K_V1
    )
    # Save the preloaded parameters first
    torch.save(pre_model.state_dict(), os.path.join(weight_path, 'res34.pth'))
    # Build an empty network with no weights loaded
    net = resnet34(weights=None)
    # net.conv1 = DepthwiseSeparableConv2d(3, 64, kernel_size=7, stride=2, padding=3)
    # Replace the final linear layer so out_features matches our class count
    net.fc = nn.Linear(
        in_features=pre_model.fc.in_features,
        out_features=53,
        bias=True,
    )
    state_dic = torch.load(os.path.join(weight_path, 'res34.pth'))
    state_dic.pop('fc.weight')
    state_dic.pop('fc.bias')
    # state_dic.pop('conv1.weight')
    # Simply drop the pretrained fc parameters: their shapes no longer match
    net.load_state_dict(state_dic, strict=False)
    # strict=False ignores the errors caused by the missing keys; when the
    # network is next used, the positions without loaded parameters fall back
    # to their default initialization
    # print(net)
    return net
In this version the only structural change kept is the depthwise separable convolution (commented out above) that speeds up training; its definition comes later, in the model-optimization section. For comparison, a sketch of the freezing approach I decided against follows.
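A minimal sketch of what freezing would look like, which I did not use: disable gradients on the whole backbone and leave only the new fc head trainable.
def frozen_variant(net):
    # Freeze every parameter of the backbone...
    for param in net.parameters():
        param.requires_grad = False
    # ...then re-enable training only for the replaced fc head
    for param in net.fc.parameters():
        param.requires_grad = True
    return net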
4. Model Training
Everything related to training.
4.1 Training Parameters
Epochs: epochs = 100
Batch size: batch_size = 128
Learning rate: lr = 1e-2
For a second training run you can simply lower the learning rate. Adam adapts its effective step sizes on its own, but starting from a smaller learning rate brings training back close to where the previous run's last update left off much faster.
4.2 Loss Function
# Cross-entropy loss
loss_fn = nn.CrossEntropyLoss(reduction='mean')
# reduction='sum' can make the gradients blow up, so use mean
# Cross-entropy is the obvious choice here: this is a 53-way, i.e. multi-class, problem
4.3 Optimizer
Use a momentum-based optimizer, Adam (adaptive moment estimation):
optim.Adam()
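Concretely, with the hyperparameters from 4.1 (the weight_decay value here is an assumption carried over from the later fine-tuning run):
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2, weight_decay=1e-5)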
4.4 Training Visualization
I had embedded the plots in a table in the original document, but that layout did not survive format conversion, so the images are placed directly.
Figure: TensorBoard loss and accuracy curves for a run with learning rate 0.01 over 100 epochs.
Figure: the effect of data augmentation.
Network structure:
This project uses ResNet34. Its main layer structure is similar to ResNet18's, but each layer group contains additional blocks.
To improve speed, I replaced conv1, the first layer after the input, with a depthwise separable convolution to cut its computation and accelerate training. After that comes the usual change to the final linear layer: as is well known, its default out_features is 1000, and we set it to our number of classes. Then the preloaded fc weight and bias are dropped and the missing keys ignored, as discussed above; if you don't drop them you have to initialize them by hand, so unless you have special requirements, just drop them.
5. Model Validation
Verify the model's robustness and generalization ability.
5.1 Recording the Validation Run
Generate an Excel report:
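The Val_test function in the full listing (section 9) builds this report; here is a minimal self-contained sketch of the same idea, with paths and names as assumptions:
import numpy as np
import pandas as pd

def save_report(class_names, scores, y_true, y_pred, path='excel/poker.xlsx'):
    # scores: (N, num_classes) network outputs as a numpy array;
    # y_true / y_pred: 1-D numpy arrays of label ids
    table = np.concatenate([scores, y_true[:, None], y_pred[:, None]], axis=1)
    df = pd.DataFrame(table, columns=[*class_names, 'true', 'predict'])
    df.to_excel(path)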
5.2 Metrics Report
Accuracy: 0.8490566037735849
Precision: 0.8490566037735849
Recall: 0.8490566037735849
F1: 0.8430301354829655
5.3 Confusion Matrix
Visualization. With this many classes you can only really judge the overall shape; below is the code that generates the confusion matrix.
It mainly draws from the generated Excel file. Because my class names are long and unwieldy, I simply reloaded the dataset with ImageFolder and read its classes attribute to extract the names, which works very well.
import os
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)
from torchvision.datasets import ImageFolder

def report():
    excel_data = pd.read_excel(excel_path)
    labels = excel_data['true'].values
    predict = excel_data['predict'].values
    # Reload the dataset just to recover the class names
    label = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    data_classes = [*label.classes]
    class_report = classification_report(
        labels,
        predict,
    )
    print(class_report)
    accuracy = accuracy_score(labels, predict)
    print(f'Accuracy: {accuracy}')
    precision = precision_score(labels, predict, average='macro')
    print(f"Precision: {precision}")
    recall = recall_score(labels, predict, average='macro')
    print(f"Recall: {recall}")
    f1 = f1_score(labels, predict, average='macro')
    print(f"F1: {f1}")
    matrix = confusion_matrix(labels, predict)
    # Set a font that can render the class names before drawing
    plt.rcParams['font.sans-serif'] = ['SimHei']
    plt.rcParams['axes.unicode_minus'] = False
    plt.matshow(matrix, cmap=plt.cm.Greens)
    plt.colorbar()  # show the color bar
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            # confusion_matrix rows are true labels, columns are predictions
            plt.annotate(
                matrix[i, j],
                xy=(j, i),
                horizontalalignment="center",
                verticalalignment="center",
            )
    plt.xlabel("Predicted labels")
    plt.ylabel("True labels")
    plt.xticks(range(len(data_classes)), data_classes, rotation=45)
    plt.yticks(range(len(data_classes)), data_classes)
    plt.title("Confusion matrix of the training results")
    plt.show()
Next is the confusion matrix generated for another, ten-class dataset.
That model's test-set accuracy also reached 98%.
6. Model Optimization
Making the network better. The two main optimizations here are, first, introducing an attention mechanism and, second, introducing depthwise separable convolution.
6.1 Increasing Network Depth
I deepened the ResNet for two purposes: speed (the depthwise separable conv1) and the attention mechanism (SE blocks appended inside layer3 and layer4).
from torchvision.models.resnet import resnet34, ResNet34_Weights, BasicBlock, _ovewrite_named_param, ResNet
from SEnet网络构建 import SENet
import torch.nn as nn

class ResNetLayer(ResNet):
    def __init__(
        self,
        block,
        layers,
        num_classes=1000,
        zero_init_residual=False,
        groups=1,
        width_per_group: int = 64,
        replace_stride_with_dilation=None,
        norm_layer=None,
    ):
        super(ResNetLayer, self).__init__(
            block,
            layers,
            num_classes,
            zero_init_residual,
            groups,
            width_per_group,
            replace_stride_with_dilation,
            norm_layer=norm_layer,
        )
        # Rebuild the deeper layer groups with SE blocks appended
        self.layer3 = self._modify_layer(self.layer3)
        self.layer4 = self._modify_layer(self.layer4)

    def _modify_layer(self, layer):
        myLayer = []
        for block in layer:
            myLayer.append(block)
            # Append an SE attention block after every BasicBlock
            myLayer.append(SENet(block.conv2.out_channels, 16))
        return nn.Sequential(*myLayer)

def resnet34Layer(*, weights=None, progress=True, **kwargs):
    weights = ResNet34_Weights.verify(weights)
    # ResNet34 uses [3, 4, 6, 3] BasicBlocks per layer group
    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)

def _resnet(
    block,
    layers,
    weights,
    progress: bool,
    **kwargs,
):
    # Local re-implementation of torchvision's _resnet that builds our
    # ResNetLayer subclass instead of the stock ResNet
    if weights is not None:
        _ovewrite_named_param(kwargs, "num_classes", len(weights.meta["categories"]))
    model = ResNetLayer(block, layers, **kwargs)
    if weights is not None:
        model.load_state_dict(
            weights.get_state_dict(progress=progress, check_hash=True)
        )
    return model

# Depthwise separable convolution layer
class DepthwiseSeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(DepthwiseSeparableConv2d, self).__init__()
        # Depthwise: one filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(
            in_channels, in_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=in_channels
        )
        # Pointwise: 1x1 convolution that mixes channels
        self.pointwise = nn.Conv2d(
            in_channels, out_channels,
            kernel_size=1,
            stride=1,
            padding=0
        )

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x
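SENet itself is imported from a separate file (SEnet网络构建) that is not reproduced in this document. A minimal squeeze-and-excitation block consistent with the call SENet(out_channels, 16) could look like this sketch:
import torch.nn as nn

class SENet(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.fc = nn.Sequential(             # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight the channels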
6.2 Continuing Training
Since I added the resume logic as extra statements inside the training function rather than writing a separate function, I use train() directly as the continue-training function.
def train():
    # (imports, paths, writer, and device come from the full listing in section 9)
    # wandb.init(
    #     # set the wandb project where this run will be logged
    #     project="my-awesome-project",
    #
    #     # track hyperparameters and run metadata
    #     config={
    #         "learning_rate": 0.02,
    #         "architecture": "CNN",
    #         "dataset": "CIFAR-100",
    #         "epochs": 100,
    #     }
    # )
    custom_data = ImageFolder(
        root=os.path.join(data_path, 'train'),
        transform=transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Resize((224, 224)),
                # transforms.RandomRotation(15),
                transforms.RandomHorizontalFlip(0.5),
                transforms.RandomCrop(112, padding=4),
                # transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2471, 0.2435, 0.2616)),
            ]
        )
    )
    net = pretrained()
    net.train()
    # Second-stage training: resume from the last saved weights
    state_dict = torch.load(os.path.join(weight_path, 'res_last_low.pth'))
    net.load_state_dict(state_dict)
    net.to(device)
    epochs = 50
    lr = 0.001
    batch_size = 128
    optimizer = torch.optim.Adam(net.parameters(), lr=lr, weight_decay=1e-5)
    loss_fn = nn.CrossEntropyLoss(reduction='mean')
    train_loader = DataLoader(custom_data, batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        accuracy = 0
        total_loss = 0
        count = 0
        start_time = time.time()
        for i, data in enumerate(train_loader):
            x, y = data
            if i % 100 == 0:
                # Log a grid of input images to TensorBoard every 100 batches
                img_grid = torchvision.utils.make_grid(x)
                writer.add_image(f"r_m_{epoch}_{i * 100}", img_grid, epoch * len(train_loader) + i)
            count += 1
            x, y = x.to(device), y.to(device)
            y_pred = net(x)
            loss = loss_fn(y_pred, y)
            total_loss += loss.item()  # .item() so the graph is not retained
            optimizer.zero_grad()
            accuracy += torch.sum(torch.argmax(y_pred, dim=1) == y).item()
            loss.backward()
            optimizer.step()
        print(f'acc:{accuracy / len(custom_data)},time:{time.time() - start_time},loss:{total_loss / count}')
        writer.add_scalar('Acc', accuracy / len(custom_data), epoch)
        writer.add_scalar('Loss', total_loss / len(custom_data), epoch)
        # wandb.log({"acc": accuracy / len(custom_data), "loss": total_loss / count})
        torch.save(net.state_dict(), os.path.join(weight_path, 'res_last_low.pth'))
    # wandb.watch(net, log="all", log_graph=True)
    print('Finish!!!!!')
    writer.close()
    # wandb.finish()
Because my model starts from preloaded weights, I simply call the pretrained function to get my already-modified, initialized model, so when resuming I don't have to reload resnet34 and massage its parameters all over again. The other thing that really matters is switching the network mode: use training mode while training and evaluation mode when predicting.
# Training mode
net.train()
# Evaluation (prediction) mode
net.eval()
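At prediction time it also pays to pair eval mode with torch.no_grad(), as the Val_test function in the full listing does, so no gradient bookkeeping is performed:
net.eval()
with torch.no_grad():
    y_pred = net(x)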
6.3 Pretraining and Transfer Learning
Here is my initialization model; all of the training in this project is transfer learning on top of ResNet.
def pretrained():
    # (weight_path comes from the module-level setup in the full listing)
    pre_model = resnet34(
        weights=ResNet34_Weights.IMAGENET1K_V1
    )
    torch.save(pre_model.state_dict(), os.path.join(weight_path, 'res34.pth'))
    net = resnet34Layer(weights=None)
    # Swap conv1 for the depthwise separable convolution
    net.conv1 = DepthwiseSeparableConv2d(3, 64, kernel_size=7, stride=2, padding=3)
    net.fc = nn.Linear(
        in_features=pre_model.fc.in_features,
        out_features=10,  # class count of the target dataset (53 for the poker set in the full listing)
        bias=True,
    )
    state_dic = torch.load(os.path.join(weight_path, 'res34.pth'))
    # Drop the pretrained fc and conv1 parameters: their shapes no longer match
    state_dic.pop('fc.weight')
    state_dic.pop('fc.bias')
    state_dic.pop('conv1.weight')
    net.load_state_dict(state_dic, strict=False)
    # print(net)
    return net
7. Model Application
Inference work.
7.1 Image Preprocessing
I load images with PIL here, because cv2 would still require converting the loaded image to RGB.
from PIL import Image
from torchvision import transforms

def Imageread(img_path):
    img = Image.open(img_path).convert('RGB')  # make sure the image is RGB
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        # If normalization was applied during training, add it here too
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    img = transform(img).unsqueeze(0)  # add the batch dimension
    print(img.shape)
    return img
7.2 Model Inference
def Predict():
    # (pretrained, data_path, and ImageFolder come from the full listing)
    net = pretrained()
    data = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    data_classes = [*data.classes]  # recover the class names
    print(data_classes)
    state_dict = torch.load('runs/weight_34/SEresnet.pth')
    net.load_state_dict(state_dict)
    net.eval()
    img = Imageread('king.jpg')
    pred = net(img)
    pred = nn.Softmax(dim=1)(pred)  # turn logits into probabilities
    print(pred)
    print(data_classes[(torch.argmax(pred, dim=1))])
7.3 Class Display
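The original document showed this step as an image. A hypothetical sketch of displaying the input image with its predicted class as the title; the path and class name are placeholders:
import matplotlib.pyplot as plt
from PIL import Image

def show_prediction(img_path, class_name):
    plt.imshow(Image.open(img_path))
    plt.title(class_name)  # predicted class as the title
    plt.axis('off')
    plt.show()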
8. Model Porting
Using ONNX.
8.1 Exporting to ONNX
Architecturally, the exported model is just the ResNet with an SENet block added inside the BasicBlocks of layer3 and layer4, as described above.
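The export itself is done with torch.onnx.export; see Onnx_model in the full listing. The core of it, as a sketch:
import torch

def export_onnx(model, example_input, onnx_path):
    # Trace the network with one example input and name the graph's I/O tensors
    model.eval()
    torch.onnx.export(
        model,
        example_input,  # e.g. a (1, 3, 224, 224) image tensor
        onnx_path,
        input_names=['input'],
        output_names=['output'],
    )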
8.2 Inference with ONNX
from PIL import Image
import os
import onnx
import onnxruntime as ort
from torchvision import transforms
from torchvision.datasets import ImageFolder

def Imageread(img_path):
    img = Image.open(img_path).convert('RGB')  # make sure the image is RGB
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        # If normalization was applied during training, add it here too
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    img = transform(img).unsqueeze(0)
    print(img.shape)
    return img
def inference():
    # Load the ONNX model (data_path comes from the module-level setup)
    model = ort.InferenceSession('runs/weight/SE-resnet.onnx', providers=['CPUExecutionProvider'])
    img = Imageread('fangkuai3.jpg').numpy()
    out = model.run(None, {'input': img})
    data = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    class_label = [*data.classes]
    print(class_label[list(out[0][0]).index(max(out[0][0]))])
Output:
9. Project Summary
Learn to keep your own records.
9.1 Problems and Solutions
I ran into a great many problems during this experiment.
1. The first was the dataset itself. The main reason was that my earlier experiments always used datasets like CIFAR-10 or MNIST, which are easy to download and clearly categorized, so I had no idea which function to call to load a custom dataset. After going back through earlier material I learned to use ImageFolder, but it has requirements on the directory layout: the dataset must be organized into a fixed per-class folder hierarchy. So I tried to find a new dataset; my classmates also tried crawling one, but I was lazy and didn't, and instead used the playing-card dataset our teacher provided. That neatly avoided the dataset problems, including overfitting caused by too few samples per class combined with too deep a network.
2. With ResNet18 I ran into serious underfitting. It may have been that my augmentation was too heavy at the time or my epoch count too low; mostly, training was just very slow. That is why, once I learned about depthwise separable convolutions and attention mechanisms, I added both at once, which gave my training and convergence speed a qualitative leap.
3. In writing the rest of the code I didn't hit many problems, but I feel my grasp of the new material is still shallow: I still can't write a simple attention mechanism or the ResNet layer modifications on my own. The layers of indirection in the references overlap so much that I can't quite untangle the relationships.
9.2 Takeaways
This is my second fairly large project since the cv2 one. Back in the sklearn stage I didn't really understand what backpropagation was for or how a CNN fits together as a whole, but after this experiment I have roughly sorted out the entire learning process of a neural network. Seeing the correct predictions once the model finished training was genuinely satisfying, and I now have some confidence that I could write a simple neural network myself. As for attention mechanisms, I do roughly understand what they are doing, essentially training a function whose parameters compute the per-channel weight shares, but the tangle of calls involved in wiring one into a model still loses me. Well, enough said, time to go play some games first.
Full project code:
import time
import onnx
import onnxruntime as ort
import cv2
from matplotlib import pyplot as plt
from sklearn.metrics import (
    f1_score, accuracy_score, recall_score,
    classification_report, precision_score, confusion_matrix,
)
import numpy as np
import pandas as pd
from PIL import Image
import torch
import os
import torchvision
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, Dataset
from torch.utils.tensorboard import SummaryWriter
from torchvision.datasets import CIFAR10
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import resnet34, ResNet34_Weights
from torchvision.models.resnet import ResNet, BasicBlock, _ovewrite_named_param
import wandb
from SEnet网络构建 import SENet

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
current_path = os.path.dirname(__file__)
weight_path = os.path.relpath(os.path.join(current_path, 'runs/weight_34'))
data_path = os.path.relpath(os.path.join(current_path, 'datasets'))
excel_path = os.path.relpath(os.path.join(current_path, 'excel/poker.xlsx'))
writer = SummaryWriter(log_dir='tblogs/train_SE_2')
onnx_path = os.path.relpath(os.path.join(current_path, 'runs/weight/SE-resnet.onnx'))
class ResNetLayer(ResNet):
    def __init__(
        self,
        block,
        layers,
        num_classes=1000,
        zero_init_residual=False,
        groups=1,
        width_per_group: int = 64,
        replace_stride_with_dilation=None,
        norm_layer=None,
    ):
        super(ResNetLayer, self).__init__(
            block,
            layers,
            num_classes,
            zero_init_residual,
            groups,
            width_per_group,
            replace_stride_with_dilation,
            norm_layer=norm_layer,
        )
        # Rebuild the deeper layer groups with SE blocks appended
        self.layer3 = self._modify_layer(self.layer3)
        self.layer4 = self._modify_layer(self.layer4)

    def _modify_layer(self, layer):
        myLayer = []
        for block in layer:
            myLayer.append(block)
            # Append an SE attention block after every BasicBlock
            myLayer.append(SENet(block.conv2.out_channels, 16))
        return nn.Sequential(*myLayer)

def resnet34Layer(*, weights=None, progress=True, **kwargs):
    weights = ResNet34_Weights.verify(weights)
    # ResNet34 uses [3, 4, 6, 3] BasicBlocks per layer group
    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)

def _resnet(
    block,
    layers,
    weights,
    progress: bool,
    **kwargs,
):
    # Local re-implementation of torchvision's _resnet that builds our
    # ResNetLayer subclass instead of the stock ResNet
    if weights is not None:
        _ovewrite_named_param(kwargs, "num_classes", len(weights.meta["categories"]))
    model = ResNetLayer(block, layers, **kwargs)
    if weights is not None:
        model.load_state_dict(
            weights.get_state_dict(progress=progress, check_hash=True)
        )
    return model
# Depthwise separable convolution layer
class DepthwiseSeparableConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(DepthwiseSeparableConv2d, self).__init__()
        # Depthwise: one filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(
            in_channels, in_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=in_channels
        )
        # Pointwise: 1x1 convolution that mixes channels
        self.pointwise = nn.Conv2d(
            in_channels, out_channels,
            kernel_size=1,
            stride=1,
            padding=0
        )

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x
def pretrained():
    pre_model = resnet34(
        weights=ResNet34_Weights.IMAGENET1K_V1
    )
    # Save the pretrained parameters first
    torch.save(pre_model.state_dict(), os.path.join(weight_path, 'res34.pth'))
    net = resnet34Layer(weights=None)
    # Swap conv1 for the depthwise separable convolution
    net.conv1 = DepthwiseSeparableConv2d(3, 64, kernel_size=7, stride=2, padding=3)
    net.fc = nn.Linear(
        in_features=pre_model.fc.in_features,
        out_features=53,
        bias=True,
    )
    state_dic = torch.load(os.path.join(weight_path, 'res34.pth'))
    # Drop the pretrained fc and conv1 parameters: their shapes no longer match
    state_dic.pop('fc.weight')
    state_dic.pop('fc.bias')
    state_dic.pop('conv1.weight')
    net.load_state_dict(state_dic, strict=False)
    # print(net)
    return net
def train():
    # wandb.init(
    #     # set the wandb project where this run will be logged
    #     project="my-awesome-project",
    #
    #     # track hyperparameters and run metadata
    #     config={
    #         "learning_rate": 0.02,
    #         "architecture": "CNN",
    #         "dataset": "CIFAR-100",
    #         "epochs": 100,
    #     }
    # )
    custom_data = ImageFolder(
        root=os.path.join(data_path, 'train'),
        transform=transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Resize((224, 224)),
                # transforms.RandomRotation(15),
                transforms.RandomHorizontalFlip(0.5),
                transforms.RandomCrop(112, padding=4),
                # transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2471, 0.2435, 0.2616)),
            ]
        )
    )
    net = pretrained()
    net.train()
    # Second-stage training: resume from the last saved weights
    state_dict = torch.load(os.path.join(weight_path, 'SEresnet.pth'))
    net.load_state_dict(state_dict)
    net.to(device)
    epochs = 50
    lr = 0.001
    batch_size = 128
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(reduction='mean')  # mean, as explained in 4.2
    train_loader = DataLoader(custom_data, batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        accuracy = 0
        total_loss = 0
        count = 0
        start_time = time.time()
        for i, data in enumerate(train_loader):
            x, y = data
            if i % 100 == 0:
                # Log a grid of input images to TensorBoard every 100 batches
                img_grid = torchvision.utils.make_grid(x)
                writer.add_image(f"r_m_{epoch}_{i * 100}", img_grid, epoch * len(train_loader) + i)
            count += 1
            x, y = x.to(device), y.to(device)
            y_pred = net(x)
            loss = loss_fn(y_pred, y)
            total_loss += loss.item()  # .item() so the graph is not retained
            optimizer.zero_grad()
            accuracy += torch.sum(torch.argmax(y_pred, dim=1) == y).item()
            loss.backward()
            optimizer.step()
        print(f'acc:{accuracy / len(custom_data)},time:{time.time() - start_time},loss:{total_loss / count}')
        writer.add_scalar('Acc', accuracy / len(custom_data), epoch)
        writer.add_scalar('Loss', total_loss / len(custom_data), epoch)
        # wandb.log({"acc": accuracy / len(custom_data), "loss": total_loss / count})
        torch.save(net.state_dict(), os.path.join(weight_path, 'SEresnet.pth'))
    # wandb.watch(net, log="all", log_graph=True)
    print('Finish!!!!!')
    writer.close()
    # wandb.finish()
def Val_test():
    data_val = ImageFolder(
        root=os.path.join(data_path, 'test'),
        transform=transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Resize((224, 224)),
                # transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2471, 0.2435, 0.2616)),
            ]
        )
    )
    net = pretrained()
    state_dict = torch.load(os.path.join(weight_path, 'SEresnet.pth'))
    net.load_state_dict(state_dict)
    net.to(device)
    net.eval()
    acc = 0
    # 53 class scores + the true and predicted label = 55 columns
    total_data = np.empty(shape=(0, 55))
    with torch.no_grad():  # no gradients needed at test time
        for x, y in DataLoader(data_val, batch_size=8):
            x, y = x.to(device), y.to(device)
            y_pred = net(x)
            acc += torch.sum(torch.argmax(y_pred, dim=1) == y)
            predict = torch.argmax(y_pred, dim=1).cpu().numpy()
            predict = np.expand_dims(predict, axis=1)
            excel_data = y_pred.cpu().numpy()
            y = y.unsqueeze(dim=1).cpu().numpy()
            total = np.concatenate((excel_data, y, predict), axis=1)
            total_data = np.concatenate((total_data, total), axis=0)
    df = pd.DataFrame(total_data, columns=[*data_val.classes, 'true', 'predict'])
    df.to_excel('excel/poker.xlsx')
    print(f"Validation accuracy: {acc / len(data_val)}")
def report():
    excel_data = pd.read_excel(excel_path)
    labels = excel_data['true'].values
    predict = excel_data['predict'].values
    # Reload the dataset just to recover the class names
    label = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    data_classes = [*label.classes]
    class_report = classification_report(
        labels,
        predict,
    )
    print(class_report)
    accuracy = accuracy_score(labels, predict)
    print(f'Accuracy: {accuracy}')
    precision = precision_score(labels, predict, average='macro')
    print(f"Precision: {precision}")
    recall = recall_score(labels, predict, average='macro')
    print(f"Recall: {recall}")
    f1 = f1_score(labels, predict, average='macro')
    print(f"F1: {f1}")
    # matrix = confusion_matrix(labels, predict)
    # # Set a font that can render the class names before drawing
    # plt.rcParams['font.sans-serif'] = ['SimHei']
    # plt.rcParams['axes.unicode_minus'] = False
    # plt.matshow(matrix, cmap=plt.cm.Greens)
    # plt.colorbar()  # show the color bar
    # for i in range(len(matrix)):
    #     for j in range(len(matrix)):
    #         # confusion_matrix rows are true labels, columns are predictions
    #         plt.annotate(
    #             matrix[i, j],
    #             xy=(j, i),
    #             horizontalalignment="center",
    #             verticalalignment="center",
    #         )
    # plt.xlabel("Predicted labels")
    # plt.ylabel("True labels")
    # plt.xticks(range(len(data_classes)), data_classes, rotation=45)
    # plt.yticks(range(len(data_classes)), data_classes)
    # plt.title("Confusion matrix of the training results")
    # plt.show()
def Imageread(img_path):
    img = Image.open(img_path).convert('RGB')  # make sure the image is RGB
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        # If normalization was applied during training, add it here too
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    img = transform(img).unsqueeze(0)  # add the batch dimension
    print(img.shape)
    return img
def Predict():
    net = pretrained()
    data = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    data_classes = [*data.classes]  # recover the class names
    print(data_classes)
    state_dict = torch.load('runs/weight_34/SEresnet.pth')
    net.load_state_dict(state_dict)
    net.eval()
    img = Imageread('king.jpg')
    pred = net(img)
    pred = nn.Softmax(dim=1)(pred)  # turn logits into probabilities
    print(pred)
    print(data_classes[(torch.argmax(pred, dim=1))])
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()
def Onnx_model():
    model = pretrained()
    state_dict = torch.load(os.path.join(weight_path, 'SEresnet.pth'))
    model.load_state_dict(state_dict)
    img = Imageread('fangkuai3.jpg')  # example input used to trace the graph
    torch.onnx.export(
        model,
        img,
        onnx_path,
        verbose=True,
        input_names=['input'],
        output_names=['output'],
    )
    print('ONNX export finished')
def inference():
    # Load the ONNX model
    model = ort.InferenceSession('runs/weight/SE-resnet.onnx', providers=['CPUExecutionProvider'])
    img = Imageread('fangkuai3.jpg').numpy()
    out = model.run(None, {'input': img})
    data = ImageFolder(
        root=os.path.join(data_path, 'train')
    )
    class_label = [*data.classes]
    print(class_label[list(out[0][0]).index(max(out[0][0]))])
if __name__ == '__main__':
    # pretrained()
    # train()
    # report()
    # Val_test()
    # Predict()
    # Onnx_model()
    inference()
Hoping that tomorrow I can finally work out spatial attention and the full procedure for wiring an attention mechanism into a model.