Introduction:
This chapter is a detailed walkthrough of the steps involved in developing a neural network with PyTorch, looking at where each step comes from. The steps are listed below (a minimal skeleton of the resulting loop follows the list):
- Build the model, which mainly means defining the forward pass
- Choose a loss function and a learning rate (learning_rate)
- Compute the gradient of the loss with respect to the weights of each layer
- Update the weights to descend the gradient, then reset the gradients, e.g. a.grad = None
- Repeat for the next iteration
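To make these steps concrete, here is a minimal sketch of the loop they describe, written with the nn/optim API that the later sections introduce; the data, model, and hyperparameters below are placeholders rather than part of the running example.

import torch

# Placeholder data and model, only to make the skeleton runnable.
x = torch.randn(64, 3)
y = torch.randn(64, 1)
model = torch.nn.Linear(3, 1)                              # step 1: model / forward pass
loss_fn = torch.nn.MSELoss()                               # step 2: loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # step 2: learning rate

for t in range(100):
    y_pred = model(x)            # forward pass
    loss = loss_fn(y_pred, y)    # loss value
    optimizer.zero_grad()        # step 4: reset the gradients from the previous iteration
    loss.backward()              # step 3: gradients of the loss w.r.t. the weights
    optimizer.step()             # step 4: weight update (gradient descent)
                                 # step 5: loop back and repeat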
PyTorch model development in detail:
Following the steps above, we will implement a simple model and trace each step from the most bare-bones approach up to the current idiomatic one, to make clear what each step is for.
1. Building the model from scratch
Model requirements: as the running example, fit y = sin(x) with a third-order polynomial. The network has four parameters, which are trained with gradient descent to fit the data by minimizing the Euclidean distance (the distance between two points in multi-dimensional space) between the network output and the true output.
Model analysis:
- Input x: 2000 points in the interval (-π, π)
- Target output y: sin(x)
- Model prediction y_pred: a + b*x + c*x^2 + d*x^3
- Loss function: the sum over all points of (y_pred - y)^2
import torch
import math
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# Create the input data and compute the corresponding target outputs
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize the weights; call torch.manual_seed() first if you want the same initial weights on every run
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute the loss (sum of squared errors) as a plain Python number
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Compute the gradient of the loss with respect to each weight by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Hand-derived gradients of the loss, for reference:
    # loss     = sum((a + b*x + c*x**2 + d*x**3 - y)**2)
    # dloss/da = sum(2 * (a + b*x + c*x**2 + d*x**3 - y))
    # dloss/db = sum(2 * (a + b*x + c*x**2 + d*x**3 - y) * x)
    # dloss/dc = sum(2 * (a + b*x + c*x**2 + d*x**3 - y) * x**2)
    # dloss/dd = sum(2 * (a + b*x + c*x**2 + d*x**3 - y) * x**3)

    # Update the weights in the direction opposite to the gradient (gradient descent)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
2. Optimizing the gradient computation step
In the example above we implemented both the forward and the backward pass of the network by hand. Implementing the backward pass manually is fine for a small network, but it quickly becomes unwieldy for large ones. Thankfully, we can use automatic differentiation to compute the backward pass automatically. With autograd, the forward pass of the network defines a computational graph: the nodes are tensors, and the edges are the functions that produce output tensors from input tensors. Backpropagating through this graph then lets us compute gradients easily (a minimal autograd sketch follows these notes).
Forward pass: the process of computing the prediction from the input layer through to the output layer. Throughout this process the network's weights and biases are fixed; the goal is simply to compute an output for the given input.
- Input layer: the input data is fed into the network's input layer
- Hidden layers: the data is processed by the neurons of each layer. Each neuron takes the weighted sum of the previous layer's outputs as its input and passes it through an activation function (such as ReLU, sigmoid, or tanh) to produce its output.
- Output layer: finally, the data passes through the last layer (the output layer), which produces the prediction
Backward pass: the process in which the network computes gradients from the loss function and updates its weights and biases
- Loss function: measures the error between the prediction and the target, typically with a loss such as mean squared error (MSE) or cross-entropy
- Gradient computation: the chain rule is used to compute the gradient (derivative) of the loss with respect to the weights and biases of every layer
- Weight update: an optimization algorithm (gradient descent, Adam, etc.) updates the weights and biases
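Before going back to the polynomial example, here is a minimal autograd sketch on a single scalar (the variable names are purely illustrative): operations on a tensor created with requires_grad=True are recorded in a computation graph, and calling backward() walks that graph to fill in the .grad attributes.

import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf tensor tracked by autograd
y = torch.tensor(3.0)                       # not tracked

z = x ** 2 + y * x   # forward pass: builds the graph for z
z.backward()         # backward pass: computes dz/dx via the chain rule

print(x.grad)        # dz/dx = 2*x + y = 7.0
print(y.grad)        # None, because y does not require grad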
import torch
import math
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# Create the input data and compute the corresponding target outputs
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize the weights; requires_grad=True tells autograd to track operations on them. Use torch.manual_seed() if you want the same initial weights on every run
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute the loss; keep it as a tensor (no .item()) so that backward() can be called on it
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This fills in the .grad attribute
    # of every tensor created with requires_grad=True, i.e. a, b, c and d.
    loss.backward()

    # Update the weights in the direction opposite to the gradient (gradient descent).
    # Wrap the update in torch.no_grad() so it is not recorded by autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
Note that once backward() is used to compute the gradients, the weight update must be wrapped in torch.no_grad() to disable gradient tracking. By default, every operation involving a tensor with requires_grad=True is recorded in the computation graph for backpropagation; recording the update step itself would waste memory and corrupt later gradient computations. A related point, illustrated in the sketch below, is that gradients accumulate across backward() calls, which is why each gradient is reset (a.grad = None) after the update.
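A minimal sketch of that accumulation, with an illustrative scalar w:

import torch

w = torch.tensor(1.0, requires_grad=True)

loss = (2 * w).pow(2)   # loss = 4 * w^2, so dloss/dw = 8 * w
loss.backward()
print(w.grad)           # tensor(8.)

loss = (2 * w).pow(2)   # recompute the loss and backpropagate again
loss.backward()
print(w.grad)           # tensor(16.): the new gradient was added to the old one

w.grad = None           # reset, just like in the training loop above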
3. Using nn.Module
Computational graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically; for large neural networks, however, raw autograd can be a bit too low-level. In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves the same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input tensors and computes output tensors, and it can also hold internal state such as tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]
# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
4. Using an optimizer
So far we have updated the model's weights by hand, mutating the tensors that hold the learnable parameters inside a torch.no_grad() block. This is not a huge burden for a simple optimization algorithm like stochastic gradient descent, but in practice we usually train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of the commonly used ones.
Below we keep the same model as above, but optimize it with the RMSprop algorithm from the optim package:
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
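The prose above mentions AdaGrad, RMSProp, and Adam; as a hedged sketch, swapping optimizers only means changing the constructor call, while the rest of the training loop stays identical (the learning rates below are illustrative, not tuned):

# Any of these could replace the RMSprop line above for the same `model`.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)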
5. Custom nn.Module
As introduced in section 3, the nn package already provides many neural network layers as Modules. What if we want to define our own layer, or a model composed of several layers?
In that case we can subclass nn.Module to define our own Module, implementing a forward function that receives input tensors and produces output tensors, using other Modules or arbitrary autograd operations on tensors.
import torch
import math
class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules.
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# Construct our model by instantiating the class defined above
model = Polynomial3()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters (defined
# with torch.nn.Parameter) which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f'Result: {model.string()}')
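Polynomial3 only uses nn.Parameter directly; since the text above also says a custom Module can build on other Modules, here is a minimal sketch of a custom Module composed of two nn.Linear layers (the class name and layer sizes are made up for illustration):

import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        # Sub-modules assigned as attributes are registered automatically,
        # so their weights show up in model.parameters() for the optimizer.
        self.linear1 = torch.nn.Linear(d_in, d_hidden)
        self.linear2 = torch.nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.linear1(x))
        return self.linear2(h)

model = TwoLayerNet(3, 16, 1)
print(sum(p.numel() for p in model.parameters()))  # 3*16 + 16 + 16*1 + 1 = 81 learnable values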
OK! That covers each step of building a model, from the bare shell to the finished interior. Of course, this example is only meant to show what the basic steps of building a model look like. Building a model for a real business problem involves a lot more to learn; everyone is welcome to learn together, point out mistakes, and share.