Import the torch library, then use torch.empty(5, 3) to create an uninitialized 5×3 tensor. The numbers in it are whatever happened to be in the allocated memory, so they look random.
torch.rand(5, 3) creates a 5×3 tensor whose entries are drawn uniformly from [0, 1).
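A minimal sketch of those two creation calls (the variable names here are my own, not from the original screenshots):

import torch

x = torch.empty(5, 3)   # uninitialized 5x3 tensor; contents are whatever was in memory
y = torch.rand(5, 3)    # 5x3 tensor with entries drawn uniformly from [0, 1)
print(x)
print(y)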
torch.zeros(5, 3) creates an all-zero tensor; the dtype defaults to torch.float32. You can also specify the dtype yourself, or convert an existing tensor afterwards with the long() method.
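A small sketch of what this could look like (names are mine):

import torch

x = torch.zeros(5, 3)                     # dtype defaults to torch.float32
y = torch.zeros(5, 3, dtype=torch.long)   # specify the dtype yourself
z = x.long()                              # or convert an existing tensor with long()
print(x.dtype, y.dtype, z.dtype)          # torch.float32 torch.int64 torch.int64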
You can also create a tensor directly from existing data.
You can likewise build a new tensor from an existing one; the new tensor inherits some of the old tensor's properties. As sketched below, new_ones(5, 3) builds an all-ones tensor whose dtype comes from the old tensor x, and the dtype can still be overridden manually, e.g. with torch.double.
The randn_like function produces a random tensor with the same shape as x.
The shape attribute gives a tensor's shape.
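The sketch below is my own reconstruction of those steps (variable names assumed):

import torch

x = torch.tensor([5.5, 3.0])              # build a tensor directly from existing data
x = x.new_ones(5, 3)                      # all ones, dtype inherited from the old x
x = x.new_ones(5, 3, dtype=torch.double)  # or override the dtype manually
y = torch.randn_like(x)                   # random tensor with the same shape as x
print(x.shape, y.shape)                   # torch.Size([5, 3]) torch.Size([5, 3])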
Tensor addition comes in several forms.
There is also an in-place version of addition: methods with a trailing underscore, like add_, modify the calling tensor itself.
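For example, a minimal sketch (my own variable names):

import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(x + y)            # operator form of addition
print(torch.add(x, y))  # function form, same result
y.add_(x)               # in-place form: the trailing underscore means y itself is modified
print(y)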
All the NumPy-style indexing and slicing operations also work on tensors.
Here view plays the same role as NumPy's reshape.
If a tensor holds only a single element, .item() converts it to a plain Python number.
A matrix can also be transposed.
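A sketch of these operations (shapes and names are my own choice):

import torch

x = torch.randn(4, 4)
print(x[:, 1])           # NumPy-style slicing: the second column
y = x.view(16)           # view plays the role of NumPy's reshape
z = x.view(-1, 8)        # -1 lets PyTorch infer that dimension
print(y.shape, z.shape)  # torch.Size([16]) torch.Size([2, 8])

s = torch.randn(1)
print(s.item())          # a one-element tensor becomes a plain Python number

print(x.t())             # transpose of the 4x4 matrix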
Tensors and NumPy arrays can be converted into each other and share the same underlying memory: as sketched below, change the value in either one and the other changes as well.
Converting a tensor to a NumPy array works the same way.
NumPy is a CPU library, so a tensor on the GPU must be moved back to the CPU before it can be converted to a NumPy array.
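A sketch of the round trip, assuming a CUDA device may or may not be available:

import torch
import numpy as np

a = torch.ones(5)
b = a.numpy()            # tensor -> NumPy array, sharing the same memory
a.add_(1)
print(a, b)              # b changed as well, because the memory is shared

c = np.ones(5)
d = torch.from_numpy(c)  # NumPy array -> tensor, also sharing memory

# NumPy lives on the CPU, so a GPU tensor has to be moved back before conversion
if torch.cuda.is_available():
    g = torch.ones(5, device="cuda")
    print(g.cpu().numpy())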
The code below ignores the bias terms for now.
import numpy as np

# N is the batch size, D_in the input dimension, H the hidden dimension,
# and D_out the output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6
for it in range(500):
    # forward pass and loss
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    # backward pass: compute the gradients by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    # gradient descent step
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
As the printed output shows, the loss does indeed decrease.
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)
learning_rate = 1e-6
for it in range(500):
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    # .item() converts the one-element loss tensor into a plain Python number
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
As the printed output shows, the loss is clearly decreasing here as well.
Of course, I can also let PyTorch compute the gradients with the built-in backward(); note that the tensor values have to be floating-point numbers.
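A minimal toy example of calling backward (my own sketch):

import torch

# the values must be floating point, hence the trailing dots
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2 + x3^2
y.backward()         # autograd computes dy/dx
print(x.grad)        # tensor([2., 4., 6.])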
Here the hand-written two-layer network is rewritten to rely entirely on the autograd API:
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
learning_rate = 1e-6
for it in range(500):
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    # loss stays a tensor here; .item() converts it to a number only for printing
    loss = (y_pred - y).pow(2).sum()
    print(it, loss.item())
    loss.backward()
    # wrap the update in torch.no_grad() so it is not recorded in the computation graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # backward() accumulates gradients rather than replacing them, but each batch's
        # update should not mix in gradients from previous batches, so call grad.zero_()
        # after every step to reset the parameter gradients to 0.
        w1.grad.zero_()
        w2.grad.zero_()
Implementing the two-layer network with the torch.nn package
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)

# Re-initialize the weights of the first and third layers from a normal distribution
torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions.
    y_pred = model(x)

    # Compute and print loss. The loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
"This time we no longer update the model weights by hand; instead we use the optim package to update the parameters for us. The optim package provides a variety of optimization methods, including SGD+momentum, RMSProp, Adam, and so on."
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Define the model as a sequence of layers with nn.Sequential, as before.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)

# torch.nn.init.normal_(model[0].weight)
# torch.nn.init.normal_(model[2].weight)

# Use Mean Squared Error (MSE) as the loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    optimizer.zero_grad()
    loss.backward()

    # Let the optimizer update the weights.
    optimizer.step()
[Note] Sometimes we need to handle parameter initialization and sometimes we don't. With Adam there is no need to initialize the parameters to a normal distribution; doing so actually makes the results worse. Try the code yourself and you will get a feel for what happens.
"We can define a model as a class that inherits from nn.Module. Whenever we need a model more complex than what a Sequential model can express, we define an nn.Module subclass."
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        # In the constructor we instantiate two nn.Linear modules and assign them as
        # member variables.
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # In the forward function we accept a Tensor of input data and we must return
        # a Tensor of output data. We can use Modules defined in the constructor as
        # well as arbitrary operators on Tensors.
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
FizzBuzz is a simple little game. The rules: count upward from 1; when you reach a multiple of 3, say "fizz"; a multiple of 5, say "buzz"; a multiple of 15, say "fizzbuzz"; otherwise just say the number.
import numpy as np
import torch

# Encode the desired outputs as class indices: [number, "fizz", "buzz", "fizzbuzz"]
def fizz_buzz_encode(i):
    if i % 15 == 0:
        return 3
    elif i % 5 == 0:
        return 2
    elif i % 3 == 0:
        return 1
    else:
        return 0

def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

# First define the model's inputs and outputs (the training data).
NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

trX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = torch.LongTensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])

# Then define the model with PyTorch.
NUM_HIDDEN = 100
model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN, 4)
)

# To teach the model FizzBuzz we need a loss function and an optimization algorithm.
# The optimizer keeps lowering the loss; a low loss usually means the model performs well.
# Since FizzBuzz is essentially a classification problem, we use Cross Entropy Loss,
# and Stochastic Gradient Descent as the optimizer.
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Start training it
BATCH_SIZE = 128
for epoch in range(10000):
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]
        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Find loss on training data
    loss = loss_fn(model(trX), trY).item()
    print('Epoch:', epoch, 'Loss:', loss)

# Finally, play FizzBuzz on the numbers 1 to 100 with the trained model.
testX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(1, 101)])
with torch.no_grad():
    testY = model(testX)
predictions = zip(range(1, 101), list(testY.max(1)[1].data.tolist()))
print([fizz_buzz_decode(i, x) for (i, x) in predictions])

# Count how many of the 100 predictions are correct.
print(np.sum(testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1, 101)])))
The script prints the decoded FizzBuzz predictions for 1 through 100 together with the number of correct answers.