
Linear Regression: Implementing Linear Regression with PyTorch and Paddle

This article explores the theoretical foundations of linear regression and shows how to implement a linear regression model in two deep learning frameworks, PyTorch and PaddlePaddle. We first introduce the basics of linear algebra, calculus, and automatic differentiation, the mathematical tools needed to understand and implement linear regression. We then explain the essence of the linear regression problem, including its mathematical model and how to solve it. Finally, through PyTorch and PaddlePaddle code examples, we show how to design, train, and evaluate a linear regression model, so that readers can build an intuitive understanding of how both frameworks are applied to machine learning problems.

This part of the article covers the torch framework together with some theoretical analysis; the corresponding paddle code can be found in the paddle part of the linear regression series.

import torch
print("pytorch version:", torch.__version__)

pytorch version: 2.2.2+cu121

Basics of Matrix Operations and Gradients

Matrix Operations

Basic creation of scalars and vectors: we can convert Python scalars and lists into Tensor scalars or vectors.

# First we create some scalars and (nested) lists
num = 1  # a scalar
mat_1 = [1, 2, 3]  # 1-D vector (3)
mat_2 = [[1, 2, 3], [4, 5, 6]]  # 2-D matrix (2*3)
mat_3 = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]  # 3-D tensor (2*2*3)
mat_4 = [[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]], 
[[[13, 14, 15], [16, 17, 18]], [[19, 20, 21], [22, 23, 24]]]]  # 4-D tensor (2*2*2*3)
# Next we convert them to Tensors
num_tensor = torch.tensor(num, dtype=torch.float32)  # dtype is the element type; here we use float32
mat_1_tensor = torch.tensor(mat_1, dtype=torch.float32)
mat_2_tensor = torch.tensor(mat_2, dtype=torch.float32)
mat_3_tensor = torch.tensor(mat_3, dtype=torch.float32)
mat_4_tensor = torch.tensor(mat_4, dtype=torch.float32)
num_tensor, mat_1_tensor, mat_2_tensor, mat_3_tensor, mat_4_tensor
(tensor(1.),
 tensor([1., 2., 3.]),
 tensor([[1., 2., 3.],
         [4., 5., 6.]]),
 tensor([[[ 1.,  2.,  3.],
          [ 4.,  5.,  6.]],
 
         [[ 7.,  8.,  9.],
          [10., 11., 12.]]]),
 tensor([[[[ 1.,  2.,  3.],
           [ 4.,  5.,  6.]],
 
          [[ 7.,  8.,  9.],
           [10., 11., 12.]]],
 
 
         [[[13., 14., 15.],
           [16., 17., 18.]],
 
          [[19., 20., 21.],
           [22., 23., 24.]]]]))
# Create tensors with special elements
X_arange = torch.reshape(torch.arange(24), (2, 3, 4))  # arange creates a row vector with elements 0~23; reshape changes the tensor's shape
X_ones = torch.reshape(torch.ones(12, dtype=torch.float32), (1, 3, 4))  # ones creates a tensor of all ones
X_zeros = torch.zeros((1, 3, 4))  # zeros creates a tensor of all zeros
mean = 0.5  
std_dev = 2.0
X_randn = mean + std_dev * torch.randn((1, 3, 4))  # elements drawn from a normal distribution with the given mean and standard deviation
X_rand = torch.rand((1, 3, 4))  # elements drawn from a uniform distribution on [0, 1)
X_arange, X_ones, X_zeros, X_randn, X_rand
(tensor([[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11]],
 
         [[12, 13, 14, 15],
          [16, 17, 18, 19],
          [20, 21, 22, 23]]]),
 tensor([[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]]),
 tensor([[[0., 0., 0., 0.],
          [0., 0., 0., 0.],
          [0., 0., 0., 0.]]]),
 tensor([[[ 2.1599,  0.1366, -0.4689,  0.9104],
          [ 1.4780,  2.3289, -0.1879, -1.5273],
          [-0.0116,  4.1163, -0.8679,  1.0013]]]),
 tensor([[[0.2001, 0.8083, 0.5556, 0.3765],
          [0.8255, 0.4185, 0.5584, 0.2667],
          [0.6393, 0.3122, 0.4045, 0.5263]]]))

To copy a tensor, we cannot simply use =, because the two variables would then share the same memory. To allocate new memory for the copy, we use the clone method.

X_arange_copy = X_arange.clone()
X_arange1 = X_arange
X_arange_copy, X_arange is X_arange_copy, X_arange1 is X_arange  # use `is` to check whether two variables share memory
(tensor([[[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11]],
 
         [[12, 13, 14, 15],
          [16, 17, 18, 19],
          [20, 21, 22, 23]]]),
 False,
 True)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

The element-wise product of two vectors is called the Hadamard product, written $\odot$; it can be computed with the torch.mul method. To multiply every element of a tensor by a number, or add a number to every element, we simply apply the operation to the tensor directly.

X_arange_mul = torch.mul(X_arange, X_arange_copy)  # element-wise multiplication
a = 2
X_arange_add = X_arange + a  # element-wise addition of a scalar
X_arange_mula = X_arange * a  # multiply the tensor by a scalar directly
X_arange_mul, X_arange_mul is X_arange, X_arange_add, X_arange_mula
(tensor([[[  0,   1,   4,   9],
          [ 16,  25,  36,  49],
          [ 64,  81, 100, 121]],
 
         [[144, 169, 196, 225],
          [256, 289, 324, 361],
          [400, 441, 484, 529]]]),
 False,
 tensor([[[ 2,  3,  4,  5],
          [ 6,  7,  8,  9],
          [10, 11, 12, 13]],
 
         [[14, 15, 16, 17],
          [18, 19, 20, 21],
          [22, 23, 24, 25]]]),
 tensor([[[ 0,  2,  4,  6],
          [ 8, 10, 12, 14],
          [16, 18, 20, 22]],
 
         [[24, 26, 28, 30],
          [32, 34, 36, 38],
          [40, 42, 44, 46]]]))

Next we introduce matrix operations. In paddle and pytorch we mostly use three kinds of matrix products: the dot product, the matrix-vector product, and matrix-matrix multiplication.

# Dot product: multiply the corresponding elements of two 1-D tensors, then sum
# Note: the dot product only works between two 1-D tensors; both tensors must have the same length and the same dtype
X_dot1 = torch.ones(4, dtype=torch.float32)
X_dot2 = torch.arange(4, dtype=torch.float32)
X_dot12 = torch.dot(X_dot1, X_dot2)  # dot product
X_dot12
tensor(6.)
# Matrix-vector product: multiply each row of the matrix with the vector and sum, yielding a 1-D vector
X_mv1 = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32)  # create a 2-D matrix
X_mv2 = torch.tensor([1, 2, 3], dtype=torch.float32)  # create a 1-D vector
X_mv12 = torch.matmul(X_mv1, X_mv2)  # matrix-vector product
X_mv12
tensor([14., 32., 50.])
# Matrix-matrix multiplication: multiply each row of the first matrix with each column of the second and sum, yielding a matrix
X_mm1 = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32)  # create a 2-D matrix
X_mm2 = torch.ones((3, 3), dtype=torch.float32)  # create a 2-D matrix
X_mm12 = torch.matmul(X_mm1, X_mm2)  # matrix-matrix multiplication
X_mm12
tensor([[ 6.,  6.,  6.],
        [15., 15., 15.],
        [24., 24., 24.]])

Gradients

For gradients of variables, we generally consider scalar-valued functions of the form $f(\mathbf{x})$, meaning the input $\mathbf{x}$ may be a vector (or scalar) of any dimension, while the output $f(\mathbf{x})$ is a scalar. The gradient with respect to $\mathbf{x}$ is then written $\nabla f(\mathbf{x})$.

# First we define a scalar-valued function f(x)
def f(x):
    if x.shape == ():  # check whether the input is a scalar
        return 2 * x
    else:
        return torch.sum(2 * x)
    
x_0 = torch.tensor(1.0, dtype=torch.float32, requires_grad=True)  # a scalar variable x; requires_grad=True means its gradient will be computed
x_1 = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32, requires_grad=True)  # a vector variable x
y_0 = f(x_0)  # evaluate the scalar-valued function f(x_0)
y_1 = f(x_1)  # evaluate the scalar-valued function f(x_1)
y_0.backward()  # backpropagate to populate the gradient
y_1.backward()  # backpropagate to populate the gradient
x_0.grad, x_1.grad  # print the gradients
(tensor(2.), tensor([2., 2., 2.]))
x_0.grad.zero_()  # reset the gradient to zero; otherwise gradient information accumulates
x_1.grad.zero_()  # reset the gradient to zero
y_0 = f(x_0) + x_0 ** 2
y_1 = f(x_1) + torch.dot(x_1, x_1)
y_0.backward()  # backpropagate to populate the gradient
y_1.backward()  # backpropagate to populate the gradient
x_0.grad, x_1.grad  # print the gradients
(tensor(4.), tensor([4., 6., 8.]))

Linear Regression

A linear regression problem can be written as $y = wx + b$, where $w$ and $b$ are the parameters to be estimated, $x$ is the input variable, and $y$ is the output variable. To solve a linear regression problem with machine learning, we can estimate $w$ and $b$ by solving the least-squares problem
$$\min_{w, b} \frac{1}{2m} \sum_{i=1}^m (wx_i + b - y_i)^2$$
where $m$ is the number of samples.
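Before bringing in gradient-based training, it is worth noting that this least-squares problem also has a direct solution. The sketch below is only an illustration on hypothetical 1-D data (not the dataset used later): it recovers w and b via torch.linalg.lstsq by appending a constant column for the bias.

import torch

# Hypothetical 1-D example: y = 2x + 4.2 plus a little noise
x = torch.linspace(0, 1, 100).reshape(-1, 1)
y = 2.0 * x + 4.2 + 0.01 * torch.randn(100, 1)

# Append a column of ones so the bias b is learned as an extra weight
X_aug = torch.cat([x, torch.ones_like(x)], dim=1)  # shape (100, 2)

# Least-squares solution of X_aug @ [w, b]^T ≈ y
solution = torch.linalg.lstsq(X_aug, y).solution
w_hat, b_hat = solution[0].item(), solution[1].item()
print(f"w ≈ {w_hat:.3f}, b ≈ {b_hat:.3f}")  # expected: close to 2 and 4.2

For large datasets, or when we later replace the linear model with a nonlinear one, the iterative gradient-based approach used in the rest of this article is the more general tool.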

We train and test the linear regression model on the Bike Sharing dataset, which contains the hourly and daily counts of rental bikes in the Capital bikeshare system between 2011 and 2012, together with the corresponding weather and seasonal information. Running the code below to import the dataset requires the ucimlrepo library (pip install ucimlrepo).

# Load the data
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
bike_sharing = fetch_ucirepo(id=275) 
  
# data (as pandas dataframes) 
data_featrures = bike_sharing.data.features 
data_targets = bike_sharing.data.targets 
  
# metadata 
print(bike_sharing.metadata) 
  
# variable information 
print(bike_sharing.variables) 

{'uci_id': 275, 'name': 'Bike Sharing', 'repository_url': 'https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset', 'data_url': 'https://archive.ics.uci.edu/static/public/275/data.csv', 'abstract': 'This dataset contains the hourly and daily count of rental bikes between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information.', 'area': 'Social Science', 'tasks': ['Regression'], 'characteristics': ['Multivariate'], 'num_instances': 17389, 'num_features': 13, 'feature_types': ['Integer', 'Real'], 'demographics': [], 'target_col': ['cnt'], 'index_col': ['instant'], 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 2013, 'last_updated': 'Sun Mar 10 2024', 'dataset_doi': '10.24432/C5W894', 'creators': ['Hadi Fanaee-T'], 'intro_paper': {'title': 'Event labeling combining ensemble detectors and background knowledge', 'authors': 'Hadi Fanaee-T, João Gama', 'published_in': 'Progress in Artificial Intelligence', 'year': 2013, 'url': 'https://www.semanticscholar.org/paper/bc42899f599d31a5d759f3e0a3ea8b52479d6423', 'doi': '10.1007/s13748-013-0040-3'}, 'additional_info': {'summary': 'Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues. \r\n\r\nApart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.', 'purpose': None, 'funded_by': None, 'instances_represent': None, 'recommended_data_splits': None, 'sensitive_data': None, 'preprocessing_description': None, 'variable_info': 'Both hour.csv and day.csv have the following fields, except hr which is not available in day.csv\r\n\t\r\n\t- instant: record index\r\n\t- dteday : date\r\n\t- season : season (1:winter, 2:spring, 3:summer, 4:fall)\r\n\t- yr : year (0: 2011, 1:2012)\r\n\t- mnth : month ( 1 to 12)\r\n\t- hr : hour (0 to 23)\r\n\t- holiday : weather day is holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)\r\n\t- weekday : day of the week\r\n\t- workingday : if day is neither weekend nor holiday is 1, otherwise is 0.\r\n\t+ weathersit : \r\n\t\t- 1: Clear, Few clouds, Partly cloudy, Partly cloudy\r\n\t\t- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist\r\n\t\t- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds\r\n\t\t- 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog\r\n\t- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)\r\n\t- atemp: Normalized feeling temperature in Celsius. 
The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)\r\n\t- hum: Normalized humidity. The values are divided to 100 (max)\r\n\t- windspeed: Normalized wind speed. The values are divided to 67 (max)\r\n\t- casual: count of casual users\r\n\t- registered: count of registered users\r\n\t- cnt: count of total rental bikes including both casual and registered\r\n', 'citation': None}}
          name     role         type demographic  \
0      instant       ID      Integer        None   
1       dteday  Feature         Date        None   
2       season  Feature  Categorical        None   
3           yr  Feature  Categorical        None   
4         mnth  Feature  Categorical        None   
5           hr  Feature  Categorical        None   
6      holiday  Feature       Binary        None   
7      weekday  Feature  Categorical        None   
8   workingday  Feature       Binary        None   
9   weathersit  Feature  Categorical        None   
10        temp  Feature   Continuous        None   
11       atemp  Feature   Continuous        None   
12         hum  Feature   Continuous        None   
13   windspeed  Feature   Continuous        None   
14      casual    Other      Integer        None   
15  registered    Other      Integer        None   
16         cnt   Target      Integer        None   

                                          description units missing_values  
0                                        record index  None             no  
1                                                date  None             no  
2                1:winter, 2:spring, 3:summer, 4:fall  None             no  
3                             year (0: 2011, 1: 2012)  None             no  
4                                     month (1 to 12)  None             no  
5                                      hour (0 to 23)  None             no  
6   weather day is holiday or not (extracted from ...  None             no  
7                                     day of the week  None             no  
8   if day is neither weekend nor holiday is 1, ot...  None             no  
9   - 1: Clear, Few clouds, Partly cloudy, Partly ...  None             no  
10  Normalized temperature in Celsius. The values ...     C             no  
11  Normalized feeling temperature in Celsius. The...     C             no  
12  Normalized humidity. The values are divided to...  None             no  
13  Normalized wind speed. The values are divided ...  None             no  
14                              count of casual users  None             no  
15                          count of registered users  None             no  
16  count of total rental bikes including both cas...  None             no  

The dataset contains 17379 records, with 13 features and 1 target variable. The first of the 13 features is the date, so only the remaining 12 features are used. We first write a preprocessor that normalizes the data.

import numpy as np  
  
class Preprocessor:  
    def __init__(self):  
        self.min_values = None  
        self.scale_factors = None  
  
    def normalize(self, data):  
        """  
        Normalize the input data.  
        data: a numpy array (or similar), where each column is one feature.  
        """  
        # Make sure the input is a numpy array  
        data = np.asarray(data)  
          
        # If the preprocessor has not been fitted yet, fit it first  
        if self.min_values is None or self.scale_factors is None:  
            self.fit(data)  
          
        # Normalize the data  
        normalized_data = (data - self.min_values) * self.scale_factors  
        return normalized_data  
  
    def denormalize(self, normalized_data):  
        """  
        Undo the normalization.  
        normalized_data: data that has already been normalized.  
        """  
        # Make sure the input is a numpy array  
        normalized_data = np.asarray(normalized_data)  
          
        # Denormalize the data  
        original_data = normalized_data / self.scale_factors + self.min_values  
        return original_data  
  
    def fit(self, data):  
        """  
        Compute the per-feature minimum and scale factor used later for  
        normalization and denormalization.  
        data: a numpy array (or similar), where each column is one feature.  
        """  
        # Make sure the input is a numpy array  
        data = np.asarray(data)  
          
        # Minimum of each feature (column)  
        self.min_values = np.min(data, axis=0)  
          
        # Scale factor of each feature (column)  
        ranges = np.max(data, axis=0) - self.min_values  
        # Avoid division by zero: if a range is zero, use 1 instead  
        self.scale_factors = np.where(ranges == 0, 1, 1.0 / ranges)  
# Normalize the data
data_featrures = data_featrures.iloc[:, 1:]
data_all = np.concatenate((data_featrures, data_targets), axis=1)
# Now the first 12 columns of data_all are the features and the last column is the target
preprocessor = Preprocessor()
# Normalize
data_all_normalized = preprocessor.normalize(data_all)

With the data in hand, we design a data loader that feeds the data to the model for training and testing.

from sklearn.model_selection import train_test_split  
from torch.utils.data import Dataset, DataLoader  
  
class CustomDataset(Dataset):  
    def __init__(self, features, labels):  
        self.features = features  
        self.labels = labels  
  
    def __len__(self):  
        return len(self.labels)  
  
    def __getitem__(self, idx):  
        return self.features[idx], self.labels[idx]  
  
def create_data_loaders(features, labels, batch_size=32, test_size=0.2, random_state=42):  
    # Split the dataset  
    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=test_size, random_state=random_state)  
      
    # Create the Dataset objects  
    train_dataset = CustomDataset(X_train, y_train)  
    test_dataset = CustomDataset(X_test, y_test)  
      
    # Create the DataLoader objects  
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)  
    test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)  
      
    return train_loader, test_loader  
train_loader, test_loader = create_data_loaders(data_all_normalized[:, :12], data_all_normalized[:, 12:], batch_size=256)

Let's take a look at the first batch of the training set!

for batch_features, batch_labels in train_loader:  
    break
batch_features.shape, batch_labels.shape
(torch.Size([256, 12]), torch.Size([256, 1]))

Next, we design the linear regression model and train it on the training data.

# First we define a linear regression model
class LinearRegressionModel(torch.nn.Module):  
    def __init__(self, input_dim):  
        super(LinearRegressionModel, self).__init__()  
        self.linear = torch.nn.Linear(input_dim, 1)  
          
    def forward(self, x):  
        out = self.linear(x)  
        return out  

# Then we write a training function
def train(model, train_loader, criterion, optimizer, device, num_epochs=100):
    model.train()
    model.to(device)
    for epoch in range(num_epochs):
        for batch_features, batch_labels in train_loader:
            if batch_features.dtype != torch.float32:
                batch_features = batch_features.to(torch.float32)
                batch_labels = batch_labels.to(torch.float32)
            batch_features = batch_features.to(device)
            batch_labels = batch_labels.to(device)
            optimizer.zero_grad()
            outputs = model(batch_features)
            loss = criterion(outputs, batch_labels)
            loss.backward()
            optimizer.step()
        # Print the training loss every 10 epochs
        if (epoch+1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# And a test function
def test(model, criterion, device, test_loader):
    model.eval()
    model.to(device)
    total_loss = 0
    with torch.no_grad():
        for batch_features, batch_labels in test_loader:
            if batch_features.dtype != torch.float32:
                batch_features = batch_features.to(torch.float32)
                batch_labels = batch_labels.to(torch.float32)
            batch_features = batch_features.to(device)
            batch_labels = batch_labels.to(device)
            outputs = model(batch_features)
            loss = criterion(outputs, batch_labels)
            total_loss += loss.item()
    print(f'Test Loss: {total_loss / len(test_loader):.4f}')
# Set up all the components, then run the training and testing procedure
input_dim = 12
model = LinearRegressionModel(input_dim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
train(model, train_loader, criterion, optimizer, device, num_epochs=500)  # train the network
Epoch [10/500], Loss: 0.0391
Epoch [20/500], Loss: 0.0426
Epoch [30/500], Loss: 0.0451
Epoch [40/500], Loss: 0.0360
Epoch [50/500], Loss: 0.0377
Epoch [60/500], Loss: 0.0431
Epoch [70/500], Loss: 0.0485
Epoch [80/500], Loss: 0.0428
Epoch [90/500], Loss: 0.0295
Epoch [100/500], Loss: 0.0311
Epoch [110/500], Loss: 0.0411
Epoch [120/500], Loss: 0.0380
Epoch [130/500], Loss: 0.0396
Epoch [140/500], Loss: 0.0364
Epoch [150/500], Loss: 0.0500
Epoch [160/500], Loss: 0.0482
Epoch [170/500], Loss: 0.0487
Epoch [180/500], Loss: 0.0371
Epoch [190/500], Loss: 0.0325
Epoch [200/500], Loss: 0.0378
Epoch [210/500], Loss: 0.0274
Epoch [220/500], Loss: 0.0366
Epoch [230/500], Loss: 0.0322
Epoch [240/500], Loss: 0.0325
Epoch [250/500], Loss: 0.0295
Epoch [260/500], Loss: 0.0346
Epoch [270/500], Loss: 0.0429
Epoch [280/500], Loss: 0.0392
Epoch [290/500], Loss: 0.0376
Epoch [300/500], Loss: 0.0358
Epoch [310/500], Loss: 0.0314
Epoch [320/500], Loss: 0.0438
Epoch [330/500], Loss: 0.0434
Epoch [340/500], Loss: 0.0256
Epoch [350/500], Loss: 0.0418
Epoch [360/500], Loss: 0.0268
Epoch [370/500], Loss: 0.0298
Epoch [380/500], Loss: 0.0346
Epoch [390/500], Loss: 0.0328
Epoch [400/500], Loss: 0.0279
Epoch [410/500], Loss: 0.0383
Epoch [420/500], Loss: 0.0456
Epoch [430/500], Loss: 0.0379
Epoch [440/500], Loss: 0.0302
Epoch [450/500], Loss: 0.0246
Epoch [460/500], Loss: 0.0341
Epoch [470/500], Loss: 0.0421
Epoch [480/500], Loss: 0.0273
Epoch [490/500], Loss: 0.0213
Epoch [500/500], Loss: 0.0264
test(model, criterion, device, test_loader)  # test the network
Test Loss: 0.0284

We can see that, after normalization, the linear model brings the MSE loss down to roughly 0.03~0.05. Next, let's generate some data by hand and see whether the linear model can fit it.

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise"""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4, 4.2, -1.3, 2.1, -1.0, 3.0, -1.0, 1.0, 3.0, -2.0, 0.5])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 10000)
train_loader, test_loader = create_data_loaders(features, labels, batch_size=256)
# Set up all the components, then run the training and testing procedure
input_dim = features.shape[1]
model = LinearRegressionModel(input_dim)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
train(model, train_loader, criterion, optimizer, device, num_epochs=500)  # train the network
Epoch [10/500], Loss: 91.9687
Epoch [20/500], Loss: 60.9311
Epoch [30/500], Loss: 69.8828
Epoch [40/500], Loss: 62.9809
Epoch [50/500], Loss: 31.0669
Epoch [60/500], Loss: 42.0620
Epoch [70/500], Loss: 38.5775
Epoch [80/500], Loss: 29.1444
Epoch [90/500], Loss: 30.8505
Epoch [100/500], Loss: 19.2795
Epoch [110/500], Loss: 21.5497
Epoch [120/500], Loss: 20.3075
Epoch [130/500], Loss: 22.2840
Epoch [140/500], Loss: 12.7620
Epoch [150/500], Loss: 10.6494
Epoch [160/500], Loss: 12.3813
Epoch [170/500], Loss: 11.0192
Epoch [180/500], Loss: 8.3032
Epoch [190/500], Loss: 6.3828
Epoch [200/500], Loss: 6.1393
Epoch [210/500], Loss: 5.9387
Epoch [220/500], Loss: 3.8017
Epoch [230/500], Loss: 4.1068
Epoch [240/500], Loss: 3.5148
Epoch [250/500], Loss: 2.3559
Epoch [260/500], Loss: 4.3733
Epoch [270/500], Loss: 1.8011
Epoch [280/500], Loss: 1.9995
Epoch [290/500], Loss: 2.0763
Epoch [300/500], Loss: 2.3735
Epoch [310/500], Loss: 1.9479
Epoch [320/500], Loss: 1.4883
Epoch [330/500], Loss: 1.3161
Epoch [340/500], Loss: 1.0936
Epoch [350/500], Loss: 1.0877
Epoch [360/500], Loss: 0.9161
Epoch [370/500], Loss: 0.6251
Epoch [380/500], Loss: 0.5342
Epoch [390/500], Loss: 0.5225
Epoch [400/500], Loss: 0.4291
Epoch [410/500], Loss: 0.5050
Epoch [420/500], Loss: 0.3970
Epoch [430/500], Loss: 0.3698
Epoch [440/500], Loss: 0.3607
Epoch [450/500], Loss: 0.3106
Epoch [460/500], Loss: 0.1806
Epoch [470/500], Loss: 0.1894
Epoch [480/500], Loss: 0.1364
Epoch [490/500], Loss: 0.1692
Epoch [500/500], Loss: 0.1211

We can see that the linear model's accuracy keeps improving as it fits this data. Readers are encouraged to try different parameter configurations to make training more efficient and accurate; one possibility is sketched below.
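For instance, since the synthetic data was generated from known parameters, we can check how close the learned weights are to true_w and true_b, and then retrain with a larger learning rate. This is only a sketch, assuming model, true_w, true_b, input_dim, criterion, device, and train_loader from the cells above are still in scope, and lr=0.01 is an illustrative, untuned choice.

# Compare the learned parameters with the generating ones
with torch.no_grad():
    w_learned = model.linear.weight.reshape(-1).cpu()
    b_learned = model.linear.bias.item()
print("max |w error|:", (w_learned - true_w).abs().max().item())
print("b error:", abs(b_learned - true_b))

# Retrain with a larger learning rate; far fewer epochs are usually needed
model = LinearRegressionModel(input_dim)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train(model, train_loader, criterion, optimizer, device, num_epochs=50)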

Theoretical Analysis of the Linear Regression Problem

Model Definition

Given a dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \mathbb{R}$ is a target value, the linear regression model assumes the target $y$ can be predicted as a linear combination of the features $x$:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_d x_d$$

or, in vector form:

$$y = w^T x + w_0$$

where $w = [w_1, w_2, \ldots, w_d]^T$ is the weight vector and $w_0$ is the bias term.

Loss Function

To find the best weights $w$ and bias $w_0$, we need a loss function that quantifies the gap between the model's predictions and the actual values. The most common choice is the mean squared error (MSE):

$$L(w, w_0) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w^T x_i + w_0)\right)^2$$

Training and Optimization

To find the parameters that minimize the loss function, one usually uses gradient descent or one of its variants (stochastic gradient descent SGD, mini-batch gradient descent, etc.). The gradient descent update rule is:

$$w = w - \eta \frac{\partial L}{\partial w}, \quad w_0 = w_0 - \eta \frac{\partial L}{\partial w_0}$$

where $\eta$ is the learning rate.

Steps of the gradient descent algorithm:

  1. Initialize: choose an initial value for the weight vector w (including w0), e.g. all zeros or random values. Pick a suitable learning rate η (e.g. 0.01) and a stopping condition (e.g. a maximum number of iterations, or the change in the loss falling below some threshold).

  2. Compute the gradients: for each sample (xi, yi) in the dataset, compute the prediction yi_pred = w0 + w1*xi1 + w2*xi2 + ... + wn*xin, then compute the partial derivative (gradient) of the loss with respect to each weight wj:

    ∂L/∂wj = -2 * (yi - yi_pred) * xij   (for j = 1, 2, ..., n)
    ∂L/∂w0 = -2 * (yi - yi_pred)         (for the bias term w0)

    where L is the loss function (mean squared error) and xij is the j-th feature of sample xi.

  3. Update the weights: use the computed gradients and the learning rate η to update each weight:

    wj = wj - η * (∂L/∂wj)   (for j = 1, 2, ..., n)
    w0 = w0 - η * (∂L/∂w0)   (for the bias term w0)

  4. Iterate: repeat steps 2 and 3 until the stopping condition is met (e.g. the maximum number of iterations is reached or the change in the loss is smaller than the preset threshold). A from-scratch sketch of these steps is given below.
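Here is a minimal from-scratch sketch of these four steps on hypothetical 1-D data. The learning rate and iteration count are illustrative choices, and the gradients are averaged over all samples (batch gradient descent) rather than taken per sample.

import numpy as np

# Hypothetical data: y = 3x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 3.0 * x + 1.0 + 0.05 * rng.standard_normal(200)

w, w0 = 0.0, 0.0           # step 1: initialize the parameters
lr, num_iters = 0.1, 2000  # learning rate and stopping condition

for it in range(num_iters):           # step 4: iterate
    y_pred = w * x + w0               # step 2: predictions
    err = y - y_pred
    grad_w = -2.0 * np.mean(err * x)  # gradient of the MSE loss w.r.t. w
    grad_w0 = -2.0 * np.mean(err)     # gradient of the MSE loss w.r.t. w0
    w -= lr * grad_w                  # step 3: update the weights
    w0 -= lr * grad_w0

print(f"w ≈ {w:.3f}, w0 ≈ {w0:.3f}")  # expected: close to 3 and 1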

Extensions of Linear Regression

Lasso Regression

Lasso (Least Absolute Shrinkage and Selection Operator) regression performs feature selection and yields sparse models by adding an L1 regularization term. Its loss function is:

$$L(w, w_0) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w^T x_i + w_0)\right)^2 + \lambda \|w\|_1$$

where $\lambda$ is the regularization parameter controlling the strength of the regularization, and $\|w\|_1$ is the L1 norm of the weight vector, i.e., the sum of the absolute values of its elements.
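PyTorch has no built-in lasso loss, but the L1 penalty can be added to the MSE term by hand and passed to the train() function defined earlier as the criterion. A minimal sketch, assuming the LinearRegressionModel and input_dim from above; lam is an illustrative, untuned value:

lam = 0.01  # regularization strength (illustrative)
model = LinearRegressionModel(input_dim)
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
mse = torch.nn.MSELoss()

def lasso_loss(outputs, targets):
    # MSE data term plus lambda times the L1 norm of the linear weights
    return mse(outputs, targets) + lam * model.linear.weight.abs().sum()

# The earlier train() function accepts any criterion with this signature:
# train(model, train_loader, lasso_loss, optimizer, device, num_epochs=500)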

Ridge Regression

Ridge regression addresses collinearity and guards against overfitting by adding an L2 regularization term. Its loss function is:

$$L(w, w_0) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (w^T x_i + w_0)\right)^2 + \lambda \|w\|_2^2$$

where $\|w\|_2^2$ is the squared L2 norm of the weight vector, i.e., the sum of the squares of its elements.
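In PyTorch, an equivalent effect can be obtained without changing the loss function: the weight_decay argument of torch.optim.SGD adds an L2 penalty on the parameters during the update (note that it also penalizes the bias unless parameter groups are separated). A one-line sketch with an illustrative penalty strength:

# weight_decay plays the role of the ridge penalty strength lambda
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, weight_decay=0.01)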

Polynomial Regression

Polynomial regression is a nonlinear regression method that fits nonlinear relationships by expanding the features into polynomial terms. For example, a one-dimensional feature $x$ can be expanded to its square:

$$y = w_0 + w_1 x + w_2 x^2$$

or to higher powers to capture more complex nonlinear relationships. In practice, polynomial regression requires attention to the risk of overfitting.
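Since the model remains linear in the expanded features, the LinearRegressionModel defined earlier can be reused unchanged. A minimal sketch of a quadratic fit on hypothetical 1-D data; the learning rate and iteration count are illustrative choices:

# Hypothetical nonlinear data: y = 1 + 2x + 3x^2 plus noise
x = torch.linspace(-1, 1, 200).reshape(-1, 1)
y = 1 + 2 * x + 3 * x ** 2 + 0.05 * torch.randn(200, 1)

# Polynomial feature expansion: [x, x^2]; the model then learns w1, w2 and the bias w0
X_poly = torch.cat([x, x ** 2], dim=1)

model = LinearRegressionModel(input_dim=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
for _ in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(X_poly), y)
    loss.backward()
    optimizer.step()
print(model.linear.weight.data, model.linear.bias.data)  # expected: roughly [2, 3] and 1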
