Logistic regression (LR), also called logistic regression analysis, is a machine learning algorithm in the family of classification and prediction methods, used mainly for binary classification. It uses patterns in historical data to predict the probability of a future outcome. For example, we can take the probability of a purchase as the dependent variable and user attributes such as gender, age, and registration date as the independent variables, then predict the purchase probability from those attributes.
Logistic regression predicts the probability that an input sample belongs to a given class. The core idea is to pass a linear combination of the features through the sigmoid function (also called the logistic function) to model that probability. The sigmoid function is

σ(x) = 1 / (1 + e^(-x)),

which maps any real number into the interval (0, 1).
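As a quick illustration (this snippet is not from the original article; the scores are made-up values), PyTorch's built-in torch.sigmoid applies exactly this mapping:

```python
import torch

# Sigmoid squashes any real-valued score into a probability in (0, 1).
scores = torch.tensor([-4.0, -1.0, 0.0, 1.0, 4.0])
print(torch.sigmoid(scores))
# tensor([0.0180, 0.2689, 0.5000, 0.7311, 0.9820])
```

Scores far below zero map near 0, scores far above zero map near 1, and a score of exactly zero maps to 0.5.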
Logistic regression is a simple and efficient machine learning algorithm with several advantages. First, the model is easy to understand and implement and is computationally cheap, which makes it particularly suitable for large datasets. Second, it supports interpretation and inference: the model's coefficients reveal which features have a larger or smaller influence on the classification result. It also works with high-dimensional data and can handle problems with a large number of features. In addition, it outputs probability predictions rather than just class labels, which is valuable for tasks that need probability estimates or uncertainty analysis. Finally, it is reasonably robust to noise and missing values, so it copes with the imperfect data of the real world. In short, logistic regression is a capable and practical classification algorithm that is widely used. Common application scenarios include:
Medicine: disease diagnosis, patient prognosis assessment, drug-response prediction, and other medical decision-support tasks.
Marketing: customer segmentation, user-behavior analysis, ad click-through-rate prediction, and similar tasks.
Natural language processing: text classification, sentiment analysis, spam filtering, and other NLP tasks.
Image recognition: binary classification problems within image classification and object detection.
The simplicity and interpretability of logistic regression make it widely applicable in practice. For complex non-linear problems, however, it may fall short; in such cases, consider more expressive models or feature-engineering techniques to improve performance. The interpretability claim is easy to demonstrate, as in the sketch below.
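A minimal sketch of reading feature influence off the coefficients, assuming scikit-learn is available (the synthetic data and feature names here are illustrative, not from the original article):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features; only feature_0 actually drives the label.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# A larger |coefficient| means a stronger influence on the predicted log-odds.
for name, coef in zip(["feature_0", "feature_1", "feature_2"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```

On this data, feature_0 should receive a coefficient far larger in magnitude than the other two.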
Let us now work through an example on a bank fraud dataset: from 15 features, the task is a binary decision of whether a person is a fraudulent defaulter. The model is still a linear model; its output is passed through a sigmoid, turning it into a probability between 0 and 1. Conventionally, a value above 0.5 is read as class 1 and a value below 0.5 as class 0. A few sample rows from the dataset:
```
0 | 56.75 | 12.25 | 0 | 0 | 6 | 0 | 1.25 | 0 | 0 | 4 | 0 | 0 | 200 | 0 | -1 |
0 | 31.67 | 16.165 | 0 | 0 | 1 | 0 | 3 | 0 | 0 | 9 | 1 | 0 | 250 | 730 | -1 |
1 | 23.42 | 0.79 | 1 | 1 | 8 | 0 | 1.5 | 0 | 0 | 2 | 0 | 0 | 80 | 400 | -1 |
1 | 20.42 | 0.835 | 0 | 0 | 8 | 0 | 1.585 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -1 |
0 | 26.67 | 4.25 | 0 | 0 | 2 | 0 | 4.29 | 0 | 0 | 1 | 0 | 0 | 120 | 0 | -1 |
0 | 34.17 | 1.54 | 0 | 0 | 2 | 0 | 1.54 | 0 | 0 | 1 | 0 | 0 | 520 | 50000 | -1 |
1 | 36 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 11 | 1 | 0 | 0 | 456 | -1 |
0 | 25.5 | 0.375 | 0 | 0 | 6 | 0 | 0.25 | 0 | 0 | 3 | 1 | 0 | 260 | 15108 | -1 |
0 | 19.42 | 6.5 | 0 | 0 | 9 | 1 | 1.46 | 0 | 0 | 7 | 1 | 0 | 80 | 2954 | -1 |
0 | 35.17 | 25.125 | 0 | 0 | 10 | 1 | 1.625 | 0 | 0 | 1 | 0 | 0 | 515 | 500 | -1 |
0 | 32.33 | 7.5 | 0 | 0 | 11 | 2 | 1.585 | 0 | 1 | 0 | 0 | 2 | 420 | 0 | 1 |
1 | 38.58 | 5 | 0 | 0 | 2 | 0 | 13.5 | 0 | 1 | 0 | 0 | 0 | 980 | 0 | 1 |
```
In the last column, -1 marks a fraudulent defaulter and 1 marks a regular customer.
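Before training, it is worth sanity-checking the labels. A minimal sketch, assuming the dataset file is named credit.csv as in the training script below:

```python
import pandas as pd

# Load the raw CSV; the file has no header row.
data = pd.read_csv("credit.csv", header=None)

# The last column is the label: -1 = fraudulent defaulter, 1 = regular customer.
print(data.iloc[:, -1].value_counts())

# BCELoss expects targets in {0, 1}, so the training script remaps -1 -> 0.
print(data.iloc[:, -1].replace(-1, 0).value_counts())
```

The complete PyTorch training script: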
```python
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch import nn
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split


def accuracy(y_pred, y_true):
    # Threshold predicted probabilities at 0.5 and compare with the labels.
    y_pred = (y_pred > 0.5).type(torch.int32)
    acc = (y_pred == y_true).float().mean()
    return acc


loss_fn = nn.BCELoss()  # binary cross-entropy; expects probabilities in (0, 1)
epochs = 1000
batch = 16
lr = 0.0001

data = pd.read_csv("credit.csv", header=None)
X = data.iloc[:, :-1]
Y = data.iloc[:, -1].replace(-1, 0)  # remap labels {-1, 1} -> {0, 1}

X = torch.from_numpy(X.values).type(torch.float32)
Y = torch.from_numpy(Y.values).type(torch.float32)

train_x, test_x, train_y, test_y = train_test_split(X, Y)

train_ds = TensorDataset(train_x, train_y)
train_dl = DataLoader(train_ds, batch_size=batch, shuffle=True)

test_ds = TensorDataset(test_x, test_y)
test_dl = DataLoader(test_ds, batch_size=batch)

# Logistic regression = one linear layer (15 features -> 1 score) plus a sigmoid.
model = nn.Sequential(
    nn.Linear(15, 1),
    nn.Sigmoid()
)
optim = torch.optim.Adam(model.parameters(), lr=lr)

accuracy_rate = []

for epoch in range(epochs):
    for x, y in train_dl:
        y_pred = model(x)
        y_pred = y_pred.squeeze()
        loss = loss_fn(y_pred, y)
        optim.zero_grad()
        loss.backward()
        optim.step()

    with torch.no_grad():
        # Accuracy and loss on the training set
        y_pred = model(train_x).squeeze()
        epoch_accuracy = accuracy(y_pred, train_y)
        epoch_loss = loss_fn(y_pred, train_y).item()

        accuracy_rate.append(epoch_accuracy.item() * 100)

        # Accuracy and loss on the test set
        y_pred = model(test_x).squeeze()
        epoch_test_accuracy = accuracy(y_pred, test_y)
        epoch_test_loss = loss_fn(y_pred, test_y).item()

        print('epoch:', epoch,
              'train_loss:', round(epoch_loss, 3),
              "train_accuracy", round(epoch_accuracy.item(), 3),
              'test_loss:', round(epoch_test_loss, 3),
              "test_accuracy", round(epoch_test_accuracy.item(), 3)
              )

# Plot the training-set accuracy over the epochs.
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, epochs, epochs)
plt.xlabel('epochs')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()
```
The final training epochs print output like the following:

```
epoch: 951 train_loss: 0.334 train_accuracy 0.869 test_loss: 0.346 test_accuracy 0.866
epoch: 952 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.348 test_accuracy 0.866
epoch: 953 train_loss: 0.337 train_accuracy 0.867 test_loss: 0.358 test_accuracy 0.86
epoch: 954 train_loss: 0.334 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.866
epoch: 955 train_loss: 0.334 train_accuracy 0.871 test_loss: 0.346 test_accuracy 0.866
epoch: 956 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
epoch: 957 train_loss: 0.333 train_accuracy 0.871 test_loss: 0.349 test_accuracy 0.866
epoch: 958 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.347 test_accuracy 0.866
epoch: 959 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.352 test_accuracy 0.866
epoch: 960 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.878
epoch: 961 train_loss: 0.334 train_accuracy 0.873 test_loss: 0.346 test_accuracy 0.866
epoch: 962 train_loss: 0.334 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 963 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.35 test_accuracy 0.866
epoch: 964 train_loss: 0.334 train_accuracy 0.863 test_loss: 0.345 test_accuracy 0.872
epoch: 965 train_loss: 0.333 train_accuracy 0.861 test_loss: 0.351 test_accuracy 0.866
epoch: 966 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.348 test_accuracy 0.866
epoch: 967 train_loss: 0.333 train_accuracy 0.863 test_loss: 0.348 test_accuracy 0.866
epoch: 968 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.351 test_accuracy 0.866
epoch: 969 train_loss: 0.334 train_accuracy 0.869 test_loss: 0.345 test_accuracy 0.878
epoch: 970 train_loss: 0.333 train_accuracy 0.869 test_loss: 0.348 test_accuracy 0.872
epoch: 971 train_loss: 0.335 train_accuracy 0.865 test_loss: 0.344 test_accuracy 0.86
epoch: 972 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.86
epoch: 973 train_loss: 0.334 train_accuracy 0.871 test_loss: 0.345 test_accuracy 0.872
epoch: 974 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 975 train_loss: 0.333 train_accuracy 0.873 test_loss: 0.351 test_accuracy 0.86
epoch: 976 train_loss: 0.333 train_accuracy 0.869 test_loss: 0.346 test_accuracy 0.878
epoch: 977 train_loss: 0.333 train_accuracy 0.863 test_loss: 0.351 test_accuracy 0.866
epoch: 978 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 979 train_loss: 0.332 train_accuracy 0.871 test_loss: 0.349 test_accuracy 0.866
epoch: 980 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.345 test_accuracy 0.872
epoch: 981 train_loss: 0.332 train_accuracy 0.867 test_loss: 0.348 test_accuracy 0.872
epoch: 982 train_loss: 0.332 train_accuracy 0.863 test_loss: 0.349 test_accuracy 0.872
epoch: 983 train_loss: 0.333 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 984 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.35 test_accuracy 0.872
epoch: 985 train_loss: 0.333 train_accuracy 0.867 test_loss: 0.353 test_accuracy 0.86
epoch: 986 train_loss: 0.333 train_accuracy 0.871 test_loss: 0.345 test_accuracy 0.866
epoch: 987 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.349 test_accuracy 0.872
epoch: 988 train_loss: 0.332 train_accuracy 0.869 test_loss: 0.345 test_accuracy 0.872
epoch: 989 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.353 test_accuracy 0.866
epoch: 990 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
epoch: 991 train_loss: 0.333 train_accuracy 0.875 test_loss: 0.344 test_accuracy 0.86
epoch: 992 train_loss: 0.332 train_accuracy 0.865 test_loss: 0.351 test_accuracy 0.866
epoch: 993 train_loss: 0.331 train_accuracy 0.869 test_loss: 0.348 test_accuracy 0.872
epoch: 994 train_loss: 0.331 train_accuracy 0.871 test_loss: 0.348 test_accuracy 0.872
epoch: 995 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.347 test_accuracy 0.872
epoch: 996 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.347 test_accuracy 0.872
epoch: 997 train_loss: 0.331 train_accuracy 0.867 test_loss: 0.35 test_accuracy 0.872
epoch: 998 train_loss: 0.331 train_accuracy 0.867 test_loss: 0.349 test_accuracy 0.872
epoch: 999 train_loss: 0.331 train_accuracy 0.865 test_loss: 0.348 test_accuracy 0.872
```
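Once trained, the model can score new applicants. A minimal sketch, continuing from the trained model above (the feature values below are invented for illustration and simply follow the 15-column layout of the dataset):

```python
# One hypothetical applicant: 15 feature values in the same order as the CSV columns.
new_applicant = torch.tensor([[0, 30.0, 2.5, 0, 0, 4, 0, 1.5,
                               0, 0, 2, 1, 0, 150, 1000]], dtype=torch.float32)

with torch.no_grad():
    prob = model(new_applicant).item()

# After the -1 -> 0 remap, the model outputs P(regular customer).
print(f"P(regular customer) = {prob:.3f}, predicted class {int(prob > 0.5)}")
```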