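Naive Bayes Classifier
The naive Bayes classifier is a classification method based on Bayes' theorem and the assumption that features are conditionally independent. It suits classification tasks in general and is especially common in text classification and spam detection.
The classifier rests on two ingredients: Bayes' theorem, and the assumption that the features are conditionally independent of one another given the class.
For two random variables X and Y, Bayes' theorem states: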
\[ P(Y \mid X) = \frac{P(X \mid Y) \cdot P(Y)}{P(X)} \]
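where P(Y|X) is the posterior probability of Y given X, P(X|Y) is the likelihood of X given Y, P(Y) is the prior probability of Y, and P(X) is the marginal probability of X.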
In a classification task, X is the feature vector and Y is the class.
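The "naive" step can be written out explicitly. As a sketch, assume the feature vector has components X = (x_1, ..., x_n); this component notation is introduced here for illustration. Conditional independence lets the likelihood factor into per-feature terms, and because P(X) is the same for every class it can be dropped when comparing classes:

```latex
\[
P(Y \mid x_1, \dots, x_n) = \frac{P(Y) \prod_{i=1}^{n} P(x_i \mid Y)}{P(X)}
\quad\Longrightarrow\quad
\hat{y} = \arg\max_{y} \; P(Y = y) \prod_{i=1}^{n} P(x_i \mid Y = y)
\]
```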
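Naive Bayes classifiers are widely used in text classification, spam filtering, sentiment analysis, news categorization, and similar areas.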
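To make the spam-filtering use case concrete, here is a minimal hand-written sketch of a word-based naive Bayes classifier. The four-message corpus is invented purely for illustration, and the helper names `fit` and `predict` are hypothetical; add-one (Laplace) smoothing is an added assumption so that unseen words do not zero out a class.

```python
from collections import Counter
import math

# Hypothetical toy corpus (invented for illustration): (label, words)
train = [
    ("spam", ["win", "money", "now"]),
    ("spam", ["win", "prize"]),
    ("ham", ["meeting", "now"]),
    ("ham", ["project", "meeting", "notes"]),
]

def fit(docs):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, words in docs:
        word_counts[label].update(words)
    vocab = {w for _, words in docs for w in words}
    return class_counts, word_counts, vocab

def predict(words, class_counts, word_counts, vocab):
    """Return the class maximising log P(Y) + sum_i log P(x_i | Y)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in words:
            # Add-one smoothing keeps unseen words from giving probability 0
            p = (word_counts[label][w] + 1) / (n_words + len(vocab))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get), scores

model = fit(train)
label, scores = predict(["win", "money"], *model)
print(label)
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on longer messages.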
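Universal Bank is a fast-growing bank. To expand its loan business, the bank is exploring ways to convert its deposit customers into personal-loan customers. It collected 5000 customer records covering customer attributes (Age, Experience, Income, Family, CCAvg, Education, ZIP Code), the customer's response to the previous loan campaign (Personal Loan), and the customer's relationship with the bank (Mortgage, Securities Account, Online, CD Account, CreditCard), 13 fields in total. The target is Personal Loan, i.e. whether the customer accepted a personal loan.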
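Column | Meaning |
ID | Account ID |
Age | Age |
Experience | Experience |
Income | Income |
ZIP Code | Postal code |
Family | Number of family members |
CCAvg | Average monthly credit card spending |
Education | Education level |
Mortgage | Mortgage amount |
Securities Account | Securities account |
CD Account | Time deposit (CD) account |
Online | Online banking |
CreditCard | Credit card |
Personal Loan | Personal loan (target) |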
Of the 5000 customers, only 480 accepted the personal loan offered to them.
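With only 480 acceptances out of 5000, the classes are imbalanced. The short calculation below, using just the two counts quoted above, shows the accuracy a trivial model would reach by always predicting "no loan". It is a useful yardstick when reading any accuracy figure on this data.

```python
n_total = 5000     # customers in the data set (from the text)
n_accepted = 480   # customers who accepted the personal loan

positive_rate = n_accepted / n_total
# Accuracy of a trivial classifier that always predicts "no loan"
baseline_accuracy = 1 - positive_rate

print(f"positive rate: {positive_rate:.3f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

A model is only informative here if its accuracy clearly exceeds this baseline of roughly 90%.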
Sample data:
ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 39 | 15 | 11 | 94720 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 35 | 8 | 45 | 91330 | 4 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | 37 | 13 | 29 | 92121 | 4 | 0.4 | 2 | 155 | 0 | 0 | 0 | 1 | 0 |
7 | 53 | 27 | 72 | 91711 | 2 | 1.5 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
8 | 50 | 24 | 22 | 93943 | 1 | 0.3 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
9 | 35 | 10 | 81 | 90089 | 3 | 0.6 | 2 | 104 | 0 | 0 | 0 | 1 | 0 |
10 | 34 | 9 | 180 | 93023 | 1 | 8.9 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
11 | 65 | 39 | 105 | 94710 | 4 | 2.4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
12 | 29 | 5 | 45 | 90277 | 3 | 0.1 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
13 | 48 | 23 | 114 | 93106 | 2 | 3.8 | 3 | 0 | 0 | 1 | 0 | 0 | 0 |
14 | 59 | 32 | 40 | 94920 | 4 | 2.5 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
15 | 67 | 41 | 112 | 91741 | 1 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
16 | 60 | 30 | 22 | 95054 | 1 | 1.5 | 3 | 0 | 0 | 0 | 0 | 1 | 1 |
17 | 38 | 14 | 130 | 95010 | 4 | 4.7 | 3 | 134 | 1 | 0 | 0 | 0 | 0 |
18 | 42 | 18 | 81 | 94305 | 4 | 2.4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
19 | 46 | 21 | 193 | 91604 | 2 | 8.1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
20 | 55 | 28 | 21 | 94720 | 1 | 0.5 | 2 | 0 | 0 | 1 | 0 | 0 | 1 |
21 | 56 | 31 | 25 | 94015 | 4 | 0.9 | 2 | 111 | 0 | 0 | 0 | 1 | 0 |
22 | 57 | 27 | 63 | 90095 | 3 | 2 | 3 | 0 | 0 | 0 | 0 | 1 | 0 |
23 | 29 | 5 | 62 | 90277 | 1 | 1.2 | 1 | 260 | 0 | 0 | 0 | 1 | 0 |
24 | 44 | 18 | 43 | 91320 | 2 | 0.7 | 1 | 163 | 0 | 1 | 0 | 0 | 0 |
25 | 36 | 11 | 152 | 95521 | 2 | 3.9 | 1 | 159 | 0 | 0 | 0 | 0 | 1 |
26 | 43 | 19 | 29 | 94305 | 3 | 0.5 | 1 | 97 | 0 | 0 | 0 | 1 | 0 |
27 | 40 | 16 | 83 | 95064 | 4 | 0.2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: the ID and ZIP Code columns carry no meaning for the classification model, so they are dropped during data preprocessing.
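As a small sketch of this preprocessing step, the snippet below rebuilds the first three sample rows in memory (so it runs without the CSV file) and separates the target from the features while dropping ID and ZIP Code. The variable name `sample` is introduced here for illustration only.

```python
import pandas as pd

# First three rows of the sample data shown above
sample = pd.DataFrame(
    [
        [1, 25, 1, 49, 91107, 4, 1.6, 1, 0, 0, 1, 0, 0, 0],
        [2, 45, 19, 34, 90089, 3, 1.5, 1, 0, 0, 1, 0, 0, 0],
        [3, 39, 15, 11, 94720, 1, 1.0, 1, 0, 0, 0, 0, 0, 0],
    ],
    columns=[
        "ID", "Age", "Experience", "Income", "ZIP Code", "Family", "CCAvg",
        "Education", "Mortgage", "Personal Loan", "Securities Account",
        "CD Account", "Online", "CreditCard",
    ],
)

# Target column
y = sample["Personal Loan"]
# Features: everything except the identifier, the ZIP code, and the target
X = sample.drop(columns=["ID", "ZIP Code", "Personal Loan"])

print(X.columns.tolist())
print(X.shape)
```

After the drop, 11 feature columns remain.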
Code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# 1. Load the data
df = pd.read_csv('universalbank.csv')
y = df['Personal Loan']
# Drop the identifier, the ZIP code, and the target from the feature matrix
X = df.drop(['ID', 'ZIP Code', 'Personal Loan'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. Train a Gaussian naive Bayes model on all remaining features
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# 3. Evaluate the Gaussian model
y_pred = gnb.predict(X_test)
print('Predictions on the test data:', y_pred)

acc = gnb.score(X_test, y_test)
print('GaussianNB accuracy:', acc)

# 4. Rebuild the feature matrix using only the 6 discrete features
y = df['Personal Loan']
X = df[['Family', 'Education', 'Securities Account',
        'CD Account', 'Online', 'CreditCard']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 5. Train a multinomial naive Bayes model (requires non-negative features)
mnb = MultinomialNB()
mnb.fit(X_train, y_train)

# 6. Evaluate the multinomial model
y_pred = mnb.predict(X_test)
print('Predictions on the test data:', y_pred)

acc = mnb.score(X_test, y_test)
print('MultinomialNB accuracy:', acc)
```
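Accuracy alone can hide poor performance on the rare "accepted" class. The sketch below shows one way to look further, using scikit-learn's `confusion_matrix` and `recall_score`. It runs on synthetic stand-in data generated in the snippet, not on the Universal Bank file. The roughly 10% positive rate, the two made-up features, and the use of `stratify` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (NOT the Universal Bank file): about 10% positives,
# with one informative feature loosely playing the role of income.
rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.1).astype(int)
income = rng.normal(loc=60 + 80 * y, scale=25)  # positives are higher on average
noise = rng.normal(size=n)                      # uninformative feature
X = np.column_stack([income, noise])

# stratify=y keeps the class ratio the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
print("accuracy:", gnb.score(X_test, y_test))
print("recall for class 1:", recall_score(y_test, y_pred))
```

The same three calls can be applied to `y_test` and `y_pred` from the bank data to see how many actual loan acceptors each model finds.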