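For this problem, the naive Bayes algorithm (here `GaussianNB`) reaches 100% accuracy on this dataset with the train/test split used in the code.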
```python
# Naive Bayes
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score


def train_and_predict(train_input_features, train_outputs, prediction_features):
    # Fit a Gaussian naive Bayes classifier, then predict labels for the held-out features.
    G = GaussianNB()
    G.fit(train_input_features, train_outputs)
    y_pred = G.predict(prediction_features)
    return y_pred


iris = datasets.load_iris()
# Hold out 30% of the samples for testing; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

y_pred = train_and_predict(X_train, y_train, X_test)
if y_pred is not None:
    # accuracy_score expects (y_true, y_pred).
    print(accuracy_score(y_test, y_pred))
```
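For intuition about what a Gaussian naive Bayes classifier computes, the following is a simplified pure-Python sketch. It is not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the absolute smoothing constant `eps`, and the toy data are all assumptions made for illustration. Each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log posterior.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior and per-feature mean/variance for every class.

    eps is a hypothetical absolute smoothing term that keeps variances
    positive; scikit-learn scales its smoothing differently.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""

    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density of feature value v.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy data: two well-separated clusters.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))
print(predict_gaussian_nb(model, [5.1, 5.9]))
```

The "naive" part is the sum over features inside `log_posterior`: each feature contributes independently given the class.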
Original problem: 鸢尾花分类_2 (Iris Classification 2), 牛客题霸, 牛客网 (nowcoder.com)
I used a decision tree model. With default parameters, the accuracy on this two-class problem is again 100%.
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def transform_three2two_cate():
    # Reduce the three-class iris data to two classes by dropping class 2.
    data = datasets.load_iris()
    new_data = np.hstack([data.data, data.target[:, np.newaxis]])
    keep = new_data[:, -1] != 2
    new_feat = new_data[keep][:, :4]
    new_label = new_data[keep][:, -1]
    return new_feat, new_label


def train_and_evaluate():
    data_X, data_Y = transform_three2two_cate()
    # Hold out 20% of the samples for testing (the split is not seeded).
    train_x, test_x, train_y, test_y = train_test_split(data_X, data_Y, test_size=0.2)
    DT = DecisionTreeClassifier()
    DT.fit(train_x, train_y)
    y_predict = DT.predict(test_x)
    # accuracy_score expects (y_true, y_pred).
    print(accuracy_score(test_y, y_predict))


if __name__ == "__main__":
    train_and_evaluate()
```
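Because the split in the code above is not seeded, every run evaluates on a different test set. A minimal sketch of how one might check the accuracy over several seeded splits is shown below. The boolean-mask filtering, the number of seeds, and the use of `stratify` are assumptions of this sketch, not part of the original solution.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
mask = iris.target != 2  # keep classes 0 and 1 only
X, y = iris.data[mask], iris.target[mask]

scores = []
for seed in range(5):  # hypothetical choice: five different seeded splits
    train_x, test_x, train_y, test_y = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    tree = DecisionTreeClassifier(random_state=seed).fit(train_x, train_y)
    scores.append(tree.score(test_x, test_y))  # mean accuracy on the test split

print(scores)
```

Filtering with a mask on `iris.target` avoids stacking the labels onto the feature matrix and slicing them off again.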
Original problem: 决策树的生成与训练-信息熵的计算 (Decision Tree Generation and Training: Computing Information Entropy), 牛客题霸, 牛客网 (nowcoder.com)
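This problem is very simple. My approach is to load the provided data (`dataSet.csv`) as a NumPy ndarray, take the last column, and plug it straight into the entropy formula: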
```python
import numpy as np
import pandas as pd
from collections import Counter

# Keep only the last column (the class labels).
dataSet = pd.read_csv('dataSet.csv', header=None).values[:, -1]


def calcInfoEnt(dataSet):
    numEntres = len(dataSet)
    cnt = Counter(dataSet)  # count how often each value occurs
    probability_lst = [1.0 * cnt[i] / numEntres for i in cnt]
    return -np.sum([p * np.log2(p) for p in probability_lst])


if __name__ == '__main__':
    print(calcInfoEnt(dataSet))
```
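The formula applied here is the Shannon entropy, H(D) = -Σ p_k · log2(p_k), where p_k is the fraction of samples with label k. As a small worked example that does not need `dataSet.csv`, the sketch below uses only the standard library; the function name `shannon_entropy` and the 9-versus-6 label list are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Base-2 Shannon entropy of a sequence of discrete labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


# Hypothetical label column: 9 positive and 6 negative samples.
labels = ["yes"] * 9 + ["no"] * 6
print(round(shannon_entropy(labels), 4))  # → 0.971
```

A perfectly balanced two-class column would give exactly 1 bit.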
Original problem: 决策树的生成与训练-信息增益 (Decision Tree Generation and Training: Information Gain), 牛客题霸, 牛客网 (nowcoder.com)
```python
import numpy as np
import pandas as pd
from collections import Counter
import random

dataSet = pd.read_csv('dataSet.csv', header=None).values.T  # transpose into a 5 x 15 array


def entropy(data):  # data: 1-D array
    numEntres = len(data)
    cnt = Counter(data)  # count how often each value occurs, e.g. Counter({1: 8, 0: 5})
    probability_lst = [1.0 * cnt[i] / numEntres for i in cnt]
    return -np.sum([p * np.log2(p) for p in probability_lst])  # return the information entropy


def calc_max_info_gain(dataSet):
    label = np.array(dataSet[-1])
    total_entropy = entropy(label)
    # (the source listing ends here)
```
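The listing stops right after the total entropy is computed. As a rough sketch of how the remaining step could look (this is not the author's code), the information gain of a feature A is g(D, A) = H(D) - Σ_v (|D_v| / |D|) · H(D_v), and the feature with the largest gain is selected. The helper names `info_gain` and `best_feature` and the toy data are assumptions. The input uses the same transposed layout as above, with one row per column of the CSV and the label row last.

```python
import math
from collections import Counter


def entropy(values):
    """Base-2 Shannon entropy of a sequence of discrete values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def info_gain(feature, labels):
    """Entropy of the labels minus the conditional entropy given the feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def best_feature(columns):
    """Return (index, gain) of the feature row with the largest information gain."""
    *features, labels = columns
    gains = [info_gain(feature, labels) for feature in features]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


# Hypothetical toy data in transposed layout: feature 0 determines the label,
# feature 1 carries no information about it.
toy_columns = [
    [0, 0, 1, 1],  # feature 0
    [0, 1, 0, 1],  # feature 1
    [0, 0, 1, 1],  # label
]
print(best_feature(toy_columns))  # → (0, 1.0)
```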