Commonly Used Machine Learning Algorithms (with Python and R Code)

A side-by-side comparison of Python and R code for the 10 most commonly used machine learning algorithms:

① Linear Regression

② Logistic Regression

③ Decision Tree

④ Support Vector Machine (SVM)

⑤ Naive Bayes

⑥ k-Nearest Neighbors (kNN)

⑦ k-Means

⑧ Random Forest

⑨ Principal Component Analysis (PCA)

⑩ Gradient Boosting

  1. GBM
  2. XGBoost
  3. LightGBM
  4. CatBoost

I. Linear Regression

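There are two main types of linear regression: simple linear regression and multiple linear regression. Simple linear regression uses a single independent variable, while multiple linear regression, as the name suggests, uses more than one. When searching for the best-fit line, you can also fit a polynomial or a curve; this is known as polynomial or curvilinear regression.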
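
To make the distinction concrete, here is a minimal pure-Python sketch of simple linear regression using the closed-form least-squares solution. The function names and the toy data are illustrative assumptions, not part of the original article. The `polynomial_features` helper hints at how a polynomial fit can be treated as a multiple regression on x, x², and so on.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares with one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance-like and variance-like sums (the 1/n factors cancel in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def polynomial_features(x, degree):
    """Expand one value into [x, x^2, ..., x^degree], the inputs of a polynomial regression."""
    return [x ** d for d in range(1, degree + 1)]


# Toy data generated from y = 2x + 1 (illustrative values only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear(xs, ys)
print('slope:', slope, 'intercept:', intercept)
print('cubic features of 3.0:', polynomial_features(3.0, 3))
```

With several independent variables, the same least-squares idea is solved as a linear system, which is what a library such as scikit-learn does internally.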

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # read the train and test datasets
    train_data = pd.read_csv('train.csv')
    test_data = pd.read_csv('test.csv')
    print(train_data.head())

    # shape of the datasets
    print('\nShape of training data :', train_data.shape)
    print('\nShape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Item_Outlet_Sales

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Item_Outlet_Sales'])
    train_y = train_data['Item_Outlet_Sales']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Item_Outlet_Sales'])
    test_y = test_data['Item_Outlet_Sales']

    # Create the Linear Regression model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: fit_intercept and positive
    # (the older normalize parameter has been removed from recent scikit-learn releases)
    # Documentation of sklearn LinearRegression:
    # https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
    model = LinearRegression()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # coefficients of the trained model
    print('\nCoefficient of model :', model.coef_)

    # intercept of the model
    print('\nIntercept of model', model.intercept_)

    # predict the target on the training dataset
    predict_train = model.predict(train_x)
    print('\nItem_Outlet_Sales on training data', predict_train)

    # Root Mean Squared Error on the training dataset
    rmse_train = mean_squared_error(train_y, predict_train)**(0.5)
    print('\nRMSE on train dataset : ', rmse_train)

    # predict the target on the testing dataset
    predict_test = model.predict(test_x)
    print('\nItem_Outlet_Sales on test data', predict_test)

    # Root Mean Squared Error on the testing dataset
    rmse_test = mean_squared_error(test_y, predict_test)**(0.5)
    print('\nRMSE on test dataset : ', rmse_test)
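
The script above reports RMSE as the square root of scikit-learn's mean squared error. As a rough illustration of what that expression computes, the following pure-Python sketch spells out the formula; the `rmse` helper and the sample values are assumptions made for this example only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values, not taken from the article's dataset
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print('RMSE:', rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is expressed in the same units as the target.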

2. R code

    # Load the train and test datasets
    # Identify the feature and response variable(s); the values must be numeric
    # (the right-hand sides below are placeholders for your own data)
    x_train <- input_variables_values_training_datasets
    y_train <- target_variables_values_training_datasets
    x_test <- input_variables_values_test_datasets
    x <- cbind(x_train, y_train)
    # Train the model using the training set and check the score
    linear <- lm(y_train ~ ., data = x)
    summary(linear)
    # Predict output
    predicted <- predict(linear, x_test)

II. Logistic Regression

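Don't be confused by its name! It is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false) from a given set of independent variables. Put simply, it predicts the probability that an event occurs by fitting the data to a logit function, which is why it is also known as logit regression. Because it predicts a probability, its output always lies between 0 and 1.

Again, let's try to understand this through a simple example.

Suppose a friend gives you a puzzle to solve. There are only two possible outcomes: either you solve it or you don't. Now imagine you are given a wide range of puzzles and quizzes in an attempt to understand which subjects you are good at. The outcome of this study would be something like this: if you are given a tenth-grade trigonometry problem, you are 70% likely to solve it. On the other hand, if it is a fifth-grade history question, the probability of getting the answer right is only 30%. This is what logistic regression provides you.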
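
The mapping from a linear score to a probability can be sketched in a few lines of pure Python. This is only an illustration of the logistic (sigmoid) function, not a training procedure; the weights, bias, and feature values below are hypothetical numbers chosen for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked parameters (a real model would learn these from data)
weights = [0.8, -0.5]
bias = 0.1
p = predict_probability([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # threshold the probability to get a class
print('probability:', p, 'predicted class:', label)
```

Whatever the value of the linear score, the sigmoid squeezes it into the interval between 0 and 1, and thresholding at 0.5 turns the probability into a yes/no prediction.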

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')
    print(train_data.head())

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Logistic Regression model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: fit_intercept and penalty
    # Documentation of sklearn LogisticRegression:
    # https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
    model = LogisticRegression()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # coefficients of the trained model
    print('Coefficient of model :', model.coef_)

    # intercept of the model
    print('Intercept of model', model.intercept_)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('Target on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('accuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('Target on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('accuracy_score on test dataset : ', accuracy_test)

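2. R code

    # x_train, y_train and x_test are prepared as in the linear regression example
    x <- cbind(x_train, y_train)
    # Train the model using the training set and check the score
    logistic <- glm(y_train ~ ., data = x, family = 'binomial')
    summary(logistic)
    # Predict output; type = "response" returns probabilities
    # (the default returns values on the log-odds scale)
    predicted <- predict(logistic, x_test, type = "response")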

III. Decision Tree

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Decision Tree model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: max_depth and max_features
    # Documentation of sklearn DecisionTreeClassifier:
    # https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
    model = DecisionTreeClassifier()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # depth of the decision tree
    print('Depth of the Decision Tree :', model.get_depth())

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('Target on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('accuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('Target on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('accuracy_score on test dataset : ', accuracy_test)

2. R code

    library(rpart)
    x <- cbind(x_train, y_train)
    # grow the tree
    fit <- rpart(y_train ~ ., data = x, method = "class")
    summary(fit)
    # Predict output (class probabilities by default; pass type = "class" for labels)
    predicted <- predict(fit, x_test)

IV. Support Vector Machine (SVM)

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Support Vector Classifier model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: kernel and degree
    # Documentation of sklearn Support Vector Classifier:
    # https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
    model = SVC()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('Target on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('accuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('Target on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('accuracy_score on test dataset : ', accuracy_test)

2. R code

    library(e1071)
    x <- cbind(x_train, y_train)
    # Fit the model (y_train should be a factor for classification)
    fit <- svm(y_train ~ ., data = x)
    summary(fit)
    # Predict output
    predicted <- predict(fit, x_test)

V. Naive Bayes

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Naive Bayes model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: var_smoothing
    # Documentation of sklearn GaussianNB:
    # https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
    model = GaussianNB()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('Target on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('accuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('Target on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('accuracy_score on test dataset : ', accuracy_test)

2. R code

    library(e1071)
    x <- cbind(x_train, y_train)
    # Fit the model (y_train should be a factor)
    fit <- naiveBayes(y_train ~ ., data = x)
    summary(fit)
    # Predict output
    predicted <- predict(fit, x_test)

VI. k-Nearest Neighbors (kNN)

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the K-Nearest Neighbors model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: n_neighbors and leaf_size
    # Documentation of sklearn KNeighborsClassifier:
    # https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
    model = KNeighborsClassifier()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # number of neighbors used to predict the target
    print('\nThe number of neighbors used to predict the target : ', model.n_neighbors)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('\nTarget on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('accuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('Target on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('accuracy_score on test dataset : ', accuracy_test)

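2. R code

    # knn() is provided by the class package; there is no package named knn
    library(class)
    # knn() has no separate fitting step: it classifies x_test directly
    # from the training points and their labels (cl must be a factor)
    predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
    summary(predicted)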

VII. k-Means

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.cluster import KMeans

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to divide the training data into different clusters
    # and predict which cluster a particular data point belongs to.

    # Create the K-Means model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: n_clusters and max_iter
    # Documentation of sklearn KMeans:
    # https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
    model = KMeans()

    # fit the model with the training data
    model.fit(train_data)

    # number of clusters
    print('\nDefault number of Clusters : ', model.n_clusters)

    # predict the clusters on the train dataset
    predict_train = model.predict(train_data)
    print('\nClusters on train data', predict_train)

    # predict the clusters on the test dataset
    predict_test = model.predict(test_data)
    print('Clusters on test data', predict_test)

    # Now we will train a model with n_clusters = 3
    model_n3 = KMeans(n_clusters=3)

    # fit the model with the training data
    model_n3.fit(train_data)

    # number of clusters
    print('\nNumber of Clusters : ', model_n3.n_clusters)

    # predict the clusters on the train dataset
    predict_train_3 = model_n3.predict(train_data)
    print('\nClusters on train data', predict_train_3)

    # predict the clusters on the test dataset
    predict_test_3 = model_n3.predict(test_data)
    print('Clusters on test data', predict_test_3)
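
To show roughly what fitting and predicting mean for k-means, here is a minimal pure-Python sketch of Lloyd's algorithm on one-dimensional data. It is a simplified illustration, not scikit-learn's implementation: the function names are made up for this example, and the initial centroids are simply the first k points, whereas libraries typically use smarter initialization.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (this is the 'predict' step)."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))


def kmeans_1d(points, k, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # simplifying assumption: first k points as the start
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # move each centroid to the mean of its cluster; keep it if the cluster is empty
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Illustrative data with two obvious groups, around 1 and around 5
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids = kmeans_1d(points, k=2)
print('centroids:', centroids)
print('cluster of 0.9:', nearest_centroid(0.9, centroids))
print('cluster of 5.1:', nearest_centroid(5.1, centroids))
```

Predicting the cluster of a new point only requires the fitted centroids, which is why a k-means model fitted on the training data can also be applied to the test data.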

2. R code

    # kmeans() is part of the stats package, which R loads by default
    # X is a numeric matrix or data frame of features
    fit <- kmeans(X, 3) # 3-cluster solution

VIII. Random Forest

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # view the top 3 rows of the dataset
    print(train_data.head(3))

    # shape of the datasets
    print('\nShape of training data :', train_data.shape)
    print('\nShape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Random Forest model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: n_estimators and max_depth
    # Documentation of sklearn RandomForestClassifier:
    # https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
    model = RandomForestClassifier()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # number of trees used
    print('Number of Trees used : ', model.n_estimators)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('\nTarget on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('\naccuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('\nTarget on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('\naccuracy_score on test dataset : ', accuracy_test)

2. R code

    library(randomForest)
    x <- cbind(x_train, y_train)
    # Fit the model; the response is the y_train column of x
    # (y_train should be a factor for classification)
    fit <- randomForest(y_train ~ ., data = x, ntree = 500)
    summary(fit)
    # Predict output
    predicted <- predict(fit, x_test)

IX. Principal Component Analysis (PCA)

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # read the train and test datasets
    train_data = pd.read_csv('train.csv')
    test_data = pd.read_csv('test.csv')

    # view the top 3 rows of the dataset
    print(train_data.head(3))

    # shape of the datasets
    print('\nShape of training data :', train_data.shape)
    print('\nShape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Item_Outlet_Sales

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Item_Outlet_Sales'])
    train_y = train_data['Item_Outlet_Sales']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Item_Outlet_Sales'])
    test_y = test_data['Item_Outlet_Sales']

    print('\nTraining model with {} dimensions.'.format(train_x.shape[1]))

    # create the baseline model on the original features
    model = LinearRegression()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)

    # RMSE on the train dataset
    rmse_train = mean_squared_error(train_y, predict_train)**(0.5)
    print('\nRMSE on train dataset : ', rmse_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)

    # RMSE on the test dataset
    rmse_test = mean_squared_error(test_y, predict_test)**(0.5)
    print('\nRMSE on test dataset : ', rmse_test)

    # Create the PCA (Principal Component Analysis) model object
    # and reduce the dimensions of the data to 12.
    # You can also add other parameters and test your code here.
    # Some parameters are: svd_solver and iterated_power
    # Documentation of sklearn PCA:
    # https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
    model_pca = PCA(n_components=12)

    # learn the components on the training data only, then apply the
    # same projection to the test data (do not refit on the test set)
    new_train = model_pca.fit_transform(train_x)
    new_test = model_pca.transform(test_x)

    print('\nTraining model with {} dimensions.'.format(new_train.shape[1]))

    # create the model on the reduced features
    model_new = LinearRegression()

    # fit the model with the training data
    model_new.fit(new_train, train_y)

    # predict the target on the new train dataset
    predict_train_pca = model_new.predict(new_train)

    # RMSE on the new train dataset
    rmse_train_pca = mean_squared_error(train_y, predict_train_pca)**(0.5)
    print('\nRMSE on new train dataset : ', rmse_train_pca)

    # predict the target on the new test dataset
    predict_test_pca = model_new.predict(new_test)

    # RMSE on the new test dataset
    rmse_test_pca = mean_squared_error(test_y, predict_test_pca)**(0.5)
    print('\nRMSE on new test dataset : ', rmse_test_pca)

2. R code

    library(stats)
    # train and test are numeric data frames with the same columns
    pca <- princomp(train, cor = TRUE)
    train_reduced <- predict(pca, train)
    test_reduced <- predict(pca, test)

X. Gradient Boosting

  1. GBM

1. Python code

    # import the required libraries
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the Gradient Boosting Classifier model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: learning_rate and n_estimators
    # Documentation of sklearn GradientBoostingClassifier:
    # https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
    model = GradientBoostingClassifier(n_estimators=100, max_depth=5)

    # fit the model with the training data
    model.fit(train_x, train_y)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('\nTarget on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('\naccuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('\nTarget on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('\naccuracy_score on test dataset : ', accuracy_test)

2. R code

    library(caret)
    x <- cbind(x_train, y_train)
    # Fit the model; the response is the y_train column of x
    # (it must be a factor for type = "prob" to work)
    fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
    fit <- train(y_train ~ ., data = x, method = "gbm", trControl = fitControl, verbose = FALSE)
    # probability of the second class
    predicted <- predict(fit, x_test, type = "prob")[, 2]

  2. XGBoost

1. Python code

    # import the required libraries
    import pandas as pd
    from xgboost import XGBClassifier
    from sklearn.metrics import accuracy_score

    # read the train and test datasets
    train_data = pd.read_csv('train-data.csv')
    test_data = pd.read_csv('test-data.csv')

    # shape of the datasets
    print('Shape of training data :', train_data.shape)
    print('Shape of testing data :', test_data.shape)

    # Now we need to predict the target variable on the test data
    # target variable - Survived

    # separate the independent and target variables on the training data
    train_x = train_data.drop(columns=['Survived'])
    train_y = train_data['Survived']

    # separate the independent and target variables on the testing data
    test_x = test_data.drop(columns=['Survived'])
    test_y = test_data['Survived']

    # Create the XGBoost model object.
    # You can also add other parameters and test your code here.
    # Some parameters are: max_depth and n_estimators
    # Documentation of xgboost:
    # https://xgboost.readthedocs.io/en/latest/
    model = XGBClassifier()

    # fit the model with the training data
    model.fit(train_x, train_y)

    # predict the target on the train dataset
    predict_train = model.predict(train_x)
    print('\nTarget on train data', predict_train)

    # accuracy score on the train dataset
    accuracy_train = accuracy_score(train_y, predict_train)
    print('\naccuracy_score on train dataset : ', accuracy_train)

    # predict the target on the test dataset
    predict_test = model.predict(test_x)
    print('\nTarget on test data', predict_test)

    # accuracy score on the test dataset
    accuracy_test = accuracy_score(test_y, predict_test)
    print('\naccuracy_score on test dataset : ', accuracy_test)

2. R code

    require(caret)
    x <- cbind(x_train, y_train)
    # Fit the model; the response is the y_train column of x
    TrainControl <- trainControl(method = "repeatedcv", number = 10, repeats = 4)
    model <- train(y_train ~ ., data = x, method = "xgbLinear", trControl = TrainControl, verbose = FALSE)
    # OR, to use tree boosters instead of linear boosters:
    # model <- train(y_train ~ ., data = x, method = "xgbTree", trControl = TrainControl, verbose = FALSE)
    predicted <- predict(model, x_test)

  3. LightGBM

1. Python code

    import numpy as np
    import lightgbm as lgb

    # synthetic training data: 500 entities, each with 10 features
    data = np.random.rand(500, 10)
    label = np.random.randint(2, size=500)  # binary target
    train_data = lgb.Dataset(data, label=label)

    # validation data read from a LibSVM-format file that must exist on disk
    test_data = train_data.create_valid('test.svm')

    # the number of boosting rounds is set through num_round below,
    # so it is not repeated in the parameter dictionary
    param = {'num_leaves': 31, 'objective': 'binary'}
    param['metric'] = 'auc'
    num_round = 10
    bst = lgb.train(param, train_data, num_round, valid_sets=[test_data])
    bst.save_model('model.txt')

    # predict on 7 new entities, each with 10 features
    data = np.random.rand(7, 10)
    ypred = bst.predict(data)

2. R code

    # Example 1: using the RLightGBM package directly
    library(RLightGBM)
    data(example.binary)
    # Parameters
    num_iterations <- 100
    config <- list(objective = "binary", metric = "binary_logloss,auc", learning_rate = 0.1, num_leaves = 63, tree_learner = "serial", feature_fraction = 0.8, bagging_freq = 5, bagging_fraction = 0.8, min_data_in_leaf = 50, min_sum_hessian_in_leaf = 5.0)
    # Create the data handle and the booster
    handle.data <- lgbm.data.create(x)
    lgbm.data.setField(handle.data, "label", y)
    handle.booster <- lgbm.booster.create(handle.data, lapply(config, as.character))
    # Train for num_iterations iterations and evaluate every 5 steps
    lgbm.booster.train(handle.booster, num_iterations, 5)
    # Predict
    y.pred <- lgbm.booster.predict(handle.booster, x.test)
    # Test accuracy
    sum(y.test == (y.pred > 0.5)) / length(y.test)
    # Save the model (can be loaded again via lgbm.booster.load(filename))
    lgbm.booster.save(handle.booster, filename = "/tmp/model.txt")

    # Example 2: using RLightGBM through caret
    require(caret)
    require(RLightGBM)
    data(iris)
    model <- caretModel.LGBM()
    fit <- train(Species ~ ., data = iris, method = model, verbosity = 0)
    print(fit)
    y.pred <- predict(fit, iris[, 1:4])
    library(Matrix)
    model.sparse <- caretModel.LGBM.sparse()
    # Generate a sparse matrix
    mat <- Matrix(as.matrix(iris[, 1:4]), sparse = TRUE)
    fit <- train(data.frame(idx = 1:nrow(iris)), iris$Species, method = model.sparse, matrix = mat, verbosity = 0)
    print(fit)

  4. CatBoost

1. Python code

    import pandas as pd
    import numpy as np
    from catboost import CatBoostRegressor
    from sklearn.model_selection import train_test_split

    # read the training and testing files
    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    # impute missing values in both train and test
    train.fillna(-999, inplace=True)
    test.fillna(-999, inplace=True)

    # create a training set for modeling and a validation set to check model performance
    X = train.drop(['Item_Outlet_Sales'], axis=1)
    y = train.Item_Outlet_Sales
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.7, random_state=1234)

    # treat every non-float column as categorical
    # (np.float has been removed from recent NumPy releases; the built-in float works here)
    categorical_features_indices = np.where(X.dtypes != float)[0]

    # build and train the model (plot=True renders a chart in Jupyter notebooks)
    model = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE')
    model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_validation, y_validation), plot=True)

    # assemble the predictions for submission
    submission = pd.DataFrame()
    submission['Item_Identifier'] = test['Item_Identifier']
    submission['Outlet_Identifier'] = test['Outlet_Identifier']
    submission['Item_Outlet_Sales'] = model.predict(test)

2. R code

    set.seed(1)
    require(titanic)
    require(caret)
    require(catboost)
    # keep only the complete rows of the Titanic training data
    tt <- titanic::titanic_train[complete.cases(titanic::titanic_train),]
    data <- as.data.frame(as.matrix(tt), stringsAsFactors = TRUE)
    drop_columns = c("PassengerId", "Survived", "Name", "Ticket", "Cabin")
    # features and target
    x <- data[,!(names(data) %in% drop_columns)]
    y <- data[,c("Survived")]
    # 4-fold cross-validation over a small tuning grid
    fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
    grid <- expand.grid(depth = c(4, 6, 8), learning_rate = 0.1, iterations = 100, l2_leaf_reg = 1e-3, rsm = 0.95, border_count = 64)
    report <- train(x, as.factor(make.names(y)), method = catboost.caret, verbose = TRUE, preProc = NULL, tuneGrid = grid, trControl = fit_control)
    print(report)
    # variable importance
    importance <- varImp(report, scale = FALSE)
    print(importance)