The objective function is:

$$\min_w \frac{1}{2n_{samples}}\|Xw-y\|_2^2+\alpha\|w\|_1=\min_w \frac{1}{2n_{samples}}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2+\alpha\sum_{j=1}^{p}|w_j| \tag{1}$$

where $\alpha$ is a constant, $\|w\|_1$ is the L1 norm of the coefficient vector $w$, $n = n_{samples}$ is the number of samples, $\hat{y}_i$ is the prediction for sample $i$, and $p$ is the number of features.
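To make Eq. (1) concrete, here is a minimal NumPy sketch that evaluates the objective for a given coefficient vector. The function name `lasso_objective` and the toy data are assumptions made for illustration only, and the sketch ignores the intercept.

```python
import numpy as np


def lasso_objective(X, y, w, alpha):
    """Evaluate Eq. (1): mean squared-error term plus the L1 penalty."""
    n_samples = X.shape[0]
    residual = X @ w - y
    data_term = residual @ residual / (2 * n_samples)
    penalty = alpha * np.abs(w).sum()
    return data_term + penalty


# Hypothetical toy data: y is reproduced exactly by w = (1, 2).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

exact_fit = lasso_objective(X, y, np.array([1.0, 2.0]), alpha=0.1)
zero_fit = lasso_objective(X, y, np.zeros(2), alpha=0.1)
print(exact_fit, zero_fit)
```

With the exact fit the data term vanishes and only the penalty remains; with all-zero coefficients the penalty vanishes and only the data term remains.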
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

# X_train, y_train, X_test, y_test are assumed to be defined already
alpha = 0.1
lasso = Lasso(alpha=alpha)
y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data : %f" % r2_score_lasso)
Evaluation metric: r2_score.
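For reference, the coefficient of determination behind r2_score is $R^2 = 1 - SS_{res}/SS_{tot}$. A minimal pure-Python sketch of that formula follows; the helper name `r2` and the sample values are assumptions for illustration, not part of scikit-learn.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2(y_true, y_pred)
perfect = r2(y_true, y_true)
print(score, perfect)
```

A perfect prediction gives 1.0, and a model that does worse than always predicting the mean gives a negative value.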
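Two estimators select $\alpha$ by cross-validation: LassoCV and LassoLarsCV.
LassoCV is based on the coordinate descent algorithm, while LassoLarsCV is based on the Least Angle Regression algorithm. For high-dimensional datasets with many collinear features, LassoCV usually performs better. If the number of samples is very small, smaller than the number of features, LassoLarsCV is usually faster.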
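Coordinate descent updates one coefficient at a time with a soft-thresholding step. The following is a minimal NumPy sketch of that idea for the objective in Eq. (1), assuming no intercept and no all-zero feature columns. The function names are hypothetical, and this is an illustration rather than scikit-learn's actual implementation.

```python
import numpy as np


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1 / (2n)) * ||Xw - y||^2 + alpha * ||w||_1."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples  # per-feature scaling factor
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with the contribution of feature j removed.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n_samples
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Hypothetical toy data with two orthogonal features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, 0.1, 2.0, 0.1])

w_lasso = lasso_coordinate_descent(X, y, alpha=0.1)
w_ols = lasso_coordinate_descent(X, y, alpha=0.0)
print(w_lasso, w_ols)
```

On this toy problem the weak second feature is set exactly to zero while the first coefficient is shrunk, which is the sparsity effect of the L1 penalty; with alpha=0.0 the same loop recovers the least-squares solution.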
LassoCV example:
import time

import matplotlib.pyplot as plt
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X and y are assumed to be loaded already
start_time = time.time()
model = make_pipeline(StandardScaler(), LassoCV(cv=20)).fit(X, y)
fit_time = time.time() - start_time

ymin, ymax = 2300, 3800
lasso = model[-1]
# One dotted line per fold, plus the average across folds in black
plt.semilogx(lasso.alphas_, lasso.mse_path_, linestyle=":")
plt.plot(
    lasso.alphas_,
    lasso.mse_path_.mean(axis=-1),
    color="black",
    label="Average across the folds",
    linewidth=2,
)
plt.axvline(lasso.alpha_, linestyle="--", color="black", label="alpha: CV estimate")
plt.ylim(ymin, ymax)
plt.xlabel(r"$\alpha$")
plt.ylabel("Mean square error")
plt.legend()
_ = plt.title(
    f"Mean square error on each fold: coordinate descent (train time: {fit_time:.2f}s)"
)
LassoLarsCV example:
import time

import matplotlib.pyplot as plt
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X and y are assumed to be loaded already; ymin and ymax reuse the LassoCV example's values
start_time = time.time()
model = make_pipeline(StandardScaler(), LassoLarsCV(cv=20)).fit(X, y)
fit_time = time.time() - start_time

lasso = model[-1]
# One dotted line per fold, plus the average across folds in black
plt.semilogx(lasso.cv_alphas_, lasso.mse_path_, ":")
plt.semilogx(
    lasso.cv_alphas_,
    lasso.mse_path_.mean(axis=-1),
    color="black",
    label="Average across the folds",
    linewidth=2,
)
plt.axvline(lasso.alpha_, linestyle="--", color="black", label="alpha CV")
plt.ylim(ymin, ymax)
plt.xlabel(r"$\alpha$")
plt.ylabel("Mean square error")
plt.legend()
_ = plt.title(f"Mean square error on each fold: Lars (train time: {fit_time:.2f}s)")
The information criteria are computed from the training data.
Difficulties:
The AIC is computed as:

$$AIC=-2\log(\hat{L})+2d \tag{2}$$

where $\hat{L}$ is the maximized value of the model's likelihood function and $d$ is the number of parameters, i.e. the degrees of freedom.
The $BIC$ is obtained by replacing the factor 2 that multiplies $d$ in Eq. (2) with $\log(N)$:

$$BIC=-2\log(\hat{L})+\log(N)d \tag{3}$$

where $N$ is the number of samples.
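Eqs. (2) and (3) translate directly into code. A minimal sketch using only the standard library; the function names and the example numbers are assumptions for illustration.

```python
import math


def aic(log_likelihood, d):
    """Eq. (2): AIC = -2 log(L) + 2 d."""
    return -2.0 * log_likelihood + 2.0 * d


def bic(log_likelihood, d, n_samples):
    """Eq. (3): BIC = -2 log(L) + log(N) d."""
    return -2.0 * log_likelihood + math.log(n_samples) * d


# Hypothetical values: maximized log-likelihood, degrees of freedom, sample size.
log_l, d, n = -120.0, 5, 100

aic_value = aic(log_l, d)
bic_value = bic(log_l, d, n)
print(aic_value, bic_value)
```

Since $\log(N) > 2$ whenever $N > e^2 \approx 7.4$, BIC penalizes each additional parameter more heavily than AIC for all but tiny samples.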
For a linear Gaussian model, the maximized log-likelihood is:

$$\log(\hat{L})=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{\sum_{i=1}^n (y_i-\hat{y}_i)^2}{2\sigma^2} \tag{4}$$

Substituting Eq. (4) into Eq. (2) gives:

$$AIC=n\log(2\pi\sigma^2)+\frac{\sum_{i=1}^n (y_i-\hat{y}_i)^2}{\sigma^2}+2d \tag{5}$$

where $\sigma^2$ is treated as a constant and is estimated by Eq. (6):

$$\sigma^2=\frac{\sum_{i=1}^n (y_i-\hat{y}_i)^2}{n-p} \tag{6}$$

where $p$ is the number of features. This estimate is valid only when n_samples > n_features.
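The Gaussian form of the AIC can be sketched in a few lines. The helpers below follow Eqs. (5) and (6) as written; their names and the toy numbers are assumptions for illustration, and the choice of which fit supplies $\hat{y}$ in Eq. (6) is left to the caller.

```python
import math


def residual_sum_of_squares(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def noise_variance(y, y_hat, p):
    """Eq. (6): RSS / (n - p), defined only when n > p."""
    n = len(y)
    if n <= p:
        raise ValueError("Eq. (6) requires n_samples > n_features")
    return residual_sum_of_squares(y, y_hat) / (n - p)


def gaussian_aic(y, y_hat, sigma2, d):
    """Eq. (5): n log(2 pi sigma^2) + RSS / sigma^2 + 2 d."""
    n = len(y)
    rss = residual_sum_of_squares(y, y_hat)
    return n * math.log(2 * math.pi * sigma2) + rss / sigma2 + 2 * d


# Hypothetical targets and predictions from a model with p = 2 features.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
p = d = 2

sigma2 = noise_variance(y, y_hat, p)
aic_value = gaussian_aic(y, y_hat, sigma2, d)
print(sigma2, aic_value)
```

The explicit check in `noise_variance` mirrors the n_samples > n_features condition: with $n \le p$ the denominator of Eq. (6) is zero or negative and the estimate is meaningless.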
import time

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoLarsIC
from sklearn.pipeline import make_pipeline

# Fit LassoLarsIC with the AIC criterion; X and y are assumed to be loaded already
start_time = time.time()
lasso_lars_ic = make_pipeline(StandardScaler(), LassoLarsIC(criterion="aic")).fit(X, y)
fit_time = time.time() - start_time

# Store the AIC value for every alpha on the path
results = pd.DataFrame(
    {
        "alphas": lasso_lars_ic[-1].alphas_,
        "AIC criterion": lasso_lars_ic[-1].criterion_,
    }
).set_index("alphas")
alpha_aic = lasso_lars_ic[-1].alpha_
The code above computes the $AIC$.
lasso_lars_ic.set_params(lassolarsic__criterion="bic").fit(X, y)
results["BIC criterion"] = lasso_lars_ic[-1].criterion_
alpha_bic = lasso_lars_ic[-1].alpha_
# Bold the minimum value of each column
def highlight_min(x):
    x_min = x.min()
    return ["font-weight: bold" if v == x_min else "" for v in x]
results.style.apply(highlight_min)
This computes the $BIC$.
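In practice, cross-validation is the usual way to choose $\alpha$: for high-dimensional datasets with many collinear features, cross-validation imposes fewer restrictions, whereas information criteria have stricter requirements.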