First, let's clarify the relationship between "confidence interval" and "confidence level":
For a fixed confidence level, the larger the sample size, the narrower the confidence interval at that level;
The higher the confidence level, the wider the confidence interval (and conversely, a wider interval corresponds to a higher confidence level).
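A minimal sketch of this relationship, using a normal-approximation interval for a sample mean (the sample sizes, confidence levels, and distribution parameters below are arbitrary illustrative values):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (30, 300):                          # larger sample -> narrower interval
    sample = rng.normal(loc=5.0, scale=2.0, size=n)
    for level in (0.90, 0.99):               # higher confidence level -> wider interval
        half_width = stats.norm.ppf(0.5 + level / 2) * sample.std(ddof=1) / np.sqrt(n)
        print(f"n={n}, level={level}: interval width = {2 * half_width:.3f}")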
sklearn.gaussian_process.GaussianProcessRegressor(kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None)
#kernel: the kernel (covariance function) used to fit the covariance matrix
#alpha: my understanding: roughly a regularization coefficient; the larger alpha is, the more noise the data is assumed to contain, so during fitting the model accepts more bias to reduce variance, and vice versa.
#optimizer: the optimizer used to tune the kernel parameters; you can use the built-in optimizer or write your own and pass it as optimizer. Its role is to find the optimal hyperparameters of the kernel (covariance estimation) by maximizing the log-marginal-likelihood. (Recap of covariance estimation: assume the data follow a Gaussian distribution with unknown covariance; the objective is to find the covariance that maximizes the likelihood of the data, where the likelihood is computed from that Gaussian. My understanding: the kernel here is, in effect, the function used to compute the data covariance.)
#note that: a kernel combined with a WhiteKernel can estimate the noise level of the data
#n_restarts_optimizer: the number of times the optimizer is restarted; each run starts from one initialization of the kernel parameters, and after n_restarts_optimizer initializations the best hyperparameters found are kept.
#normalize_y: bool, whether the target values y should be normalized (i.e., whether the y values of the training data are normalized). If the mean of the training targets is not 0, set normalize_y=True. (my understanding; not fully sure)
#copy_X_train: whether to store a copy of X on the estimator.
#attribute
.log_marginal_likelihood_value_ #the log-marginal-likelihood of the kernel at the fitted hyperparameters (self.kernel_.theta)
#method
log_marginal_likelihood([theta, eval_gradient]) #returns the log-marginal-likelihood of the given kernel when its hyperparameters are set to theta.
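A minimal usage sketch of the regressor (the synthetic data, kernel, and alpha value are illustrative assumptions, not a prescribed recipe):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)       # noisy targets

gpr = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0),
                               alpha=1e-2,                   # assumed noise term added to the kernel diagonal
                               n_restarts_optimizer=5,
                               random_state=0)
gpr.fit(X, y)
print(gpr.log_marginal_likelihood_value_)                    # LML at the fitted hyperparameters
print(gpr.log_marginal_likelihood(gpr.kernel_.theta))        # same value, computed explicitly
y_mean, y_std = gpr.predict(np.array([[2.5]]), return_std=True)  # mean +/- std gives a confidence band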
sklearn.gaussian_process.GaussianProcessClassifier(kernel=None, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, max_iter_predict=100, warm_start=False, copy_X_train=True, random_state=None, multi_class='one_vs_rest', n_jobs=None)
#multi_class: {one_vs_rest, one_vs_one}; the strategy used for multi-class classification: one_vs_one builds C(k,2) binary classifiers (one per pair of classes) and combines them, where k is the number of classes; one_vs_rest builds k binary classifiers and combines them.
Core idea of GPC: first, a latent function is obtained from the GP prior; then a link function is applied to ("squashes") this latent function, so that the final relationship between y and x takes a simpler, more convenient form (a class probability). (Recap: this reminds me of the E-step and M-step in EM: first estimate the latent variables from the initial parameter values, then re-estimate the parameters from the latent variables, and iterate until a stopping condition is reached.)
In GPC the link function for the latent function is the logistic link function rather than a Gaussian likelihood, because a Gaussian likelihood is not suitable for discrete class labels. GPC approximates the non-Gaussian posterior with a Gaussian based on the Laplace approximation. (My understanding: in GPC, the posterior over the latent function is approximated by a Gaussian obtained via the Laplace approximation, and the prediction of y is made through a logistic likelihood, because a Gaussian likelihood does not apply to discrete labels.)
Note that when multi_class=one_vs_one, GPC does not provide probability estimates; it directly outputs the predicted label. For multi-class tasks, GPC does not internally build a true multi-class Laplace approximation (of the latent-function posterior); instead it combines several binary models into one multi-class model.
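A minimal usage sketch on a toy 3-class problem (the dataset and kernel choice are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = load_iris(return_X_y=True)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), multi_class="one_vs_rest", random_state=0)
gpc.fit(X, y)
print(gpc.predict_proba(X[:3]))   # one_vs_rest supports probability estimates

gpc_ovo = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), multi_class="one_vs_one", random_state=0)
gpc_ovo.fit(X, y)
print(gpc_ovo.predict(X[:3]))     # one_vs_one only returns predicted labels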
Reference posts:
(link function) In generalized linear models, is the role of the link function to transform a non-normally distributed Y into a normal distribution?
Generalized linear models (GLM)
The Laplace approximation
Understanding Prior, Posterior and Likelihood, and several ways to express them
In a GP the kernel is also called the covariance function. The covariance function is learned under the following assumption: similar points have similar targets, so the similarity between two points is used to model their covariance.
In a GP, kernels can be roughly divided into the following kinds:
from sklearn.gaussian_process.kernels import ConstantKernel, RBF
#attribute:
.bounds #returns the log-transformed bounds on theta.
.hyperparameters #returns a list of all hyperparameter specifications.
.n_dims #returns the number of non-fixed hyperparameters of the kernel.
.theta #Returns the (flattened, log-transformed) non-fixed hyperparameters.
#method
__call__(X[, Y, eval_gradient]) #returns the kernel matrix (non-diagonal); it can be called as k(X) (equivalent to k(X, Y=X)) or as k(X, Y).
clone_with_theta(theta) #Returns a clone of self with given hyperparameters theta.
diag(X) #Returns the diagonal of the kernel k(X, X).
get_params([deep]) #Get parameters of this kernel.
is_stationary() #Returns whether the kernel is stationary.
set_params(**params) #Set the parameters of this kernel.
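A short sketch of these attributes and methods on an RBF kernel (the length scale and inputs are illustrative values):

import numpy as np
from sklearn.gaussian_process.kernels import RBF

k = RBF(length_scale=2.0)
print(k.theta)               # log-transformed non-fixed hyperparameters, here [log(2.0)]
print(k.bounds)              # log-transformed bounds on theta
print(k.hyperparameters)     # hyperparameter specifications
print(k.n_dims)              # number of non-fixed hyperparameters

X = np.array([[0.0], [1.0], [2.0]])
print(k(X))                  # full kernel matrix k(X, X)
print(k.diag(X))             # diagonal of k(X, X), here all ones
print(k.is_stationary())     # RBF is stationary -> True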
Below are several commonly used kernels; all of them are subclasses of sklearn.gaussian_process.kernels.Kernel.
1、ConstantKernel
#Can be used as part of a product-kernel where it scales the magnitude of the other factor (kernel) or as part of a sum-kernel, where it modifies the mean of the Gaussian process.
sklearn.gaussian_process.kernels.ConstantKernel(constant_value=1.0, constant_value_bounds=(1e-05, 100000.0))
#constant_value: the kernel value, i.e. k(x1, x2) = constant_value for any x1, x2
#constant_value_bounds: lower and upper bounds on constant_value.
It is defined as follows: for any x1, x2, the kernel value is the constant: k(x_i, x_j) = constant_value.
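A brief sketch of the two use-cases mentioned above (the constant and length scale are illustrative values):

from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# product: ConstantKernel scales the magnitude of the RBF kernel
product_kernel = ConstantKernel(constant_value=2.0) * RBF(length_scale=1.0)
# sum: ConstantKernel modifies the mean of the Gaussian process
sum_kernel = ConstantKernel(constant_value=2.0) + RBF(length_scale=1.0)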
2、WhiteKernel
#The main use-case of this kernel is as part of a sum-kernel where it explains the noise-component of the signal. Tuning its parameter corresponds to estimating the noise-level.
sklearn.gaussian_process.kernels.WhiteKernel(noise_level=1.0, noise_level_bounds=(1e-05, 100000.0))
#noise_level: the parameter controlling the noise level
#noise_level_bounds: lower and upper bounds on noise_level
It is defined as follows: k(x_i, x_j) = noise_level if x_i == x_j, otherwise 0.
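A short sketch of the sum-kernel use-case above, where the fitted WhiteKernel's noise_level estimates the data noise (the data and noise scale are illustrative assumptions):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(50, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=50)   # true noise std = 0.3

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0).fit(X, y)
print(gpr.kernel_)   # the fitted WhiteKernel's noise_level approximates the noise variance (~0.09)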
3、Radial-basis function (RBF) kernel
#The RBF kernel is a stationary kernel.
#This kernel is infinitely differentiable, and GPs with it as covariance function are thus very smooth.
sklearn.gaussian_process.kernels.RBF(length_scale=1.0, length_scale_bounds=(1e-05, 100000.0))
It is defined as follows (d(x_i, x_j) is the Euclidean distance and l the length_scale):
k(x_i, x_j) = exp(-d(x_i, x_j)^2 / (2 * l^2))
4、Matérn kernel
#The class of Matern kernels is a generalization of the RBF and the absolute exponential kernel parameterized by an additional parameter nu. The smaller nu, the less smooth the approximated function is. For nu=inf, the kernel becomes equivalent to the RBF kernel and for nu=0.5 to the absolute exponential kernel. Important intermediate values are nu=1.5 (once differentiable functions) and nu=2.5 (twice differentiable functions) which are popular choices for learning functions that are not infinitely differentiable but at least once (ν=1.5) or twice differentiable (ν=2.5).
sklearn.gaussian_process.kernels.Matern(length_scale=1.0, length_scale_bounds=(1e-05, 100000.0), nu=1.5)
#length_scale:The length scale of the kernel. If a float, an isotropic kernel is used. If an array, an anisotropic kernel is used where each dimension of l defines the length-scale of the respective feature dimension.
#nu:The parameter nu controlling the smoothness of the learned function.
It is defined as follows (d = d(x_i, x_j) is the Euclidean distance, l the length_scale, K_nu a modified Bessel function and Gamma the gamma function):
k(x_i, x_j) = (1 / (Gamma(nu) * 2^(nu-1))) * (sqrt(2*nu) * d / l)^nu * K_nu(sqrt(2*nu) * d / l)
5、Rational quadratic kernel
#The RationalQuadratic kernel can be seen as a scale mixture (an infinite sum) of RBF kernels with different characteristic length-scales
sklearn.gaussian_process.kernels.RationalQuadratic(length_scale=1.0, alpha=1.0, length_scale_bounds=(1e-05, 100000.0), alpha_bounds=(1e-05, 100000.0))
#alpha:Scale mixture parameter
It is defined as follows (d is the Euclidean distance, l the length_scale and alpha the scale mixture parameter):
k(x_i, x_j) = (1 + d^2 / (2 * alpha * l^2))^(-alpha)
6、Exp-Sine-Squared kernel
#The ExpSineSquared kernel allows modeling periodic functions.
sklearn.gaussian_process.kernels.ExpSineSquared(length_scale=1.0, periodicity=1.0, length_scale_bounds=(1e-05, 100000.0), periodicity_bounds=(1e-05, 100000.0))
It is defined as follows (d is the Euclidean distance, l the length_scale and p the periodicity):
k(x_i, x_j) = exp(-2 * sin(pi * d / p)^2 / l^2)
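A brief sketch of using this kernel on periodic data (the composite kernel and data below are an illustrative choice, not a prescribed recipe):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.1, size=60)   # true period = 1

kernel = ExpSineSquared(length_scale=1.0, periodicity=1.5) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)
print(gpr.kernel_)   # the fitted periodicity should move toward the true period of 1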
7、Dot-Product kernel
sklearn.gaussian_process.kernels.DotProduct(sigma_0=1.0, sigma_0_bounds=(1e-05, 100000.0))
The DotProduct kernel is non-stationary and can be obtained from linear regression by putting N(0, 1) priors on the coefficients of x_d (d = 1, . . . , D) and a prior of N(0, sigma_0^2) on the bias. The DotProduct kernel is invariant to a rotation of the coordinates about the origin, but not to translations. It is parameterized by the parameter sigma_0^2. For sigma_0^2 = 0, the kernel is called the homogeneous linear kernel, otherwise it is inhomogeneous. The kernel is given by
k(x_i, x_j) = sigma_0^2 + x_i · x_j
The DotProduct kernel is commonly combined with exponentiation.
8、kernel operators: +, *, exponentiation (**)
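A short sketch of composing kernels with these operators (the particular combinations are only for illustration):

from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct, WhiteKernel

sum_kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)   # "+" builds a Sum kernel
product_kernel = RBF(length_scale=1.0) * ExpSineSquared()           # "*" builds a Product kernel
exp_kernel = DotProduct(sigma_0=1.0) ** 2                           # "**" builds an Exponentiation kernel (here a quadratic kernel)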
To quote the description of the kernel from that example: different kernel components play different roles in describing the data, which is worth studying: