This summarizer computes common statistics, such as the mean and variance, for each feature dimension of the samples.
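class MultivariateOnlineSummarizer extends MultivariateStatisticalSummary with Serializable {
  // number of features (vector dimension)
  private var n = 0
  // mean of each feature
  private var currMean: Array[Double] = _
  // accumulator used for the variance
  private var currM2n: Array[Double] = _
  // sum of squares
  private var currM2: Array[Double] = _
  // L1 norm
  private var currL1: Array[Double] = _
  // number of samples
  private var totalCnt: Long = 0
  // sum of the weights of all samples
  private var totalWeightSum: Double = 0.0
  // sum of squared weights, used when computing the variance
  private var weightSquareSum: Double = 0.0
  // per-feature sum of the weights of the nonzero entries
  private var weightSum: Array[Double] = _
  // per-feature count of nonzero entries
  private var nnz: Array[Long] = _
  // maximum of each feature
  private var currMax: Array[Double] = _
  // minimum of each feature
  private var currMin: Array[Double] = _
  // ... (methods omitted in this excerpt)
}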
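Of the statistics above, everything except the mean and variance can be computed directly by accumulation. For the mean and variance, Wikipedia gives a weighted online algorithm. The algorithm used here also supports distributed computation: the samples of each partition are first folded into a summarizer, and the summarizers are then merged with one another.
Sample statistics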
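The update and merge steps can be sketched for a single feature as follows. This is a minimal illustration, not the Spark source: the `WeightedStats` case class, its field layout, and the demo object are hypothetical names introduced here, and the sketch assumes dense input (it does not skip zero entries the way the per-feature `weightSum` and `nnz` arrays suggest the real class does). The variance denominator assumes the usual weighted correction, total weight minus the sum of squared weights divided by the total weight.

```scala
// Hypothetical helper for illustration only; not part of Spark.
final case class WeightedStats(
    weightSum: Double = 0.0,        // sum of the weights seen so far
    weightSquareSum: Double = 0.0,  // sum of the squared weights
    mean: Double = 0.0,             // running weighted mean
    m2n: Double = 0.0) {            // weighted sum of squared deviations from the mean

  /** Fold one sample with weight w into the running statistics. */
  def add(x: Double, w: Double = 1.0): WeightedStats = {
    require(w >= 0.0, "weight must be non-negative")
    if (w == 0.0) this
    else {
      val newWeightSum = weightSum + w
      val delta = x - mean
      val newMean = mean + delta * w / newWeightSum
      // Uses the deviation from both the old and the new mean.
      WeightedStats(newWeightSum, weightSquareSum + w * w, newMean,
        m2n + w * delta * (x - newMean))
    }
  }

  /** Combine two partial results, for example from two partitions. */
  def merge(other: WeightedStats): WeightedStats = {
    if (other.weightSum == 0.0) this
    else if (weightSum == 0.0) other
    else {
      val total = weightSum + other.weightSum
      val delta = other.mean - mean
      WeightedStats(
        total,
        weightSquareSum + other.weightSquareSum,
        mean + delta * other.weightSum / total,
        m2n + other.m2n + delta * delta * weightSum * other.weightSum / total)
    }
  }

  /** Weight-corrected variance estimate; 0.0 when it is undefined. */
  def variance: Double = {
    val denominator = weightSum - weightSquareSum / weightSum
    if (weightSum > 0.0 && denominator > 0.0) m2n / denominator else 0.0
  }
}

object WeightedStatsDemo {
  def main(args: Array[String]): Unit = {
    // Two "partitions", each summarized on its own and then merged.
    val left = Seq(1.0, 2.0).foldLeft(WeightedStats())((s, x) => s.add(x))
    val right = Seq(3.0, 4.0).foldLeft(WeightedStats())((s, x) => s.add(x))
    val all = left.merge(right)
    println(s"mean = ${all.mean}, variance = ${all.variance}")
  }
}
```

With unit weights, merging the two partial results gives the same mean (2.5) and sample variance (5/3) as processing the four values 1, 2, 3, 4 in one pass, which is what makes the per-partition-then-merge scheme safe.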