
Spark MLlib, From Getting Started to Mastery: FeatureHasher Feature Vectors [9]


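FeatureHasher: uses a hash function to project columns of different data types (for example String, Boolean, Int) into a single feature vector. Numeric columns keep their value as the feature value, while string and boolean columns are treated as categorical and contribute an indicator value of 1.0.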
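To make the idea concrete, here is a minimal sketch of the hashing trick in plain Scala, with no Spark dependency. The names `HashingTrickSketch`, `bucket`, and `hashRow` are hypothetical, and the sketch assumes `scala.util.hashing.MurmurHash3` as a stand-in hash. Spark's own hash function and seed are not reproduced here, so the indices will not match what `FeatureHasher` emits; only the mapping scheme is illustrated.

```scala
import scala.util.hashing.MurmurHash3

object HashingTrickSketch {
  // Map a string key to a bucket in [0, numFeatures).
  // floorMod keeps the result non-negative even when the hash is negative.
  def bucket(key: String, numFeatures: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key), numFeatures)

  // Hash one row (column name -> value) into a sparse vector,
  // represented here as a Map from index to value.
  def hashRow(row: Seq[(String, Any)], numFeatures: Int): Map[Int, Double] =
    row.foldLeft(Map.empty[Int, Double]) { case (acc, (col, value)) =>
      val (idx, v) = value match {
        // Numeric column: hash the column name, keep the numeric value.
        case d: Double => (bucket(col, numFeatures), d)
        case i: Int    => (bucket(col, numFeatures), i.toDouble)
        // Anything else is treated as categorical: hash "column=value", use 1.0.
        case other     => (bucket(s"$col=$other", numFeatures), 1.0)
      }
      // Values that collide on the same index are summed.
      acc.updated(idx, acc.getOrElse(idx, 0.0) + v)
    }

  def main(args: Array[String]): Unit = {
    val row: Seq[(String, Any)] =
      Seq("real" -> 2.2, "bool" -> true, "stringNum" -> "1", "string" -> "foo")
    println(hashRow(row, 1 << 18))
  }
}
```

Because no vocabulary is built, the transformation needs no fitting step. The trade-off is that distinct features can collide on the same index, which becomes more likely as the vector size shrinks.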

  def featureHasherExample(): Unit = {
    import org.apache.spark.ml.feature.FeatureHasher
    import org.apache.spark.sql.SparkSession
    val spark: SparkSession = SparkSession.builder().appName("implicits").master("local[2]").getOrCreate()

    // Sample data mixing numeric, boolean, and string columns
    val dataset = spark.createDataFrame(Seq(
      (2.2, true, "1", "foo"),
      (3.3, false, "2", "bar"),
      (4.4, false, "3", "baz"),
      (5.5, false, "4", "foo")
    )).toDF("real", "bool", "stringNum", "string")

    val hasher = new FeatureHasher()
      // Input columns to hash
      .setInputCols("real", "bool", "stringNum", "string")
      // Output column that receives the hashed feature vector
      .setOutputCol("features")

    val featurized = hasher.transform(dataset)
    // Print the feature vectors without truncating the column
    featurized.show(false)
  }

Run result: [figure: run result]
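As a possible extension of the basic usage, the sketch below sets two further `FeatureHasher` parameters: `setNumFeatures` for the size of the output vector and `setCategoricalCols` to have a numeric column hashed as a category rather than used as a value. The object name `FeatureHasherParamsSketch`, the method `featurize`, the vector size of 16, and the choice of `real` as a categorical column are all assumptions made for illustration. It needs Spark MLlib on the classpath.

```scala
import org.apache.spark.ml.feature.FeatureHasher
import org.apache.spark.sql.{DataFrame, SparkSession}

object FeatureHasherParamsSketch {
  // Build a small DataFrame and hash it with non-default parameters.
  def featurize(spark: SparkSession): DataFrame = {
    val dataset = spark.createDataFrame(Seq(
      (2.2, true, "1", "foo"),
      (3.3, false, "2", "bar")
    )).toDF("real", "bool", "stringNum", "string")

    val hasher = new FeatureHasher()
      .setInputCols("real", "bool", "stringNum", "string")
      // Assumption for illustration: treat the numeric column as categorical,
      // so "real=2.2" is hashed and contributes 1.0 instead of 2.2.
      .setCategoricalCols(Array("real"))
      // Deliberately tiny vector so the output is easy to read;
      // collisions are likely at this size.
      .setNumFeatures(16)
      .setOutputCol("features")

    hasher.transform(dataset)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FeatureHasherParamsSketch")
      .master("local[2]")
      .getOrCreate()
    featurize(spark).select("features").show(false)
    spark.stop()
  }
}
```

A small vector keeps the printed output readable, but real workloads usually keep a much larger size to limit collisions.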
