
Two ways to convert a Spark RDD to a DataFrame

1. Implicit conversion with toDF

1) Using a case class

case class Person(name: String, age: Int)

Import the implicit conversions:

import sqlContext.implicits._

Create an RDD of the case class:

val rdd: RDD[Person] = sc.parallelize(Array(
  Person("fanghailiang", 29),
  Person("sunyu", 28),
  Person("jiaolu", 26),
  Person("dingzelin", 31)
))

Convert it to a DataFrame:

val df: DataFrame = rdd.toDF()
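The snippets above assume an existing `SparkContext` (`sc`) and `SQLContext`; a minimal end-to-end sketch for Spark 1.x (the app name, object name, and master setting are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.rdd.RDD

// The case class must be defined at the top level (not inside the method
// that calls toDF), otherwise schema inference can fail at runtime.
case class Person(name: String, age: Int)

object CaseClassToDf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings rdd.toDF() into scope

    val rdd: RDD[Person] = sc.parallelize(Array(
      Person("fanghailiang", 29),
      Person("sunyu", 28)
    ))
    val df: DataFrame = rdd.toDF()
    df.printSchema() // column names and types are inferred from the case class fields
    sc.stop()
  }
}
```

Column names (`name`, `age`) and their types are derived by reflection from the case class fields, so no schema has to be declared by hand.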

2) Converting an RDD of tuples directly to a DataFrame

val rdd2: RDD[(String, Int)] = sc.parallelize(Array(
  ("fanghailiang", 29),
  ("sunyu", 28),
  ("jiaolu", 26),
  ("dingzelin", 31)
))
val df2: DataFrame = rdd2.toDF("name2", "age3")
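For reference, if `toDF()` is called with no arguments on a tuple RDD, Spark falls back to the positional names `_1`, `_2`, and so on; the argument list above simply renames the columns. A short sketch, reusing the `sqlContext` implicits and `rdd2` from above:

```scala
// Without explicit names, the tuple positions become the column names.
val defaultDf: DataFrame = rdd2.toDF()
// defaultDf.columns == Array("_1", "_2")

// With explicit names, one per tuple element:
val namedDf: DataFrame = rdd2.toDF("name", "age")
// namedDf.columns == Array("name", "age")
```

Passing a name count that does not match the tuple arity fails at runtime, so the argument list must have exactly one name per element.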

2. Via an RDD[Row] plus a schema

1) Build the RDD[Row]

val rowRdd: RDD[Row] = sc.parallelize(Array(
  ("fanghailiang", 29),
  ("sunyu", 28),
  ("jiaolu", 26),
  ("dingzelin", 31)
)).map {
  case (name, age) => Row(name, age)
}

2) Build the schema

val schema: StructType = StructType(Array(
  StructField("name", StringType, false),
  StructField("age", IntegerType, false)
))

3) Create the DataFrame

val df3: DataFrame = sqlContext.createDataFrame(rowRdd, schema)
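The `Row`, `StructType`, `StructField`, `StringType`, and `IntegerType` used above need the following imports. A sketch of the final step with those imports in place; note that `createDataFrame` does not validate the rows against the schema eagerly:

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Each Row must match the schema by position: field 0 is a String,
// field 1 is an Int. A mismatch is not checked here; it only
// surfaces when an action forces evaluation.
val df3: DataFrame = sqlContext.createDataFrame(rowRdd, schema)
df3.show() // triggers evaluation; any row/schema mismatch errors out here
```

This programmatic route is the one to use when the columns are not known until runtime, since the schema is an ordinary value that can be built dynamically.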
