val newRdd = oldRdd1.partitionBy(new org.apache.spark.HashPartitioner(partitions))
Here partitions is the desired number of partitions.
def partitionBy(partitioner : org.apache.spark.Partitioner) : org.apache.spark.rdd.RDD[scala.Tuple2[K, V]] = { /* compiled code */ }
Repartitions a key-value (K-V) RDD according to the given Partitioner.
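For intuition, HashPartitioner places each record by its key's hash: the key's hashCode is taken modulo the partition count and shifted into the non-negative range. The sketch below mirrors that assumed behaviour; assumedHashPartition is a hypothetical helper written for illustration, not a Spark API.

// A minimal sketch, assuming HashPartitioner's hash-modulo behaviour.
// assumedHashPartition is a made-up helper, not part of Spark.
def assumedHashPartition(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  // Java's % can return a negative value for negative hash codes,
  // so shift into the range [0, numPartitions).
  if (raw < 0) raw + numPartitions else raw
}

// For the integer keys 1..4 used in the example below, with 2 partitions:
// assumedHashPartition(1, 2) == 1, assumedHashPartition(2, 2) == 0, and so on.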
package com.day1

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object oper {
  def main(args: Array[String]): Unit = {
    val config: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordCount")
    // Create the context object
    val sc = new SparkContext(config)

    // The partitionBy operator: rehash a 4-partition K-V RDD into 2 partitions
    val arrayRdd = sc.makeRDD(Array((1, "张三"), (2, "李四"), (3, "王五"), (4, "刘六")), 4)
    val partitionByRdd = arrayRdd.partitionBy(new org.apache.spark.HashPartitioner(2))
    println(partitionByRdd.getNumPartitions)
  }
}

Input: (1,"张三"), (2,"李四"), (3,"王五"), (4,"刘六")
Output: 2
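To verify where each pair actually landed, one can tag every element with its partition index using the standard mapPartitionsWithIndex operator. This is a sketch, assuming the partitionByRdd from the example above is in scope; the expected output follows from the hash-modulo rule, so the even keys 2 and 4 land in partition 0 and the odd keys 1 and 3 in partition 1.

// Sketch: inspect which partition each pair landed in.
val tagged = partitionByRdd.mapPartitionsWithIndex(
  (idx, iter) => iter.map(kv => (idx, kv))
)
println(tagged.collect().mkString(", "))
// Expected with HashPartitioner(2):
// (0,(2,李四)), (0,(4,刘六)), (1,(1,张三)), (1,(3,王五))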