Text files: read with sc.textFile and write back with saveAsTextFile.
val hdfsFile = sc.textFile("hdfs://hadoop01:9000/employee.txt")
hdfsFile.saveAsTextFile("/employeeOut")
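As a quick round-trip check (a sketch; the path matches the save above), the output directory can be read back like any other text file:
val saved = sc.textFile("/employeeOut")
saved.count  // number of lines that were written out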
JSON files: each line of the file is a single JSON string.
import scala.util.parsing.json.JSON
val json = sc.textFile("/employee.json")
// JSON.parseFull returns Option[Any]: Some(...) on success, None for a malformed line
val result = json.map(JSON.parseFull)
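Since parseFull returns Option[Any], a pattern match is needed to reach the parsed fields. A minimal sketch, assuming each line is a JSON object; "name" is a hypothetical field:
result.collect().foreach {
  case Some(fields: Map[String, Any] @unchecked) => println(fields.get("name"))  // "name" is a hypothetical field
  case _ => println("line did not parse as JSON")
}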
SequenceFiles: a SequenceFile is the Hadoop format for storing key-value data in binary form; when reading one back, the key and value types must be supplied, as in sc.sequenceFile[Int,Int].
scala> val sequenceRdd = sc.parallelize(Array((1,2),(3,4),(5,6)))
scala> sequenceRdd.saveAsSequenceFile("file:///home/hadoop/spark/seqdata")
scala> val seq = sc.sequenceFile[Int,Int]("file:///home/hadoop/spark/seqdata")
scala> seq.collect
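String keys work the same way, since Spark converts between native types and Hadoop Writables on both write and read. A sketch with an illustrative path:
scala> val kv = sc.parallelize(Array(("laozhang",22),("laoli",18)))
scala> kv.saveAsSequenceFile("file:///home/hadoop/spark/seqdata2")
scala> sc.sequenceFile[String,Int]("file:///home/hadoop/spark/seqdata2").collect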
Object files: the data is serialized (Java serialization) on write, and the element type must be supplied on read, as in sc.objectFile[Int].
scala> val rdd = sc.parallelize(Array(1,2,3))
scala> rdd.saveAsObjectFile("file:///home/hadoop/spark/objectdata")
scala> val obj = sc.objectFile[Int]("file:///home/hadoop/spark/objectdata")
scala> obj.collect
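The type parameter on objectFile can be any serializable type, not just Int; for example, a pair RDD round-trips the same way (the path below is illustrative):
scala> val pairs = sc.parallelize(Array(("laozhang",22),("laoli",18)))
scala> pairs.saveAsObjectFile("file:///home/hadoop/spark/objpairs")
scala> sc.objectFile[(String,Int)]("file:///home/hadoop/spark/objpairs").collect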
Old Hadoop API (org.apache.hadoop.mapred):
scala> import org.apache.hadoop.io.{IntWritable, Text}
scala> import org.apache.hadoop.mapred.TextOutputFormat
scala> val content = sc.parallelize(Array(("laozhang",22),("laoli",18)))
scala> content.saveAsHadoopFile("hdfs://hadoop01:9000/test", classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text,IntWritable]])
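For comparison, a sketch of the same save via the new Hadoop API (org.apache.hadoop.mapreduce), using saveAsNewAPIHadoopFile; the output path is illustrative:
scala> import org.apache.hadoop.mapreduce.lib.output.{TextOutputFormat => NewTextOutputFormat}
scala> content.saveAsNewAPIHadoopFile("hdfs://hadoop01:9000/testNew", classOf[Text], classOf[IntWritable], classOf[NewTextOutputFormat[Text,IntWritable]])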
// The original snippet breaks off mid-line here; "local[*]" is a placeholder master URL
val conf = new SparkConf().setMaster("local[*]")