Table of contents:
1. Concepts
2. Overview
Pseudo-distributed environment setup (this assumes HDFS is already set up; official documentation: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)

Step 1: in etc/hadoop under the Hadoop installation directory, rename the MapReduce configuration template:

```
mv mapred-site.xml.template mapred-site.xml
```
Step 2: edit etc/hadoop/mapred-site.xml under the Hadoop installation directory so that MapReduce runs on YARN:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
Step 3: edit etc/hadoop/yarn-site.xml under the Hadoop installation directory:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
Step 4: run start-yarn.sh from the sbin directory under the Hadoop installation directory.

Verification: confirm that the YARN daemons are up, as shown below.
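A quick check (a sketch; the host name kd01 comes from the commands later in this post): `jps` should now list the YARN daemons alongside the HDFS ones, and the ResourceManager web UI answers on port 8088.

```
[root@kd01 ~]# jps
# among the listed processes you should see:
#   ResourceManager
#   NodeManager
# web UI: http://kd01:8088
```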
Requirement: count how many times each word appears in a text file stored on HDFS (word count).

Development steps:
Step 1: prepare a test file and upload it to HDFS:

```
[root@kd01 test]# vi bbb.txt
[root@kd01 test]# hdfs dfs -put bbb.txt /user/yiguang/
```
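The contents of bbb.txt are not shown in the original; any whitespace-separated text will do. A hypothetical sample, used again below when reading the output:

```
hello hadoop
hello mapreduce
hadoop yarn
```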
Step 2: create a Maven project and add the Hadoop dependencies to pom.xml:

```xml
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>

  <!-- HDFS dependencies -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
  </dependency>

  <!-- MapReduce dependencies -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.7.3</version>
  </dependency>
</dependencies>
```
Step 3: write the Mapper. It receives one line of input per call and emits a <word, 1> pair for every word in the line:

```java
package com.yiguang.test.mapreduceTest;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMap extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /*
         * key:     the input key (byte offset of the line in the file)
         * value:   one line of data
         * context: the Map context, used to emit output
         */
        // split the line into words on spaces
        String[] words = value.toString().split(" ");
        for (String word : words) {
            // emit <word, 1>; the framework routes these pairs to the reducer
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
```
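A note on the design: allocating a new Text and LongWritable on every write() works, but it creates garbage proportional to the input size; the conventional optimization is to reuse a single Text field and a shared LongWritable(1) across calls. The output is identical either way.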
Step 4: write the Reducer. For each word it receives all the 1s emitted by the mappers and adds them up:

```java
package com.yiguang.test.mapreduceTest;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // sum must be a long, not an int: LongWritable.get() returns long
        long sum = 0;
        for (LongWritable iw : values) {
            sum += iw.get();
        }
        // emit <word, total count>
        context.write(key, new LongWritable(sum));
    }
}
```
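One caveat worth knowing: Hadoop reuses the same LongWritable instance across iterations of values, so storing the objects themselves from the loop would be a bug; extracting the primitive with get(), as done here, is the safe pattern.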
Step 5: write the driver class that wires the Mapper and Reducer into a job and submits it:

```java
package com.yiguang.test.mapreduceTest;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordJob {
    public static void main(String[] args) throws Exception {
        // create a job: job = map + reduce
        Job job = Job.getInstance(new Configuration());
        job.setJobName("kd01");
        // the entry point of the job
        job.setJarByClass(WordJob.class);

        // the job's mapper and its output types
        job.setMapperClass(WordMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // the job's reducer and its output types
        job.setReducerClass(WordReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // the job's input: where the raw data lives
        FileInputFormat.addInputPath(job, new Path("/user/yiguang/bbb.txt"));
        // the job's output: must be an empty or not-yet-existing directory
        FileOutputFormat.setOutputPath(job, new Path("/kd01_out"));

        // submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}
```
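An optional improvement, not in the original post: because summing counts is associative and commutative, WordReduce can also run as a combiner, pre-aggregating on the map side and shrinking shuffle traffic. One extra line in the driver does it:

```java
// optional: pre-aggregate <word, 1> pairs on the map side before the shuffle
job.setCombinerClass(WordReduce.class);
```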
Note: the JDK version the project is compiled with must match the JDK version the Hadoop deployment on Linux runs against. If they differ, change the project's compile target in pom.xml via the Maven compiler plugin, which pins the JDK version Maven compiles for.
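A minimal pom.xml snippet for this, assuming the cluster runs on Java 8 (adjust source/target to whatever your Hadoop deployment actually uses):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.8.1</version>
      <configuration>
        <!-- 1.8 is an assumption; match the JDK your Hadoop cluster runs on -->
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```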
Deployment: package the project with Maven, copy the jar to the server, and submit it with the hadoop jar command, naming the driver class:

```
[root@kd01 home]# hadoop jar mapreduceTest-0.0.1-SNAPSHOT.jar com.yiguang.test.mapreduceTest.WordJob
```

Viewing the results: the job writes to the /kd01_out directory set in the driver; list it with `hdfs dfs -ls /kd01_out` and print the counts from the reducer output file part-r-00000.
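With the hypothetical bbb.txt from earlier, the output would look like this (word and count separated by a tab, keys in sorted order):

```
[root@kd01 home]# hdfs dfs -cat /kd01_out/part-r-00000
hadoop      2
hello       2
mapreduce   1
yarn        1
```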
For example: ZooKeeper + Dubbo, publishing an interface for consumers to call.
Introduction and usage:
https://www.jb51.net/article/127852.htm
https://blog.csdn.net/zhaowen25/article/details/45443951
Class structure of the input side:
- InputFormat: the abstract base class that defines how input is split and read
  - FileInputFormat: the base class for file-based input formats
    - TextInputFormat: the default implementation, which reads plain text line by line
- InputSplit (key concept): a logical slice of the input that one map task processes
- RecordReader: splits an InputSplit into individual <key, value> pairs for Map to process; it is also the object that actually reads and parses the file
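Tying this back to the job above: TextInputFormat is the default, and its RecordReader hands each map() call the line's byte offset as the key and the line text as the value, which is exactly WordMap's <LongWritable, Text> input signature. The driver could set it explicitly, though it is redundant:

```java
// already the default; shown only to make the wiring explicit
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);
```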