HDFS (distributed storage system)
NameNode: master node of the HDFS cluster
SecondaryNameNode: checkpoint helper for the NameNode (merges the fsimage and edit log; often loosely called a "cold backup")
DataNode: worker node of the HDFS cluster
YARN (resource manager): runs computation frameworks such as MapReduce, Storm, Spark, Flink
ResourceManager
NodeManager
MapReduce (distributed parallel computation framework)
1) HDFS architecture
2) HDFS characteristics
3) High availability (disaster-tolerance design)
4) Basic operations
- CLI basics (a Java FileSystem API sketch of the same operations follows this list)
- Create a directory
- hdfs dfs -mkdir -p /user/test/input
- Upload files
- hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input
- List files
- hdfs dfs -ls /user/root
- View file contents
- hdfs dfs -cat /user/test/input/*.xml
- Delete a file or directory
- hdfs dfs -rm -r /user/root/output
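- The same operations are available from Java through Hadoop's FileSystem API and only need the hadoop-client dependency listed in the POM later in this post. The sketch below is illustrative only (the class name HdfsCliEquivalents and the example file names are made up); it assumes the hdfs://master:9000 address and the root user configured later in this post.
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.FileStatus;
- import org.apache.hadoop.fs.FileSystem;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IOUtils;
- import java.net.URI;
-
- public class HdfsCliEquivalents {
-     public static void main(String[] args) throws Exception {
-         // Connect to the NameNode as user root (address matches core-site.xml below)
-         FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), new Configuration(), "root");
-         fs.mkdirs(new Path("/user/test/input"));                                          // hdfs dfs -mkdir -p
-         fs.copyFromLocalFile(new Path("core-site.xml"), new Path("/user/hadoop/input"));  // hdfs dfs -put
-         for (FileStatus s : fs.listStatus(new Path("/user/root"))) {                      // hdfs dfs -ls
-             System.out.println(s.getPath());
-         }
-         // hdfs dfs -cat: stream a file to stdout without closing System.out
-         IOUtils.copyBytes(fs.open(new Path("/user/hadoop/input/core-site.xml")), System.out, 4096, false);
-         fs.delete(new Path("/user/root/output"), true);                                   // hdfs dfs -rm -r
-         fs.close();
-     }
- }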
(Figure omitted; the original image was borrowed and its copyright belongs to the original author.)
Hadoop cluster installation (Linux)
1) Environment plan
Machine ID | IP | Role
--- | --- | ---
1 | 192.168.31.10 | master
2 | 192.168.31.11 | slave01
3 | 192.168.31.12 | slave02
2) Install the SSH service (skip if already installed)
- Install SSH
- yum -y install openssh-server
- Generate the host keys
- mkdir /var/run/sshd
- ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
- ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
- Set the root password
- /bin/echo 'root:123456'|chpasswd
- /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
- /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/default/local
- Start the SSH service
- /usr/sbin/sshd -D
-
- Check that you can SSH to localhost without a password; if not, run the following commands
- ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- chmod 0600 ~/.ssh/authorized_keys
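- Note: for the multi-node cluster below, the master's public key must also end up in ~/.ssh/authorized_keys on slave01 and slave02 (for example with ssh-copy-id root@slave01), otherwise start-dfs.sh and start-yarn.sh cannot log in to the workers without a password prompt.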
3) Hostname configuration
- Edit the hosts file on every node and add the hostname mappings:
- Command:
- vi /etc/hosts
- Content:
- 192.168.31.10 master
- 192.168.31.11 slave01
- 192.168.31.12 slave02
4) JDK installation
- 1. Download the JDK 1.8 package, upload it to the server, and unpack it.
- 2. Configure the environment variables
- vim /etc/profile
- Append at the end of the file:
- export JAVA_HOME=/home/hadoop/java/jdk1.8.0_161
- export PATH=$PATH:$JAVA_HOME/bin
- 3. Reload the profile so the changes take effect
- source /etc/profile
5) Hadoop installation
- Download the release tarball and unpack it on the server.
-
- Download link:
- https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
- Downloads from the official site can be very slow; the mirror below is recommended:
- https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
6) Hadoop configuration
- hadoop-env.sh
- export JAVA_HOME=/usr/local/software/java8/jdk1.8.0_311/
- export HDFS_NAMENODE_USER=root
- export HDFS_DATANODE_USER=root
- export HDFS_SECONDARYNAMENODE_USER=root
- export YARN_RESOURCEMANAGER_USER=root
- export YARN_NODEMANAGER_USER=root
- -----------------------------------
- core-site.xml
- <configuration>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>file:/usr/local/hadoop/tmp</value>
- <description>Abase for other temporary directories.</description>
- </property>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://master:9000</value>
- </property>
- </configuration>
- -----------------------------------
- hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:/usr/local/software/hadoop/hadoop-3.3.1/namenode_dir</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:/usr/local/software/hadoop/hadoop-3.3.1/datanode_dir</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>3</value>
- </property>
- <!-- The web UI port must be configured, otherwise the page cannot be accessed -->
- <property>
- <name>dfs.http.address</name>
- <value>master:50070</value>
- </property>
- </configuration>
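- Note: in Hadoop 3.x the NameNode web UI moved to port 9870 by default, and dfs.http.address is the older (deprecated but still accepted) name for dfs.namenode.http-address; this property simply pins the UI back to 50070, which is what pitfall 1 in section 8) is about.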
- ----------------------------------------
- mapred-site.xml
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>yarn.app.mapreduce.am.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.map.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.reduce.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- </configuration>
- ----------------------------------------
- yarn-site.xml
- <configuration>
- <!-- Site specific YARN configuration properties -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>master</value>
- </property>
- <property>
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>20480</value>
- </property>
- <property>
- <name>yarn.scheduler.minimum-allocation-mb</name>
- <value>2048</value>
- </property>
- <property>
- <name>yarn.nodemanager.vmem-pmem-ratio</name>
- <value>2.1</value>
- </property>
- </configuration>
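- With these values each NodeManager offers 20480 MB to YARN and containers are allocated in steps of 2048 MB, so at most 20480 / 2048 = 10 containers fit on one node; yarn.nodemanager.vmem-pmem-ratio lets a container use up to 2.1 times its allocated physical memory as virtual memory before it is killed.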
- ------------------------------------
- Configure the worker hostnames
- On the master node, edit etc/hadoop/workers and replace localhost with the hostnames of the two slaves:
- slave01
- slave02
7) Start the Hadoop cluster
- Format the NameNode (first start only)
- hdfs namenode -format
- Start the HDFS daemons
- sbin/start-dfs.sh
- Start the YARN daemons
- sbin/start-yarn.sh
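- To verify, run jps on each node: the master should typically show NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager; the HDFS web UI is then reachable at http://master:50070 (the port configured above) and the YARN UI at http://master:8088.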
8) Pitfalls
- 1. Port 50070 cannot be accessed
- 1) Add the web UI port configuration to hdfs-site.xml
- <property>
- <name>dfs.http.address</name>
- <value>master:50070</value>
- </property>
- 2) Delete the old node data directories
- 3) Re-format the NameNode
- 4) Start the HDFS daemons
- 5) Start the YARN daemons
-
-
- 2. Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
- Add the following to mapred-site.xml:
- <property>
- <name>yarn.app.mapreduce.am.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.map.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
- <property>
- <name>mapreduce.reduce.env</name>
- <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
- </property>
-
- 3. Viewing files through the web UI at http://192.168.31.10:50070 fails
-
- On the client (Windows) machine, edit C:\Windows\System32\drivers\etc\hosts and add the cluster hostname mappings:
- 192.168.31.10 master
- 192.168.31.11 slave01
- 192.168.31.12 slave02
-
- 4. Previewing a file from the web UI fails with: Couldn’t preview the file. NetworkError: Failed to execute ‘send’ on ‘XMLHttpRequest’: Failed to load ‘http://slave1:9864/webhdfs/v1/HelloHadoop.txt?op=OPEN&namenoderpcaddress=master:9820&offset=0&_=1609724219001’.
-
- [root@master ~]# vim /usr/local/software/hadoop/hadoop-3.3.1/etc/hadoop/hdfs-site.xml
- <property>
- <name>dfs.webhdfs.enabled</name>
- <value>true</value>
- </property>
- Restart the cluster
Local development environment (Windows)
1) Prerequisites
JDK 1.8 and Maven are already installed.
2) Downloads
Hadoop binary package
winutils
3) Deployment
Extract the Hadoop package to the target directory with 7-Zip (two passes: first the .gz, then the .tar).
Extract winutils and overwrite Hadoop's bin directory with the bin directory for the matching version.
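Also set the HADOOP_HOME environment variable to the installation directory and add %HADOOP_HOME%\bin to PATH; the NativeIO pitfall in section 7) below assumes these are in place.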
4) Hadoop configuration
- Hadoop安装目录:D:\SOFT\BigData\hadoop-3.3.1
-
- Append one line at the end of hadoop-env.cmd:
- set HADOOP_IDENT_STRING="Administrator"
- ----------------------------------------------
- core-site.xml
- <configuration>
- <!-- Directory where Hadoop stores the files it generates at runtime -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp</value>
- </property>
- <property>
- <name>dfs.name.dir</name>
- <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/name</value>
- </property>
- <!-- Address of the HDFS NameNode -->
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://localhost:9000</value>
- </property>
- </configuration>
- ----------------------------------------------
- hdfs-site.xml
- <configuration>
- <!-- Set to 1 because this is a single-node Hadoop -->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/data</value>
- </property>
- </configuration>
- ----------------------------------------------
- mapred-site.xml
- <configuration>
- <!-- Run MapReduce jobs on YARN, as in the cluster configuration above -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
- ----------------------------------------------
- yarn-site.xml
- <configuration>
-
- <!-- Site specific YARN configuration properties -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
- <value>org.apache.hadoop.mapred.ShuffleHandler</value>
- </property>
- </configuration>
5) Project development
POM dependencies
- <dependencies>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-common</artifactId>
- <version>3.3.1</version>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-client</artifactId>
- <version>3.3.1</version>
- </dependency>
- <dependency>
- <groupId>org.slf4j</groupId>
- <artifactId>slf4j-log4j12</artifactId>
- <version>1.7.30</version>
- <!-- <scope>compile</scope>-->
- </dependency>
- <dependency>
- <groupId>org.slf4j</groupId>
- <artifactId>slf4j-nop</artifactId>
- <version>1.7.30</version>
- </dependency>
- </dependencies>
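- Note: slf4j-log4j12 and slf4j-nop are both SLF4J bindings. With both on the classpath SLF4J prints a "multiple bindings" warning and uses only one of them; if slf4j-nop wins, the log4j.properties below has no effect, so keeping only slf4j-log4j12 is usually the better choice.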
log4j.properties
- log4j.rootLogger=WARN, stdout
- log4j.appender.stdout=org.apache.log4j.ConsoleAppender
- log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
- log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
WordCount.java
- package com.xxx.mapreducer;
-
- import org.apache.commons.io.FileUtils;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Job;
- import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
- import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
-
- import java.io.File;
- import java.io.IOException;
-
- public class WordCount {
- public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
-
- // Delete the local output directory if it already exists; the job refuses to overwrite an existing output path
- File file = new File(args[1]);
- if (file.exists()) {
- FileUtils.deleteDirectory(file);
- }
-
- // User identity used when the job talks to HDFS
- System.setProperty("HADOOP_USER_NAME","root");
-
- Configuration configuration = new Configuration();
- configuration.set("hadoop.tmp.dir", "D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp");
-
- Job job = Job.getInstance(configuration, "wordCount");
- job.setJarByClass(WordCount.class);
-
- job.setMapperClass(MyMapper.class);
-
- // MyCombiner extends Reducer, so the same class serves as both the combiner and the reducer
- job.setCombinerClass(MyCombiner.class);
- job.setReducerClass(MyCombiner.class);
-
- job.setOutputKeyClass(Text.class);
- job.setOutputValueClass(IntWritable.class);
-
- FileInputFormat.addInputPath(job, new Path(args[0]));
- FileOutputFormat.setOutputPath(job, new Path(args[1]));
-
- System.exit(job.waitForCompletion(true) ? 0 : 1);
- }
-
- }
MyMapper.java
- package com.xxx.mapreducer;
-
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Mapper;
-
- import java.io.IOException;
- import java.util.StringTokenizer;
-
- public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
- private final static IntWritable one = new IntWritable(1);
- private Text word = new Text();
-
- @Override
- protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
- StringTokenizer itr = new StringTokenizer(value.toString());
- while (itr.hasMoreTokens()) {
- word.set(itr.nextToken());
- context.write(word, one);
- }
- }
- }
MyCombiner.java
- package com.xxx.mapreducer;
-
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapreduce.Reducer;
-
- import java.io.IOException;
-
- public class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
- private IntWritable result = new IntWritable();
-
- @Override
- public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
- int sum = 0;
- for (IntWritable val : values) {
- sum += val.get();
- }
- result.set(sum);
- context.write(key, result);
- }
- }
6) Debugging
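- Run WordCount directly from the IDE with two program arguments, an input path and an output path (hypothetical example: data/input data/output). With no *-site.xml files on the classpath, new Configuration() falls back to the local filesystem and the local job runner, so the whole job runs in one JVM and breakpoints in MyMapper and MyCombiner are hit directly; the main method deletes the local output directory first because MapReduce refuses to write to an existing output path.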
7) Pitfalls:
- The program fails at runtime with: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
-
- Check that the environment variables are configured correctly and that hadoop.dll and winutils.exe are present in Hadoop's bin directory.
- Copy hadoop.dll to C:\Windows\System32.