
Hadoop_hadoop-env.sh


1. Hadoop's three main components

    HDFS (distributed storage system)
        NameNode: master node of the HDFS cluster
        SecondaryNameNode: performs periodic checkpoints for the NameNode (merges the fsimage with the edit log); it is not a hot standby
        DataNode: worker node of the HDFS cluster
    YARN (resource manager) -- runs MapReduce, Storm, Spark, Flink, etc.
        ResourceManager
        NodeManager
    MapReduce (distributed parallel computing framework)

2. HDFS (distributed storage system)

1) HDFS architecture

2) Characteristics of HDFS

  • Write once, read many times
  • Not suitable for low-latency data access
  • Cannot store large numbers of small files efficiently
  • No support for concurrent writers or arbitrary in-place file modification

3) High availability (fault-tolerance design)

  • Heartbeats are exchanged between the NameNode and the DataNodes. If the NameNode stops receiving heartbeats from a DataNode, that node is considered dead. The NameNode then looks for blocks whose replica count has dropped below the configured value, creates new replicas, and distributes them to other DataNodes.
  • Block integrity checking: HDFS records a checksum for every block of a newly created file; on read, replicas whose checksums match the recorded values are preferred.
  • Cluster load balancing: when the free space on a DataNode exceeds a threshold (the node is underused relative to the rest of the cluster), the HDFS balancer can migrate data to it from other DataNodes. The commands sketched below can be used to inspect these mechanisms.
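
A minimal way to inspect these mechanisms from the command line, assuming the cluster built in section 5 is up; the path and threshold are only illustrative:

  # Report live/dead DataNodes with per-node capacity and remaining space
  hdfs dfsadmin -report

  # Check block health, replication and block locations under a path
  hdfs fsck /user/test/input -files -blocks -locations

  # Rebalance data until no DataNode deviates more than 10% from the
  # cluster's average utilization
  hdfs balancer -threshold 10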

4) Basic operations (CLI)

  Create a directory:
  hdfs dfs -mkdir -p /user/test/input

  Upload files:
  hdfs dfs -put etc/hadoop/*.xml /user/hadoop/input

  List files:
  hdfs dfs -ls /user/root

  View file contents:
  hdfs dfs -cat /user/test/input/*.xml

  Delete files:
  hdfs dfs -rm -r /user/root/output
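
Tying these together, a short end-to-end session; the local file name and HDFS paths are made up for illustration:

  echo "hello hadoop" > /tmp/hello.txt           # create a small local test file
  hdfs dfs -mkdir -p /user/test/input            # create the target HDFS directory
  hdfs dfs -put /tmp/hello.txt /user/test/input  # upload the file
  hdfs dfs -ls /user/test/input                  # confirm it arrived
  hdfs dfs -cat /user/test/input/hello.txt       # print its contents
  hdfs dfs -get /user/test/input/hello.txt /tmp/hello.copy.txt   # download it back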

3. YARN

4. MapReduce


5. Hadoop cluster setup

1) Environment plan

  Machine ID   IP               Role
  1            192.168.31.10    master
  2            192.168.31.11    slave01
  3            192.168.31.12    slave02

2) Install the SSH service (skip if already installed)

  Install SSH:
  yum -y install openssh-server

  Generate host keys:
  mkdir /var/run/sshd
  ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
  ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key

  Set the root password:
  /bin/echo 'root:123456'|chpasswd
  /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
  /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/default/local

  Start the service:
  /usr/sbin/sshd -D

  Check that you can SSH to localhost without a password; if not, run:
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 0600 ~/.ssh/authorized_keys

  The start scripts in step 7 also need passwordless SSH from the master to the slaves; see the sketch below.
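
A minimal sketch for extending passwordless login from the master to the slaves, assuming the root user and the hostnames from the environment plan:

  # Run on master once the key pair above exists; ssh-copy-id appends
  # ~/.ssh/id_rsa.pub to the remote authorized_keys (you will be asked
  # for each slave's root password once).
  ssh-copy-id root@slave01
  ssh-copy-id root@slave02

  # Verify that no password is requested any more
  ssh root@slave01 hostname
  ssh root@slave02 hostname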

3) Hostname configuration

  Edit the hosts file and add the cluster hostnames:
  vi /etc/hosts

  Content:
  192.168.31.10 master
  192.168.31.11 slave01
  192.168.31.12 slave02
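
A quick sanity check, on each node, that the names resolve and are reachable (purely illustrative):

  ping -c 1 master
  ping -c 1 slave01
  ping -c 1 slave02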

4) JDK installation

  1. Download the JDK 1.8 package, upload it to each server, and unpack it.
  2. Configure the environment variables:
     vim /etc/profile
     Add at the end of the file:
     export JAVA_HOME=/home/hadoop/java/jdk1.8.0_161
     export PATH=$PATH:$JAVA_HOME/bin
  3. Reload the profile so the changes take effect:
     source /etc/profile
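
To confirm the JDK is picked up (the version and path are whatever you installed):

  java -version      # should report a 1.8.x build
  echo $JAVA_HOME    # should print the path exported above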

5) Hadoop installation

  Download the release tarball and unpack it on each server.
  Download address:
  https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
  The official mirror can be very slow; the following mirror is recommended:
  https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
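
A sketch of unpacking the tarball and exporting HADOOP_HOME, assuming the install path used in the configuration below (/usr/local/software/hadoop/hadoop-3.3.1); adjust to your own layout:

  mkdir -p /usr/local/software/hadoop
  tar -zxvf hadoop-3.3.1.tar.gz -C /usr/local/software/hadoop/

  # Append to /etc/profile, then run: source /etc/profile
  export HADOOP_HOME=/usr/local/software/hadoop/hadoop-3.3.1
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin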

6) Hadoop configuration

  hadoop-env.sh
  export JAVA_HOME=/usr/local/software/java8/jdk1.8.0_311/
  export HDFS_NAMENODE_USER=root
  export HDFS_DATANODE_USER=root
  export HDFS_SECONDARYNAMENODE_USER=root
  export YARN_RESOURCEMANAGER_USER=root
  export YARN_NODEMANAGER_USER=root
  -----------------------------------
  core-site.xml
  <configuration>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:/usr/local/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>
    </property>
  </configuration>
  -----------------------------------
  hdfs-site.xml
  <configuration>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/software/hadoop/hadoop-3.3.1/namenode_dir</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/software/hadoop/hadoop-3.3.1/datanode_dir</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <!-- The HTTP port must be configured, otherwise the web UI is not reachable -->
    <property>
      <name>dfs.http.address</name>
      <value>master:50070</value>
    </property>
  </configuration>
  ----------------------------------------
  mapred-site.xml
  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
  </configuration>
  ----------------------------------------
  yarn-site.xml
  <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>20480</value>
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>
  </configuration>
  ------------------------------------
  workers
  On the master node, edit etc/hadoop/workers and replace localhost with the two slave hostnames:
  slave01
  slave02
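
The same configuration must be present on every node. A sketch of pushing the configured directory to the slaves, assuming Hadoop was unpacked to the same path on all three machines:

  scp -r $HADOOP_HOME/etc/hadoop root@slave01:$HADOOP_HOME/etc/
  scp -r $HADOOP_HOME/etc/hadoop root@slave02:$HADOOP_HOME/etc/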

7) Start the Hadoop cluster

  Format the NameNode (first start only):
  hdfs namenode -format

  Start the HDFS daemons:
  sbin/start-dfs.sh

  Start the YARN daemons:
  sbin/start-yarn.sh
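
To verify that everything came up (the expected daemons follow the component breakdown in section 1, and the UI ports match the configuration above):

  # On master: expect NameNode, SecondaryNameNode and ResourceManager
  jps
  # On the slaves: expect DataNode and NodeManager (jps must be on their PATH)
  ssh root@slave01 jps
  ssh root@slave02 jps

  # Web UIs, from a browser that can resolve "master" (see pitfall 3 below):
  #   HDFS NameNode UI:        http://master:50070
  #   YARN ResourceManager UI: http://master:8088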

8) Pitfalls

  1. Port 50070 cannot be reached
     1) Add the HTTP port to hdfs-site.xml:
        <property>
          <name>dfs.http.address</name>
          <value>master:50070</value>
        </property>
     2) Delete the node data directories
     3) Reformat the NameNode
     4) Start the HDFS daemons
     5) Start the YARN daemons

  2. Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
     Add the following to mapred-site.xml:
        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>

  3. Browsing files at http://192.168.31.10:50070 fails
     On the client machine, edit C:\Windows\System32\drivers\etc\hosts and add the cluster hostname entries:
        192.168.31.10 master
        192.168.31.11 slave01
        192.168.31.12 slave02

  4. File preview in the web UI fails with: Couldn't preview the file. NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://slave1:9864/webhdfs/v1/HelloHadoop.txt?op=OPEN&namenoderpcaddress=master:9820&offset=0&_=1609724219001'.
     Enable WebHDFS in hdfs-site.xml and restart the cluster:
        [root@master ~]# vim /usr/bigdata/hadoop-3.3.0/etc/hadoop/hdfs-site.xml
        <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
        </property>

6. Setting up a Windows development environment with IDEA

1) Prerequisites

        JDK 1.8 and Maven are already installed.

2) Downloads

        Hadoop release package

        winutils

3) Deployment

        Use 7-Zip to extract the Hadoop package to the target directory (two passes: first the .gz, then the .tar).

        Extract winutils and overwrite Hadoop's bin directory with the bin directory for the matching version.

4) Hadoop configuration

  Hadoop install directory: D:\SOFT\BigData\hadoop-3.3.1

  Add one line at the end of hadoop-env.cmd:
  set HADOOP_IDENT_STRING="Administrator"
  ----------------------------------------------
  core-site.xml
  <configuration>
    <!-- Directory for files Hadoop generates at runtime -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/name</value>
    </property>
    <!-- Address of the HDFS NameNode -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>
  ----------------------------------------------
  hdfs-site.xml
  <configuration>
    <!-- Replication is set to 1 because this is a single-node Hadoop -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/data</value>
    </property>
  </configuration>
  ----------------------------------------------
  mapred-site.xml
  <configuration>
    <!-- Replication is set to 1 because this is a single-node Hadoop -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/D:/Environment/hadoop/workplace/data</value>
    </property>
  </configuration>
  ----------------------------------------------
  yarn-site.xml
  <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
  </configuration>

5) Project development

        pom dependencies

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>3.3.1</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.30</version>
      <!-- <scope>compile</scope>-->
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
      <version>1.7.30</version>
    </dependency>
  </dependencies>

        log4j.properties

  log4j.rootLogger=WARN, stdout
  log4j.appender.stdout=org.apache.log4j.ConsoleAppender
  log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
  log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n

        WordCount.java

  package com.xxx.mapreducer;

  import org.apache.commons.io.FileUtils;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  import java.io.File;
  import java.io.IOException;

  public class WordCount {
      public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
          // args[0] = input directory, args[1] = output directory.
          // Delete a pre-existing local output directory so the job can recreate it.
          File file = new File(args[1]);
          if (file.exists()) {
              FileUtils.deleteDirectory(file);
          }
          System.setProperty("HADOOP_USER_NAME", "root");
          Configuration configuration = new Configuration();
          configuration.set("hadoop.tmp.dir", "D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp");
          Job job = Job.getInstance(configuration, "wordCount");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(MyMapper.class);
          job.setCombinerClass(MyCombiner.class);
          // MyCombiner extends Reducer, so it also serves as the reducer
          job.setReducerClass(MyCombiner.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }
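
Run the class from IDEA with two program arguments: a local input directory and an output directory that does not yet exist. To run the same job against the cluster instead, a hedged sketch, assuming the project is packaged as wordcount.jar (the artifact name is illustrative):

  mvn clean package
  hadoop jar target/wordcount.jar com.xxx.mapreducer.WordCount /user/test/input /user/test/output
  hdfs dfs -cat /user/test/output/part-r-00000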

         MyMapper.java

  package com.xxx.mapreducer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  import java.io.IOException;
  import java.util.StringTokenizer;

  public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      @Override
      protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
          // Split the line into whitespace-separated tokens and emit (word, 1) for each
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
              word.set(itr.nextToken());
              context.write(word, one);
          }
      }
  }

         MyCombiner.java

  package com.xxx.mapreducer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  import java.io.IOException;

  // Used both as the combiner and as the reducer: it sums the counts per word
  public class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
              sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
      }
  }

6) Debugging

7) Pitfalls

  1. Runtime error: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
     Check that the environment variables are configured correctly and that hadoop.dll and winutils.exe are present in Hadoop's bin directory.
     Copy hadoop.dll to C:\Windows\System32.
