
Hadoop Basics (7): Hadoop's Three Running Modes


I. Hadoop has three running modes

1. Local (standalone) mode
Data is stored on the local Linux file system. Generally not used in practice.

2. Pseudo-distributed cluster
Data is stored in HDFS. Used for testing.

3. Fully distributed cluster
Data is stored in HDFS, and multiple servers work together. Heavily used by enterprises.

II. Running in local mode
Running in local mode simply means executing the hadoop command directly.

1. Example: counting words
cd /appserver/hadoop/hadoop-3.3.4
mkdir wcinput
Create a word.txt under wcinput/ and type some words into it. Do not create the output directory in advance: wordcount creates wcoutput/ itself and fails if it already exists.

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount wcinput/ wcoutput/
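In local mode the result is written to the local Linux file system, not HDFS. A quick check (part-r-00000 is MapReduce's standard reducer output file name):

cat wcoutput/part-r-00000

The wcoutput/ directory also contains an empty _SUCCESS marker file.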

III. Passwordless SSH login

ssh-keygen generates the local public/private key pair; ssh-copy-id installs the local public key on a remote host, enabling passwordless login to that host.
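A minimal sketch, assuming the hadoop101-103 host names used in the cluster plan below:

ssh-keygen -t rsa        # accept the defaults; writes ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
ssh-copy-id hadoop102    # appends the public key to hadoop102's ~/.ssh/authorized_keys
ssh hadoop102            # should now log in without a password

Repeat ssh-copy-id for every node in the plan (including the local machine itself) so the cluster start scripts can reach them all.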

IV. Single-machine pseudo-cluster installation
See the previous article in this series.

V. Cluster deployment plan

1. Do not install the NameNode and the SecondaryNameNode on the same server.

2. The ResourceManager is also memory-hungry; do not put it on the same machine as the NameNode or the SecondaryNameNode.

        hadoop101             hadoop102                      hadoop103
HDFS    NameNode, DataNode    DataNode                       SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager   NodeManager

3. Configuration files
Hadoop configuration files come in two kinds: default configuration files and site-specific (custom) configuration files. You only need to edit a custom file when you want to override a default value, changing the corresponding property there.

(1) Default configuration files

Where the four default configuration files are packaged:
core-default.xml      hadoop-common-3.x.x.jar/core-default.xml
hdfs-default.xml      hadoop-hdfs-3.x.x.jar/hdfs-default.xml
yarn-default.xml      hadoop-yarn-common-3.x.x.jar/yarn-default.xml
mapred-default.xml    hadoop-mapreduce-client-core-3.x.x.jar/mapred-default.xml

(2) Custom configuration files
core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify them there to match the project's needs.
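For example, a minimal core-site.xml override sketch (fs.defaultFS is a real core-default.xml property; the host name assumes the plan above, and port 8020 matches the NameNode RPC address that appears in the logs later in this article):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop101:8020</value>
</property>

Every property not overridden in a *-site.xml file keeps its value from the corresponding *-default.xml.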

(3) The workers file
To configure the cluster, list every node in this file; a sketch follows.
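A sketch of $HADOOP_HOME/etc/hadoop/workers for the three-node plan above (the start scripts read it line by line, so avoid trailing spaces and blank lines):

hadoop101
hadoop102
hadoop103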

VI. NameNode initialization

1. The first time the cluster is started it must be initialized: format the NameNode on the hadoop101 node, much like partitioning and formatting a newly added disk (the command is sketched after item 2 below).

2. Formatting the NameNode more than once generates a new cluster id each time, so the NameNode's and the DataNodes' cluster ids no longer match and the cluster cannot find its old data. If the cluster misbehaves at runtime and you truly must reformat the NameNode, first stop the NameNode and DataNode processes and delete the data and logs directories on every machine, and only then format.
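A sketch of both cases, run from $HADOOP_HOME (hdfs namenode -format is the actual command; the data and logs paths below assume they sit under $HADOOP_HOME, so adjust them to your dfs.*.dir settings):

# First start only, on hadoop101:
bin/hdfs namenode -format

# Re-format after a failure -- WARNING: this erases all HDFS data:
sbin/stop-dfs.sh
rm -rf $HADOOP_HOME/data $HADOOP_HOME/logs    # run on every node
bin/hdfs namenode -format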

VII. Viewing data stored on HDFS

Open http://192.168.1.1:9870/ (the NameNode web UI) and click "Browse the file system".

VIII. Viewing jobs running on YARN

The ResourceManager web UI listens on port 8088 by default, e.g. http://192.168.1.1:8088/.

IX. Testing

1. Create a directory on HDFS
cd /appserver/hadoop/hadoop-3.3.4/bin
./hadoop fs -mkdir /wcinput

2. Upload a file
./hadoop fs -put /tmp/aa.txt /wcinput

3. Downloading the file from the web UI fails
Couldn't preview the file. NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://localhost:9864/webhdfs/v1/wcinput/aa.txt?op=OPEN&namenoderpcaddress=hadoop001:8020&offset=0&_=1675238533732'.

Fix: check the host name entries in /etc/hosts, stop HDFS and YARN, and reboot the server; after that it worked.

The Availability field had shown localhost; after the restart it shows hadoop001. (Note: the machine you browse from must also have proper hosts entries mapping the node names to their IPs; see the sketch below.)
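A minimal /etc/hosts sketch for both the server and the browsing client (the IP is taken from the hadoop001/192.168.0.3 pair in the logs below; adjust it to your network):

192.168.0.3  hadoop001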

4. Where the file is stored on disk
The block lives under /appserver/hadoop/datanode/data/current/BP-1427885282-127.0.0.1-1675234871384/current/finalized/subdir0/subdir0:

-rw-r--r--. 1 root root 29 2月 1 15:50 blk_1073741825
-rw-r--r--. 1 root root 11 2月 1 15:50 blk_1073741825_1001.meta
[root@hadoop001 subdir0]# cat blk_1073741825
aa
bb ce
bobo
tom
aa
bobo
aa

5. Run the wordcount program
This shows how YARN does its work:

cd /appserver/hadoop/hadoop-3.3.4
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /wcinput /wcoutput

It fails:

2023-02-01 17:25:44,177 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop001/192.168.0.3:8032
2023-02-01 17:25:44,435 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1675239930166_0002
2023-02-01 17:25:45,103 INFO input.FileInputFormat: Total input files to process : 1
2023-02-01 17:25:46,136 INFO mapreduce.JobSubmitter: number of splits:1
2023-02-01 17:25:46,692 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1675239930166_0002
2023-02-01 17:25:46,692 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-01 17:25:46,803 INFO conf.Configuration: resource-types.xml not found
2023-02-01 17:25:46,803 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-01 17:25:47,171 INFO impl.YarnClientImpl: Submitted application application_1675239930166_0002
2023-02-01 17:25:47,191 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1675239930166_0002/
2023-02-01 17:25:47,192 INFO mapreduce.Job: Running job: job_1675239930166_0002
2023-02-01 17:25:50,231 INFO mapreduce.Job: Job job_1675239930166_0002 running in uber mode : false
2023-02-01 17:25:50,232 INFO mapreduce.Job: map 0% reduce 0%
2023-02-01 17:25:50,241 INFO mapreduce.Job: Job job_1675239930166_0002 failed with state FAILED due to: Application application_1675239930166_0002 failed 2 times due to AM Container for appattempt_1675239930166_0002_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2023-02-01 17:25:49.755]Exception from container-launch.
Container id: container_1675239930166_0002_02_000001
Exit code: 1
[2023-02-01 17:25:49.757]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
[2023-02-01 17:25:49.760]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
For more detailed output, check the application tracking page: http://hadoop001:8088/cluster/app/application_1675239930166_0002 Then click on links to logs of each attempt.
. Failing the application.
2023-02-01 17:25:50,252 INFO mapreduce.Job: Counters: 0

The fix. The key line above is the localized JVM message "错误: 找不到或无法加载主类" (Error: Could not find or load main class) org.apache.hadoop.mapreduce.v2.app.MRAppMaster, which means the AM container cannot see the Hadoop classes. First print the classpath:

bin/hadoop classpath
/appserver/hadoop/hadoop-3.3.4/etc/hadoop:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/mapreduce/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/*

Then add the resulting path list to yarn-site.xml:

<property>
  <name>yarn.application.classpath</name>
  <value>/appserver/hadoop/hadoop-3.3.4/etc/hadoop:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/common/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/hdfs/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/mapreduce/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/lib/*:/appserver/hadoop/hadoop-3.3.4/share/hadoop/yarn/*</value>
</property>

Restart YARN:

yarn --daemon stop resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start resourcemanager
yarn --daemon start nodemanager

This time it succeeds:

2023-02-01 17:40:08,122 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop001/192.168.0.3:8032
2023-02-01 17:40:08,390 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1675244388030_0001
2023-02-01 17:40:09,045 INFO input.FileInputFormat: Total input files to process : 1
2023-02-01 17:40:09,275 INFO mapreduce.JobSubmitter: number of splits:1
2023-02-01 17:40:09,833 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1675244388030_0001
2023-02-01 17:40:09,834 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-02-01 17:40:09,943 INFO conf.Configuration: resource-types.xml not found
2023-02-01 17:40:09,944 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-02-01 17:40:10,106 INFO impl.YarnClientImpl: Submitted application application_1675244388030_0001
2023-02-01 17:40:10,162 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1675244388030_0001/
2023-02-01 17:40:10,162 INFO mapreduce.Job: Running job: job_1675244388030_0001
2023-02-01 17:40:16,271 INFO mapreduce.Job: Job job_1675244388030_0001 running in uber mode : false
2023-02-01 17:40:16,273 INFO mapreduce.Job: map 0% reduce 0%
2023-02-01 17:40:19,311 INFO mapreduce.Job: map 100% reduce 0%
2023-02-01 17:40:23,337 INFO mapreduce.Job: map 100% reduce 100%
2023-02-01 17:40:25,359 INFO mapreduce.Job: Job job_1675244388030_0001 completed successfully
2023-02-01 17:40:25,416 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=54
        FILE: Number of bytes written=552301
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=130
        HDFS: Number of bytes written=28
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1397
        Total time spent by all reduces in occupied slots (ms)=1565
        Total time spent by all map tasks (ms)=1397
        Total time spent by all reduce tasks (ms)=1565
        Total vcore-milliseconds taken by all map tasks=1397
        Total vcore-milliseconds taken by all reduce tasks=1565
        Total megabyte-milliseconds taken by all map tasks=1430528
        Total megabyte-milliseconds taken by all reduce tasks=1602560
    Map-Reduce Framework
        Map input records=7
        Map output records=8
        Map output bytes=61
        Map output materialized bytes=54
        Input split bytes=101
        Combine input records=8
        Combine output records=5
        Reduce input groups=5
        Reduce shuffle bytes=54
        Reduce input records=5
        Reduce output records=5
        Spilled Records=10
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=62
        CPU time spent (ms)=540
        Physical memory (bytes) snapshot=567222272
        Virtual memory (bytes) snapshot=5580062720
        Total committed heap usage (bytes)=479199232
        Peak Map Physical memory (bytes)=329158656
        Peak Map Virtual memory (bytes)=2786742272
        Peak Reduce Physical memory (bytes)=238063616
        Peak Reduce Virtual memory (bytes)=2793320448
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=29
    File Output Format Counters
        Bytes Written=28

6. View the result
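A sketch of checking the job output on HDFS (part-r-00000 is the standard reducer output file; the expected counts follow from the aa.txt block contents shown earlier):

hadoop fs -cat /wcoutput/part-r-00000
# aa      3
# bb      1
# bobo    2
# ce      1
# tom     1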

X. History server and log aggregation

1. The history server lets you review how past applications ran.
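A sketch of the usual mapred-site.xml entries for it (mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address are real Hadoop properties, and 10020/19888 are their default ports; the host name assumes this article's hadoop001):

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop001:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop001:19888</value>
</property>

Start it with: mapred --daemon start historyserver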

2. Log aggregation

Log aggregation means that after an application finishes running, its run logs are uploaded to HDFS.

The history server configured above does show the run information of past jobs, but clicking the logs link to view the detailed log fails with "Aggregation is not enabled." (the log aggregation feature is switched off).

With log aggregation enabled, the full run details of a program can be viewed conveniently, which helps development and debugging.

Enabling log aggregation requires restarting the NodeManager, ResourceManager, and HistoryServer; see the sketch below.
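A sketch of the usual yarn-site.xml entries (the property names are Hadoop's own; the log-server host assumes hadoop001, and 604800 seconds is a commonly used 7-day retention):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://hadoop001:19888/jobhistory/logs</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>

Then restart the affected daemons:

mapred --daemon stop historyserver
yarn --daemon stop resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
mapred --daemon start historyserver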

3. Once the configuration is in place and the services have been restarted, delete the wcoutput directory and rerun the job:

hadoop fs -rm -f -r /wcoutput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /wcinput /wcoutput
