[Hadoop] Building a Pseudo-Distributed Cluster with Docker Containers

Configure Hadoop in pseudo-distributed mode inside containers and run commands against it.

1. Write the docker-compose.yaml file to define the cluster
version: "3"
services:
  namenode:
    image: apache/hadoop:3.3.6
    hostname: namenode
    command: ["hdfs", "namenode"]        # start the HDFS NameNode
    ports:
      - 9870:9870                        # NameNode web UI
    env_file:
      - ./config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3.3.6
    command: ["hdfs", "datanode"]        # start an HDFS DataNode
    env_file:
      - ./config
  resourcemanager:
    image: apache/hadoop:3.3.6
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"] # start the YARN ResourceManager
    ports:
      - 8088:8088                        # ResourceManager web UI
    env_file:
      - ./config
    volumes:
      - ./test.sh:/opt/test.sh
  nodemanager:
    image: apache/hadoop:3.3.6
    command: ["yarn", "nodemanager"]     # start a YARN NodeManager
    env_file:
      - ./config

Notes on the compose file

hdfs namenode - starts the HDFS NameNode. The NameNode is one of HDFS's key components; it manages the filesystem namespace and metadata.

hdfs datanode - starts an HDFS DataNode. DataNodes store and serve the actual data blocks.

yarn resourcemanager - starts the YARN ResourceManager. The ResourceManager is YARN's central component and is responsible for cluster-wide resource allocation and job scheduling.

yarn nodemanager - starts a YARN NodeManager. A NodeManager runs on every worker node and manages and monitors container launch, shutdown, and status reporting.

ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name" - specifies the storage path for the NameNode's metadata; the image uses it to make sure this directory exists (initializing the NameNode there if needed) before the daemon starts.
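Purely for illustration, each compose service above boils down to running the stock image with one of these commands as the container's argument. A minimal sketch of a standalone NameNode started by hand, assuming the ./config env file described in the next section is present in the current directory:

# Hand-run equivalent of the namenode service (illustration only, not part of this setup)
docker run -d --name namenode --hostname namenode \
  -p 9870:9870 \
  --env-file ./config \
  -e ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name \
  apache/hadoop:3.3.6 hdfs namenode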

2. The config environment file
HADOOP_HOME=/opt/hadoop
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
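The naming convention in this file is how the apache/hadoop image expects its configuration: the entrypoint turns each FILE.XML_property=value line into a <property> entry in the corresponding XML file under /opt/hadoop/etc/hadoop. Once the containers are running (next step), you can confirm what was generated from the host; a sketch using docker-compose:

# Show the core-site.xml that the entrypoint generated from the env file
docker-compose exec namenode cat /opt/hadoop/etc/hadoop/core-site.xml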
3. Start the containers
Hadoop-workspace % docker-compose up -d
[+] Running 5/5
 ⠿ Network hadoop-workspace_default              Created                   0.0s
 ⠿ Container hadoop-workspace-namenode-1         Started                   0.9s
 ⠿ Container hadoop-workspace-nodemanager-1      Started                   0.9s
 ⠿ Container hadoop-workspace-resourcemanager-1  Started                   0.9s
 ⠿ Container hadoop-workspace-datanode-1         Started                   0.7s
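Before going further, it doesn't hurt to confirm that all four services stayed up; a quick check (container names will include your project directory name, as in the output above):

# List the compose services and their current state
docker-compose ps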

Once the containers have finished initializing, enter the namenode container:

docker exec -it namenode /bin/bash
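Inside the container, a quick sanity check that HDFS is up before submitting anything; a sketch (with a single DataNode, the report should list exactly one live node):

# Confirm the NameNode sees the DataNode and the filesystem is reachable
hdfs dfsadmin -report
hdfs dfs -ls /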
4. Run a MapReduce job

Write the mapred-site.xml configuration for the namenode:

<configuration>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
    </property>
</configuration>

Copy the configuration file into the namenode container:

Hadoop-workspace % docker cp ./mapred-site.xml 1dbbd393fac19275547ba4d810cd7e7952bf594bb581c594f31e38300e795fcf:/opt/hadoop/etc/hadoop
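Equivalently, the container can be addressed by the name docker-compose generated for it (shown in the startup output above) instead of the full container ID; a sketch:

# Same copy, using the compose-generated container name
docker cp ./mapred-site.xml hadoop-workspace-namenode-1:/opt/hadoop/etc/hadoop/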

Now try running a MapReduce job, using the bundled pi example:

bash-4.2$ yarn jar hadoop-mapreduce-examples-3.3.6.jar pi 10 15
Number of Maps  = 10
Samples per Map = 15
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2023-07-10 01:43:15 INFO  DefaultNoHARMFailoverProxyProvider:64 - Connecting to ResourceManager at resourcemanager/172.19.0.5:8032
2023-07-10 01:43:15 INFO  JobResourceUploader:907 - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1688952567349_0001
2023-07-10 01:43:15 INFO  FileInputFormat:300 - Total input files to process : 10
2023-07-10 01:43:16 INFO  JobSubmitter:202 - number of splits:10
2023-07-10 01:43:16 INFO  JobSubmitter:298 - Submitting tokens for job: job_1688952567349_0001
2023-07-10 01:43:16 INFO  JobSubmitter:299 - Executing with tokens: []
2023-07-10 01:43:16 INFO  Configuration:2854 - resource-types.xml not found
2023-07-10 01:43:16 INFO  ResourceUtils:476 - Unable to find 'resource-types.xml'.
2023-07-10 01:43:17 INFO  YarnClientImpl:338 - Submitted application application_1688952567349_0001
2023-07-10 01:43:17 INFO  Job:1682 - The url to track the job: http://resourcemanager:8088/proxy/application_1688952567349_0001/
2023-07-10 01:43:17 INFO  Job:1727 - Running job: job_1688952567349_0001
2023-07-10 01:43:25 INFO  Job:1748 - Job job_1688952567349_0001 running in uber mode : false
2023-07-10 01:43:25 INFO  Job:1755 -  map 0% reduce 0%
2023-07-10 01:43:31 INFO  Job:1755 -  map 10% reduce 0%
2023-07-10 01:43:32 INFO  Job:1755 -  map 20% reduce 0%
2023-07-10 01:43:34 INFO  Job:1755 -  map 30% reduce 0%
2023-07-10 01:43:36 INFO  Job:1755 -  map 40% reduce 0%
2023-07-10 01:43:38 INFO  Job:1755 -  map 50% reduce 0%
2023-07-10 01:43:40 INFO  Job:1755 -  map 80% reduce 0%
2023-07-10 01:43:43 INFO  Job:1755 -  map 100% reduce 0%
2023-07-10 01:43:44 INFO  Job:1755 -  map 100% reduce 100%
2023-07-10 01:43:45 INFO  Job:1766 - Job job_1688952567349_0001 completed successfully
2023-07-10 01:43:45 INFO  Job:1773 - Counters: 54
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=3045185
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2600
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=45
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
        HDFS: Number of bytes read erasure-coded=0
    Job Counters 
        Launched map tasks=10
        Launched reduce tasks=1
        Rack-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=50140
        Total time spent by all reduces in occupied slots (ms)=9631
        Total time spent by all map tasks (ms)=50140
        Total time spent by all reduce tasks (ms)=9631
        Total vcore-milliseconds taken by all map tasks=50140
        Total vcore-milliseconds taken by all reduce tasks=9631
        Total megabyte-milliseconds taken by all map tasks=51343360
        Total megabyte-milliseconds taken by all reduce tasks=9862144
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1420
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1374
        CPU time spent (ms)=5180
        Physical memory (bytes) snapshot=3030937600
        Virtual memory (bytes) snapshot=29260795904
        Total committed heap usage (bytes)=2622488576
        Peak Map Physical memory (bytes)=297189376
        Peak Map Virtual memory (bytes)=2661572608
        Peak Reduce Physical memory (bytes)=209162240
        Peak Reduce Virtual memory (bytes)=2667085824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1180
    File Output Format Counters 
        Bytes Written=97
Job Finished in 30.425 seconds
Estimated value of Pi is 3.17333333333333333333
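The job completes, and thanks to the 8088 port mapping the finished application should also be visible from the host in the ResourceManager UI at http://localhost:8088. The same examples jar ships other demo jobs as well; a sketch of a word count run, where /input and /output are hypothetical HDFS paths chosen for illustration (the input directory must be created and populated first, and /output must not already exist):

# Hypothetical paths; run inside the namenode container, with the examples jar
# reachable the same way as in the pi run above
hdfs dfs -mkdir -p /input
hdfs dfs -put /opt/hadoop/etc/hadoop/core-site.xml /input/
yarn jar hadoop-mapreduce-examples-3.3.6.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000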