version: "3" services: namenode: image: apache/hadoop:3.3.6 hostname: namenode command: ["hdfs", "namenode"] ports: - 9870:9870 env_file: - ./config environment: ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name" datanode: image: apache/hadoop:3.3.6 command: ["hdfs", "datanode"] env_file: - ./config resourcemanager: image: apache/hadoop:3.3.6 hostname: resourcemanager command: ["yarn", "resourcemanager"] ports: - 8088:8088 env_file: - ./config volumes: - ./test.sh:/opt/test.sh nodemanager: image: apache/hadoop:3.3.6 command: ["yarn", "nodemanager"] env_file: - ./config
Notes on the configuration
hdfs namenode - starts the NameNode of the Hadoop Distributed File System. The NameNode is one of the key HDFS components; it manages the filesystem namespace and its metadata.
hdfs datanode - starts an HDFS DataNode. DataNodes store and serve the actual data blocks.
yarn resourcemanager - starts the YARN ResourceManager, the core YARN component responsible for cluster-wide resource allocation and job scheduling.
yarn nodemanager - starts a YARN NodeManager. A NodeManager runs on every node and handles launching, stopping, and reporting the status of containers.
ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name" - specifies the storage path for the HDFS NameNode's metadata; the image makes sure this directory exists before the daemon starts (formatting the NameNode on first start if needed).
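Once all four daemons are up, each role can be checked with standard Hadoop commands (run inside any of the containers; the web UIs are reachable from the host via the published ports):

# HDFS: the report should list one live DataNode
hdfs dfsadmin -report

# YARN: the NodeManager should show up with state RUNNING
yarn node -list

# Web UIs from the host: NameNode at http://localhost:9870,
# ResourceManager at http://localhost:8088

The ./config file referenced by each service's env_file holds the shared Hadoop settings: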
HADOOP_HOME=/opt/hadoop
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
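The image's entrypoint expands each FILENAME_property=value variable into a property in the corresponding file under /opt/hadoop/etc/hadoop. As a sketch (this file is generated by the image, not written by hand), the two CORE-SITE.XML_ entries above should come out as a core-site.xml along these lines:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode</value>
  </property>
</configuration>

With docker-compose.yml and config in place, bring the cluster up: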
Hadoop-workspace % docker-compose up -d
[+] Running 5/5
⠿ Network hadoop-workspace_default Created 0.0s
⠿ Container hadoop-workspace-namenode-1 Started 0.9s
⠿ Container hadoop-workspace-nodemanager-1 Started 0.9s
⠿ Container hadoop-workspace-resourcemanager-1 Started 0.9s
⠿ Container hadoop-workspace-datanode-1 Started 0.7s
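Before going further, it is worth confirming that all four containers stayed up (a container that exits right after starting usually points to a configuration problem):

Hadoop-workspace % docker-compose ps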
Once the containers have initialized, open a shell in the namenode container. Note that docker exec takes the container name, which Compose generates as hadoop-workspace-namenode-1 (see the startup output above), not the hostname namenode:
docker exec -it hadoop-workspace-namenode-1 /bin/bash
Write the mapred-site.xml configuration for the namenode:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
  </property>
</configuration>
Copy the configuration file into the namenode container:
Hadoop-workspace % docker cp ./mapred-site.xml 1dbbd393fac19275547ba4d810cd7e7952bf594bb581c594f31e38300e795fcf:/opt/hadoop/etc/hadoop
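To confirm the file landed where Hadoop reads its configuration, print it back out (using the container name here; docker exec accepts either the name or the ID):

Hadoop-workspace % docker exec hadoop-workspace-namenode-1 cat /opt/hadoop/etc/hadoop/mapred-site.xml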
Now try running a MapReduce job. The examples jar ships with the distribution under /opt/hadoop/share/hadoop/mapreduce; the relative jar path in the command below assumes the shell is in that directory:
bash-4.2$ yarn jar hadoop-mapreduce-examples-3.3.6.jar pi 10 15
Number of Maps = 10
Samples per Map = 15
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2023-07-10 01:43:15 INFO DefaultNoHARMFailoverProxyProvider:64 - Connecting to ResourceManager at resourcemanager/172.19.0.5:8032
2023-07-10 01:43:15 INFO JobResourceUploader:907 - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1688952567349_0001
2023-07-10 01:43:15 INFO FileInputFormat:300 - Total input files to process : 10
2023-07-10 01:43:16 INFO JobSubmitter:202 - number of splits:10
2023-07-10 01:43:16 INFO JobSubmitter:298 - Submitting tokens for job: job_1688952567349_0001
2023-07-10 01:43:16 INFO JobSubmitter:299 - Executing with tokens: []
2023-07-10 01:43:16 INFO Configuration:2854 - resource-types.xml not found
2023-07-10 01:43:16 INFO ResourceUtils:476 - Unable to find 'resource-types.xml'.
2023-07-10 01:43:17 INFO YarnClientImpl:338 - Submitted application application_1688952567349_0001
2023-07-10 01:43:17 INFO Job:1682 - The url to track the job: http://resourcemanager:8088/proxy/application_1688952567349_0001/
2023-07-10 01:43:17 INFO Job:1727 - Running job: job_1688952567349_0001
2023-07-10 01:43:25 INFO Job:1748 - Job job_1688952567349_0001 running in uber mode : false
2023-07-10 01:43:25 INFO Job:1755 - map 0% reduce 0%
2023-07-10 01:43:31 INFO Job:1755 - map 10% reduce 0%
2023-07-10 01:43:32 INFO Job:1755 - map 20% reduce 0%
2023-07-10 01:43:34 INFO Job:1755 - map 30% reduce 0%
2023-07-10 01:43:36 INFO Job:1755 - map 40% reduce 0%
2023-07-10 01:43:38 INFO Job:1755 - map 50% reduce 0%
2023-07-10 01:43:40 INFO Job:1755 - map 80% reduce 0%
2023-07-10 01:43:43 INFO Job:1755 - map 100% reduce 0%
2023-07-10 01:43:44 INFO Job:1755 - map 100% reduce 100%
2023-07-10 01:43:45 INFO Job:1766 - Job job_1688952567349_0001 completed successfully
2023-07-10 01:43:45 INFO Job:1773 - Counters: 54
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=3045185
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2600
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=45
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Rack-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=50140
        Total time spent by all reduces in occupied slots (ms)=9631
        Total time spent by all map tasks (ms)=50140
        Total time spent by all reduce tasks (ms)=9631
        Total vcore-milliseconds taken by all map tasks=50140
        Total vcore-milliseconds taken by all reduce tasks=9631
        Total megabyte-milliseconds taken by all map tasks=51343360
        Total megabyte-milliseconds taken by all reduce tasks=9862144
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1420
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1374
        CPU time spent (ms)=5180
        Physical memory (bytes) snapshot=3030937600
        Virtual memory (bytes) snapshot=29260795904
        Total committed heap usage (bytes)=2622488576
        Peak Map Physical memory (bytes)=297189376
        Peak Map Virtual memory (bytes)=2661572608
        Peak Reduce Physical memory (bytes)=209162240
        Peak Reduce Virtual memory (bytes)=2667085824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 30.425 seconds
Estimated value of Pi is 3.17333333333333333333
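For context on the final line: the pi example is a Monte Carlo estimator. Each mapper scatters its 15 (quasi-random) sample points over the unit square and counts how many fall inside the inscribed circle; the reducer sums the counts and estimates pi as 4 × inside / total. The printed value is consistent with 119 of the 10 × 15 = 150 points having landed inside the circle:

4 × 119 / 150 = 476 / 150 = 3.17333333333333333333…

More maps and more samples per map give a better estimate, at the cost of a longer job.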