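A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then fed to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.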
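The flow described above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, sort, and reduce steps using a word count; it does not use the Hadoop API, and the names `map_task` and `run_job` are made up for this example.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word, 1) for word in chunk.split()]


def run_job(chunks):
    # Each chunk is handled by an independent map task
    # (sequential here, parallel on a real cluster).
    map_outputs = [pair for chunk in chunks for pair in map_task(chunk)]

    # The framework sorts the map outputs by key before the reduce step.
    map_outputs.sort(key=lambda kv: kv[0])

    # Group the values by key, then reduce each group to a single count.
    grouped = defaultdict(list)
    for key, value in map_outputs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


result = run_job(["a b a", "b c"])
print(result)
```

Because the pairs are sorted before grouping, the keys reach the reduce step in sorted order.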
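Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the Hadoop Distributed File System (HDFS) run on the same set of nodes. This configuration allows the framework to schedule tasks on the nodes where the data already resides, which yields very high aggregate bandwidth across the cluster.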
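The idea of scheduling a task where its data already lives can be illustrated with a toy function. This is a simplified sketch, not how the real scheduler is implemented (which also weighs factors such as rack locality and available resources); `pick_node` and the node names are hypothetical.

```python
def pick_node(block_replicas, free_nodes):
    """Prefer a free node that already stores the block.

    Returns (node, is_data_local). Falls back to the first free node,
    in which case the block would have to be read over the network.
    """
    for node in free_nodes:
        if node in block_replicas:
            return node, True
    return free_nodes[0], False


# The block is stored on n2 and n3; n1 and n2 have free capacity.
print(pick_node({"n2", "n3"}, ["n1", "n2"]))
# No free node holds a replica, so the task runs remotely on n1.
print(pick_node({"n3"}, ["n1", "n2"]))
```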
A Reducer has three primary phases: shuffle, sort, and reduce.
Note: the shuffle and sort phases occur simultaneously; map outputs are merged as they are fetched.
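Merging map outputs as they arrive can be pictured as a k-way merge of runs that are each already sorted by key. The sketch below uses Python's `heapq.merge` and `itertools.groupby` to stand in for that merge and for the grouping that precedes the reduce call; it is an illustration under those assumptions, not Hadoop's actual merge code, and the function names are invented.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_map_outputs(sorted_runs):
    """Lazily merge several key-sorted runs into one key-sorted stream."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def reduce_merged(merged):
    """Sum the values of each run of equal keys in the merged stream."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]


# Two map outputs, each already sorted by key.
runs = [
    [("a", 1), ("c", 1)],
    [("a", 1), ("b", 2)],
]
totals = reduce_merged(merge_map_outputs(runs))
print(totals)
```

Since every run is sorted, the merge never needs to hold all records in memory at once, which is why fetching and sorting can overlap.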
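The number of reduce tasks can be set with Job.setNumReduceTasks(int). Note: it must be no smaller than the number of partitions the Partitioner can return, because every partition index has to be less than the number of reduce tasks; otherwise the job fails, typically with an "Illegal partition" error.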
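The relationship between partitions and reduce tasks can be shown with a small simulation. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so its result always fits; a custom partitioner with a fixed number of partitions may not. The Python below is a sketch only: `zlib.crc32` stands in for `hashCode()`, and `check_partition`, `region_partition`, and the region names are hypothetical.

```python
import zlib


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Stand-in for a hash partitioner: stable hash modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def check_partition(partition: int, num_reduce_tasks: int) -> int:
    """Reject partition indexes outside [0, num_reduce_tasks)."""
    if not 0 <= partition < num_reduce_tasks:
        raise ValueError(
            f"Illegal partition ({partition}) for {num_reduce_tasks} reduce tasks"
        )
    return partition


# Hypothetical custom partitioner with four fixed partitions.
REGION_PARTITIONS = {"north": 0, "south": 1, "east": 2, "west": 3}


def region_partition(key: str) -> int:
    return REGION_PARTITIONS[key]


def partitions_fit(num_reduce_tasks: int) -> bool:
    """True if every custom partition index is valid for this reducer count."""
    try:
        for region in REGION_PARTITIONS:
            check_partition(region_partition(region), num_reduce_tasks)
    except ValueError:
        return False
    return True


print(partitions_fit(4))  # four partitions, four reduce tasks
print(partitions_fit(3))  # partition index 3 has no reduce task
```

With four partitions, four or more reduce tasks work (extra reducers simply receive no data), while three do not.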
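Under the hood, a Hive query is executed as MapReduce jobs (when Hive runs on the MapReduce engine, as in the session below):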
- 0: jdbc:hive2://hiveserver2.bigdata.chinatele> insert OVERWRITE table jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d PARTITION (data_day)
- . . . . . . . . . . . . . . . . . . . . . . .> select * from sc_share_db.app_mbl_user_trmnl_trail_info_d a where a.data_day BETWEEN 20190619 AND 20191007;
- INFO : Compiling command(queryId=hive_20191010113232_c82e18e3-5853-43c5-9856-d1d1c55dde45): insert OVERWRITE table jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d PARTITION (data_day)
- select * from sc_share_db.app_mbl_user_trmnl_trail_info_d a where a.data_day BETWEEN 20190619 AND 20191007
- INFO : Semantic Analysis Completed
- INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:a.mdn, type:string, comment:null), FieldSchema(name:a.r_trmnl_brand, type:string, comment:null), FieldSchema(name:a.r_trmnl_model, type:string, comment:null), FieldSchema(name:a.r_use_day, type:string, comment:null), FieldSchema(name:a.d_trmnl_brand, type:string, comment:null), FieldSchema(name:a.d_trmnl_model, type:string, comment:null), FieldSchema(name:a.d_use_day, type:string, comment:null), FieldSchema(name:a.data_day, type:string, comment:null)], properties:null)
- INFO : Completed compiling command(queryId=hive_20191010113232_c82e18e3-5853-43c5-9856-d1d1c55dde45); Time taken: 0.412 seconds
- INFO : Concurrency mode is disabled, not creating a lock manager
- INFO : Executing command(queryId=hive_20191010113232_c82e18e3-5853-43c5-9856-d1d1c55dde45): insert OVERWRITE table jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d PARTITION (data_day)
- select * from sc_share_db.app_mbl_user_trmnl_trail_info_d a where a.data_day BETWEEN 20190619 AND 20191007
- INFO : Query ID = hive_20191010113232_c82e18e3-5853-43c5-9856-d1d1c55dde45
- INFO : Total jobs = 3
- INFO : Launching Job 1 out of 3
- INFO : Starting task [Stage-1:MAPRED] in serial mode
- INFO : Number of reduce tasks is set to 0 since there's no reduce operator
- INFO : number of splits:237
- INFO : Submitting tokens for job: job_1569295562481_2677748
- INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns4, Ident: (token for jt_jtsjzxsjyyc_sc_fwfz: HDFS_DELEGATION_TOKEN owner=jt_jtsjzxsjyyc_sc_fwfz, renewer=yarn, realUser=hive/hiveserver2.bigdata.chinatelecom.cn@HADOOP.CHINATELECOM.CN, issueDate=1570678407251, maxDate=1571283207251, sequenceNumber=99889585, masterKeyId=889)
- INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns3, Ident: (token for jt_jtsjzxsjyyc_sc_fwfz: HDFS_DELEGATION_TOKEN owner=jt_jtsjzxsjyyc_sc_fwfz, renewer=yarn, realUser=hive/hiveserver2.bigdata.chinatelecom.cn@HADOOP.CHINATELECOM.CN, issueDate=1570678407264, maxDate=1571283207264, sequenceNumber=100362646, masterKeyId=873)
- INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns, Ident: (token for jt_jtsjzxsjyyc_sc_fwfz: HDFS_DELEGATION_TOKEN owner=jt_jtsjzxsjyyc_sc_fwfz, renewer=yarn, realUser=hive/hiveserver2.bigdata.chinatelecom.cn@HADOOP.CHINATELECOM.CN, issueDate=1570678406757, maxDate=1571283206757, sequenceNumber=381027444, masterKeyId=1165)
- INFO : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 16 6a 74 5f 6a 74 73 6a 7a 78 73 6a 79 79 63 5f 73 63 5f 66 77 66 7a 16 6a 74 5f 6a 74 73 6a 7a 78 73 6a 79 79 63 5f 73 63 5f 66 77 66 7a 3f 68 69 76 65 2f 68 69 76 65 73 65 72 76 65 72 32 2e 62 69 67 64 61 74 61 2e 63 68 69 6e 61 74 65 6c 65 63 6f 6d 2e 63 6e 40 48 41 44 4f 4f 50 2e 43 48 49 4e 41 54 45 4c 45 43 4f 4d 2e 43 4e 8a 01 6d b3 a7 a2 5e 8a 01 6d d7 b4 26 5e 8e 77 aa 8e 19 30
- INFO : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns2, Ident: (token for jt_jtsjzxsjyyc_sc_fwfz: HDFS_DELEGATION_TOKEN owner=jt_jtsjzxsjyyc_sc_fwfz, renewer=yarn, realUser=hive/hiveserver2.bigdata.chinatelecom.cn@HADOOP.CHINATELECOM.CN, issueDate=1570678407250, maxDate=1571283207250, sequenceNumber=110691977, masterKeyId=871)
- INFO : The url to track the job: http://NM-304-RH5885V3-BIGDATA-008:8088/proxy/application_1569295562481_2677748/
- INFO : Starting Job = job_1569295562481_2677748, Tracking URL = http://NM-304-RH5885V3-BIGDATA-008:8088/proxy/application_1569295562481_2677748/
- INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1569295562481_2677748
- INFO : Hadoop job information for Stage-1: number of mappers: 237; number of reducers: 0
- INFO : 2019-10-10 11:35:34,427 Stage-1 map = 0%, reduce = 0%
- INFO : 2019-10-10 11:36:18,032 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 10.65 sec
- INFO : 2019-10-10 11:36:19,080 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 96.18 sec
- INFO : 2019-10-10 11:36:20,131 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 224.8 sec
- INFO : 2019-10-10 11:36:21,181 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 342.52 sec
- INFO : 2019-10-10 11:36:22,243 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 444.37 sec
- INFO : 2019-10-10 11:36:23,304 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 1014.68 sec
- INFO : 2019-10-10 11:36:24,582 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 1638.8 sec
- ================================ (part of the output omitted)
- INFO : 2019-10-10 11:36:37,280 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 2181.11 sec
- INFO : 2019-10-10 11:36:48,674 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 2209.67 sec
- INFO : 2019-10-10 11:37:01,292 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 2299.86 sec
- INFO : 2019-10-10 11:37:07,494 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 2322.2 sec
- INFO : 2019-10-10 11:37:12,849 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 2335.47 sec
- INFO : 2019-10-10 11:37:13,886 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 2363.0 sec
- INFO : 2019-10-10 11:37:14,922 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 2372.74 sec
- INFO : 2019-10-10 11:39:16,852 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 2386.92 sec
- INFO : 2019-10-10 11:46:52,457 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2398.68 sec
- INFO : MapReduce Total cumulative CPU time: 39 minutes 58 seconds 680 msec
- INFO : Ended Job = job_1569295562481_2677748
- INFO : Starting task [Stage-7:CONDITIONAL] in serial mode
- INFO : Stage-4 is selected by condition resolver.
- INFO : Stage-3 is filtered out by condition resolver.
- INFO : Stage-5 is filtered out by condition resolver.
- INFO : Starting task [Stage-4:MOVE] in serial mode
- INFO : Moving data to: viewfs://ctccfs/user/hive_tmp/.hive-staging_hive_2019-10-10_11-32-41_413_3936369249909689263-4162/-ext-10000 from viewfs://ctccfs/user/hive_tmp/.hive-staging_hive_2019-10-10_11-32-41_413_3936369249909689263-4162/-ext-10002
- INFO : Starting task [Stage-0:MOVE] in serial mode
- INFO : Loading data to table jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d partition (data_day=null) from viewfs://ctccfs/user/hive_tmp/.hive-staging_hive_2019-10-10_11-32-41_413_3936369249909689263-4162/-ext-10000
- INFO : Time taken for load dynamic partitions : 37085
- INFO : Loading partition {data_day=20190915}
- INFO : Loading partition {data_day=20190624}
- INFO : Loading partition {data_day=20190906}
- INFO : Loading partition {data_day=20190902}
- INFO : Loading partition {data_day=20190909}
- ================================ (part of the output omitted)
- INFO : Loading partition {data_day=20190901}
- INFO : Loading partition {data_day=20190916}
- INFO : Loading partition {data_day=20190908}
- INFO : Loading partition {data_day=20190723}
- INFO : Time taken for adding to write entity : 12
- INFO : Starting task [Stage-2:STATS] in serial mode
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191001} stats: [numFiles=1, numRows=106817, totalSize=5047510, rawDataSize=4940693]
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191002} stats: [numFiles=1, numRows=142186, totalSize=7349564, rawDataSize=7207378]
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191003} stats: [numFiles=1, numRows=146760, totalSize=7585261, rawDataSize=7438501]
- ================================ (part of the output omitted)
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191004} stats: [numFiles=1, numRows=115010, totalSize=5880787, rawDataSize=5765777]
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191005} stats: [numFiles=1, numRows=128308, totalSize=6711669, rawDataSize=6583361]
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191006} stats: [numFiles=1, numRows=104644, totalSize=5418150, rawDataSize=5313506]
- INFO : Partition jt_jtsjzxsjyyc_sc_fwfz.app_mbl_user_trmnl_trail_info_d{data_day=20191007} stats: [numFiles=1, numRows=89627, totalSize=4577004, rawDataSize=4487377]
- INFO : MapReduce Jobs Launched:
- INFO : Stage-Stage-1: Map: 237 Cumulative CPU: 2398.68 sec HDFS Read: 21622480200 HDFS Write: 459476088 SUCCESS
- INFO : Total MapReduce CPU Time Spent: 39 minutes 58 seconds 680 msec
- INFO : Completed executing command(queryId=hive_20191010113232_c82e18e3-5853-43c5-9856-d1d1c55dde45); Time taken: 1008.946 seconds
- No rows affected (1009.374 seconds)
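
The summary figures in the log can be cross-checked with a little arithmetic. The sketch below only reuses numbers copied from the log above (cumulative CPU, HDFS read, and mapper count); the variable names are chosen for this example.

```python
cpu_seconds = 2398.68           # "Cumulative CPU: 2398.68 sec"
hdfs_read_bytes = 21622480200   # "HDFS Read: 21622480200"
num_mappers = 237               # "number of mappers: 237"

# Convert the CPU time to minutes / seconds / milliseconds using integers,
# which avoids floating-point remainders.
cpu_ms = round(cpu_seconds * 1000)
minutes, rest_ms = divmod(cpu_ms, 60_000)
seconds, msec = divmod(rest_ms, 1000)
print(f"{minutes} minutes {seconds} seconds {msec} msec")

# Average input per map task (one split per mapper) and average CPU per mapper.
mib_per_split = hdfs_read_bytes / num_mappers / 2**20
cpu_per_mapper = cpu_seconds / num_mappers
print(f"{mib_per_split:.1f} MiB read per mapper, {cpu_per_mapper:.1f} CPU sec per mapper")
```

The first line reproduces the "39 minutes 58 seconds 680 msec" reported by Hive, and the second shows that each of the 237 mappers read roughly 87 MiB on average.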
