Installing Pig and Basic Usage

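Installing Pig

Download and extract the package

  Download the latest Pig release from the Apache Pig download page; clicking the download link suggests the fastest mirror site.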

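Configure the environment

  Extract the archive to your installation path, then edit /etc/profile. Which variables to set depends on the mode Pig runs in:
Pig execution modes

Local mode: only ${PIG_HOME}/bin needs to be added to the PATH environment variable. This mode is suitable for testing.

MapReduce mode: additionally set the environment variable PIG_CLASSPATH so that it points to Hadoop's configuration directory. On older Hadoop releases this is ${HADOOP_HOME}/conf/; I run Hadoop 2.6, where the configuration directory is ${HADOOP_HOME}/etc/hadoop, i.e. /usr/local/hadoop/etc/hadoop on my machine.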

 sudo vi /etc/profile
 Add the following lines:
    export PIG_HOME=/app/pig-0.13.0
    export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
    export PATH=$PATH:$PIG_HOME/bin
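
After saving /etc/profile, the new variables only take effect in a shell that has re-read the file (for example via `source /etc/profile` or a fresh login). The following is a minimal sketch of how one might check the result. It re-exports the same values so it can run on its own; the paths are this article's install locations and are assumptions for any other machine, and `pig_status` is just an illustrative variable name.

```shell
# Values copied from the profile entries above; they are this article's
# install locations and will differ on other machines.
export PIG_HOME=/app/pig-0.13.0
export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
export PATH=$PATH:$PIG_HOME/bin

# Check that the pig launcher is reachable through PATH before starting Grunt.
if command -v pig >/dev/null 2>&1; then
    pig -version          # prints the installed Pig version
    pig_status=found
else
    pig_status=missing    # PATH entry exists, but no launcher at that location
fi
echo "pig launcher: $pig_status"
```

Once the launcher is found, `pig -x local` starts the Grunt shell in local mode, while `pig -x mapreduce` (or plain `pig`) starts it in MapReduce mode.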

Basic usage

Copy the test data to HDFS:

hadoop fs -put ncdc_data.txt  /input/in1/
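
Since a mistyped path is easy to miss at this stage, it may help to create the target directory explicitly and list it after the upload. This is only a sketch under assumptions: it uses the same file name and HDFS directory as the command above, and the guard (with the illustrative variable `upload_status`) simply skips the Hadoop calls when the `hadoop` client or the local file is not available.

```shell
# Paths taken from the command above; adjust them to your own layout.
hdfs_dir=/input/in1
local_file=ncdc_data.txt

# Only talk to HDFS when the hadoop client and the local file are both present.
if command -v hadoop >/dev/null 2>&1 && [ -f "$local_file" ]; then
    hadoop fs -mkdir -p "$hdfs_dir"            # create the target directory if needed
    hadoop fs -put "$local_file" "$hdfs_dir/"  # upload the test data
    hadoop fs -ls "$hdfs_dir"                  # confirm the file landed where expected
    upload_status=attempted
else
    upload_status=skipped
fi
echo "upload: $upload_status"
```

The path printed by `hadoop fs -ls` is the one to use in the LOAD statement later on.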

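 Finding the maximum temperature per year with Pig Latin

  1. Load the weather data

 The first time I typed the path incorrectly, so Pig kept failing to find the file. Make sure the path matches the location the data was uploaded to.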

grunt> A = LOAD '/input/in1/ncdc_data.txt' USING PigStorage(':') AS (year:int, temp:int, quality:int);

  2. Filter the data: keep records whose temperature is not 9999 and whose quality code is 0, 1, 4, 5, or 9

grunt> B = FILTER A BY temp != 9999 AND ((chararray)quality matches '[01459]');
Or, equivalently:
grunt> B = FILTER A BY temp != 9999 AND (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
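
The two forms of the filter select the same records because Pig's `matches` operator has to match the whole string, so `'[01459]'` accepts exactly the one-character codes 0, 1, 4, 5 and 9. As a rough illustration outside Pig (a shell sketch, not Pig's own evaluation), the loop below applies an anchored regex and the explicit comparisons to every single-digit quality code and counts disagreements; the variable names are illustrative.

```shell
# Compare the two predicates over every single-digit quality code.
mismatches=0
for q in 0 1 2 3 4 5 6 7 8 9; do
    # Regex form, mirroring: (chararray)quality matches '[01459]'
    if printf '%s' "$q" | grep -Eq '^[01459]$'; then by_regex=1; else by_regex=0; fi

    # Explicit form, mirroring the chain of == comparisons joined by OR
    if [ "$q" -eq 0 ] || [ "$q" -eq 1 ] || [ "$q" -eq 4 ] || [ "$q" -eq 5 ] || [ "$q" -eq 9 ]; then
        by_or=1
    else
        by_or=0
    fi

    # Count any code on which the two forms disagree.
    [ "$by_regex" -eq "$by_or" ] || mismatches=$((mismatches + 1))
done
echo "mismatches: $mismatches"
```

The regex form is shorter, while the explicit form avoids the cast to chararray; which one reads better is a matter of taste.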

  Group the weather data by year

grunt> C = GROUP B BY year; 

  For each group, emit the year (the group key) and the maximum temperature in that group

grunt> D = FOREACH C GENERATE group, MAX(B.temp) AS max_temp;
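
To sanity-check what the LOAD, FILTER, GROUP and FOREACH steps compute, the same logic can be approximated locally on a handful of rows. The sketch below uses awk on a few invented sample lines in the same year:temp:quality layout; the rows are hypothetical and are not taken from ncdc_data.txt, and this is a cross-check of the logic, not how Pig executes the job.

```shell
# Hypothetical sample rows in year:temp:quality format (not real NCDC data).
sample='1901:317:1
1901:9999:1
1901:250:2
1902:261:0
1902:244:5'

# Filter (temp != 9999, quality in 0/1/4/5/9), group by year, keep the maximum.
result=$(printf '%s\n' "$sample" | awk -F: '
    $2 != 9999 && $3 ~ /^[01459]$/ {
        if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
    }
    END { for (y in max) print y ":" max[y] }
' | sort)

echo "$result"
```

For these rows the second line is dropped because of the 9999 reading and the third because of its quality code, leaving 1901:317 and 1902:261.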

 Output the result

grunt> DUMP D;
2016-11-20 06:02:41,902 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-11-20 06:02:42,053 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:02:42,054 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:02:42,067 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:02:42,069 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-20 06:02:42,107 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-11-20 06:02:42,114 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-20 06:02:42,140 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-11-20 06:02:42,140 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-11-20 06:02:42,241 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:02:42,250 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:02:42,263 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-11-20 06:02:42,278 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-11-20 06:02:42,280 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-11-20 06:02:42,280 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-11-20 06:02:42,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3673672
2016-11-20 06:02:42,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-11-20 06:02:42,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-11-20 06:02:43,095 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp-60624248/tmp72750994/pig-0.16.0-core-h2.jar
2016-11-20 06:02:43,367 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-60624248/tmp-2105835473/automaton-1.11-8.jar
2016-11-20 06:02:43,518 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-60624248/tmp1218719075/antlr-runtime-3.4.jar
2016-11-20 06:02:43,701 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-60624248/tmp-2048402576/joda-time-2.9.3.jar
2016-11-20 06:02:43,707 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-11-20 06:02:43,710 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-11-20 06:02:43,710 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-11-20 06:02:43,710 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-11-20 06:02:43,840 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-11-20 06:02:43,847 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:02:44,029 [JobControl] WARN  org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-11-20 06:02:44,159 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2016-11-20 06:02:44,172 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-11-20 06:02:44,172 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-11-20 06:02:44,350 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-20 06:02:44,709 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-11-20 06:02:47,105 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1479576092520_0006
2016-11-20 06:02:47,816 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2016-11-20 06:02:53,694 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1479576092520_0006
2016-11-20 06:02:54,016 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://TEST:8088/proxy/application_1479576092520_0006/
2016-11-20 06:02:54,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1479576092520_0006
2016-11-20 06:02:54,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2016-11-20 06:02:54,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[5,4],A[-1,-1],B[6,4],D[8,4],C[7,4] C: D[8,4],C[7,4] R: D[8,4]
2016-11-20 06:02:54,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-11-20 06:02:54,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:04:53,944 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 5% complete
2016-11-20 06:04:53,945 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:04:56,974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 21% complete
2016-11-20 06:04:56,978 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:05:04,031 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-11-20 06:05:04,031 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:05:24,319 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-11-20 06:05:24,320 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:06,870 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2016-11-20 06:10:06,870 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:14,258 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2016-11-20 06:10:14,258 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:10:22,325 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0006]
2016-11-20 06:11:03,514 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:03,646 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:49,363 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:49,434 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:49,883 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:49,910 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:50,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-11-20 06:11:50,367 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.6.0   0.16.0  chb     2016-11-20 06:02:42     2016-11-20 06:11:50     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTime      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime       Alias    Feature Outputs
job_1479576092520_0006  1       1       88      88      88      88      310     310     310     310     A,B,C,D GROUP_BY,COMBINER       hdfs://192.168.1.124:9000/tmp/temp-60624248/tmp-1087782019,

Input(s):
Successfully read 321146 records (3674048 bytes) from: "/input/in1/ncdc_data.txt"

Output(s):
Successfully stored 43 records (430 bytes) in: "hdfs://192.168.1.124:9000/tmp/temp-60624248/tmp-1087782019"

Counters:
Total records written : 43
Total bytes written : 430
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1479576092520_0006


2016-11-20 06:11:50,377 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:50,397 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:50,554 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:50,573 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:51,275 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:11:51,349 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:11:52,066 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-11-20 06:11:52,068 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:11:52,069 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:11:52,070 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:11:52,528 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-11-20 06:11:52,528 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1901,317)
(1902,261)
(1903,278)
(1904,194)
(1905,278)
(1906,283)
(1907,300)
(1908,322)
(1909,350)
(1910,322)
(1911,322)
(1912,411)
(1913,361)
(1914,378)
(1915,411)
(1916,289)
(1917,478)
(1918,450)
(1919,428)
(1920,344)
(1921,417)
(1922,400)
(1923,394)
(1924,456)
(1925,322)
(1926,411)
(1928,161)
(1929,178)
(1930,311)
(1931,450)
(1932,322)
(1933,411)
(1934,300)
(1935,311)
(1936,389)
(1937,339)
(1938,411)
(1939,433)
(1940,433)
(1941,462)
(1942,278)
(1949,367)
(1953,400)
grunt> 

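 Store the result to a file. The relative path max_temp resolves under the user's HDFS home directory, which is /user/chb/max_temp in the log below.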

grunt>  STORE D INTO 'max_temp' USING PigStorage(':');
2016-11-20 06:28:32,644 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:32,645 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:28:32,925 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2016-11-20 06:28:33,159 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-11-20 06:28:33,444 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:33,444 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-11-20 06:28:33,447 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-20 06:28:33,448 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-20 06:28:33,496 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-11-20 06:28:33,520 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-20 06:28:33,546 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-11-20 06:28:33,546 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-11-20 06:28:33,751 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-11-20 06:28:33,773 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:28:33,781 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-11-20 06:28:33,804 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-11-20 06:28:33,806 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-11-20 06:28:33,806 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-11-20 06:28:33,826 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=3673672
2016-11-20 06:28:33,826 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-11-20 06:28:33,826 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-11-20 06:28:36,502 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp-60624248/tmp-1199985731/pig-0.16.0-core-h2.jar
2016-11-20 06:28:36,765 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-60624248/tmp721246289/automaton-1.11-8.jar
2016-11-20 06:28:37,076 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-60624248/tmp341502194/antlr-runtime-3.4.jar
2016-11-20 06:28:37,560 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-60624248/tmp-587981636/joda-time-2.9.3.jar
2016-11-20 06:28:37,567 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-11-20 06:28:37,574 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-11-20 06:28:37,574 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2016-11-20 06:28:37,574 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-11-20 06:28:37,907 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-11-20 06:28:37,943 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:28:38,104 [JobControl] WARN  org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-11-20 06:28:38,208 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2016-11-20 06:28:38,233 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-11-20 06:28:38,234 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-11-20 06:28:38,249 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-20 06:28:38,887 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-11-20 06:28:39,586 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1479576092520_0007
2016-11-20 06:28:39,610 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2016-11-20 06:28:39,843 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1479576092520_0007
2016-11-20 06:28:39,945 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://TEST:8088/proxy/application_1479576092520_0007/
2016-11-20 06:28:39,945 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1479576092520_0007
2016-11-20 06:28:39,947 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2016-11-20 06:28:39,947 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[5,4],A[-1,-1],B[6,4],D[8,4],C[7,4] C: D[8,4],C[7,4] R: D[8,4]
2016-11-20 06:28:40,011 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-11-20 06:28:40,011 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:39,691 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 9% complete
2016-11-20 06:30:39,704 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:44,340 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2016-11-20 06:30:44,340 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:30:54,464 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-11-20 06:30:54,465 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:23,937 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2016-11-20 06:32:23,937 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:29,164 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1479576092520_0007]
2016-11-20 06:32:51,921 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:32:52,670 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,007 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,123 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,537 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,561 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:02,822 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-11-20 06:33:02,824 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.6.0   0.16.0  chb     2016-11-20 06:28:33     2016-11-20 06:33:02     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTime      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime       Alias    Feature Outputs
job_1479576092520_0007  1       1       35      35      35      35      97      97      97      97      A,B,C,D GROUP_BY,COMBINER       hdfs://192.168.1.124:9000/user/chb/max_temp,

Input(s):
Successfully read 321146 records (3674048 bytes) from: "/input/in1/ncdc_data.txt"

Output(s):
Successfully stored 43 records (387 bytes) in: "hdfs://192.168.1.124:9000/user/chb/max_temp"

Counters:
Total records written : 43
Total bytes written : 387
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1479576092520_0007


2016-11-20 06:33:02,847 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:02,884 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,175 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:03,209 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,469 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at TEST/192.168.1.124:8032
2016-11-20 06:33:03,491 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-11-20 06:33:03,725 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> 

 View the result:

grunt> cat max_temp
1901:317
1902:261
1903:278
1904:194
1905:278
1906:283
1907:300
1908:322
1909:350
1910:322
1911:322
1912:411
1913:361
1914:378
1915:411
1916:289
1917:478
1918:450
1919:428
1920:344
1921:417
1922:400
1923:394
1924:456
1925:322
1926:411
1928:161
1929:178
1930:311
1931:450
1932:322
1933:411
1934:300
1935:311
1936:389
1937:339
1938:411
1939:433
1940:433
1941:462
1942:278
1949:367
1953:400
grunt> 
  • 44