命令: hbase org.apache.hadoop.hbase.mapreduce.Export [-D <property=value>]* <tablename> <outputdir>
Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<familyName>
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
For performance consider the following properties:
-Dhbase.client.scanner.caching=100
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
For tables with very wide rows consider setting the batch size as below:
-Dhbase.export.scanner.batch=10
示例: hbase org.apache.hadoop.hbase.mapreduce.Export LIVE_USER_NUM hdfs://sm61/hbase/bk/LIVE_USER_NUM
遇到如下问题:
[root@mt1 conf]# hbase org.apache.hadoop.hbase.mapreduce.Export LIVE_USER_NUM hdfs://sm61/hbase/bk/LIVE_USER_NUM
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/software/hbase-1.0.1.1/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/software/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-06-28 16:25:18,846 INFO [main] mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
2017-06-28 16:25:19,348 INFO [main] client.RMProxy: Connecting to ResourceManager at mt1.hbase.starv.com/192.168.68.130:8032
2017-06-28 16:25:20,513 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:21,514 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:22,515 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:23,515 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:24,516 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:25,517 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:26,518 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:27,519 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:28,520 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-28 16:25:29,521 INFO [main] ipc.Client: Retrying connect to server: mt1.hbase.starv.com/192.168.68.130:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
查阅资料 研究了一下:8032端口是Hadoop yearn 的端口,故启动hadoop yearn 。
这里 将hadoop的 端口 整理了一下 :
端口 作用
9000 fs.defaultFS,如:hdfs://172.25.40.171:9000
9001 dfs.namenode.rpc-address,
DataNode会连接这个端口
50070 dfs.namenode.http-address
50470 dfs.namenode.https-address
50100 dfs.namenode.backup.address
50105 dfs.namenode.backup.http-address
50090 dfs.namenode.secondary.http-address,如:172.25.39.166:50090
50091 dfs.namenode.secondary.https-address,如:172.25.39.166:50091
50020 dfs.datanode.ipc.address
50075 dfs.datanode.http.address
50475 dfs.datanode.https.address
50010 dfs.datanode.address,
DataNode的数据传输端口
8480 dfs.journalnode.rpc-address
8481 dfs.journalnode.https-address
8032 yarn.resourcemanager.address
重新执行一下命令,还是报错:
2017-06-28 16:51:53,898 INFO [main] mapreduce.Job: map 0% reduce 0%
2017-06-28 16:51:56,967 INFO [main] mapreduce.Job: Task Id : attempt_1498638964556_0004_m_000000_0, Status : FAILED
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): file /hbase/bk/LIVE_USER_NUM1/_temporary/1/_temporary/attempt_1498638964556_0004_m_000000_0/part-m-00000 on client 192.168.68.131.
Requested replication 2 is less than the required minimum 3
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2373)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1725)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1072)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:271)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:528)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:644)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
因为sm61是其他集群的hdfs,换成hbase的hdfs,是可以成功导出的
hbase org.apache.hadoop.hbase.mapreduce.Export LIVE_USER_NUM hdfs://mt1.hbase.starv.com:9000/hbase/bk/LIVE_USER_NUM
注意这里的地址一定是和Hadoop core_site.xml里面配置的fs.defaultFS 一致,否则不能成功导出。
那么是否可以知道导出到另外一台hdfs呢,接下来再研究:
。。。。。。。。。。。。。