
[Distributed Databases] Migrating HBase Data Across Clusters Using the Underlying HDFS Files


Thanks for the likes and follows. A little progress every day, keep it up!

Contents

1. Overview

2. Environment

3. Downloading and building HBCK2

4. Procedure

4.1 Data synchronization

4.2 Adding the metadata

4.3 Reassigning the regions


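1. Overview


The customer is relocating its data center, so the cluster we deployed there has to move its HBase tables to a new cluster. We do this by migrating the underlying Hadoop (HDFS) files that back the HBase tables.

There are two ways to migrate the underlying Hadoop files:

  • distcp
  • get the files from the old cluster to a local disk, then put them onto the new cluster.

In our case cluster A runs with Kerberos and cluster B runs without it, and we use the DistCp approach. Because the goal is only to move the underlying data, either approach works.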
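For comparison, the get/put alternative could be sketched roughly as below. This is only an illustration, not the procedure used in this article: the helper names `run` and `copy_via_local`, the `DRY_RUN` switch, and the staging directory are all assumptions introduced here, and the local disk must be large enough to hold the table.

```shell
#!/usr/bin/env bash
# Sketch of the get/put alternative. All URIs and paths are supplied by the
# caller; nothing here is prescribed by the article.

# run: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_via_local SRC_URI LOCAL_DIR DST_PARENT_URI
# Downloads SRC_URI into LOCAL_DIR, then uploads that copy under DST_PARENT_URI.
copy_via_local() {
  local src="$1" staging="$2" dst_parent="$3"
  local name
  name="$(basename "$src")"
  run mkdir -p "$staging"
  run hdfs dfs -get "$src" "$staging/"
  run hdfs dfs -put "$staging/$name" "$dst_parent/"
}
```

Calling `copy_via_local <source table URI> /tmp/staging <target namespace URI>` with `DRY_RUN=1` only prints the three commands, which makes it easy to review them before running with `DRY_RUN=0`.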


2. Environment


HBase 2.0.2


3. Downloading and building HBCK2


Download the source from GitHub:

GitHub - apache/hbase-operator-tools: Apache HBase Operator Tools

Build it:

mvn clean install -DskipTests

A successful build produces the hbck2 jar (hbase-hbck2-1.3.0-SNAPSHOT.jar in my case).
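As a rough sketch of the whole download-and-build step, the commands below clone the repository and then locate the resulting jar. The helper name `find_hbck2_jar` is made up for this example, and it assumes the jar is written to the `hbase-hbck2/target` directory of the checkout.

```shell
#!/usr/bin/env bash
# Build steps (commented out because they need network access and Maven):
#   git clone https://github.com/apache/hbase-operator-tools.git
#   cd hbase-operator-tools && mvn clean install -DskipTests

# find_hbck2_jar REPO_DIR
# Prints the path of the hbck2 jar under REPO_DIR/hbase-hbck2/target,
# skipping the -sources, -tests and -javadoc jars.
find_hbck2_jar() {
  local repo="$1"
  find "$repo/hbase-hbck2/target" -maxdepth 1 -type f -name 'hbase-hbck2-*.jar' \
    ! -name '*-sources.jar' ! -name '*-tests.jar' ! -name '*-javadoc.jar' \
    | sort | head -n 1
}
```

The printed path is what gets passed to `hbck -j` in the commands that follow.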

Run it from the command line without a command to see how each option is used:

  sudo -u hbase hbase --config /etc/hbase/ hbck -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar
  usage: HBCK2 [OPTIONS] COMMAND <ARGS>
  Options:
   -d,--debug                                       run with debug output
   -h,--help                                        output this help message
   -p,--hbase.zookeeper.property.clientPort <arg>   port of hbase ensemble
   -q,--hbase.zookeeper.quorum <arg>                hbase ensemble
   -s,--skip                                        skip hbase version check (PleaseHoldException)
   -v,--version                                     this hbck2 version
   -z,--zookeeper.znode.parent <arg>                parent znode of hbase ensemble
  Command:
   addFsRegionsMissingInMeta [<NAMESPACE|NAMESPACE:TABLENAME>...|-i <INPUTFILES>...]
     Options:
      -i,--inputFiles  take one or more files of namespace or table names
     To be used when regions missing from hbase:meta but directories
     are present still in HDFS. Can happen if user has run _hbck1_

4. Procedure


There are two steps:
(1) Copy the table's underlying HDFS data.
(2) Use the hbck2 tool to restore the table data.

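4.1 Data synchronization

Copy the table directory to the new cluster with DistCp. The ipc.client.fallback-to-simple-auth-allowed=true setting lets a Kerberos-enabled client fall back to simple authentication when it talks to the cluster that has no Kerberos.

  hadoop distcp -Dipc.client.fallback-to-simple-auth-allowed=true -Dmapreduce.map.memory.mb=1024 -D mapred.map.max.attempts=3 -m 3 -numListstatusThreads 3 \
    hdfs://10.82.28.171:8020/apps/hbase/data/data/default/trafficLhbDevInOutData_2020 \
    hdfs://10.82.50.191:8020/apps/hbase/data/data/default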

Change the ownership of the table directory on the new cluster:

sudo -u hdfs hdfs dfs -chown -R hbase:hdfs   /apps/hbase/data/data/default/trafficLhbDevInOutData_2020
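Before moving on, it can be worth checking that the copy is complete. One possible check, sketched below, compares the directory count, file count and total size reported by `hdfs dfs -count` on both clusters. The helper name `counts_match` is hypothetical, and this check is an addition for illustration, not something the original procedure ran.

```shell
#!/usr/bin/env bash
# counts_match SRC_COUNT_LINE DST_COUNT_LINE
# Each argument is one output line of `hdfs dfs -count <path>`, whose columns
# are DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME. Succeeds when the first
# three columns agree; the path column is ignored.
counts_match() {
  local src dst
  src="$(echo "$1" | awk 'NF >= 3 { print $1, $2, $3 }')"
  dst="$(echo "$2" | awk 'NF >= 3 { print $1, $2, $3 }')"
  [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Example against live clusters (commented out; needs HDFS access):
#   table=/apps/hbase/data/data/default/trafficLhbDevInOutData_2020
#   src_line="$(hdfs dfs -count "hdfs://10.82.28.171:8020$table")"
#   dst_line="$(hdfs dfs -count "hdfs://10.82.50.191:8020$table")"
#   counts_match "$src_line" "$dst_line" && echo "copy looks complete"
```

Matching counts do not prove the files are identical, but a mismatch is a quick signal that the DistCp job should be rerun.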

4.2 Adding the metadata

  hbase --config /usr/hdp/3.1.4.0-315/hbase/conf/ hbck -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar -z /hbase-unsecure addFsRegionsMissingInMeta default:trafficLhbDevInOutData_2020
  ## Run it
  [xxxprd@qcs-client ~]$ hbase --config /usr/hdp/3.1.4.0-315/hbase/conf/ hbck -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar -z /hbase-unsecure addFsRegionsMissingInMeta default:trafficLhbDevInOutData_2020
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/jzhprd/hbase-hbck2-1.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/phoenix/phoenix-5.0.0.3.1.4.0-315-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  18:16:30.311 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x74f7d1d2 to qcs-namenode:2181,qcs-client:2181,qcs-snamenode:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
  18:16:31.284 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x74f7d1d2 to qcs-namenode:2181,qcs-client:2181,qcs-snamenode:2181
  Regions re-added into Meta: 6
  WARNING:
  6 regions were added to META, but these are not yet on Masters cache.
  You need to restart Masters, then run hbck2 'assigns' command below:
  assigns 034307e7c4ca3f0f65c806077d4bb123 3302faa35da3877220c1b49589208b89 3fcd8d4755d9ddd2d1a7af7e309dd049 4c306f12b329a78928e1439b48e257f7 56213be6765d63585ed4f3c1c9d48f9f 887bbf63e7f905bc84ba1bcd4fd8681c
  [xxxprd@qcs-client ~]$

As the warning says, the regions are in hbase:meta but not yet in the Masters' cache, so restart the Masters before running the hbck2 assigns command.
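When a table has many regions, copying the encoded region names by hand is error-prone. A small sketch like the following can pull them out of saved addFsRegionsMissingInMeta output. The function name `extract_assign_regions` and the log file name in the usage comment are assumptions for illustration, and the parsing relies on the suggested command appearing on a line that starts with `assigns`, as it does in the output shown in this article.

```shell
#!/usr/bin/env bash
# extract_assign_regions: reads hbck2 addFsRegionsMissingInMeta output on
# stdin and prints the encoded region names from the suggested "assigns"
# line, one per line.
extract_assign_regions() {
  awk '$1 == "assigns" { for (i = 2; i <= NF; i++) print $i }'
}

# Possible usage (commented out; needs a running cluster and restarted Masters):
#   extract_assign_regions < add_meta.log | xargs \
#     hbase --config /usr/hdp/3.1.4.0-315/hbase/conf/ hbck \
#       -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar -z /hbase-unsecure assigns
```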


In the HMaster web UI, the table's 2 regions are listed under Other Regions.

4.3 Reassigning the regions

hbase --config /usr/hdp/3.1.4.0-315/hbase/ hbck -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar -z /hbase-unsecure assigns 063844973716717e1ddf6a705dd02907 afad0ca9c4587a3a26a778f2b79929d4

The command returns a procedure ID for each region; a value of -1 means the assignment did not succeed.

  [jz@qcs-client ~]$ hbase --config /usr/hdp/3.1.4.0-315/hbase/ hbck -j /home/jz/hbase-hbck2-1.3.0-SNAPSHOT.jar -z /hbase-unsecure assigns 063844973716717e1ddf6a705dd02907 afad0ca9c4587a3a26a778f2b79929d4
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/home/jzhprd/hbase-hbck2-1.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/phoenix/phoenix-5.0.0.3.1.4.0-315-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  17:58:15.614 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x3fc79729 to localhost:2181 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
  [114, 115]
  17:58:16.450 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x3fc79729 to localhost:2181
  [jz@qcs-client ~]$
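In a script, the -1 check can be automated. The sketch below scans saved assigns output for the bracketed list of procedure IDs and fails if the list is missing or contains -1. The function name `assigns_ok` is hypothetical, and the parsing assumes the list is printed on a line of its own, as in the output shown in this article.

```shell
#!/usr/bin/env bash
# assigns_ok: reads hbck2 assigns output on stdin, finds the bracketed list
# of procedure IDs (for example "[114, 115]") and succeeds only if a list is
# present and none of its entries is -1.
assigns_ok() {
  local pids
  # Keep the last line that consists solely of a bracketed number list,
  # then strip everything except digits, commas and minus signs.
  pids="$(grep -E '^[[:space:]]*\[[-0-9, ]+\][[:space:]]*$' | tail -n 1 | sed -e 's/[^-0-9,]//g')"
  [ -n "$pids" ] || return 1
  case ",$pids," in
    *,-1,*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `assigns_ok < assigns.log || echo "some regions were not assigned"` gives a simple pass/fail signal after the command has run.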

After running the command above, the 2 regions move from Other Regions to Online Regions.

The trafficLhbDevInOutData_2020 table has the same number of regions before and after the migration.
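The region count can also be cross-checked from the file system on each cluster. The sketch below counts region directories in a listing of the table directory, on the assumption that each region lives in a subdirectory named by its 32-character encoded region name. The function name `count_region_dirs` is made up for this example.

```shell
#!/usr/bin/env bash
# count_region_dirs: reads `hdfs dfs -ls <table dir>` output on stdin and
# prints how many entries are directories whose name is a 32-character
# lowercase hex string (the encoded region name). Entries such as .tabledesc
# and .tmp are therefore not counted.
count_region_dirs() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -cE '^d.*/[0-9a-f]{32}$' || true
}

# Possible usage on either cluster (commented out; needs HDFS access):
#   hdfs dfs -ls /apps/hbase/data/data/default/trafficLhbDevInOutData_2020 \
#     | count_region_dirs
```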

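Finally, query the table to confirm that the data is readable.

If the cluster uses Kerberos authentication, it is enough to add the cluster's XML configuration files to the jar.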

Reference:

Hbase跨集群迁移 (HBase cross-cluster migration), blog of 喧嚣已默,往事非昨, CSDN


