
DataX: Syncing Data Between HBase and MySQL in Both Directions

Reference article: "DataX: mutual import between MySQL and HBase"

Contents

0. Software Versions
1. Syncing HBase data to MySQL
  1.1 HBase data
  1.2 MySQL data
  1.3 JSON job script (hbase2mysql.json)
  1.4 Sync success log
2. Syncing MySQL data to HBase
  2.1 HBase data
  2.2 MySQL data
  2.3 JSON job script (mysql2hbase.json)
  2.4 Sync success log
3. Summary


0. Software Versions

  1. HBase version: 2.0.5
  2. MySQL version: 8.0.34
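
To double-check the versions on your own machines before running the jobs, commands like the following work on a standard install (the host, port, and paths are assumptions; adjust to your layout):

  # Print the HBase version (run from the HBase installation directory)
  bin/hbase version

  # Ask the MySQL server for its version (prompts for the root password)
  mysql -h node001 -P 3306 -u root -p -e "SELECT VERSION();"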

1. Syncing HBase data to MySQL

1.1 HBase data

  hbase(main):018:0> scan "bigdata:student"
  ROW COLUMN+CELL
  1001 column=info:age, timestamp=1712562704820, value=18
  1001 column=info:name, timestamp=1712562696088, value=lisi
  1002 column=info:age, timestamp=1712566667737, value=222
  1002 column=info:name, timestamp=1712566689576, value=\xE5\xAE\x8B\xE5\xA3\xB9
  2 row(s)
  Took 0.0805 seconds
  hbase(main):019:0>

For reference, here is the full shell session that created the namespaces and tables and inserted the test rows (the two failed scan attempts are explained after the session):

  [atguigu@node001 hbase]$ cd hbase-2.0.5/
  [atguigu@node001 hbase-2.0.5]$ bin/hbase shell
  For more on the HBase Shell, see http://hbase.apache.org/book.html
  hbase(main):002:0> create_namespace 'bigdata'
  Took 3.4979 seconds
  hbase(main):003:0> list_namespace
  NAMESPACE
  EDU_REALTIME
  SYSTEM
  bigdata
  default
  hbase
  5 row(s)
  Took 0.1244 seconds
  hbase(main):004:0> create_namespace 'bigdata2'
  Took 0.5109 seconds
  hbase(main):005:0> list_namespace
  NAMESPACE
  EDU_REALTIME
  SYSTEM
  bigdata
  bigdata2
  default
  hbase
  6 row(s)
  Took 0.0450 seconds
  hbase(main):006:0> create 'bigdata:student', {NAME => 'info', VERSIONS => 5}, {NAME => 'msg'}
  Created table bigdata:student
  Took 4.7854 seconds
  => Hbase::Table - bigdata:student
  hbase(main):007:0> create 'bigdata2:student', {NAME => 'info', VERSIONS => 5}, {NAME => 'msg'}
  Created table bigdata2:student
  Took 2.4732 seconds
  => Hbase::Table - bigdata2:student
  hbase(main):008:0> list
  TABLE
  EDU_REALTIME:DIM_BASE_CATEGORY_INFO
  EDU_REALTIME:DIM_BASE_PROVINCE
  EDU_REALTIME:DIM_BASE_SOURCE
  EDU_REALTIME:DIM_BASE_SUBJECT_INFO
  EDU_REALTIME:DIM_CHAPTER_INFO
  EDU_REALTIME:DIM_COURSE_INFO
  EDU_REALTIME:DIM_KNOWLEDGE_POINT
  EDU_REALTIME:DIM_TEST_PAPER
  EDU_REALTIME:DIM_TEST_PAPER_QUESTION
  EDU_REALTIME:DIM_TEST_POINT_QUESTION
  EDU_REALTIME:DIM_TEST_QUESTION_INFO
  EDU_REALTIME:DIM_TEST_QUESTION_OPTION
  EDU_REALTIME:DIM_USER_INFO
  EDU_REALTIME:DIM_VIDEO_INFO
  SYSTEM:CATALOG
  SYSTEM:FUNCTION
  SYSTEM:LOG
  SYSTEM:MUTEX
  SYSTEM:SEQUENCE
  SYSTEM:STATS
  bigdata2:student
  bigdata:student
  22 row(s)
  Took 0.0711 seconds
  => ["EDU_REALTIME:DIM_BASE_CATEGORY_INFO", "EDU_REALTIME:DIM_BASE_PROVINCE", "EDU_REALTIME:DIM_BASE_SOURCE", "EDU_REALTIME:DIM_BASE_SUBJECT_INFO", "EDU_REALTIME:DIM_CHAPTER_INFO", "EDU_REALTIME:DIM_COURSE_INFO", "EDU_REALTIME:DIM_KNOWLEDGE_POINT", "EDU_REALTIME:DIM_TEST_PAPER", "EDU_REALTIME:DIM_TEST_PAPER_QUESTION", "EDU_REALTIME:DIM_TEST_POINT_QUESTION", "EDU_REALTIME:DIM_TEST_QUESTION_INFO", "EDU_REALTIME:DIM_TEST_QUESTION_OPTION", "EDU_REALTIME:DIM_USER_INFO", "EDU_REALTIME:DIM_VIDEO_INFO", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:LOG", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "bigdata2:student", "bigdata:student"]
  hbase(main):009:0> put 'bigdata:student','1001','info:name','zhangsan'
  Took 0.8415 seconds
  hbase(main):010:0> put 'bigdata:student','1001','info:name','lisi'
  Took 0.0330 seconds
  hbase(main):011:0> put 'bigdata:student','1001','info:age','18'
  Took 0.0201 seconds
  hbase(main):012:0> get 'bigdata:student','1001'
  COLUMN CELL
  info:age timestamp=1712562704820, value=18
  info:name timestamp=1712562696088, value=lisi
  1 row(s)
  Took 0.4235 seconds
  hbase(main):013:0> scan student
  NameError: undefined local variable or method `student' for main:Object
  hbase(main):014:0> scan bigdata:student
  NameError: undefined local variable or method `student' for main:Object
  hbase(main):015:0> scan "bigdata:student"
  ROW COLUMN+CELL
  1001 column=info:age, timestamp=1712562704820, value=18
  1001 column=info:name, timestamp=1712562696088, value=lisi
  1 row(s)
  Took 0.9035 seconds
  hbase(main):016:0> put 'bigdata:student','1002','info:age','222'
  Took 0.0816 seconds
  hbase(main):017:0> put 'bigdata:student','1002','info:name','宋壹'
  Took 0.0462 seconds
  hbase(main):018:0> scan "bigdata:student"
  ROW COLUMN+CELL
  1001 column=info:age, timestamp=1712562704820, value=18
  1001 column=info:name, timestamp=1712562696088, value=lisi
  1002 column=info:age, timestamp=1712566667737, value=222
  1002 column=info:name, timestamp=1712566689576, value=\xE5\xAE\x8B\xE5\xA3\xB9
  2 row(s)
  Took 0.0805 seconds
  hbase(main):019:0>
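
One detail worth calling out from the session above: the two `NameError` responses. The HBase shell is a JRuby REPL, so a bare `scan student` or `scan bigdata:student` is parsed as Ruby identifiers rather than a table name. The table name must be passed as a quoted string:

  scan 'bigdata:student'   # single or double quotes both work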

1.2 MySQL data

Note that the dump below was exported at 17:11, after the 17:02 sync run, so the last two rows, ('18', 'lisi') and ('222', '宋壹'), are the ones DataX copied over from HBase.

  SELECT VERSION(); -- check the MySQL version
  /*
  Navicat Premium Data Transfer
  Source Server : 大数据-node001
  Source Server Type : MySQL
  Source Server Version : 80034 (8.0.34)
  Source Host : node001:3306
  Source Schema : test
  Target Server Type : MySQL
  Target Server Version : 80034 (8.0.34)
  File Encoding : 65001
  Date: 08/04/2024 17:11:56
  */
  SET NAMES utf8mb4;
  SET FOREIGN_KEY_CHECKS = 0;

  -- ----------------------------
  -- Table structure for student
  -- ----------------------------
  DROP TABLE IF EXISTS `student`;
  CREATE TABLE `student` (
    `info` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
    `msg` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL
  ) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

  -- ----------------------------
  -- Records of student
  -- ----------------------------
  INSERT INTO `student` VALUES ('111', '111111');
  INSERT INTO `student` VALUES ('222', '222222');
  INSERT INTO `student` VALUES ('18', 'lisi');
  INSERT INTO `student` VALUES ('222', '宋壹');

  SET FOREIGN_KEY_CHECKS = 1;

1.3 JSON job script (hbase2mysql.json)

  {
    "job": {
      "content": [
        {
          "reader": {
            "name": "hbase11xreader",
            "parameter": {
              "hbaseConfig": {
                "hbase.zookeeper.quorum": "node001:2181"
              },
              "table": "bigdata:student",
              "encoding": "utf-8",
              "mode": "normal",
              "column": [
                {
                  "name": "info:age",
                  "type": "string"
                },
                {
                  "name": "info:name",
                  "type": "string"
                }
              ],
              "range": {
                "startRowkey": "",
                "endRowkey": "",
                "isBinaryRowkey": true
              }
            }
          },
          "writer": {
            "name": "mysqlwriter",
            "parameter": {
              "column": [
                "info",
                "msg"
              ],
              "connection": [
                {
                  "jdbcUrl": "jdbc:mysql://node001:3306/test",
                  "table": [
                    "student"
                  ]
                }
              ],
              "username": "root",
              "password": "123456",
              "preSql": [],
              "session": [],
              "writeMode": "insert"
            }
          }
        }
      ],
      "setting": {
        "speed": {
          "channel": "1"
        }
      }
    }
  }
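
Two things to note about this job. First, the mapping is positional: the reader's first column (`info:age`) lands in the writer's first column (`info`), and `info:name` lands in `msg`. Second, `"writeMode": "insert"` appends rows, so re-running the job duplicates data in `student`. If clearing the target table on each run is acceptable for your use case, one option (a sketch; it wipes the table every time) is to replace the empty `preSql` array in the mysqlwriter parameters with:

  "preSql": ["TRUNCATE TABLE student"],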

1.4 Sync success log

  [atguigu@node001 datax]$ python bin/datax.py job/hbase/hbase2mysql.json
  DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
  Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
  2024-04-08 17:02:00.785 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
  2024-04-08 17:02:00.804 [main] INFO Engine - the machine info =>
  osInfo: Red Hat, Inc. 1.8 25.372-b07
  jvmInfo: Linux amd64 3.10.0-862.el7.x86_64
  cpu num: 4
  totalPhysicalMemory: -0.00G
  freePhysicalMemory: -0.00G
  maxFileDescriptorCount: -1
  currentOpenFileDescriptorCount: -1
  GC Names [PS MarkSweep, PS Scavenge]
  MEMORY_NAME | allocation_size | init_size
  PS Eden Space | 256.00MB | 256.00MB
  Code Cache | 240.00MB | 2.44MB
  Compressed Class Space | 1,024.00MB | 0.00MB
  PS Survivor Space | 42.50MB | 42.50MB
  PS Old Gen | 683.00MB | 683.00MB
  Metaspace | -0.00MB | 0.00MB
  2024-04-08 17:02:00.840 [main] INFO Engine -
  {
    "content":[
      {
        "reader":{
          "name":"hbase11xreader",
          "parameter":{
            "column":[
              {
                "name":"info:age",
                "type":"string"
              },
              {
                "name":"info:name",
                "type":"string"
              }
            ],
            "encoding":"utf-8",
            "hbaseConfig":{
              "hbase.zookeeper.quorum":"node001:2181"
            },
            "mode":"normal",
            "range":{
              "endRowkey":"",
              "isBinaryRowkey":true,
              "startRowkey":""
            },
            "table":"bigdata:student"
          }
        },
        "writer":{
          "name":"mysqlwriter",
          "parameter":{
            "column":[
              "info",
              "msg"
            ],
            "connection":[
              {
                "jdbcUrl":"jdbc:mysql://node001:3306/test",
                "table":[
                  "student"
                ]
              }
            ],
            "password":"******",
            "preSql":[],
            "session":[],
            "username":"root",
            "writeMode":"insert"
          }
        }
      }
    ],
    "setting":{
      "speed":{
        "channel":"1"
      }
    }
  }
  2024-04-08 17:02:00.875 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
  2024-04-08 17:02:00.881 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
  2024-04-08 17:02:00.881 [main] INFO JobContainer - DataX jobContainer starts job.
  2024-04-08 17:02:00.885 [main] INFO JobContainer - Set jobId = 0
  2024-04-08 17:02:03.040 [job-0] INFO OriginalConfPretreatmentUtil - table:[student] all columns:[
  info,msg
  ].
  2024-04-08 17:02:03.098 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
  insert INTO %s (info,msg) VALUES(?,?)
  ], which jdbcUrl like:[jdbc:mysql://node001:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
  2024-04-08 17:02:03.099 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
  2024-04-08 17:02:03.099 [job-0] INFO JobContainer - DataX Reader.Job [hbase11xreader] do prepare work .
  2024-04-08 17:02:03.099 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
  2024-04-08 17:02:03.100 [job-0] INFO JobContainer - jobContainer starts to do split ...
  2024-04-08 17:02:03.100 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
  四月 08, 2024 5:02:03 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
  警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  四月 08, 2024 5:02:03 下午 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper <init>
  信息: Process identifier=hconnection-0x50313382 connecting to ZooKeeper ensemble=node001:2181
  2024-04-08 17:02:03.982 [job-0] INFO ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
  2024-04-08 17:02:03.983 [job-0] INFO ZooKeeper - Client environment:host.name=node001
  2024-04-08 17:02:03.983 [job-0] INFO ZooKeeper - Client environment:java.version=1.8.0_372
  2024-04-08 17:02:03.983 [job-0] INFO ZooKeeper - Client environment:java.vendor=Red Hat, Inc.
  2024-04-08 17:02:03.983 [job-0] INFO ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-1.el7_9.x86_64/jre
  2024-04-08 17:02:03.983 [job-0] INFO ZooKeeper - Client environment:java.class.path=/opt/module/datax/lib/commons-io-2.4.jar:/opt/module/datax/lib/groovy-all-2.1.9.jar:/opt/module/datax/lib/datax-core-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/fluent-hc-4.4.jar:/opt/module/datax/lib/commons-beanutils-1.9.2.jar:/opt/module/datax/lib/commons-codec-1.9.jar:/opt/module/datax/lib/httpclient-4.4.jar:/opt/module/datax/lib/commons-cli-1.2.jar:/opt/module/datax/lib/commons-lang-2.6.jar:/opt/module/datax/lib/logback-core-1.0.13.jar:/opt/module/datax/lib/hamcrest-core-1.3.jar:/opt/module/datax/lib/fastjson-1.1.46.sec01.jar:/opt/module/datax/lib/commons-lang3-3.3.2.jar:/opt/module/datax/lib/commons-logging-1.1.1.jar:/opt/module/datax/lib/janino-2.5.16.jar:/opt/module/datax/lib/commons-configuration-1.10.jar:/opt/module/datax/lib/slf4j-api-1.7.10.jar:/opt/module/datax/lib/datax-common-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/datax-transformer-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/logback-classic-1.0.13.jar:/opt/module/datax/lib/httpcore-4.4.jar:/opt/module/datax/lib/commons-collections-3.2.1.jar:/opt/module/datax/lib/commons-math3-3.1.1.jar:.
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:java.io.tmpdir=/tmp
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:java.compiler=<NA>
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:os.name=Linux
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:os.arch=amd64
  2024-04-08 17:02:03.984 [job-0] INFO ZooKeeper - Client environment:os.version=3.10.0-862.el7.x86_64
  2024-04-08 17:02:03.987 [job-0] INFO ZooKeeper - Client environment:user.name=atguigu
  2024-04-08 17:02:03.988 [job-0] INFO ZooKeeper - Client environment:user.home=/home/atguigu
  2024-04-08 17:02:03.988 [job-0] INFO ZooKeeper - Client environment:user.dir=/opt/module/datax
  2024-04-08 17:02:03.990 [job-0] INFO ZooKeeper - Initiating client connection, connectString=node001:2181 sessionTimeout=90000 watcher=hconnection-0x503133820x0, quorum=node001:2181, baseZNode=/hbase
  2024-04-08 17:02:04.069 [job-0-SendThread(node001:2181)] INFO ClientCnxn - Opening socket connection to server node001/192.168.10.101:2181. Will not attempt to authenticate using SASL (unknown error)
  2024-04-08 17:02:04.092 [job-0-SendThread(node001:2181)] INFO ClientCnxn - Socket connection established to node001/192.168.10.101:2181, initiating session
  2024-04-08 17:02:04.139 [job-0-SendThread(node001:2181)] INFO ClientCnxn - Session establishment complete on server node001/192.168.10.101:2181, sessionid = 0x200000707b70025, negotiated timeout = 40000
  2024-04-08 17:02:06.334 [job-0] INFO Hbase11xHelper - HBaseReader split job into 1 tasks.
  2024-04-08 17:02:06.335 [job-0] INFO JobContainer - DataX Reader.Job [hbase11xreader] splits to [1] tasks.
  2024-04-08 17:02:06.336 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
  2024-04-08 17:02:06.366 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
  2024-04-08 17:02:06.394 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
  2024-04-08 17:02:06.402 [job-0] INFO JobContainer - Running by standalone Mode.
  2024-04-08 17:02:06.426 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
  2024-04-08 17:02:06.457 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
  2024-04-08 17:02:06.458 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
  2024-04-08 17:02:06.529 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
  四月 08, 2024 5:02:06 下午 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper <init>
  信息: Process identifier=hconnection-0x3b2eec42 connecting to ZooKeeper ensemble=node001:2181
  2024-04-08 17:02:06.680 [0-0-0-reader] INFO ZooKeeper - Initiating client connection, connectString=node001:2181 sessionTimeout=90000 watcher=hconnection-0x3b2eec420x0, quorum=node001:2181, baseZNode=/hbase
  2024-04-08 17:02:06.740 [0-0-0-reader-SendThread(node001:2181)] INFO ClientCnxn - Opening socket connection to server node001/192.168.10.101:2181. Will not attempt to authenticate using SASL (unknown error)
  2024-04-08 17:02:06.773 [0-0-0-reader-SendThread(node001:2181)] INFO ClientCnxn - Socket connection established to node001/192.168.10.101:2181, initiating session
  2024-04-08 17:02:06.808 [0-0-0-reader-SendThread(node001:2181)] INFO ClientCnxn - Session establishment complete on server node001/192.168.10.101:2181, sessionid = 0x200000707b70026, negotiated timeout = 40000
  2024-04-08 17:02:06.960 [0-0-0-reader] INFO HbaseAbstractTask - The task set startRowkey=[], endRowkey=[].
  2024-04-08 17:02:07.262 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[738]ms
  2024-04-08 17:02:07.263 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
  2024-04-08 17:02:16.483 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 11 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.248s | Percentage 100.00%
  2024-04-08 17:02:16.483 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
  2024-04-08 17:02:16.484 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
  2024-04-08 17:02:16.485 [job-0] INFO JobContainer - DataX Reader.Job [hbase11xreader] do post work.
  2024-04-08 17:02:16.485 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
  2024-04-08 17:02:16.487 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/module/datax/hook
  2024-04-08 17:02:16.491 [job-0] INFO JobContainer -
  [total cpu info] =>
  averageCpu | maxDeltaCpu | minDeltaCpu
  -1.00% | -1.00% | -1.00%
  [total gc info] =>
  NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
  PS MarkSweep | 1 | 1 | 1 | 0.136s | 0.136s | 0.136s
  PS Scavenge | 1 | 1 | 1 | 0.072s | 0.072s | 0.072s
  2024-04-08 17:02:16.491 [job-0] INFO JobContainer - PerfTrace not enable!
  2024-04-08 17:02:16.493 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 11 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.248s | Percentage 100.00%
  2024-04-08 17:02:16.495 [job-0] INFO JobContainer -
  任务启动时刻 : 2024-04-08 17:02:00
  任务结束时刻 : 2024-04-08 17:02:16
  任务总计耗时 : 15s
  任务平均流量 : 1B/s
  记录写入速度 : 0rec/s
  读出记录总数 : 2
  读写失败总数 : 0
  [atguigu@node001 datax]$
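
To verify the result, querying the target table should now show the two HBase rows appended after the pre-existing ones:

  SELECT info, msg FROM test.student;
  -- expect ('18', 'lisi') and ('222', '宋壹') among the results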

2. Syncing MySQL data to HBase

2.1 HBase data

"bigdata2:student"一开始是空数据,后来使用datax执行同步任务后,可以看到:"bigdata2:student"新增了一些数据。

  hbase(main):019:0> scan "bigdata2:student"
  ROW COLUMN+CELL
  0 row(s)
  Took 1.6445 seconds
  hbase(main):020:0> scan "bigdata2:student"
  ROW COLUMN+CELL
  111111111 column=info:age, timestamp=123456789, value=111
  111111111 column=info:name, timestamp=123456789, value=111111
  18lisi column=info:age, timestamp=123456789, value=18
  18lisi column=info:name, timestamp=123456789, value=lisi
  222222222 column=info:age, timestamp=123456789, value=222
  222222222 column=info:name, timestamp=123456789, value=222222
  333\xE5\xAE\x8B\xE5\xA3\xB9 column=info:age, timestamp=123456789, value=333
  333\xE5\xAE\x8B\xE5\xA3\xB9 column=info:name, timestamp=123456789, value=\xE5\xAE\x8B\xE5\xA3\xB9
  4 row(s)
  Took 0.3075 seconds
  hbase(main):021:0>

2.2 MySQL data

This is the same `student` table shown in 1.2; its four rows are the source data for this job.

  SELECT VERSION(); -- check the MySQL version
  /*
  Navicat Premium Data Transfer
  Source Server : 大数据-node001
  Source Server Type : MySQL
  Source Server Version : 80034 (8.0.34)
  Source Host : node001:3306
  Source Schema : test
  Target Server Type : MySQL
  Target Server Version : 80034 (8.0.34)
  File Encoding : 65001
  Date: 08/04/2024 17:11:56
  */
  SET NAMES utf8mb4;
  SET FOREIGN_KEY_CHECKS = 0;

  -- ----------------------------
  -- Table structure for student
  -- ----------------------------
  DROP TABLE IF EXISTS `student`;
  CREATE TABLE `student` (
    `info` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
    `msg` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL
  ) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

  -- ----------------------------
  -- Records of student
  -- ----------------------------
  INSERT INTO `student` VALUES ('111', '111111');
  INSERT INTO `student` VALUES ('222', '222222');
  INSERT INTO `student` VALUES ('18', 'lisi');
  INSERT INTO `student` VALUES ('222', '宋壹');

  SET FOREIGN_KEY_CHECKS = 1;

2.3 JSON job script (mysql2hbase.json)

  {
    "job": {
      "setting": {
        "speed": {
          "channel": 1
        }
      },
      "content": [
        {
          "reader": {
            "name": "mysqlreader",
            "parameter": {
              "column": [
                "info",
                "msg"
              ],
              "connection": [
                {
                  "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test"],
                  "table": [
                    "student"
                  ]
                }
              ],
              "username": "root",
              "password": "123456",
              "where": ""
            }
          },
          "writer": {
            "name": "hbase11xwriter",
            "parameter": {
              "hbaseConfig": {
                "hbase.zookeeper.quorum": "node001:2181"
              },
              "table": "bigdata2:student",
              "mode": "normal",
              "rowkeyColumn": [
                {
                  "index": 0,
                  "type": "string"
                },
                {
                  "index": 1,
                  "type": "string",
                  "value": "_"
                }
              ],
              "column": [
                {
                  "index": 0,
                  "name": "info:age",
                  "type": "string"
                },
                {
                  "index": 1,
                  "name": "info:name",
                  "type": "string"
                }
              ],
              "versionColumn": {
                "index": -1,
                "value": "123456789"
              },
              "encoding": "utf-8"
            }
          }
        }
      ]
    }
  }
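
The `rowkeyColumn` list builds each rowkey by concatenating its entries in order, so `index: 0` followed by `index: 1` yields rowkeys like `18lisi` (the `info` value glued to the `msg` value), which matches the scan output in 2.1. As far as I can tell from the hbase11xwriter documentation, the `"value": "_"` above is ignored for column-backed entries: constants only apply when `index` is `-1`. To actually get a `_` separator in the rowkey, a sketch like the following should work (verify against your DataX version):

  "rowkeyColumn": [
    { "index": 0,  "type": "string" },
    { "index": -1, "type": "string", "value": "_" },
    { "index": 1,  "type": "string" }
  ]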

2.4 Sync success log

  [atguigu@node001 datax]$ python bin/datax.py job/hbase/mysql2hbase.json
  DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
  Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
  2024-04-08 17:44:45.536 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
  2024-04-08 17:44:45.552 [main] INFO Engine - the machine info =>
  osInfo: Red Hat, Inc. 1.8 25.372-b07
  jvmInfo: Linux amd64 3.10.0-862.el7.x86_64
  cpu num: 4
  totalPhysicalMemory: -0.00G
  freePhysicalMemory: -0.00G
  maxFileDescriptorCount: -1
  currentOpenFileDescriptorCount: -1
  GC Names [PS MarkSweep, PS Scavenge]
  MEMORY_NAME | allocation_size | init_size
  PS Eden Space | 256.00MB | 256.00MB
  Code Cache | 240.00MB | 2.44MB
  Compressed Class Space | 1,024.00MB | 0.00MB
  PS Survivor Space | 42.50MB | 42.50MB
  PS Old Gen | 683.00MB | 683.00MB
  Metaspace | -0.00MB | 0.00MB
  2024-04-08 17:44:45.579 [main] INFO Engine -
  {
    "content":[
      {
        "reader":{
          "name":"mysqlreader",
          "parameter":{
            "column":[
              "info",
              "msg"
            ],
            "connection":[
              {
                "jdbcUrl":[
                  "jdbc:mysql://127.0.0.1:3306/test"
                ],
                "table":[
                  "student"
                ]
              }
            ],
            "password":"******",
            "username":"root",
            "where":""
          }
        },
        "writer":{
          "name":"hbase11xwriter",
          "parameter":{
            "column":[
              {
                "index":0,
                "name":"info:age",
                "type":"string"
              },
              {
                "index":1,
                "name":"info:name",
                "type":"string"
              }
            ],
            "encoding":"utf-8",
            "hbaseConfig":{
              "hbase.zookeeper.quorum":"node001:2181"
            },
            "mode":"normal",
            "rowkeyColumn":[
              {
                "index":0,
                "type":"string"
              },
              {
                "index":1,
                "type":"string",
                "value":"_"
              }
            ],
            "table":"bigdata2:student",
            "versionColumn":{
              "index":-1,
              "value":"123456789"
            }
          }
        }
      }
    ],
    "setting":{
      "speed":{
        "channel":1
      }
    }
  }
  2024-04-08 17:44:45.615 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
  2024-04-08 17:44:45.618 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
  2024-04-08 17:44:45.619 [main] INFO JobContainer - DataX jobContainer starts job.
  2024-04-08 17:44:45.622 [main] INFO JobContainer - Set jobId = 0
  Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
  2024-04-08 17:44:47.358 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
  2024-04-08 17:44:47.734 [job-0] INFO OriginalConfPretreatmentUtil - table:[student] has columns:[info,msg].
  2024-04-08 17:44:47.761 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
  2024-04-08 17:44:47.762 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
  2024-04-08 17:44:47.763 [job-0] INFO JobContainer - DataX Writer.Job [hbase11xwriter] do prepare work .
  2024-04-08 17:44:47.764 [job-0] INFO JobContainer - jobContainer starts to do split ...
  2024-04-08 17:44:47.764 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
  2024-04-08 17:44:47.773 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
  2024-04-08 17:44:47.774 [job-0] INFO JobContainer - DataX Writer.Job [hbase11xwriter] splits to [1] tasks.
  2024-04-08 17:44:47.815 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
  2024-04-08 17:44:47.821 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
  2024-04-08 17:44:47.825 [job-0] INFO JobContainer - Running by standalone Mode.
  2024-04-08 17:44:47.839 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
  2024-04-08 17:44:47.846 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
  2024-04-08 17:44:47.846 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
  2024-04-08 17:44:47.870 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
  2024-04-08 17:44:47.876 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select info,msg from student
  ] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
  2024-04-08 17:44:48.010 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select info,msg from student
  ] jdbcUrl:[jdbc:mysql://127.0.0.1:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
  四月 08, 2024 5:44:49 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
  警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  四月 08, 2024 5:44:50 下午 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper <init>
  信息: Process identifier=hconnection-0x26654712 connecting to ZooKeeper ensemble=node001:2181
  2024-04-08 17:44:50.143 [0-0-0-writer] INFO ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
  2024-04-08 17:44:50.143 [0-0-0-writer] INFO ZooKeeper - Client environment:host.name=node001
  2024-04-08 17:44:50.143 [0-0-0-writer] INFO ZooKeeper - Client environment:java.version=1.8.0_372
  2024-04-08 17:44:50.143 [0-0-0-writer] INFO ZooKeeper - Client environment:java.vendor=Red Hat, Inc.
  2024-04-08 17:44:50.143 [0-0-0-writer] INFO ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-1.el7_9.x86_64/jre
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:java.class.path=/opt/module/datax/lib/commons-io-2.4.jar:/opt/module/datax/lib/groovy-all-2.1.9.jar:/opt/module/datax/lib/datax-core-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/fluent-hc-4.4.jar:/opt/module/datax/lib/commons-beanutils-1.9.2.jar:/opt/module/datax/lib/commons-codec-1.9.jar:/opt/module/datax/lib/httpclient-4.4.jar:/opt/module/datax/lib/commons-cli-1.2.jar:/opt/module/datax/lib/commons-lang-2.6.jar:/opt/module/datax/lib/logback-core-1.0.13.jar:/opt/module/datax/lib/hamcrest-core-1.3.jar:/opt/module/datax/lib/fastjson-1.1.46.sec01.jar:/opt/module/datax/lib/commons-lang3-3.3.2.jar:/opt/module/datax/lib/commons-logging-1.1.1.jar:/opt/module/datax/lib/janino-2.5.16.jar:/opt/module/datax/lib/commons-configuration-1.10.jar:/opt/module/datax/lib/slf4j-api-1.7.10.jar:/opt/module/datax/lib/datax-common-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/datax-transformer-0.0.1-SNAPSHOT.jar:/opt/module/datax/lib/logback-classic-1.0.13.jar:/opt/module/datax/lib/httpcore-4.4.jar:/opt/module/datax/lib/commons-collections-3.2.1.jar:/opt/module/datax/lib/commons-math3-3.1.1.jar:.
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:java.io.tmpdir=/tmp
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:java.compiler=<NA>
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:os.name=Linux
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:os.arch=amd64
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:os.version=3.10.0-862.el7.x86_64
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:user.name=atguigu
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:user.home=/home/atguigu
  2024-04-08 17:44:50.144 [0-0-0-writer] INFO ZooKeeper - Client environment:user.dir=/opt/module/datax
  2024-04-08 17:44:50.145 [0-0-0-writer] INFO ZooKeeper - Initiating client connection, connectString=node001:2181 sessionTimeout=90000 watcher=hconnection-0x266547120x0, quorum=node001:2181, baseZNode=/hbase
  2024-04-08 17:44:50.256 [0-0-0-writer-SendThread(node001:2181)] INFO ClientCnxn - Opening socket connection to server node001/192.168.10.101:2181. Will not attempt to authenticate using SASL (unknown error)
  2024-04-08 17:44:50.381 [0-0-0-writer-SendThread(node001:2181)] INFO ClientCnxn - Socket connection established to node001/192.168.10.101:2181, initiating session
  2024-04-08 17:44:50.427 [0-0-0-writer-SendThread(node001:2181)] INFO ClientCnxn - Session establishment complete on server node001/192.168.10.101:2181, sessionid = 0x200000707b70028, negotiated timeout = 40000
  2024-04-08 17:44:53.794 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[5930]ms
  2024-04-08 17:44:53.795 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
  2024-04-08 17:44:57.857 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 29 bytes | Speed 2B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
  2024-04-08 17:44:57.858 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
  2024-04-08 17:44:57.858 [job-0] INFO JobContainer - DataX Writer.Job [hbase11xwriter] do post work.
  2024-04-08 17:44:57.859 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
  2024-04-08 17:44:57.859 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
  2024-04-08 17:44:57.862 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/module/datax/hook
  2024-04-08 17:44:57.866 [job-0] INFO JobContainer -
  [total cpu info] =>
  averageCpu | maxDeltaCpu | minDeltaCpu
  -1.00% | -1.00% | -1.00%
  [total gc info] =>
  NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
  PS MarkSweep | 1 | 1 | 1 | 0.120s | 0.120s | 0.120s
  PS Scavenge | 1 | 1 | 1 | 0.095s | 0.095s | 0.095s
  2024-04-08 17:44:57.867 [job-0] INFO JobContainer - PerfTrace not enable!
  2024-04-08 17:44:57.868 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 29 bytes | Speed 2B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
  2024-04-08 17:44:57.876 [job-0] INFO JobContainer -
  任务启动时刻 : 2024-04-08 17:44:45
  任务结束时刻 : 2024-04-08 17:44:57
  任务总计耗时 : 12s
  任务平均流量 : 2B/s
  记录写入速度 : 0rec/s
  读出记录总数 : 4
  读写失败总数 : 0
  [atguigu@node001 datax]$
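
A quick sanity check from the HBase shell confirms the write, one rowkey per MySQL record:

  count 'bigdata2:student'   # expect: 4 row(s)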

3. Summary

This took a whole afternoon, ヾ(◍°∇°◍)ノ゙ keep at it!

Working from the reference article "DataX: mutual import between MySQL and HBase", plus some help from ChatGPT, it took about four hours to put this small end-to-end example together.
