当前位置:   article > 正文

DataX-数据迁移Oracle到Mysql-ETL工具_datax实现orcel迁移mysql

datax实现orcel迁移mysql

一、安装

https://github.com/WeiYe-Jing/datax-web/blob/master/doc/datax-web/datax-web-deploy.md

  • 1、直接下载DataX工具包:DataX下载地址

    下载后解压至本地某个目录,进入bin目录,即可运行同步作业:

    $ cd  {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}

    自检脚本:    python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json

  • 2、Python (2.x) (支持Python3需要修改替换datax/bin下面的三个python文件,替换文件在doc/datax-web/datax-python3下) 必选,主要用于调度执行底层DataX的启动脚本,默认的方式是以Java子进程方式执行DataX,用户可以选择以Python方式来做自定义的改造

二、测试脚本

  1. C:\DATAX\datax\bin>CHCP 65001
  2. Active code page: 65001
  3. C:\DATAX\datax\bin>python datax.py C:\DATAX\datax\job\test.json
  4. DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
  5. Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
  6. 2024-03-14 15:02:01.361 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
  7. 2024-03-14 15:02:01.363 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
  8. 2024-03-14 15:02:01.375 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
  9. 2024-03-14 15:02:01.381 [main] INFO Engine - the machine info =>
  10. osInfo: Windows 10 amd64 10.0
  11. jvmInfo: Oracle Corporation 1.8 25.341-b10
  12. cpu num: 6
  13. totalPhysicalMemory: -0.00G
  14. freePhysicalMemory: -0.00G
  15. maxFileDescriptorCount: -1
  16. currentOpenFileDescriptorCount: -1
  17. GC Names [PS MarkSweep, PS Scavenge]
  18. MEMORY_NAME | allocation_size | init_size
  19. PS Eden Space | 256.00MB | 256.00MB
  20. Code Cache | 240.00MB | 2.44MB
  21. Compressed Class Space | 1,024.00MB | 0.00MB
  22. PS Survivor Space | 42.50MB | 42.50MB
  23. PS Old Gen | 683.00MB | 683.00MB
  24. Metaspace | -0.00MB | 0.00MB
  25. 2024-03-14 15:02:01.392 [main] INFO Engine -
  26. {
  27. "setting":{
  28. "speed":{
  29. "channel":4
  30. }
  31. },
  32. "content":[
  33. {
  34. "reader":{
  35. "name":"oraclereader",
  36. "parameter":{
  37. "username":"test",
  38. "password":"******",
  39. "column":[
  40. "*"
  41. ],
  42. "connection":[
  43. {
  44. "table":[
  45. "CAR.S_U"
  46. ],
  47. "jdbcUrl":[
  48. "jdbc:oracle:thin:@//10.33.51.231:1521/helowin"
  49. ]
  50. }
  51. ]
  52. }
  53. },
  54. "writer":{
  55. "name":"mysqlwriter",
  56. "parameter":{
  57. "username":"car",
  58. "password":"******",
  59. "writeMode":"replace",
  60. "column":[
  61. "*"
  62. ],
  63. "connection":[
  64. {
  65. "jdbcUrl":"jdbc:mysql://10.33.51.231:3306/car",
  66. "table":[
  67. "S_U"
  68. ]
  69. }
  70. ]
  71. }
  72. }
  73. }
  74. ]
  75. }
  76. 2024-03-14 15:02:01.417 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
  77. 2024-03-14 15:02:01.418 [main] INFO JobContainer - DataX jobContainer starts job.
  78. 2024-03-14 15:02:01.420 [main] INFO JobContainer - Set jobId = 0
  79. 2024-03-14 15:02:01.623 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@//10.33.51.231:1521/helowin.
  80. 2024-03-14 15:02:01.624 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置 读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
  81. Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
  82. 2024-03-14 15:02:07.748 [job-0] INFO OriginalConfPretreatmentUtil - table:[S_BU] all columns:[
  83. M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID
  84. ].
  85. 2024-03-14 15:02:07.748 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写 入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
  86. 2024-03-14 15:02:07.752 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
  87. replace INTO %s (M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID) VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
  88. ], which jdbcUrl like:[jdbc:mysql://10.33.51.231:3306/carmen?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
  89. 2024-03-14 15:02:07.753 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
  90. 2024-03-14 15:02:07.753 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do prepare work .
  91. 2024-03-14 15:02:07.754 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
  92. 2024-03-14 15:02:07.755 [job-0] INFO JobContainer - jobContainer starts to do split ...
  93. 2024-03-14 15:02:07.755 [job-0] INFO JobContainer - Job set Channel-Number to 4 channels.
  94. 2024-03-14 15:02:07.757 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
  95. 2024-03-14 15:02:07.758 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
  96. 2024-03-14 15:02:07.778 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
  97. 2024-03-14 15:02:07.779 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
  98. 2024-03-14 15:02:07.781 [job-0] INFO JobContainer - Running by standalone Mode.
  99. 2024-03-14 15:02:07.786 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
  100. 2024-03-14 15:02:07.790 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
  101. 2024-03-14 15:02:07.790 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
  102. 2024-03-14 15:02:07.796 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
  103. 2024-03-14 15:02:07.798 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from CARMEN.S_BU
  104. ] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
  105. 2024-03-14 15:02:16.306 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from CARMEN.S_BU
  106. ] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
  107. 2024-03-14 15:02:16.845 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[9050]ms
  108. 2024-03-14 15:02:16.847 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
  109. 2024-03-14 15:02:17.798 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 6.239s | All Task WaitReaderTime 1.988s | Percentage 100.00%
  110. 2024-03-14 15:02:17.800 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
  111. 2024-03-14 15:02:17.805 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
  112. 2024-03-14 15:02:17.805 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do post work.
  113. 2024-03-14 15:02:17.806 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
  114. 2024-03-14 15:02:17.807 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: C:\DATAX\datax\hook
  115. 2024-03-14 15:02:17.808 [job-0] INFO JobContainer -
  116. [total cpu info] =>
  117. averageCpu | maxDeltaCpu | minDeltaCpu
  118. -1.00% | -1.00% | -1.00%
  119. [total gc info] =>
  120. NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
  121. PS MarkSweep | 1 | 1 | 1 | 0.018s | 0.018s | 0.018s
  122. PS Scavenge | 5 | 5 | 5 | 0.028s | 0.028s | 0.028s
  123. 2024-03-14 15:02:17.808 [job-0] INFO JobContainer - PerfTrace not enable!
  124. 2024-03-14 15:02:17.808 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 6.239s | All Task WaitReaderTime 1.988s | Percentage 100.00%
  125. 2024-03-14 15:02:17.810 [job-0] INFO JobContainer -
  126. 任务启动时刻 : 2024-03-14 15:02:01
  127. 任务结束时刻 : 2024-03-14 15:02:17
  128. 任务总计耗时 : 16s
  129. 任务平均流量 : 1.40MB/s
  130. 记录写入速度 : 10000rec/s
  131. 读出记录总数 : 100000
  132. 读写失败总数 : 0
  133. C:\DATAX\datax\bin>

三、datax-web: DataX管理调度GitHub - WeiYe-Jing/datax-web: DataX集成可视化页面,选择数据源即可一键生成数据同步任务,支持RDBMS、Hive、HBase、ClickHouse、MongoDB等数据源,批量创建RDBMS数据同步任务,集成开源调度系统,支持分布式、增量同步数据、实时查看运行日志、监控执行器资源、KILL运行进程、数据源信息加密等。

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/笔触狂放9/article/detail/442514
推荐阅读
相关标签
  

闽ICP备14008679号