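DataX official repository: https://github.com/alibaba/DataX
DataX-Web official repository: https://github.com/WeiYe-Jing/datax-web
Note: both projects already provide detailed installation guides, so installation is not covered here.
# The following error occurred when running a job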
/usr/bin/python: can't find '__main__' module in
# Solution
vim {datax-web}/modules/datax-executor/bin/datax-executor.sh
# Find the following line
JAVA_OPTS=${JAVA_OPTS}" -Dserver.port="${SERVER_PORT}" -Ddata.path="${DATA_PATH}" -Dexecutor.port="${EXECUTOR_PORT}" -Djson.path="${JSON_PATH}" -Dpython.path="${PYTHON_PATH}" -Ddatax.admin.port="${DATAX_ADMIN_PORT}
# Change it to the following ({datax} stands for the DataX installation directory)
JAVA_OPTS=${JAVA_OPTS}" -Dserver.port="${SERVER_PORT}" -Ddata.path="${DATA_PATH}" -Dexecutor.port="${EXECUTOR_PORT}" -Djson.path="${JSON_PATH}" -Dpython.path="{datax}/bin/datax.py" -Ddatax.admin.port="${DATAX_ADMIN_PORT}
# Finally, restart datax-web
{datax-web}/bin/stop-all.sh
{datax-web}/bin/start-all.sh
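For context, the Python interpreter typically prints this message when it is handed an empty string or a directory instead of a script file, which is consistent with `python.path` not resolving to `datax.py`. The following is a minimal Python sketch for checking a candidate value before restarting. `check_python_path` is a hypothetical helper written for illustration and is not part of DataX or datax-web.

```python
import os


def check_python_path(path):
    """Return a short diagnosis for a candidate -Dpython.path value.

    Hypothetical helper for illustration; not part of DataX or datax-web.
    """
    if not path:
        # An empty value reproduces "can't find '__main__' module in ''".
        return "empty"
    if os.path.isdir(path):
        # A directory without __main__.py triggers the same error.
        return "directory"
    if not os.path.isfile(path):
        return "missing"
    if os.path.basename(path) != "datax.py":
        return "not datax.py"
    return "ok"
```

Calling `check_python_path` with the value you put after `-Dpython.path=` should return `"ok"` when it points at the `datax.py` file under the DataX `bin` directory.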
# Fixing the DataX error "在有总bps限速条件下,单个channel的bps值不能为空,也不能为非正数" (when a total bps limit is set, the per-channel bps value must not be empty or non-positive)
# Edit datax/conf/core.json
# Set core -> transport -> channel -> speed -> "byte" to 2000000
"core": {
    "dataXServer": {
        "address": "http://localhost:7001/api",
        "timeout": 10000,
        "reportDataxLog": false,
        "reportPerfLog": false
    },
    "transport": {
        "channel": {
            "class": "com.alibaba.datax.core.transport.channel.memory.MemoryChannel",
            "speed": {
                "byte": 2000000,
                "record": -1
            },
            "flowControlInterval": 20,
            "capacity": 512,
            "byteCapacity": 67108864
        },
        "exchanger": {
            "class": "com.alibaba.datax.core.plugin.BufferedRecordExchanger",
            "bufferSize": 32
        }
    },
}
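The same change can be scripted instead of edited by hand. Below is a minimal Python 3 sketch, assuming `core.json` is valid JSON with a top-level `"core"` key laid out as in the excerpt above. The function names are hypothetical, and the value 2000000 is simply the one used in this post.

```python
import json


def set_channel_byte_speed(config, byte_speed):
    """Set core -> transport -> channel -> speed -> byte on a parsed core.json."""
    if byte_speed <= 0:
        # The error in question is raised for empty or non-positive values.
        raise ValueError("byte speed must be a positive integer")
    config["core"]["transport"]["channel"]["speed"]["byte"] = byte_speed
    return config


def patch_core_json(path, byte_speed=2000000):
    """Rewrite the file at `path` in place with the new per-channel byte speed."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    set_channel_byte_speed(config, byte_speed)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=4)
        fh.write("\n")
```

For example, `patch_core_json("datax/conf/core.json")` would apply the value shown above. Back up the file first, since the sketch overwrites it.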
Note: the following shows how to handle a partitioned table (a static partition is used for the demonstration).
- Static partition
create table student2(
    commentId int,
    newsId int,
    content string,
    userIP string,
    commentDate date
)
partitioned by (day string)
row format delimited fields terminated by '\t';
alter table student2 add partition (day=20230322);
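Hive stores each static partition in a subdirectory named `key=value` under the table's directory, so the partition added above lives in a `day=20230322` directory. Here is a minimal Python sketch of how such a path is put together. `hive_partition_path` is a hypothetical helper, and the table directory `/user/hive/warehouse/test.db/student2` assumes the default warehouse location and a database named `test`. The sketch does not handle the escaping Hive applies to special characters in partition values.

```python
def hive_partition_path(table_location, partitions):
    """Join a table directory with key=value partition directories, in order.

    Hypothetical helper for illustration; `partitions` is a list of
    (column, value) pairs in the order declared in `partitioned by`.
    """
    parts = [table_location.rstrip("/")]
    for key, value in partitions:
        parts.append("{}={}".format(key, value))
    return "/".join(parts)


# Example with the table and partition created above.
path = hive_partition_path(
    "/user/hive/warehouse/test.db/student2", [("day", "20230322")]
)
```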
When running the job, point the path at the partition directory:
# Relevant part of the job configuration
"path": "/user/hive/warehouse/test.db/student2/day=20230322"