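As the number of programs we develop grows, scheduling tasks and managing the dependencies between them becomes a real headache. A handful of tasks can be run on a timer with the crontab that ships with Linux, but the drawbacks are obvious: it is not intuitive, and editing it is tedious and error-prone. That is when a scheduling tool helps. I don't know which ones you have used; I have worked with airflow, oozie and Kyligence, but the one I want to recommend today is dolphinscheduler. Below is a brief introduction, starting from installation and deployment.
dolphinscheduler is a scheduling tool that originated in China and fits the habits of Chinese users well. It supports many task types, including the common spark, flink, sql, shell, python, datax, sqoop, seatunnel and dinky, so its coverage is fairly complete. Besides task scheduling it also offers resource management, multi-tenancy and other features, which is enough for most small and medium-sized companies.
Because dolphinscheduler uses zookeeper as its registry center, zookeeper must be installed before deploying dolphinscheduler. The installation steps are covered in one of my earlier articles. The environment also needs a jdk; I won't repeat those steps here either, see my earlier articles.
Open the dolphinscheduler download page at https://dlcdn.apache.org/dolphinscheduler/, choose a version, and click apache-dolphinscheduler-xxx-bin.tar.gz to go to the download page. The latest version at the time of writing is 3.2.0, but I still recommend 3.1.8, so everything below is based on 3.1.8.
After downloading the package, run the following commands to extract and rename it: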
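Before going further it can help to confirm that the prerequisites are actually on the host. The following is a minimal sketch, not part of dolphinscheduler: `require_cmd` and `check_prereqs` are hypothetical helper names, and it assumes that `java` and ZooKeeper's `zkServer.sh` are on the PATH, which depends on how you installed them.

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report which of the given commands are present; returns 1 if any is missing.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if require_cmd "$cmd"; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumes both tools are on PATH):
#   check_prereqs java zkServer.sh
```

If ZooKeeper's bin directory is not on your PATH, pass the full path to `zkServer.sh` or check the service in whatever way your environment provides.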
tar -zxvf apache-dolphinscheduler-3.1.8-bin.tar.gz
mv apache-dolphinscheduler-3.1.8-bin dolphinscheduler-3.1.8
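The download, extract and rename steps can also be wrapped in a small script so that the version is written only once. This is a sketch under assumptions: the function names are made up for illustration, and the URL layout (`<version>/apache-dolphinscheduler-<version>-bin.tar.gz` under the download address) is inferred from the file name above, so check it against the download page. It also assumes `wget` is installed.

```shell
# Build the download URL for a release (URL layout is an assumption).
ds_download_url() {
  version="$1"
  echo "https://dlcdn.apache.org/dolphinscheduler/${version}/apache-dolphinscheduler-${version}-bin.tar.gz"
}

# Download the package unless it is already present, then extract and rename it.
ds_fetch_and_unpack() {
  version="$1"
  tarball="apache-dolphinscheduler-${version}-bin.tar.gz"
  [ -f "$tarball" ] || wget "$(ds_download_url "$version")" || return 1
  tar -zxf "$tarball" || return 1
  mv "apache-dolphinscheduler-${version}-bin" "dolphinscheduler-${version}"
}

# Example:
#   ds_fetch_and_unpack 3.1.8
```

If the tarball is already in the current directory, the download is skipped and only the extract and rename steps run.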
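Go to the dolphinscheduler-3.1.8/bin/env directory of the extracted package and run vim dolphinscheduler_env.sh to configure dolphinscheduler's data source, the zookeeper connection information, and the installation directories of spark, flink, datax and seatunnel.
Tip: adjust the settings to match your own environment.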
export JAVA_HOME=${JAVA_HOME:-"/usr/java/jdk1.8.0_181-cloudera"}

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-"mysql"}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL=${SPRING_DATASOURCE_URL:-"jdbc:mysql://ds1:3306/dolphinscheduler?useSSL=false"}
export SPRING_DATASOURCE_USERNAME=${SPRING_DATASOURCE_USERNAME:-"root"}
export SPRING_DATASOURCE_PASSWORD=${SPRING_DATASOURCE_PASSWORD:-"*****"}

# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-"Asia/Shanghai"}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-ds1:2181,ds2:2181,ds3:2181}

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-"/application/hadoop"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/application/hadoop/etc/hadoop"}
export SPARK_HOME2=${SPARK_HOME2:-"/application/spark"}
export PYTHON_HOME=${PYTHON_HOME:-"/usr/bin/python"}
export HIVE_HOME=${HIVE_HOME:-"/application/hive"}
export FLINK_HOME=${FLINK_HOME:-"/application/flink"}
export DATAX_HOME=${DATAX_HOME:-"/opt/soft/datax"}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-"/application/seatunnel"}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
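Every line in this file uses the shell's `${VAR:-default}` expansion: if the variable is already set to a non-empty value in the environment, that value is kept; otherwise the default written in the file applies. The sketch below shows the behaviour in isolation. `effective_database` is a hypothetical function used only for this demonstration, and `postgresql` is just an example override value.

```shell
# Print the value DATABASE would end up with under the ${VAR:-default} rule.
effective_database() {
  echo "${DATABASE:-mysql}"
}

unset DATABASE
effective_database        # prints: mysql (variable unset, default used)

DATABASE=""
effective_database        # prints: mysql (empty counts as unset for :-)

DATABASE=postgresql
effective_database        # prints: postgresql (existing value wins)
```

This means a value exported in the deploy user's environment silently overrides what you typed into dolphinscheduler_env.sh, which is worth checking if a setting does not seem to take effect.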
Then run vim install_env.sh to set the hosts for the master, worker, api server and alert server, the installation path on the servers, the deployment user, and the zookeeper registry root path.
ips=${ips:-"ds1,ds2,ds3"}

# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
masters=${masters:-"ds1,ds2,ds3"}

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
workers=${workers:-"ds1:default,ds2:default,ds3:default"}

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
alertServer=${alertServer:-"ds3"}

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
apiServers=${apiServers:-"ds2"}

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
installPath=${installPath:-"/application/dolphinscheduler"}

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
deployUser=${deployUser:-"root"}

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
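The comments in install_env.sh require masters, workers, alertServer and apiServers to be subsets of ips. A quick check before running the installer might look like the sketch below. `is_subset_of_ips` is a hypothetical helper, not something shipped with dolphinscheduler, and it assumes the lists contain no spaces.

```shell
# Succeeds if every comma-separated entry in $1 also appears in the list $2.
# Entries of the form host:group (as used by `workers`) are compared by host only.
is_subset_of_ips() {
  _subset="$1"
  _ips="$2"
  _old_ifs="$IFS"
  IFS=','
  for _entry in $_subset; do
    _host="${_entry%%:*}"          # strip an optional :workerGroup suffix
    case ",$_ips," in
      *",$_host,"*) ;;             # host is listed, keep going
      *) IFS="$_old_ifs"; echo "not in ips: $_host"; return 1 ;;
    esac
  done
  IFS="$_old_ifs"
  return 0
}

# Example with the values from this article:
#   is_subset_of_ips "ds1:default,ds2:default,ds3:default" "ds1,ds2,ds3"
```

Wrapping the list in commas before matching keeps a short name such as ds from being accepted just because it is a prefix of ds1.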
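Go to dolphinscheduler-3.1.8/api-server/conf in the extracted package and run vim common.properties to edit the resource storage settings. The common.properties under dolphinscheduler-3.1.8/worker-server/conf needs the same configuration.
Tip: this configures the resource center to which files and jar packages are uploaded. The settings to pay attention to are data.basedir.path, resource.storage.type, resource.storage.upload.base.path, resource.hdfs.root.user and resource.hdfs.fs.defaultFS; the rest can be set as needed or left at their defaults.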
data.basedir.path=/application/data

# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# resource storage type: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=/dolphinscheduler

# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=cn-north-1
# The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name=dolphinscheduler
# You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000

# alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id=<your-access-key-id>
# alibaba cloud access key secret, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
# alibaba cloud region, required if you set resource.storage.type=OSS
resource.alibaba.cloud.region=cn-hangzhou
# oss bucket name, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
# oss bucket endpoint, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com

# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://ds1:8020

# whether to startup kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2

# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=ds1
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://ds1:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data

# Network IP gets priority, default inner outer

# Whether hive SQL is executed in the same session
support.hive.oneSession=true

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
setTaskDirToTenant.enable=false
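Since the same settings have to be made in both the api-server and worker-server copies of common.properties, it is easy for the two files to drift apart. The sketch below compares individual keys between two properties files. `get_prop` and `props_match` are hypothetical helpers written for this article, and the parsing is deliberately simple: it only handles plain `key=value` lines, ignores commented-out lines, and assumes `awk` is available.

```shell
# Print the value of a key from a .properties file (last assignment wins).
get_prop() {
  awk -v k="$1" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$2"
}

# Succeeds if the key has the same value in both files.
props_match() {
  [ "$(get_prop "$1" "$2")" = "$(get_prop "$1" "$3")" ]
}

# Example, run from inside dolphinscheduler-3.1.8:
#   for k in data.basedir.path resource.storage.type resource.storage.upload.base.path \
#            resource.hdfs.root.user resource.hdfs.fs.defaultFS; do
#     props_match "$k" api-server/conf/common.properties worker-server/conf/common.properties \
#       || echo "mismatch: $k"
#   done
```

A simpler alternative is to edit one file and copy it over the other, as long as you do not need module-specific differences.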
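Since I use mysql as the metadata store, the mysql driver first has to be copied into the libs directory of each dolphinscheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs and tools/libs.
The dolphinscheduler database must be created in mysql first. If you want a dedicated user, grant that user privileges. The commands are as follows:
Tip: the syntax differs between mysql5 and mysql8, so adjust for your version. The example below is for mysql8.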
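Copying the driver into five directories by hand is easy to get wrong, so a short loop can do it. This is a sketch: `copy_driver` is a hypothetical helper, and the jar file name in the example is a placeholder for whichever mysql driver jar you downloaded.

```shell
# Copy a JDBC driver jar into the libs directory of each module.
# $1 = path to the driver jar, $2 = extracted dolphinscheduler directory.
copy_driver() {
  jar="$1"
  ds_home="$2"
  for module in api-server alert-server master-server worker-server tools; do
    cp "$jar" "$ds_home/$module/libs/" || return 1
  done
}

# Example (jar name is a placeholder):
#   copy_driver mysql-connector-java-x.y.z.jar dolphinscheduler-3.1.8
```

The function stops at the first failed copy, so a missing libs directory is reported by cp instead of being skipped silently.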
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
FLUSH PRIVILEGES;
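Go to the dolphinscheduler-3.1.8 directory and run bash tools/bin/upgrade-schema.sh to initialize the database, watching for any error messages. Once initialization succeeds, run bin/install.sh; dolphinscheduler installs itself to the installation path set in the configuration file and copies the installation to the configured worker-server nodes. Then open http://ds2:12345/dolphinscheduler/ui (ds2 is the apiserver defined in the configuration file) to see whether you can log in. If the login page appears, startup succeeded. The initial account and password are admin/dolphinscheduler123.
After logging in, first create a project.
After creating the project, click the project name to enter the workflow definition page.
Click Workflow Definition and create a workflow. The list on the left contains the task types, which you drag onto the canvas. Taking a shell task as an example, enter the node name and the script, then click Save.
After saving, a dialog asks for the workflow name, tenant, execution strategy and so on. Click Confirm and the workflow definition is complete.
The buttons to the right of the workflow are Edit, Run, Timing, Online, Copy, Timing Management, Tree View, Export and Version Info. The workflow must be brought online before it can be run.
When you click Run, a dialog appears with notification strategy, process priority, worker group, environment name and other options, which you can set as needed. Click Confirm to run the workflow.
The running status of the workflow can be viewed under Workflow Instance.
The logs of the task instances within the workflow can be viewed under Task Instance.
After the task runs successfully, use the Timing function in the workflow definition to set a schedule and frequency for automatic runs. After clicking Confirm, you still need to open Timing Management in the workflow definition and bring the schedule you just defined online. Only then is the scheduling fully set up.
I have been using dolphinscheduler for a while, from the earlier 2.7 to the current 3.x versions, and the deployment approach has changed somewhat. In 2.x all the modules were bundled together, while from 3.0 onward api-server, worker-server, master-server and alert-server are separate. With a scheduling platform, deploying the spark and flink jobs you write becomes much more intuitive, and you no longer have to go to the servers and check jobs one by one. For reasons of space, I won't demonstrate resource management (uploading scripts, program jar packages and so on), data source configuration, data quality and other features. For the details, download, install and deploy it and try the features yourself.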
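The services can take a little while to come up after install.sh finishes, so instead of refreshing the browser you can poll the UI address from the shell. The sketch below is only an illustration: `ds_ui_url` and `wait_for_ui` are hypothetical helpers, the port 12345 and the path are taken from the address above, and it assumes `curl` is installed.

```shell
# Build the UI address for an api server host (port defaults to 12345).
ds_ui_url() {
  echo "http://$1:${2:-12345}/dolphinscheduler/ui"
}

# Poll the UI until it answers or the attempts run out.
# $1 = api server host, $2 = number of attempts (default 30), 2 seconds apart.
wait_for_ui() {
  url="$(ds_ui_url "$1")"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "not reachable: $url"
  return 1
}

# Example:
#   wait_for_ui ds2
```

If the address never becomes reachable, the api-server log under the installation path is the first place to look.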