Official Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.8/
Official download page: https://flink.apache.org/downloads.html
Generally speaking, there are three common ways to deploy Flink. Below, each is illustrated by starting the simplest example, WordCount. The third is the most widely used.
1) Flink tutorial series 1: install and start Flink on Windows
2) Flink tutorial series 2: install and start Flink on Linux
3) Flink tutorial series 3: deploy a standalone cluster on multiple machines
There are two ways to run Flink on YARN: session mode and per-job mode.
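The difference can be sketched as the two invocations below. This is only an illustration: the flag values and the session name `my-session` are placeholders, and since the commands need a live YARN cluster they are wrapped in (hypothetical) shell functions rather than executed.

```shell
# Session mode: start one long-running Flink cluster on YARN,
# then submit any number of jobs to it with plain "bin/flink run".
start_session() {
  bin/yarn-session.sh -n 3 -s 3 -jm 1024 -tm 1024 -nm my-session
  bin/flink run ./examples/batch/WordCount.jar
}

# Per-job mode: each "flink run -m yarn-cluster" brings up its own
# YARN application, which is torn down when the job finishes.
run_per_job() {
  bin/flink run -m yarn-cluster -yjm 1024 -ytm 1024 ./examples/batch/WordCount.jar
}
```

The rest of this article demonstrates session mode.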
First, you must set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable to point to the Hadoop configuration directory.
- # vi /etc/profile
-
- # then add the following line:
-
- export HADOOP_CONF_DIR=/etc/hadoop/conf
-
- # . /etc/profile
Alternatively, executing the following statement directly in the shell console also works:
# export HADOOP_CONF_DIR=/etc/hadoop/conf
Either way, the goal is to let the Flink YARN client find the Hadoop cluster's configuration.
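As a quick sanity check before starting the session, a small helper like the one below (hypothetical, not part of Flink) can confirm that the directory the variable points to really contains the config files the Flink YARN client will read:

```shell
# Hypothetical helper: verify that HADOOP_CONF_DIR (or YARN_CONF_DIR)
# points at a directory containing the expected Hadoop config files.
check_hadoop_conf() {
  local dir="${HADOOP_CONF_DIR:-$YARN_CONF_DIR}"
  if [ -z "$dir" ]; then
    echo "neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  local f
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  echo "hadoop config OK: $dir"
}
# usage: check_hadoop_conf
```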
Second: if you are on CDH, you should set the user to hdfs (CDH manages HDFS under this user; otherwise you will run into HDFS read/write permission problems):
- # vi yarn-session.sh
-
- # add user begin
- export HADOOP_USER_NAME=hdfs
- #add user end
-
- JVM_ARGS="$JVM_ARGS -Xmx512m"
-
- CC_CLASSPATH=`manglePathList $(constructFlinkClassPath):$INTERNAL_HADOOP_CLASSPATHS`
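Instead of patching yarn-session.sh, the same effect should be achievable by exporting the variable in the shell before starting the session, since the script inherits the environment (a sketch, assuming standard Hadoop behavior for HADOOP_USER_NAME):

```shell
# Export once in the current shell; yarn-session.sh (and bin/flink)
# inherit it, so HDFS access happens as the hdfs user.
export HADOOP_USER_NAME=hdfs
# then start the session as in Step 1, e.g.:
# bin/yarn-session.sh -n 3 -s 3 -jm 1024 -tm 1024 -nm pinpoint-flink-job
```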
Step 1: Start a session
- # cd <your Flink installation directory>, for example:
- # cd flink-1.8.1
- # bin/yarn-session.sh -n 3 -s 3 -jm 1024 -tm 1024 -nm pinpoint-flink-job
Example output:
- [root@sltuenym6bh flink-1.8.1]# bin/yarn-session.sh -n 3 -s 3 -jm 1024 -tm 1024 -nm pinpoint-flink-job
- SLF4J: Class path contains multiple SLF4J bindings.
- SLF4J: Found binding in [jar:file:/home/tanghb2/flink/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
- 2019-08-25 11:02:49,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
- 2019-08-25 11:02:49,564 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
- 2019-08-25 11:02:49,564 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
- 2019-08-25 11:02:49,564 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
- 2019-08-25 11:02:49,564 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
- 2019-08-25 11:02:49,564 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
- 2019-08-25 11:02:50,121 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to hdfs (auth:SIMPLE)
- 2019-08-25 11:02:50,346 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - The argument n is deprecated in will be ignored.
- 2019-08-25 11:02:50,494 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=3, slotsPerTaskManager=3}
- 2019-08-25 11:02:50,950 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- 2019-08-25 11:02:50,967 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/home/tanghb2/flink/flink-1.8.1/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
- 2019-08-25 11:02:52,178 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1565019835797_0516
- 2019-08-25 11:02:52,204 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1565019835797_0516
- 2019-08-25 11:02:52,204 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
- 2019-08-25 11:02:52,206 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
- 2019-08-25 11:02:57,250 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
- 2019-08-25 11:02:57,771 INFO org.apache.flink.runtime.rest.RestClient - Rest client endpoint started.
- Flink JobManager is now running on sltlrdwgty2.novalocal:44648 with leader id 00000000-0000-0000-0000-000000000000.
- JobManager Web Interface: http://sltlrdwgty2.novalocal:44648
Note that the output above gives the JobManager GUI address:
Flink JobManager is now running on sltlrdwgty2.novalocal:44648 with leader id 00000000-0000-0000-0000-000000000000.
JobManager Web Interface: http://sltlrdwgty2.novalocal:44648
Your exact output depends on your YARN cluster configuration. You can open this GUI address to view the Flink JobManager, which looks like the screenshot below.
Note: after deploying a long-running Flink-on-YARN instance, the Flink web UI shows 0 TaskManagers and 0 Slots. Resources are only allocated when a job is submitted and needs them.
Step 2: Submit a job to the long-running Flink-on-YARN instance
- # cd <your Flink installation directory>
- # wget -O LICENSE-2.0.txt http://www.apache.org/licenses/LICENSE-2.0.txt
- # hadoop fs -put ./LICENSE-2.0.txt hdfs://nameservice1:/billflink/
- # bin/flink run ./examples/batch/WordCount.jar --input hdfs://nameservice1:/billflink/LICENSE-2.0.txt --output hdfs://nameservice1:/billflink/wordcount_result.txt
Sample output:
- SLF4J: Class path contains multiple SLF4J bindings.
- SLF4J: Found binding in [jar:file:/home/tanghb2/flink/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
- 2019-08-25 11:22:22,965 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
- 2019-08-25 11:22:22,965 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - Found Yarn properties file under /tmp/.yarn-properties-root.
- 2019-08-25 11:22:23,385 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 9
- 2019-08-25 11:22:23,385 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - YARN properties set default parallelism to 9
- YARN properties set default parallelism to 9
- 2019-08-25 11:22:23,523 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
- 2019-08-25 11:22:23,523 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
- 2019-08-25 11:22:23,588 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'sltlrdwgty2.novalocal' and port '44648' from supplied application id 'application_1565019835797_0516'
- Starting execution of program
- Program execution finished
- Job with JobID f391357c52e0be2d518d743f858fb15e has finished.
- Job Runtime: 10516 ms
Note this line in the output: 2019-08-25 11:33:40,546 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Found application JobManager host name 'sltlrdwgty2.novalocal' and port '44648' from supplied application id 'application_1565019835797_0516'.
So when submitting the job above there is no need to specify the JobManager address; the Flink client can find it on its own (here the JobManager and the YARN ApplicationMaster run on the same host).
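This works because yarn-session.sh writes a properties file under /tmp, named after the current user (hence /tmp/.yarn-properties-root in the log above), recording the session's YARN application id; later flink run invocations read it. Assuming the file uses the applicationID key as in the session output above, a hypothetical helper could pull the id out:

```shell
# Hypothetical helper: read the YARN application id of the running
# session from the properties file yarn-session.sh left under /tmp.
session_app_id() {
  local props="/tmp/.yarn-properties-$(whoami)"
  grep '^applicationID=' "$props" | cut -d= -f2
}
# usage: session_app_id
```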
Look at the Flink JobManager GUI, shown below: you can see that a job has been submitted and completed.
- # hadoop fs -text hdfs://nameservice1:/billflink/wordcount_result.txt
- 0 3
- 1 2
- 2 4
- 2004 1
- 3 1
- 4 1
- 5 1
You can see the word-count results.
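Since the output is plain "word count" lines, ordinary shell tools can post-process it. For example, a small helper (hypothetical, reading from stdin) to show the N most frequent words, fed by the same hadoop fs -text command as above:

```shell
# Sort "word count" lines by the count field, descending,
# and keep the top N (default 10).
top_words() {
  sort -k2,2nr | head -n "${1:-10}"
}
# usage:
# hadoop fs -text hdfs://nameservice1:/billflink/wordcount_result.txt | top_words 10
```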
Now delete the previous results and submit the job again:
- # hadoop fs -rm hdfs://nameservice1:/billflink/wordcount_result.txt
- # bin/flink run ./examples/batch/WordCount.jar --input hdfs://nameservice1:/billflink/LICENSE-2.0.txt --output hdfs://nameservice1:/billflink/wordcount_result.txt
At this point the Flink JobManager GUI, shown below, lists two completed jobs.
That concludes the demo; for more information see the articles below.
A reasonably good write-up:
3) Highly available (HA) cluster deployment of Apache Flink on YARN
Please note: the client must set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable, through which it reads the YARN and HDFS configuration; otherwise startup fails. In my tests, having HADOOP_HOME set also works: any one of HADOOP_HOME, YARN_CONF_DIR, or HADOOP_CONF_DIR is sufficient.
Of course, Flink can also run on Kubernetes (k8s), Docker, AWS, Aliyun, and more; see the official documentation for details.
For a systematic study, the articles below are worth bookmarking:
Flink study notes 10: window & ProcessingTime examples
Flink study notes 12: watermarks in eventTime
Official material:
How Flink runs:
https://www.jianshu.com/p/c47e8f438291
C: Flink 1.8 pitfall notes
1. Pitfalls of Flink 1.8 on a Hadoop YARN cluster