当前位置:   article > 正文

Flink-1.17.0(Standalone)集群安装-大数据学习系列(四)_flink 1.17.0

flink 1.17.0

前置:集群规划

机器信息

Hostname

k8s-master

k8s-node1

k8s-node2

外网IP

106.15.186.55

139.196.15.28

47.101.63.122

内网IP

172.25.73.65

172.25.73.64

172.25.73.66

master

slave1

slave2

slave3

step1 安装前准备

  1. 安装Scala

从官网(The Scala Programming Language)下载 Scala版本

链接: https://pan.baidu.com/s/1-GAeyyDOPjhsWhIp_VV7yg?pwd=3fws 提取码: 3fws 

2.1 在集群(各机器上都执行!!!)

  1. #创建安装目录
  2. mkdir -p /home/install/scala
  3. mkdir -p /home/module/scala
  4. #最终安装目录为/home/module/scala/scala-2.12.17/
  5. #向 /etc/profile 文件追加如下内容
  6. echo "export SCALA_HOME=/home/module/scala/scala-2.12.17" >> /etc/profile
  7. echo "export PATH=:\$PATH:\${SCALA_HOME}/bin:\${SCALA_HOME}/sbin" >> /etc/profile
  8. #使得配置文件生效
  9. source /etc/profile

2.2  切换到k8s-node1机器上操作(分发环境)

  1. cd /home/install/scala
  2. #上传 scala-2.12.17.tgz
  3. #解压压缩包到 安装目录
  4. tar -xvf /home/install/scala/scala-2.12.17.tgz -C   /home/module/scala/
  5. #测试是否安装成功
  6. scala -version
  7. #最终安装目录为/home/module/scala/scala-2.12.17/ 分发到各机器目录
  8. #复制到k8s-node1
  9. scp -r /home/module/scala/ root@k8s-node1:/home/module/scala/
  10. #复制到k8s-node2
  11. scp -r /home/module/scala/ root@k8s-node2:/home/module/scala/

2.3  切换到k8s-node1、k8s-node2 验证是否安装成功

  1. #测试是否安装成功
  2. scala -version

step2 安装Flink环境

1.下载Flink安装包

可以去官网下载 Apache Flink® — Stateful Computations over Data Streams | Apache Flink

flink-1.17.0-bin-scala_2.12.tgz 、  flink-shaded-hadoop-2-uber-2.8.3-10.0.jar

链接: https://pan.baidu.com/s/1X_P-Q8O_eLADmEOJ438u5Q?pwd=ugwu 提取码: ugwu 

  1. 创建Flink安装目录并解压

2.1 切换到k8s-master执行

  1. #创建安装目录
  2. mkdir -p /home/install/flink
  3. mkdir -p /home/module/flink
  4. #上传  flink-1.17.0-bin-scala_2.12.tgz  
  5. #上传  flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
  6. #进入安装目录
  7. cd /home/install/flink
  8. #解压压缩包  最终的安装目录为 /home/module/flink/flink-1.17.0
  9. tar -zxvf flink-1.17.0-bin-scala_2.12.tgz -C /home/module/flink
  10. #copy flink-shaded-hadoop-2-uber-2.8.3-10.0.jar 到安装目录lib中 如果不做这步 与hadoop有关的操作将会错误
  11. cp /home/install/flink/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar /home/module/flink/flink-1.17.0/lib

2.2 切换到k8s-node1执行

  1. #创建安装目录
  2. mkdir -p /home/install/flink
  3. mkdir -p /home/module/flink

2.3 切换到k8s-node2执行

  1. #创建安装目录
  2. mkdir -p /home/install/flink
  3. mkdir -p /home/module/flink

  1. 修改配置文件

切换到k8s-master执行

3.1 flink-conf.yaml

  1. #进入flink配置文件目录
  2. cd /home/module/flink/flink-1.17.0
  3. #给模版文件做个备份
  4. mv flink-conf.yaml flink-conf.yaml.bak

  1. cat > flink-conf.yaml << EOF 
  2. #指定集群主节点 可用机器名或者IP地址
  3. jobmanager.rpc.address: k8s-master 
  4. #JobManager的RPC访问端口,默认为6123
  5. jobmanager.rpc.port: 6123
  6. #JobManager JVM的堆内存大小,默认1024MB
  7. jobmanager.heap.size2048m
  8. #TaskManager JVM的堆内存大小,默认1024MB
  9. taskmanager.heap.size4096m
  10. #每个TaskManager提供的Task Slot数量(默认为1),Task Slot数量代表TaskManager的最大并行度,建议设置成cpu的核心数
  11. taskmanager.numberOfTaskSlots: 2
  12. #默认是false。指定Flink当启动时,是否一次性分配所有管理的内存
  13. taskmanager.memory.preallocate: false
  14. #系统级别的默认并行度(默认为1
  15. parallelism.default1
  16. #jobmanager端口 此处要注意端口冲突 netstat -anp |grep 端口号检查
  17. jobmanager.web.port: 8081
  18. #配置每个taskmanager 生成的临时文件夹
  19. taskmanager.tmp.dirs: /home/module/flink/tmp
  20. #页面提交
  21. web.submit.enable: true
  22. EOF

3.2 masters

  1. #进入flink的配置文件
  2. cd /home/module/flink/flink-1.17.0/conf
  3. #创建 master 文件
  4. cat > masters << EOF 
  5. k8s-master:8081 
  6. EOF

3.3 workers

workers文件必须包含所有需要启动的TaskManager节点的主机名,且每个主机名占一行

  1. #进入flink的配置文件
  2. cd /home/module/flink/flink-1.17.0/conf
  3. #创建 workers 文件
  4. cat > workers << EOF 
  5. k8s-master
  6. k8s-node1
  7. k8s-node2
  8. EOF

  1. 分发文件

切换到k8s-master执行

  1. #复制到k8s-node1
  2. scp -r /home/module/flink/flink-1.17.0 root@k8s-node1:/home/module/flink/flink-1.17.0
  3. #复制到k8s-node2
  4. scp -r /home/module/flink/flink-1.17.0 root@k8s-node2:/home/module/flink/flink-1.17.0

  1. 启动flink集群验证

  1. #启动集群
  2. /home/module/flink/flink-1.17.0/bin/start-cluster.sh
  3. #关闭集群
  4. #/home/module/flink/flink-1.17.0/bin/stop-cluster.sh
  5. #查看进程
  6. Jps -m

step3  Flink UI 环境验证

step4  Flink 任务执行验证

4.1 向hdfs上传文件

  1. #创建用于test的文件夹 并进入
  2. mkdir -p /home/test/flink
  3. cd /home/test/flink
  4. #创建计数用的文本
  5. cat > wordcount.txt << EOF 
  6. Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream.
  7. Data can be processed as unbounded or bounded streams.
  8. Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which events occurred, to be able to reason about result completeness.
  9. Bounded streams have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations. Ordered ingestion is not required to process bounded streams because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing.
  10. EOF
  11. #在hdfs上创建测试目录
  12. hadoop fs -mkdir -p /mytest/input
  13. hadoop fs -put /home/test/flink/wordcount.txt /mytest/input

http://106.15.186.55:9870/

可以看到wordcount.txt 已经在HDFS上了

  1. #进入flink的执行目录
  2. cd /home/module/flink/flink-1.17.0/bin
  3. #执行测试任务
  4. ./flink run /home/module/flink/flink-1.17.0/examples/batch/WordCount.jar  --input hdfs://k8s-master:8020/mytest/input/wordcount.txt  --output hdfs://k8s-master:8020/mytest/output
  5. #获取结果
  6. cd /home/test/flink
  7. hadoop fs -get hdfs://k8s-master:8020/mytest/output
  8. cat output

至此测试成功

错误解决:

1.内存分配过小导致的错误

jobmanager.heap.size 建议大于2G

taskmanager.heap.size 建议大于4G

否则内存过小导致启动报错:

  1. INFO  [] - 'taskmanager.memory.flink.size' is not specified, use the configured deprecated task manager heap value (1024 bytes) for it.
  2. INFO  [] - The derived from fraction network memory (102 bytes) is less than its min value 64.000mb (67108864 bytes), min value will be used instead
  3. Exception in thread "main" org.apache.flink.configuration.IllegalConfigurationException: TaskManager memory configuration failed: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (409 bytes) and Network Memory (64.000mb (67108864 bytes)) exceed configured Total Flink Memory (1024 bytes).
  4.         at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:166)
  5.         at org.apache.flink.runtime.util.bash.BashJavaUtils.getTmResourceParams(BashJavaUtils.java:85)
  6.         at org.apache.flink.runtime.util.bash.BashJavaUtils.runCommand(BashJavaUtils.java:67)
  7.         at org.apache.flink.runtime.util.bash.BashJavaUtils.main(BashJavaUtils.java:56)
  8. Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (409 bytes) and Network Memory (64.000mb (67108864 bytes)) exceed configured Total Flink Memory (1024 bytes).
  9.         at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:178)
  10.         at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:42)
  11.         at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalFlinkMemory(ProcessMemoryUtils.java:103)
  12.         at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:80)
  13.         at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:163)
  14.         ... 3 more

2.未放入对应hadoop插件导致的错误

  1. [root@k8s-master bin]# ./flink run /home/module/flink/flink-1.17.0/examples/batch/WordCount.jar  --input hdfs://k8s-master:8020/mytest/input/wordCount.txt  --output hdfs://k8s-master:8020/mytest/output
  2. ------------------------------------------------------------
  3.  The program finished with the following exception:
  4. org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
  5.         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
  6.         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
  7.         at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
  8.         at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
  9.         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
  10.         at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
  11.         at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
  12.         at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
  13.         at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
  14.         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
  15. Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
  16.         at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
  17.         at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061)
  18.         at org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
  19.         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
  20.         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:93)
  21.         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  22.         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  23.         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  24.         at java.lang.reflect.Method.invoke(Method.java:498)
  25.         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
  26.         ... 9 more
  27. Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
  28.         at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
  29.         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
  30.         at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
  31.         ... 17 more
  32. Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
  33.         at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
  34.         at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
  35.         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
  36.         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
  37.         at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
  38.         at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
  39.         at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
  40.         at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
  41.         at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
  42. Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
  43.         at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97)
  44.         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
  45.         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
  46.         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
  47.         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
  48.         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  49.         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  50.         at java.lang.Thread.run(Thread.java:750)
  51. Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://k8s-master:8020/mytest/output, delimiter:  ))': Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
  52.         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
  53.         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
  54.         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
  55.         ... 3 more
  56. Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://k8s-master:8020/mytest/output, delimiter:  ))': Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
  57.         at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
  58.         at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:114)
  59.         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
  60.         ... 3 more
  61. Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://k8s-master:8020/mytest/output, delimiter:  ))': Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
  62.         at org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder.buildGraph(DefaultExecutionGraphBuilder.java:189)
  63.         at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:163)
  64.         at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:365)
  65.         at org.apache.flink.runtime.scheduler.SchedulerBase.(SchedulerBase.java:210)
  66.         at org.apache.flink.runtime.scheduler.DefaultScheduler.(DefaultScheduler.java:136)
  67.         at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:152)
  68.         at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:119)
  69.         at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:371)
  70.         at org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:348)
  71.         at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:123)
  72.         at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95)
  73.         at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
  74.         ... 4 more
  75. Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
  76.         at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:543)
  77.         at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
  78.         at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
  79.         at org.apache.flink.api.common.io.FileOutputFormat.initializeGlobal(FileOutputFormat.java:288)
  80.         at org.apache.flink.runtime.jobgraph.InputOutputFormatVertex.initializeOnMaster(InputOutputFormatVertex.java:113)
  81.         at org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder.buildGraph(DefaultExecutionGraphBuilder.java:180)
  82.         ... 15 more
  83. Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
  84.         at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:55)
  85.         at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:526)
  86.         ... 20 more

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/AllinToyou/article/detail/548789
推荐阅读
相关标签
  

闽ICP备14008679号