
Flume Example: Reading Directory Files into HDFS in Real Time

[Atguigu] Big Data Technology: Flume Tutorial from Beginner to Practice (bilibili)

Contents

Flume overview

Flume examples

1. Official example: monitoring port data

2. Example: reading directory files into HDFS in real time


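Flume overview

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive amounts of log data. It was originally developed by Cloudera and is now an Apache project. Flume is built on a streaming architecture and is flexible and simple to use.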

Flume examples

1. Official example: monitoring port data

Start agent a1 in the foreground with its job file, logging at INFO level to the console:

    [atguigu@node001 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-telnet-logger.conf -Dflume.root.logger=INFO,console
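The job file passed via --conf-file is not reproduced in this article. As a minimal sketch, the script below writes what such a file might contain, modeled on the single-node netcat-to-logger example in the Flume user guide. The agent name a1 and port 44444 come from the commands in this section; the component names r1, k1, c1 and the channel sizes are assumptions, and CONF_FILE is a hypothetical variable used only to choose where the file is written.

```shell
#!/usr/bin/env bash
# Sketch only: writes an ASSUMED flume-telnet-logger.conf, since the
# original file is not shown in the article.
# Set CONF_FILE=job/flume-telnet-logger.conf to write it into the job directory.
CONF_FILE="${CONF_FILE:-$(mktemp -d)/flume-telnet-logger.conf}"

cat > "$CONF_FILE" <<'EOF'
# Name the components of agent a1 (r1, k1, c1 are assumed names)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat source: listens on a TCP port and turns each line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# logger sink: writes each event to the agent's log (the console here)
a1.sinks.k1.type = logger

# In-memory channel between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

echo "Wrote $CONF_FILE"
```

With a file along these lines in place, the --name value on the command line has to match the a1 prefix used in the properties, otherwise the agent starts without any components.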

    # Install xinetd, telnet, and telnet-server from local RPM packages
    [atguigu@node001 ~]$ cd /opt/software/telnet
    [atguigu@node001 telnet]$ ll
    total 224
    -rw-rw-r-- 1 atguigu atguigu 59332 Apr 10 14:53 telnet-0.17-48.el6.x86_64.rpm
    -rw-rw-r-- 1 atguigu atguigu 37912 Apr 10 14:53 telnet-server-0.17-48.el6.x86_64.rpm
    -rw-rw-r-- 1 atguigu atguigu 124812 Apr 10 14:53 xinetd-2.3.14-40.el6.x86_64.rpm
    [atguigu@node001 telnet]$ sudo rpm -ivh xinetd-2.3.14-40.el6.x86_64.rpm
    warning: xinetd-2.3.14-40.el6.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    Preparing... ################################# [100%]
    Updating / installing...
    1:xinetd-2:2.3.14-40.el6 ################################# [100%]
    [atguigu@node001 telnet]$ sudo rpm -ivh telnet-0.17-48.el6.x86_64.rpm
    warning: telnet-0.17-48.el6.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    Preparing... ################################# [100%]
    Updating / installing...
    1:telnet-1:0.17-48.el6 ################################# [100%]
    [atguigu@node001 telnet]$ sudo rpm -ivh telnet-server-0.17-48.el6.x86_64.rpm
    warning: telnet-server-0.17-48.el6.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID c105b9de: NOKEY
    Preparing... ################################# [100%]
    Updating / installing...
    1:telnet-server-1:0.17-48.el6 ################################# [100%]
    # Confirm that the Flume agent is listening on port 44444
    [atguigu@node001 telnet]$ sudo netstat -tunlp | grep 44444
    tcp6 0 0 127.0.0.1:44444 :::* LISTEN 3139/java
    # Connect and send a few lines; each accepted line is acknowledged with OK
    [atguigu@node001 telnet]$ telnet localhost 44444
    Trying ::1...
    telnet: connect to address ::1: Connection refused
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    hello
    OK
    ‘’^Hshidhsidaskdhkasjhdkjshalkdhksjhasjhdjkasd
    OK
    ''
    OK
    Hello, I am xxx, and I am xxx years old.
    OK

2. Example: reading directory files into HDFS in real time

flume-dir-hdfs.conf

    a3.sources = r3
    a3.sinks = k3
    a3.channels = c3
    # Describe/configure the source
    a3.sources.r3.type = spooldir
    a3.sources.r3.spoolDir = /opt/module/flume/flume-1.7.0/uploads
    a3.sources.r3.fileSuffix = .COMPLETED
    a3.sources.r3.fileHeader = true
    # Ignore (do not upload) any file whose name ends in .tmp
    a3.sources.r3.ignorePattern = ([^ ]*\.tmp)
    # Describe the sink
    a3.sinks.k3.type = hdfs
    a3.sinks.k3.hdfs.path = hdfs://node001:8020/flume/upload/%Y%m%d/%H
    # Prefix for the names of uploaded files
    a3.sinks.k3.hdfs.filePrefix = upload-
    # Whether to round the timestamp down, which rolls directories by time
    a3.sinks.k3.hdfs.round = true
    # Number of time units after which a new directory is created
    a3.sinks.k3.hdfs.roundValue = 1
    # Time unit used for rounding
    a3.sinks.k3.hdfs.roundUnit = hour
    # Use the local timestamp rather than one taken from the event header
    a3.sinks.k3.hdfs.useLocalTimeStamp = true
    # Number of events to accumulate before each flush to HDFS
    a3.sinks.k3.hdfs.batchSize = 100
    # File type; DataStream writes uncompressed output, CompressedStream enables compression
    a3.sinks.k3.hdfs.fileType = DataStream
    # Roll to a new file after this many seconds
    a3.sinks.k3.hdfs.rollInterval = 600
    # Roll when a file reaches roughly 128 MB (value is in bytes)
    a3.sinks.k3.hdfs.rollSize = 134217700
    # 0 means rolling never depends on the number of events
    a3.sinks.k3.hdfs.rollCount = 0
    # Minimum number of block replicas
    a3.sinks.k3.hdfs.minBlockReplicas = 1
    # Use a channel which buffers events in memory
    a3.channels.c3.type = memory
    a3.channels.c3.capacity = 1000
    a3.channels.c3.transactionCapacity = 100
    # Bind the source and sink to the channel
    a3.sources.r3.channels = c3
    a3.sinks.k3.channel = c3
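To make the time-based settings above more concrete, here is a small sketch that previews which HDFS directory a given moment would map to, and how far the configured rollSize sits below 128 MiB. It relies on the fact that the %Y, %m, %d, and %H escapes in hdfs.path have the same meaning as in strftime; with roundValue = 1 and roundUnit = hour, rounding down to the hour does not change a path whose finest component is %H. The function name resolve_hdfs_path is hypothetical, and the script assumes GNU date (for the -d @EPOCH form). It only formats strings and does not contact HDFS.

```shell
#!/usr/bin/env bash
# Sketch only: previews the directory that hdfs.path would expand to.
# Assumes GNU date; resolve_hdfs_path is a hypothetical helper name.

HDFS_PATH_TEMPLATE='hdfs://node001:8020/flume/upload/%Y%m%d/%H'

# Expand the strftime-style escapes for a Unix timestamp given in seconds.
# The result depends on the local time zone, like useLocalTimeStamp = true.
resolve_hdfs_path() {
  local epoch_seconds="$1"
  date -d "@${epoch_seconds}" "+${HDFS_PATH_TEMPLATE}"
}

# Directory that events arriving right now would be written to
resolve_hdfs_path "$(date +%s)"

# Compare the configured rollSize with an exact 128 MiB
ROLL_SIZE_BYTES=134217700
BYTES_IN_128_MIB=$((128 * 1024 * 1024))
BYTES_BELOW_128_MIB=$((BYTES_IN_128_MIB - ROLL_SIZE_BYTES))
echo "rollSize is ${BYTES_BELOW_128_MIB} bytes below 128 MiB"
```

The configured value is 28 bytes short of 128 MiB (134217728 bytes), presumably so that a rolled file stays just within a single 128 MB HDFS block.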
Start agent a3 with this configuration:

    [atguigu@node001 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/enterpriseDevelopmentCases/flume-dir-hdfs.conf
    Info: Sourcing environment configuration script /opt/module/flume/flume-1.7.0/conf/flume-env.sh
    Info: Including Hadoop libraries found via (/opt/module/hadoop/hadoop-3.1.3/bin/hadoop) for HDFS access
    Info: Including HBASE libraries found via (/opt/module/hbase/hbase-2.0.5/bin/hbase) for HBASE access
In another terminal, create files in the monitored directory:

    [atguigu@node001 hive-3.1.2]$ cd /opt/module/flume/flume-1.7.0/uploads/
    [atguigu@node001 uploads]$ ll
    total 0
    [atguigu@node001 uploads]$ touch 1.txt
    [atguigu@node001 uploads]$ vim 2.txt
    # 2.txt is already gone: Flume ingested it and renamed it with the .COMPLETED suffix
    [atguigu@node001 uploads]$ cat 2.txt
    cat: 2.txt: No such file or directory
    [atguigu@node001 uploads]$ ll
    total 4
    -rw-rw-r-- 1 atguigu atguigu 0 Apr 10 15:58 1.txt.COMPLETED
    -rw-rw-r-- 1 atguigu atguigu 22 Apr 10 16:00 2.txt.COMPLETED
    [atguigu@node001 uploads]$
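The spooling directory source expects a file to be complete and unchanged once it appears in the monitored directory, so creating or editing files in place (as with touch and vim here) is only suitable for a quick demonstration. One common pattern, sketched below under assumptions, is to copy the file in under a .tmp name, which the ignorePattern in flume-dir-hdfs.conf skips, and then rename it once the copy has finished. The helper name spool_put is hypothetical and not part of Flume.

```shell
#!/usr/bin/env bash
# Sketch only: spool_put is a hypothetical helper, not a Flume command.
# It relies on the ignorePattern from flume-dir-hdfs.conf skipping *.tmp files.

# Usage: spool_put SOURCE_FILE SPOOL_DIR
spool_put() {
  local source_file="$1"
  local spool_dir="$2"
  local file_name
  file_name="$(basename "$source_file")"

  # Stage the copy under a .tmp name so the source ignores it while it is written
  cp "$source_file" "${spool_dir}/${file_name}.tmp" || return 1

  # Renaming within the same directory is atomic on POSIX file systems,
  # so the source only ever sees the finished file
  mv "${spool_dir}/${file_name}.tmp" "${spool_dir}/${file_name}"
}
```

For example, `spool_put /tmp/2.txt /opt/module/flume/flume-1.7.0/uploads` would stage /tmp/2.txt (a path chosen only for illustration) and make it visible to the agent in a single rename.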
