
Spark Project: A Real-Time Analysis System for Simulated Website User Behavior (Part 2)


1) Install HBase

https://blog.csdn.net/hailunw/article/details/119057361

2) Create the tables in HBase

    [user@NewBieSlave1 hbase-2.3.5]$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/user/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/user/hbase-2.3.5/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
    Version 2.3.5, rfd3fdc08d1cd43eb3432a1a70d31c3aece6ecabe, Thu Mar 25 20:50:15 UTC 2021
    Took 0.0014 seconds
    hbase(main):001:0> create 'course_clickcount','info'
    Created table course_clickcount
    Took 1.1887 seconds
    => Hbase::Table - course_clickcount
    hbase(main):002:0> create 'course_search_clickcount','info'
    Created table course_search_clickcount
    Took 0.6424 seconds
    => Hbase::Table - course_search_clickcount
    hbase(main):003:0> list
    TABLE
    category_clickcount
    course_clickcount
    course_search_clickcount
    helloWorld
    4 row(s)
    Took 0.0186 seconds
    => ["category_clickcount", "course_clickcount", "course_search_clickcount", "helloWorld"]
    hbase(main):004:0> describe 'course_clickcount'
    Table course_clickcount is ENABLED
    course_clickcount
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'info', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s)
    Quota is disabled
    Took 0.1509 seconds
    hbase(main):005:0> scan 'course_clickcount'
    ROW COLUMN+CELL
    0 row(s)
    Took 0.0960 seconds

3) Create the entity classes ClickLog, CourseClickCount, and CourseSearchClickCount
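
The class definitions are not shown here, so the following is only a minimal sketch of what the three entities might look like. All field names and the row-key layouts in the comments are assumptions made for illustration, not the author's definitions.

```scala
// Hypothetical field names: the post does not list them.

// One cleaned click record taken from the access log.
case class ClickLog(ip: String, time: String, courseId: Int, statusCode: Int, referer: String)

// A row key (assumed layout: "<yyyyMMdd>_<courseId>") and its click count.
case class CourseClickCount(dayCourse: String, clickCount: Long)

// A row key (assumed layout: "<yyyyMMdd>_<searchHost>_<courseId>") and its click count.
case class CourseSearchClickCount(daySearchCourse: String, clickCount: Long)
```

Case classes fit here because the entities are plain immutable value holders; `copy`, `equals`, and `toString` come for free.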

 

4) Create a date-format conversion utility class (implemented in Scala)
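
The utility itself is not shown, so here is a minimal sketch of one way to write it with `java.time`. The object name `DateUtils`, the method name `convert`, and both format patterns are assumptions; adjust them to the timestamp format your log generator actually emits.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object DateUtils {
  // Assumed format of the timestamp in the raw log line.
  private val InputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  // Assumed compact target format, whose first 8 characters give the day.
  private val OutputFormat = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")

  // Converts "2021-07-25 10:20:30" style timestamps to "20210725102030".
  // Throws DateTimeParseException if the input does not match InputFormat.
  def convert(time: String): String =
    LocalDateTime.parse(time, InputFormat).format(OutputFormat)
}
```

`DateTimeFormatter` instances are immutable and thread-safe, so sharing them in an `object` is safe when the code runs inside Spark executor threads.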

5) Create the HBase DAO classes CourseClickCountDAO and CourseSearchClickCountDAO
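
The DAO code is not shown, so the sketch below only illustrates the shape such a class might take. To stay runnable without a cluster, it hides storage behind a hypothetical `CounterStore` trait with an in-memory implementation. In a real deployment that trait could be backed by the HBase client's `Table.incrementColumnValue` against the `info` column family of `course_clickcount`. The trait, the method names `save` and `count`, and the entity fields are all assumptions.

```scala
import scala.collection.mutable

// Hypothetical entity: a row key and the number of clicks to add for it.
case class CourseClickCount(dayCourse: String, clickCount: Long)

// Hypothetical storage abstraction; an HBase-backed version could delegate
// increment() to Table.incrementColumnValue.
trait CounterStore {
  def increment(rowKey: String, amount: Long): Long
  def get(rowKey: String): Long
}

// In-memory stand-in so the sketch runs without an HBase cluster.
class InMemoryCounterStore extends CounterStore {
  private val counts = mutable.Map.empty[String, Long]

  def increment(rowKey: String, amount: Long): Long = {
    val updated = counts.getOrElse(rowKey, 0L) + amount
    counts(rowKey) = updated
    updated
  }

  // Unknown row keys read as zero clicks.
  def get(rowKey: String): Long = counts.getOrElse(rowKey, 0L)
}

class CourseClickCountDAO(store: CounterStore) {
  // Adds each batch count to the running total stored under its row key.
  def save(batch: Seq[CourseClickCount]): Unit =
    batch.foreach(c => store.increment(c.dayCourse, c.clickCount))

  // Returns the accumulated click count for one row key.
  def count(dayCourse: String): Long = store.get(dayCourse)
}
```

Incrementing rather than overwriting matters because each streaming batch only sees its own slice of the clicks. A `CourseSearchClickCountDAO` would follow the same pattern against `course_search_clickcount`.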

6) Modify the Spark Streaming class that reads from the Kafka cluster to add data-cleaning logic
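
The cleaning step can be sketched as a pure function applied to every record of the stream, for example through `flatMap`, so malformed lines are dropped. The log layout (five tab-separated fields with a request like `GET /class/128.html HTTP/1.1`), the `/class/<id>.html` URL pattern, and the names `LogCleaner` and `parse` are assumptions for illustration; the real format depends on the log generator.

```scala
import scala.util.Try

// Hypothetical entity for one cleaned click record.
case class ClickLog(ip: String, time: String, courseId: Int, statusCode: Int, referer: String)

object LogCleaner {
  // Assumed URL pattern for course pages; other URLs are discarded.
  private val CoursePath = """/class/(\d+)\.html""".r

  // Assumed layout: ip \t time \t request \t statusCode \t referer
  // Returns None for lines that are malformed or not course-page hits.
  def parse(line: String): Option[ClickLog] =
    line.split("\t", -1) match { // -1 keeps trailing empty fields
      case Array(ip, time, request, status, referer) =>
        for {
          m        <- CoursePath.findFirstMatchIn(request)
          courseId <- Try(m.group(1).toInt).toOption
          code     <- Try(status.trim.toInt).toOption
        } yield ClickLog(ip, time, courseId, code, referer)
      case _ => None
    }
}
```

Returning `Option` instead of throwing keeps a single bad record from failing the whole batch.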

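7) Modify the Spark Streaming class that reads from the Kafka cluster to add data cleaning and the logic that writes the computed statistics to the database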
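
The counting step can be illustrated on a plain Scala collection, which keeps the sketch runnable without Spark. It assumes cleaned records arrive as `(time, courseId)` pairs with `time` in `yyyyMMddHHmmss` form, and that the row key is `<yyyyMMdd>_<courseId>`. Both the key layout and the name `ClickStats` are assumptions. In the streaming job, the same aggregation would typically be a `map` to `(key, 1L)` followed by `reduceByKey(_ + _)`, with the per-batch results written inside `foreachRDD`.

```scala
object ClickStats {
  // Input: (time as yyyyMMddHHmmss, courseId) pairs from one batch.
  // Output: number of clicks per row key "<yyyyMMdd>_<courseId>".
  def countByDayAndCourse(clicks: Seq[(String, Int)]): Map[String, Long] =
    clicks
      .map { case (time, courseId) => s"${time.take(8)}_$courseId" }
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size.toLong }
}
```

Because each batch yields only partial counts, the write to HBase should add to the stored value rather than replace it.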

 
