1. User ID (user_id)
2. Time (act_time)
3. Action (action; one of: click: click, favorite a job: job_collect, send resume: cv_send, upload resume: cv_upload)
4. Code of the target company (job_code)
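As a rough illustration of the four fields above, one log record could be assembled as follows. The helper name `buildLogRecord` and its validation step are assumptions made for this sketch; only the field names and the four action values come from the list.

```javascript
// Hypothetical helper: the field names and action values come from the list
// above, but the function and its validation are illustrative assumptions.
const ALLOWED_ACTIONS = ["click", "job_collect", "cv_send", "cv_upload"];

function buildLogRecord(userId, actTime, action, jobCode) {
  // Reject anything that is not one of the four documented actions.
  if (!ALLOWED_ACTIONS.includes(action)) {
    throw new Error("unknown action: " + action);
  }
  return {
    user_id: userId,
    act_time: actTime,
    action: action,
    job_code: jobCode,
  };
}

// Example record using the placeholder values from this post.
const sampleRecord = buildLogRecord("rpp", "2020-11-9 13:52:27", "click", "job_test");
console.log(JSON.stringify(sampleRecord));
```

Validating the action on the client is optional; it simply keeps obviously malformed events out of the topic.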
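1. The HTML can be thought of as Lagou's job browsing page.
2. User actions are handled by the Web server.
3. At the same time, the page uses ajax to send each action to Nginx, which collects the user's clickstream data.
4. Nginx uses ngx_kafka_module to send the collected log data to a topic in the Kafka cluster.
5. Once the data is stored in the Kafka topic, big data components can use it for real-time computation or other processing, such as job recommendation and statistical reports.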
HTML+Nginx+ngx_kafka_module+Kafka
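ngx_kafka_module repository: https://github.com/brg-liuwei/ngx_kafka_module
Note: ngx_kafka_module only accepts POST requests. In addition, the Web server and the data-collecting Nginx are usually not on the same domain, so the ajax requests are cross-origin. This can be solved by configuring cross-origin (CORS) headers in nginx.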
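To make the POST and cross-origin constraints concrete, here is a minimal sketch of how such a request could be described for the standard `fetch` API. The helper name `buildCollectRequest` is hypothetical, and the endpoint is the collector address used by the test page in this post; the page itself uses jQuery rather than `fetch`.

```javascript
// Sketch only: `buildCollectRequest` is a hypothetical helper. It returns the
// URL and options that could be passed to fetch() in a browser.
function buildCollectRequest(endpoint, payload) {
  return {
    url: endpoint,
    options: {
      method: "POST",          // ngx_kafka_module only accepts POST
      mode: "cors",            // the collector is on a different origin
      credentials: "include",  // same intent as xhrFields.withCredentials in jQuery
      body: JSON.stringify(payload),
    },
  };
}

const collectRequest = buildCollectRequest("http://47.94.80.41:9090/kafka/log", {
  user_id: "rpp",
  act_time: "2020-11-9 13:52:27",
  action: "click",
  job_code: "job_test",
});

// In a browser this could then be sent with:
// fetch(collectRequest.url, collectRequest.options);
console.log(collectRequest.options.method, collectRequest.url);
```

Because credentials are included, the browser requires the response to carry `Access-Control-Allow-Credentials: true` and a concrete origin in `Access-Control-Allow-Origin` (a wildcard `*` is rejected in that case).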
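Set up the Kafka cluster
Write the HTML (with 4 buttons)
Configure Nginx (basic configuration and ngx_kafka_module)
Configure Kafka and create the topic
Listen on the topic and view the messages
Reference: https://segmentfault.com/a/1190000023379555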
<!DOCTYPE html>
<html lang="en">
<head>
    <title>kafka_test</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1,shrink-to-fit=no">
    <script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.5.1/jquery.js"></script>
    <script>
        // Build a "YYYY-M-D H:m:s" timestamp from the local time (values are not zero-padded)
        function current() {
            var d = new Date(), str = '';
            str += d.getFullYear() + '-';
            str += d.getMonth() + 1 + '-';
            str += d.getDate() + ' ';
            str += d.getHours() + ':';
            str += d.getMinutes() + ':';
            str += d.getSeconds();
            return str;
        }

        // Send one user action to the Nginx collector as a JSON string
        function operate(action) {
            var json = {
                'user_id': 'rpp',
                'act_time': current().toString(),
                'action': action,
                'job_code': 'job_test'
            };
            $.ajax({
                type: "POST",
                url: "http://47.94.80.41:9090/kafka/log",
                dataType: "json",
                crossDomain: true,
                data: JSON.stringify(json),
                // allow cookies to be sent with the cross-origin request
                xhrFields: {
                    withCredentials: true
                },
                success: function (data) {
                    alert("success")
                },
                error: function (data) {
                    alert("error")
                }
            })
        }
    </script>
</head>
<body>
<div class="row" style="text-align: center">
    <div>
        <button type="button" id="click" onclick="operate('click')">Click</button>
    </div>
    <div>
        <button type="button" id="job_collect" onclick="operate('job_collect')">Favorite job</button>
    </div>
    <div>
        <button type="button" id="resume_upload" onclick="operate('cv_upload')">Upload resume</button>
    </div>
    <div>
        <button type="button" id="resume_send" onclick="operate('cv_send')">Send resume</button>
    </div>
</div>
</body>
</html>
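The `current()` function in the page builds timestamps without zero padding, so it produces values such as `2020-11-9 14:2:48`. If downstream jobs need fixed-width timestamps that are easier to parse, a padded variant could look like the sketch below. The name `formatTimestamp` is an assumption for illustration and is not part of the original page.

```javascript
// Hypothetical alternative to current(): same local-time fields,
// but every component after the year is padded to two digits.
function formatTimestamp(d) {
  const pad = (n) => String(n).padStart(2, "0");
  const datePart = [d.getFullYear(), pad(d.getMonth() + 1), pad(d.getDate())].join("-");
  const timePart = [pad(d.getHours()), pad(d.getMinutes()), pad(d.getSeconds())].join(":");
  return datePart + " " + timePart;
}

// Months are zero-based in the Date constructor, so 10 means November.
console.log(formatTimestamp(new Date(2020, 10, 9, 14, 2, 48))); // 2020-11-09 14:02:48
```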
# 1. Install git
$ yum install -y git
# 2. Install the required dependencies
$ yum install -y gcc gcc-c++ zlib zlib-devel openssl openssl-devel pcre pcre-devel
# 3. Download the Kafka client source (librdkafka)
$ cd /root/software
$ git clone https://github.com/edenhill/librdkafka
# 4. Build and install librdkafka
$ cd /root/software/librdkafka
$ ./configure
$ make && make install
# 5. Download nginx
$ cd /root/software
$ wget http://nginx.org/download/nginx-1.18.0.tar.gz
# 6. Extract it
$ tar -zxf nginx-1.18.0.tar.gz
# 7. Download the module source
$ cd /root/software
$ git clone https://github.com/brg-liuwei/ngx_kafka_module
# 8. Build and install nginx with the module
$ cd /root/software/nginx-1.18.0
$ ./configure --add-module=/root/software/ngx_kafka_module/
$ make && make install
Modify the nginx.conf configuration
# 1. Edit nginx.conf
$ vi /usr/local/nginx/conf/nginx.conf
# 2. Start nginx
$ cd /usr/local/nginx/sbin
$ ./nginx
nginx.conf is as follows
#pid        logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    kafka;
    kafka_broker_list rpp:9092 rpp:9093 rpp:9094;

    server {
        listen       9090;
        server_name  localhost;

        #charset koi8-r;

        #access_log  logs/host.access.log  main;

        #------------ Kafka configuration: start ------------
        location = /kafka/log {
            # CORS configuration
            add_header 'Access-Control-Allow-Origin' $http_origin;
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';

            kafka_topic topic_rpp;
        }
        #------------ Kafka configuration: end ------------

        #error_page  404              /404.html;
    }
}
# Create the topic
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --create --topic topic_rpp --partitions 1 --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "topic_rpp".
# List the topics
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --list
topic_rpp
# Show the topic details
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --describe --topic topic_rpp
Topic:topic_rpp PartitionCount:1 ReplicationFactor:1 Configs:
Topic: topic_rpp Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Start a consumer and listen for messages
./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
Open the HTML page, click the buttons, and check the messages received by the consumer:
[root@rpp bin]# ./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}
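Once messages like these are in the topic, downstream processing can be as simple as parsing each JSON line and aggregating by action. The sketch below counts actions over the four messages shown above. The function name `countActions` is hypothetical, and the input is a hard-coded array rather than a real Kafka consumer.

```javascript
// The four messages received by the console consumer in this post.
const consumedLines = [
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}',
];

// Hypothetical helper: parse each line as JSON and count records per action.
function countActions(lines) {
  const counts = {};
  for (const line of lines) {
    const { action } = JSON.parse(line);
    counts[action] = (counts[action] || 0) + 1;
  }
  return counts;
}

const actionCounts = countActions(consumedLines);
console.log(actionCounts);
```

A real job would read from the topic with a Kafka client or a stream processing framework instead of an in-memory array; the aggregation logic stays the same.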