Hands-on: Building a Log Collection System with HTML + Nginx + ngx_kafka_module + Kafka

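1. Information to collect
  • 1. User ID (user_id)

  • 2. Time of the action (act_time)

  • 3. Action (action; one of click: click, favorite a job: job_collect, send a resume: cv_send, upload a resume: cv_upload)

  • 4. Code of the target company (job_code)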
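
The four fields above map naturally onto a small JSON record. As a minimal sketch (the helper name `makeEvent` and the validation step are illustrative assumptions, not part of the original page), one record could be built and checked like this:

```javascript
// Allowed values for the `action` field, as listed above.
const ACTIONS = ["click", "job_collect", "cv_send", "cv_upload"];

// Hypothetical helper: build one event record and reject unknown actions.
function makeEvent(userId, actTime, action, jobCode) {
  if (!ACTIONS.includes(action)) {
    throw new Error("unknown action: " + action);
  }
  return {
    user_id: userId,
    act_time: actTime,
    action: action,
    job_code: jobCode,
  };
}

// Sample values taken from the test run at the end of this article.
const sample = makeEvent("rpp", "2020-11-9 13:52:27", "click", "job_test");
console.log(JSON.stringify(sample));
```

Rejecting unknown actions on the client is optional; it only keeps obviously malformed records out of the topic.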

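2. Workflow
  • 1. The HTML page stands in for a job listing page on Lagou.

  • 2. The web server responds to the user's actions as usual.

  • 3. In parallel, the page reports each action to Nginx via an ajax request; Nginx collects the user's clickstream.

  • 4. Nginx uses ngx_kafka_module to forward the collected data to a topic in the Kafka cluster.

  • 5. Once the data is in a Kafka topic, downstream big-data components can consume it for real-time computation or other processing, such as job recommendations or statistical reports.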
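
Step 3 of the workflow can also be sketched without jQuery. The snippet below only assembles the request that a page might pass to the browser's `fetch` API; the helper name `buildLogRequest` and the host `nginx.example` are hypothetical, and the port and path assume the Nginx configuration shown in section 5.3.

```javascript
// Hypothetical helper: describe the POST request that reports one event to Nginx.
function buildLogRequest(endpoint, event) {
  return {
    url: endpoint,
    options: {
      method: "POST",
      mode: "cors",           // the page and Nginx are on different origins
      credentials: "include", // counterpart of withCredentials in the jQuery version
      body: JSON.stringify(event),
    },
  };
}

const req = buildLogRequest("http://nginx.example:9090/kafka/log", {
  user_id: "rpp",
  act_time: "2020-11-9 13:52:27",
  action: "click",
  job_code: "job_test",
});

// In a browser the request would be sent with: fetch(req.url, req.options);
console.log(req.options.method, req.url);
```

The POST body is the part that matters: judging by the consumer output in section 5.5, it is the JSON string in the body that ends up as the Kafka message.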

3. Architecture




4. Steps

Write the HTML page (with four buttons)
Configure Nginx (basic configuration plus ngx_kafka_module)
Configure Kafka and create the topic
Consume the topic and check the messages

5.1 Set up the Kafka cluster


5.2 Write the HTML
<!DOCTYPE html>
<html lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1,shrink-to-fit=no">
    <script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.5.1/jquery.js"></script>
    <script>
        // Format the current local time as "YYYY-M-D H:m:s" (values are not zero-padded).
        function current() {
            var d = new Date(),
                str = '';
            str += d.getFullYear() + '-';
            str += d.getMonth() + 1 + '-';
            str += d.getDate() + ' ';
            str += d.getHours() + ':';
            str += d.getMinutes() + ':';
            str += d.getSeconds();
            return str;
        }

        // Report one user action to Nginx as a JSON string in the POST body.
        function operate(action) {

            var json = {
                'user_id': 'rpp',
                'act_time': current().toString(),
                'action': action,
                'job_code': 'job_test'
            };

            $.ajax({
                type: "POST",
                // Empty in the source; it should point at the Nginx location
                // configured in 5.3 (/kafka/log on port 9090).
                url: "",
                dataType: "json",
                crossDomain: true,
                data: JSON.stringify(json),
                xhrFields: {
                    withCredentials: true
                },
                success: function (data) {
                },
                error: function (data) {
                }
            });
        }
    </script>
</head>
<body>
<div class="row" style="text-align: center">
        <button type="button" id="click" onclick="operate('click')">Click</button>
        <button type="button" id="job_collect" onclick="operate('job_collect')">Favorite job</button>
        <button type="button" id="resume_upload" onclick="operate('cv_upload')">Upload resume</button>
        <button type="button" id="resume_send" onclick="operate('cv_send')">Send resume</button>
</div>
</body>
</html>
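
Note that `current()` does not zero-pad its values, which is why the test output in section 5.5 contains timestamps such as `2020-11-9 14:2:48`. If downstream jobs expect fixed-width timestamps, a padded variant could look like the sketch below; the function name `formatTimestamp` is an illustrative assumption and is not used by the page above.

```javascript
// Hypothetical replacement for current(): zero-pads every field
// so the result always has the shape "YYYY-MM-DD HH:mm:ss".
function formatTimestamp(d) {
  const pad = (n) => String(n).padStart(2, "0");
  const date = [d.getFullYear(), pad(d.getMonth() + 1), pad(d.getDate())].join("-");
  const time = [pad(d.getHours()), pad(d.getMinutes()), pad(d.getSeconds())].join(":");
  return date + " " + time;
}

// Months are zero-based in the Date constructor, so 10 means November.
console.log(formatTimestamp(new Date(2020, 10, 9, 14, 2, 48)));
```

Padded timestamps in this shape also sort correctly as plain strings, which can simplify later processing.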
5.3 Configure Nginx
# 1. Install git
$ yum install -y git

# 2. Install the build dependencies
$ yum install -y gcc gcc-c++ zlib zlib-devel openssl openssl-devel pcre pcre-devel

# 3. Fetch the source of the Kafka C client (librdkafka)
$ cd /root/software
$ git clone https://github.com/edenhill/librdkafka

# 4. Build and install librdkafka
$ cd /root/software/librdkafka
$ ./configure
$ make && make install

# 5. Download the Nginx source
$ cd /root/software
$ wget http://nginx.org/download/nginx-1.18.0.tar.gz

# 6. Unpack it
$ tar -zxf nginx-1.18.0.tar.gz

# 7. Fetch the ngx_kafka_module source
$ cd /root/software
$ git clone https://github.com/brg-liuwei/ngx_kafka_module

# 8. Build and install Nginx with the module
$ cd /root/software/nginx-1.18.0
$ ./configure --add-module=/root/software/ngx_kafka_module/
$ make && make install

Edit the nginx.conf configuration

# Edit nginx.conf
$ vi /usr/local/nginx/conf/nginx.conf

# Start nginx
$ cd /usr/local/nginx/sbin
$ ./nginx
The nginx.conf is as follows:

#pid        logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    # Kafka brokers that ngx_kafka_module sends messages to
    kafka_broker_list rpp:9092 rpp:9093 rpp:9094;

    server {
        listen       9090;
        server_name  localhost;

        #charset koi8-r;

        #access_log  logs/host.access.log  main;

        # Requests to this location are forwarded to the Kafka topic below
        location = /kafka/log {
                # CORS headers so the page can post from another origin with credentials
                add_header 'Access-Control-Allow-Origin' $http_origin;
                add_header 'Access-Control-Allow-Credentials' 'true';
                add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';

                kafka_topic topic_rpp;
        }

        #error_page  404              /404.html;
    }
}
5.4 Configure Kafka and create the topic
# Create the topic
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --create --topic topic_rpp --partitions 1 --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "topic_rpp".
# List the topics
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --list
# Describe the topic
[root@rpp bin]# ./kafka-topics.sh --zookeeper rpp:2181/myKafka --describe --topic topic_rpp
Topic:topic_rpp PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: topic_rpp        Partition: 0    Leader: 0       Replicas: 0     Isr: 0

# Consume the topic to watch for incoming messages
./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
5.5 Test


[root@rpp bin]# ./kafka-console-consumer.sh --bootstrap-server rpp:9092 --topic topic_rpp --from-beginning
{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}
{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}
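
Each message is a single JSON line, so downstream processing can start with a plain parser. As a rough sketch of the kind of statistical report mentioned in the workflow (the function name `countByAction` is an illustrative assumption, and the input is simply the four lines printed above), the events could be tallied per action like this:

```javascript
// The four messages printed by the console consumer above.
const lines = [
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:27","action":"click","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 13:52:41","action":"job_collect","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:2:48","action":"cv_send","job_code":"job_test"}',
  '{"user_id":"rpp","act_time":"2020-11-9 14:3:4","action":"cv_upload","job_code":"job_test"}',
];

// Hypothetical helper: count how many events were seen for each action.
function countByAction(jsonLines) {
  const counts = {};
  for (const line of jsonLines) {
    if (line.trim() === "") continue; // skip blank lines in the consumer output
    const event = JSON.parse(line);
    counts[event.action] = (counts[event.action] || 0) + 1;
  }
  return counts;
}

console.log(countByAction(lines));
```

For the sample above, every action is counted exactly once. A real pipeline would read from the topic with a Kafka consumer rather than from a hard-coded array.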

