当前位置:   article > 正文

redis slow log采集_slowlog get

slowlog get

价值意义

1.便于统计分析,以往只能一个节点一个节点看,采集到一起方便分析

2.慢日志持久化,类比MySQL,mysql是有日志文件的,而redis是内存队列FIFO

3.redis是单线程的slowlog的监控分析很有意义,严重会导致系统雪崩

设计概要

 

实现原理

slowlog get [len] 和 slowlog reset

slowlog get [len]命令获取指定长度的慢查询列表。

redis 127.0.0.1:6379> slowlog get 2

1) 1) (integer) 14

   2) (integer) 1309448221

   3) (integer) 15

   4) 1) "ping"

2) 1) (integer) 13

   2) (integer) 1309448128

   3) (integer) 30

   4) 1) "slowlog"

      2) "get"

      3) "100"

核心代码,关于采集的参数代码中有详细注释

  1. public void slowLogProducer(Cluster cluster) {
  2.     try {
  3.         List<Host> addressList = NetUtil.getHostByAddress(cluster.getAddress());
  4.         for (Host host : addressList) {
  5.             List<byte[]> slowList = JedisUtil.getClusterSlowLog(new ConnectionParam(host.getIp(), host.getPort(), cluster.getRedisPassword()));
  6.             List<ClusterBinarySlowLog> binarySlowLogs = ClusterBinarySlowLog.from(slowList);
  7.             Map<String, String> redisInfo = JedisUtil.getMapInfo(new ConnectionParam(host.getIp(), host.getPort(), cluster.getRedisPassword()));
  8.             for (ClusterBinarySlowLog log : binarySlowLogs) {
  9.                 List<String> args = log.getArgs();
  10.                 String type = args.get(0);
  11.                 List<String> commands = args.subList(1, args.size());
  12.                 String command = StringUtils.join(commands, " ");
  13.                 RedisBinarySlowLog redisBinarySlowLog = new RedisBinarySlowLog();
  14.                 redisBinarySlowLog.setRedisName(cluster.getClusterName());
  15.                 redisBinarySlowLog.setRedisRole(redisInfo.get("role"));
  16.                 redisBinarySlowLog.setRedisVersion(redisInfo.get("redis_version"));
  17.                 redisBinarySlowLog.setServerHost(host.getIp() + ":" + host.getPort());
  18.                 redisBinarySlowLog.setSlowDateTime(DateUtil.getFormatDateTime(log.getTimeStamp() * 1000));
  19.                 redisBinarySlowLog.setRunTime(log.getExecutionTime());
  20.                 redisBinarySlowLog.setType(type);
  21.                 redisBinarySlowLog.setCommand(command);
  22.                 redisBinarySlowLog.setClientHost(log.getClientAddress());
  23.                 redisBinarySlowLog.setClientName(log.getClientName());
  24.                 slowLogger.info(JSONObject.fromObject(redisBinarySlowLog).toString());
  25.             }
  26.         }
  27.     } catch (Exception e) {
  28.         logger.error("SlowLogSchedule SlowLogProducer get SlowLog from redis exception", e);
  29.     }
  30. }
  31. public class RedisBinarySlowLog {
  32. // redis集群的名称
  33. private String RedisName;
  34. // redis节点的角色
  35. private String RedisRole;
  36. // redis集群的版本
  37. private String RedisVersion;
  38. // redis的节点ip:port
  39. private String serverHost;
  40. // slowlog产生的时间
  41. private String slowDateTime;
  42. // 慢查询的执行时长
  43. private long runTime;
  44. // 慢查询命令的类型
  45. private String type;
  46. // 慢查询命令参数
  47. private String command;
  48. // 发起慢查询命令的客户端ip:port
  49. private String clientHost;
  50. // 发起慢查询命令的客户端name
  51. private String clientName;
  52. ……

 

 

通过logback.xml配置落盘,EFK采集展示

上面返回了两个慢查询命令,其中每行的含义如下:

  1. 第一行是一个慢查询id。该id是自增的,只有在 redis server 重启时该id才会重置。
  2. 第二行是慢查询命令执行的时间戳
  3. 第三行是慢查询命令执行耗时,单位为微秒
  4. 第四行是慢查询命令的具体内容。

在redis cluster中又会多出来两行参数,发起redis slow log的慢查询客户端

  1. 2020-09-04 10:36:14.518|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:38822","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 22:37:39","runTime":26364,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
  2. 2020-09-04 10:36:14.519|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:58979","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 20:36:39","runTime":17576,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
  3. 2020-09-04 10:36:14.519|{"redisVersion":"4.0.10","clientHost":"10.10.202.166:33727","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2020-03-23 10:19:39","runTime":20188,"type":"INFO","command":"","serverHost":"10.10.52.100:24692"}
  4. 2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.110:57824","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2019-10-10 14:51:30","runTime":29046,"type":"bgsave","command":"","serverHost":"10.10.52.100:24696"}
  5. 2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.110:53587","redisName":"xueqiu-push-user-sep","clientName":"","redisRole":"slave","slowDateTime":"2019-10-10 14:28:43","runTime":44932,"type":"bgsave","command":"","serverHost":"10.10.52.100:24696"}
  6. 2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.51:58326","redisName":"xueqiu-push-user-sep","clientName":"lettuce#ClusterTopologyRefresh","redisRole":"slave","slowDateTime":"2019-08-29 18:08:05","runTime":20747,"type":"CLUSTER","command":"NODES","serverHost":"10.10.52.100:24696"}
  7. 2020-09-04 10:36:15.319|{"redisVersion":"4.0.10","clientHost":"192.168.64.31:59840","redisName":"xueqiu-push-user-sep","clientName":"lettuce#ClusterTopologyRefresh","redisRole":"slave","slowDateTime":"2019-08-27 14:09:32","runTime":22212,"type":"CLUSTER","command":"NODES","serverHost":"10.10.52.100:24696"}
  8. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 20:51:33","runTime":12224,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
  9. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 08:11:44","runTime":10121,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  10. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-14 23:47:44","runTime":11084,"type":"SETEX","command":"accer_5m:GE 1728000 0.04","serverHost":"10.10.27.110:18501"}
  11. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 07:37:09","runTime":23488,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  12. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 18:41:31","runTime":23067,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
  13. 2020-09-10 15:02:15.194|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 05:59:46","runTime":18328,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  14. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2017-01-02 18:27:11","runTime":71880,"type":"bgsave","command":"","serverHost":"10.10.201.100:21201"}
  15. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-14 02:00:06","runTime":11504,"type":"SETEX","command":"accer_5m:KEY 1728000 -0.05","serverHost":"10.10.27.110:18501"}
  16. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 04:44:08","runTime":18322,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  17. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-24 13:51:49","runTime":19850,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
  18. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-13 09:52:05","runTime":12074,"type":"ZREMRANGEBYSCORE","command":"price_statistics:SZ002888 20.2 20.2","serverHost":"10.10.27.110:18501"}
  19. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-23 20:59:28","runTime":10008,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
  20. 2020-09-10 15:02:15.195|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-17 00:05:39","runTime":19104,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  21. 2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"snowx-push","clientName":"","redisRole":"master","slowDateTime":"2015-12-18 03:00:48","runTime":15257,"type":"INFO","command":"","serverHost":"10.10.201.100:21201"}
  22. 2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"status","clientName":"","redisRole":"master","slowDateTime":"2019-10-16 23:52:09","runTime":20171,"type":"INFO","command":"","serverHost":"10.10.201.100:6402"}
  23. 2020-09-10 15:02:15.196|{"redisVersion":"2.8.14","clientHost":"","redisName":"stock-quote","clientName":"","redisRole":"master","slowDateTime":"2020-05-12 14:51:00","runTime":14154,"type":"ZREMRANGEBYSCORE","command":"capital_net_flows_day_l2:SZ002008 1.5892128E12 1.5892128E12","serverHost":"10.10.27.110:18501"}

slowlog reset命令是清空慢日志队列。

容量估算

按照最大的估算:
一个redis节点一天0.3MB,
那么线上sentinel节点356个master,按照一个master两个slave
线上cluster集群28个,按照几个集群主从共8个节点

那么就是0.3*(356*3+28*8)=387.6MB,这个还是按照每天每个节点都有慢查询估算了,实际1/10都没有,目标就是消灭慢查询

问题风险

slowlog  get和reset是非原子的,会不会漏?

slowlog多长时间采集一次合适?

相关知识

slowlog-log-slower-than: 慢查询阀值,单位微秒. 默认100000(10毫秒); 
生产环境设置1ms,因为Redis是single thread,如果命令都是1ms以上,则实例的吞吐量只有1000QPS.
slowlog-max-len: 慢查询存储的最大个数,我们默认1000;
生产设置设置大于1024,因为slowlog会省略过多的参数,慢查询不会占用过多的内存;
慢查询队列满后,淘汰最老的慢查询实体。

 

如MySQL/MongoDB等常见数据库,慢查询的query_time都会包含命令所有耗时,包含锁等待这类时间;

而Redis的慢查询query_time只记录自己“被cpu服务的时间”,不包含排队等待、IO等待(如AOF SYNC)这类时间。

理解这点非常重要定:请求的响应时间(R),服务时间(S), 排队延时(Q).     R = S + Q

# Clients
connected_clients:10
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

  1. The Redis Slow Log is a system to log queries that exceeded a
  2. specified execution time. The execution time does not include I/O
  3. operations like talking with the client, sending the reply and so forth,
  4. but just the time needed to actually execute the command (this is the only
  5. stage of command execution where the thread is blocked and can not serve
  6. other requests in the meantime).

所以RD和DBA在设计keyspace和访问模式时,应尽量避免使用耗时较大的命令。

根据经验在实际生产中,Redis单实例能处理4~5w的QPS, 如果大量的redis命令大量耗时大于1ms, 其实QPS只能达到1000基于几百。

Redis出现耗时大的命令,导致其他所有请求被阻塞等待,redis处理能力急剧退化,易导致整个服务链雪崩。

 

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/花生_TL007/article/detail/514210
推荐阅读
相关标签
  

闽ICP备14008679号