
An Illustrated Guide to Kafka's Partition Rebalance Mechanism


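Contents

Goal

Consumer API Version

Sources

Partition Rebalancing and Related Concepts

When a Partition Rebalance Can Happen

Why Partition Rebalancing Matters

Partition Rebalance Strategies

The range strategy (assignment by consecutive ranges)

The round-robin strategy (assignment by rotation)

The sticky strategy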


Goal

        Understand what a Kafka partition rebalance is, and get to know the three strategies for reassigning partitions.


Consumer API Version

  <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>3.2.0</version>
  </dependency>

Sources

        The concepts in this article are drawn from the official Kafka website and from the book it recommends, Kafka: The Definitive Guide.


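Partition Rebalancing and Related Concepts

Ownership

        The mapping between a consumer and the partitions it reads is usually called the consumer's ownership of those partitions.

Partition rebalance

        Moving partition ownership from one consumer to another is called a rebalance.

group coordinator

        Every consumer group has a group coordinator, a Kafka broker that manages the group's membership.

group leader

        The first consumer to join a consumer group becomes the group leader.
        The group leader receives the list of consumers in the group from the group coordinator and is responsible for assigning a subset of partitions to each consumer. It uses an implementation of ConsumerPartitionAssignor (PartitionAssignor in the book and in older clients) to decide which partitions each consumer should handle.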


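When a Partition Rebalance Can Happen

  • A new consumer is added to the consumer group;
  • A consumer crashes or falls too far behind: consumers send heartbeats to the Kafka broker, and a consumer that stops heartbeating in time is removed from the group;
  • Partitions are added to a subscribed topic;
  • The consumer group subscribes to additional topics.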
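
The heartbeat-based removal in the second bullet is governed by consumer settings. As a rough sketch, the following builds a java.util.Properties object with three real consumer configuration keys. The values and the class name HeartbeatConfigSketch are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

final class HeartbeatConfigSketch {

    /** Builds consumer settings that influence when a consumer is dropped from its group. */
    static Properties build() {
        Properties props = new Properties();
        // If the coordinator sees no heartbeat within this window, the consumer is removed.
        props.setProperty("session.timeout.ms", "45000");      // illustrative value
        // How often the consumer sends heartbeats to the coordinator.
        props.setProperty("heartbeat.interval.ms", "3000");    // illustrative value
        // Maximum gap between poll() calls before the consumer is considered stuck.
        props.setProperty("max.poll.interval.ms", "300000");   // illustrative value
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A shorter session timeout detects a dead consumer sooner, at the cost of more rebalances when a healthy consumer is merely slow.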

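Why Partition Rebalancing Matters

        Rebalances are important because they give the consumer group high availability and scalability (they let us add and remove consumers easily and safely), but in normal operation they are fairly undesirable. During a rebalance (at least under the eager protocol, where every consumer first gives up its partitions), consumers cannot consume messages, so the whole consumer group is briefly unavailable. In addition, when a partition moves from one consumer to another, the consumer loses its current state; if it was caching any data, it has to refresh its caches, which slows the application down until the consumer has rebuilt its state.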


Partition Rebalance Strategies

Official documentation

partition.assignment.strategy

A list of class names or class types, ordered by preference, of supported partition assignment strategies that the client will use to distribute partition ownership amongst consumer instances when group management is used. Available options are:

  • org.apache.kafka.clients.consumer.RangeAssignor: Assigns partitions on a per-topic basis.
  • org.apache.kafka.clients.consumer.RoundRobinAssignor: Assigns partitions to consumers in a round-robin fashion.
  • org.apache.kafka.clients.consumer.StickyAssignor: Guarantees an assignment that is maximally balanced while preserving as many existing partition assignments as possible.
  • org.apache.kafka.clients.consumer.CooperativeStickyAssignor: Follows the same StickyAssignor logic, but allows for cooperative rebalancing.

The default assignor is [RangeAssignor, CooperativeStickyAssignor], which will use the RangeAssignor by default, but allows upgrading to the CooperativeStickyAssignor with just a single rolling bounce that removes the RangeAssignor from the list.

Implementing the org.apache.kafka.clients.consumer.ConsumerPartitionAssignor interface allows you to plug in a custom assignment strategy.

Type: list
Default: class org.apache.kafka.clients.consumer.RangeAssignor, class org.apache.kafka.clients.consumer.CooperativeStickyAssignor
Valid Values: non-null string
Importance: medium
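
As a minimal sketch of how this setting could be supplied, the following builds a java.util.Properties object using only the standard library. The broker address, the group name, and the class name AssignorConfigSketch are hypothetical; the deserializer settings a real consumer also needs are omitted.

```java
import java.util.Properties;

final class AssignorConfigSketch {

    /** Builds consumer settings that override the default assignor list. */
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.setProperty("group.id", "demo-group");              // hypothetical group name
        // The value is a comma-separated list of class names, ordered by preference.
        // Here a single assignor is listed instead of the default pair.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("partition.assignment.strategy"));
    }
}
```

These properties would then be passed to the consumer's constructor together with the remaining required settings.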

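The official documentation lists four built-in assignors. CooperativeStickyAssignor follows the same logic as StickyAssignor, so this article covers three strategies: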

  • range
  • round-robin
  • sticky

The range strategy (assignment by consecutive ranges)

Definition in the official documentation

Class RangeAssignor: https://kafka.apache.org/32/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html

Excerpt

The range assignor works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumers in lexicographic order. We then divide the number of partitions by the total number of consumers to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition.

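In plain terms

        This strategy works topic by topic and hands each consumer a consecutive range of partitions. Concretely: number of partitions ÷ total number of consumers = number of partitions per consumer; if the division is not even, the first few consumers each get one extra partition. For example, 7 partitions shared by 3 consumers are split 3, 2, 2.

        [Figure: concept diagram of the range strategy]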
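
The arithmetic above can be sketched in plain Java. This is a simplified simulation, not the RangeAssignor source: it assumes every consumer subscribes to every topic, and the class name RangeSketch and the "t0p0" partition labels are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class RangeSketch {

    /**
     * Assigns partitions topic by topic in consecutive ranges.
     * consumers: member ids; partitionsPerTopic: topic name -> partition count.
     */
    static Map<String, List<String>> assign(List<String> consumers,
                                            Map<String, Integer> partitionsPerTopic) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(consumers)); // lexicographic order
        Map<String, List<String>> result = new TreeMap<>();
        for (String consumer : sorted) {
            result.put(consumer, new ArrayList<>());
        }
        if (sorted.isEmpty()) {
            return result;
        }
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            int perConsumer = topic.getValue() / sorted.size(); // base share
            int extra = topic.getValue() % sorted.size();       // first 'extra' consumers get one more
            int next = 0;                                       // first partition of the next range
            for (int i = 0; i < sorted.size(); i++) {
                int size = perConsumer + (i < extra ? 1 : 0);
                for (int p = next; p < next + size; p++) {
                    result.get(sorted.get(i)).add(topic.getKey() + "p" + p);
                }
                next += size;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two consumers, two topics with three partitions each.
        System.out.println(assign(List.of("C0", "C1"), Map.of("t0", 3, "t1", 3)));
        // prints {C0=[t0p0, t0p1, t1p0, t1p1], C1=[t0p2, t1p2]}
    }
}
```

Because the split restarts for every topic, the first consumer collects the extra partition of each topic, so the imbalance grows with the number of topics.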


The round-robin strategy (assignment by rotation)

Definition in the official documentation

Class RoundRobinAssignor: https://kafka.apache.org/32/javadoc/org/apache/kafka/clients/consumer/RoundRobinAssignor.html

Excerpt

The round robin assignor lays out all the available partitions and all the available consumers. It then proceeds to do a round robin assignment from partition to consumer. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumers.)

For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2.

The assignment will be:

  • C0: [t0p0, t0p2, t1p1]
  • C1: [t0p1, t1p0, t1p2]

When subscriptions differ across consumer instances, the assignment process still considers each consumer instance in round robin fashion but skips over an instance if it is not subscribed to the topic. Unlike the case when subscriptions are identical, this can result in imbalanced assignments. For example, we have three consumers C0, C1, C2, and three topics t0, t1, t2, with 1, 2, and 3 partitions, respectively. Therefore, the partitions are t0p0, t1p0, t1p1, t2p0, t2p1, t2p2. C0 is subscribed to t0; C1 is subscribed to t0, t1; and C2 is subscribed to t0, t1, t2.

That assignment will be:

  • C0: [t0p0]
  • C1: [t1p0]
  • C2: [t1p1, t2p0, t2p1, t2p2]

Since the introduction of static membership, we could leverage group.instance.id to make the assignment behavior more sticky. For example, we have three consumers with assigned member.id C0, C1, C2, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2. We choose to honor the sorted order based on ephemeral member.id.

The assignment will be:

  • C0: [t0p0, t1p0]
  • C1: [t0p1, t1p1]
  • C2: [t0p2, t1p2]

After one rolling bounce, group coordinator will attempt to assign new member.id towards consumers, for example C0 -> C5, C1 -> C3, C2 -> C4.

The assignment could be completely shuffled to:

  • C3 (was C1): [t0p0, t1p0] (before was [t0p1, t1p1])
  • C4 (was C2): [t0p1, t1p1] (before was [t0p2, t1p2])
  • C5 (was C0): [t0p2, t1p2] (before was [t0p0, t1p0])

This issue could be mitigated by the introduction of static membership. Consumers will have individual instance ids I1, I2, I3. As long as

  1. Number of members remain the same across generation
  2. Static members' identities persist across generation
  3. Subscription pattern doesn't change for any member

The assignment will always be:

  • I0: [t0p0, t1p0]
  • I1: [t0p1, t1p1]
  • I2: [t0p2, t1p2]

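In plain terms

        Partitions are assigned to consumers in rotation; if all consumers subscribe to the same topics, the partitions are distributed evenly.

        [Figure: concept diagram of the round-robin strategy]

        If consumers subscribe to different topics, the assignment can become unbalanced. Suppose:

  • there are three consumers = {c0, c1, c2};
  • there are three topics = {t0, t1, t2};
  • the partitions of each topic are t0 = {p0}, t1 = {p0, p1}, t2 = {p0, p1, p2};
  • the subscriptions are c0 = {t0}, c1 = {t0, t1}, c2 = {t0, t1, t2}.

        The partitions are then assigned as follows: c0 = {t0p0}, c1 = {t1p0}, c2 = {t1p1, t2p0, t2p1, t2p2}.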
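
The rotation, including the skipping of consumers that are not subscribed to a topic, can be sketched in plain Java. This is a simplified simulation written from the Javadoc description above rather than the RoundRobinAssignor source; the class name RoundRobinSketch and the "t0p0" labels are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class RoundRobinSketch {

    /**
     * subscriptions: consumer id -> subscribed topics;
     * partitionsPerTopic: topic name -> partition count.
     */
    static Map<String, List<String>> assign(Map<String, Set<String>> subscriptions,
                                            Map<String, Integer> partitionsPerTopic) {
        List<String> consumers = new ArrayList<>(new TreeMap<>(subscriptions).keySet()); // sorted
        Map<String, List<String>> result = new TreeMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        int cursor = 0; // index of the consumer whose turn it is
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            String name = topic.getKey();
            // Ignore topics nobody subscribes to, otherwise the skip loop would never end.
            boolean anySubscriber = subscriptions.values().stream().anyMatch(s -> s.contains(name));
            if (!anySubscriber) {
                continue;
            }
            for (int p = 0; p < topic.getValue(); p++) {
                // Skip consumers that are not subscribed to this topic.
                while (!subscriptions.get(consumers.get(cursor)).contains(name)) {
                    cursor = (cursor + 1) % consumers.size();
                }
                result.get(consumers.get(cursor)).add(name + "p" + p);
                cursor = (cursor + 1) % consumers.size();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subscriptions = Map.of(
                "C0", Set.of("t0"),
                "C1", Set.of("t0", "t1"),
                "C2", Set.of("t0", "t1", "t2"));
        System.out.println(assign(subscriptions, Map.of("t0", 1, "t1", 2, "t2", 3)));
        // prints {C0=[t0p0], C1=[t1p0], C2=[t1p1, t2p0, t2p1, t2p2]}
    }
}
```

With identical subscriptions the skip loop never fires, and the same code yields the even split from the first Javadoc example.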


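The sticky strategy

Definition in the official documentation

Class StickyAssignor: https://kafka.apache.org/32/javadoc/org/apache/kafka/clients/consumer/StickyAssignor.html

Explanation

        The initial assignment under this strategy resembles round-robin. After a rebalance, however, round-robin recomputes the assignment from scratch, whereas sticky follows two principles, with the first taking priority over the second:

  • Principle 1: distribute partitions across consumers as evenly as possible;
  • Principle 2: keep each topic partition with the consumer it was previously assigned to wherever possible.

Principle 2 is presumably why the strategy is called sticky.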
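
To make the two principles concrete, here is a deliberately simplified greedy sketch in plain Java. It is not the StickyAssignor algorithm, which is considerably more involved; it assumes every consumer subscribes to every topic, and the class name StickySketch, its method names, and the partition labels are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class StickySketch {

    /** Recomputes an assignment for the current consumers, reusing the previous one where possible. */
    static Map<String, List<String>> reassign(List<String> consumers,
                                              Map<String, List<String>> previous,
                                              List<String> allPartitions) {
        Map<String, List<String>> result = new TreeMap<>();
        Set<String> claimed = new HashSet<>();
        // Principle 2: every surviving consumer first keeps the partitions it already owned.
        for (String consumer : consumers) {
            List<String> kept = new ArrayList<>();
            for (String p : previous.getOrDefault(consumer, List.of())) {
                if (allPartitions.contains(p) && claimed.add(p)) {
                    kept.add(p);
                }
            }
            result.put(consumer, kept);
        }
        if (result.isEmpty()) {
            return result;
        }
        // Partitions without an owner go to whichever consumer currently holds the fewest.
        for (String p : allPartitions) {
            if (claimed.add(p)) {
                result.get(pick(result, false)).add(p);
            }
        }
        // Principle 1 takes priority: move partitions until sizes differ by at most one.
        while (true) {
            List<String> most = result.get(pick(result, true));
            List<String> fewest = result.get(pick(result, false));
            if (most.size() - fewest.size() <= 1) {
                break;
            }
            fewest.add(most.remove(most.size() - 1));
        }
        return result;
    }

    /** Returns the consumer with the most (or fewest) partitions; ties go to the first id in sorted order. */
    private static String pick(Map<String, List<String>> assignment, boolean most) {
        String best = null;
        for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
            if (best == null) {
                best = e.getKey();
                continue;
            }
            int diff = e.getValue().size() - assignment.get(best).size();
            if (most ? diff > 0 : diff < 0) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> partitions =
                List.of("t0p0", "t0p1", "t1p0", "t1p1", "t2p0", "t2p1", "t3p0", "t3p1");
        Map<String, List<String>> previous = Map.of(
                "C0", List.of("t0p0", "t1p1", "t3p0"),
                "C1", List.of("t0p1", "t2p0", "t3p1"),
                "C2", List.of("t1p0", "t2p1"));
        // C1 has left the group: C0 and C2 keep what they had and share C1's partitions.
        System.out.println(reassign(List.of("C0", "C2"), previous, partitions));
        // prints {C0=[t0p0, t1p1, t3p0, t2p0], C2=[t1p0, t2p1, t0p1, t3p1]}
    }
}
```

In this run only the three partitions that belonged to C1 change owner, whereas a fresh round-robin pass over the same input would reshuffle most of them.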
