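2. Adding and removing topics
This section reviews the common topic operations you will perform on a Kafka cluster. All of the tools described here are available under the bin/ directory of the Kafka distribution, and each tool prints details on all of its command-line options when run with no arguments.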
You can either add topics manually or have them created automatically when data is first published to a non-existent topic. If topics are auto-created, you may want to tune the default topic configurations used for auto-created topics.
Topics are added and modified using the topic tool:
> bin/kafka-topics.sh --bootstrap-server broker_host:port --create --topic my_topic_name \
      --partitions 20 --replication-factor 3 --config x=y
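For topics that are auto-created rather than created with the command above, the defaults come from broker-level settings. The fragment below is a minimal sketch of the relevant server.properties entries; the values are illustrative assumptions that mirror the example command, not recommendations.

```properties
# server.properties (broker-level defaults applied to auto-created topics)
# Values are illustrative only.
auto.create.topics.enable=true
num.partitions=20
default.replication.factor=3
```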
The replication factor controls how many servers replicate each message that is written. With a replication factor of 3, up to 2 servers can fail before you lose access to your data. We recommend a replication factor of 2 or 3 so that you can transparently bounce (restart) machines without interrupting data consumption.
The partition count controls how many logs the topic is sharded into, and it has several impacts. First, each partition must fit entirely on a single server, so with 20 partitions the full data set (and its read and write load) is handled by no more than 20 servers, not counting replicas. Finally, the partition count determines the maximum parallelism of your consumers. This is discussed in greater detail in the concepts section.
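To make the parallelism limit concrete, here is a small sketch. It assumes the usual consumer-group model, in which each partition is read by at most one consumer in a group; the function name `active_consumers` is hypothetical and exists only for illustration.

```shell
# Number of consumers in a single group that can be assigned at least one
# partition: the smaller of the partition count and the consumer count.
active_consumers() {
  partitions=$1
  consumers=$2
  if [ "$consumers" -lt "$partitions" ]; then
    echo "$consumers"
  else
    echo "$partitions"
  fi
}

echo "20 partitions, 8 consumers:  $(active_consumers 20 8) active"
echo "20 partitions, 30 consumers: $(active_consumers 20 30) active"
```

With 20 partitions, adding a 21st consumer to the same group gains nothing: the extra consumers sit idle until the partition count is raised.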
Each sharded partition log is placed in its own folder under the Kafka log directory. The folder name consists of the topic name, followed by a dash (-) and the partition id. Because a typical folder name cannot be longer than 255 characters, the length of topic names is limited. We assume the number of partitions will never exceed 100,000. Therefore, topic names cannot be longer than 249 characters, which leaves just enough room in the folder name for a dash and a partition id of up to 5 digits.
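The arithmetic behind the limit (249 + 1 + 5 = 255) can be checked with a short sketch. The helper names `partition_dir` and `topic_name_fits` are hypothetical; the two limits are the ones stated above.

```shell
# Limits taken from the text: 255-character folder names, 249-character topic names.
MAX_FOLDER_NAME=255
MAX_TOPIC_NAME=249

# Build the partition folder name: <topic>-<partition id>.
partition_dir() {
  printf '%s-%s\n' "$1" "$2"
}

# Succeed (exit status 0) only if the topic name is within the 249-character limit.
topic_name_fits() {
  [ "${#1}" -le "$MAX_TOPIC_NAME" ]
}

# Worst case: a 249-character topic name with the largest 5-digit partition id.
longest_topic=$(printf '%249s' '' | tr ' ' 'a')
worst_case=$(partition_dir "$longest_topic" 99999)
echo "worst-case folder name length: ${#worst_case} (limit ${MAX_FOLDER_NAME})"
```

A name one character longer than `longest_topic` fails `topic_name_fits`, because its worst-case folder name would need 256 characters.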
Configurations added on the command line override the server's default settings for things such as how long data is retained. The complete set of per-topic configurations is documented in the topic-level configuration section of the Kafka documentation.
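As an illustration of such an override, the sketch below converts a retention period in days to milliseconds and prints (without running) a create command that sets the topic-level `retention.ms` config. The helper names are hypothetical, and `broker_host:port` and `my_topic_name` are the same placeholders used elsewhere in this section.

```shell
# Convert a retention period in days to milliseconds, the unit used by retention.ms.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Print (do not execute) a create command carrying the per-topic override.
# Arguments: topic name, retention in days.
retention_create_cmd() {
  printf '%s\n' "bin/kafka-topics.sh --bootstrap-server broker_host:port --create --topic $1 --partitions 20 --replication-factor 3 --config retention.ms=$(days_to_ms "$2")"
}

retention_create_cmd my_topic_name 3
```

Printing the command first makes it easy to review the computed value before running it against a real cluster.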
You can change the configuration or partitioning of a topic using the same topic tool.
To add partitions, run:
Kafka version >= 2.2:
> bin/kafka-topics.sh --bootstrap-server broker_host:port --alter --topic my_topic_name \
      --partitions 40
Kafka version < 2.2:
> bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name \
      --partitions 40
Be aware that one use case for partitions is to partition data semantically. Adding partitions does not change the partitioning of existing data, so it may disturb consumers that rely on that partitioning. That is, if data is partitioned by hash(key) % number_of_partitions, adding partitions can shuffle which partition a key maps to, but Kafka will not attempt to redistribute existing data automatically in any way.
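The effect can be seen with a small sketch of the hash(key) % number_of_partitions rule. It uses POSIX `cksum` (CRC-32) purely as a stand-in hash, which is an assumption for illustration and not the hash function Kafka's producer uses; the function names and sample keys are hypothetical.

```shell
# Stand-in hash: CRC-32 from POSIX cksum. NOT the hash used by Kafka's producer.
hash_key() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# partition = hash(key) % number_of_partitions
partition_for() {
  echo $(( $(hash_key "$1") % $2 ))
}

# Compare where each key lands before and after growing from 20 to 40 partitions.
for key in user-1 user-2 user-3 user-4; do
  echo "$key: 20 partitions -> $(partition_for "$key" 20), 40 partitions -> $(partition_for "$key" 40)"
done
```

Under this rule, doubling from 20 to 40 partitions leaves a key on partition p or moves it to p + 20. Records already written stay where they are, so after the change a single key's history can be split across two partitions.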
To add configs:
> bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --alter --add-config x=y
To remove a config:
> bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --alter --delete-config x
And finally, to delete a topic:
> bin/kafka-topics.sh --bootstrap-server broker_host:port --delete --topic my_topic_name
Kafka does not currently support reducing the number of partitions for a topic.
Instructions for changing the replication factor of a topic are given in the Kafka documentation section on increasing the replication factor.
The Kafka cluster automatically detects any broker shutdown or failure and elects new leaders for the partitions on that machine. This happens whether a server fails or is brought down intentionally for maintenance or configuration changes. For the latter cases, Kafka supports a more graceful mechanism for stopping a server than just killing it. When a server is stopped gracefully, it takes advantage of two optimizations:
- It syncs all its logs to disk so that it does not need to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery takes time, so this speeds up intentional restarts.
- It migrates any partitions it leads to other replicas before shutting down. This makes the leadership transfer faster and reduces the time each partition is unavailable to a few milliseconds.
Syncing the logs happens automatically whenever the server is stopped by any means other than a hard kill, but the controlled leadership migration requires a special setting:
controlled.shutdown.enable=true
Note that controlled shutdown only succeeds if all the partitions hosted on the broker have replicas (i.e. the replication factor is greater than 1 and at least one of those replicas is alive). This is generally what you want, since shutting down the last replica would make that topic partition unavailable.