Rebalance Big Data Service Kafka clusters to redistribute the partitions of existing topics across the brokers and disks in the cluster.
In a Kafka cluster, brokers ensure high availability to process new events. Because Kafka is fault-tolerant, replicas of the messages are maintained on the brokers and are made available in case of failures. The replication factor defines the number of copies of a topic across the cluster.
You can add new brokers to an existing Kafka cluster, or new disks to an existing broker, by assigning a unique broker ID, listeners, and a log directory in the Ambari configurations for Kafka. However, these brokers and disks aren't assigned any data partitions of the existing topics in the cluster. Unless you move partitions or create new topics, the new brokers won't be doing much work. To overcome this problem, use the kafka-reassign-partitions tool.
Creating the Topics-to-Move JSON File
Create a topics-to-move JSON file to specify the topics to be reassigned.
topics-to-move tells the kafka-reassign-partitions tool which partitions to look at when generating a proposal for the reassignment configuration. You must create the topics-to-move JSON file from scratch. The format of the file is the following:
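As an illustration of that format, the following sketch writes a small topics-to-move file. The topic names topic1 and topic2 and the file name topics-to-move.json are placeholders chosen for this example; replace them with your own.

```shell
# Write a minimal topics-to-move file.
# topic1 and topic2 are placeholder topic names for this example.
cat > topics-to-move.json <<'EOF'
{
  "topics": [
    {"topic": "topic1"},
    {"topic": "topic2"}
  ],
  "version": 1
}
EOF

# Show the file that was written.
cat topics-to-move.json
```

Each entry in the topics array names one topic whose partitions the tool should consider when it generates a proposal.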
Creating the Reassignment Configuration JSON File
The reassignment configuration JSON file contains the parameters used in the reassignment process. You create this file yourself, but the tool can generate a proposal for its contents. When the kafka-reassign-partitions tool is run with the --generate option, it generates a proposed configuration that you can fine-tune and save as a JSON file. The saved file is the reassignment configuration JSON. To generate a proposal, the tool requires a topics-to-move file as input. The format of the reassignment configuration file is the following:
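A reassignment configuration might look like the sketch below. The topic name, partition numbers, broker IDs 4 and 5, and the file name reassignment-configuration.json are assumptions made for illustration; a real file would normally start from the proposal that the tool generates.

```shell
# Write an example reassignment configuration.
# Topic, partitions, and broker IDs (4 and 5) are illustrative values only.
cat > reassignment-configuration.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "topic1", "partition": 0, "replicas": [4, 5], "log_dirs": ["any", "any"]},
    {"topic": "topic1", "partition": 1, "replicas": [5, 4], "log_dirs": ["any", "any"]}
  ]
}
EOF

# Optional sanity check: confirm the file parses as JSON when python3 is available.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool reassignment-configuration.json >/dev/null && echo "valid JSON"
fi
```

In this sketch, broker 4 is listed first for partition 0 and broker 5 is listed first for partition 1, so the two brokers would each lead one partition.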
The reassignment configuration contains the following properties.
Property: Description
topic: Specifies the topic.
partition: Specifies the partition.
replicas: Specifies the brokers that the selected partition is assigned to. The brokers are listed in order, which means that the first broker in the list is always the leader for that partition. Change the order of brokers to resolve any leader-balancing issues among brokers. Change the broker IDs to reassign partitions to different brokers.
log_dirs: Specifies the log directory of the brokers. The log directories are listed in the same order as the brokers. By default, any is specified as the log directory, which means that the broker is free to choose where it places the replica. The current broker implementation selects the log directory using a round-robin algorithm. To explicitly set where the partition replica is stored, use an absolute path beginning with a /.
Reassigning Partitions with the kafka-reassign-partitions Tool
Use this tool carefully on a Kafka cluster that holds a large amount of data. To move many partitions, we recommend running the tool in batches of three or four partitions at a time.
Ensure that the brokers are healthy before running this tool.
This tool can't be used to make an out-of-sync replica the leader of a partition.
Redistribute the load when the system is at 70% capacity.
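The batching recommendation in these notes can be sketched as follows. This is only an illustration: it assumes the partition entries of a reviewed reassignment configuration are available one per line in a hypothetical entries.txt file, and the batch size, file names, and broker IDs are example values rather than anything the tool prescribes.

```shell
BATCH_SIZE=3   # three partitions per batch, following the recommendation above

# Hypothetical input: one partition entry per line.
cat > entries.txt <<'EOF'
{"topic":"topic1","partition":0,"replicas":[4,5]}
{"topic":"topic1","partition":1,"replicas":[5,4]}
{"topic":"topic1","partition":2,"replicas":[4,5]}
{"topic":"topic2","partition":0,"replicas":[5,4]}
{"topic":"topic2","partition":1,"replicas":[4,5]}
EOF

# Split the entries into batch-1.json, batch-2.json, ... each holding
# at most BATCH_SIZE partitions in reassignment configuration format.
awk -v size="$BATCH_SIZE" '
function close_batch() { print "]}" > file; close(file) }
{
  if ((NR - 1) % size == 0) {
    if (NR > 1) close_batch()
    file = "batch-" (int((NR - 1) / size) + 1) ".json"
    printf "{\"version\":1,\"partitions\":[" > file
    sep = ""
  }
  printf "%s%s", sep, $0 > file
  sep = ","
}
END { if (NR > 0) close_batch() }
' entries.txt

ls batch-*.json
```

Each batch file can then be passed to the tool on its own, and the next batch started only after the previous one has been verified as complete.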
SSH to one of the broker nodes in the Big Data Service cluster. The kafka-reassign-partitions tool is located in /usr/odh/current/kafka-broker/bin.
Create a topics-to-move JSON file that specifies the topics you want to reassign. Use the following format:
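The file uses the format described under Creating the Topics-to-Move JSON File. Once it exists, a proposal can be requested with the --generate option. The sketch below assembles that command using the tool's --topics-to-move-json-file and --broker-list options. The bootstrap server address, the file name, and the target broker IDs 4 and 5 are assumptions for illustration, and the command is only run if kafka-reassign-partitions is found on the PATH.

```shell
# Assumed values for illustration; replace them with your own.
BOOTSTRAP_SERVERS="broker-host:9092"   # hypothetical bootstrap server
TOPICS_FILE="topics-to-move.json"      # file created in this step
BROKER_LIST="4,5"                      # brokers that should receive partitions

# Assemble the proposal command.
GENERATE_CMD="kafka-reassign-partitions --bootstrap-server $BOOTSTRAP_SERVERS --topics-to-move-json-file $TOPICS_FILE --broker-list $BROKER_LIST --generate"

# Print the command for review.
echo "$GENERATE_CMD"

# Run it only where the tool is installed and on the PATH.
if command -v kafka-reassign-partitions >/dev/null 2>&1; then
  $GENERATE_CMD
fi
```

The tool prints the current replica assignment together with a proposed reassignment, which is the starting point for the reassignment configuration file.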
In this example, the tool proposed a configuration that reassigns existing partitions on brokers 1, 2, and 3 to brokers 4 and 5.
Copy and paste the proposed partition reassignment configuration into an empty JSON file.
Review, and if required, modify the suggested reassignment configuration. Save the file.
Start the redistribution process with the following command:
kafka-reassign-partitions --reassignment-json-file <path to reassignment configuration.json> --bootstrap-server <bootstrap servers> --execute
To verify the partition movement, run the following command:
kafka-reassign-partitions --reassignment-json-file <path to reassignment configuration.json> --bootstrap-server <bootstrap servers> --verify
The tool prints the reassignment status of all partitions.
Status of partition reassignment:
Reassignment of partition topic2-1 completed successfully
Reassignment of partition topic1-0 completed successfully
Reassignment of partition topic2-0 completed successfully
Reassignment of partition topic1-2 completed successfully
Reassignment of partition topic1-1 completed successfully
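When many partitions are moved, it can help to summarize this output instead of reading it line by line. The sketch below counts the status lines in the sample output shown above. It assumes that every status line begins with "Reassignment of partition" and that finished ones end with "completed successfully"; the exact wording can differ between Kafka versions, so treat both patterns as assumptions.

```shell
# Sample --verify output, copied from the listing above.
verify_output='Status of partition reassignment:
Reassignment of partition topic2-1 completed successfully
Reassignment of partition topic1-0 completed successfully
Reassignment of partition topic2-0 completed successfully
Reassignment of partition topic1-2 completed successfully
Reassignment of partition topic1-1 completed successfully'

# Count all status lines and the ones reported as complete.
total=$(printf '%s\n' "$verify_output" | grep -c '^Reassignment of partition')
done_count=$(printf '%s\n' "$verify_output" | grep -c 'completed successfully$')

if [ "$total" -eq "$done_count" ]; then
  echo "All $total reassignments completed"
else
  echo "$((total - done_count)) of $total reassignments still pending"
fi
```

In practice, the verify_output variable would hold the captured output of the --verify command rather than a pasted sample.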