Kafka Certification (CCDAK) Cheat Sheet (DRAFT) by

Notes for fast revision

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Base Concepts

A cluster consists of brokers (nodes) that store topics; each topic consists of one or more partitions.
Each topic has a replication factor that defines how many copies of each partition are stored across different brokers.
Producers send data to the partition leader; the follower replicas exist for fault tolerance.
Consumers usually read from the partition leader, but can be configured to read from follower replicas (follower fetching, set up with client.rack on the consumer and replica.selector.class on the broker).
Producers and consumers are smart clients: they pull cluster metadata from any broker, so they know which broker leads each partition and where to send or fetch data.
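
The routing described above can be sketched as a toy lookup. This is a minimal illustration, not the real client: the METADATA dict, the topic name "orders", and the functions partition_for and leader_for are all hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses for keyed records.

```python
import zlib

# Hypothetical cluster metadata, as a client might cache it after asking any broker.
# Keys are (topic, partition); values name the leader broker and all replica brokers.
# Three replicas per partition corresponds to a replication factor of 3.
METADATA = {
    ("orders", 0): {"leader": 1, "replicas": [1, 2, 3]},
    ("orders", 1): {"leader": 2, "replicas": [2, 3, 1]},
    ("orders", 2): {"leader": 3, "replicas": [3, 1, 2]},
}


def partition_for(topic, key, metadata):
    """Pick a partition by hashing the key (CRC32 here, an assumption for the sketch)."""
    num_partitions = sum(1 for (t, _) in metadata if t == topic)
    return zlib.crc32(key) % num_partitions


def leader_for(topic, key, metadata):
    """Return (partition, leader broker id) that the client would send this record to."""
    partition = partition_for(topic, key, metadata)
    return partition, metadata[(topic, partition)]["leader"]
```

Calling leader_for("orders", b"customer-42", METADATA) returns the same partition and leader every time for the same key, which is why records with the same key keep their relative order.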

Architecture

A Kafka cluster consists of brokers. Everything else is outside the cluster.
Producers and consumers are smart clients: through the Producer and Consumer APIs they pull cluster metadata from a broker and decide where to send or request data.
Kafka Connect imports and exports data from and to external systems such as databases.
Kafka Streams is a client-side stream processing library. For stateful processing it keeps local state stores (RocksDB by default), backed by additional internal topics that the Streams framework creates.
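
The relationship between a local state store and its backing topic can be sketched in a few lines. This is a toy model under stated assumptions: CountingStore is a hypothetical class, a Python dict stands in for RocksDB, and a plain list stands in for the internal changelog topic. It is not the Kafka Streams API.

```python
from collections import defaultdict


class CountingStore:
    """Toy local state store. Every update is also appended to a changelog list,
    which stands in for the internal changelog topic."""

    def __init__(self, changelog=None):
        self.changelog = changelog if changelog is not None else []
        self.state = defaultdict(int)

    def increment(self, key):
        """Update local state and record the new value in the changelog."""
        self.state[key] += 1
        self.changelog.append((key, self.state[key]))
        return self.state[key]

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state by replaying the changelog, as a new instance would
        after taking over the task. The latest value per key wins."""
        store = cls(changelog=list(changelog))
        for key, value in changelog:
            store.state[key] = value
        return store
```

After counting a few keys, CountingStore.restore(store.changelog) yields a store with the same counts, which is the reason the local store can be lost without losing state.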

Producer Configuration

 

Broker Configuration

 

Schema Evolution

 

Kafka Tools

 

Kafka Availability

 

Kafka Throughput Configuration

 
 

Internals

A cluster consists of a control plane and a data plane. Both planes run as processes on the broker nodes.
Control Plane: responsible for cluster metadata. Newer versions use the KRaft consensus protocol; older versions use ZooKeeper. One controller is active at a time (an elected leader); it does not handle partition data, only cluster metadata.
Data Plane: responsible for processing client requests and for data (partition) replication.
Data Replication: ISR (In-Sync Replicas) is the list of broker ids whose replicas are fully in sync with the partition leader. Only replicas in the ISR are eligible for leader election (unless unclean.leader.election.enable is true).
Data Replication: the partition leader does not push data to followers; followers pull it from the leader with fetch requests.
Data Replication: a follower is removed from the ISR when it lags behind the leader for longer than replica.lag.time.max.ms.
Data Replication: consumers can fetch records only up to the HW (high watermark offset). Followers learn the HW from the leader's fetch responses.
Partition Leader Balancing: a leader does more work than its followers, so Kafka tries to distribute partition leaders evenly across all brokers.
Partition Leader Balancing: auto.leader.rebalance.enable lets Kafka automatically move partition leadership back to the preferred replicas.
Partition Leader Balancing: for a manual rebalance, use kafka-preferred-replica-election.sh (replaced by kafka-leader-election.sh in newer Kafka versions). It explicitly triggers a preferred replica election for the specified partitions or the entire cluster, moving leadership back to the preferred replicas.
Partition Distribution: kafka-reassign-partitions.sh redistributes partitions across brokers, which also influences leader distribution, especially when adding new brokers or addressing specific load imbalances.
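
The ISR and high watermark rules above can be illustrated with a small simulation. This is a simplified sketch, not broker code: the Follower class and the three functions are hypothetical names, time is passed in as plain millisecond integers, and the HW is modelled as the smallest log end offset among ISR members.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    log_end_offset: int      # next offset this follower would write
    last_caught_up_ms: int   # last time it was fully caught up with the leader


def in_sync_replicas(leader_id, followers, now_ms, replica_lag_time_max_ms):
    """The leader is always in the ISR; a follower stays in it while its lag time
    does not exceed replica.lag.time.max.ms."""
    isr = [leader_id]
    for f in followers:
        if now_ms - f.last_caught_up_ms <= replica_lag_time_max_ms:
            isr.append(f.broker_id)
    return isr


def high_watermark(leader_log_end_offset, followers, isr):
    """Model the HW as the smallest log end offset among ISR members."""
    offsets = [leader_log_end_offset]
    offsets += [f.log_end_offset for f in followers if f.broker_id in isr]
    return min(offsets)


def visible_to_consumers(log, hw):
    """Consumers can read only the records below the high watermark."""
    return log[:hw]
```

For example, with a leader on broker 1 at log end offset 10, a follower on broker 2 at offset 8 that caught up 5 seconds ago, and a follower on broker 3 at offset 5 that last caught up 39 seconds ago, a 30 second limit gives an ISR of [1, 2] and a high watermark of 8. If broker 3 were still in the ISR, the high watermark would stay at 5.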

Consumer Configuration

 

Security

 

Kafka Streams

 

Kafka Connect

 

Kafka Low Latency Configuration

 
