- Overview
- Kafka Architecture
- Topics, Partitions, and Offsets
- Producers
- Consumers
- Replication & Fault Tolerance
- Commit Log, Segments & Retention
- Kafka Connect
- Kafka Streams
- Kafka CLI Commands
- Spring Boot Kafka Example
- Final Summary
- Kafka Interview Pro Tips
Apache Kafka is a distributed system for sending, storing, and reading messages in real time. It is widely used to build fast, reliable, and scalable data pipelines.
✅ Key Points:
- High throughput
- Fault-tolerant
- Scalable
- Real-time processing
- Kafka Broker – Individual server storing topic partitions and handling producer/consumer requests.
- Controller Broker – One broker manages partition leadership among brokers.
- ZooKeeper (older versions) – Manages cluster metadata, leader election, and broker health.
- KRaft Mode (newer versions) – Raft-based internal quorum replaces ZooKeeper.
Key Notes:
- Each broker has a unique broker ID.
- Clusters usually have multiple brokers → enables replication and fault tolerance.
- One broker acts as Controller.
- Metadata coordination: ZooKeeper (old) or KRaft (new).
Kafka Cluster
|
----------------------------------------
| | |
Kafka Broker 1 Kafka Broker 2 Kafka Broker 3
(Controller)
| | |
------------- ------------- -------------
| Partition 1 |(L)| Partition 1 |(F) | Partition 1 |(F)
| Partition 2 |(F)| Partition 2 |(L) | Partition 2 |(F)
-------------
Legend: (L) → Leader, (F) → Follower
Message Flow:
-
Producers → send to Leader Partition
-
Consumers → read from Leader Partition
-
Followers replicate leader for fault tolerance
-
Topic – Logical channel storing related messages.
-
Partition – Ordered log of messages, each identified by a unique offset.
-
Replication – Partitions are replicated across brokers to ensure fault tolerance.
Old Kafka: uses ZooKeeper for metadata/leader election New Kafka: uses KRaft for internal metadata management
-
Built-in topic storing last committed offsets for each partition.
-
Enables fault tolerance and replayability.
Example:
__consumer_offsets (50 partitions)
│
├── Partition 12 → Group1
│ ├─ (Group1, TopicA, P0) → offset 150
│ ├─ (Group1, TopicA, P1) → offset 245
│ └─ (Group1, TopicA, P2) → offset 98
│
├── Partition 27 → Group2
├─ (Group2, TopicA, P0) → offset 120
├─ (Group2, TopicA, P1) → offset 250
└─ (Group2, TopicA, P2) → offset 89
-
Sends messages to topic partitions (Leader).
-
Leader replicates to followers for fault tolerance.
-
Producer is notified when write succeeds.
Without Key (Round-Robin)
Message1 -> P0
Message2 -> P1
Message3 -> P2
Message4 -> P0
-
✅ Good load balancing
-
⚠️ Order not guaranteed across partitions
With Key (Hash-based)
Copy code
Key = customer123
Hash(Key) -> Partition 2
All messages with same key -> Partition 2
- ✅ Guarantees order for messages with same key
-
Subscribe to topics and read sequentially by offset.
-
Offsets track progress → supports fault tolerance.
Example:
TopicA: P0, P1, P2, P3
Group1: C1-1, C1-2
Group2: C2-1, C2-2
Assignments:
Group1:
C1-1 -> P0, P1
C1-2 -> P2, P3
Group2:
C2-1 -> P0, P1
C2-2 -> P2, P3
-
✅ Each group is independent
-
✅ Partition assignment occurs within each group
Copy code
[Consumer1 reads 151–160]
151 152 153 154 155 156 157 158 159 160
↑ ↑
Start reading CRASH before commit
State:
Last committed = 150
Processed 151–155 (not committed)
Rebalance → Partition assigned to Consumer2
Consumer2 resumes from 151 → duplicates possible
After commit → offset = 160
-
✅ At-least-once guarantee
-
⚠️ Avoid duplicates: commit after processing -
✅ Exactly-once: use producer+consumer transactions
Consumer → JoinGroupRequest → Coordinator (Group Broker)
Coordinator elects group leader → collects members
Leader → assigns partitions → SyncGroupRequest → Coordinator
Coordinator → forwards assignment to all consumers
Consumers start → send Heartbeats to coordinator
If heartbeats stop → trigger rebalance
-
Number of copies per partition across brokers.
-
Minimum: 1, typical: 3
-
Ensures durability and high availability
-
ISR = Replicas fully caught up with leader
-
Only ISR members can become leader during failure
Example:
Leader = B1
Followers = B2, B3
ISR = {B1, B2, B3}
If B2 lags → ISR = {B1, B3}
New leader selected from ISR → ensures no data loss
Related Settings:
-
min.insync.replicas → minimum ISR for write success
-
unclean.leader.election=false → prevents stale replica as leader
-
Each partition = commit log, immutable
-
Segments → smaller log files for manageability (.log + .index)
-
Retention Policy
-
Time-based → retention.ms
-
Size-based → retention.bytes
-
Log compaction → keeps latest value per key
-
-
Moves data between Kafka and external systems
-
Source Connector → external → Kafka
-
Sink Connector → Kafka → external
-
Java library for real-time stream processing
-
Supports filter, join, aggregate directly on Kafka topics
1. Start Zookeeper (if using old versions < Kafka 2.8)
zookeeper-server-start.bat ..\..\config\zookeeper.properties
2. Start Kafka Broker
kafka-server-start.bat ..\..\config\server.properties
3. List all Topics
kafka-topics.bat --list --bootstrap-server localhost:9092
4. Create Topics with 3 partiton and 1 replication factor
kafka-topics.bat --create --topic my-topic --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3
5. Describe a Topic -Shows details (leader, partitions, replication)
kafka-topics.bat --describe --topic test-topic --bootstrap-server localhost:9092
6. Produce Messages to a Topic - Opens a console where you can type messages, and they go to Kafka
kafka-console-producer.bat --broker-list localhost:9092 --topic my-topic
7. Consume Messages from a Topic - Reads all messages from the beginning of the topic
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic my-topic --from-beginning
8. List Consumer Groups (Shows all consumer groups)
kafka-consumer-groups.bat --list --bootstrap-server localhost:9092
9. Describe a Consumer Group - Shows lag, partitions, offsets for the consumer group
kafka-consumer-groups.bat --describe --group my-group --bootstrap-server localhost:9092
10. Delete a Topic - Deletes a topic (if delete.topic.enable=true in config)
kafka-topics.bat --delete --topic test-topic --bootstrap-server localhost:9092
11. Check Kafka Broker Logs - Debugging broker issues
tail -f logs/server.log
12. Alter Topic (e.g., partitions) - Increases partitions of an existing topic.
kafka-topics.bat --alter --topic test-topic --partitions 5 --bootstrap-server localhost:9092
- Maven Dependency
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
- application.yml
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
group-id: demo-group
auto-offset-reset: earliest
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.apache.kafka.common.serialization.StringSerializer
- Producer Service
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
@Service
public class ProducerService {
private final KafkaTemplate<String, String> kafkaTemplate;
private static final String TOPIC = "demo-topic";
public ProducerService(KafkaTemplate<String, String> kafkaTemplate) {
this.kafkaTemplate = kafkaTemplate;
}
public void sendMessage(String message) {
kafkaTemplate.send(TOPIC, message);
System.out.println("Produced message: " + message);
}
}
- Consumer Listener
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;
@Service
public class ConsumerService {
@KafkaListener(topics = "demo-topic", groupId = "demo-group")
public void consumeMessage(String message) {
System.out.println("Consumed message: " + message);
}
}
- Controller to Trigger Producer
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class KafkaController {
private final ProducerService producerService;
public KafkaController(ProducerService producerService) {
this.producerService = producerService;
}
@GetMapping("/publish")
public String publishMessage(@RequestParam String message) {
producerService.sendMessage(message);
return "Message sent: " + message;
}
}
✅ Usage
1- Start Kafka (Zookeeper or KRaft).
2- Run Spring Boot app.
3- Publish message:
http://localhost:8080/publish?message=hello-kafka
Consumed message: hello-kafka
-
Kafka is a distributed, fault-tolerant messaging system.
-
Topics are split into partitions; partitions are replicated for durability.
-
Producers push data → Consumers pull data → Commit offsets track progress.
-
Consumer Groups enable scaling and load balancing.
-
ISR ensures only up-to-date replicas serve as leaders.
-
Kafka Connect and Streams provide integration & real-time processing.
-
CLI + Spring Boot examples demonstrate practical usage.
Core Concepts
- Topics, partitions, offsets
- Replication & ISR (In-Sync Replicas)
- Consumer groups & offset management
- Retention policies (
log.retention.hours,log.retention.bytes) - Zookeeper vs KRaft mode
Producer & Consumer Configs
- Producer:
acks,batch.size,linger.ms - Consumer:
auto.offset.reset,enable.auto.commit - Understand how misconfiguration can cause duplicates or message loss
CLI Knowledge
- List, create, describe, delete, alter topics
- Produce & consume messages from console
- Consumer group management (
--list,--describe,--reset-offsets) - Check consumer lag & offsets
Real-World Scenarios
- Message replay & offset resets
- High-throughput topics, batching & compression
- Failures: leader down, out-of-sync replicas, broker failures
- Exactly-once semantics: idempotent & transactional producers
Performance & Scaling
- Partitioning strategy affects throughput & ordering
- Producer batching & compression reduce network overhead
- Consumer parallelism depends on partition count
- Adding partitions increases throughput but limits max consumers per group
Troubleshooting
- Consumer not reading: check topic, partitions, offsets, lag
- Messages lost: check retention, acks, replication factor
- Broker down: check leader election & ISR
- Step-by-step debugging demonstrates hands-on experience
Bonus Topics
- Kafka Streams basics
- Kafka Connect for ETL
- Schema Registry & Avro for data compatibility
- Monitoring metrics (
BytesInPerSec,UnderReplicatedPartitions)
Pro Tips
- Draw diagrams of topics, partitions, offsets, consumer groups
- Use analogies: topic = channel, partition = file, offset = line, consumer group = team