
Kafka Streams: Real-Time Data Processing Made Simple

Learn how Kafka Streams transforms complex event streaming into declarative, maintainable code. From basic concepts to practical examples.

Shahar Amir

Apache Kafka has become the backbone of modern data infrastructure. But while Kafka handles the heavy lifting of storing and transmitting events, Kafka Streams is what makes processing those events actually enjoyable.

What is Kafka Streams?

Kafka Streams is a client library for building real-time streaming applications. Unlike Spark Streaming or Flink, it doesn't require a separate cluster—your application runs as a regular Java process.

> Key insight: Kafka Streams is an abstraction over producers and consumers that lets you focus on what you want to do, not how to do it.

The Problem It Solves

Imagine you're processing sensor data from a production line. Each sensor sends readings like this:

```json
{
  "reading_ts": "2026-02-05T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celsius": 23,
  "widget_weight_g": 100
}
```
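The Java examples below assume a `Reading` class deserialized from this payload. A minimal sketch, with getter names chosen to match the code that follows (the field set mirrors the JSON above):

```java
// Minimal POJO for a sensor reading; fields mirror the JSON payload.
public class Reading {
    private String readingTs;
    private String sensorId;
    private String productionLine;
    private String widgetType;
    private int tempCelsius;
    private int widgetWeightG;

    public Reading() {} // no-arg constructor for JSON deserializers

    public int getTempCelsius() { return tempCelsius; }
    public void setTempCelsius(int tempCelsius) { this.tempCelsius = tempCelsius; }

    public String getSensorId() { return sensorId; }
    public void setSensorId(String sensorId) { this.sensorId = sensorId; }
}
```

In practice you would generate this from a schema or let a JSON/Avro serde handle it; the remaining getters and setters follow the same pattern.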

You need to filter readings where temperature exceeds a threshold. With vanilla Kafka, you'd write something like this:

```java
public static void main(String[] args) {
    try (Consumer<String, Reading> consumer =
             new KafkaConsumer<>(consumerProps());
         Producer<String, Reading> producer =
             new KafkaProducer<>(producerProps())) {
        consumer.subscribe(List.of("sensor-readings"));
        while (true) {
            ConsumerRecords<String, Reading> records =
                consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, Reading> record : records) {
                Reading reading = record.value();
                if (reading.getTempCelsius() > 30) {
                    ProducerRecord<String, Reading> alert =
                        new ProducerRecord<>("temp-alerts",
                            record.key(), reading);
                    producer.send(alert);
                }
            }
        }
    }
}
```

That's a lot of boilerplate. Now here's the same logic with Kafka Streams:

```java
final StreamsBuilder builder = new StreamsBuilder();

builder.stream("sensor-readings",
        Consumed.with(Serdes.String(), readingSerde))
    .filter((key, reading) -> reading.getTempCelsius() > 30)
    .to("temp-alerts",
        Produced.with(Serdes.String(), readingSerde));
```

Six lines versus more than twenty. That's the power of declarative stream processing.

Core Concepts

1. KStream vs KTable

Kafka Streams has two main abstractions:

  • KStream: An unbounded stream of events (click events, logs, sensor data)
  • KTable: A changelog stream showing the latest value per key (user profiles, inventory counts)

```java
// KStream - every event matters
KStream<String, Click> clicks =
    builder.stream("user-clicks");

// KTable - only latest state per key
KTable<String, UserProfile> profiles =
    builder.table("user-profiles");
```
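The distinction is easy to see with plain collections: replaying a changelog into a map keeps only the latest value per key, which is exactly a KTable's view of a topic. A conceptual sketch (plain Java, not the Streams API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A KStream keeps every event; a KTable keeps only the latest value per key.
List<String[]> changelog = List.of(
    new String[]{"alice", "v1"},
    new String[]{"bob",   "v1"},
    new String[]{"alice", "v2"}   // overwrites alice's earlier value
);

Map<String, String> tableView = new LinkedHashMap<>();
for (String[] record : changelog) {
    tableView.put(record[0], record[1]); // upsert, like a KTable
}
// The "stream" saw 3 events; the "table" holds 2 keys, with alice -> v2.
```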

2. Stateless Operations

These don't need to remember previous events:

```java
stream
    .filter((k, v) -> v.isValid())
    .filterNot((k, v) -> v.isSpam())
    .map((k, v) -> KeyValue.pair(
        v.getUserId(),
        v.toEvent()))
    .flatMapValues((k, v) -> v.getItems()); // one output record per item
```

3. Stateful Operations

These maintain state across events:

```java
// Aggregation - count events per key
KTable<String, Long> clickCounts = stream
    .groupByKey()
    .count();

// Windowed aggregation - count per hour
KTable<Windowed<String>, Long> hourlyClicks = stream
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(
        Duration.ofHours(1)))
    .count();
```
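Under the hood, a tumbling window assigns each event to a bucket by truncating its timestamp down to a multiple of the window size. A plain-Java sketch of that bucketing arithmetic (illustrative, not the Streams internals):

```java
import java.time.Duration;

// Tumbling windows: window start = timestamp rounded down to the window size.
long windowSizeMs = Duration.ofHours(1).toMillis(); // 3,600,000 ms

long eventTs = 5_400_000L; // 01:30:00 UTC as epoch millis
long windowStart = (eventTs / windowSizeMs) * windowSizeMs;
long windowEnd = windowStart + windowSizeMs;
// windowStart == 3_600_000 (01:00), windowEnd == 7_200_000 (02:00)
```

Every event with a timestamp in [01:00, 02:00) lands in the same bucket, which is why the hourly count above is keyed by `Windowed<String>`.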

4. Joins

Combine streams and tables:

```java
KStream<String, EnrichedClick> enriched = clicks
    .join(profiles,
        (click, profile) -> new EnrichedClick(click, profile));
```

Setting Up Kafka Streams

Dependencies (Maven)

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>3.6.0</version>
</dependency>
```

Basic Application Structure

```java
public class StreamProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG,
            "my-stream-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
            "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic")
            .filter((k, v) -> v != null)
            .to("output-topic");

        KafkaStreams streams =
            new KafkaStreams(builder.build(), props);
        streams.start();

        Runtime.getRuntime().addShutdownHook(
            new Thread(streams::close));
    }
}
```

Real-World Example: Fraud Detection

```java
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Transaction> transactions =
    builder.stream("transactions");

// High value alerts
transactions
    .filter((userId, tx) -> tx.getAmount() > 10000)
    .to("high-value-alerts");

// Velocity check - more than 5 tx per minute
transactions
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(
        Duration.ofMinutes(1)))
    .count()
    .toStream()
    .filter((windowedKey, count) -> count > 5)
    .map((windowedKey, count) ->
        KeyValue.pair(
            windowedKey.key(),
            "Velocity alert: " + count + " tx/min"))
    .to("velocity-alerts");
```
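Conceptually, the windowed count is just a per-key counter that resets at each window boundary. A plain-Java sketch of the same velocity rule (same threshold of 5, one window's worth of events):

```java
import java.util.HashMap;
import java.util.Map;

// Count events per user within one window and flag anything over the threshold.
Map<String, Long> txPerUser = new HashMap<>();
String[] windowEvents = {"u1", "u1", "u2", "u1", "u1", "u1", "u1"}; // 6x u1, 1x u2

for (String userId : windowEvents) {
    txPerUser.merge(userId, 1L, Long::sum);
}

boolean u1Alert = txPerUser.get("u1") > 5; // true: 6 tx in the window
boolean u2Alert = txPerUser.get("u2") > 5; // false: 1 tx
```

What Kafka Streams adds on top of this is the hard part: window boundaries by event time, fault-tolerant state stores, and rebalancing the counters across instances.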

For Node.js Developers

Kafka Streams is Java-only, but you have options:

  • kafkajs - Full Kafka client for Node.js
  • ksqlDB - SQL interface for stream processing
  • Kafka Connect - Handle transformations in connectors

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'my-group' });

await consumer.connect();
await consumer.subscribe({ topic: 'sensor-readings' });
await consumer.run({
  eachMessage: async ({ message }) => {
    if (!message.value) return; // tombstones have no value
    const reading = JSON.parse(message.value.toString());
    if (reading.temp_celsius > 30) {
      console.log('Alert!', reading);
    }
  },
});
```

When to Use Kafka Streams

Good fit:

  • Real-time ETL and data enrichment
  • Event-driven microservices
  • Fraud/anomaly detection
  • Real-time analytics
  • CDC (Change Data Capture) processing

Consider alternatives:

  • Batch processing → use Spark
  • Need Python/Node.js → use Flink or custom consumers
  • Simple pub/sub → just use Kafka directly
Key Takeaways

  1. Kafka Streams = declarative stream processing - say what, not how
  2. No separate cluster needed - runs as a regular Java app
  3. Exactly-once semantics - built-in reliability
  4. Scalable - just run more instances
  5. Fault-tolerant - state is backed by Kafka topics

Stream processing doesn't have to be complex. With Kafka Streams, you get the power of distributed systems with the simplicity of a library.
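Exactly-once processing and scaling are configuration-level concerns. A sketch using the documented config names as plain strings (in real code you'd use the `StreamsConfig` constants instead):

```java
import java.util.Properties;

Properties props = new Properties();
props.put("application.id", "my-stream-app");
props.put("bootstrap.servers", "localhost:9092");
props.put("processing.guarantee", "exactly_once_v2"); // EOS; requires brokers 2.5+
props.put("num.stream.threads", "4");                  // parallelism within one instance
// Scaling out = starting more instances with the same application.id;
// topic partitions are rebalanced across them automatically.
```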

#kafka #streaming #java #backend #data
