January 28, 2026
Moving to Event-Driven Architecture with Kafka
Event-driven architecture sounds great in theory. In practice, at a large enterprise with existing systems, legacy integrations, and strict compliance requirements, the implementation gets complicated fast.
When we implemented Kafka event streaming at PNC, we learned a lot about what works and what doesn't.
Start with a real problem
Don't adopt Kafka because it's trendy. Adopt it because you have a real problem that event streaming solves better than alternatives.
For us, the problem was clear: we needed real-time UI updates driven by backend events. Our existing polling-based approach was creating unnecessary load and delivering a subpar user experience. Kafka gave us a clean, scalable way to push updates.
Topic design matters
Topic design is the schema design of event streaming. Get it wrong and you'll be living with the consequences for years.
We learned to design topics around business domains, not technical boundaries. A topic should represent a meaningful business event, not an internal system state change. This makes topics reusable across consumers and keeps the event model clean.
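One lightweight way to enforce that discipline is to lint topic names against a convention before creating them. This is a sketch under assumptions: the `<domain>.<event-name>` pattern and the forbidden-word list are illustrative, not PNC's actual rules.

```python
import re

# Hypothetical convention: topics name a business event, "<domain>.<event-name>",
# never the system that emits them.
TOPIC_PATTERN = re.compile(r"^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$")

# Words that betray an internal-system-state topic (illustrative list)
FORBIDDEN_WORDS = {"db", "table", "cdc", "internal", "tmp"}

def is_valid_topic(name: str) -> bool:
    """A topic passes if it matches the pattern and avoids system-centric words."""
    if not TOPIC_PATTERN.match(name):
        return False
    parts = re.split(r"[.-]", name)
    return FORBIDDEN_WORDS.isdisjoint(parts)

print(is_valid_topic("payments.payment-settled"))   # business event -> True
print(is_valid_topic("oracle-cdc.accounts-table"))  # system state  -> False
```

A check like this can run in CI against the topic manifest, which keeps the naming debate out of code review.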
Consumer patterns
Not every consumer needs every event. We use consumer groups extensively to parallelize processing: within a group, Kafka assigns each partition to a single consumer, so every event is handled by one instance rather than duplicated across them. (Delivery is still at-least-once by default, so handlers need to be idempotent.)
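The parallelism a consumer group buys you comes down to how partitions are divided among members. This toy simulation of a round-robin assignment (the broker does this for you in reality) shows why each event lands on exactly one consumer in the group:

```python
def assign_partitions(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: each partition goes to exactly one group member,
    so the group parallelizes work without duplicating events."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(assign_partitions(6, ["worker-a", "worker-b", "worker-c"]))
# {'worker-a': [0, 3], 'worker-b': [1, 4], 'worker-c': [2, 5]}
```

It also makes the scaling ceiling obvious: with 6 partitions, a seventh consumer in the group would sit idle.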
For our UI updates, we built a lightweight service that reads from Kafka topics, transforms events into UI-friendly payloads, and pushes them through WebSocket connections. Keeping this layer thin and focused was critical. It's a translator, not a business logic engine.
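The translator's whole job fits in a function like the following. This is a minimal sketch with made-up field names (`eventType`, `aggregateId`, and so on are assumptions, not our real schema); the point is what it does not do: no enrichment, no business rules, just field mapping and dropping internals.

```python
import json

def to_ui_payload(event: dict) -> str:
    """Translate a backend event into a minimal UI-friendly payload.
    Field names are illustrative; a real schema would differ."""
    return json.dumps({
        "type": event["eventType"],
        "entityId": event["aggregateId"],
        "summary": event.get("summary", ""),
        # Internal fields (audit metadata, offsets) are deliberately dropped:
        # the UI never needs them, and leaking them couples the UI to backend details.
    })

raw = {"eventType": "payment.settled", "aggregateId": "pmt-123",
       "summary": "Payment settled", "auditTrailId": "x9"}
print(to_ui_payload(raw))
```

Keeping the function pure (dict in, string out) also means the translation layer is trivially unit-testable without a broker or a WebSocket in sight.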
Error handling and dead letters
In a financial services context, losing events is not acceptable. Every event must be accounted for.
We implemented dead letter topics for events that fail processing. A separate monitoring service watches dead letter topics and alerts on anomalies. Failed events get retried with exponential backoff before landing in the dead letter topic for manual review.
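The retry-then-dead-letter flow can be sketched as follows. This is a simplified stand-in, not our production service: the dead letter "topic" is an in-memory list, and the sleep function is injectable so the backoff is visible without actually waiting.

```python
import time

def process_with_retry(event, handler, max_attempts=4, base_delay=0.5,
                       dead_letters=None, sleep=time.sleep):
    """Retry a failing handler with exponential backoff; after max_attempts,
    park the event on a dead-letter queue (here: a list) for manual review."""
    if dead_letters is None:
        dead_letters = []
    for attempt in range(max_attempts):
        try:
            return handler(event)
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    dead_letters.append(event)  # never drop the event
    return None

# A handler that fails twice, then succeeds:
attempts = {"n": 0}
def flaky(event):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

dlq = []
result = process_with_retry({"id": 1}, flaky, dead_letters=dlq, sleep=lambda s: None)
print(result, dlq)  # prints: ok []
```

The invariant worth testing is the one the compliance requirement demands: an event either returns a result or lands on the dead letter queue, never neither.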
Monitoring is essential
Kafka gives you a lot of knobs to turn, and you need visibility into all of them: consumer lag, partition distribution, throughput, error rates. Each of these needs dashboards and alerts.
We invested heavily in monitoring from day one. It paid off immediately by surfacing configuration issues that would have been invisible otherwise.