engineering-design-and-analysis
How to Implement Event Driven Architecture with Apache Pulsar
Table of Contents
Understanding Event Driven Architecture with Apache Pulsar
Event Driven Architecture (EDA) has become a cornerstone of modern distributed systems, enabling loosely coupled, scalable, and responsive applications. Instead of components calling each other synchronously, they emit and consume events — significant state changes or actions — through a central messaging backbone. Apache Pulsar, a cloud-native messaging and streaming platform, provides the performance, durability, and flexibility required to implement EDA at scale. This article walks you through the key concepts, setup process, and best practices for building an event-driven system with Apache Pulsar.
Core Concepts of Apache Pulsar
Before diving into implementation, it’s essential to understand the fundamental components that make Pulsar suitable for EDA. Pulsar is built around a layered architecture that separates serving and storage layers, offering unique advantages over traditional brokers.
Topics
A topic is a named channel where messages — events — are published and consumed. Topics are the primary abstraction for organizing event streams. For example, you might create topics like orders.created, payments.processed, or user.signup. Topics can be partitioned to enable parallelism and higher throughput.
Partitions
Partitions divide a topic into multiple shards, each handled by a separate broker. This allows horizontal scaling: multiple consumers can read from different partitions concurrently. Partitions are assigned a range of message keys or simply round-robin distribution.
Producers
Producers are clients that publish messages to topics. They can send events with a key, which determines the partition assignment, ensuring related events (e.g., all events for a specific user) land on the same partition for ordering. Producers can also batch messages for efficiency.
Consumers and Subscriptions
Consumers subscribe to topics to receive events. Pulsar offers four subscription types, each suited for different consumption patterns:
- Exclusive: Only one consumer can subscribe to a topic; ensures strict ordering but limits parallelism.
- Failover: One primary consumer handles all messages; if it fails, a backup takes over.
- Shared: Multiple consumers split messages evenly (round-robin); ordering is not preserved across consumers.
- Key_Shared: Messages with the same key are delivered to the same consumer, maintaining per-key ordering while allowing parallelism.
For most EDA implementations, the Shared or Key_Shared subscriptions are recommended to balance throughput and ordering requirements.
Storage Layer (Apache BookKeeper)
Pulsar separates serving (brokers) from storage (BookKeeper). This design allows independent scaling, low latency, and strong durability. Messages are stored in ledgers on BookKeeper, which replicates data across multiple nodes for fault tolerance.
Pulsar Functions
Pulsar Functions are lightweight compute processes that consume messages, transform them, and publish results to another topic. They eliminate the need for separate stream processors for simple transformations, enabling a truly serverless EDA.
Setting Up Apache Pulsar
You can install Pulsar locally, on-premises, or use a managed service like StreamNative Cloud. For development, the local standalone mode is sufficient.
Installation Options
- Local Standalone: Download the binary from Apache Pulsar Downloads, extract it, and run
bin/pulsar standalone. This starts a single instance with all components (broker, BookKeeper, ZooKeeper) in one process. - Docker: Use the official Docker image:
docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:latest bin/pulsar standalone. - Kubernetes: Deploy via the Pulsar Helm chart for production clusters.
Configuring the Broker
Key configuration settings include:
- managedLedgerDefaultEnsembleSize: Number of BookKeeper nodes to store each ledger entry (default 2).
- managedLedgerDefaultWriteQuorum: Number of acknowledgments required for writes (default 2).
- managedLedgerDefaultAckQuorum: Number of acknowledgments for consistency (default 1).
- maxUnackedMessagesPerConsumer: Prevents a slow consumer from overwhelming the broker.
For production, tune these based on your durability and latency requirements.
Implementing Event Driven Architecture Step by Step
Let’s walk through a concrete example: an e-commerce system where an order placement event triggers payment processing, inventory updates, and notification dispatch.
1. Design Topics and Schemas
Define clear event types. For each type, design a schema using Pulsar Schema Registry. Using Avro or Protobuf provides schema evolution and validation.
// Example Avro schema for OrderCreated event
{
"type": "record",
"name": "OrderCreated",
"fields": [
{"name": "orderId", "type": "string"},
{"name": "userId", "type": "string"},
{"name": "items", "type": {"type": "array", "items": "string"}},
{"name": "totalAmount", "type": "double"},
{"name": "timestamp", "type": "long"}
]
}
Create topics using the Pulsar CLI or Admin API:
bin/pulsar-admin topics create persistent://public/default/orders.created --partitions 4
2. Create Producers (Publishing Events)
The order service publishes an event when an order is placed. Using the Java client:
PulsarClient client = PulsarClient.builder()
.serviceUrl("pulsar://localhost:6650")
.build();
Producer<OrderCreated> producer = client.newProducer(Schema.AVRO(OrderCreated.class))
.topic("orders.created")
.create();
OrderCreated event = new OrderCreated("order-123", "user-456", items, 99.99, System.currentTimeMillis());
producer.newMessage().key("user-456").value(event).send();
The key ensures all events for a user go to the same partition, maintaining order per user.
3. Develop Consumers (Reacting to Events)
Different services subscribe to the topic with appropriate subscription types:
- Payment Service: Uses a shared subscription to process payments in parallel.
- Inventory Service: Uses a key_shared subscription to ensure each order’s items are processed by the same consumer for consistency.
- Notification Service: Uses a failover subscription for high availability.
// Payment service consumer
Consumer<OrderCreated> consumer = client.newConsumer(Schema.AVRO(OrderCreated.class))
.topic("orders.created")
.subscriptionName("payment-service-sub")
.subscriptionType(SubscriptionType.Shared)
.subscribe();
while (true) {
Message<OrderCreated> msg = consumer.receive();
try {
processPayment(msg.getValue());
consumer.acknowledge(msg);
} catch (Exception e) {
consumer.negativeAcknowledge(msg); // re-deliver later
}
}
4. Use Pulsar Functions for Simple Transformations
Instead of writing a separate service to enrich events, you can use a Pulsar Function:
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
public class EnrichOrderFunction implements Function<OrderCreated, OrderEnriched> {
@Override
public OrderEnriched process(OrderCreated input, Context context) {
// Lookup user details from external service
User user = userService.getUser(input.getUserId());
return new OrderEnriched(input, user.getEmail(), user.getAddress());
}
}
Deploy the function to run on a Pulsar cluster, consuming from orders.created and publishing to orders.enriched.
Advanced Considerations for Production EDA
Schema Registry and Evolution
Pulsar Schema Registry supports Avro, JSON, Protobuf, and custom schemas. It enforces compatibility checks (backward, forward, full) so producers and consumers don’t break when schemas evolve. For example, adding an optional field is backward-compatible; modifying a required field to required is not.
Message Ordering and Idempotency
When using shared subscriptions, ordering is not guaranteed across consumers. To maintain per-key ordering, use Key_Shared subscriptions. However, even with ordering, duplicate messages can occur due to broker failovers or client retries. Implement idempotent consumers by tracking processed message IDs (e.g., using a dedup store like Redis or a database).
Error Handling and Dead Letter Queues
Configure a dead letter topic for messages that repeatedly fail after a maximum number of delivery attempts:
Consumer<OrderCreated> consumer = client.newConsumer(...)
.deadLetterPolicy(DeadLetterPolicy.builder()
.maxRedeliverCount(3)
.deadLetterTopic("orders.created-dlq")
.build())
.subscribe();
This ensures that problematic messages don’t block normal event processing and can be analyzed later.
Security: Authentication and Authorization
Secure your Pulsar cluster with JWT or TLS mutual authentication. Set up authorization at the topic level using role-based access control (RBAC). For example, only the order service can produce to orders.created, while the payment service can only consume it.
Monitoring and Observability
Pulsar exposes metrics via Prometheus and has a built-in dashboard (Pulsar Manager). Key metrics to monitor:
- Backlog size per subscription (latency indicator)
- Message rates (produce/consume)
- Consumer lag
- Broker health (CPU, memory, GC)
- Storage usage in BookKeeper
Set up alerts for backlog growth or consumer disconnections.
Geo-Replication
For multi-region deployments, Pulsar supports geo-replication out of the box. Configure multiple clusters and replicate specific topics across regions. Producers write to their local cluster, and clusters asynchronously replicate events. This enables disaster recovery and low-latency event consumption globally.
Comparing Apache Pulsar with Other Event Brokers
While many messaging systems can support EDA, Pulsar offers distinct advantages:
- Unified model: Supports both streaming (persistent topics) and queuing (non-persistent topics) in one system.
- Layer separation: Storage and serving can scale independently; you can add more brokers without rebalancing stored data.
- No consumer group rebalancing overhead: Unlike Kafka, adding new consumers does not require rebalancing partitions; each consumer simply gets a fair share of messages in shared subscriptions.
- Built-in Functions: Eliminates the need for an external stream processing framework for lightweight transformations.
However, Kafka may be a better fit if your organization already has deep investment in the Kafka ecosystem (Kafka Connect, Kafka Streams, Confluent Platform) and does not require Pulsar’s unique features like geo-replication or functions.
Best Practices for EDA with Apache Pulsar
- Design event schemas with evolution in mind: Use Avro or Protobuf with compatibility checks to avoid breaking changes.
- Choose the right subscription type: Use
Key_Sharedfor per-key ordering with parallelism,Sharedfor maximum throughput without ordering guarantees, andFailoverfor high availability with ordering. - Implement idempotent consumers to handle at-least-once delivery semantics.
- Partition topics based on key cardinality — at least 4-8 partitions per broker node, but avoid too many partitions (consume memory in brokers).
- Monitor backpressure with consumer acknowledgment timeouts and broker maxUnackedMessages limits.
- Use Pulsar Functions for simple logic to reduce operational overhead; reserve external services for complex stateful processing.
- Secure all endpoints in production: enable TLS, authentication, and authorization.
- Test schema evolution before deploying to production; use a schema registry with a test environment.
Conclusion
Event Driven Architecture with Apache Pulsar enables highly decoupled, scalable, and resilient systems. By understanding Pulsar’s core concepts — topics, partitions, subscription types, and its separation of serving and storage — you can design event flows that handle millions of messages per second with low latency. This article covered the steps to set up Pulsar, publish and consume events, leverage Pulsar Functions, and implement production-grade features like schema management, idempotency, and security. For further details, refer to the official Apache Pulsar documentation and explore advanced topics such as Pulsar IO connectors for integrating with databases and other systems. With careful design and adherence to best practices, your event-driven system will be well-positioned to evolve with growing business needs.