What is Event Streaming? A Deep Dive

Event streaming is widespread and growing by the day. Engineers, architects, and executives use it when talking about their systems. There are many conference talks and blog posts on the topic, and books have been written about it. But like many things in our industry, it's an overloaded term. What does it mean? When should I use it? What are its trade offs?

This article is an event streaming deep dive. We start from its elemental components, describe its use cases, alternatives, and review some open-source and commercial implementations.

Copy link to clipboardAn overview of event streaming

Event streaming is an implementation of the publish/subscribe architecture pattern (pub/sub) with certain specific characteristics. The pub/sub pattern involves the following elements:

  • Message: discrete data a publisher wishes to communicate to a subscriber.

  • Publisher: puts messages in a message broker in a particular topic.

  • Subscriber: reads messages from a message broker's particular topic.

  • Message broker: a system with the ability to store messages from publishers and make them available to subscribers

  • Channel or Topic: a subset of events that share a category. The terms topic and channel are used interchangeably in the industry. In this blog post, we'll refer to these as topics.

Publisher Subscriber in action

What makes event streaming particular as an implementation of pub/sub is that:

  • Messages are events.

  • Messages (within a topic) are ordered, typically based on a relative or absolute notion of time.

  • Subscribers/Consumers can subscribe to read events from a particular point in the topic.

  • The messaging framework supports temporal durability. Events are not removed once they are "processed". Instead, the removal happens after a certain amount of time, such as "after 1 hour".

Copy link to clipboardWhat is an event?

An event stores information about a state change in the system that has "business meaning". 

An event does not itself describe what processing must take place as part of a subscription. Below, we have two examples of messages that might be added to a message broker:

1. Not an event

   "message_type": "send_welcome_email",
   "to": "anne@example.com",
   "content": "Hi, thanks for..."

2. Event

   "message_type": "user_signup",
   "email": "anne@example.com",
   "content": "Hi, thanks for..."

The difference is that the first message specifically describes what should happen. It is reasonable to conclude an email needs to be sent by the subscriber to "process" the message.

The second message is subscriber agnostic. Any number of subscribers could read that message and take different actions. One subscriber could send a welcome email, and another subscriber might add anne@example.com to a CRM system. 

Events represent a change of state, but they do not specify how that state change should modify the system. Instead, that is up to consumers to implement. 

Copy link to clipboardWhat is an event stream?

An event stream is a sequence of events, meaning order between events matters. Events flow through the system in a specific direction: from producers to consumers.

Publish-Subscribe at Scale

As producer events occur in producers, they are published to a topic. A topic is typically a categorization for a particular type of event. Subscribers interested in events of a particular category subscribe to that topic.

As events are published to the topic, the broker identifies the subscribers for the topic and makes the events available to them. Publishers and subscribers need not know about each other.

Copy link to clipboardProtocols and delivery models

The set of rules that components of a system follow to communicate with each other are known as the protocol. In our industry, there are many choices for implementing event-driven architectures. 

An event is always published by a single publisher, but it might need to be processed by multiple subscribers. Handling subscriber failures and determining which subscriber to deliver messages to is a big part of the protocol implementation. Because of that, one of the most important factors of these protocols is how they handle broker/subscriber communication.

Copy link to clipboardConnection initiation

The connection between broker and subscriber can be either subscriber initiated or broker initiated.

Broker initiated models require the broker to know about existing subscribers. This can be achieved either through static registration (e.g., Webhooks) or a service discovery mechanism.

In subscriber initiated models, subscribers need to be aware of the broker's address. These models are more convenient when there are many dynamic subscribers, especially in environments when service discovery is not feasible (e.g., a WAN).

Copy link to clipboardPush vs. pull model

In the pull model, subscribers send a message to the broker when they want to receive events, and only then the broker sends them data. Subscribers need to maintain the state to know what topics exist and what data to ask for (for example: where to read from in a particular topic). For example, Kafka Consumers pull data from brokers.

Pull model: subscriber sending a message to the broker.

In the push model, the broker determines when to send data to subscribers based on its own logic. Subscribers do not need to maintain state or implement special logic to coordinate amongst themselves. For example, SSE and Webhooks use a push model.

Push model: broker sending a message to the subscriber

Some technologies like Google Pub/Sub and Amazon Kinesis Data Streams support both models.

Copy link to clipboardBackpressure

Whenever a streaming pattern is used, events can be published faster than subscribers can process them. If that occurs, newly arrived events are typically added to a buffer, in memory, or durable. If that capacity is exhausted, events are dropped using a predefined policy (FIFO, LIFO, etc.).

Regardless of the action taken, the system needs a mechanism to let the broker know that consumers cannot currently process more events. Otherwise, either consumers or downstream services would suffer from resource exhaustion. This mechanism is commonly known as backpressure.

When consumers use a pull-based mechanism, the most common backpressure approach is pulling for events only when no events are actively being processed. This ensures systems downstream from the broker are not accumulating a backlog.

Push-based systems need a mechanism for subscribers to let the broker know that they cannot handle new events, sending an error instead of a normal ACK or temporarily pausing delivery.

In addition to applying back pressure, many systems are also architected to increase their consumption and processing capacity when resource exhaustion is near. This ability to scale dynamically, combined with backpressure is fundamental. It prevents the system from suffering issues and/or outages and gives it time to provision capacity to handle the increased publishing rate.

Copy link to clipboardMessage delivery

Another essential aspect of technologies implementing the pub/sub pattern is determining which subscribers get each event. The options are:

All subscribers to a topic get all events for the topic, which is useful for use cases where there is no shared state across consumers. Chat applications running on client devices are good fits for this model. Event processing requires updating different states (that of the chat app in client devices) whenever a new chat message is sent. For example, MQTT uses this model.

Only one of the topic subscribers gets an event: This model is commonly used for cases when event processing involves shared state, as multiple subscribers processing an event would result in inconsistencies. A common example would be subscribers that send an email when an event occurs (like a new user sign up). If multiple subscribers were running and all subscribers processed the event, multiple emails would be sent.

Having subscribers grouped into "groups" and having all groups and only one subscriber per group get an event. The benefit is that it allows multiple processing actions to occur at different points in time and has many instances of each subscriber type per group to scale processing capacity. A drawback compared to the previous models is that it requires additional coordination mechanisms between brokers and consumers for group membership and group message processing. Kafka Consumer Groups support this model.

Copy link to clipboardWhat are the advantages of event streaming?

Copy link to clipboardDecoupling

  • As an implementation of the pub/sub pattern, you decouple publishers from subscribers. There's no need for publishers and subscribers to know about each other, keep up to date membership lists, etc.

  • Decoupling of events and processing actions. Because events are not "removed from the broker", and because events do not specify their processing actions, multiple subscribers could get the same event and perform different actions, as explained before.

Copy link to clipboardTeam independence

  • Decoupling brings team independence. Different teams can work on subscribers for the same type of events without needing to coordinate. Keeping coordination costs down is extremely valuable as companies grow. Many service-oriented/microservices architectures share an event streaming broker as a communication mechanism instead of having services communicate directly.

Copy link to clipboardReliability

  • Event Streams are durable. Suppose consumers down events are not lost. You might still incur processing delays if you lack consumer (or downstream) capacity.

  • The best Event Streaming technologies will automatically deal with subscribers' failure and help you implement exactly-once processing semantics, which is fundamental when building reliable systems. Building those capabilities requires a careful architecture of all components, including the message broker, protocol, and publisher and consumer libraries.

Copy link to clipboardReal-time feedback

  • Continuously applying system state changes (events) to an entity's state allows for real-time experiences. Users don't have to wait minutes or hours to see their operations take place. See the stream vs batch section below for more on this.

Copy link to clipboardStream vs. batch

Historically, many systems used a technique known as "batch processing", which is still commonly used nowadays due to resource constraints. Batch processing involves picking a time batch window (typically of "low" system activity) to process data of a batch size (typically large). For example, at the end of the day, banks typically processed all transactions from the past 24 hours and updated accounts with the appropriate balance.

As time went by, computers and storage resources became cheaper. Users also started valuing and expecting "real-time" feedback in their applications. Thus, batch processing is no longer the answer for many applications. After all, think about it: isn't it a lot better to see your balance update 1 minute after you've made a transaction instead of after 24 hours?

Running batch processes more often, e.g., every hour instead of every day would reduce the time it takes to reflect the system's right balances. Suppose we are looking to provide "real-time" feedback. In that case, one could conceptually think of consuming an event stream as making the batch process run whenever there is an event, i.e., the batch process becomes a stream consumer. The batch size becomes 1, and the batch window becomes "all the time".

In the above case mentioned, each transaction could be thought of as an event, so transactions from the previous 24 hours can be read in order and their operations applied to the original balances.  However, there are cases when not using batch processes that might require reconciling/syncing two or more data sources.  

Imagine a scenario where an offline data warehouse (DWH) used for business analytics needs to be updated based on data from a production database that stores Users. If the production database stores each user as a separate row, and the goal is to have a table in the DWH with a row per user, then the batch process:

  • Either needs to 

    • only read users that were updated since the last batch run

    • diff the DWH and DB users to figure out what has changed 

    • and figure how to handle any errors that occur while a batch is processed, as those users need to be re-synced

  • Or needs to recreate a whole new table in the DWH by reading all users from scratch and dump the old one.

Both of those approaches are workable and they might do fine for a while, but they are less than practical if the dataset becomes too big, increasing both errors and batch job processing time.

Like everything in software, the decision of whether to use batch jobs or event streams is based on tradeoffs. Acceptable processing delay, initial investment, operational cost, complexity, throughput, and other factors need to be considered when deciding which approach to use.

Copy link to clipboardEvent streaming technologies

There are a number of event streaming technologies. Apache Kafka is probably one of the better-known ones. It is open source, and you can choose to run it yourself. It is also offered as a service by most cloud providers and many vendors, so it is useful to avoid vendor lock-in from a business perspective.

Cloud providers have their event streaming technology implementation and surrounding ecosystem: Amazon Kinesis, Google Cloud Pub/Sub, and Azure Event Hubs. These allow you to dynamically configure (and pay for) capacity, support both push and pull models and integrate easily with other cloud provider products, e.g., to use Serverless functions for event processing.

Ably is an event streaming technology for far-edge event streaming, which we describe in more detail here.

Copy link to clipboardEvent stream processing

As its name states, Event Stream Processing involves processing a sequence of events. The key here is operating on the sequence of events and not just one event at a time. Examples of Event Stream Processing are:

  • Joining two streams of events, transforming a field value, and filtering a subset of the join

Example of event stream processing: joining two streams of events.
  • Performing an aggregate calculation (max, avg, etc.) over a moving time window

Example of Event Stream Processing: performing an aggregate calculation

Event Stream Processing technologies are useful as they allow you to work with event streams, transformations, filters, and joins directly.  For example, you could write the following short and legible code snippet to count how many Twitter mentions Ably had in a sliding window:

// Tweets with a comp


The example above was inspired by a sample from https://spark.apache.org/streaming/

Without Stream Processing technologies, you would have to:

  1. Consume events from the event stream

  2. Store them in a database (or in memory)

  3. Run batches every N minutes to process the last M minutes of events

  4. Publish the results to another event stream

Apache Flink, Apache Spark (Streaming), Apache Beam, Apache Kafka Streams are some of the open-source technologies you can use for stream processing. Cloud providers offer these capabilities as a service, some of the options being Amazon Kinesis Data Analytics, Google Cloud Dataflow, and Azure Stream Analytics.

Copy link to clipboardAbly and event streaming

Ably is an enterprise far-edge event streaming solution. 

Ably is an enterprise far-edge event streaming solution.

What does that mean? Let's dissect it:

  • Far-edge: Unlike solutions like Apache Kafka, Amazon Kinesis, etc. Ably runs outside your network perimeter. By operating on the edge and receiving traffic before your infrastructure does, Ably can handle traffic spikes by scaling its infrastructure and guarantee delivery to edge consumers, so you don't have to deal with it.

  • Enterprise: enterprises want vendors they can depend on. For an event streaming platform, this means predictable and low latency delivery. If Ably confirms it received a message, it will be delivered, and fast! Ably is a pub/sub platform that is engineered around four pillars of dependability and provides:

    • Reliability through infrastructure redundant at the regional and global level, and a 99.999% uptime SLA.

    • Predictable latency even when operating under unreliable network conditions, such as that of a WAN. The Round trip latency from any of its 205 PoPs globally that receive at least 1% of its global traffic is < 65ms for 99th percentile.

    • Integrity for message order and delivery, including exactly once processing.

    • Scalability for traffic variations, including 50% global capacity margin for surge and the ability to double capacity every 5 minutes.

  • Event Streaming, as stated in the introduction, requires order, the ability to read from a point in time, and durability. Ably supports this out of the box. All you have to do is use one of the many Ably Client Library SDKs.

Ably has many integrations that allow you to integrate with third-party services, including queuing technologies, serverless functions, and others. Ably Reactor Firehose seamlessly integrates with some of the best existing Event Streaming providers to get data to them and implement Event Stream Processing.

In essence, Ably enables you to implement event streaming architectures in scenarios where you have millions of devices generating millions of events. To learn more, get in touch with Ably's technical experts.

Copy link to clipboardFurther reading

Damian Schenkelman

Damian is a Principal Engineer at Auth0, where he focuses on making its architecture reliable and scalable. He loves to solve hard problems of any type, especially those related to making software and teams scale. Before Auth0, Damian spent many years working for Microsoft on Azure, Media, and patterns & practices related initiatives. He spends his spare time with family and friends, exercising, and catching up on all things NBA.