Why are you using an asynchronous IO framework in a reactive application? This is a question I am often asked when presenting a design I use a lot in RxPY-based applications. It is a very interesting question because the answer is not obvious: Asynchronous and Reactive programming are both event-driven programming tools. So at first, they look more like competitors than allies.
This article clarifies what each technology is exactly, and shows why they shine when combined. I focus on Python AsyncIO and RxPY, but most of the following explanations apply to any programming language and framework.
Asynchronous IO Programming
Let’s start with asynchronous IO programming. For a long time, it has been neglected by programmers. Probably the main reason was that callback-based code made it hard to use correctly. This started to change with the support of futures, and then with the availability of the async/await syntax in several programming languages. Still, asynchronous IO programming with async/await is far from natural for many programmers. So what problem does it solve that is worth the effort? Asynchronous IO programming handles IO concurrency more efficiently than thread-based IO management (the other way to work with IO).
The first thing to understand about Asynchronous IO frameworks is that, as the name implies, they are useful in applications that deal with IO resources. In most cases, this means applications that do a lot of network communication. If an application is CPU bound, then asynchronous programming is not the right tool to maximize performance. However, if an application is IO bound, then using asynchronous IO may perform better than multi-threading.
Let’s illustrate this with two figures. Here is how a multi-threaded application deals with IO concurrency.
The horizontal rectangular lines represent execution units (threads). The longest one is the main thread of the process. The rounded rectangles represent the IO operations. In this figure, the application uses synchronous IO APIs. As a consequence, when an IO operation is ongoing, the thread is stalled, waiting for the IO operation to complete (the gray squares). Since IO operations can take a long time to execute, multiple threads are used to execute IO operations in parallel. This works well up to a point: one thread is needed for each concurrent IO context. When an application needs to maintain tens of thousands of network connections, as many threads must be created. Creating a very large number of threads implies a significant overhead in most operating systems.
Asynchronous IO APIs address this issue by dealing with this concurrency via multiplexing:
Here a single thread is used, and IO operations are multiplexed. When an operation is pending, other IO operations can be started and the CPU is available for computations. There is no longer a need for threads. However, some mechanism is needed to keep the state of the ongoing operations and resume them when they complete.
Asynchronous IO frameworks are designed precisely for this: they expose high-level APIs so that these asynchronous operations are as easy to use as possible. These frameworks implement an event loop system and APIs to access IOs asynchronously. Most of the time they also implement application protocols such as HTTP, WebSocket, MQTT, Kafka.
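As a minimal sketch of this multiplexing with AsyncIO: the `fake_io` coroutine below is a hypothetical stand-in for a network operation (it only sleeps), and three of them are pending at the same time on one thread.

```python
import asyncio
import threading
import time


async def fake_io(name, delay):
    """Hypothetical stand-in for a network operation.

    asyncio.sleep suspends this coroutine and gives control back to the
    event loop, which is free to run other coroutines in the meantime.
    """
    await asyncio.sleep(delay)
    # Report which thread resumed the coroutine once the "IO" completed.
    return name, threading.get_ident()


async def main():
    start = time.perf_counter()
    # The three operations are pending concurrently, on a single thread.
    results = await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```

The total duration should be close to the longest single operation rather than the sum of the three, and all results carry the same thread identifier. The event loop is the mechanism that keeps the state of each pending coroutine and resumes it on completion.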
Reactive Programming is a way to do event-driven programming. The foundation of reactive frameworks is streams of events (often named Observables, and Items), and APIs to manipulate them. A reactive application is ultimately a computation graph where each node is a computation (an operator) and the edges represent the data flow.
Let’s consider the map operator:
It is a very simple operator but also a great example of how reactive programming works: Items come in, they are processed, and other items are emitted, available for other computation units. Complex behaviors can be implemented by combining operators and observables.
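To make the idea concrete, here is a toy sketch of a map operator written with plain functions. It is not the RxPY implementation: an "observable" is reduced to a function that pushes items to a callback, and the names `from_list` and `map_` are illustrative only.

```python
from typing import Callable

# Simplifying assumption: an observable is a function that takes an
# on_next callback and pushes its items to it. Errors and completion
# are left out on purpose.
Observer = Callable[[object], None]
Observable = Callable[[Observer], None]


def from_list(items: list) -> Observable:
    """Create a source observable that emits each item of a list."""
    def subscribe(on_next: Observer) -> None:
        for item in items:
            on_next(item)
    return subscribe


def map_(mapper: Callable) -> Callable[[Observable], Observable]:
    """Build an operator that applies mapper to every incoming item."""
    def operator(source: Observable) -> Observable:
        def subscribe(on_next: Observer) -> None:
            # Items come in, they are processed, and new items are emitted.
            source(lambda item: on_next(mapper(item)))
        return subscribe
    return operator


received = []
doubled = map_(lambda x: x * 2)(from_list([1, 2, 3]))
doubled(received.append)
print(received)  # [2, 4, 6]
```

Because the operator returns another observable, its output can be fed to further operators, which is how larger computation graphs are composed.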
If you are not familiar with reactive programming, see my introduction article on this topic.
It should be clearer now that Asynchronous and Reactive programming aim at solving different problems.
On the one hand, Asynchronous programming is a way to maximize IO utilization with minimal system overhead. Asynchronous frameworks implement IO multiplexing via dedicated event loops. Some frameworks provide explicit support for streams but with limited APIs. However, they usually provide many network protocol implementations, designed with efficiency and ease of use in mind. So basically, Asynchronous frameworks are designed to deal with side effects (IO), in an optimized way.
On the other hand, Reactive programming is a way to write computation graphs. Most if not all of these frameworks are heavily influenced by functional programming. As a consequence, they provide limited support for side effects. It does not mean that you cannot use IO with them, but that they have limited built-in support of IO. Reactive programming frameworks excel in dealing with streams of events. They contain many operators, and they can be easily extended with custom operators.
So in the end, these two technologies that seemed similar at first are very different and complementary. Now let’s come back to the original question and see why using both is a good design. If you have limited usage of IOs in your application, maybe you never really bothered about this. However, if your application does some significant amount of IO operations, you probably looked for a design that separates the data-flow from the side effects (the IOs). Separating them is interesting for several reasons:
- It makes testing easier. No mocks are needed, you simply emit events and check the output events. You can also test any error condition.
- It may act as an IO abstraction, easing the migration from one library to another one.
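The testing benefit can be sketched as follows. This is an illustrative example with plain functions instead of RxPY observables; `parse_responses`, `emit`, and the shape of the response events are assumptions made up for the example.

```python
def parse_responses(subscribe_source):
    """Pure dataflow: turn response events into result events. No IO here."""
    def subscribe(on_next):
        def handle(response):
            if response["status"] == 200:
                on_next(("ok", response["body"].upper()))
            else:
                on_next(("error", response["status"]))
        subscribe_source(handle)
    return subscribe


def emit(events):
    """Test helper: a source that pushes a fixed list of events."""
    def subscribe(on_next):
        for event in events:
            on_next(event)
    return subscribe


def test_parse_responses():
    out = []
    source = emit([
        {"status": 200, "body": "hello"},
        # An error condition is just another event: no server has to fail.
        {"status": 503, "body": ""},
    ])
    parse_responses(source)(out.append)
    assert out == [("ok", "HELLO"), ("error", 503)]


test_parse_responses()
```

The test never touches an HTTP client, so nothing has to be mocked: the input events play the role of the network.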
Moreover, in a Reactive application, you certainly want to use asynchronous APIs for IOs: Using blocking APIs will block the dataflow. Blocking the dataflow will block the whole application, defeating the purpose of using reactive programming. Multithreading can be a solution, but as explained before it has some limitations, and it also adds complexity to the code. Still, this is a perfectly fine solution in many situations.
If you aim at implementing a reactive system then you have to use asynchronous IOs. This is where the combination of both worlds seems like a good match. Unfortunately, implementing it is not technically obvious.
Let’s now consider RxPY and AsyncIO as an example. AsyncIO is the asynchronous framework of the Python standard library. It is fully based on the async/await syntax, and many AsyncIO-based packages provide implementations for various network protocols (you can find several of them here). RxPY is the Python implementation of ReactiveX. It contains some support for AsyncIO, but it is quite limited. Typically, few operators accept coroutines as arguments. As a consequence, it is difficult to mix RxPY code with AsyncIO code. This constraint is an opportunity to make a clear separation between side effects and pure code. CycleJS was the first framework to propose such a design. Since then, other similar frameworks have been implemented, including one that I developed specifically for RxPY and AsyncIO: Cyclotron.
The design is this one:
On the top part is a pure dataflow, the RxPY application. On the bottom part are the side effects. This is where AsyncIO code can be implemented. Both parts communicate with each other via observables and items. You can find more information on this design in another article.
In an application that uses many different network protocols, this design allows adding features with minimal complexity for each of them. Here is a structure that I use a lot on simple micro-services:
Such a service consumes events from some Kafka topics, processes them, and emits other events on other Kafka topics. This application does a lot of different network IOs: Kafka consumer, Kafka producer, HTTP client (with long polling) for Consul, HTTP server for Prometheus. Since all the IOs are exposed with RxPY-friendly APIs (i.e. Observables), cohabitation is natural and efficient.
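The following is a rough sketch of that separation using only the standard library. It is not the Cyclotron API: the `Subject` class is a tiny stand-in for an RxPY Subject, and `fake_consumer` and `fake_producer` are hypothetical side effects that only sleep instead of talking to the network.

```python
import asyncio


class Subject:
    """Tiny stand-in for an RxPY Subject: forwards items to subscribers."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def on_next(self, item):
        for observer in self._observers:
            observer(item)


def app(source):
    """Top part: the dataflow. It performs no IO, it only transforms items."""
    sink = Subject()
    source.subscribe(lambda text: sink.on_next(text.upper()))
    return sink


async def fake_consumer(source):
    """Bottom part (input side effect): pretends to read from the network."""
    for message in ["a", "b", "c"]:
        await asyncio.sleep(0.01)  # stands in for an awaitable network read
        source.on_next(message)


async def fake_producer(queue, written):
    """Bottom part (output side effect): pretends to write to the network."""
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more items
            break
        await asyncio.sleep(0.01)  # stands in for an awaitable network write
        written.append(item)


async def main():
    written = []
    source = Subject()
    sink = app(source)
    queue = asyncio.Queue()
    # Bridge from the synchronous dataflow to the asynchronous side effect.
    sink.subscribe(queue.put_nowait)
    producer = asyncio.create_task(fake_producer(queue, written))
    await fake_consumer(source)
    queue.put_nowait(None)
    await producer
    return written


written = asyncio.run(main())
print(written)  # ['A', 'B', 'C']
```

The dataflow only sees items coming in and going out, so the two fake side effects could be swapped for real AsyncIO clients without touching `app`.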
I hope that the difference between Asynchronous programming and Reactive programming is now clear. These two technologies are definitely not in competition: they are complementary. Asynchronous frameworks are designed to deal with IOs, while Reactive frameworks are designed to deal with streams of events. Luckily Python provides great support for both of them. So let’s use the best of each one!
Originally published at https://blog.oakbits.com on October 21, 2020.