More and more data science use cases are done in real-time: Alerting, defect detection, prediction, automated recovery are some examples. Yet implementing and deploying them can be very challenging. Maki Nage is a Python framework that aims at simplifying the development and deployment of real-time processing applications. This article introduces how to use it, showing that stream processing should be simplified!
Before we get our hands on some examples, let me briefly explain why we created Maki Nage. In my DataLab, we often work on real-time streaming services. In practice, the algorithms and machine learning models that we build are fed by Kafka streams. However, the data we use for study and training is retrieved from data lakes. These are batch data, while the inference is running in streaming mode. As a consequence the code that we write for data preparation cannot be used for deployment: We typically use Pandas and Spark to work on batch data, and there is no way to re-use it for deployment. So in the end another team must re-implement the feature engineering part, usually in Java with Kafka Streams. This situation is quite common in data science. …
Why are you using an asynchronous IO framework on a reactive application? This is a question I am often asked when presenting a design I use a lot in RxPy based applications. This a very interesting question because the answer is not obvious: Asynchronous and Reactive programming are both event-driven programming tools. So at first, they look more like competitors than allies.
This article clarifies what each technology is exactly, and shows why they shine when being combined. I make some focus on Python AsyncIO and RxPY, but most of the following explanations apply to any programming language and framework.
Let’s start with asynchronous IO programming. For a long time, it has been neglected by programmers. Probably the main reason was that it was hard to use correctly due to callback-based code. This started to change with the support of futures, and then with the availability of the async/await syntax in several programming languages. Still, asynchronous IO programming with async/await is far from being natural for many programmers. So what problem does it solve that is worth the effort? …
ReactiveX is a wonderful framework that allows to write event based code in a very elegant and readable way. Still, getting started in it can be challenging, and intimidating. In practice once you understand few key principles of ReactiveX, you can start writing reactive code easily.
The aim of this article is to explain these keys principles, and show how they apply though a simple example. By the way, the first half of this article is language agnostic. So feel free to read it even if you use another programming languages than Python.
Before reading on, be aware of one important thing: Reactive Programming is addictive ! Once you start thinking as a data flows instead of control flows, you trend to consider that it solves problems better than other programming approaches, and you use reactive programming more and more. …
Reactive Programming is really great to write event driven applications, data driven applications, and asynchronous applications. However, it can be difficult to find a good way to structure the code. Fortunately for JavaScript developers, there is CycleJs. So I developed a similar framework for Python and RxPY: Cyclotron.
edit 2020/10: Updated examples to RxPY v3, and cyclotron drivers list.
Cyclotron is a functional and reactive framework. It allows structuring reactive code in a functional way. More specifically, functional means the following:
Moreover, it is specifically designed to write reactive…
About