Kafka + ReactiveX = Maki Nage

Image for post
Image for post
Photo by Mike Lewis HeadSmart Media on Unsplash

More and more data science use cases are done in real-time: Alerting, defect detection, prediction, automated recovery are some examples. Yet implementing and deploying them can be very challenging. Maki Nage is a Python framework that aims at simplifying the development and deployment of real-time processing applications. This article introduces how to use it, showing that stream processing should be simplified!

Addressing The Inference Gap

Before we get our hands on some examples, let me briefly explain why we created Maki Nage. In my DataLab, we often work on real-time streaming services. In practice, the algorithms and machine learning models that we build are fed by Kafka streams. However, the data we use for study and training is retrieved from data lakes. These are batch data, while the inference is running in streaming mode. As a consequence the code that we write for data preparation cannot be used for deployment: We typically use Pandas and Spark to work on batch data, and there is no way to re-use it for deployment. So in the end another team must re-implement the feature engineering part, usually in Java with Kafka Streams. This situation is quite common in data science. …

Image for post
Image for post
Photo by Naneen Photography Services on Cap D’antibes Carpentry

Why are you using an asynchronous IO framework on a reactive application? This is a question I am often asked when presenting a design I use a lot in RxPy based applications. This a very interesting question because the answer is not obvious: Asynchronous and Reactive programming are both event-driven programming tools. So at first, they look more like competitors than allies.

This article clarifies what each technology is exactly, and shows why they shine when being combined. I make some focus on Python AsyncIO and RxPY, but most of the following explanations apply to any programming language and framework.

Asynchronous IO Programming

Let’s start with asynchronous IO programming. For a long time, it has been neglected by programmers. Probably the main reason was that it was hard to use correctly due to callback-based code. This started to change with the support of futures, and then with the availability of the async/await syntax in several programming languages. Still, asynchronous IO programming with async/await is far from being natural for many programmers. So what problem does it solve that is worth the effort? …

Image for post
Image for post
Photo by Louise Eckerström on Unsplash

ReactiveX is a wonderful framework that allows to write event based code in a very elegant and readable way. Still, getting started in it can be challenging, and intimidating. In practice once you understand few key principles of ReactiveX, you can start writing reactive code easily.

The aim of this article is to explain these keys principles, and show how they apply though a simple example. By the way, the first half of this article is language agnostic. So feel free to read it even if you use another programming languages than Python.

Before reading on, be aware of one important thing: Reactive Programming is addictive ! Once you start thinking as a data flows instead of control flows, you trend to consider that it solves problems better than other programming approaches, and you use reactive programming more and more. …

Image for post
Image for post
Photo by Crystal Kwok on Unsplash

Reactive Programming is really great to write event driven applications, data driven applications, and asynchronous applications. However, it can be difficult to find a good way to structure the code. Fortunately for JavaScript developers, there is CycleJs. So I developed a similar framework for Python and RxPY: Cyclotron.

edit 2020/10: Updated examples to RxPY v3, and cyclotron drivers list.

Cyclotron is a functional and reactive framework. It allows structuring reactive code in a functional way. More specifically, functional means the following:

  • Pure code and side effects are clearly separated
  • All the code is implemented via functions (no class)

Moreover, it is specifically designed to write reactive…


Romain Picard

Data Scientist, former software architect, author, core developer of RxPY.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store