What happens when you turn an iterator inside out?

Sun 18 September 2016

This post is an extended version of a lightning talk I gave at *PyConUK 2016* on 2016-09-17

I want share with you an interesting connection between two programming paradigms that, at first glance, seem completely different. This is, of course, not my idea but material on the web is relatively difficult to come by, particularly if you are not into theory. This treatment tries to be more approachable and look from the Python perspective.

So what is an iterator?

As Python programmers we feel we've got a pretty good understanding of iterators. They are a stalwart of imperative programming, allowing us to express many tasks efficiently as loops rather than gathering sets of things into intermediate data structures. Abstractly we have objects that can be iterated over, which we will call iteratables and objects that represent the process of iterating called iterators

iter(iterable)  -> iterator
next(iterator)  -> value

The point being that you can continually call next(iterator) to yield multiple values. Also, if you call iter(iterable) a second time you will get a fresh iterator which will yield values independently (usually, but not necessarily, the same values).

If we are going to perform surgery on this definition we need to be a little more precise. We need to capture all the possible results of calling these functions. In particular, calling next() have several outcomes depending on whether it raises an exception and why.

iter(iterable)  -> iterator
next(iterator)  -> (value | StopIteration | Exception)

We can simplify this notation by removing our names to just leave an expression of what is happening. Both iterable and iterator interfaces have only one method, so we can replace them with single functions:

iterable = () -> iterator
iterator = () -> (value | StopIteration | Exception)

resulting in the expression

() -> ( () -> (value | StopIteration | Exception) )

Inside out?

We are now in a position to perform our manauver. All we have to do is reverse the arrows, turning inputs into outputs and vica versa. Since we have a higher order function in here the result will be a little more subtle than just going backwards.

Our original expression

() -> ( () -> (value | StopIteration | Exception) )

becomes

==> () <- ( () <- (value | StopIteration | Exception) )

or rewriting it left to right

==> ( (value | StopIteration | Exception) -> () ) -> ()

But what does this mean? We have a function that takes another function as argument, that second function takes either a value, StopIteration or Exception.

To make what is going on a little more explicit we can notice that any function which takes one of several types could be replaced by a tuple of functions taking each type. In our case we can split our inner function into 3 separate functions:

(value | StopIteration | Exception) -> ()

    == ( (value)     -> (),      # Emit value
         ()          -> (),      # No move values
         (Exception) -> ()       # Error
       ) -> ()

These functions look a bit odd because they don't return anything, they only receive a value or, in the case of StopIteration, there is no need to pass the value at all because there is only one possible reason for calling it. Presumably, something is done to these values though, in that they trigger some sort of computation.

In fact, they are exactly like event handlers. What we have done is replaced an object which allows us to pull values out of it until there are no more values left with an object which you send values until there is an error or there are no more values.

Finally, lets put the names back that we removed earlier. Clearly we don't have iterators/iterables any more so we'll need some different names. What we want is observer/observable:

( (value)     -> (),
  ()          -> (),
  (Exception) -> ()  )  ->  ()


observer   = ( (value) -> (), () -> (), (Exception) -> () )

observable = observer -> ()

This is Reactive Programming with ReactiveX

Observables came out of C# and LINQ and was subsequently been ported to the JVM world and many other languages, inluding JavaScript and Python. They have recently become popular for creating event-driven web apps and are central to events in Angular 2. They can be a higher-level abstraction than Promises supporting a wealth of operators to combine and filter streams of events.

But they are just iterators inside out!

A full discussion of the power of Rx would require it's own blog post but I should include at least one example in Python. Consider it a Hello World for Rx.

from rx import Observable

ob = Observable.from_('Spam egg Spam Spam bacon Spam'.split())


def on_next(value):
    print('Got {}'.format(value))

def on_completed():
    print('Done')

subscription = ob.subscribe(on_next, on_completed=on_completed)

Got Spam
Got egg
Got Spam
Got Spam
Got bacon
Got Spam
Done

Why care?

Whether you are developing dynamic web applications or large-scale data pipelines, we are being confronted with new paradigms that should solve our problems. Events, Streams, Asynchronous, Reactive -- all these things are undoubtably good but how are we going to start using them with our existing business logic? One thing we are told is that async and blocking code do not mix and yet everyone knows and understands imperative, iterator-style code.

It's good to realise events and iterators are not so different. One is simply the dual of the other. This changes our perspective allowing us to design our architecture differently. Producers are in control rather than consumers.

I am just beginning to explore the possibilities of Rx. It's clearly useful in JavaScript web apps but what about data pipelines? What about Python?

If you are interested you will like

Or to completely blow your mind this video is a fun summary of duality from the inventor of ReactiveX, Eric Meijer

"Duality and the end of Reactive" by Eric Meijer (video)

Category: Software Tagged: python reactivex category_theory