Michael Sparks, the author of Kamaelia, took a look at Pypes this weekend and sent out some tweets mentioning the similarities, limitations, and possible synergies with Kamaelia. I'm a huge fan of Kamaelia, so the similarities should not be too surprising. I like the style of message passing used in Kamaelia, and the only real difference there is that I chose the Flow-Based Programming (FBP) terminology, calling them "ports" rather than "in/out boxes". At the same time, there are some fundamental differences in both design and objectives.
Pypes was designed with a specific goal in mind: to quickly process high volumes of data in a completely modular and service-oriented way. When I was first faced with this task, I naturally turned to Kamaelia, but the "service-oriented" requirement proved to be a problem. It's possible that I just didn't understand Kamaelia well enough to address the issue, but because it uses generators, it inherently provides lazy data-flow.
By lazy data-flow, I mean that generators do not compute any data until it's asked for. This essentially means data is pulled through the system by components that, at some point, generate data. This is in contrast to eager data-flow models, where data is pushed into the system without ever being requested.
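The difference between pulling and pushing can be sketched in a few lines of plain Python. This is only an illustration of the two models, not code from Kamaelia or pypes; the names `source`, `upper`, and `Collector` are made up for the example.

```python
computed = []  # records when the producer actually does work


def source(items):
    """Lazy producer: the body runs only when a consumer asks for a value."""
    for item in items:
        computed.append(item)
        yield item


def upper(stream):
    """Lazy stage: pulls from upstream one item at a time."""
    for item in stream:
        yield item.upper()


# Pull (lazy): wiring the stages together computes nothing yet.
pipeline = upper(source(["a", "b"]))
computed_before_pull = list(computed)  # still empty at this point

pulled = list(pipeline)  # the consumer drives the flow


class Collector:
    """Eager stage: accepts whatever the producer decides to send."""

    def __init__(self):
        self.received = []

    def receive(self, item):
        self.received.append(item.upper())


# Push (eager): the producer drives the flow by calling into the consumer.
collector = Collector()
for item in ["a", "b"]:
    collector.receive(item)
```

In the pull version nothing happens until `list(pipeline)` asks for data; in the push version the consumer has no say in when data arrives.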
Why does this matter? Pypes was originally designed to be a content processing framework capable of processing large volumes of data before that data is indexed for search (normalization, classification, information extraction, etc.). When it comes to indexing content, there are lots of systems/sources in which data resides (CMS, DB, Filesystem, Email, etc.). It's quite inefficient, and typically fragile, to poll these systems for new data. Pypes allows these systems to push new data out to a service-oriented processing framework.
Another big concern here is that many of these systems have language-specific APIs. Writing a component that pulls data from FileNet using Python isn't possible. I felt it was important that this "connector" functionality be isolated from the task of content processing. Pypes uses the notion of Adapters to adapt various incoming data formats to a unified data model called a Packet (using FBP terminology). Adapters can even consume batches of packets that are disassembled and streamed through the system.
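To make the Adapter idea concrete, here is a rough sketch of how incoming formats might be normalized into one packet type. The `Packet` class and the `adapt_json` and `adapt_csv_row` functions are hypothetical names invented for this example; they are not the actual pypes API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Hypothetical unified data model; not the real pypes Packet class."""
    attributes: dict = field(default_factory=dict)


def adapt_json(payload):
    """Adapt a JSON payload (one object or a batch) into a stream of packets."""
    data = json.loads(payload)
    records = data if isinstance(data, list) else [data]
    for record in records:  # a batch is disassembled into single packets
        yield Packet(attributes=dict(record))


def adapt_csv_row(row, columns):
    """Adapt one comma-separated row into a packet using the given column names."""
    values = row.strip().split(",")
    return Packet(attributes=dict(zip(columns, values)))


batch = '[{"title": "First"}, {"title": "Second"}]'
packets = list(adapt_json(batch))
csv_packet = adapt_csv_row("Third,editor\n", ["title", "author"])
```

Whatever the source format, the components downstream only ever see `Packet` objects, which is what keeps the connector concern separate from content processing.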
Why not just create a linear pipeline? Quite simply, we wanted the ability to publish to multiple locations. An editor might create an article that needs to be indexed but also needs to be pushed out to some location as an Atom feed. At the same time, we might process packets containing certain attributes slightly differently from others, so the ability to branch helps solve this.
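A rough sketch of these two kinds of branching, publishing to several outputs and routing on an attribute, might look like the following. Packets are modeled as plain dicts and the `branch` and `route` helpers are invented for illustration; pypes itself wires components together through ports rather than function calls like these.

```python
def branch(packet, outputs):
    """Send a copy of the packet to every output so branches stay independent."""
    for output in outputs:
        output(dict(packet))


def route(packet, attribute, when_present, otherwise):
    """Choose a branch based on whether the packet carries an attribute."""
    target = when_present if attribute in packet else otherwise
    target(packet)


indexed, feed, flagged = [], [], []

# One article goes to both the search index and the Atom feed.
article = {"title": "Release notes", "author": "editor"}
branch(article, [indexed.append, feed.append])

# Packets with a particular attribute take a different path.
draft = {"title": "Draft", "embargo": True}
route(draft, "embargo", flagged.append, indexed.append)
```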
This leads me to the limitation that Michael mentioned regarding cycles. Everything I've described to this point is collectively referred to as Batch-Type Networks in FBP. This topology means that the network generally has a left-to-right/top-to-bottom flow, with packets being created on the left/top side and disposed of on the right/bottom side of the network.
When we introduce cycles, the topology changes to a Loop-Type Network. This sort of topology adds some additional complexity and overhead to the system that I just haven't wrapped my brain around yet. My ambition right now is focused more on building a richer component library and better documentation for Visual Design Studio.
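One way to see the difference between the two topologies is to treat the network as a graph and ask whether its components can be put in a start-to-finish order. The sketch below uses the standard library's `graphlib` (Python 3.9+); the component names and the `is_batch_type` helper are made up for the example and have nothing to do with how pypes schedules components.

```python
from graphlib import CycleError, TopologicalSorter

# Each component maps to the components that feed it (its upstream ports).
batch_network = {
    "reader": [],
    "normalize": ["reader"],
    "index": ["normalize"],
    "atom_feed": ["normalize"],
}


def is_batch_type(network):
    """Return True if the network has no cycles, i.e. a left-to-right order exists."""
    try:
        tuple(TopologicalSorter(network).static_order())
        return True
    except CycleError:
        return False


# A valid left-to-right ordering of the acyclic network.
order = tuple(TopologicalSorter(batch_network).static_order())

# Feeding the index output back into the reader introduces a cycle.
loop_network = dict(batch_network, reader=["index"])

batch_ok = is_batch_type(batch_network)
loop_ok = is_batch_type(loop_network)
```

With no cycles, every packet has an obvious place to be created and disposed of. Once a loop exists there is no such ordering, which is part of where the extra complexity comes from.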
I envision pypes (Visual Design Studio) as more of a mashup or ETL tool/framework, similar to Yahoo Pipes but with the ability to write custom components while also scaling to handle large volumes of data. I actually did a demo at a search conference a few months back that dealt entirely with RSS feeds. The basic concept was to mash up a bunch of RSS content for indexing, which was a pretty trivial task for pypes. Visual Design Studio provides an interface that allows business users (non-developers) to wire up these components to produce different applications.
See the Python concurrency wiki for a simple example of pypes as well as a comparison of the same solution across several different frameworks. You can also see a few examples at pypes.org where we're in the process of adding more.
1 comment:
Hi, I'm WireIt's creator.
How come nobody told me about this great project :)! Of course, this kind of application is what I had in mind when I created WireIt.
I just released my own pipes web-app: http://neyric.github.com/webhookit/docs/index.html
I hope the project isn't dead