Jumbo

Today, I'm releasing something that I've wanted to release for a very long time. It's a project that I worked on during my Ph.D., and while I don't think it'll be terribly useful to anyone, a lot of work went into it that I want to preserve, even if just for myself.

That project is Jumbo, and it's now availabe on GitHub in two flavors: Jumbo for .Net 6+, and the original for .Net Framework and Mono. If you want to play around with it or learn more about it, you probably want the former.

Jumbo is an experimental large-scale distributed data processing system, inspired by MapReduce and in particular Hadoop 1.0. Jumbo was created as a way for me to learn about these systems, and should be treated as such. It's not production quality code, and you probably shouldn't entrust important data to it.

Basically, back when I was getting started with my Ph.D. in 2008, I found myself staring at the code of Hadoop (which wasn't even at version 1.0 yet at the time), and finding I wasn't really getting a good feel of how the whole thing fit together, and what really goes into designing a system like that.

So, some people at my lab suggested I should try building something for myself, which I did. I built, from the ground up, a distributed file system and data processing system, which is Jumbo. It was heavily inspired by Hadoop, and definitely borrows from its design (although no actual code was borrowed). In some aspects, I deviate from Hadoop quite a lot (especially since Jumbo isn't constrained to only using MapReduce).

Building Jumbo taught me a lot: about software design, about distributed processing, about decisions that affect scalability, and more. It's my hope that maybe, someone else interested in these topics might want to look at it and find what I did interesting. If nothing else, I just want to preserve this massive project that I did (still the biggest project I've done where I'm the sole contributor), and have its history available.

I did end up using Jumbo for some research efforts, which you can read about in a few papers as well as my dissertation under the University section of my site.

Jumbo is also the origin of one of my most widely used libraries, Ookii.CommandLine, so it's significant in that respect as well.

Like I said, I've wanted to release Jumbo for a long time. If you look through the original project's commit history you can see a bunch of work done in early 2013 (as I was nearing the end of my Ph.D.) like cleaning stuff up and adding documentation, but I never quite reached a level where I was comfortable doing so. The project, which primarily targeted Mono to run on Linux, wasn't that easy to set up and run.

In 2019, I ported the project to .Net Core, just to see if I could. That version was easier to play around with, and I wanted to release it then too, but I never quite got around to finishing it, until now.

So now, you can look at Jumbo and play around with it on .Net 6+, thanks to this new version. I've also expanded the documentation significantly, so it should be easy to get started and to learn more about how it works. The original Jumbo project for Mono and .Net Framework is only provided to preserve the original history of the project (the new repository only contains the history of the port). You probably shouldn't try and run it (though I obviously can't stop you).

If you want to comment on Jumbo or ask any questions, please use the discussions page on GitHub.

Additional posts about Jumbo

Categories: University, Software, Programming
Posted on: 2022-09-20 23:54 UTC.

Comments

No comments here...

Add comment

Comments are closed for this post. Sorry.

Latest posts

Categories

Archive

Syndication

RSS Subscribe