Open source web services
Lately I’ve been trying to figure out how we can have open source web services. Not just the software code, but the running of a service built with the software. How can we get the same benefits at the service level?
The reason I’ve been thinking about this is that the interconnectedness the web once had only at the content level is now starting to appear at the application level, which gives us the opportunity to build out shared application-level infrastructure. Amazon is the best example of a company trying to build this out, providing web services that aren’t specific to any particular domain but provide resources that we all need, like storage and computing power.
As we continue in the direction of a component-oriented web, or a web pipeline, we’re going to have a shared interest in building out more low-level pieces that we can all use and benefit from. I see a rise in micro web services, which do very simple things very well and are also made to be used in some sort of pipeline.
Mailhook is an example of this. It’s an email to HTTP adapter. There are others, like RssFwd, which takes an RSS feed and emails you the new articles. Both of these are open source. However, the actual running service may or may not be able to take advantage of that.
What I’d like is a system for building these kinds of services, allowing them to benefit from outside contribution, and decentralizing the responsibility of running and maintaining them. At its core, this means somehow opening up access to the server running them in a sane way.
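To make "email to HTTP adapter" concrete, here is a minimal sketch of the idea in Python. It is not Mailhook's actual code; the function name email_to_request, the form field names, and the hook URL are all assumptions made up for illustration. It only builds the HTTP request and leaves sending it to the caller.

```python
from email import message_from_string
from urllib.parse import urlencode
from urllib.request import Request


def email_to_request(raw_email: str, hook_url: str) -> Request:
    """Turn a raw email message into an HTTP POST aimed at hook_url.

    Hypothetical helper: the field names below are an assumption,
    not the format any real service uses.
    """
    msg = message_from_string(raw_email)

    if msg.is_multipart():
        # Use the first text/plain part, if there is one.
        body = ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                charset = part.get_content_charset() or "utf-8"
                body = part.get_payload(decode=True).decode(charset, errors="replace")
                break
    else:
        body = msg.get_payload()

    fields = {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
    }
    return Request(
        hook_url,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Example: a mail server would hand the raw message to the adapter.
raw = "From: a@example.com\nTo: hook@example.com\nSubject: Hi\n\nHello there\n"
request = email_to_request(raw, "http://example.com/hook")
# urllib.request.urlopen(request) would actually deliver it.
```

The point is how small the job is: one input format, one output format, nothing domain-specific in between.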
Let’s run through a scenario. Take something like RssFwd. It’s open source, but you don’t really care because you just want to use it. But you find a bug. You can complain to the maintainers, but they might not be able to do anything about it because this service doesn’t make them any money and they’re busy with their day jobs. That’s not necessarily the case, but it could be, depending on what kind of people happened to make this thing.
It’s open source, so you could get the source, make a patch and send it to them. Then it’s just a matter of them applying the patch and re-deploying. This could still be too much work for them, especially if it’s really buggy and they get a lot of patches. Even the idea of making a patch might never cross somebody’s mind, because there’s no reason to assume that the maintainers of the service are even accepting patches to the service’s code. Nothing like this has been done before!
But what if it’s something else? Something that’s not in the code. What if it’s an issue of configuring the application or the server it’s running on? What then!? Well, that part isn’t open, so you would have no idea what to tell them to do to fix it. Unless maybe you knew what they’re running and had enough experience with it to happen to know exactly how to solve the situation. If that’s not the case, the best you can do is just complain about the problem.
A possible step in the right direction would be to open up access to the server. But you wouldn’t actually want to do that. That’s just insane. But what if it was read-only access? You could get SSH access, but you can’t touch anything! That would help people diagnose the problem and then they could just say “hey, maintainer, run this script I made to fix this bug.” The script would almost be the equivalent of a patch.
There’s a problem with this. User data. Read-only can still be too much. You don’t want people to get access to all the email addresses stored in something like RssFwd. That would be baaaaad.
One solution that’s sort of off the wall is to provide an image of the operating system you use to run the service, which includes self-configuration scripts to build or install everything it needs to run the code… while you’re at it, check out a copy of the code and properly install it. Yeah, that would be nice. However, it would be a lot of work…
But just think, as the user trying to fix a bug (though at this point, you’d have to pretty much be a devoted user who wished you ran this thing), you could just set up a virtual machine, install the image and you’d get a pretty close copy of the web service’s server environment to toy around with. With this you could diagnose the problem and then submit a script, or in this case, maybe a patch to the self-installation script.
Patching an installation script would work if you were willing to wipe and rebuild the server environment from scratch again, but that would be a hassle with user data and waiting for an operating system to install every time you want to patch the service. So that’s pretty ridiculous.
Unless!
Have you ever heard of a configuration engine? There’s one major implementation I’ve heard of called cfengine. The idea is to use basic mechanics of cybernetics to self-maintain an ideal state (like a thermostat). I’ve seen very, very specific implementations of this idea, and maybe you have too: daemons that watch a process and start it again if the process dies. That’s the basic idea behind a configuration engine.
You define a state you want the entire system to be in (e.g. which processes should be running, which entries configuration files should contain, etc.) in a giant meta configuration file. That file also defines the processes or scripts to run that will either put the system in that state right away or get it there over time. Then you have a process running that checks the actual state against the configuration file, and if something doesn’t match, it runs whatever you specified until the system is back in the right state.
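The check-and-repair loop described above can be sketched in a few lines of Python. This is only an illustration of the thermostat idea, not how cfengine works internally; the Rule class, the converge function and the in-memory "system" dictionary are all hypothetical stand-ins for real process and file checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    check: Callable[[], bool]   # True when the system already matches
    repair: Callable[[], None]  # action that moves it toward the desired state


def converge(rules: List[Rule], max_passes: int = 3) -> List[str]:
    """Check every rule and repair the ones that fail.

    Returns the names of the rules that needed a repair. A real daemon
    would call this on a timer forever; max_passes just keeps the
    sketch from looping if a repair never succeeds.
    """
    repaired = []
    for _ in range(max_passes):
        broken = [rule for rule in rules if not rule.check()]
        if not broken:
            break
        for rule in broken:
            rule.repair()
            repaired.append(rule.name)
    return repaired


# A fake server, standing in for real processes and config files.
system = {"processes": set(), "config": ["port=80"]}

rules = [
    Rule("httpd running",
         check=lambda: "httpd" in system["processes"],
         repair=lambda: system["processes"].add("httpd")),
    Rule("config has timeout",
         check=lambda: "timeout=30" in system["config"],
         repair=lambda: system["config"].append("timeout=30")),
]

first_run = converge(rules)    # both rules need repair
second_run = converge(rules)   # nothing to do, already in the ideal state
system["processes"].discard("httpd")  # simulate the process dying
third_run = converge(rules)    # only the dead process is restarted
```

Notice that running it again when nothing is wrong does nothing. That property is what makes it safe to leave the loop running unattended.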
It gives you something of an autonomous immune system. IBM is working on a more intelligent implementation in what they call autonomic computing. However, this simpler version is all you need in most cases… just like the simplicity of most home thermostats. But this is getting off the point.
If you were to have the service running on a server that was maintained by a configuration engine, you could put the configuration file that defines the state in with the rest of the source because it’s simple plain text (wow, a plain text representation of effectively the entire state of a server…). That way, you could patch it like you would source code, and all you’d have to do is deploy the source. The configuration daemon would reload the file and do whatever you specified to bring the system into the new ideal state. The configuration engine would replace the self-install scripts you’d bundle with the OS image.
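Here is a rough sketch of what "patch the state file like source code" could look like from the daemon's side. The one-entry-per-line file format below is invented for this example and is not cfengine syntax; parse_state and plan_changes are hypothetical names as well.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, str]  # (kind, value), e.g. ("process", "httpd")


def parse_state(text: str) -> Set[Entry]:
    """Parse a made-up plain-text state file, one 'kind value' entry per line."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        kind, _, value = line.partition(" ")
        entries.add((kind, value.strip()))
    return entries


def plan_changes(old_text: str, new_text: str) -> Tuple[List[Entry], List[Entry]]:
    """Compare two versions of the state file.

    Returns (entries to start enforcing, entries to stop enforcing).
    """
    old, new = parse_state(old_text), parse_state(new_text)
    return sorted(new - old), sorted(old - new)


# The deployed file, and the same file after somebody's patch is applied.
deployed = """\
# desired state for the service
process httpd
package python
"""
patched = deployed + "process feed-worker\n"

to_ensure, to_retire = plan_changes(deployed, patched)
```

A contributor's patch is then just a one-line diff to a text file, and the daemon works out on reload what that line means for the running server.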
Okay, so this is great, but what about the fact that most of these services (Mailhook included) aren’t on dedicated machines? Well, we’d have to change that. Right now, if you were to build a Mailhook or an RssFwd, you’d probably just host it on a server you already have that probably hosts a few other things. That would mean you probably don’t want to go out of your way and risk affecting those other apps just to do this.
You could take advantage of the increasingly useful technology of virtualization (it’s the future btw) and build a server that just runs a bunch of virtual machines. That way you’d keep these sorts of things nice and separate, and whenever you get an idea for a new micro web service, you could throw it on a new virtual machine. But in that case you’d have to set it up, which costs not just time, but money in most cases.
Well, if state (as in user data) weren’t an issue, you could just run the app on Amazon’s EC2 service. In fact, you could scale horizontally pretty well with that (which is important just for redundancy because single computing nodes aren’t necessarily reliable on EC2). Then you’d just have the issue of shared data… unless you don’t care about access time, because there’s always S3 and other magic hashes in the sky (maybe a distributed hash table swarm, but then you’re giving up ensured data availability).
Sure, you’d have to pay for it, but only for what you use. You could have sponsors and big donate buttons to keep the thing running, but at that point it would pretty much be this magic service in the sky. There are a few details to iron out, but hopefully this gives you a vision that it could be possible. We could have decentralized, open source web services. We just need enough people who want them.
Unfortunately, that might not be until this web pipelining stuff takes off. I’m still confident that we need web hooks for that to really work… and—crap, I forgot to write more about web hooks in light of this Yahoo! Pipes nonsense.
Anyway, until then, it would be cool to get some prototypes of this working. Anybody interested in helping me make Mailhook the first truly open source web service?
March 24th, 2007 at 7:05 pm
Interesting idea.
I’d like to hear the next iteration of this.
Lal
April 4th, 2007 at 10:59 am
Interesting. I’ve been thinking about a more open architecture for such utility services as well, but obviously not arriving at any solution. I don’t know much about configuration engines, but from what you write, they seem to be more of a distraction from the original problem.
Or maybe it’s just because I’m seeing a different problem.
Cheers!