Deployment research
In the future, deployment will happen at the touch of a button. If you want a new server set up, you will just touch this button, several magical woodland fairies will be sacrificed to the elder gods, and a server will just appear, set up for you, completely automagically. When something goes wrong, you just destroy the server, and maybe send apologetic cards to the families of the ethereal woodland creatures. This will greatly improve our scalability.
However, we have no idea how this will happen yet. Watch here for details as this is researched.
Goals
In short:
- Failover / load balancing.
  - If a server goes down, another should pick it up.
  - If things are slow, another server should be able to pick things up.
- Boot and shoot approach
  - Quick and mindless spin-up of a new server
  - Ability to just take down a server / node if it's not working well
- Still have the ability to step in and debug things as much as we like
- An approach for spinning up live & devel servers using mostly the same setup
The cc.engine stack has some advantages. For the most part, there simply is no "changing database"... everything is stored in RDF files that are in a git repository / in python packages.
We also need to support CC Wordpress, but this can also be checked out of an svn repository on the fly. Assuming we do edits somewhere else and just push to the server, no need to do backups of these "node" servers, even!
(However maybe we will eventually want to use this setup with things that *do* have a database that matters, like cc network?)
Cleanness is also a goal. It should be clear enough what's happening in the deployment setup and how to adjust / reproduce things. In his talk Continuous Deployment, Laurens Van Houtven describes a setup wherein anything goes, and perl scripts wrap scheme scripts alongside erlang alongside java and PHP...:
"And developers' sense of decency and morals just completely falter, and they start implementing just completely anything, and then you have some giant shambling lovecraftian horror of a deployment system..."
That's the deployment setup we don't want :)
Deployment
Deployment covers:
- Server creation (spinning up a new server with our setup)
- Server management (which stuff is on which machine?)
- Server updating (update software / data on server)
The following tools are being considered:
plain ol' ssh and bash
Of course, we could always just run remote commands over ssh, like:
$ ssh webadmin@someserver run_command.sh
or even
$ ssh webadmin@someserver run_command.py
... this is the most minimalist solution! :) But also, maybe the least "powerful".
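For the sake of illustration, here's a minimal sketch of what a hypothetical run_command.py on the server might look like; the paths and commands are placeholder assumptions, not our actual setup:
#!/usr/bin/env python
# Hypothetical run_command.py -- the kind of script a "plain ol' ssh and
# bash" deployment would invoke remotely.  Paths and commands here are
# illustrative assumptions, not our real setup.
import subprocess

CHECKOUT = "/var/www/cc.engine"  # assumed location of the checkout

def update():
    # Pull the latest code, then reload the web server.
    subprocess.check_call(["git", "pull"], cwd=CHECKOUT)
    subprocess.check_call(["sudo", "service", "apache2", "reload"])

if __name__ == "__main__":
    update()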
Fabric
I really like the idea of Fabric because it doesn't do "too much". It's just a system for running a lot of remote commands from your machine. Combine this with being able to run local commands on your machine, and maybe you have a good system for checking some local information and pushing a lot of updates at once to a number of machines. In this way, it's kind of like a python-wrapped "plain ol' ssh and bash" solution. Not a bad thing: we can combine the power of python's functions / etc with remote script execution, nice output, and so on.
On the other hand, the downside is that Fabric doesn't do too much. Unlike Puppet, there's no "description" of what our remote systems should be, so if we start changing things manually on servers there's no way to automagically propagate that setup to all our running servers.
But back to the first hand, automagic is sometimes just not nice, and generally confusing and hard to debug anyway!
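As a rough sketch of the idea (host names, paths, and commands below are placeholder assumptions, not our real setup), a fabfile mixing local and remote commands might look something like:
# fabfile.py -- sketch of the "python-wrapped ssh and bash" idea:
# check something locally, then push the update out to remote hosts.
# Host names, paths, and commands are placeholder assumptions.
from fabric.api import env, local, run

env.hosts = ["webadmin@someserver.example.org"]

def push_update():
    # Grab some local information first...
    rev = local("git rev-parse HEAD", capture=True)
    # ...then run the actual update commands on each remote host.
    run("cd /var/www/cc.engine && git pull")   # assumed checkout path
    run("sudo service apache2 reload")
    run("echo deployed %s >> ~/deploys.log" % rev)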
Silver Lining
I really like the approach that Silver Lining takes in many ways. The idea of being able to just take a virtualenv and push it to a new server is quite appealing. It's also fairly declarative and uses a config file format that's like Paste Deploy's.
However, we need some very custom stuff. Our apache config files are large and bloated. Maybe they could be a little bit less bloated, but there is a ton of stuff in there, like all the rewriting and static file serving we're doing, that Silver Lining doesn't support and doesn't want to support. So for cc.engine at least, Silver Lining is right out.
Puppet
Puppet seems similar to Silver Lining in the sense that it has an abstracted config file setup that "describes" the server, and you use that to push to a bunch of slave nodes. Unlike Silver Lining, though, it appears to be a lot less "simple" and allows things like having your own apache configs.
As I understand it, Puppet kind of has its own config "language" to describe the server setup. It also has a full node management system, which would be useful.
As I understand it, one advantage of this is that you can have some existing nodes running, and if you change the description of the server, Puppet can reconfigure those nodes for you to match the new "described" setup.
Or so I think.
Anyway that all seems pretty cool but also pretty complicated and abstract, maybe a lot more than we need initially.
Oh yeah, and something cool... apparently blueprint is great for reverse-engineering a puppet setup.
puppet advice from the FSF
<cure> paroneayea: yes, it's awesome
<cure> if you keep a few things in mind, that is
<cure> 1. don't run it as a daemon, run it in cron
<cure> (on the managed hosts)
<cure> 2. run the server with passenger
<cure> 3. enable stored config mode from the start
<cure> 4. build your config out gradually
<cure> 5. set up a test environment from day one
<cure> that last one is really important
<paroneayea> cure: cool, thanks, noted! :)
<cure> that way you can test on one node (puppetd -t --environment=name-of-your-test-env)
<cure> without risking having faulty rules being pushed out to a lot of machines
Chef
I don't know much about Chef except that it's apparently closest to Puppet. But instead of Puppet's abstracted config system and such, you write a lot of ruby.
For now, that makes me pretty disinterested in Chef. I don't understand yet how it could be better than Fabric on that end.
bcfg2
Looks a lot like puppet/chef, except it uses Python? You describe goals for what your server should look like; it installs to match those goals, upgrades anything that doesn't match them, and validates servers against them.
http://trac.mcs.anl.gov/projects/bcfg2/
Also, apparently, it has fewer users and is way less hip; fewer of the people running it also ride fixie bikes.
The ChiPy video on this page is pretty good: http://trac.mcs.anl.gov/projects/bcfg2/wiki/AudioVideo
openstack
Rackspace Cloud, for example, runs this.
Actually, OpenStack isn't a single technology; it's a bunch of technologies. Much in the way that "cloud computing" is confusing because it seems to encapsulate so many ideas, OpenStack encapsulates a whole bunch of technologies in a way that currently just confuses me when I look at it. I don't think it's actually a deployment technology itself, so maybe it doesn't belong here, though it does have some deployment technologies in it.
I think I feel about OpenStack the way I felt about Pylons' documentation the first time I looked at it. Holy cow, there are a lot of things going on here, and as a complete newbie I have no idea what is going on or how all this stuff relates together. Also the documentation seems all at once congealed and fragmented. What?
Maybe also like Pylons I will eventually look back at the documentation after I come to understand a lot of components individually and think, "Oh this isn't so confusing, but also no wonder I was confused."
libcloud
libcloud seems pretty general and great for spinning up nodes on the fly in a standardized way.
This page even shows an example at the bottom of starting a new server and pushing your puppet setup over to it.
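As an illustration of what spinning up a node programmatically might look like (the provider, credentials, and image/size choices here are placeholders, and the exact import paths depend on the libcloud version):
# Sketch of creating a node with libcloud.  Provider, credentials, and the
# image/size selection are placeholder assumptions.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

Driver = get_driver(Provider.RACKSPACE)
conn = Driver("our-username", "our-api-key")

# Pick an image and size -- here we just grab the first Debian image we see.
image = [img for img in conn.list_images() if "Debian" in img.name][0]
size = conn.list_sizes()[0]

node = conn.create_node(name="newnode.creativecommons.org",
                        image=image, size=size)
print("Spun up node: %s" % node.name)
From there we could push our puppet or fabric setup over to the new node, much like the example mentioned above does.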
Amazon Cloudformation
I'm not super enthused about this idea. It looks like a less flexible, vendor-specific version of puppet, and I think that maybe libcloud + puppet should get us the same thing, but here it is anyway.
http://aws.amazon.com/cloudformation/
Where to deploy to
This section is about what we might end up deploying to in the end.
Originally I was going to flesh this out in more detail, but since then I've come to think that our system should really be capable of deploying to any machine running the latest Debian stable.
load balancing
I'm not sure what load balancing system we'll use exactly.
Summary
It seems to me that tools like Puppet / bcfg2 are really great ideas in that they can auto-detect the state of a machine and adjust it from there, but that also seems heavily abstracted, and I don't think we even really know what the states of our machines are yet.
So I think we should start out with fabric, which is the simplest setup. From there we can get automatic site-updating commands, then automatic machine setup, and after all that we can consider an abstracted server deployment system, possibly using something like Puppet / bcfg2, or possibly iterating on the tools we will have already built with fabric.
Milestones
Here are the steps I think we should take towards getting to a sane deployment strategy. I've broken them up into milestones. Since Mike Linksvayer has indicated that abstract release names are his favorite, I've decided to take a hint from Debian and use characters from 3D animated films as milestone names. Since this is Creative Commons, though, I'll restrict myself to stuff the Blender Institute puts out.
Emo
From the film Elephants Dream, Emo is a young man stumbling into a possibly imaginary (and possibly someone else's...) world of automations and magic.
Step one is to make it so that all of our existing "updating the remote installs" tools are done by automated fabric commands.
E.g.:
$ fab update_ccengine
$ fab clear_cache
$ fab update_images
$ fab update_all  # maybe does all of these?
... maybe some of these commands will execute multiple subcommands at once for convenience, like fab update_all here?
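If we go that route, here's one rough sketch of how update_all might chain the subcommands (the task bodies and paths below are placeholder guesses, not our actual setup):
# fabfile.py sketch for the Emo milestone.  The task bodies and paths are
# placeholder guesses; the point is just that update_all chains the others.
from fabric.api import cd, run

def update_ccengine():
    with cd("/var/www/cc.engine"):       # assumed checkout location
        run("git pull && bin/buildout")  # assumed update steps

def clear_cache():
    run("rm -rf /var/cache/ccengine/*")  # assumed cache location

def update_images():
    with cd("/var/www/licensebuttons"):  # assumed location
        run("git pull")

def update_all():
    # Convenience wrapper: run every update step in sequence.
    update_ccengine()
    clear_cache()
    update_images()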
Working on these tools first means a few positive things, I think:
- We'll still be able to use these tools later when we branch out to having multiple machines with appropriate load-balanced failover
- We'll develop a good idea of what working with fabric is like before we jump into the next big step
Proog
Also from Elephants Dream, Proog is an older and possibly experienced man claiming to lead Emo through this same world of automations and magic. The world appears to form to his whims. Is it all in his head?
The next step is being able to take a fresh server and bring it completely up to a creativecommons.org setup at the flip of a switch (or the execution of a command). Something like:
$ fab install_ccorg -H newnode.creativecommons.org
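A very rough sketch of the shape such a task might take (the package list, repository URLs, and paths are illustrative guesses, not our actual setup):
# fabfile.py sketch for the Proog milestone: turn a bare Debian box into a
# creativecommons.org node.  Package list, repository URLs, and paths are
# illustrative guesses, not our actual setup.
from fabric.api import put, sudo

def install_ccorg():
    # Base packages we assume the stack needs.
    sudo("apt-get update && apt-get install -y apache2 python-virtualenv git-core subversion")
    # Check out the applications (placeholder repository URLs).
    sudo("git clone git://git.example.org/cc.engine.git /var/www/cc.engine")
    sudo("svn checkout http://svn.example.org/ccwordpress /var/www/wordpress")
    # Push our canned apache config and enable the site.
    put("apache/ccorg.conf", "/tmp/ccorg.conf")
    sudo("mv /tmp/ccorg.conf /etc/apache2/sites-available/ccorg")
    sudo("a2ensite ccorg && service apache2 reload")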
At this point we would never have to (and never would) work directly on the live server, as is currently done with wordpress and such on creativecommons.org. We could do the work on another server installed in this pattern, and then just push it over to the live site with our update scripts (from the Emo milestone).
We'll have some experience with working with fabric from the previous milestone. Part of the reason I am thinking of using fabric rather than puppet for this step is that I'm not really sure we know what our server setup looks like, so abstractly describing it seems like it may be a bit hard? Using fabric seems like a more logical step, since we'll have to figure out what commands to run to reproduce our server. But maybe not, because maybe something like blueprint could help us get past the initial confusing steps anyway.
Regardless, if we do this milestone in this way, this step should help us understand how our server works enough to move onto the next milestone...
Kirjuri (self-typing typewriter)
Also from Elephants Dream, Kirjuri is a typewriter that types itself. Automatically automate the automation!
After we have general service updating and server installs done, what's left? Maybe this step is not so necessary in a world where we really do boot-and-shoot style deployment, but it might still be nice: the ability to describe the setup of the server more abstractly, so that existing servers that *are* installed can be updated to match the new described setup.
Maybe we would:
- use puppet or bcfg2 and abstractly describe our system in their language / config file setups
- grow our existing fabric system into something that better handles these needs
It's not clear exactly what this milestone will look like. I think we'll have a better idea of it after we finish Proog (assuming we take this route).