OKFN IT rationalisation
High level plan
- establish a config manager host #163
- acquire second EC2 VM: us2 #164
- move eu0's services to us2 (or face #139)
Config mgr host
Prepare eu0 for migration
- enumerate services on eu0 and eu1 #166
- fix / investigate eu0 memory issue
- move eu0's stuff to WSGI (#105)
- determine cancellation cost of eu0
- find new home for all large data on eu0 #167
- migrate backup role/scripts to new hosts #168
- acquire requisite permission for ibiblio #169
DNS
- get DNS under direct admin control, if not already
- use CNAMEs where at all possible #170
- enumerate DNS dependencies of eu0 #171
- run public-facing DNS servers on bare metal, slaved off a master on EC2
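
The dependency enumeration above could start from an export of the zone data. A minimal sketch, assuming the records are available as a list of (name, type, value) tuples; the function name, hostnames and IP addresses are hypothetical placeholders, not our real records:

```python
def dns_dependents(records, target_names, target_ips):
    """Return names that resolve to the target host, directly or via CNAMEs.

    records: list of (name, rtype, value) tuples, e.g. from a zone export.
    target_names / target_ips: names and addresses of the host being retired.
    """
    targets = set(target_names)
    dependents = set()
    changed = True
    while changed:  # repeat until no further CNAME chains are discovered
        changed = False
        for name, rtype, value in records:
            if name in targets or name in dependents:
                continue
            points_at_ip = rtype == "A" and value in target_ips
            points_at_name = rtype == "CNAME" and (
                value in targets or value in dependents
            )
            if points_at_ip or points_at_name:
                dependents.add(name)
                changed = True
    return sorted(dependents)


# Hypothetical sample data (documentation-only names and addresses).
sample_records = [
    ("eu0.example.org", "A", "192.0.2.10"),
    ("www.example.org", "CNAME", "eu0.example.org"),
    ("wiki.example.org", "A", "192.0.2.10"),
    ("blog.example.org", "CNAME", "www.example.org"),
    ("mail.example.org", "A", "192.0.2.20"),
]
eu0_dependents = dns_dependents(
    sample_records, {"eu0.example.org"}, {"192.0.2.10"}
)
```

Names that show up only through an A record (like the wiki entry in the sample) are the ones worth converting to CNAMEs before the move.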
Miscellaneous hosts
- get JWYG's DH system integrated (#150)
- move email services off eu1
New VMs
- use the new EBS based EC2 instance functionality
- migrate us1 to a replacement vm (us3) if necessary #172
Storage / Backup
- Enumerate Save Sets (OS, system config, apps, large datasets) #173
- Backup system config and apps to Amazon S3 #174
- Backup large datasets to archive.org where appropriate
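
A rough sketch of how the S3 backup of system config and apps might look. It assumes a boto3-style S3 client (boto3 being the successor to the boto library listed under tools) and an archive that has already been created on disk; the key layout, function names and bucket name are assumptions for illustration only:

```python
import datetime


def backup_key(host, save_set, day):
    """Build an S3 key such as 'eu0/system-config/2010-01-31.tar.gz'."""
    return "{}/{}/{}.tar.gz".format(host, save_set, day.isoformat())


def upload_save_set(s3_client, bucket, host, save_set, archive_path, day=None):
    """Upload one save-set archive and return the key it was stored under.

    s3_client is expected to offer upload_file(Filename, Bucket, Key),
    as the boto3 S3 client does.
    """
    day = day or datetime.date.today()
    key = backup_key(host, save_set, day)
    s3_client.upload_file(archive_path, bucket, key)
    return key


# Usage (needs boto3 and AWS credentials; the bucket name is hypothetical):
#   import boto3
#   upload_save_set(boto3.client("s3"), "example-backups",
#                   "eu0", "system-config", "/var/backups/etc.tar.gz")
```

Keying by host, save set and date keeps the save sets enumerated in #173 separate and makes old backups easy to list and expire.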
DR plan
- Set up monitoring for hosts and services (#93)
- we could maintain failover systems; this really saved us last time
- Test bare metal recovery of a host #162
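
Until proper monitoring (#93) is in place, a very small reachability check could be a stopgap. A minimal sketch using only the Python standard library; it only tests whether a TCP port accepts connections, and the host/port list shown in the comment is hypothetical:

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_services(services, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False."""
    return {(h, p): port_is_open(h, p, timeout) for h, p in services}


# Usage (hypothetical hosts and ports):
#   check_services([("eu0.example.org", 80), ("eu1.example.org", 25)])
```

An open port does not prove the service behind it is healthy, so this is no substitute for real service-level checks.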
Some Useful Tools
- http://boto.s3.amazonaws.com/ -- mature Python library for interfacing with AWS
Some Notes on Failure Rates on EBS and S3
- EBS: failure rate of 0.1-0.5% per year
- http://aws.amazon.com/ebs/
- http://developer.amazonwebservices.com/connect/thread.jspa?messageID=131127
- Suggest using EBS "snapshots" (which are stored in S3) to increase reliability
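
To put the quoted failure rate in perspective, and to show what taking snapshots might involve, here is a rough sketch. The probability calculation assumes volumes fail independently at a constant annual rate, which is a simplification. The snapshot helper assumes a boto3-style EC2 client whose create_snapshot(VolumeId=..., Description=...) call returns a dict containing "SnapshotId"; the function names and the volume ID in the comment are hypothetical:

```python
def p_any_failure(afr, volumes, years=1.0):
    """Probability that at least one volume fails within the period.

    afr: annual failure rate per volume, e.g. 0.005 for 0.5%.
    Assumes independent failures at a constant rate (a simplification).
    """
    return 1.0 - (1.0 - afr) ** (volumes * years)


def snapshot_volumes(ec2_client, volume_ids, description="scheduled snapshot"):
    """Request a snapshot of each volume and return the new snapshot IDs."""
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id, Description=description
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


# Ten volumes over three years at the low and high ends of the quoted range.
low = p_any_failure(0.001, volumes=10, years=3)
high = p_any_failure(0.005, volumes=10, years=3)

# Usage (needs boto3 and AWS credentials; the volume ID is hypothetical):
#   import boto3
#   snapshot_volumes(boto3.client("ec2"), ["vol-0123456789abcdef0"])
```

Under these assumptions, ten volumes over three years give roughly a 3% to 14% chance of losing at least one, which is why regular snapshots are worth scheduling.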
