The system is down … and up again
26/04/2011
Thanx Amazon.
We’ll keep you posted.
We are up, but on crutches. Took us 5 hrs to restore most of it.



April 26th, 2011 at 6:30 pm
We expect another 2-3 hrs before we restore the backups.
Trying to piece it together from different parts of the system still available.
They lost some of our data back on Friday. We had enough redundancy in the system to withstand that, so we sat back and waited for them to sort out their problems before we started new replication.
Today we lost one more server and couldn’t cope with the lack of data any more.
We have a dozen terabyte-size volumes to merge before we get a consistent dataset.
We expect no data loss, but can’t promise anything at this stage. All depends on a few volumes we don’t have access to yet.
Amazon says only 0.07% was affected. Looking across the multiple accounts we support for ourselves and our customers, we say it’s bullshit. It’s way, way more.
April 26th, 2011 at 7:11 pm
We are generally up, but missing ALL images. Waiting for AWS to make some of our missing drives available again.
April 26th, 2011 at 9:16 pm
We are back up. Most of the templates work. The performance is still rock bottom. It will take a while for the system to warm up and cache all the files it needs to cache.
Still missing ALL user files. Any images uploaded from now on should work just fine. All images uploaded before the outage are still missing. We have them spread across backups and caches, but it will take time to re-index everything unless Amazon gives us the faulty drive back soon.
April 26th, 2011 at 9:37 pm
I’m beginning to seriously dislike Amazon. Time to give Rackspace a try.