The system is down … and up again

26/04/2011

Thanks, Amazon.

We’ll keep you posted.

We are up, but on crutches. Took us 5 hrs to restore most of it.

4 Responses to “The system is down … and up again”

  1. admin Says:

    We expect it to take another 2-3 hrs to restore the backups.
    We are still trying to piece it together from the parts of the system that are still available.

    They lost some of our data back on Fri. We had enough redundancy in the system to withstand that. So we sat back and waited for them to sort out their problems before we started new replication.

    Today we lost one more server and couldn’t cope with the missing data any more.

    We have a dozen terabyte-size volumes to merge before we get a consistent dataset.

    We expect no data loss, but can’t promise anything at this stage. It all depends on a few volumes we don’t have access to yet.

    Amazon says only 0.07% was affected. Looking across the multiple accounts we support for ourselves and our customers, we say it’s bullshit. It’s way, way more.

  2. admin Says:

    We are generally up, but missing ALL images. Waiting for AWS to make some of our missing drives available again.

  3. admin Says:

    We are back up. Most of the templates work. The performance is still rock bottom. It will take a while for the system to warm up and cache all the files it needs to cache.

    Still missing ALL user files. Any images uploaded from now on should work just fine. All images uploaded before the outage are still missing. We have them spread across backups and caches, but it will take time to re-index everything unless Amazon gives us the faulty drive back soon.

  4. admin Says:

    I am beginning to seriously dislike Amazon. Time to give Rackspace a try.
