Phanfare Blog: Outage post-mortem

  • anon · 1 year ago
    "strives for" being the key word -- they aren't hitting that. 99.99 is very hard actually.
  • Brad · 1 year ago
    4 nines is a waste of time and money for a photo site. If you are getting 3 you are knocking it out of the park. As long as you aren't losing data (other than any potential losses of new data during the downtime), 2 nines is plenty for me. Most downtime is scheduled for off-hours anyway.
  • Ken Donoghue · 1 year ago
    Customer satisfaction and service availability are understandably Andrew's concerns (Kudos, BTW, for explaining the outage.) Brad's input, as a user of Phanfare, is worth considering though, because it's users who often define what an acceptable level of availability is for an application or service. The gap between the two is not nearly as wide as it once was (and usually the service expectations are reversed! Kudos to Brad). A quick web search will yield a number of solutions that are neither complex nor expensive and can deliver up to four nines ... some better and some with virtualization included. This end of the availability spectrum is changing at an incredible pace, for the better.
  • Andrew Erlichson · 1 year ago
    We are most concerned about the durability of the data versus the availability. Our data is stored at Amazon S3, and they provide excellent durability and very good availability.

    We actually don't need Amazon to be up to serve most of our traffic. In fact, in the big Amazon S3 outage, most Phanfare users were not aware of the issues.
  • Brad · 1 year ago
    If you are using Amazon and are serving up critical information then you have to at least have your own cache if not your own complete copy. That can be prohibitively expensive, especially for something like personal photos where durability is far more important than dealing with a few hours of outage once or twice a year.
  • Andrew Erlichson · 1 year ago
    We actually do cache the recent stuff on our own servers so that Amazon can be down and Phanfare can still be mostly up. We cache web-sized photos, and that is only 10% of the data. We can actually cache the entire Phanfare data set in web-sized renditions at just 10% of the storage cost of storing the originals; less, in fact, because our cache does not need to be durable (replicated across datacenters and geographies).
  • Brad · 1 year ago
    So you take uploads into your datacenter, process them and then send them to Amazon? It sounds like you are paying for upload bandwidth 3 times: customer to you, and you to Amazon twice (once to your bandwidth provider and once to Amazon). Or do you only do that when Amazon is offline?
  • Andrew Erlichson · 1 year ago
    We pay for the bandwidth 2x in our mind. Once when it hits our datacenter and once when we move it to Amazon. Our datacenter, like many, is more focused on how many bytes you serve than how many bytes you absorb since the bulk of the consumed bandwidth is outbound.
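
For readers weighing the "nines" debated earlier in the thread, here is a rough sketch of how much downtime per year each level allows. It assumes a 365-day year and counts every minute of downtime equally (scheduled or not). The function name `downtime_hours_per_year` is made up for illustration and does not come from Phanfare or Amazon.

```python
# Rough downtime budget for N "nines" of availability.
# Assumption: a 365-day year (8760 hours), with all downtime counted equally.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Return the hours of downtime per year allowed by `nines` nines.

    Two nines means 99% availability, three means 99.9%, and so on,
    so the allowed unavailable fraction is 1 / 10**nines.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return HOURS_PER_YEAR / (10 ** nines)


if __name__ == "__main__":
    for n in (2, 3, 4):
        hours = downtime_hours_per_year(n)
        print(f"{n} nines: {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Under those assumptions, two nines allows about 87.6 hours of downtime a year, three nines about 8.76 hours, and four nines a little under 53 minutes. The "few hours of outage once or twice a year" mentioned above lands roughly in three-nines territory.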