Jenny S-T (jennyst) wrote in skud,
We need better incident management as well as release management. We don't have any documented process for managing a production incident on AO3. A while ago I started drafting a post about service operations, IM, and 1st, 2nd and 3rd line support: how they differ, and why having the AD&T chair also act as IM and 3rd line lead is a recipe for disaster, particularly if they're already tired from being release manager and deployment manager. Someone may be great at all five jobs and capable of doing any of them, but you can't do all of them at the same time in a crisis. In 2009 and 2010, the AD&T chair avoided coding, which split some of the work up, but it was still an issue; in late 2010 I wrote up my notes on why we urgently needed a separate, defined role of release manager. My previous role in my day job involved deployment management, release management and 3rd line team management, and mixing those three was bad enough, even though that job had a much smaller volume of change than AO3.
The team have been doing an amazing job in very difficult circumstances - with a small team, people have worked really hard to do many roles at once, and I have nothing but admiration for everyone who's gone before and all the people I've worked with for the past two years. People are awesome, and I really admire those who've done the impossible for so long. But this is not sustainable in the long term.