10+ Deploys Per Day: Dev and Ops Cooperation at Flickr - John Allspaw, Paul Hammond
Speakers
- John Allspaw ran the Operations group at Flickr
- John Allspaw Twitter
- Pauk Hammond ran the Engineering group at Flickr
- Paul Hammond Twitter
- Paul Hammond LinkedIn
Video
Source
Slides
Notes
How Development and Operations fit together.
Devs. vs Ops.
Stereotypes:
- Operations are like old grumpy men who always say
Traditional roles:
- Dev’s job is to add new features
-
Ops’ job is to keep the site stable and fast
- The reality is that Ops job is to enable the business
- The business requires change
- Change is the root cause of most outages.
Two options:
- Discourage change in the interest of stability
- Build tools and culture to Allow change to happen as often as it needs to
Build tools to Reduce the risk of changes. Build tools to be able to easily recover from failures
Developers who think more like operations people Operations people who think more like Developers.
- Automated infrastructure -> consistent platforms for development and production (Infrastructure as code?, Role and configuration management)
- Shared version control for both development and operations
- Set up a one step build system (build and deploy to staging)
- One step deploy
- Have a deploy log (Who? When? What?)
(Continuous Integration / Continuous Deployment)
Each individual deploy involves less change and thus less risk.
Feature flags
Always ship “trunk” (what is usually called “master” in Git)
(They) don’t work in branches, but use feature flags to enable/disable certain feature of the application / web site.
Allow private beta on productions servers.
This Allows bucket testing (A/B testing) Turn on a features to a subset of the users. Allows dark launches.
This takes the fear out of the work. They had a couple of hundred feature flags.
Some of these flags are on/of, others are more gradual knobs.
Shared metrics
- Developers have access the metrics of the operation and application level metrics in one place.
- Show the last site update on the top of every page of every metric in order to have better context.
Adaptive feedback loops
Communication
- IRC ad IM bots (it could be also Slack or Microsoft Team)
Culture
Argumentative, combative culture won?t help?. Having a culture of respect Don?t stereotypes (Not all developers are lazy, not all ops people are grumpy) Everyone is a snowflake Respect the opinions of other people and the responsibilities of other people that means people have different priorities. Don?t just say ?no?. Try to find out what problem they are trying to solve.
Developers and Operations come together to find unique solutions. Don?t hide your solutions from the other people
Developers: Talks to the ops about the impact of your code! What metrics will change, and how (CPU, memory ?) What are the risks What are the signs that something is going wrong What are the contingencies
Trust
Provide knobs and levers that in operation can be tweaked to Ops: Be transparent, give devs (read only) access to the system so they can see what?s going on without going through operations. Its low risk and empowering. They can access all the metrics.
Healthy attitude about failure:
Failure will happen Think about what to do when failure happens (evacuation plans) Ability to respond to problems Fire-drills to prepare for the real fire (real problem)
Avoiding blame, rule of no finger-pointing.
If there is a problem, first fix it and only feel bad about it.
Developers: Remember that someone else will probably get woken up when your code breaks. (better yet, make the developers on-call. Wake them up when the code breaks. Put Devs on call.)
Developers: What would you do differently if there was no one else who could fix the issues when they break in the middle of the night? What would you do differently if it was you who get woken up?
Operations: provide constructive feedback on what?s going on. What are the current aches and pains.
- Automated infrastructure
- Shared Version control
- One step build and deploy
- Feature Flags
- Shared metric
-
IRC and IM robots
- Respect
- Trust
- Healthy attitude about failure
- Avoiding Blame