In Agile, well-established methodologies like Scrum and Extreme Programming provide a framework for putting Agile into practice in a company. DevOps has fewer formally defined techniques, but several standard practices have emerged:
- People Over Process Over Tools
- Continuous Delivery
- Lean Management
- Change Control
- Infrastructure As Code
- Incident Command System
- Developers On Call
- Status Pages
- Blameless Postmortems
- Embedded Teams
- The Cloud
- Andon Cord
- Dependency Injection
- Blue Green Software Deployment
- Chaos Monkey
People Over Process Over Tools
One of the first DevOps methodologies is called people over process over tools. It recommends first identifying who is responsible for a job function, then defining the process that needs to happen around them, and only then selecting and implementing the tool that supports that process. This may seem obvious, but engineers and overeager technology managers, often under the influence of a salesperson, are tempted to do the reverse: buy a tool first, then figure out the process and who will carry it out.
Continuous Delivery
Continuous delivery is a standard practice that some people wrongly believe defines DevOps. It involves coding, testing, and releasing software frequently and in small batches to improve overall quality and the speed at which changes reach users. Studies have found that teams practicing continuous delivery spend 22% less time on unplanned work and rework, have a three times lower change failure rate, and recover from failures 24 times faster. Continuous delivery can be applied to all kinds of computing, including legacy software deployments.
Lean Management
Lean management uses small batches of work, work-in-progress (WIP) limits, feedback loops, and visualization. The same studies found that lean management practices led to better organizational outcomes, including higher system throughput and stability. Lean management also pays off at the personal level, with less burnout and greater employee satisfaction.
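WIP limits are easiest to see on a Kanban board, where a column refuses new work once it is full, so the team finishes work before starting more. The sketch below is a minimal, hypothetical model of such a board (the `KanbanBoard` class and its column names are illustrative, not taken from any particular tool).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KanbanBoard:
    """A minimal board where each column caps its work in progress."""

    wip_limits: dict[str, int]  # column name -> maximum number of cards
    columns: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every column that has a limit as an empty list.
        for name in self.wip_limits:
            self.columns.setdefault(name, [])

    def can_pull(self, column: str) -> bool:
        # Work may be pulled into a column only while it is under its limit.
        return len(self.columns[column]) < self.wip_limits[column]

    def move(self, card: str, src: str | None, dst: str) -> bool:
        """Move a card into dst (src=None means a new card); refuse if dst is full."""
        if not self.can_pull(dst):
            return False
        if src is not None:
            self.columns[src].remove(card)
        self.columns[dst].append(card)
        return True
```

For example, with `KanbanBoard({"todo": 10, "doing": 2, "done": 100})`, a third card cannot enter "doing" until one of the two cards already there moves on, which makes bottlenecks visible instead of hiding them behind half-finished work.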
Change Control
There is a direct correlation between operational success and control over changes to the environment. Some legacy change control processes are heavyweight and dated, and can do more harm than good. The Visible Ops approach describes a lighter, more practical form of change control. It emphasizes eliminating fragile artifacts, creating a repeatable build process, managing dependencies, and fostering continuous improvement. This is the kind of change control that helps businesses succeed.
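One small, concrete piece of managing dependencies for a repeatable build is refusing to build from loosely specified versions. The sketch below assumes a simplified pip-style requirements format, where `name==version` is an exact pin, and flags every dependency that is not pinned. The function name `unpinned` is hypothetical, and real tooling (lock files, hash checking) goes further than this.

```python
from __future__ import annotations


def unpinned(requirements: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version.

    Assumes a simplified pip-style format: one requirement per line,
    '#' starts a comment, and 'name==version' is an exact pin.
    Environment markers and other advanced syntax are not handled.
    """
    loose = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        # Pinned means '==' followed by a plain version with no other
        # operators or wildcards (e.g. 'pkg==1.*' is still loose).
        if not sep or not version or any(ch in version for ch in "<>=!~*,"):
            loose.append(line)
    return loose
```

A build step could call this on the checked-in requirements and fail when the list is non-empty, so that two builds of the same commit resolve the same dependency versions.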
Infrastructure As Code
Infrastructure as code is a massive win in the modern computing landscape. One of the major realizations of modern operations is that systems can and should be treated like code. DevOps professionals should check system specifications into source control, where they pass through code review and automated tests. Real systems can then be deployed automatically from those specifications and managed programmatically. With this approach, we can create, run, destroy, and recreate systems on demand instead of building permanent fixtures that we maintain by hand over time.
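The core loop behind infrastructure as code is comparing the specification in source control with what is actually running and computing the actions needed to converge, similar in spirit to the plan step of declarative tools such as Terraform. The sketch below is a minimal illustration under stated assumptions: the `Server` fields are hypothetical, servers are matched by name, and any changed attribute means the server is rebuilt rather than patched by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical server spec, as it might be checked into source control."""
    name: str
    size: str
    image: str


def plan(desired: list[Server], actual: list[Server]) -> dict[str, list[str]]:
    """List the actions needed to make `actual` match `desired`, by server name."""
    want = {s.name: s for s in desired}
    have = {s.name: s for s in actual}
    return {
        "create": sorted(want.keys() - have.keys()),
        "delete": sorted(have.keys() - want.keys()),
        # Same name but different attributes: replace the server with a fresh
        # one built from the spec instead of modifying it in place.
        "replace": sorted(n for n in want.keys() & have.keys() if want[n] != have[n]),
    }
```

Because the plan is computed from data, it can itself be reviewed and tested before anything is applied, and running it against an environment that already matches the spec produces no actions.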
Incident Command System
Bad things happen to technology platforms every day. In IT, these events are commonly called incidents. Many legacy incident management processes seem to apply only to large-scale incidents, yet the real world is mostly a mix of minor incidents with an occasional large one. Incident Command for IT, a well-known presentation on what IT can learn from the fire department, explains how incident command works and shows that processes proven in emergency services work well for information technology. Thanks to its battle-tested approach, the incident command system benefits both large and small incidents. It is one of those rare processes that helps the worker instead of adding more pain while they are already trying to fix a difficult situation.
Developers On Call
Many information technology departments have followed a philosophy of building applications, getting the software into production, and then letting other people worry about whether it works correctly. As you can imagine, this approach hasn't worked out well. Software teams have begun putting developers on call for the services they build. This is smart because it creates a fast feedback loop: logging and deployment improve rapidly, and core application problems get resolved quickly instead of lingering for what can seem like forever. Having a network operations technician restart a server to restore service is fine, but it is better for a developer to see the problem in real time and fix the underlying cause to prevent future outages.
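Putting developers on call usually means a rotation, so that the load is shared and everyone knows who owns the pager on a given day. The sketch below assumes a simple round-robin rotation of fixed-length shifts; the `on_call` function and its parameters are hypothetical, and real scheduling tools also handle overrides, time zones, and escalation.

```python
from __future__ import annotations

from datetime import date


def on_call(developers: list[str], start: date, day: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` in a simple round-robin rotation.

    Assumes developers[0] starts the first shift on `start` and each person
    covers `shift_days` consecutive days, in list order, then it repeats.
    """
    if day < start:
        raise ValueError("day is before the rotation started")
    shift = (day - start).days // shift_days
    return developers[shift % len(developers)]
```

With a team of three and weekly shifts, each developer is on call one week in three, which keeps the feedback loop tight without burning anyone out.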
Status Pages
Everyone knows that services go down; there are entire websites dedicated to tracking outages. Online services will have problems. What has been shown to preserve customer satisfaction and trust during these outages is communication. The blog Transparent Uptime was a tireless advocate for public status pages and for communicating promptly and clearly with users when an issue occurs. Every widely used service should have a status page that updates when there's an issue. That way, users learn about problems, understand what is being done, and gain confidence that work is underway to prevent similar issues in the future.
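A common way to structure a status page is as a list of components, each with its own status, with the headline status taken from the worst component. The sketch below illustrates that idea; the `Status` levels and the banner wording are assumptions for illustration, not any specific status page product's model.

```python
from __future__ import annotations

from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a larger value means a worse condition.
    OPERATIONAL = 0
    DEGRADED = 1
    PARTIAL_OUTAGE = 2
    MAJOR_OUTAGE = 3


def overall_status(components: dict[str, Status]) -> Status:
    """The headline status is the worst status of any component."""
    return max(components.values(), default=Status.OPERATIONAL)


def banner(components: dict[str, Status]) -> str:
    """Build the one-line message shown at the top of the page."""
    worst = overall_status(components)
    if worst is Status.OPERATIONAL:
        return "All systems operational"
    affected = sorted(n for n, s in components.items() if s is not Status.OPERATIONAL)
    return f"{worst.name.replace('_', ' ').title()}: {', '.join(affected)}"
```

Naming the affected components, rather than showing a single red light, tells users what is broken and what still works, which is exactly the kind of prompt, clear communication the practice calls for.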
Blameless Postmortems
When things go wrong, pointing fingers at other people does no good. That is why blameless postmortems are an essential part of DevOps culture. An incident rarely has a single root cause, and "human error" is not an acceptable explanation for a failure. It is best to examine failures and learn from them, while avoiding logical fallacies and resisting the urge to scapegoat, which may make us feel better but leaves our situation worse.
Embedded Teams
One of the classic DevOps starter problems is that the Dev team wants to ship new code while the Ops team wants to keep the service stable and running. This creates a conflict of interest. Some organizations have responded by embedding an operations engineer in each development team and making the team responsible for all of its work. Both disciplines then work toward a common goal, which increases overall success rates.
The Cloud
The DevOps love of automation and passion for infrastructure as code have found a vital partner in the cloud. The most compelling reason to use cloud technologies is that they give you an API-driven way to create and control infrastructure, which lets you treat your infrastructure like any other component in the software lifecycle. As soon as you devise a new deployment strategy, disaster recovery plan, or something similar, you can try it out without waiting on anyone. Cloud platforms can also lower overall costs. This approach to infrastructure can help your other DevOps changes move at high speed.
Andon Cord
In a DevOps setting, you're often releasing updates quickly. Ideally, automated testing catches most problems, but tests aren't perfect. This is where the andon cord comes in, an innovation first used by Toyota on its production line. A physical cord, like the stop-request cord on a bus, authorizes anyone on the line to pull it and stop production when they see a problem. It remains an essential part of Toyota's quality control system today. The same technique works in a software delivery pipeline: you can halt an upgrade or deployment to stop a bug from propagating downstream. For example, if a developer knows about a bug that no test catches, an andon cord wired into the build system lets them stop the release before it reaches production. Now anyone can stop shipment when they know something isn't right.
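In a pipeline, an andon cord can be as simple as a shared stop signal that every stage checks before it runs. The sketch below models the cord with `threading.Event`, assuming that some other part of the system (for example, a hypothetical chat command or button) calls `pull`; the `AndonCord` class and `run_pipeline` function are illustrative, not a specific CI product's API.

```python
from __future__ import annotations

import threading
from typing import Callable


class AndonCord:
    """A shared stop signal that anyone on the team can pull."""

    def __init__(self) -> None:
        self._pulled = threading.Event()  # safe to set from another thread
        self.reason = ""

    def pull(self, reason: str) -> None:
        self.reason = reason
        self._pulled.set()

    def reset(self) -> None:
        self.reason = ""
        self._pulled.clear()

    @property
    def pulled(self) -> bool:
        return self._pulled.is_set()


def run_pipeline(stages: list[tuple[str, Callable[[], None]]], cord: AndonCord) -> list[str]:
    """Run stages in order, checking the cord before each one.

    Returns the names of the stages that actually ran.
    """
    completed = []
    for name, action in stages:
        if cord.pulled:
            print(f"Stopped before '{name}': {cord.reason}")
            break
        action()
        completed.append(name)
    return completed
```

If someone pulls the cord while the test stage is running, the deploy stage never starts, and the recorded reason explains why the release stopped.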
Dependency Injection
Modern applications connect to external services such as databases, REST services, and other APIs, and these connections are often the source of runtime issues. A software design pattern called dependency injection, a form of inversion of control, focuses on loosely coupled dependencies. In this pattern, the application doesn't create or hard-code its external dependencies; instead, they are passed into the application at runtime. This is very important for a well-behaved application in an infrastructure-as-code environment. Other patterns, such as service discovery, can be used to achieve the same goal.
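The sketch below shows constructor injection: the application class depends only on an interface, and the concrete implementation is handed to it from outside. The names `UserStore`, `Greeter`, and `InMemoryStore` are hypothetical, chosen for illustration.

```python
from __future__ import annotations

from typing import Protocol


class UserStore(Protocol):
    """All the application knows about its dependency: an interface."""

    def get_name(self, user_id: int) -> str | None: ...


class Greeter:
    # The store is injected through the constructor; Greeter never opens a
    # connection, reads a hostname, or chooses an implementation itself.
    def __init__(self, store: UserStore) -> None:
        self.store = store

    def greet(self, user_id: int) -> str:
        name = self.store.get_name(user_id)
        return f"Hello, {name}!" if name else "Hello, stranger!"


class InMemoryStore:
    """A stand-in implementation, handy for tests or local runs."""

    def __init__(self, users: dict[int, str]) -> None:
        self.users = users

    def get_name(self, user_id: int) -> str | None:
        return self.users.get(user_id)
```

In production, startup code would build a database-backed store from configuration supplied by the environment and pass it to `Greeter`; in a test, `Greeter(InMemoryStore({1: "Ada"}))` exercises the same logic with no external services. Because the wiring lives outside the application, the same build can run unchanged in any environment.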
Blue Green Software Deployment
Traditionally, a software deployment takes down the software on a server, upgrades it, and brings it back up, sometimes in a rolling fashion across servers to maintain system uptime. An alternative deployment pattern is called blue-green deployment. Instead of testing a release in a staging environment, deploying it to production, and hoping it works, you maintain two identical systems, blue and green. One system is live and the other is idle. To perform an upgrade, you upgrade the idle system, test it, and then shift production traffic over to it. If there's a problem, you shift traffic back. This reduces both the downtime caused by the change and the risk that the change won't work once it reaches production.
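The sketch below models the blue-green switch described above: deploy to the idle side, test it there, and only then flip live traffic, keeping the previous version running for an instant rollback. The `BlueGreenRouter` class and the `smoke_test` callback are hypothetical; in practice, the switch is usually a load balancer or DNS change.

```python
from __future__ import annotations

from typing import Callable


class BlueGreenRouter:
    """Tracks two identical environments and which one receives live traffic."""

    def __init__(self, initial_version: str) -> None:
        self.versions: dict[str, str | None] = {"blue": initial_version, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, smoke_test: Callable[[str], bool]) -> bool:
        """Deploy to the idle side, test it, and switch traffic only if it passes."""
        target = self.idle
        self.versions[target] = version  # the live side is untouched
        if not smoke_test(target):
            return False  # users never saw the bad release
        self.live = target  # the cut-over is a single switch
        return True

    def rollback(self) -> None:
        # The previous version is still running on the other side.
        self.live = self.idle
```

Because both environments stay up, rolling back is the same cheap switch as releasing, rather than a redeploy under pressure.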
Chaos Monkey
Legacy systems development theory stresses making each system component highly available in pursuit of the highest possible uptime, but this doesn't work. A transaction that relies on a chain of five components, each 99% available, will be only about 95% available, because (assuming independent failures) the availabilities multiply: 0.99^5 ≈ 0.951. Instead, focus on making the overall system highly reliable, even when individual components are not. Netflix is one of the leading companies in this newer style of technology management, and to make sure they get reliability right, they created a piece of software called Chaos Monkey. Chaos Monkey watches the Netflix systems running in the Amazon cloud and occasionally terminates a server on purpose. This forces developers and support staff to build resilience into their services instead of assuming that their infrastructure is always on.
These are some excellent practices from various areas of DevOps philosophy. We hope you find them helpful and that they give you an idea of the new thinking DevOps professionals can apply to creating, deploying, and maintaining applications.