See the parent post, Defining Resiliency in Azure, for a broader overview of resiliency.
1. Avoid Adversity in software
Many topics, strategies, and concepts can be applied in “Avoiding Adversity”. I haven’t really seen a mental model that categorizes the different strategies for addressing system changes in “Avoiding Adversity”.
From what I’ve observed, there are strategies that apply before, during, and after a change to the “State” of code or configuration. These strategies can be performed as part of both routine and non-routine operations. They will look different depending on the technical domain you’re operating in (e.g. Infrastructure, Data, and Application), the technical environment (e.g. on-prem, public cloud, hybrid cloud), and cultural practices (e.g. low-risk vs high-risk mindset, tendency towards manual vs automated processes, etc).
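To make the before/during/after framing concrete, here is a minimal sketch in Python of a single configuration change passing through all three phases. Everything in it is hypothetical and for illustration only: the `apply_change` function, the `ChangeResult` type, and the `validate` and `health_check` callbacks are assumptions of mine, not something taken from the Azure whitepaper or any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation of configuration "State": a flat key/value map.
Config = Dict[str, int]


@dataclass
class ChangeResult:
    applied: bool                      # True only if the change was kept
    state: Config                      # the state in effect after the attempt
    log: List[str] = field(default_factory=list)


def apply_change(
    current: Config,
    proposed: Config,
    validate: Callable[[Config], bool],
    health_check: Callable[[Config], bool],
) -> ChangeResult:
    """Apply a proposed state with checks before, during, and after."""
    log: List[str] = []

    # Before: reject a bad change before it touches the running state.
    if not validate(proposed):
        log.append("before: validation failed, change rejected")
        return ChangeResult(False, dict(current), log)
    log.append("before: validation passed")

    # During: keep a snapshot of the old state so the change can be undone.
    snapshot = dict(current)
    new_state = dict(proposed)
    log.append("during: change applied")

    # After: verify the new state and roll back to the snapshot if unhealthy.
    if not health_check(new_state):
        log.append("after: health check failed, rolled back")
        return ChangeResult(False, snapshot, log)
    log.append("after: health check passed")
    return ChangeResult(True, new_state, log)
```

For example, calling `apply_change({"timeout": 30}, {"timeout": 600}, lambda c: c["timeout"] > 0, lambda c: c["timeout"] <= 120)` passes the "before" validation but fails the "after" health check, so the returned state is the original `{"timeout": 30}`. In practice each phase would map to real practices (code review or tests before, staged rollout during, monitoring and rollback after), which this toy version only gestures at.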
For lack of a better, publicly available diagram to describe these areas, here are my thoughts on different strategies within software development (applications), mapped onto this mental model.
There are many other strategies for managing issues in infrastructure and data, such as standardization. That said, this topic is glossed over in the Azure whitepaper. It refers to configuration and infrastructure standardization tooling (e.g. Chef and Terraform); however, it doesn’t discuss how those tools are intended to be used or which domains they address. As the DevOps and InfraOps landscape evolves, understanding what problems these tools solve, and what automation and standardization they provide, will be increasingly important.
Ultimately, at your firm, these capabilities may already exist in some form, whether performed by people or automated. If not, they may be good areas in which to develop expertise in order to reduce current and future operational effort. They’re all disciplines a software developer should know and be capable of implementing, as they are generally the first layer of defense for building resilient systems. Successful execution here will reduce overall customer impact and the need for non-routine practices in other defense layers.