Every year, and almost every month, technology advances, whether through enhancements to existing technologies or entirely new ones. New process frameworks are evangelized, and some begin to trend. Infrastructure hardware has become reliable and, in most cases, highly available. No IT manager would dream of purchasing a server without a dual power supply.
And yet, service outages are common, almost daily, occurrences. There is much focus on new IT capabilities and on providing agility, yet in most companies the quality of IT services has not really improved in decades. One could argue that in 2017 service outages should be nearly nonexistent, and that when they do occur, the mean time to resolution (MTTR) should be extremely low. Most of the reasons they are not come down to how the enterprise is managed across the entire ITIL service life cycle.
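As a rough illustration of the metric, MTTR is commonly computed as the total time spent resolving incidents divided by the number of incidents. The sketch below assumes each incident is recorded with hypothetical `detected` and `resolved` timestamps; real service management tools may define the start and end points differently (for example, from ticket creation rather than detection).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record: when the fault was detected and when service was restored.
    detected: datetime
    resolved: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to resolution: total resolution time divided by the incident count."""
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    total = sum((i.resolved - i.detected for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    incidents = [
        Incident(datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 11, 0)),    # 2 hours
        Incident(datetime(2017, 3, 8, 14, 0), datetime(2017, 3, 8, 15, 0)),   # 1 hour
        Incident(datetime(2017, 3, 20, 22, 0), datetime(2017, 3, 21, 4, 0)),  # 6 hours
    ]
    print(mttr(incidents))  # 3:00:00
```

Because the mean is dominated by long incidents, shortening the slowest resolutions (often the ones with poor monitoring or documentation) moves MTTR the most.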
The foundation of an IT organization’s services is its infrastructure. Infrastructure is usually not a major contributor to service outages, but the way it is managed contributes a great deal to the MTTR. First, too many companies do not know what hardware makes up their enterprise, and there is no documentation that represents the design, which raises several concerns. How can an organization bring in new hardware and new capabilities if it doesn’t understand what is currently in place? That sets up any IT project to fail before it begins.
Second, many companies still do not monitor their infrastructure properly. In short, the infrastructure must be accurately documented and properly monitored so that faults are detected, and the monitoring should be designed to be informative enough to yield the shortest possible MTTR. A proper design will monitor all the critical path components necessary to deliver a service.
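One way to check that monitoring really covers a service’s critical path is to model the service’s dependencies and compare every component on that path with the set of monitored components. A minimal sketch, assuming a hypothetical dependency map and component names; in practice this data would come from the infrastructure documentation discussed above.

```python
from __future__ import annotations

from collections import deque

# Hypothetical dependency map: each component lists what it needs in order to deliver the service.
DEPENDENCIES = {
    "order-service": ["app-server-1", "db-cluster"],
    "app-server-1": ["core-switch-a", "san-array"],
    "db-cluster": ["core-switch-a", "san-array"],
    "core-switch-a": [],
    "san-array": [],
}


def critical_path(service: str, deps: dict[str, list[str]]) -> set[str]:
    """Return the service plus every component it transitively depends on (breadth-first walk)."""
    seen = {service}
    queue = deque([service])
    while queue:
        node = queue.popleft()
        for child in deps.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unmonitored(service: str, deps: dict[str, list[str]], monitored: set[str]) -> set[str]:
    """Critical path components that have no monitor configured."""
    return critical_path(service, deps) - monitored


if __name__ == "__main__":
    monitored = {"order-service", "app-server-1", "db-cluster", "core-switch-a"}
    print(sorted(unmonitored("order-service", DEPENDENCIES, monitored)))  # ['san-array']
```

The walk crosses compute, network, and storage components alike, which is why the gap it reports (here, the storage array) is only visible when the groups share their data.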
Finally, infrastructure data must be shared among all IT support groups. In nearly all companies, compute, network, and storage monitoring data is still not shared between groups. Within the enterprise these services work together and are completely interrelated, so they cannot be viewed as separate services.
- The enterprise must be documented, and the documentation must be kept up to date.
- The infrastructure must be monitored completely, covering the right components with the right metrics.
- Information must be shared between IT support groups.
My last article in A&G discussed change and its impact on the enterprise. It is estimated that more than 80 percent of service outages are caused by change, and that figure is probably conservative. If that is true, it follows that a company that focused on implementing a proper change management program could reduce outages by as much as 80 percent. That is a huge return on investment (ROI). The IT change management process should follow the ITIL standard. One of the basics, again, is that the enterprise must be documented and the configuration of its assets baselined and tracked. To ensure the quality of the enterprise, the configurations of the infrastructure and of higher-level services must be monitored.
- Implement an effective change management program to ensure desired changes are properly planned, documented, and approved.
- Implement configuration monitoring to ensure that the enterprise baseline is not changed unexpectedly or without approval.
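Configuration monitoring largely comes down to comparing each asset’s current configuration with its approved baseline and flagging any difference that no approved change accounts for. A minimal sketch, assuming configurations are captured as key/value settings per asset and that approved changes are recorded as the settings each change was allowed to touch (both structures are hypothetical, not the format of any particular tool):

```python
from __future__ import annotations


def drift(baseline: dict[str, str], current: dict[str, str]) -> set[str]:
    """Settings added, removed, or changed relative to the baseline."""
    keys = baseline.keys() | current.keys()
    return {k for k in keys if baseline.get(k) != current.get(k)}


def unapproved_drift(
    baselines: dict[str, dict[str, str]],
    current: dict[str, dict[str, str]],
    approved: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Per asset, drifted settings that no approved change covers.

    Assets missing from the baseline (e.g. unregistered hardware) or missing
    from the current snapshot are reported too, since both change the baseline.
    """
    report = {}
    for asset in baselines.keys() | current.keys():
        changed = drift(baselines.get(asset, {}), current.get(asset, {}))
        unexpected = changed - approved.get(asset, set())
        if unexpected:
            report[asset] = unexpected
    return report


if __name__ == "__main__":
    baselines = {"core-switch-a": {"vlan10": "enabled", "mtu": "1500"}}
    current = {"core-switch-a": {"vlan10": "enabled", "mtu": "9000", "snmp": "v2c"}}
    approved = {"core-switch-a": {"mtu"}}  # the MTU change went through change management
    print(unapproved_drift(baselines, current, approved))  # {'core-switch-a': {'snmp'}}
```

Once an approved change is implemented, its new values would be folded into the baseline so the next comparison starts clean.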
Even more than monitoring, service management and delivery are among the greatest weaknesses in most companies. Service delivery programs are almost nonexistent, and where they do exist, the service delivery managers (SDMs) are often unaware of, or not properly trained in, how to manage a service. SDMs should do much more than report on availability. They provide a point of governance for a service, working with the ITIL process owners, service consumers, service developers, operations, and support organizations. Service delivery methodologies are beyond the scope of this article; however, companies must embrace the ITIL service life cycle to ensure a successful service delivery program. The life cycle is important because it ensures proper service strategy, design, transition, operation, and continual improvement.
Figure 1 shows the service delivery cycle along with the other processes that complement its different phases.
- Service delivery must exist in an IT organization.
- IT leaders and the entire organization must understand how each piece fits into the delivery construct.
- SDMs are essential because they drive the different components of the life cycle to ensure quality.
Services are still often poorly implemented, and poor implementation can cause even a well-designed service to fail. Before a service is put into production, operations should have a set of acceptance criteria so that the service can be properly supported. These include:
- The ability of the service to be monitored
- The ability to measure the service quality
- Proper documentation of the service
- The operational and disaster recovery procedures for the service
- The service levels, including the recovery point objective (RPO) and recovery time objective (RTO). These must not only be identified; the service design must also be able to meet them, or the service should not be placed into production.
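The acceptance criteria above can be expressed as a simple production-readiness gate. A sketch under assumed field names (hypothetical, not taken from ITIL or any tool): the service records whether each operational criterion is satisfied, states its required RPO and RTO, and states what its design can actually achieve; promotion is refused if any criterion is unmet.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceReadiness:
    # Hypothetical readiness record for a service awaiting production acceptance.
    name: str
    monitored: bool
    quality_measured: bool
    documented: bool
    dr_procedures: bool
    required_rpo: timedelta  # maximum tolerable data loss
    required_rto: timedelta  # maximum tolerable time to restore service
    design_rpo: timedelta    # data loss the design can actually guarantee
    design_rto: timedelta    # restore time the design can actually achieve


def acceptance_failures(s: ServiceReadiness) -> list[str]:
    """Return the unmet acceptance criteria; an empty list means the service may go live."""
    failures = []
    if not s.monitored:
        failures.append("service cannot be monitored")
    if not s.quality_measured:
        failures.append("service quality is not measured")
    if not s.documented:
        failures.append("service is not documented")
    if not s.dr_procedures:
        failures.append("operational/DR procedures missing")
    if s.design_rpo > s.required_rpo:
        failures.append("design cannot meet RPO")
    if s.design_rto > s.required_rto:
        failures.append("design cannot meet RTO")
    return failures


if __name__ == "__main__":
    svc = ServiceReadiness(
        name="order-service", monitored=True, quality_measured=True,
        documented=True, dr_procedures=True,
        required_rpo=timedelta(minutes=15), required_rto=timedelta(hours=4),
        design_rpo=timedelta(hours=1), design_rto=timedelta(hours=2),
    )
    print(acceptance_failures(svc))  # ['design cannot meet RPO']
```

Returning the list of failures rather than a single yes/no gives operations a concrete punch list to hand back to the service developers.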
An entire book could be filled with methodologies for building quality services and ensuring their quality. This article has touched on only a few common gaps in IT organizations that affect quality. No matter how much IT costs are lowered or how well an organization introduces new capabilities, those achievements will always be overshadowed by the quality of its services.