I've been asked, both online and offline, about the relationship between IT Governance and Virtualization (Hyper-V). In this respect, I fall back on the standardized frameworks of ITIL and MOF for guidance on how to reliably operate a virtualized infrastructure. These frameworks were created to help IT deliver more stable, reliable, and controlled infrastructures. To accomplish this, the frameworks include recurring processes and service management functions that help IT take a proactive and manageable approach to IT operations. These concepts fit well into a virtualized infrastructure, and I intend this post to be a launching pad for understanding how these frameworks can improve the quality of the services IT delivers.
Microsoft Operations Framework and Virtualization
The Microsoft Operations Framework (MOF) is made up of the "IT Service Lifecycle", as well as "Service Management Functions". These aspects of IT governance formalize the processes associated with solid IT service management.
IT Service Lifecycle
The IT Service Lifecycle is made up of three phases, the "Plan Phase", "Deliver Phase", and "Operate Phase", built on a foundational "Manage Layer". The intention of these phases is to place IT in a state of continuous improvement.
The Microsoft description of these phases is as follows:
- The Plan Phase is generally the preliminary phase. The goal of this phase is to plan and optimize an IT service strategy in order to support business goals and objectives.
- The Deliver Phase comes next. The goal of this phase is to ensure that IT services are developed effectively, are deployed successfully, and are ready for Operations.
- Next is the Operate Phase. The goal of this phase is to ensure that IT services are operated, maintained, and supported in a way that meets business needs and expectations.
- The Manage Layer is the foundation of the IT service lifecycle. Its goal is to provide operating principles and best practices to ensure that the investment in IT delivers expected business value at an acceptable level of risk. This layer is concerned with IT governance, risk, compliance, roles and responsibilities, change management, and configuration. Processes in this layer take place during all phases of the lifecycle.
The following is a link to the Microsoft MOF description of these phases: http://technet.microsoft.com/en-us/library/cc543217.aspx
These phases relate well to virtualization in the following ways:
The Plan Phase comes first. Implementing virtualization technologies depends on solid pre-planning to ensure that the necessary resources exist to support the proposed consolidation goals. The Plan Phase is responsible for creating the portfolio of services and ensuring that the services offered by IT align with the needs of the business. Examples of how I would plan the deployment of a virtual infrastructure with respect to the Plan Phase are:
- Identifying a Service Catalog / Map - A critical first step in a virtualization strategy is to understand the catalog of services that IT operates, or will operate. By understanding the organization's service catalog, you will be better equipped to associate the underlying requirements of those services with a virtualization strategy. If I have documented the business criticality, growth plans, and financial impact of the services in my organization, I will be well prepared to determine my IT strategy as it relates to those services. I can make a strong business case for my proposed actions because I understand the systems I offer to the business. Make sure to take a look at this: http://technet.microsoft.com/en-us/library/cc543303.aspx
- Analyzing the Core Infrastructure and Designing a Virtualization Plan - The first initiative should always be to plan what you intend to accomplish, which makes developing a virtualization strategy critical. Before you even begin developing a server infrastructure, you need a clear idea of where you're going, why you're doing it, and how it will benefit the business. This may seem obvious, but analyzing and documenting these requirements is imperative for making the business case, both before and after the implementation is complete. You want clearly defined success factors so you can determine the extent to which the initiative succeeded.
- Validating Success Factors with Business Decision Makers - A key element of the plan phase is the alignment between business and IT. In the case of virtualization, this can be a big win, since virtualization brings a significant number of direct business benefits, like cost savings, recoverability, availability, and others.
- Analyzing Cost Outlays and Presenting ROI - In addition to aligning the deliverables with business objectives, you should create and present a clear ROI for the virtualization solution. For a server virtualization project, this is a consolidation report that compares the operational, replacement, and project costs of the current environment with those of the same environment virtualized. For presentation, application, or desktop virtualization, you will need to analyze the alternative solution and present it with clear cost figures. Wherever possible, use actual costs, especially for items with direct cost savings, such as power, licensing, deployment time, replacement hardware, and support contracts. You will also be presenting items such as risks, new or eliminated controls, and the operational requirements of the solution. A key tenet of infrastructure optimization is "plan for operations", which means planning for the entire lifecycle of a system, not just its deployment. Definitely look at risk and control management here: http://technet.microsoft.com/en-us/library/cc543273.aspx
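To make the ROI discussion concrete, here is a small Python sketch that compares a do-nothing option (refreshing physical servers) with a virtualization option over a fixed period and reports net savings, payback time, and ROI on the extra up-front spend. The `Scenario` class, the `compare` function, the three-year horizon, and every dollar figure are hypothetical placeholders; substitute your own actual costs and add cost lines (deployment time, for example) as your analysis requires.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One option in the business case. All figures are hypothetical placeholders."""
    name: str
    upfront: float           # one-time costs: replacement hardware, project labor, etc.
    annual_power: float
    annual_support: float    # support contracts
    annual_licensing: float

    def annual_cost(self) -> float:
        return self.annual_power + self.annual_support + self.annual_licensing

    def total_cost(self, years: int) -> float:
        return self.upfront + self.annual_cost() * years


def compare(do_nothing: Scenario, virtualize: Scenario, years: int) -> dict:
    """Compare the do-nothing option with the virtualization option over `years` years."""
    net_savings = do_nothing.total_cost(years) - virtualize.total_cost(years)
    extra_upfront = virtualize.upfront - do_nothing.upfront
    annual_savings = do_nothing.annual_cost() - virtualize.annual_cost()

    if extra_upfront <= 0:
        payback_months = 0.0            # virtualization costs no more up front
    elif annual_savings > 0:
        payback_months = extra_upfront / annual_savings * 12
    else:
        payback_months = float("inf")   # the extra spend is never recovered

    # ROI on the additional up-front investment; undefined if there is none.
    roi_pct = net_savings / extra_upfront * 100 if extra_upfront > 0 else None
    return {"net_savings": net_savings, "payback_months": payback_months, "roi_pct": roi_pct}


if __name__ == "__main__":
    physical = Scenario("refresh physical servers", 50_000, 12_000, 18_000, 10_000)
    virtual = Scenario("consolidate onto Hyper-V", 85_000, 3_000, 6_000, 8_000)
    print(compare(physical, virtual, years=3))
```

Measuring ROI against the extra up-front spend, rather than the full virtualization budget, reflects that the do-nothing option also has replacement costs; choose whichever basis your decision makers expect and state it in the report.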
For more information on the planning phase, please see: http://technet.microsoft.com/en-us/library/cc543274.aspx
The Deliver Phase is responsible for deploying services into the production environment and for validating, through testing in a staging environment, that systems are ready for production. This phase defines the goals (envisioning), designs the solution (project planning), builds the system (build), stabilizes the system and prepares it for operation (stabilization), and deploys the system into operation (deployment). The Deliver Phase is where a virtualization solution is designed and built, so I'd like to highlight some areas of planning to keep in mind:
- Envisioning Service Management Function - The envisioning SMF takes the high-level objective from the strategic plan and develops it into a vision and scope for the project. This process ensures that the strategy defined during road mapping is carried into the actual delivery of the solution. For a virtualization solution, it means restating the rationale for the project, if only briefly, and agreeing on specific goals.
- Project Planning Service Management Function - The project planning SMF consists of defining detailed system requirements and agreeing on a functional specification. This usually includes selecting software technologies, which is very important when deciding between virtualization technologies such as VDI, application virtualization, or RDS. For gathering system requirements, I would recommend the following tools and considerations:
- Server Virtualization: Microsoft Assessment and Planning Toolkit (MAP). The MAP tool is invaluable in planning the deployment of a virtual server infrastructure because it lets the IT planner gather performance and capacity information from current servers and translate it into a virtualization design. Although the results are not perfect (no tool's are), it serves as an excellent starting point. http://technet.microsoft.com/en-us/library/bb977556.aspx
- Capacity and Performance: In addition to planning for current systems, make sure new systems are taken into account. Virtual infrastructures are prone to over-provisioning, especially because new VMs are so easy to deploy. Capacity and performance must be planned at the beginning and then continued as an ongoing operational process. I always plan a virtual infrastructure in greater depth than a physical one, especially with respect to disk IO, memory, and processor requirements. I would rather have confidence in the performance of a solution than take a "wait and see" approach. The MAP tool mentioned earlier is a great tool for estimating capacity and performance.
- VM Grouping: Plan the grouping of VMs on individual hosts, especially with respect to performance, and try to group VMs based on their anticipated performance requirements. Also, in Hyper-V R2, use Cluster Shared Volumes (CSV) to limit the number of LUNs used on your SAN. This will simplify administration of the VM environment. For more information on CSV, check here: http://technet.microsoft.com/en-us/library/dd630633(WS.10).aspx
- Availability: Ensure that the planning process takes availability into account. The team should review whether some or all of the VMs will be made highly available through a Hyper-V failover cluster. I rarely build anything other than a highly available, clustered Hyper-V environment, because I want to be prepared for hardware failure and I want routine maintenance (like patching) to avoid disrupting the services running on the hosts. With Hyper-V R2, you can build highly available failover clusters that support "Live Migration" of virtual machines between hosts without downtime. For geographically dispersed availability, you'll need a technology that can copy content to a secondary location, such as Double-Take or SAN replication.
- Backup and Recovery: Ensure the planning process takes into account the backup and recovery of bare-metal VHDs. This can be accomplished with tools as basic as Windows Server Backup, which also provides a straightforward way to back up and recover individual machines.
- Build Service Management Function - The build SMF is responsible for the actual construction of the solution. This SMF often flows into a secondary project management methodology, and its principal deliverable is the "delivery of an IT solution." (For more information about this function, see http://technet.microsoft.com/en-us/library/cc543240.aspx) In this part of the IT lifecycle, there are several key aspects to consider with respect to Hyper-V:
- Roles: Ensure that roles for the project are well defined. Key internal IT teams must be closely involved in the implementation, because virtualization will significantly change the operations environment. It is also important to keep application teams in the loop and make them responsible for testing their own applications, especially during physical-to-virtual (P2V) conversions.
- Phases: Respect the project phases which outline the process for build, stabilization, and release. This becomes especially important when changes to host systems affect 8 - 16 VMs running on those hosts.
- Test, Test, Test: Ensure that throughout the project you have rigorous test processes to validate system stability, performance, and security. Although formal external testing won't happen until stabilization, it's important to test builds throughout the project.
- Build to Operate: Projects often overlook operations during the build process. As a result, systems are constructed and deployed without appropriate documentation, procedures, or operational responsibilities identified. The operations process in fact is one of the most important deliverables of the project. By focusing on operations, the project team constructs a solution which delivers consistent services over its operational life. It also helps avoid the tendency to create "heroes" within the IT department, instead focusing on constructing a solution which is easily supported by the IT operations team.
- Stabilization Service Management Function - The stabilization SMF is responsible for eliminating outstanding solution bugs before the system is released to operations. This phase is all about testing. Its principal deliverable is a stabilized solution that has passed a release readiness review.
- Deployment - The deployment SMF is responsible for delivering a solution into the operations environment. At the end of deployment, the solution is delivering its value to the business, and support responsibility rests with the IT operations team. (For more information about deployment, see http://technet.microsoft.com/en-us/library/cc465181.aspx) Examples of deployment deliverables are:
- Bringing the Hyper-V cluster into the operations environment as a production system.
- Executing a P2V conversion process
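The capacity, VM grouping, and availability points in the Project Planning SMF above lend themselves to a quick sanity check. The sketch below is an illustrative first-fit-decreasing placement of VMs onto identical hosts that keeps a headroom reserve on each host, followed by a coarse N+1 test of whether the cluster could absorb the loss of its largest host. It is not the MAP tool's algorithm; the `VM` and `Host` classes, the 20% headroom, and the sample sizes are assumptions for illustration, and real planning should use measured data (for example, from MAP) and also account for disk IO, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    memory_gb: float
    cpu_ghz: float  # peak CPU demand, e.g. taken from measured performance data


@dataclass
class Host:
    name: str
    memory_gb: float
    cpu_ghz: float
    vms: list = field(default_factory=list)

    def used(self) -> tuple:
        """Memory and CPU already committed to VMs on this host."""
        return (sum(v.memory_gb for v in self.vms), sum(v.cpu_ghz for v in self.vms))


def place(vms: list, hosts: list, headroom: float = 0.2) -> list:
    """First-fit decreasing: place the largest-memory VMs first, each on the first
    host that stays within (1 - headroom) of its memory and CPU capacity.
    Returns the names of VMs that did not fit anywhere."""
    unplaced = []
    for vm in sorted(vms, key=lambda v: v.memory_gb, reverse=True):
        for host in hosts:
            mem, cpu = host.used()
            if (mem + vm.memory_gb <= host.memory_gb * (1 - headroom)
                    and cpu + vm.cpu_ghz <= host.cpu_ghz * (1 - headroom)):
                host.vms.append(vm)
                break
        else:  # no host had room for this VM
            unplaced.append(vm.name)
    return unplaced


def survives_host_loss(vms: list, hosts: list) -> bool:
    """Coarse N+1 check: could the other hosts absorb the total demand if the
    largest host failed? Compares aggregate capacity only and ignores
    fragmentation, so a pass is necessary but not sufficient."""
    largest = max(hosts, key=lambda h: h.memory_gb)
    rest = [h for h in hosts if h is not largest]
    return (sum(v.memory_gb for v in vms) <= sum(h.memory_gb for h in rest)
            and sum(v.cpu_ghz for v in vms) <= sum(h.cpu_ghz for h in rest))


if __name__ == "__main__":
    cluster = [Host(f"HV0{i}", memory_gb=64, cpu_ghz=24) for i in (1, 2, 3)]
    guests = [VM(f"VM{i}", mem, 2.0) for i, mem in enumerate([16, 12, 12, 8, 8, 8, 4, 4])]
    print("unplaced:", place(guests, cluster))
    for h in cluster:
        print(h.name, [v.name for v in h.vms])
    print("N+1 ok:", survives_host_loss(guests, cluster))
```

First-fit decreasing is a simple, well-known bin-packing heuristic; it packs VMs tightly onto the first hosts, so if you prefer to spread load evenly for performance, sort candidate hosts by free capacity instead of using a fixed order.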
The Operate Phase is responsible for maintaining, monitoring, and supporting an IT solution throughout its operational life, from the deployment handoff to the solution's official decommissioning. As such, the phase is responsible not only for operations but also for feeding information into the Manage Layer and the Plan Phase. This phase is extremely important to Hyper-V, because how effectively operations are controlled will significantly affect operational quality.
- Operations Service Management Function: The Operations SMF is responsible for executing the operational processes defined during the build. These processes automate, or at least standardize, the operation of the infrastructure. With respect to virtualization, keep the following kinds of procedures in mind:
- Validation of host security
- Validation of system backups (host and VMs)
- Maintaining security permissions
- Maintaining Virtual Machine Resources
- Recurring patching processes
- Monitoring of new virtual machines
- Maintaining system templates
- Service Monitoring and Control Service Management Function: This SMF is responsible for monitoring the deployed solution and feeding data into improvement processes and incident management. This is extremely important for virtualization, because managing growth keeps the environment stable. Examples of these actions include:
- Customer Service SMF: This service management function is the primary line of communication between the business and IT. This team provides operational end-user and IT support for the solution, with roles covering customer service, incident resolution, and problem review. With Hyper-V there will be fewer end-user questions, but this role is still very relevant: the IT team responsible for virtualization will find itself assisting with provisioning new services or supporting the underlying OS for existing ones. The key roles of this SMF are:
- Customer Service Representative
- Incident Resolver (an incident is any end-user issue)
- Incident Coordinator
- Problem Analyst (a problem is an aggregate issue, usually with a common root cause)
- Problem Manager
- Customer Service Manager
For more information, see the following: http://technet.microsoft.com/en-us/library/cc543265.aspx
- Problem Management SMF: This service management function is responsible for the identification of issues with a complex root cause which can only be resolved outside of the normal incident management process. The identified problems typically enter into the portfolio management process and are categorized for build / deploy. This SMF is also responsible for the development of workarounds when the root cause is not resolved. These workarounds are typically delivered to the operational support team. For more information about problem management, please see the following: http://technet.microsoft.com/en-us/library/cc543261.aspx
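Several of the operational procedures listed above (validating backups, maintaining VM resources, and making sure new VMs are monitored) can be turned into a simple daily check. The following Python sketch assumes you can export host utilization and per-VM backup and monitoring status from your own tools; the data layout, field names, sample values, and thresholds (85% utilization, 30 days since the last validated restore) are all hypothetical.

```python
from datetime import date

# Hypothetical export from your monitoring and backup tools; field names are illustrative.
HOST_MEMORY_UTILIZATION = {"HV01": 0.62, "HV02": 0.91}
VMS = [
    {"name": "SQL01", "host": "HV02", "last_restore_test": date(2024, 1, 10), "monitored": True},
    {"name": "FILE01", "host": "HV01", "last_restore_test": date(2024, 2, 1), "monitored": False},
    {"name": "WEB01", "host": "HV01", "last_restore_test": date(2024, 2, 12), "monitored": True},
]


def daily_checks(host_util: dict, vms: list, today: date,
                 max_util: float = 0.85, max_restore_age_days: int = 30) -> list:
    """Return human-readable findings for the operations team to follow up on."""
    findings = []
    # Maintaining VM resources: flag hosts running too hot to absorb growth or failover.
    for host, util in sorted(host_util.items()):
        if util > max_util:
            findings.append(f"{host}: memory utilization {util:.0%} exceeds {max_util:.0%}")
    for vm in vms:
        # Validation of backups: "validated" is assumed here to mean a successful test restore.
        age = (today - vm["last_restore_test"]).days
        if age > max_restore_age_days:
            findings.append(f"{vm['name']}: no validated restore in {age} days")
        # Monitoring of new VMs: every VM should be enrolled in monitoring.
        if not vm["monitored"]:
            findings.append(f"{vm['name']}: not enrolled in monitoring")
    return findings


if __name__ == "__main__":
    for finding in daily_checks(HOST_MEMORY_UTILIZATION, VMS, today=date(2024, 2, 15)):
        print(finding)
```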
The Manage Layer is responsible for facilitating the continuous improvement of IT systems, control of current configurations, management of change, and the organization of teams. These deliverable areas are outlined as "Governance, Risk, and Compliance", "Change and Configuration", and "Team". How does this relate to virtualization? These three concepts are critical to virtualization in that they establish a mature process for managing the virtual infrastructure.
- Governance, Risk, and Compliance SMF: This service management function is essentially about establishing a culture of continuous improvement and facilitating that at the senior level by engaging both the business and IT in addressing strategic goals. The Microsoft documentation describes the relationship between GRC and the IT Lifecycle particularly well here: http://technet.microsoft.com/en-us/library/cc531020.aspx
How do these relate to virtualization?
- GRC in the Plan Phase with Virtualization: The manage layer aids the planning phase by establishing that virtualization is a deliverable that accomplishes a business strategy. This would include items like operating cost reductions (server consolidation), availability improvements, recoverability improvements, and reduced time to provision IT services. These benefits are weighed against the risks of both the "do" and "do-nothing" approaches. In virtualization, the do-nothing approach really has to focus on current costs, provisioning times, and recovery. One of the best ways to demonstrate this is a clear ROI statement that uses actual numbers and actual times. Virtualization gives you this opportunity because its cost savings are more transparent.
- GRC in the Deliver Phase with Virtualization: The manage layer aids the deliver phase by establishing a framework for planning and building a solution. Virtualization projects benefit from this because they significantly uproot the existing infrastructure and change the way IT delivers services. This requires a strong project plan for the build, P2V conversions, and continuous performance monitoring. It also requires strong documentation of the implemented infrastructure, procedural controls for operations, and a well-defined test process. The GRC layer is responsible for defining what these deliverables are. For example, what types of procedures does your business require, and how are they to be documented? The cost of management goes up if documentation is not consistent and easy to find. Are you documenting the risks of a P2V conversion and mitigating them by including the application team responsible for the solutions running on the server? The GRC team is responsible for ensuring that process is in place. At its core, GRC is what separates a professional project from a haphazard one.
- GRC in the Operate Phase with Virtualization: The manage layer aids the operate phase by focusing on the consistent and stable operation of a system and by documenting opportunities for improvement in existing solutions. For virtualization, this means that the host cluster, the VMs, and current performance are well documented, risks from current operations are identified, and controls for delivering new systems are consistently followed. For example: Is the cluster over-utilized? Are hosts built with appropriately sized drives? Are the VMs performing adequately on their hosts? Are the host systems consistently secured? Are failover tests working as expected?
- Change and Configuration Management SMF: This service management function is one of the most important in all of IT operations for improving the stability and availability of IT services. When a client asks, "what is the best thing I can do to ensure high availability?", I typically respond, "implement solid change management". Introducing change management will immediately improve the stability of services because it limits the culture of "wild west IT", where changes are made "whenever IT feels like it". In many ways, change management protects IT from itself. It does not have to be a major bureaucratic process, but it does need to be consistent. For example, do you apply changes immediately when trying to fix a major issue because it's "no big deal"? Do you reboot systems in the middle of the day because "only a few people are using it"? Do you live- or quick-migrate virtual machines to another host because "a little downtime is ok"? You need change management.
What is a way to get started with change management? Start by implementing a change window.
Implementing a change window is the first step in changing the culture of IT. You can have a change window every evening at 9:00 PM, or every week, or every other day, etc. Change Management is about changing the culture such that not every change is an emergency.
That doesn't mean that changes aren't urgent, or that certain changes cannot be implemented immediately. It does mean that most changes can wait until a standard change window.
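A change window can be expressed as a simple rule. This Python sketch checks whether a proposed change time falls inside a nightly window opening at 9:00 PM, while letting an approved emergency change proceed immediately. The three-hour window length and the function names are assumptions for illustration; the window is allowed to cross midnight but is assumed to be shorter than 24 hours.

```python
from datetime import datetime, time, timedelta

# Assumed policy for illustration: a nightly window opening at 9:00 PM and lasting three hours.
WINDOW_START = time(21, 0)
WINDOW_LENGTH = timedelta(hours=3)


def in_change_window(when: datetime, start: time = WINDOW_START,
                     length: timedelta = WINDOW_LENGTH) -> bool:
    """True if `when` falls inside the nightly window. Because the window may cross
    midnight, a time shortly after midnight can belong to the window that opened
    the previous evening. Assumes `length` is under 24 hours."""
    opened_today = datetime.combine(when.date(), start)
    for opening in (opened_today, opened_today - timedelta(days=1)):
        if opening <= when < opening + length:
            return True
    return False


def may_implement(when: datetime, emergency_approved: bool = False) -> bool:
    """Standard changes wait for the window; an approved emergency change does not."""
    return emergency_approved or in_change_window(when)


if __name__ == "__main__":
    print(may_implement(datetime(2024, 3, 5, 14, 0)))                           # mid-afternoon
    print(may_implement(datetime(2024, 3, 5, 21, 30)))                          # inside the window
    print(may_implement(datetime(2024, 3, 5, 14, 0), emergency_approved=True))  # emergency
```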
The major components of formalized change management are:
- Configuration Baselines - How do you implement good change management if you don't know what the configuration is? Configuration baselines provide that information. This is referred to as "configuration management", which maintains the current configuration of every system within an infrastructure. Depending on your size and complexity, this may be a fully automated solution or simply a basic spreadsheet. When defining a configuration baseline management strategy, you need to ask, "will my people actually do this?" If the answer is "yes", you're in good shape. If it is "no", you need to scale it back or show the team how it will help them.
- Change initiation / Proposal / Classification - These processes are responsible for requesting and classifying changes. In more complex implementations this might be placed into a change management solution, which feeds into a CAB (Change Advisory Board). This can also be as simple as "Implementation of patches on Hyper-V hosts in failover cluster 101", complemented by a description of risks, anticipated impact, and back-out plan.
- Approval - The process always has someone who is responsible for saying "yes" or "no". In a one-man shop this might be a business user; in a larger shop it might be the Change Advisory Board (CAB). This individual or group decides whether or not a change is implemented. For example, if a change is non-critical and the company is in the middle of a critical week, you may choose to place a "change freeze" on the infrastructure. If it is a critical change, it may be approved for immediate implementation, even if there is a business impact.
- Development, Implementation, and Release - This is the process of actually implementing the change. The critical difference between doing this within a change management process and without one is that within the process you are (1) defining what you intend to do, (2) creating a back-out plan, and (3) outlining the impact on operations.
- Validation of the Change - Within a formalized change management process, validating a change takes on special significance because it ensures that a system is properly tested after every major modification. This limits business impact because problems are identified right after a change. Testing shouldn't stop at "can I log in?" or "does the system start?"; it should include validation by critical users and detailed functional checks.
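The components above can be modeled with very little structure. The Python sketch below shows a minimal change record (description, risks, impact, back-out plan), an approval rule that rejects incomplete requests and defers non-critical changes during a change freeze, and a comparison of a configuration baseline against current settings. The class, function, and setting names and the approval rules are hypothetical; a spreadsheet-based process can follow the same logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    STANDARD = "standard"
    CRITICAL = "critical"


@dataclass
class ChangeRequest:
    """Minimal change record: what is changing, its impact, how to back it out, and its risks."""
    title: str
    priority: Priority
    impact: str
    backout_plan: str
    risks: list = field(default_factory=list)


def approve(cr: ChangeRequest, change_freeze: bool) -> tuple:
    """Hypothetical approval rules for an approver or CAB. Returns (approved, reason)."""
    if not cr.impact.strip() or not cr.backout_plan.strip():
        return False, "rejected: impact and back-out plan are required"
    if change_freeze and cr.priority is not Priority.CRITICAL:
        return False, "deferred: change freeze in effect"
    return True, "approved"


def baseline_drift(baseline: dict, current: dict) -> dict:
    """Compare a configuration baseline with current settings.
    Returns {setting: (baseline_value, current_value)} for every setting that differs;
    a setting missing on one side shows up as None."""
    return {key: (baseline.get(key), current.get(key))
            for key in sorted(baseline.keys() | current.keys())
            if baseline.get(key) != current.get(key)}


if __name__ == "__main__":
    cr = ChangeRequest(
        title="Implementation of patches on Hyper-V hosts in failover cluster 101",
        priority=Priority.STANDARD,
        impact="VMs live-migrated off each host before it reboots",
        backout_plan="Uninstall the updates and reboot the affected host",
        risks=["a host fails to rejoin the cluster after reboot"],
    )
    print(approve(cr, change_freeze=True))
    print(approve(cr, change_freeze=False))
    print(baseline_drift({"patch_level": "2024-01", "csv": True},
                         {"patch_level": "2024-02", "csv": True}))
```

Running the drift comparison after a change is implemented gives a quick record of exactly what moved, which supports both the validation step and keeping the baseline current.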
Change and Configuration Management is outlined here: http://technet.microsoft.com/en-us/library/cc543211.aspx
- Team SMF: This service management function is responsible for organizing teams and individuals to deliver project, operational, or management services. Essentially, it defines "who is responsible for what". I find this function important for virtualization projects because it is essential to ensure the right people are trained on the right technologies. Since virtualization projects take much of the infrastructure in a new direction, organizing an appropriate team will help the change go smoothly. More information about the team SMF is available here: http://technet.microsoft.com/en-us/library/cc543329.aspx
In summary, it is essential that virtualization projects be complemented by a strong project and operational process.
The introduction of IT governance controls will make an infrastructure more efficient and more stable, improve teamwork, and create a consistent offering of IT services. In my own experience, virtualization projects are well complemented by IT governance efforts, which bring both system and process changes that improve IT in many different ways.
Thanks for reading.