Read Windows Server 2008 R2 Unleashed Online
Authors: Noel Morimoto
corruption, or a new application or driver installation could overwrite a critical file, leaving
a system unstable or in a failed state. More commonly in today’s networks, a security,
application, or system update conflicts with an existing application or service,
causing undesirable issues.
Prioritizing the Recovery
After all of the computer services and applications used on a network are identified, and
the typical disaster scenarios to be covered by the backup and recovery plan have been
decided, the next step is to prioritize how the recovery of critical systems and services
will be executed. Prioritization usually means getting the most critical services up and
running first; these usually include networking services such as DNS and DHCP, as well as
Active Directory domain controllers, especially on corporate networks that use Microsoft
Windows servers and client operating systems.
Maintaining up-to-date backup and recovery plans requires following strict processes
when changing an organization’s computer and network infrastructure. With an
up-to-date technology priority list, administrators can tackle the planning for the most
important services first to ensure that if a disaster strikes sooner rather than later, the
most important systems are always protected and recoverable.
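One way to turn a technology priority list into a recovery sequence is to record which services depend on which and let a topological sort produce the order. The sketch below is illustrative only: the service names and the dependencies between them are assumptions chosen to match the DNS, DHCP, and Active Directory example above, not an inventory taken from this chapter.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory: each service maps to the services that must be
# running before it can be recovered. These dependencies are assumptions.
DEPENDENCIES = {
    "dns": set(),
    "dhcp": set(),
    "ad_domain_controller": {"dns"},
    "file_server": {"ad_domain_controller"},
    "retail_app_server": {"ad_domain_controller", "dhcp"},
}


def recovery_order(dependencies):
    """Return a list in which every service appears after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = recovery_order(DEPENDENCIES)
for step, service in enumerate(order, start=1):
    print(step, service)
```

The exact position of independent services (for example, DNS relative to DHCP) is not significant; only the constraint that a service follows everything it depends on is guaranteed.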
Understanding Your Backup and Recovery Needs and Options
Identifying Bare Minimum Services
The bare minimum services are the fewest possible services and applications that must be
up and running for business operations to continue. Only the top few services and
applications in the technology priority list will become part of the bare minimum services
list. For example, a bare minimum computer service for a retail outlet could be a server
that runs the retail software package and manages the register and receipt printer. For a
web-based company, it could be the web and e-commerce servers that process online orders.
Determining the Service-Level Agreement and Return-to-Operation Requirements
A service-level agreement (SLA) is an estimated planned uptime or availability time frame
for a system, service, or application. SLAs are usually defined by hours per day, week,
month, or year and are expressed in percentages. For example, if the corner grocery store
claims to be open 24 hours a day, every day of the year, the grocery store SLA is 100%.
Another example could be an organization’s electronic fax services that should be
available 7 days a week between the hours of 5:00 a.m. and 11:00 p.m.
Many organizations hope to achieve and maintain operation of the most critical services
24 hours a day, 7 days a week, or as close to 100% planned uptime as is logistically
possible. A few common SLA targets are included in the following list:
. 99.999% planned uptime results in about 5.26 minutes of planned downtime or
maintenance per year.
. 99.99% planned uptime results in about 52.6 minutes of planned downtime or
maintenance per year.
. 99.9% planned uptime results in 8 hours, 45.6 minutes of planned downtime or
maintenance per year.
. 99.7% planned uptime results in about 26 hours and 17 minutes of planned downtime or
maintenance per year.
. 99% planned uptime results in 87 hours and 36 minutes of planned downtime or
maintenance per year.
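The figures in this list follow from simple arithmetic on the number of minutes in a year. The following is a minimal sketch of that calculation, assuming a 365-day year (leap years are ignored); the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(uptime_percent):
    """Minutes of downtime per year allowed by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


def format_downtime(minutes):
    """Render a number of minutes as hours and minutes."""
    hours, remainder = divmod(minutes, 60)
    return f"{int(hours)} h {remainder:.1f} min"


for target in (99.999, 99.99, 99.9, 99.7, 99.0):
    allowed = downtime_minutes_per_year(target)
    print(f"{target}% uptime -> {format_downtime(allowed)} per year")
```

The same function can be used to sanity-check any other SLA target that comes up during planning meetings.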
Executives and managers know that maintaining 100% planned uptime is not usually
possible, for a number of reasons. Many professionals also consider that the SLA must
account for the time to recover after a failure or disaster is encountered. Ensure that
everyone understands whether the SLA covers “planned” uptime only or “planned and
unplanned” uptime. The difference is huge. A recommendation is that an SLA be defined as
planned uptime. The unplanned recovery time frame is referred to as the Return to
Operation (RTO) number for the remainder of this section.
The RTO defines how long it will take to recover a system, service, application, or business
operation after a failure or disaster has occurred. Of course, the shorter the RTO time
frame is, the more the backup and recovery solution is likely to cost. For example,
deploying a Windows Server 2008 R2 failover cluster can provide system recovery within
seconds or minutes, but the hardware and software licensing costs would easily
CHAPTER 30
Backing Up the Windows Server 2008 R2 Environment
exceed the costs of a recovery plan that included diagnosing a hardware issue and waiting
for a replacement part to arrive within a 4-hour window. The business owners or
executives of an organization need to clearly understand how long it will take to recover
from certain failures; that understanding will help determine the final accepted backup
and recovery solution.
Separating the SLA and RTO in disaster recovery documentation can be a very valuable
tool to use when presenting the current or proposed computer and network infrastructure
disaster recovery solution to executives, managers, auditors, and customers. For example, a
service might be presented to customers with a 99.99% SLA. The same system can be
presented in the finer details to have a maximum of an 8-hour RTO, which will still meet
a 99.9% uptime in the event of a major disaster. This can also be worded as “This service
will provide 99.9% to 99.99% availability.”
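The relationship between the 8-hour RTO and the 99.9% figure in this example can be checked with a short calculation. This sketch assumes a single outage in a 365-day year and no other downtime; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def availability_after_outage(outage_hours, period_hours=HOURS_PER_YEAR):
    """Availability percentage for a period containing one outage."""
    return 100 * (1 - outage_hours / period_hours)


def meets_target(outage_hours, target_percent):
    """True if a single outage of this length still meets the target."""
    return availability_after_outage(outage_hours) >= target_percent


rto_hours = 8
print(f"{availability_after_outage(rto_hours):.3f}% availability")
print("meets 99.9%: ", meets_target(rto_hours, 99.9))
print("meets 99.99%:", meets_target(rto_hours, 99.99))
```

Under these assumptions, one 8-hour outage stays within the 99.9% target but not the 99.99% target, which is why the service is described as providing a range between the two.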
Creating the Disaster Recovery Solution
When administrators understand what sorts of failures can occur and know which services
and applications are most critical to their organization, they have gathered almost all the
information necessary to create a preliminary high-level disaster recovery solution. Many
different pieces of information and several documents will be required, even for the
preliminary solutions. Some of the items required within the solution are listed in the
following sections.
Disaster Recovery Solution Overview Document
The Disaster Recovery Solution Overview document is a short narrative of the solution in
action, including presentations with quality graphics and/or Microsoft Visio diagrams.
This document first provides an executive summary, including only high-level details to
provide executives and management with enough information to understand what steps
are being taken to provide business continuity in the event of a disaster. The remainder of
the document should contain detailed information related to the plan, including many of
the following items:
. Current computer and network infrastructure review.
. Detailed history of the planning meetings and the information that was presented
and discussed in those meetings.
. The list of which disaster and outage scenarios will be greatly mitigated by this plan,
and which scenarios will not be addressed by this plan.
NOTE
Scenarios that will not be addressed in your organization’s disaster recovery solutions
should still be referenced in the document to show that they were presented, discussed,
and considered very unlikely to occur, too expensive to mitigate up front, or not
important enough to dedicate budget or staff resources.
. The list of the most critical applications, systems, and services for the organization
and the potential impact to the business if these systems encounter a failure or are
not available.
. Description of the high-level solution, including how the proposed disaster recovery
solution will enhance the organization by improving the reliability and recoverability.
. Defined SLA and RTO time estimates this solution provides for each failure and
disaster scenario.
. Associated computer and network hardware specifications, including initial
purchasing and ongoing support and licensing costs.
. Associated software specifications and licensing costs for initial purchase and
ongoing support and maintenance costs.
. Additional WAN link costs.
. Additional outside services costs, including hosting services, data center lease costs,
offsite disk and tape storage fees, consulting costs for the project, technical writing,
document management, and ongoing support or lease costs.
. Estimated internal staffing resource assignment and utilization for the solution
deployment, as well as the ongoing utilization requirements to support the
backup and recovery tasks.
. The initial estimated project schedule and project milestones.
Getting Disaster Recovery Solutions Approved
Prioritizing and identifying the bare minimum services are not only the responsibility of
the IT staff; these decisions belong to management as well. The IT staff is responsible for
identifying single points of failure, gathering the statistical information of application
and service usage, and possibly also understanding how an outage can affect business
operations.
Before the executives can make a decision regarding budget for an organization’s disaster
recovery plan, they should be presented with as much information as possible to make the
most informed decision. As a general guideline, when presenting the preliminary disaster
recovery solution, make sure it includes the “In a perfect world with unlimited budget”
plan, along with one or two lower-cost plans with clearly highlighted extended downtime
or reduced functionality. Presenting alternate plans highlighting different costs and results
might help ensure that the solution gets approval in one form or another.
Getting the budget approved for a secondary disaster recovery solution is better than
getting no budget for the preferred solution. The staff should always try to be very clear
on the SLA for a chosen solution and to document or have a paper trail concerning all
disaster recovery solutions that have been accepted or denied. If a failure that could have
been planned for occurs but budget was denied, IT staff members or IT managers should
make sure to have all their facts straight and documentation to prove it.
So far, in the backup and recovery preparation, computer and network discovery has been
performed, different failure scenarios have been considered, and the most critical services
have been identified and prioritized. Now, it is time to start actually building the backup
and disaster recovery plan that a qualified individual will use in the event of a failure. To
begin creating the plan, the current computer and network infrastructure must be
documented. Information on documenting a Windows Server 2008 R2 system can be found in
Chapter 22, “Documenting a Windows Server 2008 R2 Environment.” Documentation
should include, but not be limited to, the following:
. Server configuration document: This document details which services and
applications the system provides, as well as the network settings, software installed, and
hardware specifications.
. Server build document: This document contains step-by-step instructions on how
to build a Windows Server 2008 R2 system for a specific role, such as domain
controller or file server, including which software is required and hardware
specifications. This document will also include specific security configurations, hardware
and software configurations, and other organizational server configuration standards.
. Network diagrams: Network diagrams should contain network configurations, as
well as the hardware included in the infrastructure and the WAN links.
. Network device configuration: These documents contain the configurations of
the network devices, including the switches, firewall, and routers on the network.
. SAN configuration: Most medium- and large-size organizations utilize one form of
centralized storage or another. When storage devices are utilized, these device
configurations should be documented so they can be recovered in the event of a
device issue.
. Software documentation: This document contains a list of all the software used
in the organization, possibly including the licensing information and the storage
location.
. Service accounts and password document: A master list of user accounts and
network device usernames and passwords should be created and kept in a sealed
envelope in a secured onsite and offsite location.
. Contact and support documentation: This document should contain all IT staff
and vendor contact information required to support the infrastructure.
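Part of a server configuration document can be gathered by script so that it stays current between manual reviews. The following is a minimal sketch, assuming Python is available on the server; it records only basic host and operating system details from the standard library. The field names are illustrative, and services, installed software, and network settings would need additional tooling that is not shown here.

```python
import json
import platform
import socket
from datetime import datetime, timezone


def collect_server_facts():
    """Collect a few basic facts about the local system."""
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "architecture": platform.machine(),
        "processor": platform.processor(),
    }


def to_document(facts):
    """Serialize the collected facts as readable JSON text."""
    return json.dumps(facts, indent=2, sort_keys=True)


facts = collect_server_facts()
print(to_document(facts))
```

The resulting text could be stored alongside the rest of the documentation set, including in the offsite copy.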
Determining not only what needs to be backed up, but also how the backups will be
performed and stored, is an important task. Many organizations back up data to tape
media and have that media shipped to offsite storage locations on a weekly basis.
Windows Server 2008 R2 Server Backup is built to support backup to local internal and
Windows Server Backup Overview
externally connected disks and network shares for scheduled backups. Windows Server