Read IT Manager's Handbook: Getting Your New Job Done Online
Authors: Bill Holtsnider,Brian D. Jaffe
HIPAA
www.hhs.gov/ocr/hipaa
www.hhs.gov/ocr/privacy/hipaa/administrative/enforcementrule/hitechenforcementifr.html
www.hipaa.com/
Basel II
www.bis.org/bcbs/index.htm
www.federalreserve.gov/generalinfo/basel2/default.htm
SB-1386
info.sen.ca.gov/pub/01-02/bill/sen/sb_1351-1400/sb_1386_bill_20020926_chaptered.html
www.giac.org/paper/gsec/3647/californiasnotice-security-breachs-about-means/105901
FACTA
www.epic.org/privacy/fcra
www.ftc.gov/os/statutes/fcrajump.shtm
Gramm–Leach–Bliley
www.epic.org/privacy/glba
www.ftc.gov/privacy/privacyinitiatives/glbact.html
U.S. Securities
rules.nyse.com/nyse/
www.sec.gov/rules/final/34-44992a.htm
www.sec.gov/rules/final.shtml
Patriot Act
thomas.loc.gov/cgi-bin/bdquery/z?d107:h.r.03162
www.epic.org/privacy/terrorism/hr3162.html
OFAC
www.treas.gov/offices/enforcement/ofac
www.treasury.gov/resource-center/sanctions/OFAC-Enforcement/Pages/enforcement.aspx
CLERP-9
www.asic.gov.au/asic/asic.nsf/byheadline/CLERP+9?openDocument
www.treasury.gov.au/contentitem.asp?NavId=013&ContentID=403
PIPEDA
www.pipedainfo.com
www.priv.gc.ca/leg_c/leg_c_p_e.cfm
Privacy and Electronic Communications Directive
eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32002L0058:EN:NOT
europa.eu/legislation_summares/information_society/legislative_framework/l24120_en.htm
Data Protection Directive
ec.europa.eu/justice_home/fsj/privacy/index_en.htm
www.spamlaws.com/eu.shtml
Chapter 9
Disaster Recovery
Organizing is what you do before you do something, so that when you do it, it is not all mixed up.
A. A. Milne
Chapter table of contents
9.1 Defining the Scope
9.2 Creating a Disaster Recovery Plan
9.3 A Word about Incident Response, Business Continuity, and Disaster Recovery
9.4 The Hidden Benefits of Good Disaster Recovery Planning
9.5 Further References
It's no secret that everyday life has become highly dependent on information technology. Incidents like September 11, 2001; the blackout of the Northeast in August of 2003; hurricanes Katrina and Rita in 2005; and many other crises serve as regular reminders that we have to be prepared for the worst. This is where disaster recovery comes in.
Disaster recovery is like buying insurance; you're planning for the worst, but the entire time you're hoping that you'll never need it. IT disasters come in all shapes and sizes, from hardware failures and computer viruses to blizzards, floods, chemical spills, fires, and terrorist attacks. As individuals, we routinely do things to be ready for the unexpected emergency. We keep Band-Aids in the medicine cabinet, a spare tire in the car, and a fire extinguisher in the house.
IT environments are replete with all kinds of solutions to deal with various outages and failures: clustered servers, transaction logs, backups, RAID disk drives, and so on. The problem with these kinds of solutions is that each can only handle the failure of a specific component. This leaves IT Managers with the issue of what to do if the entire environment fails, or becomes unavailable.
As shown in this chapter, the key to good disaster recovery planning is the involvement of as many areas of the organization as possible. IT is commonly the leader or motivator (although even that isn't required), but all facets of a company must be involved in disaster recovery planning. The reason is simple: disasters of every size (from brief power outages to city-wide blackouts) affect every department and every employee, and possibly your customers and suppliers, too. Everyone should be ready. Murphy's Law, “anything that can go wrong will,” would be just as apt a quote for the beginning of this chapter as the one that appears there.
9.1 Defining the Scope
The extent to which you as an IT Manager have to, or can, plan for a disaster is directly related to how much your organization depends on IT for its core business operation and how much money your organization is willing to invest to protect it. Today it is the rare organization that is not highly dependent on its IT operations.
When you start to think about disaster recovery and the infinite combinations of things that could go wrong and what you might have to plan for, you can easily find yourself losing a lot of sleep, thinking there is simply no way you have it all covered. However, as stated in Chapter 8, Security and Compliance, on page 205, you need to take comfort in the very fact that you're taking the right steps to best protect your environment. Remember, disaster recovery planning is an ongoing and iterative process; you can never say, “Okay, now it's done.”
Key Questions
One of the most important steps in disaster recovery planning is trying to define the scope. While it is not possible to think of every scenario, you can help put things in perspective with questions such as the following:
• Which exactly are my critical applications and services?
• How quickly do I have to recover those critical applications and services (seconds, minutes, hours, days)?
• What are the different scenarios to plan for?
    • No access to the building (e.g., snow storm)
    • Loss of data center (e.g., flood, fire)
    • Loss of building (e.g., fire, hurricane, collapse)
    • Loss of some public services (e.g., mass transit, access to Internet, electricity)
    • Geographic impact (e.g., a blackout that affects just a few blocks in your city or one that affects 50 million people across several states, such as the blackout in the northeast United States in August of 2003)
• How long of an interruption should be planned for (days, weeks, indefinite)?
• How quickly do I need access to my data and systems? Can I wait a few days, or do I need to be up and running in 24 hours?
• Is last week's backup good enough or do we need to be able to restore more current data?
Notice the range of disasters you need to think about: everything from one- to three-day events such as snow storms all the way up to events such as terrorist attacks, which have permanently changed the way we think about disasters.
Obviously, these are questions you can't answer alone, and they can be the subject of endless discussion. The answers will vary greatly depending on the size of your organization, the industry you're in, and, probably most critically, the cost of the required resources.
Once the scope is agreed to by the key departments (HR, IT, Facilities, Legal, etc.), it's critical that it have the support of company executives. At a minimum, this would probably include the CIO and CFO. But there are others that should be considered. For example, the Legal department and the departments that deal with regulatory compliance may need to weigh in with their concerns. It is not unheard of for the scope to be presented to the CEO and Board of Directors or perhaps one of the board's committees.
During the process of defining the scope, you can expect to be educating others about your IT environment. Some of the people you're working with may assume, since they hear the word “server” so often, that the company has just 1, not 100. They may think of the computer room as that rack they saw in a closet five years ago and have no idea that it's now a 2500-square-foot facility with dedicated environmental services. In short, you'll be explaining to them what is a mountain and what is a molehill. See the section “Getting Approval and Defending Your Budget” on page 165 in Chapter 6, Managing the Money, about how to present technical information to a nontechnical audience. If you are addressing a Board of Directors or senior management, especially about a topic as important as disaster recovery, you want to make sure your message gets across as cleanly and professionally as possible.
Recovery Time and Recovery Point Objectives
As part of the scope definition phase, you have to determine two key objectives:
1. Recovery Time Objective (RTO): The amount of time between the disaster and when services are restored. The RTO essentially quantifies how long you can, or will, be down. Factors here include the availability of space (such as a data center and office space), availability of equipment (such as workstations, servers, storage), availability of connectivity (both local and wide area networks), and other resources (including staff and vendors) to help you restore the environment.
2. Recovery Point Objective (RPO): The age, or “freshness,” of the data available to be restored. The RPO could be a function of how often you do backups or how often the tapes are sent off-site. In fact, the exercise of determining your RPO may lead you to re-evaluate your processes and schedules for backups and sending tapes off-site. Technologies such as data duplication to remote facilities or the ability to replay transaction logs can be very useful in reducing the RPO. See the section “Value of Your Data” later in this chapter.
RPO and RTO are sometimes best illustrated graphically (see Figure 9.1).
Figure 9.1 RPO and RTO.
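The two objectives can also be made concrete with a little date arithmetic. The following is a minimal sketch, not a prescribed method: the `Incident` timeline, the function names, and every timestamp and target below are hypothetical, chosen only for illustration. It computes the data age and the downtime for a single outage, plus an upper bound on data age when only off-site copies survive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Timeline of one outage (all values used below are hypothetical)."""
    last_recoverable_copy: datetime  # newest backup or replica that survived
    disaster: datetime               # moment service was lost
    service_restored: datetime       # moment service was usable again


def achieved_rpo(incident: Incident) -> timedelta:
    """Age of the restored data, i.e. how much recent work was lost."""
    return incident.disaster - incident.last_recoverable_copy


def achieved_rto(incident: Incident) -> timedelta:
    """Downtime, i.e. how long services were unavailable."""
    return incident.service_restored - incident.disaster


def worst_case_data_age(backup_interval: timedelta,
                        offsite_delay: timedelta) -> timedelta:
    """Upper bound on data age if only off-site copies survive.

    Assumes backups run at a fixed interval and that each backup is
    unusable for recovery until it has reached the off-site location.
    """
    return backup_interval + offsite_delay


# Hypothetical outage: last good backup at 11 p.m., failure the next
# afternoon, service back the following morning.
incident = Incident(
    last_recoverable_copy=datetime(2024, 3, 3, 23, 0),
    disaster=datetime(2024, 3, 4, 14, 30),
    service_restored=datetime(2024, 3, 5, 8, 30),
)

# Hypothetical objectives agreed during scope definition.
rpo_objective = timedelta(hours=24)
rto_objective = timedelta(hours=12)

print("Data age:", achieved_rpo(incident),
      "| within RPO:", achieved_rpo(incident) <= rpo_objective)
print("Downtime:", achieved_rto(incident),
      "| within RTO:", achieved_rto(incident) <= rto_objective)
print("Worst-case data age:",
      worst_case_data_age(timedelta(hours=24), timedelta(hours=48)))
```

In this made-up timeline the restored data is 15.5 hours old, which is inside the 24-hour RPO, but the 18-hour outage misses the 12-hour RTO. The last line shows why backup and off-site schedules matter: a daily backup that takes two days to reach off-site storage can leave you with data up to three days old.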
Disaster Recovery Committee
To get the answers to the questions posed earlier, you'll need to work closely with others in the organization. Although it relies heavily on IT, disaster recovery planning isn't a function of IT alone. It requires the involvement of a number of other departments, including:
• Finance
• Human Resources
• Legal
• Key user departments (Manufacturing, Customer Service, etc.)
• Building Facilities
A committee like this can also help determine what the priorities are in the event of a disaster. What about customer service? Financial stability? Regulatory compliance? Health and safety? These answers can vary greatly depending on the organization's industry. A Web-based retailer may consider continued customer service as its key priority as a way of protecting its financial stability. A publicly traded financial services company may place a high priority on investor relations, regulatory compliance, and financial stability. A hospital may be willing to sacrifice all those things because it's focusing on the health and safety of its patients.