Windows Server 2008 R2 Unleashed

. After the issue is isolated or, at least, the scope of the issue is understood, the

network administrator should communicate the outage to the necessary managers

and/or business owners and, as necessary, open communication to outside support

vendors and ISP contacts to report the issue and create a trouble ticket. And no—this

should not go out in an email if the network is down.

. Create a logical action plan to resolve the issue and execute the plan.

. Create and distribute a summary of the cause and result of the issue and how it can

be avoided in the future. Close the trouble ticket as required.

Physical Site Failure

In the event a physical site or office cannot be accessed, a number of business operations

might be suspended. Planning how to mitigate issues related to physical site limitations

can be extensive, but should include the considerations discussed in the following sections.

Physical Site Access Is Limited but Site Is Functional

This section lists a few considerations for a situation where the site or office cannot be

accessed physically, but all systems are functional:

. Can the main and most critical phone lines be accessed or forwarded remotely?

. Is there a remote access solution to allow employees with or without

notebooks/laptop computers to connect to the organization’s network and perform

their work?

. Are there any other business operations that require onsite access that are tied to a

service-level agreement, such as responding to paper faxes or submitted customer

support emails, phone calls, or custom applications?

Physical Site Is Offline and Inaccessible

This section lists a few considerations for a situation where the resources in a site are

nonfunctional. This scenario assumes that the site resources cannot be accessed across the

network or Internet and the data center is offline with no chance of a quick recovery.

When planning for a scenario such as this, the following items should be considered:

. Can all services be restored in an alternate capacity—or at least the most critical

systems, such as the main phone lines, fax lines, devices, applications, system, and

remote access services?

. If systems are cut over to an alternate location, what is the impact in performance,

or what percentage of end-user load can the system support?

. If systems are cut over to an alternate location, will there be any data loss or will

only some data be accessible?

. If the decision to cut over to the alternate location is made, how long will it take to

cut over and restore the critical services?

. If the site outage is caused by power loss or network issues, how long an outage

should be sustained before deciding to cut over services to an alternate location?

. When the original system is restored, if possible, what will it take to fail back or cut

the systems back to the main location, and is there any data loss or synchronization

of data involved?

These short lists merely scratch the surface when it comes to the planning of or dealing

with a physical site outage, but, hopefully, they will spark some dialogue in the disaster

recovery planning process to lead the organization to the solution that meets their needs

and budget.

Server or System Failure

When a server or system failure occurs, administrators must decide on which recovery

plan of action will be the most effective. Depending on the particular system, in some

cases, it might be more efficient to build a new system and restore the functionality or

data. In other cases, where rebuilding a system can take several hours, it might be more

prudent to troubleshoot and repair the problem.

Application or Service Failure

If a Windows Server 2008 R2 system is still operational but a particular application or

service on the system is nonfunctional, in most cases troubleshooting and attempting

repair or restoring the system to a previous backup state is the correct plan of action. The

Windows Server 2008 R2 event log is a much more useful tool than in previous

versions, and it should be one of the first places an administrator looks to determine the

cause of a validated issue. Following troubleshooting or recovery procedures for the

particular application is the next logical step. For example, if an end user deleted a folder from a

CHAPTER 31

Recovering from a Disaster

network share, the preferred recovery method might be to use Shadow Copy backups to

restore the data instead of the Windows Server Backup.

For Windows services, using Server Manager to review the status of the role and role

services assists administrators in identifying and isolating problems because the Server

Manager tool displays a filtered representation of Event Viewer items and service state for

each role installed on the system. Figure 31.1 shows that the File Services role on SERVER10

logged several errors and warnings in the last 24 hours.

FIGURE 31.1

File Services role and role status.

Data Corruption or Loss

When a report has been logged that the data on a server is missing, is corrupted, or has

been overwritten, Windows Server 2008 R2 administrators have a few options to deal with

this situation. Shadow Copies for Shared Folders can be used to restore previous versions

of selected files or folders and Windows Server Backup can be used to restore selected files,

folders, or the entire volume on a Windows disk. Using Shadow Copies for Shared Folders,

administrators and end users with the correct permissions can restore data right from their

workstation. Using the restore features of Windows Server Backup, administrators can

place the restored data back into the same folder by overwriting the existing data or

placing a copy of the data with a different name based on the backup schedule date and

time. For example, to restore a file called ClientProposal.docx that was backed up on

10-9-09 at 12:30 p.m., Windows Server Backup will restore the file as 2009-10-09 12-30

Copy of ClientProposal.docx, and the time representation will be the current time zone

of the server.
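
The naming pattern in the example above can be sketched in a few lines of Python. This is only an illustration of the date-and-time prefix described in the text; the function name restored_copy_name is hypothetical and is not part of Windows Server Backup.

```python
from datetime import datetime


def restored_copy_name(original_name: str, backup_time: datetime) -> str:
    """Build a 'YYYY-MM-DD HH-MM Copy of <name>' style file name.

    Hypothetical helper: it only mirrors the naming pattern shown in the
    text and does not call Windows Server Backup.
    """
    # Date and time use hyphens because ':' is not valid in Windows file names.
    stamp = backup_time.strftime("%Y-%m-%d %H-%M")
    return f"{stamp} Copy of {original_name}"


if __name__ == "__main__":
    # The backup from the example: October 9, 2009 at 12:30 p.m.
    print(restored_copy_name("ClientProposal.docx", datetime(2009, 10, 9, 12, 30)))
```

Knowing the pattern in advance makes it easier to tell restored copies apart from the live file when both end up in the same folder.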

Hardware Failure

When hardware failure occurs, a number of issues and symptoms might result. The most

common issues related to hardware failures include system crashes, services or drivers

stopping unexpectedly, frozen (hung) systems, and systems that are in a constant reboot

cycle. When hardware is suspected as failed or failing on a Windows Server 2008 R2

system, administrators should first review the event logs for any related system or

application event warnings and errors. If nothing apparent is logged, hardware manufacturers

usually provide several different diagnostic utilities that can be used to test and verify

hardware configuration and functional state. Don’t wait to call Microsoft and involve

their professional support services department because they can be working in

conjunction with your team to capture and review debugging data.
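
The event log review can also be done from a command prompt with the built-in wevtutil utility. The following Python sketch, which assumes Python is available on the server or on a management workstation, builds a wevtutil query for the most recent critical, error, and warning events. The helper names build_event_query and recent_problems are hypothetical, and the default of 20 events is an arbitrary choice.

```python
import subprocess


def build_event_query(log_name: str = "System", max_events: int = 20) -> list[str]:
    """Return a wevtutil command line for recent problem events.

    Levels 1, 2, and 3 correspond to Critical, Error, and Warning.
    """
    xpath = "*[System[(Level=1 or Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", log_name,   # qe = query-events
        f"/q:{xpath}",                # XPath filter on the event level
        f"/c:{max_events}",           # maximum number of events to return
        "/rd:true",                   # reverse direction: newest events first
        "/f:text",                    # plain-text output instead of XML
    ]


def recent_problems(log_name: str = "System", max_events: int = 20) -> str:
    """Run the query (Windows only) and return the text output."""
    result = subprocess.run(
        build_event_query(log_name, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling recent_problems("Application") returns the same kind of information Event Viewer shows for the Application log, in a form that is easy to save or attach to a support case.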

When a system is suspected of having hardware issues and it is a business-critical system,

steps should be taken to migrate services or applications hosted on that system to an

alternate production system, or the system should be recovered to new hardware. Windows

Server 2008 R2 can tolerate a full system restore or a complete PC restore to alternate

hardware if the system is an exact or close hardware match with regard to the

motherboard, processors, hard disk controller, and network card. Even if the hardware is an

exact match, if the disk arrays, disk IDs, and volume or partition numbers do not match, a complete

PC restore to alternate hardware might fail if no additional steps are taken during the

restore or recovery process. This is detailed in a later section of this chapter named

“Complete PC Restore to Alternate Hardware.”

Recovering from a Server or System Failure

When a failure or issue is reported regarding a Windows Server 2008 R2 system, the

responsible administrator should first perform the standard validation tests to verify that

there is a real issue. The following sections include basic troubleshooting steps when

failure reports are based around data or application access issues, network issues, data

corruption, or recovery issues.

Access Issues

When end users report issues accessing a Windows Server 2008 R2 system but the system

is still online, this is categorized as an access issue. Administrators should start

troubleshooting access issues by first verifying that the system can be accessed from the

system console and then verifying that it can be accessed across the network. After that is

validated, the access issue should be tested to reveal whether the access issue is affecting

everyone or just a set of users. Access issues can be system or network related, but they

can also be related to security configurations on the network or local system firewall or

application, share, and/or NTFS permissions. The following sections can be used to help

troubleshoot access issues.
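
The order of checks described above can be written down as a small decision function. This is a sketch only: the function name triage_access_issue is hypothetical, and the mapping of "everyone affected" to firewall or service configuration and "some users affected" to permissions is an assumption used for illustration, not a rule stated in this chapter.

```python
def triage_access_issue(console_ok: bool, network_ok: bool,
                        all_users_affected: bool) -> str:
    """Suggest where to look first, following the order of checks in the text.

    Assumption: the suggested areas are illustrative starting points only.
    """
    # Step 1: can the system be accessed from its own console?
    if not console_ok:
        return "system: investigate the server itself"
    # Step 2: can the system be accessed across the network?
    if not network_ok:
        return "network: check IP settings, routing, and firewalls"
    # Step 3: is everyone affected, or only a set of users?
    if all_users_affected:
        return "configuration: check the firewall profile and the service or share"
    return "permissions: check application, share, and NTFS permissions"


if __name__ == "__main__":
    print(triage_access_issue(console_ok=True, network_ok=True,
                              all_users_affected=False))
```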

Network Access Troubleshooting

Troubleshooting access to a system that is suspected to be network related can involve the

networking group as well as the Windows Server 2008 R2 system administrators. When

networking is a suspect, the protocol and system IP information should be noted before

any tests are performed. Tests should be performed from the Windows system console to

determine if the system can access other devices on the local network and systems on

neighboring networks located across a gateway or router. Tests should be performed using

both the system DNS names and IP addresses and, if necessary, IP Next Generation

(IPv6) addresses.
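
Testing by name and by address can be sketched with the Python standard library. The example assumes Python is available on the system being tested; the helper names resolve_all and can_connect, the host name server10.example.com, and TCP port 445 (file sharing) are all illustrative choices.

```python
import socket


def resolve_all(host: str, port: int) -> list[tuple[str, str]]:
    """Return (family, address) pairs for each IPv4/IPv6 address of a name."""
    results = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        pair = (label, sockaddr[0])
        if pair not in results:  # skip duplicate entries
            results.append(pair)
    return results


def can_connect(address: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (hypothetical host name):
#   for family, address in resolve_all("server10.example.com", 445):
#       print(family, address, can_connect(address, 445))
```

If the name fails to resolve but a connection to the IP address succeeds, the problem is more likely in DNS than in routing or the firewall.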

NOTE

Testing connectivity for web-based applications should be performed using system

hostnames, fully qualified domain names, and IP addresses to ensure that tests yield the

proper results. Many web servers and/or firewalls expect a properly formed host

header in the web GET request and will not respond to a request made from an IP-based

uniform resource locator (URL).
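
The difference between a name-based and an IP-based request is easiest to see in the request text itself. The following sketch builds the raw HTTP/1.1 GET request a client would send; build_get_request is a hypothetical helper, and intranet.example.com and 192.0.2.10 are placeholder values.

```python
def build_get_request(url_host: str, path: str = "/") -> bytes:
    """Return the raw HTTP/1.1 GET request for the host used in the URL."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {url_host}",   # taken from the host portion of the URL
        "Connection: close",
        "",                     # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Name-based URL: the Host header carries the site name.
    print(build_get_request("intranet.example.com").decode("ascii"))
    # IP-based URL: the Host header carries only the address.
    print(build_get_request("192.0.2.10").decode("ascii"))
```

A website that is bound to a specific host header matches only the first request, which is why a test against the IP address alone can fail even when the site is healthy.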

If the system can communicate out but users still cannot access the system, possible

causes could be an incorrect IP subnet mask, default gateway, or routing table or a

restriction configured in the Windows or network firewall. Windows Firewall is enabled by

default on Windows Server 2008 R2 systems and the new firewall supports multiple

firewall profiles simultaneously. If a network is identified incorrectly as a public network

instead of a domain network, depending on the firewall profile settings, this might restrict

access undesirably. When administrators follow the proper procedures for installing roles

and role services, during the installation of the roles, exceptions will be added to the

firewall. Administrators can review the settings using the Windows Firewall applet from

Control Panel but to get very detailed firewall information, the Windows Firewall with

Advanced Security console should be used. This console is located in the Administrative

Tools program group.

Share and NTFS Permissions Troubleshooting

If network connectivity and firewall configurations check out, the next step in

troubleshooting access issues is to validate the configured permissions to the affected

application, service, or shared folder. For application access troubleshooting, refer to the section,

“Application Access Troubleshooting,” and the application vendors’ administration and

troubleshooting guides. For Windows services and share folder permission

troubleshooting, Event Viewer can assist tremendously, especially if auditing is enabled. Auditing can

be enabled within an Active Directory group policy or the Windows Server 2008 R2 local

computer policy, but auditing must also be enabled on the particular NTFS folder. For

information on local and domain Group Policies, refer to Chapter 27, “Group Policy

Management for Network Clients.” To troubleshoot share and NTFS permissions, please
