Semi Annual Backup Review

Created by dave. Last edited by dave, 14 years and 131 days ago.

The Point

As a consultant sysadmin, I sometimes have responsibility for a customer's backups as part of my duties. One thing I have found useful is to make the effort to document the backup system and periodically review that documentation with the customer.

The goal of this review is to ensure that the customer agrees with me that

  • the things they want backed up are actually getting backed up
  • the backups are restorable
  • the things they don't want backed up really don't need to be backed up
  • the rotation/retention scheme agrees with the customer's needs

It also gives me a regular reporting period to keep the customer up to date on any problems that have been encountered, along with their solutions or current status.

At the end of this process you should have a document you can hand to your replacement, so they will understand what the backup system is, why it is that way, and where to find the information needed to get around in it.

The Report

Explicit Requirements: (if applicable) any specific SLA-type requirements for the backups, such as backup windows, rollback ranges, or recovery times, are listed up front.

Hardware: a brief description of the hardware and software used in the backup systems, including

  • server hardware type and OS versions
  • server software, including versions and licenses and the location of any vendor-provided documentation
  • robot and drive types in use (itemized with their servers)
  • support statuses for each item (hardware and software), including SLA

Clients: each backup client entry includes:
  • client's name
  • functional description (e.g. "exchange server")
  • OS type and rev
  • any "special" software considerations (databases, exchange)
  • approximate backup media footprint (both now and six months ago)
  • any backup software "class" or "group" that may be in use in the backup software
  • retention pattern or group
  • any exceptions to the general "class" or "group" standards that might otherwise be expected
  • any other notes important to the backups
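
A client entry like this maps naturally onto a small record. The sketch below is one possible shape, not a format any backup product uses: the class name, field names, and the choice of gigabytes as the footprint unit are all illustrative assumptions. The only logic is the comparison of the footprint now against six months ago.

```python
from dataclasses import dataclass, field


@dataclass
class BackupClient:
    """One client entry in the report (all field names are illustrative)."""
    name: str
    description: str                      # functional description, e.g. "exchange server"
    os: str                               # OS type and rev
    special_software: list[str] = field(default_factory=list)
    footprint_gb_now: float = 0.0         # approximate media footprint today
    footprint_gb_six_months_ago: float = 0.0
    backup_class: str = ""                # "class" or "group" in the backup software
    retention: str = ""                   # retention pattern or group
    exceptions: list[str] = field(default_factory=list)
    notes: str = ""

    def growth_percent(self):
        """Footprint change over six months, or None if there is no baseline."""
        if self.footprint_gb_six_months_ago <= 0:
            return None
        delta = self.footprint_gb_now - self.footprint_gb_six_months_ago
        return 100.0 * delta / self.footprint_gb_six_months_ago
```

Returning None for a client with no six-month baseline keeps new clients from showing up as infinite growth in the report.
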
Backup Operations: here I detail how the backups are run, including
  • how frequently and when media is physically rotated out of the backup servers/robots
  • how frequently restore tests are performed and how restore test sets are selected
  • what times the backups are expected to start
  • how long they are expected to run
  • where the detailed instructions on how to perform the operations are located
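
How restore test sets are selected varies from site to site. As one possibility only, here is a minimal sketch that draws a reproducible random sample from a list of backed-up paths. The function name and the idea of seeding from a label for the review period are assumptions made for illustration.

```python
import random


def pick_restore_test_set(paths, sample_size, seed):
    """Pick a reproducible random sample of backed-up paths to restore."""
    rng = random.Random(seed)
    # Deduplicate and sort so the same seed always yields the same sample,
    # regardless of the order the paths were listed in.
    candidates = sorted(set(paths))
    return rng.sample(candidates, min(sample_size, len(candidates)))
```

Recording the seed in the report (for example a string naming the review period) lets anyone reading it later reproduce the selection and confirm that the test set was not hand-picked.
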
Noted and Potential Omissions: any data which
  • you've just learned about and have not been officially told to not back up; or
  • you have been officially told to not back up, who told you that, plus the reason why it isn't getting backed up.
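
The two cases above differ only in whether someone has officially ruled on the data. A minimal sketch of how that might be tracked follows; the class, field, and status names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Omission:
    data: str           # what is not being backed up
    told_by: str = ""   # who officially said not to back it up
    reason: str = ""    # why it isn't getting backed up

    @property
    def status(self):
        # "noted" needs both who and why; anything else is still open.
        return "noted" if self.told_by and self.reason else "potential"


def open_questions(omissions):
    """Omissions to raise at the review because nobody has ruled on them."""
    return [o for o in omissions if o.status == "potential"]
```

Anything that comes back from open_questions is a candidate agenda item for the review meeting.
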
Retention Patterns: The rules for media rotation and retention are detailed here.

Offsite Rotations: This is mostly procedural, explaining

  • who the offsite supplier is
  • how tapes are scheduled to be sent offsite and returned
  • what the normal schedule actually is
  • at a high level, how to request unscheduled offsite returns, and where the detailed instructions for making such requests are located

Unresolved Issues: any problems which can be categorized as "more important than trivial" which are not currently solved. The more important a problem is, the more detail there should be in the report.

Resolved Issues: any problems which can be categorized as "more important than trivial" which have been solved in the last six months.

Trivial Issues: any unimportant ongoing problems that someone just parachuting in to the backup responsibilities should still know about (e.g. the backup keeps failing on changing log files in /var/log, but we don't care because logs are rotated before the backups run).

Growth Potential: (if necessary) if I am aware of major storage purchases coming up, I list them and their potential backup implications.

Now That You Have The Report

Make a PDF of the report. Save the PDF and email it to the responsible person or people at the customer site. Save that email.

Schedule a meeting to go over the report with the responsible person or people. At some places this can be a 10-minute run-through of what is and isn't a backup source, open problems, and growth issues; at other places I've spent 60 minutes briefing a small group of people, PowerPoint included.

Between these two actions you have informed at least your superior about the state of the backups and retained a record of that briefing.

The Important Question

During your meeting, be it the 10-minute quickie or the 60-minute grilling, ask repeatedly: is anyone aware of anything else that should be backed up?

This is a good question to ask anybody who comes to you with a problem involving data you've never known about. Ask:

  • is this data being backed up?
  • if not, why not?
  • who is responsible for this data?

Then go find the responsible person and ask them the same questions, plus

  • what is the potential impact of losing any or all of this data?

It is a good idea to go through this procedure whenever you find new caches of data on the network. Find an owner, ask the questions.

Depending on these answers, you may have a new target to add, or you may merely have a new item to add to your "what isn't being backed up" list.

The Bottom Line

A cynic could see this as a cover-my-ass exercise, which at some level I suppose it is. I prefer to see this as an exercise in helping my customer avoid situations where my ass would otherwise require covering, if you follow me.

If done correctly, the customer will be informed about the state of their backup system and the risks it poses to their company.

Realistically, if things go to hell and recoveries from backups are not possible, either because the backups failed or because they were never done in the first place, I'm still going to get bounced to the curb. But if this process has been done correctly, I'm not likely to go there alone.
