Thursday 5 August 2010

Top 10 Backup and Recovery best practices

Top 10 Backup and Recovery best practices.


This document assumes that you are doing the Backup and Recovery basics
- Running in Archivelog mode
- multiplexing the controlfile
- Taking regular backups
- Periodically doing a complete restore to test your procedures. 

Questions and Answers

1. Turn on block checking.
REASON: The aim is to detect, very early the presence of corrupt blocks in the database.
This has a slight performance overhead, but will allow Oracle to detect early
corruption caused by underlying disk, storage system, or I/O system problems.

SQL> alter system set db_block_checking = true scope=both;


2. Turn on block tracking when using RMAN backups (if running 10g)
REASON: The block tracking file contains a bitmap that is used during incremental backups

Each bit represents an extent, so only those extents that have been modified will be read and backed up.

If this is not used, all blocks must be read to determine if they have been modified since the last backup.

SQL> alter database enable block change tracking using file ‘/u01/oradata/ora1/change_tracking.f’;


3. Duplex log groups and members and have more than one archive log dest.
REASON: If an archivelog is corrupted or lost, by having multiple copies in multiple locations,
the other logs will still be available and could be used.

If an online log is deleted or becomes corrupt, you will have another member that can be
used to recover if required.

SQL> alter system set log_archive_dest_2='location=/new/location/archive2' scope=both;
SQL> alter database add logfile member '/new/location/redo21.log' to group 1;


4. When backing up the database use the 'check logical' parameter
REASON: This will cause RMAN to check for logical corruption within a block as well as the normal
head/tail checksumming. This is the best way to ensure that you will get a good backup.

RMAN> backup check logical database plus archivelog delete input;


5. Test your backup.
REASON: This will do everything except actually restore the database. This is the best method to
determine if your backup is good and usable before being in a situation where it is
critical and issues exist.

RMAN> restore validate database;


6. Have each datafile in a single backup piece
REASON: When doing a partial restore RMAN must read through the entire piece to get the
datafile/archivelog requested. The smaller the backup piece the quicker the restore can
complete. This is especially relevent with tape backups of large databases or where the
restore is only on individual / few files.

RMAN> backup database filesperset 1 plus archivelog delete input;


7. Maintain your RMAN catalog/controlfile
REASON: Choose your retention policy carefully. Make sure that it compliments your tape subsystem
retention policy, requirements for backup recovery strategy. If not using a catalog,
ensure that your controlfile record keep time instance parameter matches your retention policy.

SQL> alter system set control_file_record_keep_time=21 scope=both;
This will keep 21 days of backup records.

Run regular catalog maintenance.
REASON: Delete obsolete will remove backups that are outside your retention policy.
If obsolete backups are not deleted, the catalog will continue to grow until performance
becomes an issue.

RMAN> delete obsolete;

REASON: crosschecking will check that the catalog/controlfile matches the physical backups.
If a backup is missing, it will set the piece to 'EXPIRED' so when a restore is started,
that it will not be eligible, and an earlier backup will be used. To remove the expired
backups from the catalog/controlfile use the delete expired command.

RMAN> crosscheck backup;
RMAN> delete expired backup;


8. Prepare for loss of controlfiles.

set autobackup on
REASON: This will ensure that you always have an up to date controlfile available that has been
taken at the end of the current backup not during.

RMAN> configure controlfile autobackup on;

keep your backup logs
REASON: The backup log contains parameters for your tape access, locations on controlfile backups
that can be utilised if complete loss occurs.

9. Test your recovery

REASON: During a recovery situation this will let you know how the recovery will go without
actually doing it, and can avoid having to restore source datafiles again.

SQL> recover database test;


10. Do not specify 'delete all input' when backing up archivelogs
REASON: Delete all input' will backup from one destination then delete both copies of the
archivelog where as 'delete input' will backup from one location and then delete what has
been backed up. The next backup will back up those from location 2 as well as new logs
from location 1, then delete all that are backed up. This means that you will have the
archivelogs since the last backup available on disk in location 2 (as well as backed up
once) and two copies backup up prior to the previous backup.

No comments: