Linux in a Windows World/Additional Server Programs/Network Backups

Revision as of 23:06, 11 March 2008 by Docbook2Wiki (Talk)

Data storage technologies are imperfect. For this reason, it's wise to invest a small amount of time, effort, and expense in backing up your data, rather than risk spending crippling amounts of time, effort, and expense on recreating your data from scratch. If your primary data storage device dies, you'll have your backup, which can greatly reduce the recovery time. In a small office, backups may be performed without using the network—say, by using a portable tape backup unit and backing up each system directly. On a larger network, though, network backup tools can prove beneficial, and Linux can play a role in such systems. Linux can fit into the backup picture by providing an inexpensive platform to handle this task, along with tools of varying sophistication that can back up Linux, Unix, and Windows systems. Of course, backing up Linux itself (and other nonbackup Linux servers) is also important. Some Windows tools can do this, or you can use a Linux backup server to help out.

To begin this chapter, you should understand something of network backup strategies, such as what hardware is available, what types of backups are best suited to which situations, and so on. You must also understand how to back up a Linux system without using any network connections. This task is helpful in protecting the backup computer itself, and the skills involved transfer to some types of network backups. This chapter then looks at two specific network backup tools: Samba and the Advanced Maryland Automatic Network Disk Archiver (AMANDA).

Backup Strategies

To the uninitiated, computer backup can be an intimidating topic, filled with its own list of things that must be learned. These include backup hardware, complete and incremental backups, local and network backups, and client- versus server-initiated network backups. These topics all require at least minimal description before you can make an informed decision about how to set up a network backup system.

Backup Hardware

The first choice you must make when putting together a network backup solution is what type of hardware to use. The choices can be baffling because there are so many. If you want to use Linux with an existing backup device, you must consider Linux's compatibility with your hardware. In any event, backup hardware falls into several broad classes, each of which has many specific models and subtypes:

Tapes
Tape has historically been the most common backup medium, due largely to the low cost per gigabyte of tapes, their high capacities, and the fact that they're highly portable, which is a boon for storing some of your backups off-site. Tape, though, is an inconvenient backup medium because of its sequential-access nature, meaning that data must be read or written sequentially; you can't randomly seek to and read a particular file, as you can with disks and other random-access media. The price advantage of tapes has eroded in recent years, as hard disk prices have plummeted. Tape is also less reliable than many other media; finding that a tape has lost some or all of its data is a sadly common occurrence. Tape is unusual in that most mid-range and high-end tape drives provide built-in data compression features. In fact, manufacturers often advertise capacities that assume compression is in use. Be sure to remember this fact when comparing tapes to other backup media.
Optical media
Optical media include CD-R, CD-RW, and various recordable DVD formats. These media have the advantages of being extremely common and inexpensive, but their capacities (even of DVDs) are low, at least for full network backups. Nonetheless, optical media can be important for backing up individual projects or for creating basic desktop system recovery disks.
Removable disks
This category includes floppy disks, magneto-optical (MO) disks, Zip disks, Jaz disks, and similar devices. These devices use technologies similar to those of hard disks (MO disks are a cross between magnetic disk and optical technologies, though), but individual disks can be removed from their drives for storage or transport to other computers with compatible drives. As computer backup tools, however, they're poor choices because the media are expensive and usually low in capacity. These disks work well for backing up individual users' files or specific projects.
Removable hard disks
A variant on the removable disk idea is to place a hard disk in a special housing that enables it to be easily removed. This can be either an external disk that connects to the computer using SCSI, IEEE-1394, or USB-2.0 connectors, or an internal disk with a special mounting bay. Removable hard disks have the advantage of fast random access and, increasingly, low cost. Hard disks are fairly delicate, though, so they aren't good for routine transport between sites; the risk of a shock causing damage is too great.

Of these broad classes, tape is still the medium of choice for backing up entire networks, but the initial cost can be high. A high-capacity single-tape drive can cost over $1,000, and a tape changer, which automatically changes several tapes, enabling you to treat several as one, is even more expensive. However, high-end tape formats use tape media that are relatively inexpensive, typically about $1 per gigabyte, uncompressed. Individual tape capacities range from 4 GB to 160 GB uncompressed, for current models.

Removable hard disks have fallen in price enough that they're now competitive with tape, particularly for small sites. A typical removable disk system costs about $100, with extra trays going for another $50 or so. You'll need one tray for each hard disk you use, which is likely to raise the price for the media (tray plus disk) to $1 per gigabyte or thereabouts, at least in early 2005. Hard disk capacities, of course, compete with those of tapes.

Removable disks (other than hard disks) and optical media simply lack the capacity to be used for full network backups, or even for full backups of individual servers or desktop systems. You might still want to use them as part of your backup plan, however. For instance, if your desktop systems hold an OS but little or no user data (that is, if you store user data on a server), you can create CD-R or recordable DVD backups of your OS installations when you first set the systems up or when you perform major OS upgrades, then omit these computers from your normal backup schedules. If your OS installations are small enough, they might fit (with compression) on a single CD-R, and almost certainly on a recordable DVD. Because most desktop systems have CD-ROM drives, and many now have DVD-ROM drives, you can restore these backups without using the network, which greatly simplifies the restore process. You could also use this approach in conjunction with selective network backups of user data directories (such as /home on a Linux desktop system) to protect data stored on users' desktop systems.

If you elect to use tapes for some or all of your backup needs, you must choose a tape format. Quite a few exist, with varying capacities, prices, and speed. Table 14-1 summarizes some of the more common tape formats. Prices in this table were taken from Internet retailers in late summer 2004; they may change by the time you read this. Also, existing tape formats are often extended to support higher capacities, and new formats are periodically introduced. Thus, you may find something better suited to your needs than anything described here. Table 14-1 summarizes drives that are currently on the market and tapes for these drives; tapes for lower-capacity variants of these units are still available and may cost less than indicated here. This table also shows prices for single-tape units; changers for many of these formats are also available, but cost more.

Table 14-1. Common tape formats

Drive type        Drive price   Media price  Uncompressed capacity  Speed
Travan            $250-550      $30-50       10-20 GB               0.6-2 MB/s
DAT/DDS           $400-1,200    $5-30        4-20 GB                1.5-5 MB/s
8mm               $800-4,000    $8-90        7-60 GB                3-12 MB/s
VXA               $600-1,300    $30-100      33-80 GB               3-6 MB/s
AIT               $800-3,800    $75-120      35-100 GB              3-12 MB/s
DLT and SuperDLT  $800-4,700    $50-170      80-160 GB              3-16 MB/s
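
The speed column matters as much as the capacity column, because together they determine whether a full backup fits in an overnight window. The following is a minimal sketch of that arithmetic; the 80 GB data size and 6 MB/s sustained speed are illustrative assumptions picked from the ranges in Table 14-1, not measurements of any particular drive.

```shell
#!/bin/sh
# Estimate the duration of a full backup: time = data size / drive speed.
# Both figures are assumptions chosen from the ranges in Table 14-1.
data_gb=80     # amount of data to back up, in GB (assumed)
speed_mbs=6    # sustained drive speed, in MB/s (assumed)

# Convert GB to MB (1 GB = 1024 MB), divide by the speed to get seconds,
# then divide by 3600 to get hours. LC_ALL=C keeps the decimal point a ".".
hours=$(LC_ALL=C awk -v gb="$data_gb" -v mbs="$speed_mbs" \
    'BEGIN { printf "%.1f", gb * 1024 / mbs / 3600 }')

echo "Backing up ${data_gb} GB at ${speed_mbs} MB/s takes about ${hours} hours"
```

Real throughput is often lower than the drive's rated speed, particularly when data arrives over a busy network, so treat the result as a best case.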


One more consideration in your choice of backup hardware is how the hardware interacts with software. Removable disks and removable hard disks can be accessed like internal hard disks, by creating a filesystem on the disk and copying files to the disk. You can also compress files and store them in carrier archives, such as tarballs. Tapes must be accessed using special tape device files, which provide sequential access to the drive. Typically, files are backed up using a carrier archive file. Optical media are usually written using a special program, such as cdrecord, which writes the entire disc's contents at once. The disc usually holds a filesystem, though, so that it can be read as if it were an ordinary magnetic disk. Some software enables more direct read/write access to the drive, but it is still relatively new in Linux and may not be suitable for backup purposes. In all cases, using a carrier archive file can help preserve file permissions, time stamps, and so on, even if the carrier file isn't a strict requirement.

Complete Versus Incremental Backups

One of the difficult questions you must answer when designing a backup solution is how much to back up. Most computers hold gigabytes of data, but only some of that data changes frequently. For instance, most executable program files change infrequently. Even many user data files can go unchanged for extended periods of time. Thus, if you can identify the changed files and update them without updating unchanged files, you can save considerable time (and backup media space) on your backups. Doing this is called an incremental backup, which contrasts with a complete backup or full backup, in which every file is backed up.

Incremental backups sound like a great idea, but they do have a drawback: they complicate restores. Suppose for the sake of argument that you perform a complete backup on Monday and an incremental backup every day thereafter. If the hard disk dies on Friday, you need to restore Monday's full backup followed by either every intervening incremental backup or the last one, depending on whether the incremental backups copy files that have changed since the last backup of any type or just the last full backup. What's more, your restored system will have files that might have been intentionally deleted during the week. This can cause serious problems if the system sees heavy turnover in large files, such as if users routinely create and then quickly destroy large multimedia files. (Some backup packages can spot such deletions and handle them automatically, but not all backup software can do this.) These problems become more severe the longer you go between full backups.

Generally speaking, using a small number of incremental backups between full backups can be a great time-saver. For instance, on critical systems that see lots of activity, you might perform a weekly full backup and a daily incremental backup. A less busy or less critical system might manage with monthly full backups and weekly incremental backups.

Given these examples, you may be wondering just how often you need to perform backups. There's no easy answer to this question because it depends on your own needs. You should ask yourself how much trouble a complete system failure would cause and design a backup schedule from there. For instance, if losing a single day's work would be a major hassle, that system should be backed up daily; however, if losing even a week's worth of data would not be a major inconvenience, weekly or even less frequent backups might suffice. The answer to this question, of course, can vary from one system to another; a major file server might need daily backups, whereas desktop computers might need much less frequent backups, or even none at all if they just hold stock OS installations.

Local Versus Network Backups

Much of the preceding description has assumed that individual computers are being backed up. You can certainly back up computers one by one, equipping each one with its own backup hardware or using portable backup hardware that you can move between computers. This is likely to be tedious and expensive, though. When it comes to users' desktop systems, getting them to perform backups can be difficult. One solution to these problems is to perform network backups. These use network protocols to transfer data from the system being backed up (the backup client) to the computer that holds the backup hardware (the backup server).

The main advantages of performing network backups are reduced hardware cost and the potential for simplified backup administration. This second advantage has a corollary: because backups are likely to be less tedious, they're more likely to be done. On the other hand, network backups have certain disadvantages: they can consume a great deal of network bandwidth, they require larger backup storage devices than do individual backups, they require careful planning so as to operate smoothly, and they may require overcoming cross-platform differences (such as Linux versus Windows filename conventions).

Overall, network backups are worth doing on all but the smallest networks—or at least, on any network with more than a tiny number of computers that are worth backing up. Typically, your first priority will be your servers, followed by workstations on which users store their data files. You may want to create your own priority list, though; knowing what's most important on your own network will help you plan what hardware to buy and what software will best back up the data.

The backup server computer itself can be fairly unassuming, aside from its backup device and a decent network connection. The computer most likely won't be running any RAM-intensive programs. (Some high-end backup software uses large RAM buffers, however.) If you compress your backups, the CPU must be fast enough to compress data as quickly as it arrives, but this task won't strain a CPU unless you've paired it with much more modern network and data storage systems. You might be tempted to equip a major file server with the backup hardware and make it your backup server, and this does have the advantage of simplifying the backup of this important server. On the other hand, it also imposes an extra load on the file server, both in terms of CPU (particularly if you use it to compress data) and network bandwidth. This might be acceptable if you expect to be able to fully complete backups in off hours, but if you expect your backups to occur partly when the network is in use, you might want to use a dedicated backup server. Also, a backup server may have increased vulnerability to certain types of attack, so placing it on its own computer can have security implications compared to having a file server do double duty.

Client- Versus Server-Initiated Backups

When doing network backups, one critical detail is which system controls the backup process: the backup server or the backup client. Both approaches have several consequences:

Scheduling
When the backup server initiates the backup process, it can do so in a way that makes scheduling sense for the network as a whole, and you can specify this schedule from a single computer (namely, the backup server). When the backup client initiates the process, by contrast, scheduling can become difficult, because the possibility of conflicts increases dramatically. This is particularly true if backups are performed on an as-needed basis rather than being strictly scheduled.
Computer availability
When the backup server initiates the backup process, the backup clients must be turned on and available for backup whenever the server does its job. This might be a hassle when backing up desktop computers, which are often powered down at night or over the weekend. When the backup clients initiate the process, though, the server must be available at all times, or at least at scheduled backup times. Because this requirement is placed on just one computer, it's usually less onerous.
Security
When the backup server initiates the backup, the backup client computers must all run a server to respond to backup requests. This server is a potential security risk, making the backup clients vulnerable to outside intrusion. When the backup client initiates the backup, by contrast, it means that the backup server must run a server program. Again, this is a potential security risk, but it applies to just one computer. (The client's files must typically be accessed using a program running as root or its equivalent, but this program need not respond to outside accesses, and therefore needn't be as much of a security risk.) Thus, server-initiated backups can be more of a risk to your network as a whole, particularly if the server software used for backups isn't something you'd otherwise run. (Some backup methods, however, use protocols, such as SMB/CIFS, that you might use even if you didn't perform network backups.)

Tip

Network backups use the terms client and server in an unusual way. Typically, the backup server is the computer that houses the backup hardware, and the backup client is the computer that holds data to be backed up. When the backup client initiates the backup, the client/server relationship is as you'd expect; however, when the backup server initiates the backup, the backup client runs network server software, and the backup server runs network client software. This relationship can be confusing if you're unfamiliar with the terminology.

Both client- and server-initiated backups have their uses. Broadly speaking, client-initiated backups work best on small networks with few users and irregular backup schedules, such as in a business with half a dozen employees. As the number of computers grows, though, the scheduling hassles of client-initiated backups become virtually impossible to manage, so server-initiated backups become preferable. You might also prefer server-initiated backups even on a small network because of software features of specific packages or for other reasons; don't feel compelled to use a client-initiated backup strategy on a small network.

Backup Pitfalls

Backups don't always proceed as planned. Worse, restores don't always work the way you expect, and a backup is useless if you can't restore it. Some common problems, particularly in cross-platform network backups, include:

Network bandwidth consumption
Backing up over the network necessarily consumes a certain amount of bandwidth. Ideally, you should schedule backups during off hours to minimize the impact of this activity on day-to-day work.
Metadata support
Every filesystem supports its own types of metadata (data about files, such as file creation times and permissions), and not all backup tools support all the metadata you need. This issue comes up again later in this chapter.
In-use files
Sometimes it's not possible to read a file that's in use by another program, or a file's backup may be corrupted if it was being modified at the moment of the backup. This can cause problems with such Windows files as the Registry, the Outlook mail file, and files used by Microsoft Exchange. One radical solution is to shut down the system and boot it into a secondary OS installation (of the same OS or of another one) for backup, but this is a disruptive process. Some program-specific solutions exist, such as creating backups of the affected files from the programs that create them. These backups should then be handled by the backup software and can be restored to the main files if it becomes necessary.
Restore glitches
No matter what backup solution you choose, you should perform periodic tests of your ability to restore data, simply to ensure that it can be done. Unused and untested procedures have a tendency to "rot" as you upgrade software, rendering a formerly working procedure inoperable.

Unfortunately, backup pitfalls can be very site-specific because they often involve details of your own network, the systems you're backing up, your backup hardware, and the programs you use (both for backup and on the systems being backed up). You may need to rely on testing and experience to discover these problems, then try to find a solution on the Web or in some other way. This is why testing your backups is so critically important; it's far better to discover problems before you need to restore data than after such a restore is needed!

Backing Up the Linux System

The backup server itself should be backed up, which constitutes a local backup procedure. Certain Linux network backup tools also resemble the local backup procedures. For these reasons, you should understand how to perform a local backup. This involves knowing what backup packages are available and how to use at least one. (I describe the tar command, which is often used when backing up to disk and tape media.) Because optical media are particularly complex, I also describe them in more detail. Finally, no backup is complete unless you can restore data from it, so I describe how to do this.

A Rundown of Linux Backup Packages

Backing up a computer is essentially a matter of copying files. Backup, though, presents certain unique challenges that aren't present in many other file-copying operations. One of these is the preservation of file metadata. Some file copying techniques lose some types of metadata, but backup tools tend to preserve more metadata. Another unique backup challenge is use of tapes, CD-R drives, and other unusual media used for backups. Most Linux backup packages are either designed for use with tapes as well as or instead of disk files, or they use additional programs to help store the data on the backup media. Finally, backup media are often of limited capacity, so a method of compression is desirable. Some Linux backup tools include compression algorithms, but others rely on additional programs, such as gzip or bzip2, to compress a backup archive file before sending it to the backup medium.

Numerous programs can be used for backing up a Linux system. Some of the more popular of these include:

tar
This program, which is a standard part of all major Linux distributions, is a simple but popular backup tool. It's described in more detail in the next section. This program performs backups and restores on a file-by-file basis, placing all files in a carrier file. It's also frequently used to create tarballs, which are disk-based archives of files that can be moved across a network, placed on removable media, and so on. Tarballs are commonly used to distribute program source and executable files.
cpio
The cpio program is conceptually similar to tar, in that it's a file-by-file backup tool that creates an archive file. This file can be compressed or copied to a backup medium.
dump
The dump program is another file-by-file copying program; however, dump is tied to a specific filesystem, such as ext2fs or XFS. It reads filesystem data structures at a lower level than tar or cpio, and can therefore back up files in a slightly less intrusive way. Unfortunately, versions of dump are not available for all filesystems; as of 2004, among common Linux filesystems, only ext2fs/ext3fs and XFS have dump programs. Worse, with 2.4.x and later kernels, dump may not work reliably, so it shouldn't be used. (See http://lwn.net/2001/0503/a/lt-dump.php3 for a mailing list message from Linus Torvalds on this subject.) To restore data backed up using dump, you must use a separate restore program.
Partition Image
This program works at a still lower level than dump; instead of backing up individual files, it backs up disk sectors that are marked as being used. This method of operation means that Partition Image is tied to the filesystem you use. As of Version 0.6.4, stable filesystems are ext2fs/ext3fs, ReiserFS, JFS, XFS, FAT, and HPFS. UFS and HFS are considered beta, while NTFS support is marked as experimental. This package can only back up and restore an entire partition, which makes it most useful for creating images of just-installed desktop systems and the like, rather than backups from which individual files might need to be retrieved in the future. You can learn more at http://www.partimage.org.
cp
Although the Linux file copy command, cp, is seldom considered a backup tool, it can be used in this capacity, particularly with removable disk and removable hard disk media. Using the -a parameter performs a recursive copy that preserves most file metadata. Because cp performs a file-by-file copy without using a carrier file, it's most useful for backing up relatively limited numbers of files to removable disks.
BRU
The Backup and Recovery Utility is a commercial backup tool for Linux and other Unix-like systems. It includes compression and provides easier file restore operations than are available from most open source backup programs. It also ships with a GUI, although you can use command-line tools, as well. Check http://www.bru.com for details.
Veritas
Veritas (http://www.veritas.com) offers a line of commercial network-enabled backup products for Linux, Windows, and other platforms.
Legato
Legato (http://www.legato.com), like Veritas, offers commercial network backup products for Linux, Windows, and other platforms.

Most of these programs store data in archive files. In Linux, tape drives are accessed as files, so you can use these programs to back up data directly to tape. You can also apply compression by using gzip, bzip2, or a similar tool to the archive file. Most of these programs provide a means to do so automatically by adding a special command-line parameter.
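
As a rough illustration of these approaches, the sketch below copies a small scratch directory three ways: with tar's built-in gzip option, with tar piped through an external gzip, and with cp -a. Every path is a hypothetical name under a temporary directory; on a real system, the source would be a directory such as /home and the destination a mounted backup disk.

```shell
#!/bin/sh
# Sketch: three ways to copy a small directory tree.
# All paths are hypothetical and live under a temporary scratch directory.
work=$(mktemp -d)
mkdir -p "$work/data/projects" "$work/disk"
echo "quarterly report" > "$work/data/projects/report.txt"
chmod 640 "$work/data/projects/report.txt"

# 1. tar with its built-in gzip option (-z), writing a tarball on disk.
tar -czf "$work/disk/data.tar.gz" -C "$work" data

# 2. tar writing the archive to standard output (-f -), with an external
#    gzip compressing the stream before it reaches the backup medium.
tar -cf - -C "$work" data | gzip > "$work/disk/data-piped.tar.gz"

# 3. cp -a copies file by file, with no carrier file, and preserves
#    most metadata, such as permissions and time stamps.
cp -a "$work/data" "$work/disk/data-copy"

# Inspect the results: list both archives and check the copied file's mode.
tar_listing=$(tar -tzf "$work/disk/data.tar.gz")
piped_listing=$(gzip -dc "$work/disk/data-piped.tar.gz" | tar -tf -)
copy_mode=$(ls -l "$work/disk/data-copy/projects/report.txt" | cut -c1-10)

echo "$tar_listing"
echo "mode of copied file: $copy_mode"
```

The first two methods produce equivalent archives; the piped form is useful with backup programs that lack a compression option of their own.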

These programs can all be used to back up a single computer, although with certain additions, they can be used for network backups. (The upcoming sections describe some of these capabilities.) In addition, some network-centric backup programs are available. One of these is described in Section 14.4.

Using tar for Tape and Disk Backups

All major Linux distributions ship with a version of tar that's part of the GNU's Not Unix (GNU) project. This version of tar is similar to commercial versions of tar that ship with commercial versions of Unix, but a few commands differ slightly. GNU tar can read most other tar archives, but the reverse isn't usually true.

GNU tar takes precisely one function and any number of options as arguments, along with a list of files or directories. The available functions are described in Table 14-2, while Table 14-3 shows the most common tar options. Some options also take their own arguments, as detailed in Table 14-3.

Table 14-2. Available tar functions

Function             Abbreviation  Description
--create             c             Creates an archive.
--concatenate        A             Links together two tarballs.
--append             r             Adds files to the end of an existing archive.
--diff or --compare  d             Finds differences between files on disk and those in an archive.
--list               t             Displays the contents of an archive.
--extract or --get   x             Extracts files from an archive.
--delete             -             Deletes files from an archive (can't be used on archives stored on tape).


Table 14-3. Common tar options

Option                     Abbreviation  Description
--directory dir            C             Performs operations in the specified directory (dir) rather than in the current directory.
--file [host:]file         f             Creates or uses the specified archive file. If the host is specified, tar uses the file on that system.
--listed-incremental file  g             Causes tar to perform an incremental backup, using file as a list of files from the last backup.
--one-file-system          l             Restricts the backup to a single filesystem (disk partition or other device).
--multi-volume             M             Performs a backup across multiple media.
--tape-length length       L             Used with --multi-volume; specifies the length of each individual tape, in kilobytes.
--same-permissions         p             Preserves all possible file metadata.
--absolute-paths           P             Stores filenames with their leading slashes (/) or other directory indicators.
--verbose                  v             Lists filenames as they're stored or extracted. When used with the function --list, adds ownership, time stamp, and file size information.
--verify                   W             Verifies newly created archives (similar to running --diff on a second pass).
--exclude file             -             Prevents file from being backed up or restored.
--exclude-from file        X             Prevents files listed in file from being backed up or restored.
--gzip or --gunzip         z             Uses gzip to process the archive.
--bzip2                    j             Uses bzip2 to process the archive.


In use, you specify the function, one or more options, and any required arguments, including a pointer to the directories or files you want to back up:

# tar --create --verbose --one-file-system --file /dev/st0 /home / /usr
               

You can state the same command more succinctly using abbreviations:

# tar cvlf /dev/st0 /home / /usr
               

Tip

Some non-GNU versions of tar require a dash (-) before the abbreviated functions and options, as in tar -cvlf. GNU tar can work with or without the dash.

For system backup purposes, tar is ordinarily run as root, because only root is guaranteed read access to all ordinary files. You may also need root privileges to write to your backup device. Non-root users can run tar to create tarballs in their own directories or to back up files to a backup medium if they have write privileges to the device.
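
The --list and --diff functions from Table 14-2 are the usual way to check an archive after creating it. The following sketch, which assumes GNU tar and uses hypothetical scratch paths rather than a tape device, creates an archive, lists it, and compares it against the files on disk before and after a change.

```shell
#!/bin/sh
# Sketch: create an archive, list it, then compare it with the live files.
# Assumes GNU tar; the directory names are hypothetical scratch paths.
work=$(mktemp -d)
mkdir -p "$work/home/alice"
echo "first draft" > "$work/home/alice/notes.txt"

# Create the archive (long options, equivalent to "tar cf").
tar --create --file "$work/backup.tar" --directory "$work" home

# List the archive's contents with ownership, size, and time stamps.
tar --list --verbose --file "$work/backup.tar"

# Compare archive and disk; no output and exit status 0 mean they match.
tar --diff --file "$work/backup.tar" --directory "$work"
status_clean=$?

# Change a file and compare again; GNU tar reports the difference
# and exits with a nonzero status.
echo "second draft, now longer" > "$work/home/alice/notes.txt"
tar --diff --file "$work/backup.tar" --directory "$work"
status_changed=$?

echo "before change: $status_clean, after change: $status_changed"
```

On a busy system, some differences are expected simply because files keep changing after the backup runs, so a nonzero status from --diff is a prompt to read the report rather than proof of a bad archive.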

This command looks simple enough, even if it's fairly long in the nonabbreviated form. It does deserve some explanations, though:

Archive filename
This command uses /dev/st0 as the archive's filename. This filename corresponds to a rewinding SCSI tape device, which automatically rewinds after every operation. A nonrewinding SCSI tape device, which might be used when packing multiple archives on a single tape in an incremental backup scheme, is /dev/nst0. ATA tape devices use the device filenames /dev/ht0 and /dev/nht0 for rewinding and nonrewinding devices, respectively. If you back up to a removable hard disk, you can use a similar command, but you specify a partition on the disk (such as /dev/hde5) or a filename on a mounted disk filesystem (such as /mnt/backup/05-05-backup.tar).
Compression
This example command didn't include the --gzip or --bzip2 options. The idea is that the tape device probably provides its own compression. When backing up to a disk backup device, chances are you'd enable compression.

Warning

Because tape is less reliable than some other media, using compression with tape can be risky. This is particularly true of tar's --gzip and --bzip2 options, which compress an entire archive in such a way that a read error can make all subsequent data unrecoverable. Tape drives' built-in compression usually causes fewer problems when recovering subsequent data from a corrupt archive.

Limiting backups
The --one-file-system option prevents backup of data from partitions that aren't explicitly listed as backup targets. This option is often used as a means of preventing backup of mounted removable media and the /proc filesystem, which holds pseudo-files that could cause real problems when restored. Alternatively, you could use --exclude or --exclude-from to explicitly exclude such directories from being backed up.
Backup order
The order of the directories in the backup command is potentially important. This example backs up the /home directory first, followed by root (/) and /usr. Because tape is a sequential-access medium, restores must read all preceding data, which means that you want the directories with files that are most likely to need recovery to appear first. In this example, the idea is that users might accidentally delete files and request their recovery, so you want those files to be first in the archive. You might have other priorities depending on your needs, though.
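
To make these points concrete, here is a sketch of a disk-based variant of the backup command. It assumes GNU tar, and every path (the scratch tree, the var/cache directory, the mnt/backup destination) is a hypothetical stand-in. Long options are used throughout, partly for readability and partly because newer GNU tar releases assign the single letter -l to --check-links rather than to --one-file-system.

```shell
#!/bin/sh
# Sketch: disk-based backup with compression, ordering, and an exclusion.
# Assumes GNU tar; all paths are hypothetical scratch names.
work=$(mktemp -d)
mkdir -p "$work/root/home/bob" "$work/root/usr/bin" \
         "$work/root/var/cache" "$work/mnt/backup"
echo "thesis" > "$work/root/home/bob/thesis.txt"
echo "binary" > "$work/root/usr/bin/tool"
echo "junk"   > "$work/root/var/cache/tmpfile"

# home is listed first so that user files sit at the start of the archive.
# var/cache is skipped with --exclude, and --gzip is enabled because a disk,
# unlike many tape drives, provides no compression of its own.
tar --create --gzip --one-file-system \
    --exclude=var/cache \
    --file "$work/mnt/backup/full.tar.gz" \
    --directory "$work/root" home usr var

# List the archive to confirm the order and the exclusion.
listing=$(tar --list --gzip --file "$work/mnt/backup/full.tar.gz")
first_entry=$(echo "$listing" | head -n 1)
echo "$listing"
```

Listing the archive after creating it is a cheap way to confirm that an exclusion pattern matched what you intended.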

The preceding tar command creates a full backup—or at least, a full backup of the specified directories. Each backup uses the --listed-incremental option to point to a log file. On the first backup, this file is empty or nonexistent, which results in a full backup. For subsequent backups, you have two choices:

  • After the full backup, you can copy the log file to a backup location. After each backup except for the first, you then copy the copied file over the log file. The end result is that each incremental backup will be done relative to the original full backup. These backups will grow in size as time goes on and changes accumulate, but they'll be relatively simple to restore because you'll only need to deal with the full backup and the latest incremental backup.
  • You can issue precisely the same command every time without changing the log file. The result is that every backup will be an incremental backup relative to the last incremental backup. This backup style is sometimes called a differential backup. On average, each differential backup will be the same size as the others, but restoring data may require accessing multiple backups.
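
The two choices above can be sketched as shell functions. This is a minimal sketch assuming GNU tar; the function names, the .full suffix for the saved log file, and the argument order are all hypothetical:

```shell
#!/bin/sh
# Sketch only; the function names and paths are hypothetical.
# The second argument is the log (snapshot) file handed to tar's
# --listed-incremental option.

# Full backup: start from an absent log, then keep a pristine copy of it.
full_backup() {
    archive=$1; snap=$2; dir=$3
    rm -f "$snap"
    tar -c -f "$archive" --listed-incremental="$snap" -C "$dir" . &&
        cp "$snap" "$snap.full"
}

# First choice: put the pristine log back before each run, so every later
# backup holds all changes made since the full backup.
backup_since_full() {
    archive=$1; snap=$2; dir=$3
    cp "$snap.full" "$snap" &&
        tar -c -f "$archive" --listed-incremental="$snap" -C "$dir" .
}

# Second choice: leave the log alone, so each backup holds only the changes
# made since the previous backup of any kind.
backup_since_last() {
    archive=$1; snap=$2; dir=$3
    tar -c -f "$archive" --listed-incremental="$snap" -C "$dir" .
}
```

With the first choice, a restore needs the full archive plus the latest archive from backup_since_full; with the second, it needs the full archive plus every archive made by backup_since_last, applied in order.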

A backup solution that uses tar is likely to rely on scripts you write yourself for your specific needs. A simple backup script might contain nothing more than a single call to tar with appropriate parameters to perform a full backup of your system. A more complete script might include housekeeping commands, such as commands to copy log files for incremental backups or to use mt to skip over intervening backups on a tape, as described in the sidebar. A still more complete script can accept parameters to specify a full or incremental backup or to set other site-specific options. Backup scripts like this may be called from cron jobs in order to perform backups on a regular basis. Of course, you must be sure that the correct tape is in the drive!
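
A parameterized script of the kind just described might start out like the following sketch. It only prints the commands it would run, so it is safe to experiment with. The TAPE and SNAP variables, the plan_backup function, and the default paths are hypothetical names for this example, not part of tar or mt:

```shell
#!/bin/sh
# Sketch of a wrapper that cron could call; it only prints the commands it
# would run. TAPE, SNAP, and plan_backup are hypothetical names; adjust the
# directory list for your site.
TAPE=${TAPE:-/dev/nst0}                 # nonrewinding tape device
SNAP=${SNAP:-/var/lib/backup/snapshot}  # log file for --listed-incremental

plan_backup() {
    mode=$1          # "full" or "incr"
    skip=${2:-0}     # number of archives already on the tape to skip over
    case $mode in
        full) echo "rm -f $SNAP" ;;     # an absent log forces a full backup
        incr) ;;                        # reuse the existing log
        *)    echo "usage: plan_backup full|incr [skip]" >&2; return 1 ;;
    esac
    echo "mt -f $TAPE rewind"
    # fsf moves the tape forward past the given number of file marks
    if [ "$skip" -gt 0 ]; then
        echo "mt -f $TAPE fsf $skip"
    fi
    echo "tar --create --one-file-system --file $TAPE --listed-incremental=$SNAP /home / /usr"
}
```

Once the printed commands look right for your hardware, one option is to pipe the output to sh from a script that cron runs, for instance a weekly plan_backup full and a nightly plan_backup incr with the appropriate skip count.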

Backing Up to Optical Media

Optical media pose certain special challenges. Where you can use tar, cpio, or most other backup programs to create archive files on disk partitions or to store archives on tape, direct read/write access to optical media requires the use of special programs, such as cdrecord or cdrdao. These programs ship with all major Linux distributions, but integrating them into your backup plans requires extra effort.

Tip

Tools to provide disk-like direct read/write access to optical media have been making slow inroads in the Linux world. GUI desktop environments often provide such access via their file managers, for instance. Such tools are still difficult or impossible to use as full backup solutions, although of course you can drag-and-drop individual files and directories to the media in this way. This can be a good way to back up individual project files or the like, but not an entire computer.

Several approaches to optical media backups exist:

Backup archive direct to media
The first approach to using optical media is to treat these media much like a tape: store a tarball (or other archive file) directly to the optical medium. Typically, you'll create a tarball on disk and then use cdrecord to copy it to the optical disc, or you can pipe the output of tar directly to cdrecord. This approach has the drawback that non-Unix OSs may have a hard time reading the backup. On the other hand, instructions for doing tape backups and restores need relatively few changes. Restores work precisely as they do for tapes, except that you specify a CD-ROM device's filename rather than a tape device's filename, and mt isn't used.
Backup archive on carrier filesystem
A variant on the preceding approach is to store tarballs (or other archive files) on a filesystem, which is recorded to the optical disc. To do this, you create a tarball on disk, create an ISO-9660 filesystem containing that tarball using mkisofs, and then record the ISO-9660 filesystem to the optical disc using cdrecord. (You can pipe some of these operations together or use GUI tools, such as X-CD-Roast, to help with some parts of the job.) This approach is more complex initially, but it makes the archive easier to access from non-Linux systems. You can also include text files (perhaps including an index of files in the tarball) or other explanatory materials in the disc's filesystem, which can make access easier. Because most people and OSs expect optical discs to have ISO-9660 or other filesystems, this approach is less likely to cause confusion when accessing the media in the future.
Backup files on optical filesystem
The final backup method is to store files directly on an optical disc's ISO-9660 filesystem. To do this, you use normal CD-R creation tools, such as mkisofs and cdrecord, or GUI frontends to these tools, such as X-CD-Roast. This approach makes recovery of arbitrary files relatively easy; you can mount the disc and access the files just as you would the original files on the hard disk. The drawback is that you'll lose some file metadata. (Precisely how much you lose depends on the options you choose.)

Tip

If you back up files directly to an optical disc's filesystem, use the -R option to mkisofs, rather than -r. Using the uppercase version of this option preserves more file metadata, including write permission bits. This is most important for performing system backups; for backing up smaller sets of data, using -r may be preferable, particularly if you don't know who'll be reading the data. Using -J or -hfs to generate Joliet or HFS filesystems won't hurt, but they won't provide any real benefit, either, at least not if Linux is to read the backup. If non-Linux systems will read the data, using one or both of these options may be helpful.

Generally speaking, storing backups in a carrier archive on an optical disc's own filesystem is the best way to perform system backups to these media. For backing up project files or the like, though, storing them directly on the optical disc's filesystem, without a carrier file, is often the best way to proceed; this enables the quickest access to the individual files.

To perform a backup using a carrier archive inside a filesystem, you must run tar, mkisofs, and cdrecord in sequence:

# tar cvzf /tmp/bu/backup.tgz --one-file-system / /home /usr
# mkisofs -r -o /tmp/backup.iso /tmp/bu
# cdrecord dev=0,6,0 speed=8 /tmp/backup.iso
               

These commands presuppose that the temporary backup directory (/tmp/bu) exists and holds no extraneous files. (You could store files there that describe the backup, if you like.) You might also want to make adjustments for your specific needs, such as changing the SCSI device ID (dev=0,6,0) or speed (speed=8) passed to cdrecord to suit your hardware.

Tip

The optical recorder specification passed to cdrecord is peculiar. The form shown in the preceding example is used for SCSI devices and takes the form bus,target,LUN, where bus is the SCSI bus (typically, the SCSI adapter number), target is the SCSI ID number of the drive, and LUN is the logical unit number (LUN), which is typically 0. Through the 2.4.x Linux kernel, even ATAPI optical drives were accessed as SCSI devices, using the kernel's ATA SCSI emulation layer. With the 2.6.x and later kernels, though, you can access ATAPI drives directly, using a Linux device file as the device specification, as in dev=/dev/hdc.

After running these commands, you'll have two temporary files on your hard disk: the tarball and the ISO-9660 image file. Remember to delete them both. If you like, you can pipe the last two commands together to bypass the creation of the ISO-9660 image file:

# mkisofs -r /tmp/bu | cdrecord dev=0,6,0 speed=8 -
               

Be sure to include that trailing dash (-) because it tells cdrecord to accept the previous command's output as its input.

Restoring Data Locally

No backup will do you any good unless you can restore the data. Broadly speaking, data restores fall into two categories:

Partial restores
In a partial restore, you need to restore only a few files to a system that's basically functional. The files could be user datafiles or system files, but they're not critical to the basic functioning of the computer or its backup and restore software. To perform a partial restore, you can basically run the backup process in reverse, although specifying the precise files can be tricky, as described shortly.
Full restores
In a full restore, you need to restore all of a computer's files. These are typically necessary when a hard disk fails completely, when a computer is stolen, or when you intentionally replace one computer with a new one. Full restores are much trickier than partial restores because you need some way to run the restore software on a computer that holds no OS. Thus, you must carefully plan how to perform your full restore before the need arises. Attempting to plan the restore when a server has crashed, and your boss is demanding it be restored immediately, is stress-inducing and will result in wasted time as you try to work out a solution.

To begin planning a restore, start with some deliberate partial restores. Try backing up a test directory and then restoring it using the backup software's restore feature (such as tar's --extract function). A trickier variant is restoring just some of the files. In the case of tar, you must specify the files or directories to be restored, much as you specify the files or directories you want to back up:

# tar xvf /dev/st0 home/linnaeus/gingko/biloba.txt
               

This command extracts the file home/linnaeus/gingko/biloba.txt from the backup archive to its original location. You can as easily specify a directory or a set of individual files. A couple of details of this command require elaboration, though:

  • The leading slash (/) in the file specification is missing. This is because tar normally strips the leading slash from filenames when it creates an archive. If you provide a leading slash but it isn't recorded in the archive, tar will fail to restore the file. This can be a time-consuming mistake to make because tar can take minutes or hours to scan the entire archive before finishing, with no file restored.
  • Because tar restores files using the filenames recorded in the archive, and because the leading slash is normally missing, files are restored relative to the current directory. Thus, in most cases, you must execute the restore command from the root (/) directory to restore them to their correct locations. Alternatively, you can restore the files to a temporary location and then move them elsewhere.

A tricky part about partial restores, particularly with simple programs such as tar, is in specifying the file that's to be restored. If you mistype the filename, tar won't restore it and won't provide any helpful error messages. This can be particularly frustrating if you don't know the exact filename.

Tip

If you perform incremental backups, you can use the incremental backup log to scan files for a precise match to a given filename. Even if you don't perform incremental backups, you can pipe the output of tar using the --verbose option to a file and use it to help locate files. If you have only a vague notion of what the correct filename is and have no record of it, you can use the --list function to tar to create a file list similar to what might be produced at backup. This can, however, take as long to complete as a full backup.
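
One way to avoid mistyped filenames is to search the archive's own file list and feed the exact match back to tar. The following is a minimal sketch; find_member and restore_member are hypothetical helper names, and the archive may be a tape device or an ordinary file:

```shell
#!/bin/sh
# Sketch only; find_member and restore_member are hypothetical helper names.

# Print every archive member whose name contains the given text. Reading the
# whole archive can take as long as the backup did.
find_member() {
    archive=$1; text=$2
    tar -t -f "$archive" | grep -F -- "$text"
}

# Extract one member, named exactly as find_member printed it (no leading
# slash), relative to the target directory; use / to restore in place.
restore_member() {
    archive=$1; member=$2; target=$3
    tar -x -v -f "$archive" -C "$target" "$member"
}
```

For example, find_member /dev/st0 biloba should print home/linnaeus/gingko/biloba.txt if that file is in the archive, and that string can then be passed to restore_member with a temporary directory as the target.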

In principle, full restores work just like partial restores, except that you don't provide a file specification, which leads tar to restore everything in its backup. (You can exclude some individual files or directories if you like, though.) The tricky part is in running Linux on a computer whose OS has been wiped out in some way. Several ways of handling this chicken-and-egg problem exist:

Emergency disk
You can create an emergency disk that enables you to boot a minimal Linux system and direct the restore process much as if you were running a partial restore. You can either prepare your own emergency disk system or locate one on the Internet. Several options for the latter exist, ranging from floppy-based systems to Linux systems that boot from CD-ROM. Examples include Tom's Root/Boot (a.k.a. tomsrtbt, http://www.toms.net/rb/), a floppy-based system; ZipSlack (http://www.slackware.com/zipslack/), a variant of Slackware designed to fit on a 100-MB Zip disk; and Knoppix (http://www.knoppix.org/), a Debian variant that boots from a CD-R. Many other variants exist, as well; a web search on keywords that are important to you may turn up helpful pointers. If you have specific needs, such as an ability to restore using particular software, be sure that your needs are met by the option you pick, or create your own custom variant that includes the software you need.
Emergency OS installation
Some administrators like to create a minimal emergency OS installation alongside the regular OS installation. This practice enables you to boot the emergency installation in case of a serious problem with the main installation. This practice requires extra planning beforehand, though, and it won't help in case of a complete hard disk failure, system theft, or other catastrophic problems. It can, however, be a helpful approach in case of massive filesystem corruption or other problems that don't damage the emergency system.
Partial OS bootstrap
You can reinstall the core OS files and use this system to restore your main system. When doing a truly full restore, this practice works best if you reinstall your OS as a secondary OS, much like an emergency OS installation; trying to restore a backup over a working OS is an iffy proposition because you might be left with a bizarre mish-mash of files. Alternatively, you can reinstall the OS and all its files, and then perform a partial restore of user files alone. This approach works well if you want to upgrade to a newer version of your distribution or to another distribution, but it's likely to entail additional effort in reconfiguring your new OS installation.
Second computer assist
You can enlist the aid of another computer in your restore procedures. Place a new hard disk and a backup device in an existing Linux system and use that system to restore your failed system's files to the new hard disk. You can then move the new hard disk to the target computer and reboot it into the restored OS. This approach is conceptually similar to using an emergency OS or an emergency disk, but it uses an entirely separate computer as a key component. Juggling the physical disks can be tedious, though, and you may run into problems related to the way the two computers handle the disk's cylinder/head/sector (CHS) geometry; if they don't match, some disk utilities will complain.

In all these cases, one particular challenge is in restoring the system to a bootable state. The safest way to proceed is usually to place a copy of the restored system's kernel on a floppy disk or a small FAT partition and use a utility such as LOADLIN.EXE (a DOS program to boot Linux) to boot the kernel. This should get you into a working Linux system, from where you can reinstall the Linux Loader (LILO) or the Grand Unified Boot Loader (GRUB) to boot Linux normally. Most Linux distributions provide GUI utilities to help with these tasks, or you can reinstall the boot loader by using command-line tools. LILO can be reinstalled by typing lilo, although if you've changed your partition layout, you may need to edit /etc/lilo.conf first. Similarly, typing grub-install often installs GRUB, although in some cases you may need to edit /boot/grub/grub.conf or /boot/grub/menu.lst or use the grub utility to install it with special options. Consult the LILO or GRUB documentation if you have problems.

Backing Up with Samba

One of the conceptually simplest network backup tools is Samba, the network file and printer sharing program described in Part II. Using Samba enables you to back up Windows computers using either client- or server-initiated backup procedures. You can also perform client-initiated backups of Linux and other non-Windows computers using Samba, although server-initiated backups of Linux systems are tedious when done with Samba.

Before proceeding further, you should understand the basic features and uses of Samba backups. These determine the advantages and disadvantages of using Samba as part of the backup picture. This chapter also presents two basic Samba backup scenarios: using a backup share for client-initiated backups and using smbtar for server-initiated backups.

Tip

The following pages presuppose at least some familiarity with Samba basics. If you know little about Samba, you should read Part II or at least Chapter 3 and Chapter 4.

Pluses and Minuses of Samba Backups

Samba is a Linux implementation of the SMB/CIFS protocol—the default file-sharing protocol for Windows. Although Samba is frequently considered a server package, it includes client tools. Thus, Samba can be used as part of either a client-initiated (using Samba server tools) or a server-initiated (using Samba client tools) network backup design.

SMB/CIFS supports common Windows filesystem metadata, but it provides limited support for Unix-style ownership, permissions, and other metadata. Thus, SMB/CIFS can be a good way to back up files from Windows systems while preserving metadata, but SMB/CIFS is a poor way to back up Linux or Unix systems. If you transfer data in a carrier file, though, such as creating a tarball on a Linux system and then using SMB/CIFS to copy the tarball across the network and onto a backup device, SMB/CIFS provides no inherent problems relating to preservation of file metadata; that information is stored within the tarball. Such a backup approach is best handled in a client-initiated backup procedure, though, which is why Samba and SMB/CIFS make a poor choice for backing up Linux systems using server-initiated backup methods.

Although SMB/CIFS supports Windows filesystem metadata, Linux doesn't. Samba provides ways to map most important Windows filesystem metadata onto Linux filesystem metadata that Windows can't use. Thus, if you use Samba with identical configurations on backup and restore, chances are you won't lose any important filesystem metadata when restoring data. One exception to this rule is certain advanced NTFS features, such as multiple data streams and (depending on your Samba server's options) support for ownership and ACLs. Thus, you may lose some metadata when backing up a Windows system that uses NTFS in a server-initiated backup or even in some types of client-initiated backup. However, client-initiated backups that use Windows-specific backup software can preserve these metadata.

Another problem with SMB/CIFS backups is that restoring data to the backup client can be tricky, particularly in the case of a complete restore. This topic is covered in more detail later, in Section 14.3.4.

On the plus side, support for SMB/CIFS is free in both Windows and Linux. Thus, implementing a Samba-based backup solution can be inexpensive, particularly if you're willing to invest some time in creating appropriate backup scripts. In fact, Samba ships with a tool that's specifically designed with backup in mind: smbtar, which is described shortly.

Because of Samba's support for FAT-style metadata, Samba can be a good way to back up all the data from systems that continue to use FAT. Even some Windows NT/200x/XP systems use FAT, and many of those that use NTFS don't rely heavily on NTFS-specific metadata. Thus, you may be able to back up such systems, or at least their user data, without risking undue loss of file metadata on restore.

Using a Samba Backup Share

The first approach to backup using Samba is to create a special Samba share for the purpose. This Samba share is then accessed from the backup client in a client-initiated backup scenario. Typically, the share accepts files (either directly or in a carrier archive, such as a tarball) and then copies them to a backup device.

Creating a backup share

Broadly speaking, you can design a backup share in any of three ways:

  • The share may point directly to the backup device. This approach works only with removable disk or removable hard disk media; you can't point a Samba share directly at an optical disc or tape device. Typically, users then copy their files, either raw or in a compressed carrier archive, to the backup device. The share often includes mechanisms to automatically mount and unmount the backup device, as described shortly.
  • The share points to a holding area in which users copy their files. When the connection is terminated, Samba runs a script that backs up the share using tar, cpio, cdrecord, or other Linux backup tools.
  • The share accepts a prepared carrier archive from the backup client and copies it to a backup medium. This approach handles any metadata the client's backup tools can handle, so Samba's metadata limitations aren't an issue.

From a Samba perspective, the simplest type of backup share is the first: create an ordinary file share that points to your removable disk's mount point. The removable disk can use any common Linux filesystem. You can even use FAT if you think the disk might be read directly by Windows or some other OS in the future, but ironically, using FAT will cause some Windows metadata—such as archive, hidden, and system bits—to be lost. The tricky part of this type of backup share is mounting and unmounting it. One approach is to use Samba's preexec and postexec configuration parameters. These reside in the smb.conf file's share definition and point to commands that Samba executes when the user connects to or disconnects from, respectively, the share. For instance, a complete backup share might look like this:

[backup]
   comment = Direct-Access Backup Share
   directory = /home/samba/backup
   max connections = 1
   read only = No
   preexec = mount /home/samba/backup
   postexec = umount /home/samba/backup

The preexec parameter mounts a removable medium to /home/samba/backup. This mount point must be properly defined in /etc/fstab, though. The postexec parameter reverses this process. The max connections = 1 option limits the connections to a single user, which can help avoid problems that might be caused should two users try to use the backup share simultaneously. To the user, the share looks just like any other; it's accessed from Network Neighborhood or My Network Places on a Windows system just like any other share, and it accepts files that are copied there in the Windows file manager or in any other way. Users will presumably insert and remove disks themselves, though, or perhaps ask somebody in physical proximity to the server to do so for them.

Warning

One problem with this approach is that Windows systems frequently don't terminate their connections to the server in a timely manner. Thus, the postexec command may not execute until several minutes, or even hours, after activity ceases. Logging out of the Windows session usually terminates it on Windows NT/200x/XP clients, but Windows 9x/Me clients may need to be rebooted. Another approach is to use the global Samba deadtime parameter, which tells Samba how many minutes of inactivity to accept before disconnecting a client. For instance, deadtime = 5 ensures that inactive connections are terminated in five minutes.

A similar approach can be used to back up to tape or to optical media, except that the preexec and postexec options are likely to do more:

preexec = rm -r /home/samba/backup/*
postexec = tar cvC /home/samba/backup --one-file-system --file /dev/st0 ./

These options, used in place of those shown earlier, cause Samba to back up the contents of the backup directory when the user disconnects. (As with a mounted share, Samba may wait a while before doing this, because Windows clients often don't disconnect immediately.) The preexec option tells Samba to delete all the files in the backup directory. This ensures that two consecutive users' backups don't collide.

Perhaps the most flexible type of client-initiated Samba backup, though, uses a printer share, odd as that may sound. The idea is to use a Samba printer share option, print command, to have Samba execute a command that can operate on a single file sent by the client. Typically, this single file is a tarball, Zip file, or other archive file. The print command copies the file to the backup device. For instance, consider this share definition:

[print-bu]
   comment = Pseudo-Printer Backup Share
   directory = /var/spool/samba
   max connections = 1
   printable = Yes
   print command = dd if=%s of=/dev/st0; rm %s

This share uses dd to copy the received file, whatever it is (the %s Samba variable refers to the received print file) to /dev/st0. A more complex command stores the received file on an optical disc using mkisofs and cdrecord, or even uncompresses a tarball and creates a CD-R from its contents. One important point to note about this share is that its print command ends in rm %s. Removing the received print file is vitally important; Samba printer shares don't do so automatically, so if you fail to remove the print file, your backup server's disk will soon overflow with old backup jobs.

Tip

If you want to create a very complex print command, try writing a script to do the job and then call the script. This enables you to perform arbitrarily complex actions, while keeping your smb.conf file's share definitions readable.
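
A helper script of this kind might look like the following sketch, which does the same work as the dd-and-rm print command shown earlier. The script path in the comment, the BACKUP_DEVICE variable, and the function name are hypothetical:

```shell
#!/bin/sh
# Sketch of a helper that a print command could call, for example:
#   print command = /usr/local/bin/spool-to-backup %s
# The script name, the BACKUP_DEVICE variable, and the function name are
# hypothetical.
BACKUP_DEVICE=${BACKUP_DEVICE:-/dev/st0}

spool_to_backup() {
    spool=$1
    [ -f "$spool" ] || { echo "no such spool file: $spool" >&2; return 1; }
    # Copy the received archive to the backup device, then always remove
    # the spool file so old jobs cannot fill the server's disk.
    dd if="$spool" of="$BACKUP_DEVICE" 2>/dev/null
    status=$?
    rm -f "$spool"
    return $status
}

# Run only when a spool file name was passed on the command line.
if [ "$#" -gt 0 ]; then
    spool_to_backup "$1"
fi
```

Keeping the logic in a function makes it easy to extend later, for instance to log each job or to hand the file to mkisofs and cdrecord instead of dd.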

Using a backup share

The tricky part to using a pseudo-printer backup share comes on the client. You must create a backup archive using local tools and then copy it to the server. For instance, you can use tar for Windows (see http://unxutils.sourceforge.net or http://www.cygwin.com for a couple of sources) to do the job:

C:\> TAR -cvf D:\BACKUP.TAR C:\
C:\> COPY D:\BACKUP.TAR \\BUSERVER\PRINT-BU
C:\> DEL D:\BACKUP.TAR
                  

This series of commands, typed at a DOS prompt, backs up the client's C: drive to the PRINT-BU share on the BUSERVER server. This specific set of commands uses D: as a temporary storage area; you may need to change this detail for your own system. Of course, many variants on this approach are possible. For instance, you can use a Zip utility or a dedicated Windows backup tool to create the archive that's copied to the backup server. You can also perform more-or-less the same task using Linux tools, in order to back up a Linux server; however, you'll use the Linux smbclient program to copy a file, rather than the Windows COPY command. If you send a file in tarball form and if Samba dumps it directly to tape, the result will be indistinguishable from creating a backup using a tape drive that's directly connected to the backup client.
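
For a Linux backup client, the equivalent steps might look roughly like the following sketch. It prints the commands rather than running them. The server and share names are taken from the Windows example; the function name, the temporary path, and the use of -N (no password prompt) are assumptions that you would adjust for your own share's security settings:

```shell
#!/bin/sh
# Sketch only: print the commands a Linux client might use to send a tarball
# to the pseudo-printer share. plan_client_backup, the temporary path, and
# the -N (no password) choice are assumptions.
plan_client_backup() {
    tmp=${1:-/tmp/backup.tar}
    echo "tar --create --one-file-system --file $tmp /"
    # smbclient's print command sends a local file to a printable share
    echo "smbclient //BUSERVER/PRINT-BU -N -c 'print $tmp'"
    echo "rm -f $tmp"
}
```

As with the Windows version, the client needs enough free space to hold the temporary tarball before it is sent.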

Tip

You can enter commands to back up a Windows system into a Windows batch file. Thereafter, running that batch file backs up the client. To make the process even more user-friendly, you can create a desktop object that points to the batch file. Call it Backup or something similar, and users should have no trouble double-clicking it to back up their computers.

All of these client-initiated Samba backup methods do have certain limitations, in addition to those described earlier for client-initiated backups. Most notably, they all require that the Samba server have enough disk space to temporarily hold a complete backup. This disk space must be available in the directory used for the backup share. For removable disk backups, this isn't a very special requirement; the disk space needed must reside on the backup medium itself. For other methods, though, the server must be able to temporarily hold the entire archive before copying it to an external medium. If your backup plan involves manipulating files, such as storing a set of backup files on an optical disc, you may need more space for the temporary files you create in this process.
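
Because a full holding area causes backups to fail partway through, a server-side script may want to check free space before accepting or processing a job. The following is a minimal sketch; has_room is a hypothetical helper name, and the threshold is whatever your largest expected archive requires:

```shell
#!/bin/sh
# Sketch only; has_room is a hypothetical helper name.
# Succeeds if the filesystem holding DIR has at least NEEDED_KB kilobytes free.
has_room() {
    dir=$1; needed_kb=$2
    # df -P prints one line per filesystem in a fixed layout; with -k,
    # column 4 of the second line is the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, has_room /home/samba/backup 4000000 succeeds only if roughly 4GB is free on the filesystem that holds the backup share's directory.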

Using smbtar for Backups

The smbtar program is a script that comes with Samba. It combines the Samba smbclient program and the standard tar utility to read files from an SMB/CIFS server and store them in a tarball or on a tape. As such, it can be a good way to perform a server-initiated backup using SMB/CIFS. To do so, you must first configure your backup clients to share files (that is, to be file servers). Once this is done, you can actually employ smbtar to do the backup.

Tip

Because SMB/CIFS provides limited support for Linux file metadata, server-initiated SMB/CIFS backups of Linux systems are unlikely to work well, except perhaps for partial backups of user data files, particularly on a Samba server. For this reason, this chapter assumes that such backups use Windows backup clients.

Configuring Windows clients to share files

To perform a server-initiated backup via SMB/CIFS, you must configure the backup client as a file server. On Windows systems, this task requires installing and activating the SMB/CIFS software, although it's not called that in the Windows network tools. A typical procedure, for Windows XP, is as follows:

  1. Open the Windows Control Panel.
  2. Double-click the Network Connections icon. This displays a window of the same name. (This icon is called Network and Dial-Up Connections in Windows 200x.)
  3. In the new window, right-click the Local Area Connection icon. This produces a pop-up menu; select Properties in this menu. The result is the Local Area Connection Properties dialog box shown in Figure 14-1.
  4. If the protocol list includes an item called File and Printer Sharing for Microsoft Networks, skip ahead to Step #8.
  5. Click the Install button to bring up a dialog box called Select Network Component Type.
  6. Pick the Service item and click Add in the Select Network Component Type dialog box. This should produce the Select Network Service dialog box.
  7. In the Select Network Service dialog box, pick the File and Printer Sharing for Microsoft Networks item, and then click OK. This action will install SMB/CIFS server support on the computer.
  8. In the Local Area Connection Properties dialog box, verify that the File and Printer Sharing for Microsoft Networks item is checked. Click OK in this dialog box to dismiss it.

Figure 14-1. Windows displays the protocols it supports in the Local Area Connection Properties dialog box


Adding SMB/CIFS server support is only part of the job; you must also define shares that the backup server will access. To do so, follow these steps.

  1. Open the My Computer folder on the desktop.
  2. Locate the icon for the drive you want to back up, and right-click it to produce a context menu. Select the Sharing and Security item from this menu. (This item may be called Sharing or something else on some versions of Windows.) This action brings up a Properties dialog box with a Sharing tab selected.
  3. In Windows XP, the Sharing tab displays a warning that sharing an entire drive can be a security risk. Click this notice to view the real configuration tab, as shown in Figure 14-2.
  4. Check the "Share this folder on the network" button and enter a name for the share in the "Share name" field. This interface is somewhat different in Windows 200x and Windows 9x/Me. In Windows 200x, you must click the New Share button to enter a share name.
  5. Windows XP allows you to enable or disable write access to the share via the "Allow network users to change my files" button. Windows 9x/Me provides two fields for passwords for read-only and read/write access. A backup share can ordinarily be read-only, although you will have to enable read/write access when you want to restore data.
  6. To start sharing the drive, click OK.

Figure 14-2. The Properties dialog box for a disk or directory enables sharing via SMB/CIFS


Unfortunately, every major release of Windows has changed these user interfaces slightly. The preceding description is based largely on Windows XP. Windows 9x/Me is different. Most importantly, the Network icon in the Control Panel brings up a Network dialog box that's similar to the Local Area Connection Properties dialog box (see Figure 14-1).

For Windows XP Professional and Windows 200x systems, you use a local username and password to access the share. For improved security, you might want to create a special backup account that provides read access to all the files you want to back up, but that's not used for ordinary local access. Windows 9x/Me systems use share-level security; i.e., a password without a username provides access to the shares. You enter the password when creating the share, as just noted. From the Linux backup server, you can enter a dummy username; it's ignored by the Windows 9x/Me file server.

Warning

Windows XP Home, which ships on many new computers, provides no password to protect its shares. This configuration makes Windows XP Home a very risky version of Windows to back up using server-initiated backups. If possible, upgrade such computers to Windows XP Professional or Windows 200x to obtain password-based share protections. If this isn't possible, you should at least use a strong firewall to limit access to TCP ports 139 and 445, so that only the backup server and other authorized systems can access the SMB/CIFS file servers on Windows XP Home systems.
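As one way to apply such a restriction, the sketch below prints iptables rules for a Linux firewall or router that forwards traffic to the Windows XP Home system. The function name smb_firewall_rules, the use of the FORWARD chain, and both IP addresses are assumptions made for illustration; the rules are only printed, not applied.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that let only the backup server
# reach the SMB/CIFS ports (TCP 139 and 445) on one Windows client.
# Assumption: this Linux system forwards traffic to the client, so the
# rules belong in the FORWARD chain. Both addresses are hypothetical.
smb_firewall_rules() {
    backup_server=$1
    client=$2
    for port in 139 445; do
        # Accept the backup server first, then drop everyone else.
        echo "iptables -A FORWARD -p tcp -s $backup_server -d $client --dport $port -j ACCEPT"
        echo "iptables -A FORWARD -p tcp -d $client --dport $port -j DROP"
    done
}

smb_firewall_rules 192.168.1.2 192.168.1.50
```

Review the printed rules against your existing firewall configuration before applying any of them, because rule order matters.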

Some versions of Windows require you to reboot at some point during this procedure, typically after installing the SMB/CIFS server software but before configuring shares.

Backing up with smbtar

Once you've configured a Windows system as a backup client (that is, a file server), you can try using smbtar on the backup server to perform backups. This command's basic syntax is:

smbtar [options] [filenames]

The smbtar command accepts quite a few options, but the most important are:

-s server
You pass the name of the file server (that is, the backup client) with this option.
-x share
You must tell smbtar what share to back up with this option. If you omit it, the program looks for a share called BACKUP.
-v
You can have smbtar provide more verbose information about its actions with this option.
-u username
When connecting with Windows NT/200x/XP servers, smbtar must pass a username to the file server with this parameter.
-p password
Unless the backup share requires no password (a risky configuration), you must deliver one with the -p parameter.
-a
Microsoft filesystems provide an archive bit, which is cleared when files have been backed up and set when they're modified. This can be helpful in performing incremental backups. If you use this option, smbtar clears the archive bit when backing up files.
-i
This option performs an incremental backup by backing up only those files on which the archive bit is set.
-N filename
This option implements a different type of incremental backup system, in which smbtar backs up only files that are newer than the specified filename, which is ordinarily a backup log file from the previous backup.
-t tape
You should pass a filename to smbtar with this option. The filename can be a tape device, such as /dev/st0, or a regular file.
-r
By default, smbtar backs up files from the remote share. This option reverses the process and causes the program to restore files.

As an example, consider this command:

$ smbtar -s GINGKO -x CDRIVE -u redwood -p Y#iWT7td -t /dev/st0

This command backs up the CDRIVE share on the GINGKO server, using the redwood account and the password Y#iWT7td, and storing the backup on /dev/st0 (a SCSI tape device). You may also add filenames to the end of the smbtar command line. Doing so backs up the specified files or directories without backing up other files.

Server-initiated backups using smbtar can certainly be convenient, particularly when you want to back up an entire network of computers from a central location. Typically, you'll write a script to back up one computer per night on a small network, or perhaps do several each night on a larger network. Of course, you'll need to ensure that the backup clients are turned on at the scheduled backup times. This backup method is also limited in the types of metadata it can handle. Because smbtar doesn't understand some of the more sophisticated NTFS features, such as ownership and multiple data streams, it might not be a suitable tool for performing complete backups of Windows NT/200x/XP systems. Nonetheless, smbtar may be adequate for backing up user datafiles on Windows workstations, and it can even perform full backups of Windows computers that run off of FAT filesystems.
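A nightly rotation script of the kind just described might start from a sketch like the following. The function name client_for_day is hypothetical, the host names other than GINGKO and MAIZE are invented, and PASSWORD is a placeholder. The script prints the smbtar command instead of running it, so it's safe to try as is.

```shell
#!/bin/sh
# Choose which backup client to handle tonight: one client per weeknight.
# Assumption: BIRCH, ASPEN, and LARCH are made-up host names.
client_for_day() {
    case $1 in          # $1 is the ISO day of week: 1 = Monday ... 7 = Sunday
        1) echo GINGKO ;;
        2) echo MAIZE ;;
        3) echo BIRCH ;;
        4) echo ASPEN ;;
        5) echo LARCH ;;
        *) echo "" ;;   # no backup on weekends
    esac
}

today=$(date +%u)
client=$(client_for_day "$today")
if [ -n "$client" ]; then
    # Printed rather than executed; PASSWORD stands in for the real password.
    echo "smbtar -s $client -x CDRIVE -u redwood -p PASSWORD -t /dev/st0"
fi
```

Called from cron each evening, a script built along these lines handles a different computer every weeknight. On a larger network, each case branch could list several hosts.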

Restoring Data with Samba

Restoring data over the network introduces an extra component in the equation: the backup client must be able to accept the data transfer. Precisely how this happens depends on how you backed up the data:

Client-initiated removable disk backups
When using removable disks as if they were ordinary file shares, files can be restored by inserting the original backup medium and using drag-and-drop operations to restore files. This practice requires no special extra configuration on the client or the server, except for full restores (as described shortly).
Client-initiated two-stage backups
When the backup server processes data in some way, such as bundling data into a tarball and storing it on tape, a restore operation can be tricky. You may need to extract the data to a special data-restore share on the backup server and then copy it to the client. Alternatively, you may be able to configure the backup client as for a server-initiated backup and use smbtar or similar tools to perform the restore.
Server-initiated backups
In a server-initiated backup scenario, restores can be done very much like the initial backups, but you must specify the restore option (-r) to smbtar to do the work. You must also ensure that the backup client's file server accepts full read/write access to the share, at least when the restore operation is in progress. (If you like, you can disable read/write access once the restore is done.)
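For the server-initiated case, the restore command mirrors the backup command with -r added. The following is a minimal sketch that only prints the command. The function name is hypothetical, PASSWORD is a placeholder, and the account details are borrowed from the earlier backup example.

```shell
#!/bin/sh
# Build the smbtar command line for a server-initiated restore.
# It uses the same options as the backup, plus -r to reverse the direction.
# Assumption: the function name and the PASSWORD placeholder are illustrative.
smbtar_restore_command() {
    server=$1
    share=$2
    user=$3
    tape=$4
    echo "smbtar -r -s $server -x $share -u $user -p PASSWORD -t $tape"
}

smbtar_restore_command GINGKO CDRIVE redwood /dev/st0
```

Before running the real command, confirm that the share on the backup client currently allows read/write access.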

With some backup methods, you can restore data without using a network. For instance, if you back up to an optical disc, and if the backup client has an optical reader that can read the backup, you can restore the data locally. In some cases, you can even move the backup drive to the backup client to perform a local restore without involving the network. This is most likely to be helpful when performing full restores.

With the exception of two-stage backups, partial network restores usually aren't much more work than similar restores would be on a local backup. The real trouble occurs when a full restore is necessary. With these, many of the same problems described earlier with reference to full local restores apply (see Section 14.2.4). The difference is that instead of having access to local backup software on the emergency system, you must have access to network tools: your SMB/CIFS client or server software.

Tip

When restoring a Windows system to a FAT disk, you can use a Linux emergency disk if your backup archive can be read by Linux. This usually works well, although there may be some minor changes to filename case. Also, short filenames are occasionally restored differently by Linux than by Windows, which can sometimes cause problems if configuration files refer to files by their short filenames.

Once you've restored data to a Windows system, you may need to take special steps to ensure that it's bootable. For Windows 9x/Me, you can do this from an emergency boot floppy created from the same version of the OS. Boot from the floppy, and use the FDISK program to mark the boot partition as bootable. You should then type SYS C: to restore a boot loader to the partition's boot sector. With Windows NT/200x/XP, boot from an emergency disk or the Windows installation CD, and select the system repair options. These should detect the lack of boot sectors and correct the problem.

Backing Up with AMANDA

Samba can be an effective part of a network backup solution, but it has its limitations. Most importantly, it can be difficult to schedule backups, particularly on larger networks; you must add each machine individually to a network backup schedule. One solution to this problem is AMANDA, which was designed to automate the tape backup process as much as possible, while also providing tools to simplify the restore process. AMANDA serves as a "wrapper" around several other tools, and as such requires extra configuration. Once it's configured, though, AMANDA simplifies the day-to-day administration of a backup plan.

To begin using AMANDA, you should first understand its principles of operation: what can it do and how does it do it? Three types of configuration are then relevant: the AMANDA backup server, Linux backup clients, and Windows backup clients. Once you've configured all your systems, you can proceed to using AMANDA for both backups and restores.

AMANDA Principles

AMANDA was designed as a network-centric backup solution, in the sense that it's designed to treat a network as a single entity that's to be backed up. This contrasts with tar or even smbtar, which treat backups on a computer-by-computer basis. Of course, you must still tell AMANDA about the individual computers that are to be backed up, but you needn't be concerned with details such as scheduling when each system is backed up. Instead, let AMANDA work out those details, based on information you provide it concerning how often you want to complete a backup and what your network bandwidth is. Of course, you must ensure that backup clients are accessible to the backup server at the scheduled times. Because you may not know what those times are, it's best to make the backup clients accessible at all times.

AMANDA performs backups using two types of network protocols: its own protocol and SMB/CIFS. AMANDA uses its own protocol to back up other Linux or Unix systems; these systems run tar or dump locally and transfer data to the AMANDA server. For Windows systems, AMANDA uses smbclient to transfer data using SMB/CIFS. In both cases, the backup clients must run server software and respond as servers. The AMANDA backup server, though, also runs server software, for the benefit of client-initiated restores. This configuration means that AMANDA can be trickier to configure than most backup server systems. Once configured, though, the backup procedure can be highly automated, and partial restores can be simpler, as well.

Warning

AMANDA hardcodes some values in its executables. Thus, mixing AMANDA client and server packages for different Linux distributions may not work very well. If your site has multiple Linux distributions, or Linux and other Unix-like systems, you may need to compile AMANDA locally to get these systems to interoperate. Pay particular attention to the --with-user and --with-group options, which set the AMANDA user and group. In theory, a low-privilege backup user should work, but in practice, you may need to run it as root to back up all files on the backup clients. This isn't a concern for networks with a Linux AMANDA backup server and Windows backup clients; because the Windows backup clients run SMB/CIFS servers rather than AMANDA servers, no special coordination is necessary.

AMANDA's normal mode of operation is to first copy data from the backup client to a holding area on the backup server's hard disk and then copy this data to the backup tape. (AMANDA was designed with tape backups in mind and can't be used with other backup media.) AMANDA therefore works best with a large local hard disk, or at least something that's large enough to hold a substantial chunk of a day's backup. If your local hard disk is smaller than this, AMANDA will perform the backup in bursts, pulling as much data as it can from the client, backing it up to tape, pulling more from the client, and so on. This process is likely to be less efficient than retrieving a full backup and then spooling it all to tape.

Configuring an AMANDA Server

The bulk of the effort in AMANDA configuration is on the backup server side. Tasks include running the server programs for client-initiated restores, setting general AMANDA options, preparing tapes, and defining backup sets.

AMANDA server programs

The AMANDA backup server computer doesn't need to run any server programs for ordinary backup operations, but it does need to run two servers to handle client-initiated restores. They're registered under the service names amandaidx and amidxtape, and the corresponding programs are amindexd and amidxtaped. These programs are typically run from a super server (inetd or xinetd). If your distribution uses xinetd, and you install AMANDA from a package provided by your distribution, it may include one to three files in /etc/xinetd.d to handle the servers: the two servers for the backup server system and the server for the backup clients. (This third server is described in Section 14.4.3.) If these files aren't present, you can create one or two files to do the job. These files should contain entries like these:

service amandaidx
{
   socket_type  = stream
   protocol     = tcp
   wait         = no
   user         = amanda
   group        = disk
   server       = /usr/lib/amanda/amindexd
   disable      = no
}

service amidxtape
{
   socket_type  = stream
   protocol     = tcp
   wait         = no
   user         = amanda
   group        = disk
   server       = /usr/lib/amanda/amidxtaped
   disable      = no
}

These entries tell xinetd to handle the servers. You may need to adjust some items for your system; pay particular attention to the user and group entries, which should match the values used when the servers were compiled. (Consult your binary package's distribution if you installed a binary package.) You might also need to adjust the path to the server. If your package includes xinetd configuration files, you shouldn't need to adjust these features, but you may need to change the disable lines, as these usually ship set to yes, which disables the servers.
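To spot services that still ship disabled, you could scan the xinetd configuration files with a sketch like the following. The function name is hypothetical, and it assumes that the files in /etc/xinetd.d are named after the services (amandaidx, amidxtape, and amanda); packages may use other names.

```shell
#!/bin/sh
# List AMANDA xinetd service files that still contain "disable = yes".
# Assumption: files are named after the services; adjust the names if your
# package uses different ones. The directory defaults to /etc/xinetd.d.
disabled_amanda_services() {
    dir=${1:-/etc/xinetd.d}
    for name in amandaidx amidxtape amanda; do
        file=$dir/$name
        [ -f "$file" ] || continue      # skip services with no file
        if grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$file"; then
            echo "$name"
        fi
    done
}

disabled_amanda_services
```

Any name the function prints belongs to a file in which you should change the disable line to no.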

Warning

The user who runs AMANDA on the backup server must have read/write access to the backup device files.

If your distribution uses inetd rather than xinetd, you must create entries in /etc/inetd.conf to handle these two servers. Give the full path to each server program, adjusting it to match your installation:

amandaidx  stream  tcp  nowait  amanda.disk  /usr/lib/amanda/amindexd    amindexd
amidxtape  stream  tcp  nowait  amanda.disk  /usr/lib/amanda/amidxtaped  amidxtaped

In addition to the inetd or xinetd configuration files themselves, you should check your /etc/services file to be sure that port numbers are registered under the names used in your super server registration:

amandaidx  10082/tcp
amidxtape  10083/tcp

Once you've made these changes, restart or reload your super server. You can typically do this using a SysV startup script by typing /etc/init.d/xinetd restart or something similar. Consult your distribution's documentation if you have problems.
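Before restarting the super server, you can confirm that the port registrations are present. The following sketch reports the entries that are missing from a services file. The function name is hypothetical; the service names and port numbers are the ones listed above.

```shell
#!/bin/sh
# Report which AMANDA backup-server entries are missing from a services file.
# Assumption: the function name is illustrative; the file defaults to
# /etc/services, and the names and ports are those given in the text.
missing_amanda_services() {
    services=${1:-/etc/services}
    for entry in amandaidx:10082/tcp amidxtape:10083/tcp; do
        name=${entry%%:*}   # text before the colon
        port=${entry#*:}    # text after the colon
        grep -Eq "^$name[[:space:]]+$port([[:space:]]|\$)" "$services" 2>/dev/null ||
            echo "$name $port"
    done
}

missing_amanda_services
```

If the function prints nothing, both entries are already registered.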

Setting AMANDA options

AMANDA uses two main configuration files, each stored under /etc/amanda or subdirectories of that directory:

amanda.conf
This file holds the main AMANDA configuration options, which apply to the whole site.
disklist
This file specifies the computers that are to be backed up and the partitions on those computers that you want to back up. It's covered in more detail in Section 14.4.2.4.

In theory, these files can reside in /etc/amanda, or sometimes in /etc, /usr/local/etc, or a similar location. In practice, it's common to define multiple sets of configuration files, each of which resides in a subdirectory named after its purpose. For instance, you might use a directory called /etc/amanda/daily for daily backups and /etc/amanda/archive for long-term archival backups. You can then perform radically different types of backups by running AMANDA with appropriate options to use the configuration files you specify. Many AMANDA configurations provide a sample amanda.conf file in the /etc/amanda/example directory. You can copy this file to a new directory you create and modify it to suit your purpose.

Most amanda.conf options consist of a keyword followed by one or more values, such as netusage 800 Kbps to tell AMANDA that it may use up to 800 Kbps of network bandwidth. Some configuration options, though, require multiple lines. These use an opening curly brace ({) to mark the beginning of the block of lines that apply to an option and a closing curly brace (}) to mark the end of the block.

You can leave most of the options alone in a typical example configuration file. Here are some of the options you might need to adjust:

org
This option sets a name that appears in reports, so it's not critical for basic functioning, but you might as well set it.
mailto
Specify usernames or email addresses using this option, and AMANDA will send reports on its activities to those addresses.
dumpuser
AMANDA runs backups as the user you specify with this option. If it's unspecified, it uses a compile-time option that's specified via the --with-user option when building the program.
netusage
This option specifies the maximum amount of network bandwidth that AMANDA can expect to have available to it.
dumpcycle
You tell AMANDA how long you want a full network backup to take with this option. Typically, you specify a value in days or weeks, such as 5 days or 2 weeks.
runspercycle
This option sets the number of times that AMANDA expects its amdump program, which does most of the real work, to run in each dump cycle. Setting this value equal to the number of days in dumpcycle results in an expectation of one run per day, while setting it to a higher or lower value results in multiple runs per day or fewer than one run per day. (The amdump program is actually run by cron; this option just tells AMANDA what to expect for planning purposes.)
tapecycle
This option specifies the number of tapes used in a dump cycle. Ordinarily, it's the same as runspercycle plus a few for error handling; in case a tape goes bad and can't be used, you want AMANDA to be able to recover relatively gracefully.
runtapes
You can tell AMANDA to use multiple tapes per run by specifying the number with this option. The default value is 1, which is usually desirable.
tapedev
You tell AMANDA what tape device to use with this option. AMANDA expects to use nonrewinding tape devices, so be sure to point to one.
tapetype
To plan its backups, AMANDA must know several key things about your tape backup device. You therefore specify the tape type with this option, which refers to definitions that appear later in the amanda.conf file. (Search for define tapetype to find this list.) If you don't see your tape device in the list, you'll need to either locate one on the Internet (check the AMANDA home page, and click the TapeType link) or generate one yourself. To do the latter, you'll need the tapetype utility, which comes with the AMANDA source code but isn't built by default. Type make tapetype in the source code directory to build it. You should then insert a tape that holds no important data and type ./tapetype -f /dev/device to test the tape accessible from /dev/device. This operation erases all data on the tape and will probably take several hours. If your tape device supports hardware compression, you may be able to increase the reported tape length by the compression ratio (typically about 2), but if you then try to back up data that's not easily compressed, AMANDA may run out of space on the tape, which will cause problems.
labelstr
When preparing tapes, as described in the next section, you give each tape a name. You must provide a regular expression describing the form of this label; AMANDA will use only tapes that match this label. This helps prevent accidental erasure of tapes if you insert the wrong one in the tape drive.
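The relationship among dumpcycle, runspercycle, and tapecycle is simple arithmetic, which the following sketch works through. The function name plan_cycle and the figure of three spare tapes are assumptions chosen for illustration, not recommendations.

```shell
#!/bin/sh
# Derive runspercycle and tapecycle from a planned schedule.
# Assumption: the function name and the number of spare tapes are
# illustrative; pick spares to suit your own tolerance for bad tapes.
plan_cycle() {
    weeks=$1           # length of dumpcycle, in weeks
    runs_per_week=$2   # how many times cron starts amdump each week
    spare_tapes=$3     # extra tapes kept for error handling
    runspercycle=$((weeks * runs_per_week))
    tapecycle=$((runspercycle + spare_tapes))
    echo "dumpcycle $weeks weeks"
    echo "runspercycle $runspercycle"
    echo "tapecycle $tapecycle tapes"
}

# Two-week cycle, one run per weeknight, three spare tapes.
plan_cycle 2 5 3
```

With these inputs, the sketch yields 10 runs per cycle and a tape cycle of 13 tapes.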

Another important option is the description of holding disks. You can define one or more holding areas, and each definition spans multiple lines, as in:

holdingdisk hd1 {
  comment "primary holding area"
  directory "/var/spool/amanda"
  use -500 MB
  chunksize 2000 MB
}

The comment is a comment for human use, and the directory specifies the location of the holding area. The use line is optional; when it's present, it specifies how much space may be used in this area. A negative use value tells AMANDA how much disk space to leave free; this example causes AMANDA to leave at least 500 MB available. The chunksize line is also optional, and it specifies the maximum size of individual files that are temporarily stored in the holding area. This feature can be useful on some older filesystems or 2.2.x kernels, which have file size limits of about 2 GB. A negative chunksize value tells AMANDA to attempt to pass files larger than the absolute value of the specified size directly to the tape device, which saves disk space but may result in slower operation, depending on your hardware.

Preparing tapes

AMANDA labels every tape that it uses, then keeps track of the tapes during the backup process. This arrangement enables AMANDA to tell you precisely what tape to insert in the drive when performing restores. To do any good, though, you must first label all the tapes you'll use for a backup set. To do this, use the amlabel command:

$ amlabel daily DailySet107

You must run this command as the user who will perform the backup. It takes the name of the backup configuration (that is, a subdirectory name within /etc/amanda) and a label as options. In this example, the label is DailySet107. This label must match the regular expression specified on the labelstr line in amanda.conf, or AMANDA won't be able to use the tape.
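You can check a proposed label against your labelstr pattern before labeling a batch of tapes. The following sketch uses grep -E for the comparison. The function name and the pattern are examples only, and AMANDA's own regular expression handling may differ in detail from grep's.

```shell
#!/bin/sh
# Check a proposed tape label against a labelstr-style regular expression.
# Assumption: the pattern is an example, not taken from any shipped
# amanda.conf, and grep -E only approximates AMANDA's own matching.
label_matches() {
    label=$1
    pattern=$2
    printf '%s\n' "$label" | grep -Eq "$pattern"
}

if label_matches DailySet107 '^DailySet1[0-9][0-9]*$'; then
    echo "label OK"
else
    echo "label would be rejected"
fi
```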

Defining dump types and backup sets

In order to accommodate different computers' backup needs, AMANDA provides a number of dump types near the end of the amanda.conf file. These dump types are specified with the define dumptype option, as in:

define dumptype comp-user {
    global
    comment "Non-root partitions on reasonably fast machines"
    compress client fast
    program "GNUTAR"
    priority medium
}

Each named dump type is referenced in the disklist file to set assorted backup options, each of which appears on its own line within the dump type definition. Some of the options you might want to set include:

compress
This option sets two things: where compression occurs (on the client or on the server) and the compression's speed/quality tradeoff (best, fast, or none).
exclude
You can exclude individual files from backup using this option. Alternatively, exclude list enables you to pass a list of filenames that AMANDA will exclude from backup. AMANDA excludes no files by default.
holdingdisk
Pass yes or no to this option to tell AMANDA whether to use a holding disk. The default value is yes.
index
Pass yes or no to this option to tell AMANDA whether to store an index of files that are backed up. You might want to omit the index on disks that are likely to be restored only in a full restore as a measure for saving disk space. The default value is yes.
kencrypt
This option takes yes and no values, and controls whether AMANDA uses Kerberos encryption. Setting it to yes requires that your network use Kerberos, as described in Chapter 9. The default value is no.
program
Pass "DUMP" (including the quote marks) as this option's parameter to have AMANDA use dump for backups on the remote system; pass "GNUTAR" to have it use tar. Given the limitations of dump, routinely using program "GNUTAR" is often wise. The default is "DUMP" for AMANDA backup clients; only "GNUTAR" is valid for Samba clients, so that's the default for them.
skip-incr
If this option is yes (the default is no), disks that use this dump type are skipped when performing incremental backups.
priority
This option accepts values of low, medium, and high, which describe the importance of the disk for the backup. In case of errors or insufficient disk space, disks with higher priorities are saved, at least in the holding area, in preference to disks with lower priorities. The default value is medium.

These and more options are described in comments in the amanda.conf file, so if you'd like to achieve some effect not described here, check that file's comments. The example configuration file includes many dump types, so chances are you can use those that are provided. Peruse them to learn more. You can then create a disklist file, which specifies the backup client computers, the directories you want to back up, and the dump types you want to use for each directory:

# Be sure to back up the backup server
buserver.example.org /      root-tar
buserver.example.org /var   root-tar
buserver.example.org /hold  holding-disk

# Back up a Linux client
buclient.example.org /      root-tar
buclient.example.org /home  user-tar

# Back up a Windows client
buserver.example.org //GINGKO/DRIVEC  user-tar

Tip

The first set of entries in this example configuration backs up the backup server system. This means that the backup server must be configured as a backup client (as described in the next section), as well as being configured as a backup server.

For Linux or other Unix-like systems that run AMANDA software, you specify the hostname, a directory name, and a dump type. For Windows backup clients, you specify the backup server as the hostname and provide a hostname and share name in //HOST/SHARE format instead of a directory specification. AMANDA then uses Samba's smbclient to transfer the files. You must also create a password file, /etc/amandapass, which holds share names along with usernames and passwords:

//GINGKO/DRIVEC  mypassword
//MAIZE/DRIVED   buuser%bupassword

This example sets a password alone for the DRIVEC share on GINGKO, and a username and password for the DRIVED share on MAIZE. Because this file contains unencrypted passwords, you should ensure that it's readable only to the backup user (and root, if the two aren't the same).
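Because every Windows share in disklist needs a matching line in the password file, a quick cross-check can catch omissions. The following sketch prints the shares that have no password entry. The function name is hypothetical, and both file paths are passed as arguments rather than assumed.

```shell
#!/bin/sh
# Print each //HOST/SHARE entry in a disklist file that has no matching
# first field in the password file.
# Assumption: the function name is illustrative. disklist lines have the
# form "host disk dumptype", and a disk beginning with // is an SMB share.
shares_without_password() {
    disklist=$1
    passfile=$2
    awk '!/^[[:space:]]*#/ && $2 ~ /^\/\// { print $2 }' "$disklist" |
    while read -r share; do
        awk -v s="$share" '$1 == s { found = 1 } END { exit !found }' "$passfile" ||
            echo "$share"
    done
}

# Typical use, with the file locations described in the text:
#   shares_without_password /etc/amanda/daily/disklist /etc/amandapass
# Restricting the password file to its owner:
#   chmod 600 /etc/amandapass
```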

At this point, AMANDA is configured on the backup server; however, you must still configure it on any Linux clients and prepare Windows systems. Once this is done, you can actually begin using AMANDA for backups and restores.

Linux AMANDA Client Configuration

Linux AMANDA backup clients run a server program called amandad, which responds to commands from the backup server system. The amandad program is normally run from a super server. If you installed AMANDA from a distribution's package on a distribution that uses xinetd, it may have installed a file called /etc/xinetd.d/amanda to handle this server. If you use xinetd, and this file isn't present, you'll have to create it:

service amanda
{
   socket_type  = dgram
   protocol     = udp
   wait         = yes
   user         = amanda
   group        = disk
   server       = /usr/lib/amanda/amandad
   disable      = no
}

As with the servers that are run on the AMANDA backup server computer, this one may need modification for your system. In particular, the user and group items may need adjustment. Be sure the specified user and group exist and have the necessary permissions to access the files you want backed up on the system. In practice, it's sometimes necessary to run the server as root, particularly if you want to back up files that only root may read. Even if your distribution provides a file to handle this server, you should check it and set disable = no; the default usually sets this value at yes, disabling the server.

If you use inetd as your super server, you must create an /etc/inetd.conf entry for amandad, giving the full path to the server program (adjust it to match your installation):

amanda  dgram  udp  wait  amanda.disk  /usr/lib/amanda/amandad  amandad

Warning

The server run on the AMANDA backup client, like all servers, is a potential security risk, particularly if it's run as root. A miscreant who manages to access the server can read files from the computer, potentially including sensitive files such as password databases. Be sure the port used by the server (UDP port 10080) is well protected by firewall rules.

You must also ensure that the server's port is defined in /etc/services:

amanda  10080/udp

As a security measure, amandad uses an authorization file, .amandahosts, which is located in the home directory of the user who runs the server. This file contains the hostname of the backup server and the username of the user who runs the backup software on that system:

buserver.example.org amanda

The amandad server refuses to interact with amandad clients (that is, backup server systems) other than the one specified in this file. AMANDA doesn't use passwords for authentication, though.
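When a backup client rejects the backup server, the authorization file is a good first thing to check. The following sketch tests whether a given host and user pair appears in such a file. The function name is hypothetical, and the match is a plain comparison of the first two fields.

```shell
#!/bin/sh
# Succeed if an .amandahosts-style file lists the given host and user.
# Assumption: the function name is illustrative, and each line has the
# form "hostname username" as in the example in the text.
host_is_authorized() {
    file=$1
    host=$2
    user=$3
    awk -v h="$host" -v u="$user" \
        '$1 == h && $2 == u { ok = 1 } END { exit !ok }' "$file"
}

# Typical use on a backup client, run as the user who runs amandad:
#   host_is_authorized "$HOME/.amandahosts" buserver.example.org amanda \
#       && echo "backup server is authorized"
```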

Once all these features are set up, you should restart your super server. On most distributions, this can be done using SysV startup scripts, as in /etc/init.d/xinetd restart. Consult distribution-specific documentation for details.

Windows AMANDA Client Configuration

Because AMANDA uses SMB/CIFS to back up Windows systems, you needn't install or configure any special AMANDA software on these systems. Instead, configure them as you would for an SMB/CIFS backup using smbtar, as described earlier. Be sure to set the password for the backup user or share to the value you've set in the AMANDA backup server's /etc/amandapass file.

Backing Up and Restoring Data with AMANDA

To run a backup via AMANDA, use the amdump command. This command has the following syntax:

amdump config [ host [ disk ] ]

Normally, you just pass it a config name, which should match one of the subdirectory names in /etc/amanda. The amdump program then performs part of a network backup. The tool scans your configuration files to determine how many systems and disks it should back up over the course of a dump cycle. It can then perform an appropriate fraction of the full backup, the assumption being that the run you perform with this command is a regularly scheduled one. Of course, you must insert one of the tapes you prepared for this backup configuration in the tape drive before you issue this backup command.

Normally, you call amdump from cron. For instance, you might use a crontab entry like this to run amdump with the daily configuration once every weeknight:

00 21 * * 1-5 /usr/sbin/amdump daily

You enter this line in the ~/crontab file for the user who you want to perform the backup, then type crontab -u user /home/user/crontab as root, where user is the username in question. The result is that cron will run amdump at 21:00 (9:00 P.M.) every weekday (1-5 in the final date field corresponds to Monday through Friday). Depending on your network bandwidth, tape capacity, and so on, each run can take anywhere from a few minutes to several hours to complete. After each run, AMANDA will email a report of its activities to the address specified with the mailto option in amanda.conf, so you can use that information to verify AMANDA's correct operation.
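The crontab fields can be assembled as in the following sketch, which also passes the configuration name that amdump's syntax calls for. The function name is hypothetical, and daily is simply the example configuration name used for amlabel earlier.

```shell
#!/bin/sh
# Compose a crontab line that runs amdump for one configuration.
# Field order: minute, hour, day of month, month, day of week, command.
# Assumption: the function name is illustrative; /usr/sbin/amdump is the
# path used in the text and may differ on your system.
amdump_cron_line() {
    minute=$1
    hour=$2
    weekdays=$3
    config=$4
    printf '%s %s * * %s /usr/sbin/amdump %s\n' "$minute" "$hour" "$weekdays" "$config"
}

amdump_cron_line 00 21 1-5 daily
```

You could redirect the output into the crontab file for the backup user and then install that file with crontab, as described above.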

Restoring from an AMANDA backup requires special tools on the backup client. (For Windows backup clients, though, you perform these steps on the backup server system.) In particular, the amrecover tool enables you to browse the backup database maintained by the backup server. This tool presents its own amrecover> prompt and accepts commands you type. You can select files to recover and then extract them all from the backup with a single command. Specific commands you're likely to use include:

sethost hostname
Sets the name of the computer whose files you want to restore. The default is the local host.
setdisk diskname
Sets the name of the disk on which the files you want to restore were originally held. It must match a name set in disklist.
listdisk [diskdevice]
Lists the disks that were backed up for the current host.
setmode mode
Tells amrecover how to extract files for SMB/CIFS operations. Setting mode to smb causes shares to be extracted directly to the SMB/CIFS backup client computer; setting mode to tar causes files to be extracted to the local system.
mode
Displays the mode for extracting SMB/CIFS shares.
add items
Adds the specified items (files or directories) to a restore set.
extract
Begins the extraction process. To do any good, you must have added files to the restore set first. The tool prompts you to insert particular backup tapes, then recovers the data from those tapes.

In addition to these commands, amrecover accepts several more. Some of these, such as cd and ls, are similar to commands in bash or other common Linux shells; they enable you to move around the directories in the backup set and view files. Consult the amrecover manpage for more information.
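A typical restore uses these commands in a fixed order: pick the host, pick the disk, add items, then extract. The following sketch prints that sequence so that you can review it before typing it at the amrecover> prompt. The function name and the file paths in the sample call are invented for illustration.

```shell
#!/bin/sh
# Print the amrecover commands needed to restore some items from one disk.
# Assumption: the function name and the sample paths are hypothetical; the
# commands themselves are the ones described in the list above.
recover_commands() {
    host=$1
    disk=$2
    shift 2
    echo "sethost $host"
    echo "setdisk $disk"
    for item in "$@"; do
        echo "add $item"       # one add command per file or directory
    done
    echo "extract"
}

recover_commands buclient.example.org /home alice/report.txt alice/projects
```

The disk name must match the one in disklist, and the added items are given relative to that disk.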

As with local restores using tar or other tools, restores using amrecover are simplest if the systems involved are in more-or-less functional condition. To perform a full restore, you must have an emergency system working, as described in Section 14.2.4. This system must have a working version of the AMANDA backup client software running.

Summary

Backups are extremely useful insurance in case of hardware failure, major filesystem problems, system intrusion, or even user error. When these problems crop up, a backup can speed recovery of a working system. Unfortunately, backing up an entire network can be a tedious proposition. Fortunately, tools such as Samba and AMANDA can help simplify this process. Although they can take some time to set up, once they're configured, their day-to-day use is relatively straightforward, and they can pay off quite handsomely when problems occur that require data recovery from the backup.
