simplesnap [--sshcmd COMMAND] [--local] --store STORE --setname NAME --host HOST
simplesnap --check TIMEFRAME --store STORE --setname NAME [--host HOST]
simplesnap is a simple way to send ZFS snapshots across a network. Although it can serve many purposes, its primary goal is to manage backups from one ZFS filesystem to a backup filesystem also running ZFS, using incremental backups to minimize network traffic and disk usage.
simplesnap is FLEXIBLE; it is designed to complement snapshotting tools, permitting rotating backups with arbitrary retention periods. It lets multiple machines back up a single target, lets one machine back up multiple targets, and keeps it all straight.
simplesnap is EASY; there is no configuration file needed. One ZFS property is available to exclude datasets/filesystems. ZFS datasets are automatically discovered on machines being backed up.
simplesnap is SAFE; it is robust in the face of interrupted transfers, and needs little help to keep running.
simplesnap is SECURE; unlike many similar tools, it does not require full root access to the machines being backed up. It runs only a small wrapper as root, and the wrapper has only three commands it implements.
Besides the above, simplesnap:
Does one thing and does it well. It is designed to be used with a snapshot auto-rotator on both ends (such as zfSnap). simplesnap will transfer snapshots made by other tools, but will not destroy them on either end.
Requires ssh public key authorization to the host being backed up, but does not require permission to run arbitrary commands. It has a wrapper to run on the host being backed up, written in bash, which accepts only three operations and performs them simply. It is suitable for a locked-down authorized_keys file.
Creates minimal snapshots for its own internal purposes, generally leaving no more than 1 or 2 per dataset, and reaps them automatically without touching others.
Is a small program, easily audited. In fact, most of the code is devoted to sanity-checking, security, and error checking.
Automatically discovers what datasets to back up from the remote. Uses a user-defined zfs property to exclude filesystems that should not be backed up.
Logs copiously to syslog on all hosts involved in backups.
Intelligently supports a single machine being backed up by multiple backup hosts, or onto multiple sets of backup media (when, for instance, backup media is cycled into offsite storage).
simplesnap's operation is very simple.
The simplesnap program runs on the machine that stores the backups -- we'll call it the backuphost. There is a restricted remote command wrapper called simplesnapwrap that runs on the machine being backed up -- we'll call it the activehost. simplesnapwrap is never invoked directly by the end-user; it is always called remotely by simplesnap.
With simplesnap, the backuphost always connects to the activehost -- never the other way round.
simplesnap runs on the backuphost. It first connects to simplesnapwrap on the activehost and asks it for a list of the ZFS datasets (the "listfs" operation). simplesnapwrap responds with a list of all ZFS datasets that were not flagged for exclusion.
Next, simplesnap connects back to simplesnapwrap once for each dataset to be backed up -- the "sendback" operation. simplesnap passes along to it only two things: the setname and the dataset (filesystem) name.
simplesnapwrap looks to see if there is an existing simplesnap snapshot corresponding to that SETNAME. If not, it creates one and sends it as a full, non-incremental backup. That completes the job for that dataset.
If there is an existing snapshot for that SETNAME, simplesnapwrap creates a new one, constructing the snapshot name containing a timestamp and the SETNAME, then sends an incremental, using the oldest snapshot from that setname as the basis for zfs send -I.
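The choice between a full and an incremental send described above can be sketched as a small shell function. This is only an illustration, not the code of simplesnapwrap: the function name plan_send and the __simplesnap_SETNAME_ snapshot prefix are assumptions made for the example, and the function prints the zfs send command instead of running it.

```shell
# Hypothetical sketch; the snapshot naming pattern is an assumption.
#
# plan_send DATASET SETNAME NEWSNAP
# Reads the existing snapshot names of DATASET on stdin (one per line,
# oldest first) and prints the zfs send command that would be used.
plan_send() {
    dataset=$1
    setname=$2
    newsnap=$3
    # Oldest existing snapshot that belongs to this set, if any.
    oldest=$(grep -F "@__simplesnap_${setname}_" | head -n 1 || true)
    if [ -z "$oldest" ]; then
        # No snapshot for this SETNAME yet: full, non-incremental send.
        echo "zfs send ${dataset}@${newsnap}"
    else
        # Incremental send, including intermediate snapshots (-I).
        echo "zfs send -I ${oldest} ${dataset}@${newsnap}"
    fi
}
```

On a real system the input could come from something like zfs list -H -t snapshot -o name -s creation -d 1 DATASET.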
After the backuphost has observed zfs receive exiting without error, it contacts simplesnapwrap once more and requests the "reap" operation. This cleans up the old snapshots for the given SETNAME, leaving only the most recent. Reaping is a separate operation in simplesnapwrap so that an interrupted transmission does no lasting harm: the old snapshots are still in place, and because zfs receive -F is used, the data will come across on the next run.
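The selection made by the reap step can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the actual wrapper code: reap_candidates is a made-up name, the snapshot prefix is assumed, and the function only lists snapshots; it destroys nothing.

```shell
# Hypothetical sketch; the snapshot naming pattern is an assumption.
#
# reap_candidates SETNAME
# Reads snapshot names on stdin (one per line, oldest first) and prints
# the snapshots of the given set that are safe to remove: every one
# except the newest, which stays as the base for the next incremental.
reap_candidates() {
    setname=$1
    # Keep only this set's snapshots, then drop the last (newest) line.
    { grep -F "@__simplesnap_${setname}_" || true; } | sed '$d'
}
```

Snapshots made by other tools never match the prefix, so they are never listed.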
The idea is that some system like zfSnap will be used on both ends to make periodic snapshots and clean them up. One can give zfSnap a different snapshot prefix on each activehost, and then implement custom cleanup rules with -F on the backuphost.
This section will describe how a first-time simplesnap user can get up and running quickly. It assumes you already have simplesnap installed and working on your system. If not, please follow the instructions in the INSTALL.txt file in the source distribution.
As above, I will refer to the machine storing the backups as the "backuphost" and the machine being backed up as the "activehost".
First, on the backuphost, as root, generate an ssh keypair that will be used exclusively for simplesnap.
ssh-keygen -t rsa -f ~/.ssh/id_rsa_simplesnap
When prompted for a passphrase, leave it empty.
Now, on the activehost, edit or create a file called ~/.ssh/authorized_keys. Initialize it with the content of ~/.ssh/id_rsa_simplesnap.pub from the backuphost. (Or, add to the end, if you already have lines in the file.) Then, at the beginning of that one very long line, add text like this:
command="/usr/sbin/simplesnapwrap",from="1.2.3.4",no-port-forwarding,no-X11-forwarding,no-pty
(This must all be on a single line in your file, with no spaces between the options.)
The 1.2.3.4 is the IP address that connections from the backuphost will appear to come from. It may be omitted if the IP is not static, but it affords a little extra security. The line will wind up looking like:
command="/usr/sbin/simplesnapwrap",from="1.2.3.4",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAA....
(Again, this should all be on one huge line.)
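To avoid hand-editing mistakes, the line can also be assembled by a short shell function. This is a sketch, not part of simplesnap: the name make_authkeys_line is made up, and the wrapper path is the one used in the example above, which may differ on your system.

```shell
# Hypothetical helper, not shipped with simplesnap.
#
# make_authkeys_line PUBKEY_FILE [FROM_IP]
# Prints one authorized_keys line that restricts the key in PUBKEY_FILE
# to running simplesnapwrap. FROM_IP is optional.
make_authkeys_line() {
    pubkey_file=$1
    from_ip=${2:-}
    opts='command="/usr/sbin/simplesnapwrap"'
    if [ -n "$from_ip" ]; then
        opts="${opts},from=\"${from_ip}\""
    fi
    opts="${opts},no-port-forwarding,no-X11-forwarding,no-pty"
    # Options and key are separated by exactly one space.
    printf '%s %s\n' "$opts" "$(cat "$pubkey_file")"
}

# Usage, after copying the public key to the activehost:
#   make_authkeys_line id_rsa_simplesnap.pub 1.2.3.4 >> ~/.ssh/authorized_keys
```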
If there are any ZFS datasets you do not want to be backed up, set the org.complete.simplesnap:exclude property to on for them on the activehost. For instance:
zfs set org.complete.simplesnap:exclude=on tank/junkdata
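To preview which datasets remain eligible for backup, the output of zfs get can be filtered. The function below is a sketch: included_datasets is a made-up name, and treating only the literal value on as "excluded" is an assumption about how the property is interpreted.

```shell
# Hypothetical helper, not shipped with simplesnap.
#
# included_datasets
# Reads "name<TAB>value" lines on stdin, as printed by
#   zfs get -H -o name,value -t filesystem,volume org.complete.simplesnap:exclude
# and prints the names whose exclude property is not set to "on".
included_datasets() {
    awk -F '\t' '$2 != "on" { print $1 }'
}
```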
Now, back on the backuphost, you should be able to run:
ssh -i ~/.ssh/id_rsa_simplesnap serverhost
Say yes when asked if you want to add the key to the known_hosts file. At this point, you should see output containing:
"simplesnapwrap: This program is to be run from ssh."
If you see that, then simplesnapwrap was properly invoked remotely.
Now, create a ZFS filesystem to hold your backups. For instance:
zfs create tank/simplesnap
Now, you can run the backup:
simplesnap --host serverhost --setname mainset --store tank/simplesnap --sshcmd "ssh -i /root/.ssh/id_rsa_simplesnap"
You can monitor progress in /var/log/syslog. If all goes well, you will see filesystems start to be populated under tank/simplesnap/host.
Simple!
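For more than one activehost, the same command can be wrapped in a loop, for example from a cron job. The following is a sketch under assumptions: backup_hosts and the SIMPLESNAP variable are made-up names, and the key path is the one generated earlier in this section.

```shell
# Hypothetical wrapper, not shipped with simplesnap.
# SIMPLESNAP may be overridden, e.g. to point at a full path.
SIMPLESNAP=${SIMPLESNAP:-simplesnap}

# backup_hosts STORE SETNAME HOST...
# Runs one backup per host and keeps going if a host fails.
# Returns nonzero if any host failed.
backup_hosts() {
    store=$1
    setname=$2
    shift 2
    failed=0
    for host in "$@"; do
        "$SIMPLESNAP" --host "$host" --setname "$setname" --store "$store" \
            --sshcmd "ssh -i /root/.ssh/id_rsa_simplesnap" || failed=1
    done
    return "$failed"
}

# Usage (host names are examples):
#   backup_hosts tank/simplesnap mainset serverhost otherhost
```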
Most people will always use the same SETNAME. The SETNAME is used to track and name the snapshots on the remote end. simplesnap tries to always leave one snapshot on the remote, to serve as the base for a future incremental.
In some situations, you may have multiple bases for incrementals. The two primary examples are two different backup servers backing up the same machine, or having two sets of backup media and rotating them to offsite storage. In these situations, you will have to keep different snapshots on the activehost for the different backups, since they will be current to different points in time.
All simplesnap options begin with two dashes (`--'). Most take a parameter, which is to be separated from the option by a space. The equals sign is not a valid separator for simplesnap.
The normal simplesnap mode is backing up. An alternative check mode is available, which requires fewer parameters. This mode is described below.
--check TIMEFRAME
Do not back up, but check existing backups. If the newest backup of any dataset is older than TIMEFRAME, print an error and exit with a nonzero code. Scans all hosts unless a specific host is given with --host. The parameter is in the format given to GNU date(1); for instance, --check "30 days ago". Remember to enclose it in quotes if it contains spaces.
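Because check mode reports stale backups through its exit code, it is easy to call from a monitoring script. The sketch below shows one way to act on that exit code; run_check is a made-up name, and the messages are placeholders.

```shell
# Hypothetical helper, not shipped with simplesnap.
#
# run_check COMMAND [ARG...]
# Runs the given check command and reports the result based only on
# its exit status. Returns 0 if the check passed, 1 otherwise.
run_check() {
    if "$@"; then
        echo "backups are current"
    else
        status=$?
        echo "backup check failed (exit status $status)" >&2
        return 1
    fi
}

# Usage:
#   run_check simplesnap --check "2 days ago" \
#       --store tank/simplesnap --setname mainset
```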
--host HOST
Gives the name of the host to back up. This is both passed to ssh and used to name the backup sets.
In a few situations, one may not wish to use the same name for both. It is recommended to use the Host and HostName options in ~/.ssh/config to configure aliases in this situation.
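Such an alias is a two-line entry in the ssh configuration. As a sketch, the function below appends one; add_ssh_alias is a made-up name, and the alias and host name in the usage line are examples only.

```shell
# Hypothetical helper, not shipped with simplesnap.
#
# add_ssh_alias CONFIG_FILE ALIAS REAL_HOSTNAME
# Appends a Host/HostName pair so that "ssh ALIAS" connects to
# REAL_HOSTNAME. ALIAS is then what you pass to --host.
add_ssh_alias() {
    cat >> "$1" <<EOF
Host $2
    HostName $3
EOF
}

# Usage (names are examples):
#   add_ssh_alias ~/.ssh/config webserver web01.example.com
#   simplesnap --host webserver --setname mainset --store tank/simplesnap
```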
--local
Specifies that the host being backed up is local to the machine. Do not use ssh to contact it, and invoke the wrapper directly.
--sshcmd COMMAND
Gives the command to use to connect to the remote host. Defaults to "ssh". It may be used to select an alternative configuration file or keypair. Remember to quote it per your shell if it contains spaces. For example: --sshcmd "ssh -i /root/.id_rsa_simplesnap". This command is ignored when --local or --check is given.
--setname SETNAME
Gives the backup set name. Can just be a made-up word if multiple sets are not needed; for instance, the hostname of the backup server. This is used as part of the snapshot name.
--store STORE
Gives the ZFS dataset name where the data will be stored. Should not begin with a slash. The mountpoint will be obtained from the ZFS subsystem. Always required.
--wrapcmd COMMAND
Gives the path to simplesnapwrap (which must be on the remote machine unless --local is given). Not usually relevant, since the command parameter in ~root/.ssh/authorized_keys gives the path. Default: "simplesnapwrap"
zfSnap(1), zfs(8).
The simplesnap homepage: https://github.com/jgoerzen/simplesnap
The examples included with the simplesnap distribution, or on its homepage.
The zfSnap package complements simplesnap perfectly. Find it at https://github.com/graudeejs/zfSnap.
This software and manual page were written by John Goerzen <jgoerzen@complete.org>. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 3 or any later version published by the Free Software Foundation. The complete text of the GNU General Public License is included in the file COPYING in the source distribution.