Hot on the heels of standing up a new Ubuntu server with a Docker stack, I’ll need to get a regular scheduled backup job set up. I’ll cover creation of the S3 bucket, configuration of the AWS IAM security, installation of the AWS CLI, some key commands, and creating a cron job to automate it.
The good news is that (unless you have a massive amount of change) storing the backed-up data is relatively cheap, and after the first run the weekly backup should only transfer deltas, that is, just the files that have changed. Disclaimer: All services have charges. You are responsible for these, so investigate thoroughly!
Note that this walkthrough uses the AWS Management Console to set up AWS. If this is a production system you could make use of the CLI, or even better, code it with CloudFormation.
Warning: The S3 CLI sync being set up here doesn’t back up empty directories, due to the way S3 stores objects. If empty directories are necessary, and they often are, you might be better off creating a tarball and backing that up, or replacing the sync command with a script along those lines, as sketched below. Or, if it’s possible to find out what they were, manually recreate the directories on recovery.
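As a rough sketch of the tarball alternative (assuming the /data directory and the example bucket name used later in this post, with the aws s3 cp command covered further down):

$ BACKUP=/tmp/data-backup-$(date +%F).tar.gz
$ tar -czf "$BACKUP" -C / data
$ aws s3 cp "$BACKUP" s3://linux123-backup-skhvynirme/
$ rm "$BACKUP"

This preserves empty directories inside the archive, at the cost of re-uploading everything on each run rather than just the deltas.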
Create the S3 bucket for backups
- Assuming you are already set up to use Amazon Web Services, log in to the management console. If not, head to https://aws.amazon.com/
- Find the S3 section. It’s on https://s3.console.aws.amazon.com/s3/home
- Click “Create bucket”.
- You’ll be asked to choose a bucket name. This needs to be globally unique. I am using a combination like servername-backup-10digitrandomstring, e.g. linux123-backup-skhvynirme. I used a string generator from random.org to create a random string to append, making sure I have a unique name. We’ll continue with that fictional example bucket name. Make a note of yours so you can back up to it later.
- You will need to select a region. If your server is on Lightsail or EC2, make sure the bucket is in the same region as your server. If not, the location doesn’t matter so much, but be aware that most hosting companies charge for egress from the web server, and AWS is no exception. Additionally, S3 pricing varies by region.
- Select the region and go to the next screen.
- On the second screen you can optionally choose versioning and encryption. I’m going to use neither.
- Go to the next screen; public access can remain blocked. Then go to the next screen again, check the settings, and create the bucket.
- Make a note of the bucket ARN. This can be found by checking the box next to your bucket in the main S3 screen, and looking to the top right of the pop-in box. You will see a button “Copy bucket ARN”.
Configure IAM security for the S3 backup CLI user
Let’s jump to the IAM section of the management console. It’s on: https://console.aws.amazon.com/iam/home
Create a S3 backup user IAM policy
First, we are going to make a policy for the bucket user with these features:
- The S3 service (only)
  - List all
  - Read all
  - Write all
- The backup bucket (only)
  - All objects in that bucket
I have generated a suitable example using the policy manager in the AWS management console. With time and experimentation you would be able to condense this policy.
- In the IAM screen, select Policies in the left bar.
- Select the Create Policy button
- Change tabs to the JSON tab
- Paste in the code block shown after this list, replacing everything already there.
- Replace REPLACE-WITH-YOUR-ARN with your bucket ARN. Note the second one has a forward slash and asterisk after it: /*
- Click Review Policy. If the JSON syntax is valid you will go to the Review policy screen. Enter a policy name and select the Create Policy button.
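The policy JSON generated by the console isn’t reproduced here, but a condensed sketch granting what the s3 sync command needs (list, read and write on the one bucket) looks something like this. The two REPLACE-WITH-YOUR-ARN values take the bucket ARN you noted earlier, e.g. arn:aws:s3:::linux123-backup-skhvynirme:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BackupBucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "REPLACE-WITH-YOUR-ARN",
                "REPLACE-WITH-YOUR-ARN/*"
            ]
        }
    ]
}

Note that a policy scoped this tightly may not allow the account-wide aws s3 ls test further down; aws s3 ls s3://linux123-backup-skhvynirme will still confirm access to the bucket.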
Create the IAM S3 backup user
- In the IAM screen, select Users in the left bar.
- Select the Add user button
- This user can only backup to that one bucket, so let’s give the name as bucketname-user, e.g. linux123-backup-skhvynirme-user
- This user is just for the CLI to use, and does not need the console. We check the Programmatic access box and then the Next button.
- It is good practice to make use of groups, and feel free to do so, but in this case we have a user tied to a specific bucket, so for demo purposes we can select Attach existing policies directly.
- Use the filters or search to find the policy you just made, and check the box next to it.
- Select Next. Unless you would like to tag, select Next again.
- If everything looks good on the review screen, select the Create user button.
- Now you have the opportunity to collect the credentials the CLI will need in order to perform backups. Copy the Access key ID. Click Show to display the Secret access key and copy that too. If you are not familiar with this process, also select Download .csv and save the file to a secure location. Note this is the only chance to get the Secret, so make sure you do!
Install the AWS CLI in Ubuntu
On my Ubuntu 18.04 image, I install the AWS CLI as follows, with a Y to continue at the right time. The $ is my prompt:
$ sudo apt install python3-pip
$ sudo pip3 install awscli --upgrade --user
Taking it for a spin:
$ aws --version returns something like:
aws-cli/1.14.44 Python/3.6.7 Linux/4.15.0-1035-aws botocore/1.8.48
In order to send files to my backup bucket, I will need to add credentials and the desired region. If you are setting this up as the only CLI user, we’ll configure the backup user as our default.
$ aws configure
AWS Access Key ID [None]: Enter your Access key from earlier here.
AWS Secret Access Key [None]: Enter your Secret from earlier here.
Default region name [None]: Enter your default region name here, e.g. us-east-1
Default output format [None]: Optionally, add a format, e.g. text
This will create two files in your home directory, under a directory called .aws, named config and credentials, should you wish to change these settings later.
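For reference, they are small INI-style files that end up looking something like this (the values below are placeholders):

~/.aws/credentials:

[default]
aws_access_key_id = YOUR-ACCESS-KEY-ID
aws_secret_access_key = YOUR-SECRET-ACCESS-KEY

~/.aws/config:

[default]
region = us-east-1
output = text

For a basic test, you can try simply listing the S3 buckets in your account: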
$ aws s3 ls
Which should return, along with any other buckets you have, something like:
2019-04-14 02:07:17 linux123-backup-skhvynirme
Manually run the first backup to S3 using the AWS CLI
Let’s go into our data directory, e.g. cd /data, and create our first backup manually. Be aware that this may use up some of your data allowance or incur a charge, even if you have a Lightsail instance in the same region as your bucket. The size of the directory to be backed up can be obtained with the du command.
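For example, to get a human-readable total for the directory:

$ du -sh /data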
You’ll be able to research the options more if need be, but I am going to back up with the s3 sync command, which will only transfer deltas following the first run. You may want to explore other options to do with the S3 storage classes, applied with the --storage-class flag.
I recommend adding --dryrun for your first run, which won’t actually do the sync but will show what would be transferred. You’ll also need to replace linux123-backup-skhvynirme with your bucket name.
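A dry run with the example bucket name would look like this, printing what would be uploaded without transferring anything:

$ aws s3 sync /data s3://linux123-backup-skhvynirme --dryrun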
This looks good to me, and the target path names are fine, so I run my command without the --dryrun and watch as my site is backed up.
$ aws s3 sync /data s3://linux123-backup-skhvynirme
In a flash it’s done and I can see the objects in the bucket. You may see error messages such as “seek() takes 2 positional arguments but 3 were given” on empty files, or files being skipped, often because they are soft links to something inside a container. If it’s a concern, you could investigate options such as --no-follow-symlinks, or --exclude for log files or other unneeded zero-sized files.
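For instance, a sync that ignores symlinks and excludes log files might look something like this (the exclude pattern is just an illustration):

$ aws s3 sync /data s3://linux123-backup-skhvynirme --no-follow-symlinks --exclude "*.log"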
It’s good practice to get the sync running with no errors so that when we automate we can capture meaningful output to a log file.
Create an automated cron job for the S3 backup sync
We’ll use the classic cron for automating the sync. Create a file in the /etc/cron.weekly directory with a name such as s3-data-backup. Then set owner to be root and the permissions to rwxr-xr-x.
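The script itself only needs to run the sync; a minimal sketch, assuming the /data directory and the example bucket name, would be something like:

#!/bin/bash
# Weekly sync of /data to the S3 backup bucket
aws s3 sync /data s3://linux123-backup-skhvynirme

Then set the ownership and permissions mentioned above:

$ sudo chown root:root /etc/cron.weekly/s3-data-backup
$ sudo chmod 755 /etc/cron.weekly/s3-data-backup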
You should be able to see the outcome of your job after the weekly run. On Ubuntu 18.04 you can find the scheduled run times by listing the crontab file: cat /etc/crontab
Backing up more than one host to one bucket – directories
To share the bucket and run the backup on more than one server, a subdirectory can be used, simply by adding it to the target path: s3://linux123-backup-skhvynirme becomes something like s3://linux123-backup-skhvynirme/hostname. Unless you want a 1:1 server-to-bucket ratio, it’s a good idea.
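For example, the sync line in each server’s script could derive the prefix from its own hostname (a sketch, reusing the script above):

aws s3 sync /data s3://linux123-backup-skhvynirme/$(hostname)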
Give the root user the AWS CLI credentials
Because I am running my cron job as root, it will also need access to the CLI credentials we configured, if you have been working so far as a non-root user such as ubuntu. Assuming you don’t already have a .aws directory in root’s home (check first), copy over the credentials directory like so:
$ sudo cp -rp ${HOME}/.aws /root
Cron-job logfile output
You can send the output to a custom log file to save mixing it in with the noise of syslog.
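For example, the sync line in the cron script could append both stdout and stderr to a log file of your choosing (the path here is just an illustration):

aws s3 sync /data s3://linux123-backup-skhvynirme >> /var/log/s3-data-backup.log 2>&1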
Housekeeping of old files in the bucket (optional)
This might or might not be a good idea depending on your application. If you know that some of the files are totally useless when stale, you could scratch them. If you have an app or database that is a mix of dynamic and static files, it might not be a good idea. If you do a recovery from your bucket, a key static file might be missing. If older static files are removed, they will probably be replaced next time you run a sync, but then you have a gap in your resilience model.
AWS suggest one option is to configure an expiry, then enable/disable it ad-hoc to manually clean up objects.
I can imagine a great usage would be if you have shuffled things around, moved a directory, moved an app between servers, that kind of thing. In that case, a purge of old data followed by a fresh sync would be optimal.
You might also want to consider a Transition, to move files to a lower storage tier after a certain time.
If you do want to automatically purge:
- Decide how many days old objects should be before they are purged, and whether it applies to everything in the bucket or just a subset.
- In the console, inside the bucket, select Management -> Lifecycle -> Add lifecycle rule.
- Give your rule a name like “60 days”, choose all objects if that’s what you want, and select Next.
- Pass the Transition page with another Next.
- Check the “Current version” box (and “Previous versions” too if you have versioning enabled and want that), and set the number of days in the Expire field to, say, 60. Optionally you can clean up incomplete multipart uploads, if that’s a thing for you.
- Check the review screen and apply if you wish.
Creating a Server EBS Snapshot
In the case of a real recovery from disaster, a good starting point is a server image. This will encapsulate the whole root disk of your instance. The process varies by provider, but for my Lightsail instance it’s a matter of going to the Lightsail console, selecting the instance, selecting the Snapshots tab, giving the snapshot a meaningful name and creating it. Snapshots are not free; the AWS region I am in charges 10 cents per GB per month. With EC2, a snapshot can also be made to S3, which is a lot cheaper.
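If you would rather script the snapshot, the Lightsail CLI can do the same thing; a sketch, with hypothetical instance and snapshot names:

$ aws lightsail create-instance-snapshot --instance-name linux123 --instance-snapshot-name linux123-backup-2019-04-14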
Snapshots only store deltas, so while the first one may be expensive, if there is 5MB of data change since the last snapshot, only 5MB of additional storage will be charged.
You don’t need to have a snapshot; it’s also quite possible to rebuild a new server and recover the S3 files back into place, which is quite a viable option when Docker is being used. It depends how much downtime you can tolerate in the event your hosting provider loses your instance, and on your own technical skills.