Skip to main content

Amazon S3 Easy Scripted Backup from Windows for the Enterprise


I don’t normally post code, nor am I normally implementing scripts myself.  But sometimes to learn the ins and outs of a capability, you have to dive in and try it out.  Since I’ve been working with Amazon Web Services (AWS) via Windows, I’ve found a remarkable lack of sample scripts for using it, so I’m posing my little project here.

For me, with heavy Unix scripting experience in my distant background, using Powershell and the AWS Powershell add-in was a no brainer.  While the syntax of Powershell is significantly different from Unix Bourne shell, the capabilities are practically identical, including piping.

Now for the requirements:

- I needed to backup to the Cloud for an offsite backup.
- The data needed to be encrypted with a client-managed key, but I had neither the tools nor onsite CPU or extra storage for client-site encrypting.
- The backed-up data needed to track changes or access.
- To show any modifications, the backed-up files needed to manage versions – so no changed version would overwrite a previous version.

Here’s what I did…

1. Set up a specific IAM user with permissions only for S3 by:

- Created an AWS group titled “backup_group”.
- Attached the policy “AmazonS3FullAccess” and no others.
- Created users “backup_user1” & “backup_user2”.
- Stored these user’s REST access keys in a secure encrypted local location.
- Added both users to “backup_group”.

2. Create a specific S3 bucket(s) for these backups, accessible only by the Backup user (and administrative user), with logging.

- Created S3 bucket backup-logs with lifecycle setting, DELETE all content 2 years old or older – meaning logs have a 2 year life. (If you don’t give log directories a lifecycle rule, they’ll accumulate forever with ever increasing storage costs.  Since this “disk” never “fills”, you won’t get any kind of log error that would force rotation, just an ever growing cost.)

- Created S3 bucket “backup-for-bi”, enabled logging to “backup-logs” with subdirectory logs-bi/
- Created S3 bucket “backup-for-DB”, enabled logging to “backup-logs” to subdirectory logs-db/

3. Enable versioning to preserve each copy and prevent hidden changes. – enabled on both buckets.

6. Utilize an upload command setting that requires encryption of the uploaded data with an client managed key, which will prevent any unencrypted download of the content even by Amazon.  Key will be stored locally.

- Now this was particularly tricky and confusing, encryption keys not being my specialty.  AWS’s data encryption is AES 256, but generating an AES 256 key was rejected.  The command spec says it should be Base64 encoded, which I did but it was still rejected.  In the end I was able to generate an AES-128-cbc encrypt key from a passcode, and then Base64 encode that key which generated a 44 byte string (ending with =) that AWS accepted.  In essence, that Base64 string is, as far as we are concerned, the key – though I’m storing that key, the original password, and the 128 bit key and salt.

With all that prep ready, here’s the Powershell to upload a list of directories, the list being embedded in the shell.  $accesskey equals the AWS IAM user access key shown by AWS on creation of the user.  $secretkey is also shown by AWS on creation of the user.

<#
.DESCRIPTION
    Upload Listed Directories to Amazon AWS S3
    
.NOTES
    PREREQUISITES:
    1) AWS Tools for PowerShell from http://console.aws.amazon.com/powershell/

.EXAMPLE
    powershell.exe .\AWS_Backup_Dirs.ps1  
#>

$bucket          = "nameofbackupbucket"
$backup_list     = "E:\Prod", "E:\PreProd"
$AES256_key      = "AAAABBBBCCCCDDDDEEEE99991111222233334444777="
$accesskey       = "ASDKLJASDFJKLASDF"
$secretkey       = "ASDLFJKWEIOPUQWERASDFJ/LASDFASDFASDFASDF"

try
{
    import-module "C:\Program Files (x86)\AWS Tools\PowerShell\AWSPowerShell\AWSPowerShell.psd1"
}
catch [system.exception] 
{
    $error_fail = "Error: AWS Powershell Extensions not installed or missing from expected location... " + $_.Exception.Message
    Write-Host $error_fail
    throw $error_fail
}

foreach ($backme in $backup_list)
{
    $bucket_subdir = Split-Path -Path $backme -Leaf
    try
    {
        Write-S3Object -BucketName $bucket -Folder $backme -Recurse -KeyPrefix $bucket_subdir -AccessKey $accesskey -SecretKey $secretkey -ServerSideEncryptionCustomerProvidedKey $AES256_key -ServerSideEncryptionCustomerMethod AES256
    }
    catch [system.exception] 
    {
        Write-Host  "Error: " $_.Exception.Message
    }
} 

Notes:

The –KeyPrefix specifies that the data will be written into subdirectories matching the last name of the directory path.  So if the path is D:\dog\cat, it will be stored on S3 in the subdir “cat”.

This script can be set up as a scheduled task and run daily or weekly.  BUT, as written it will send up the whole directory, which can bear the cost of the full data transfer even if nothing has changed.  If you want incremental backups, you have to adjust the script to only find newer files, and loop to send them up one at a time (rather than the whole directory in this script).

Hope this helps!

Popular posts from this blog

Integration Spaghetti™

  I’ve been using the term Integration Spaghetti™ for the past 9 years or so to describe what happens as systems connectivity increases and increases to the point of … unmanageability, indeterminate impact, or just generally a big mess.  A standard line of mine is “moving from spaghetti code to spaghetti connections is not an improvement”. (A standard “point to point connection mess” slide, by enterprise architect Jerry Foster from 2001.) In the past few days I’ve been meeting with a series of IT managers at a large customer and have come up with a revised definition for Integration Spaghetti™ : Integration Spaghetti™ is when the connectivity to/from an application is so complex that everyone is afraid of touching it.  An application with such spaghetti becomes nearly impossible to replace.  Estimates of change impact to the application are frequently wrong by orders of magnitude.  Interruption in the integration functioning are always a major disaster – both in terms of th

Solving Integration Chaos - Past Approaches

A U.S. Fortune 50's systems interconnect map for 1 division, "core systems only". Integration patterns began changing 15 years ago. Several early attempts were made to solve the increasing problem of the widening need for integration… Enterprise Java Beans (J2EE / EJB's) attempted to make independent callable codelets. Coupling was too tight, the technology too platform specific. Remote Method Invocation (Java / RMI) attempted to make anything independently callable, but again was too platform specific and a very tightly coupled protocol. Similarly on the Microsoft side, DCOM & COM+ attempted to make anything independently and remotely callable. However, as with RMI the approach was extremely platform and vendor specific, and very tightly coupled. MQ created a reliable independent messaging paradigm, but the cost and complexity of operation made it prohibitive for most projects and all but the largest of Enterprise IT shops which could devote a focused technology

From Spaghetti Code to Spaghetti Connections

Twenty five years ago my boss handed me the primary billing program and described a series of new features needed. The program was about 4 years old and had been worked on by 5 different programmers. It had an original design model, but between all the modifications, bug fixes, patches and quick new features thrown in, the original design pattern was impossible to discern. Any pattern was impossible to discern. It had become, to quote what’s titled the most common architecture pattern of today, ‘a big ball of mud’. After studying the program for several days, I informed my boss the program was untouchable. The effort to make anything more than a minor adjustment carried such a risk, as the impact could only be guessed at, that it was easier and less risky to rewrite it from scratch. If they had considered the future impact, they never would have let a key program degenerate that way. They would have invested the extra effort to maintain it’s design, document it property, and consider