Tuesday 12 July 2016

Content Replication Using AWS Lambda and Amazon S3

Cross-region replication in Amazon S3 lets you copy from one source bucket to one destination bucket, where the destination bucket resides in a separate region from the source bucket. In order to replicate objects to multiple destination buckets or destination buckets in the same region as the source bucket, customers must spin up custom compute resources to manage and execute the replication.
In this post, I describe a solution for replicating objects from a single S3 bucket to multiple destination S3 buckets using an AWS Lambda function. This solution is presented as a complement to cross-region replication for specific use cases that require either multiple destination buckets, or a destination bucket that resides in the same region as the source.
The solution leverages S3 event notifications, Amazon SNS, and a simple Lambda function to perform continuous replication of objects. Similar to cross-region replication, this solution replicates only objects added to the source bucket after the function is configured; it does not replicate objects that already existed in the bucket.
Note that while this method offers functionality not currently offered by cross-region replication, it also incurs costs that cross-region replication would not, and has limitations that are noted at the end of the post.

Solution overview

The solution requires one S3 source bucket and at least one destination bucket. The buckets can reside either in the same region or in different regions. On the source bucket, you create an event notification that publishes to an SNS topic. The SNS topic acts as the mechanism to fan out object copying to one or more destinations, each achieved by invoking a separate Lambda function. The function source code is provided as an example that accompanies this post.
You can define a new destination by creating a subscription to the SNS topic that invokes the Lambda function. Multiple Lambda functions can be subscribed to the same topic, each performing the same action but copying to a different destination S3 bucket. You define which destination bucket a given function copies to by naming the function with the exact name of that bucket. There is no need to edit the function code, and the different functions can be identical with the exception of their names. After they are invoked, the functions copy new source bucket objects to the destination buckets simultaneously.

Required IAM permissions

In order for a Lambda function to copy an object, it requires an IAM execution role. The function requires S3 GET permissions on the source bucket and S3 PUT permissions on any destination bucket. A separate role needs to be created for each instance of the function (that is, for each destination), calling out the respective destination bucket in its policy. An example IAM policy is provided later in this post.
The user who creates the IAM role is passing permissions to Lambda to assume this role. To grant these permissions, the user must already have permissions to perform the iam:PassRole action.
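If you prefer to create the execution role from a script rather than the console, the following is a minimal boto3 sketch; the role name is an illustrative placeholder, and the inline policy to attach to it (GET on the source bucket, PUT on the destination bucket) is the example policy shown later in this post.
Python
# Minimal sketch (illustrative names): create the IAM execution role that the
# Lambda function assumes. The inline S3 policy (see the example policy later
# in this post) is then attached to this role.
import json
import boto3

iam = boto3.client('iam')

# Trust policy that allows the Lambda service to assume the role.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='s3-replication-execution-role-to-bucket-dstbucket1',  # placeholder name
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)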

Solution walkthrough

The following walkthrough installs the S3 replication functionality using Lambda, SNS topics, and S3 event notifications. This walkthrough assumes that you have created or identified an S3 source bucket and one or more destination buckets. Note the bucket names for later.

Configure the SNS topic and source bucket

The following steps only need to be done one time per source bucket.

Create the SNS topic to fan out

  1. In the SNS console, create a new SNS topic and note its name for later. A topic is created one time per source S3 bucket, so consider naming the topic as follows: [source-bucket-name]-fanout
  2. Note the SNS topic’s ARN string and then choose Other topic actions, Edit topic policy, and Advanced View.
  3. Replace the contents of the default policy with the following:
JSON
{
  "Version": "2008-10-17",
  "Id": "<default_policy_ID>",
  "Statement": [
    {
      "Sid": "<default_statement_ID>",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "SNS:Publish"
      ],
      "Resource": "arn:aws:sns:us-east-1:123123123123:s3-source-bucket-fanout",
      "Condition": {
        "ArnLike": {
          "AWS:SourceArn": "arn:aws:s3:*:*:s3-source-bucket-name"
        }
      }
    }
  ]
}
  1. Make the following changes in the policy:
    1. For Resource, change the value to the ARN of the SNS topic.
    2. For AWS:SourceArn, change the value to the ARN of the S3 source bucket.
  2. Choose Update policy.
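As an alternative to editing the topic policy in the console, you can apply the same policy from a script. The following is a minimal boto3 sketch; the topic ARN and source bucket name are the placeholder values from the example policy above.
Python
# Minimal sketch: apply the topic policy above from a script. The topic ARN and
# source bucket name are the placeholder values from the example policy.
import json
import boto3

sns = boto3.client('sns')
topic_arn = 'arn:aws:sns:us-east-1:123123123123:s3-source-bucket-fanout'

policy = {
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "allow-s3-publish",
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": ["SNS:Publish"],
        "Resource": topic_arn,
        "Condition": {
            "ArnLike": {"AWS:SourceArn": "arn:aws:s3:*:*:s3-source-bucket-name"}
        }
    }]
}

sns.set_topic_attributes(
    TopicArn=topic_arn,
    AttributeName='Policy',
    AttributeValue=json.dumps(policy)
)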

Configure the source bucket

  1. In the S3 console, edit the source bucket configuration.
  2. Expand the Events section and provide a name for the new event. For example: S3 replication to dst buckets: dstbucket1 dstbucket2
  3. For Events, choose ObjectCreated (All).
  4. For Send to, choose SNS topic.
  5. For SNS topic, select the [source-bucket-name]-fanout topic created earlier.
  6. Choose Save.
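The same event notification can also be configured from a script. The following is a minimal boto3 sketch, assuming the placeholder source bucket name and topic ARN used in the earlier examples.
Python
# Minimal sketch: send all ObjectCreated events from the source bucket to the
# fan-out SNS topic. The bucket name and topic ARN are placeholders from above.
import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='s3-source-bucket-name',
    NotificationConfiguration={
        'TopicConfigurations': [{
            'Id': 'S3-replication-to-dst-buckets',
            'TopicArn': 'arn:aws:sns:us-east-1:123123123123:s3-source-bucket-fanout',
            'Events': ['s3:ObjectCreated:*']
        }]
    }
)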

Configure the Lambda function and SNS subscription

The following steps are performed once for each destination bucket, and additional destinations can be added at any time; however, only objects created after a destination is added are replicated to it. Note that the Lambda replication functions execute in parallel, as they are triggered simultaneously via their SNS subscriptions.

Create the Lambda function and IAM policy

  1. In the Lambda console, choose Create a Lambda function.
  2. Choose Skip to skip the blueprint selection.
  3. For Runtime, choose Python 2.7.
  4. For Name, enter a function name. The function name should match the name of the S3 destination bucket exactly.
  5. Enter a description that notes the source bucket and destination bucket used.
  6. For Code entry type, choose Edit code inline.
  7. Paste the following into the code editor:
Python
import urllib
import boto3
import ast
import json
print('Loading function')

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # The SNS message body carries the original S3 event notification.
    sns_message = ast.literal_eval(event['Records'][0]['Sns']['Message'])
    # The destination bucket is identified by the function's own name.
    target_bucket = context.function_name
    source_bucket = str(sns_message['Records'][0]['s3']['bucket']['name'])
    # Object keys arrive URL-encoded in the event notification; decode before copying.
    key = str(urllib.unquote_plus(sns_message['Records'][0]['s3']['object']['key']).decode('utf8'))
    copy_source = {'Bucket': source_bucket, 'Key': key}
    print "Copying %s from bucket %s to bucket %s ..." % (key, source_bucket, target_bucket)
    s3.copy_object(Bucket=target_bucket, Key=key, CopySource=copy_source)
  1. For Handler, leave the default value: lambda_function.lambda_handler
  2. For Role, choose Create new role, basic execution role.
  3. In the IAM dialog box, create a new IAM execution role for Lambda.
  4. For Role Name, enter a value that includes the destination bucket name. For example: s3-replication-execution-role-to-bucket-[dst-bucket-name]
  5. Expand View Policy Document and choose Edit the policy.
  6. Choose OK to confirm that you’ve read the documentation.
  7. Replace the contents of the default policy with the following:
JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::source-bucket-name/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::destination-bucket-name-1/*"
            ]
        }
    ]
}
  1. Make the following changes in the policy:
    1. Under the s3:GetObject action, change the resource to the ARN of the source bucket.
    2. Under the s3:PutObject action, change the resource to the ARN of the destination bucket.
  2. Choose Allow to save the policy and close the window.
  3. For Timeout, keep the default value of 5 minutes.
  4. For VPC, leave the default value No VPC.
  5. Choose Next.
  6. Review the configuration and choose Create Function.
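If you script the function creation instead of using the console, a minimal boto3 sketch looks like the following. It assumes the function code above has been saved as lambda_function.py and zipped locally, and that the execution role already exists; the file path, account ID, and role name are illustrative placeholders.
Python
# Minimal sketch: create one replication function per destination bucket.
# The zip file path, account ID, and role name are illustrative placeholders;
# the function name must exactly match the destination bucket name.
import boto3

lambda_client = boto3.client('lambda')

with open('lambda_function.zip', 'rb') as f:
    zipped_code = f.read()

lambda_client.create_function(
    FunctionName='dstbucket1',
    Runtime='python2.7',
    Role='arn:aws:iam::123123123123:role/s3-replication-execution-role-to-bucket-dstbucket1',
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': zipped_code},
    Description='Replicates new objects from the source bucket to dstbucket1',
    Timeout=300
)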

Create the SNS topic subscription

  1. In the SNS console, choose Topics, select the [source-bucket-name]-fanout topic created earlier, and open the topic’s details page.
  2. Choose Create Subscription.
  3. For Protocol, choose AWS Lambda.
  4. For Endpoint, select the function ARN that represents the destination bucket.
  5. Choose Create Subscription.
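The equivalent subscription can also be created from a script. The following minimal boto3 sketch subscribes the function and adds the resource-based permission that allows the SNS topic to invoke it (the console adds this permission for you); the ARNs are illustrative placeholders.
Python
# Minimal sketch: subscribe the destination function to the fan-out topic and
# allow SNS to invoke it. The ARNs below are illustrative placeholders.
import boto3

topic_arn = 'arn:aws:sns:us-east-1:123123123123:s3-source-bucket-fanout'
function_arn = 'arn:aws:lambda:us-east-1:123123123123:function:dstbucket1'

lambda_client = boto3.client('lambda')
sns = boto3.client('sns')

# Resource-based permission so the topic may invoke the function.
lambda_client.add_permission(
    FunctionName='dstbucket1',
    StatementId='sns-invoke-replication',
    Action='lambda:InvokeFunction',
    Principal='sns.amazonaws.com',
    SourceArn=topic_arn
)

# One subscription per destination function fans out the copy.
sns.subscribe(TopicArn=topic_arn, Protocol='lambda', Endpoint=function_arn)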

Validate the subscription

  1. Upload an object to the source bucket.
  2. Verify that the object was copied successfully to the destination buckets.
  3. Optional: view the CloudWatch Logs entry for the Lambda function execution and confirm that the object was copied successfully.
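You can also validate the setup from a script. The following is a minimal boto3 sketch, with placeholder bucket names, that uploads a test object and waits for the copy to appear in one of the destination buckets.
Python
# Minimal sketch: upload a test object to the source bucket and wait for the
# copy to appear in a destination bucket. Bucket names are placeholders.
import boto3

s3 = boto3.client('s3')

s3.put_object(Bucket='s3-source-bucket-name', Key='replication-test.txt',
              Body=b'replication test')

# Poll the destination until the copied object exists (or the waiter times out).
waiter = s3.get_waiter('object_exists')
waiter.wait(Bucket='dstbucket1', Key='replication-test.txt')
print('Object replicated to dstbucket1')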

Conclusion

This method is simple and addresses use cases not currently addressed by cross-region replication. If you have any suggestions or comments, please feel free to comment below.

Notes

Costs: GET and PUT requests against S3 buckets incur request charges, and copying objects between regions incurs cross-region data transfer costs. See the S3 pricing page for more details.
Lambda: The maximum file size that this solution can support varies and depends on the latency between the source and destination buckets. The Lambda function times out after 5 minutes; if the copy has not completed within that time, the object does not replicate successfully. We therefore recommend running multiple tests after setting up the destination functions and, based on the results, relying on this method only for file sizes that consistently replicate within the timeout.
It is possible to expand this solution so that each Lambda execution reports, after the copy completes, that replication succeeded. The result is a repository that can be queried to verify ongoing replication and to alert on errors. One approach, sketched below, would be reporting to a DynamoDB table and logging each successful copy; a full implementation is beyond the scope of this post.
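As a rough sketch of that approach, the function could call a helper like the following after a successful copy. The table name and attributes below are assumptions for illustration, not part of the walkthrough; the table would need to be created beforehand and the execution role would also need dynamodb:PutItem permission.
Python
# Hypothetical sketch only: record each successful copy in an assumed DynamoDB
# table named 's3-replication-log' with partition key 'object_key'. The Lambda
# function could call log_successful_copy() right after s3.copy_object succeeds.
import time
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-replication-log')

def log_successful_copy(source_bucket, target_bucket, key):
    table.put_item(Item={
        'object_key': key,
        'source_bucket': source_bucket,
        'target_bucket': target_bucket,
        'copied_at': int(time.time())
    })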
Also, we recommend carefully monitoring the number of files put into the source bucket, as you may need to request an increase to the concurrent execution limits for Lambda.
