Failure to monitor disk space utilization can cause problems that
prevent Docker containers from working as expected. Amazon EC2 instance
disks are used for multiple purposes, such as Docker daemon logs,
containers, and images. This post covers techniques to monitor and
reclaim disk space on the cluster of EC2 instances used to run your
containers.
Amazon ECS is a highly scalable, high performance container management service that supports Docker containers and allows you to run applications easily on a managed cluster of Amazon EC2 instances. You can use ECS to schedule the placement of containers across a cluster of EC2 instances based on your resource needs, isolation policies, and availability requirements.
The ECS-optimized AMI stores images and containers in an EBS volume that uses the devicemapper storage driver in a direct-lvm configuration. As devicemapper stores every image and container in a thin-provisioned virtual device, free space for container storage is not visible through standard Linux utilities such as df. This poses an administrative challenge when it comes to monitoring free space and can also result in increased time troubleshooting task failures, as the cause may not be immediately obvious.
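Because df does not show the thin pool's usage, one quick way to check the remaining space by hand is to ask the Docker daemon directly; a minimal check, assuming the devicemapper storage driver used by the ECS-optimized AMI:

# Show free data and metadata space in the devicemapper thin pool
docker info | grep 'Space Available'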
Disk space errors can result in new tasks failing to launch with the following error message:
Error running deviceCreate (createSnapDevice) dm_task_run failed
NOTE: The scripts and techniques described in this post were tested against the ECS 2016.03.a AMI. You may need to modify these techniques depending on your operating system and environment.
Monitoring
You can use Amazon CloudWatch custom metrics to track EC2 instance disk usage. After a CloudWatch metric is created, you can add a CloudWatch alarm to alert you proactively, before low disk space causes a problem on your cluster.
Step 1: Create an IAM role
The first step is to ensure that the role attached to the EC2 instance profile for the instances in the ECS cluster allows the cloudwatch:PutMetricData action, as this is required to publish metrics to CloudWatch.
In the IAM console, choose Policies, Create Policy. Choose Create Your Own Policy, name it “CloudwatchPutMetricData”, and paste in the following policy in JSON:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "CloudwatchPutMetricData", "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData" ], "Resource": [ "*" ] } ] }After you have saved the policy, navigate to Roles and select the role attached to the EC2 instances in your ECS cluster. Choose Attach Policy, select the “CloudwatchPutMetricData” policy, and choose Attach
Policy.
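If you would rather script this step than use the console, the same policy can be created and attached with the AWS CLI; a sketch, assuming the JSON above is saved as policy.json, that the instance role is named ecsInstanceRole, and with <account-id> replaced by your AWS account ID:

aws iam create-policy --policy-name CloudwatchPutMetricData --policy-document file://policy.json
aws iam attach-role-policy --role-name ecsInstanceRole --policy-arn arn:aws:iam::<account-id>:policy/CloudwatchPutMetricData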
Step 2: Push metrics to CloudWatch
Open a shell to each EC2 instance in the ECS cluster. Open a text editor and create the following bash script:
#!/bin/bash
### Get docker free data and metadata space and push to CloudWatch metrics
###
### requirements:
### * must be run from inside an EC2 instance
### * docker with devicemapper backing storage
### * aws-cli configured with instance-profile/user with the put-metric-data permissions
### * local user with rights to run docker cli commands
###
### Created by Jay McConnell

# install aws-cli, bc and jq if required
if [ ! -f /usr/bin/aws ]; then
    yum -qy -d 0 -e 0 install aws-cli
fi
if [ ! -f /usr/bin/bc ]; then
    yum -qy -d 0 -e 0 install bc
fi
if [ ! -f /usr/bin/jq ]; then
    yum -qy -d 0 -e 0 install jq
fi

# Collect region and instanceid from metadata
AWSREGION=`curl -ss http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region`
AWSINSTANCEID=`curl -ss http://169.254.169.254/latest/meta-data/instance-id`

function convertUnits {
    # convert units back to bytes as both docker api and cli only provide friendly units
    if [ "$1" == "b" ] ; then
        echo $2
    elif [ "$1" == "kb" ] ; then
        echo "$2*1000" | bc | awk '{print $1}' FS="."
    elif [ "$1" == "mb" ] ; then
        echo "$2*1000*1000" | bc | awk '{print $1}' FS="."
    elif [ "$1" == "gb" ] ; then
        echo "$2*1000*1000*1000" | bc | awk '{print $1}' FS="."
    elif [ "$1" == "tb" ] ; then
        echo "$2*1000*1000*1000*1000" | bc | awk '{print $1}' FS="."
    else
        echo "Unknown unit $1"
        exit 1
    fi
}

function getMetric {
    # Get freespace and split unit
    if [ "$1" == "Data" ] || [ "$1" == "Metadata" ] ; then
        echo $(docker info | grep "$1 Space Available" | awk '{print tolower($5), $4}')
    else
        echo "Metric must be either 'Data' or 'Metadata'"
        exit 1
    fi
}

data=$(convertUnits `getMetric Data`)
aws cloudwatch put-metric-data --value $data --namespace ECS/$AWSINSTANCEID --unit Bytes --metric-name FreeDataStorage --region $AWSREGION

data=$(convertUnits `getMetric Metadata`)
aws cloudwatch put-metric-data --value $data --namespace ECS/$AWSINSTANCEID --unit Bytes --metric-name FreeMetadataStorage --region $AWSREGION

Next, set the script to be executable:
chmod +x /path/to/metricscript.sh
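Before scheduling the script, it is worth running it once by hand and confirming that the metrics arrive; a quick check, assuming the script path above and that the instance role now includes the CloudwatchPutMetricData policy (new metrics can take a few minutes to show up):

sudo /path/to/metricscript.sh
# List the custom metrics for this instance (replace the region with your cluster's region)
aws cloudwatch list-metrics --namespace "ECS/$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" --region us-east-1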
Now, schedule the script to run every 5 minutes via cron. To do this, create the file /etc/cron.d/ecsmetrics with the following contents:
*/5 * * * * root /path/to/metricscript.sh
This pulls both free data and metadata space every 5 minutes and pushes them to CloudWatch under the ECS/<instance-id> namespace.
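With the metrics in place, you can create the alarm mentioned earlier so you are notified before the thin pool fills up. A sketch using the AWS CLI; the instance ID, SNS topic ARN, region, and 10 GB threshold are placeholders to replace with your own values:

aws cloudwatch put-metric-alarm \
  --alarm-name ecs-i-0123456789abcdef0-low-data-space \
  --namespace ECS/i-0123456789abcdef0 \
  --metric-name FreeDataStorage \
  --statistic Average \
  --period 300 \
  --evaluation-periods 1 \
  --comparison-operator LessThanThreshold \
  --threshold 10000000000 \
  --unit Bytes \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:disk-space-alerts \
  --region us-east-1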
Disk cleanup
The next step is to clean up the disk, either automatically on a schedule or manually. This post covers cleanup of tasks and images; there is a great blog post, Send ECS Container Logs to CloudWatch Logs for Centralized Monitoring, that covers pushing log files to CloudWatch. Using CloudWatch Logs instead of local log files reduces disk utilization and provides a resilient and centralized place from which to manage logs.
Take a look at what you can do to remove unneeded containers and images from your instances.
Delete containers
Stopped containers should be deleted if they are no longer needed. By default, the ECS agent deletes a stopped task's containers three hours after the task exits. This behavior can be customized by adding the following to /etc/ecs/ecs.config:
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=10m
This sets the wait time before stopped containers are removed to 10 minutes.
For this change to take effect, the ECS agent needs to be restarted, which can be done via ssh:
stop ecs; start ecs
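To confirm that the agent came back up and re-registered with your cluster, you can query its local introspection endpoint; a quick check, assuming the agent's default port:

# The ECS agent exposes an introspection API on localhost:51678
curl -s http://localhost:51678/v1/metadata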
To set this up for new instances, attach the following EC2 user data:
cat /etc/ecs/ecs.config | grep -v 'ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION' > /tmp/ecs.config
echo "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m" >> /tmp/ecs.config
mv -f /tmp/ecs.config /etc/ecs/
stop ecs
start ecs
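If this is supplied directly as EC2 user data, it generally needs to be wrapped as a shell script so that cloud-init executes it at launch; a sketch of the complete user data under that assumption:

#!/bin/bash
# Reduce the cleanup wait duration and restart the ECS agent at launch
grep -v 'ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION' /etc/ecs/ecs.config > /tmp/ecs.config
echo "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m" >> /tmp/ecs.config
mv -f /tmp/ecs.config /etc/ecs/
stop ecs
start ecs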
Delete images
By default, Docker caches images indefinitely. Cached images can reduce the time needed to launch new tasks: if the image is already present locally, the container can start without pulling the image from a registry. If you have a lot of images that are rarely used, as is common in CI or development environments, then cleaning these out is a good idea. Use the following commands to remove unused images:
List images:
docker images
Delete an image:
docker rmi IMAGE
This could be condensed and saved to a bash script:
#!/bin/bash
docker images -q | xargs --no-run-if-empty docker rmi
Set the script to be executable:
chmod +x /path/to/cleanupscript.sh
Execute the script daily via cron by creating a file called /etc/cron.d/dockerImageCleanup with the following contents:
00 00 * * * root /path/to/cleanupscript.sh
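Note that this script attempts to remove every image on the instance; images in use by running containers fail to delete and remain cached, and anything else is simply pulled again the next time a task needs it. A gentler variant, sketched here, removes only dangling (untagged) images and leaves tagged images in the cache:

#!/bin/bash
# Remove only dangling (untagged) images
docker images -q -f dangling=true | xargs --no-run-if-empty docker rmi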