Building An Automated EC2 Backup System with AWS Lambda, EventBridge, and SNS

Data loss is a critical threat to any business. AWS offers built-in solutions for backing up EC2 instances, but I found that they gave me less control than I wanted. For instance, AWS Backup and Amazon Data Lifecycle Manager are policy-driven and rely on proper tagging, and customizing their alerts, cleanup, and retention behavior takes additional setup. This project builds a small backup system in which a Lambda function handles those pieces directly.

AWS Services Utilized

  • Amazon EC2: I used this to create the test instance that the system will back up.
  • Amazon EBS: Provides persistent block storage for EC2; snapshots of these volumes are created.
  • Amazon EventBridge: I used this to create a schedule that triggers the Lambda function to start the backup process.
  • AWS Lambda: This was used to create and execute the core logic for identifying instances, creating/deleting snapshots, and tagging them.
  • Amazon SNS: This was used to deliver notifications to my email on backup success or failure.
  • Amazon CloudWatch: Used to monitor logs and metrics of the Lambda function.

Procedure for Creating the System

Step 1: Creating and Tagging Critical EC2 Instances for Backup 

The first and most fundamental step is to clearly identify which EC2 instances our automated system should back up. This is achieved using AWS Tags. These tags provide a powerful way to categorize AWS resources. For this project, the “TestInstance” I created included the tag “backup”: “true” as shown in the above screenshot. This tag is what our Lambda function will look for when invoked.
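
The tag lookup described above can be sketched in Python roughly as follows. This is a minimal sketch rather than the project's actual code: the function name is hypothetical, `ec2_client` is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`), and the extra filter on instance state is my own assumption.

```python
def find_tagged_instance_ids(ec2_client, tag_key="backup", tag_value="true"):
    """Return the IDs of EC2 instances that carry the given tag.

    ec2_client is assumed to be a boto3 EC2 client. Tag keys and values
    are case-sensitive, so "backup"/"true" must match the tag exactly.
    """
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            # Assumption (not from the article): ignore instances that
            # are terminated or shutting down.
            {"Name": "instance-state-name", "Values": ["running", "stopped"]},
        ]
    )

    instance_ids = []
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_ids.append(instance["InstanceId"])
    return instance_ids
```

Passing the client in as an argument keeps the function easy to test with a stand-in object, without touching a real AWS account.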

Step 2: Creating Backup Schedule with Amazon EventBridge 

The Lambda function, which contains the logic for automatically creating backups and assigning necessary metadata, requires a trigger.

The trigger is a schedule created in Amazon EventBridge, which automatically invokes the AWS Lambda function at a regular interval. For this project, I created a schedule that invokes the function every two days at 16:00, as shown in the screenshot above.

Amazon EventBridge also allows for a flexible time window: instead of invoking the target (the Lambda function) at the exact scheduled time, EventBridge may invoke it at any point within the window that follows.

This is helpful when many schedules would otherwise fire at the same moment, because it spreads the invocations out. While not strictly necessary for this project, I included it for learning purposes, as this scenario is quite common in production cloud environments.
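
For readers who prefer code to the console, the schedule could be expressed roughly as below. This sketch assumes the schedule lives in EventBridge Scheduler (where the flexible time window setting is found); the schedule name, the UTC timezone, the 15-minute window, and the function name `build_schedule_request` are all illustrative assumptions, not values from the project.

```python
def build_schedule_request(function_arn, role_arn, window_minutes=15):
    """Build keyword arguments for EventBridge Scheduler's create_schedule.

    function_arn: ARN of the Lambda function to invoke.
    role_arn: ARN of a role that EventBridge Scheduler can assume to
              invoke the function.
    """
    return {
        "Name": "ec2-backup-every-two-days",  # hypothetical name
        # Fields: minutes hours day-of-month month day-of-week year.
        # "1/2" in day-of-month means every second day starting on the 1st.
        "ScheduleExpression": "cron(0 16 1/2 * ? *)",
        "ScheduleExpressionTimezone": "UTC",  # assumption
        "FlexibleTimeWindow": {
            "Mode": "FLEXIBLE",
            "MaximumWindowInMinutes": window_minutes,
        },
        "Target": {"Arn": function_arn, "RoleArn": role_arn},
    }
```

The request would then be sent with `boto3.client("scheduler").create_schedule(**build_schedule_request(function_arn, role_arn))`. One caveat of the cron form is that the day count restarts each month, so a run on the 31st is followed by another on the 1st. A `rate(2 days)` expression avoids that, although its time of day then depends on when the schedule starts.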

Step 3: Implementing The Core Backup Logic with AWS Lambda 

AWS Lambda hosts the Python code that performs tasks such as identifying instances, initiating snapshots, tagging them, and managing the retention policy. When creating our Lambda function, we granted it basic permissions to send logs to CloudWatch.

However, I also had to assign additional permissions to the Lambda function to enable interaction with other AWS services.

Specifically, these permissions allow it to create and delete EBS snapshots, describe EC2 instances and volumes, apply tags to snapshots, and publish notifications to an SNS topic. This was accomplished by adding a custom inline policy to the existing role as shown in the screenshot below.
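
An inline policy covering the permissions listed above might look something like the following, written here as a Python dictionary so it can be printed as JSON. The exact policy used in the project is not reproduced in the text, so the statement IDs, the action list, and the example topic ARN are assumptions.

```python
import json


def build_backup_policy(topic_arn):
    """Return an IAM policy document (as a dict) for the backup function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageSnapshots",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeVolumes",
                    "ec2:DescribeSnapshots",
                    "ec2:CreateSnapshot",
                    "ec2:DeleteSnapshot",
                    "ec2:CreateTags",
                ],
                # The Describe* actions do not support resource-level
                # restrictions, so this sketch uses "*" for the statement.
                "Resource": "*",
            },
            {
                "Sid": "PublishBackupNotifications",  # hypothetical
                "Effect": "Allow",
                "Action": "sns:Publish",
                # Limit publishing to the one topic used for alerts.
                "Resource": topic_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    example_arn = "arn:aws:sns:us-east-1:123456789012:ec2-backup-alerts"
    print(json.dumps(build_backup_policy(example_arn), indent=2))
```

Scoping `sns:Publish` to a single topic ARN keeps the role closer to least privilege than a blanket `"Resource": "*"` would.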

I wrote the core backup logic in Python and selected Python 3.9 as the function's runtime. Lambda also offers newer Python runtimes. The other details of our Lambda function are displayed in the screenshots above.
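
Since the function's code is not listed in this article, here is a minimal sketch of what the snapshot and retention logic could look like. It assumes `ec2_client` is a boto3 EC2 client; the tag names, the seven-day retention period, and all function names are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

CREATED_BY_TAG = {"Key": "CreatedBy", "Value": "ec2-backup-lambda"}  # hypothetical tag


def snapshot_instance_volumes(ec2_client, instance_id):
    """Create one tagged snapshot per EBS volume attached to the instance."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]

    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        if "Ebs" not in mapping:
            continue  # not an EBS-backed device
        volume_id = mapping["Ebs"]["VolumeId"]
        snapshot = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"Automated backup of {instance_id} ({volume_id})",
            TagSpecifications=[
                {
                    "ResourceType": "snapshot",
                    "Tags": [CREATED_BY_TAG, {"Key": "InstanceId", "Value": instance_id}],
                }
            ],
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids


def select_expired_snapshots(snapshots, retention_days, now=None):
    """Return IDs of snapshots whose StartTime is older than the retention period."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def delete_expired_snapshots(ec2_client, retention_days=7):
    """Delete snapshots created by this system that are past retention."""
    # A single call is enough for a sketch; a real function with many
    # snapshots would page through the results.
    response = ec2_client.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": f"tag:{CREATED_BY_TAG['Key']}", "Values": [CREATED_BY_TAG["Value"]]}],
    )
    expired = select_expired_snapshots(response["Snapshots"], retention_days)
    for snapshot_id in expired:
        ec2_client.delete_snapshot(SnapshotId=snapshot_id)
    return expired
```

Filtering the cleanup on a tag that only this function applies is a deliberate choice: it keeps the retention step from deleting snapshots that were created by hand or by other tools.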

Step 4: Configure Notifications with Amazon SNS

To ensure real-time updates on the status of automated backups, I integrated Amazon SNS (Simple Notification Service) to send alerts. These email notifications promptly inform all subscribers about the success or failure of backup operations, allowing any issues to be addressed immediately. 

I created an SNS topic with Lambda as the publisher. The necessary permissions for Lambda were granted in the previous step using the SNS topic ARN.

Finally, I subscribed one of my email addresses to the topic to receive all backup success or failure notifications directly. As shown in the screenshot above, the first email was sent after a scheduled backup was successfully executed.
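
Publishing the result from the Lambda function could be sketched as follows. The message layout and the function name are assumptions for illustration; `sns_client` is assumed to be a boto3 SNS client, and `topic_arn` is the ARN of the topic described above.

```python
def publish_backup_report(sns_client, topic_arn, created_snapshot_ids, errors):
    """Send a success or failure summary to the SNS topic.

    created_snapshot_ids: snapshot IDs created during this run.
    errors: human-readable error strings; an empty list means success.
    """
    status = "FAILED" if errors else "SUCCEEDED"

    lines = [
        f"EC2 backup {status}.",
        f"Snapshots created: {len(created_snapshot_ids)}",
    ]
    lines.extend(f"  {snapshot_id}" for snapshot_id in created_snapshot_ids)
    if errors:
        lines.append("Errors:")
        lines.extend(f"  {error}" for error in errors)

    sns_client.publish(
        TopicArn=topic_arn,
        # SNS limits email subjects to 100 characters.
        Subject=f"EC2 backup {status}"[:100],
        Message="\n".join(lines),
    )
    return status
```

Sending one summary per run, rather than one email per snapshot, keeps the inbox readable as the number of tagged instances grows.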

Step 5: Monitoring and Optimizing with Amazon CloudWatch

I used Amazon CloudWatch as the observability hub. It collects logs and metrics from the Lambda function, allowing us to monitor the health and cost-efficiency of the Smart Vault system. The Lambda function’s print() statements and logger.info/logger.error messages are automatically streamed to CloudWatch Logs.

As noted in Step 3, the Lambda function already has the permissions it needs to send logs to CloudWatch. Some of the logs sent to CloudWatch after the backups are shown in the screenshot above.
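
As a rough illustration of the logging pattern, the sketch below logs one line per instance and records failures with a stack trace. The helper name `run_with_logging` and the `backup_one` callback are hypothetical; the only assumption about Lambda is that its Python runtime forwards records from the root logger to CloudWatch Logs.

```python
import logging

# In Lambda, the Python runtime attaches a handler to the root logger,
# so setting the level is typically all that is needed.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def run_with_logging(instance_ids, backup_one):
    """Call backup_one(instance_id) for each instance, logging each outcome.

    Returns (succeeded, failed) lists of instance IDs so the caller can
    build a notification from them.
    """
    succeeded, failed = [], []
    for instance_id in instance_ids:
        try:
            backup_one(instance_id)
            logger.info("Backup succeeded for %s", instance_id)
            succeeded.append(instance_id)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Backup failed for %s", instance_id)
            failed.append(instance_id)
    return succeeded, failed
```

Catching the exception per instance means one failing volume does not stop the remaining instances from being backed up.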

Challenges Faced

  • IAM Permissions Granularity: One of the major challenges I faced was configuring the IAM role with the precise permissions the Lambda function needs to back up an EC2 instance. I worked through the missing permissions and added them to the role.
  • EventBridge Trigger Setup: Since it was my first time using EventBridge in a project, it took me some time to understand how schedules and rules relate. Both a scheduled rule and a schedule in EventBridge Scheduler can invoke a Lambda function on a timer, and the flexible time window is a Scheduler setting. After figuring this out, it became easier to create the schedule with our Lambda function as the target.

Conclusions

Doing this project was an interesting learning experience, as it covered several AWS services that needed to communicate effectively with each other for everything to work smoothly.

This project also helped me better understand how serverless AWS services (Lambda, EventBridge, SNS, and CloudWatch) work together to automate critical cloud operations. The few challenges I faced, especially with IAM roles, also showed me why granting each service precise permissions is crucial for it to function as intended.

Tom Sankara