Automating AWS Disaster Recovery for RDS

“It wasn’t raining when Noah built the Ark.”

Disaster recovery is one of the biggest challenges for IT infrastructure. In this blog, we focus on how we implemented disaster recovery (DR) for AWS RDS here at MiQ. For a production environment, it is important to take precautions so that we can recover from an unexpected event. While Amazon RDS provides a highly available Multi-AZ configuration, it can’t protect against every possibility, such as a natural disaster, malicious activity, or logical corruption of a database. To maintain business continuity, it is important to design and test a DR plan.

What is meant by Disaster and Disaster Recovery?

“Any event (natural or man-made) that has a negative impact on a company’s business continuity or finances could be termed a disaster.”

and

“Disaster recovery (DR) is all about preparing for or recovering from a disaster.”

Amazon RDS Snapshots

Automated backups allow you to recover a database in the same AWS region. If you want to be prepared for DR or migration by being able to recover your database in another AWS region, you need to use manual snapshots. Once a manual snapshot has been taken, it can be copied to the other region and restored there.

How to configure Disaster Recovery for RDS?

The setup works in two parts. First, the backup step creates a manual snapshot of the current DB in the production (source) region, copies that snapshot to the DR (destination) region, and deletes snapshots older than 3 days in the DR region. Second, if a disaster occurs, the DB can be restored from the snapshots present in the DR region.
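The backup part can be sketched with Boto3 roughly as follows. This is a minimal illustration and not the actual script: the function names, the `dr-` snapshot prefix, and the timestamp naming scheme are assumptions made for the example. The two RDS clients are passed in by the caller, and the snapshot is assumed to be unencrypted (copying an encrypted snapshot across regions would also need a `KmsKeyId` for the destination region).

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 3  # snapshots older than this are deleted in the DR region


def expired_snapshot_ids(snapshots, prefix, now, retention_days=RETENTION_DAYS):
    """Return IDs of snapshots carrying our prefix that are past the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        if s["DBSnapshotIdentifier"].startswith(prefix)
        # Skip entries with no creation time yet (e.g. still being created).
        and s.get("SnapshotCreateTime") is not None
        and s["SnapshotCreateTime"] < cutoff
    ]


def backup_instance(source_rds, dest_rds, instance_id, source_region, now=None):
    """Snapshot a DB instance, copy the snapshot to the DR region, prune old copies.

    source_rds / dest_rds are boto3 RDS clients for the production and DR regions.
    """
    now = now or datetime.now(timezone.utc)
    prefix = f"dr-{instance_id}-"  # hypothetical naming convention
    snapshot_id = prefix + now.strftime("%Y-%m-%d-%H-%M")

    # 1. Take a manual snapshot in the production region and wait until it is usable.
    created = source_rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    source_rds.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier=snapshot_id
    )

    # 2. Copy it to the DR region. A cross-region copy is requested from the
    #    destination client and refers to the source snapshot by its ARN.
    dest_rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=created["DBSnapshot"]["DBSnapshotArn"],
        TargetDBSnapshotIdentifier=snapshot_id,
        SourceRegion=source_region,
    )

    # 3. Delete manual snapshots in the DR region that are past retention.
    existing = []
    paginator = dest_rds.get_paginator("describe_db_snapshots")
    for page in paginator.paginate(SnapshotType="manual"):
        existing.extend(page["DBSnapshots"])
    for old_id in expired_snapshot_ids(existing, prefix, now):
        dest_rds.delete_db_snapshot(DBSnapshotIdentifier=old_id)

    return snapshot_id
```

The clients would typically be created with `boto3.client("rds", region_name=...)`, one per region. Keeping the retention check in a separate function that takes plain dictionaries makes it easy to verify without touching AWS.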

We have automated the whole process with Python scripts using Boto3.
For DB instances, rds_db_instances_backup.py creates the snapshot and copies it from the production region to the DR region, and rds_db_instances_restore.py restores the DB in the DR region in the event of a disaster. Similarly, for AWS RDS clusters, rds_db_cluster_backup.py and rds_db_cluster_restore.py serve the same purpose.
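To give an idea of what the restore side involves for a DB instance, here is a minimal sketch. It is not the content of rds_db_instances_restore.py: the function names and the idea of selecting snapshots by an identifier prefix are assumptions for illustration. It picks the most recent available manual snapshot in the DR region and restores a new instance from it.

```python
def latest_available_snapshot(snapshots, prefix):
    """Return the ID of the newest 'available' snapshot with our prefix, or None."""
    candidates = [
        s
        for s in snapshots
        if s["DBSnapshotIdentifier"].startswith(prefix)
        and s.get("Status") == "available"
        and s.get("SnapshotCreateTime") is not None
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


def restore_latest(dest_rds, prefix, new_instance_id):
    """Restore a new DB instance in the DR region from the newest matching snapshot.

    dest_rds is a boto3 RDS client for the DR region.
    """
    snapshots = []
    paginator = dest_rds.get_paginator("describe_db_snapshots")
    for page in paginator.paginate(SnapshotType="manual"):
        snapshots.extend(page["DBSnapshots"])

    snapshot_id = latest_available_snapshot(snapshots, prefix)
    if snapshot_id is None:
        raise RuntimeError(f"No available snapshot found with prefix {prefix!r}")

    dest_rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    # Block until the restored instance is ready to accept connections.
    dest_rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=new_instance_id
    )
    return snapshot_id
```

In practice the restore call usually also needs settings such as the subnet group and security groups of the DR region, which are omitted here. For clusters, Boto3 offers analogous calls (`create_db_cluster_snapshot`, `copy_db_cluster_snapshot`, `restore_db_cluster_from_snapshot`), although a restored cluster still needs DB instances added to it afterwards.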

The scripts are hosted here:

Conclusion:

Here we discussed how to protect RDS from different possible failures using manual snapshots. For ease of use, the backup scripts can be scheduled to run daily using a cron job, or hosted on a CI server (like Jenkins).
