AWS Managed ElasticSearch Data Migration to another domain

Few weeks back, we decided to migrate our complete infrastructure to a single AWS region (which is closer to our client) for achieving better latencies and less data transfer cost.

There were multiple components migrated to a single AWS region and ElasticSearch was one of them. Although migration steps are documented on ElasticSearch website, there are multiple AWS concepts one needs to understand to execute it properly. Here, we have tried to dive deep into migration steps and login behind those steps.

ElasticSearch data migration is trivial and only requires knowledge of ElasticSearch in case of a self managed publicly accessible cluster but things get tricky when it comes to AWS managed clusters as AWS doesn’t allow you to access config yml file and there are a lot of other AWS IAM concepts one needs to understand for executing it properly.

In this blog we are going to discuss the step-by-step AWS managed ElasticSearch data migration process using snapshot and restore methods and how it is different from self hosted ElasticSearch data migration in detail.

Overview

Assumtion: ElasticSearch domain need to be created in destination region with same cluster configuration.

Take manual snapshot of ElasticSearch domain data into source S3 bucket
Copy source S3 bucket data into destination S3 bucket
Restore data from S3 bucket into new ElasticSearch domain

Take snapshot

Snapshot: A snapshot is a backup taken from a running ElasticSearch cluster. You can take snapshots of an entire cluster, including all its data streams and indices. You can also take snapshots of only specific data streams or indices in the cluster.

Repository: The repository is a place where you can store all your snapshots data.

Note: You must register a snapshot repository before you can perform snapshot and restore operations.You have multiple options available for defining a repository given that both old and new ElasticSearch should be able to access that location like a shared file system, AWS S3, HDFS etc. We will be discussing only the AWS S3 option in this blog post.

Register snapshot repository using AWS S3 bucket

ElasticSearch provides APIs for both registering repositories and taking snapshots.

How roles and user works in AWS

Role: An IAM role is an AWS Identity and Access Management (IAM) entity with permissions to make AWS service requests. IAM roles cannot make direct requests to AWS services; they are meant to be assumed by authorized entities, such as IAM users, applications, or AWS services such as EC2, ES etc
AssumeRole: Trusted entity can assume role and performs authorized action using temporary credentials.
PassRole To allow a user to delegate role to an AWS service, you must grant the PassRole permission to the user.

Let’s create a role to delegate permission to Amazon Elasticsearch Service to take a snapshot into S3.

Create S3 Bucket and Configure IAM Role

Create S3 bucket in the same region where the ElasticSearch domain exists and another bucket in destination region.

Create a new role theSnapshotRole without any permissions.

Create below Policy and add it to the newly created role. It will enable ElasticSearch to access both S3 bucket.

Update trust relationship for theSnapshotRole role.

Note: We need to update trust relationships for newly created roles which will enable ElasticSearch to assume roles. This assume role permission will allow ElasticSearch service to use theSnapshotRole as its own.

Configure IAM user

Once we are done with role creation, we need to create new user or we need to give PassRole permission to existing user which will allow user to pass newly created theSnapshotRole to ElasticSearch service and as theSnapshotRole already configured with assumeRole permission, it can access S3 bucket.

Create below policy and attach it with user

Register snapshot repository

Once we are done with the new role and user creation, let's start with repository registration. There are two ways of registering repositories on ElasticSearch.

Using postman if your ElasticSearch cluster can be accessed directly.

Postman

If your ElasticSearch cluster is in a VPC that you cannot access directly, you will need access to EC2 which is in the same VPC and that can execute below scripts.

Python script

Response

Take snapshot of current ElasticSearch data

After registering repository successfully, we need to run below script for taking snapshot of current ElasticSearch data.

Response

Verify snapshot

Copy S3 data to new region s3

You can copy s3 bucket data using AWS console

aws s3 cp s3://awsexamplesourcebucket s3://awsexampledestinationbucket

Restore Snapshot

Restore snapshot of old ElasticSearch data into new ElasticSearch

Verify restored indexes and alias.

Verify index data

Take-away!

AWS managed ElasticSearch provides us with a lot of benefits in terms of managing and scaling our ElasticSearch cluster but also restricts us to access configuration files which leaves us with snapshot-restore method when we want to migrate from one domain to another. Snapshot-restore method has its own downside in terms of write downtime during the migration process which is ranging from a few minutes to hours depending on data size.
Make sure you are running your services using ElasticSearch in read-only mode during the migration process to avoid data loss.

References:

Migrate data from an Amazon ES domain

Amazon Web Services, Inc.

Migrating from Amazon Elasticsearch Service | Elasticsearch Service Documentation | Elastic

Elastic