Skip to content

Latest commit

 

History

History
121 lines (96 loc) · 4.57 KB

File metadata and controls

121 lines (96 loc) · 4.57 KB

Transfer data from RDS Database Postgre to Datalake S3

Now, we are going to use DMS (Database Migration Servivce) as our real-time transfer data tools from our database to our datalake.

First, we need to make the subnet group.

  1. Go to DMS Console
  2. click subnet groups
  3. click create subnet group
  4. in subnet group page, fill the name as PostgreToS3SG
  5. fill the description as Subnet group for Postgre to Datalake S3
  6. in VPC, choose DMSRDSVPC
  7. in add subnet, choose the private subnets (Private1 and Private2)
  8. click Create subnet group

Now, we need to make the replication Instance.

  1. in left menu, click Replication instances
  2. click Create replication instance
  3. in name, fill PostgreToS3RI
  4. in description, fill Instance for replicate Postgre DB to Datalake
  5. in VPC, choose DMSRDSVPC
  6. scroll down and click create

It will take a while to provision it.

Then, the endpoints, which acts as a connector between the database and data lake. We will make the source endpoint first.

  1. click Endpoints at the left menu
  2. click create endpoint
  3. select source endpoint
  4. click select RDS DB Instance
  5. in RDS instance, fill rdspostgre
  6. in access to endpoint database, choose provide access infromation manually
  7. fill the password as master123 in database description
  8. open test endpoint connection option
  9. select VPC as DMSRDSVPC
  10. for replication instance, choose postgretos3ri
  11. click Run test

you will see the successful connection.

  1. click create endpoint

Before going to the target endpoint, we need to create the permission to access the data lake.

  1. go to IAM Console

  2. click Create role

  3. in AWS services, choose DMS

  4. click next:permissions

  5. in attach permissions policies, search for AmazonS3FullAccess

  6. click the checkbox on the left side

  7. click next:tags

  8. click next:review

  9. in review page, fill the role name as DMSAccessS3Role

  10. click create role

  11. In role page on IAM, search the role name of DMSAccessS3Role and click the name of the role

  12. copy the Role ARN at the top of the page

Now, we are going to create the target endpoint

  1. go to DMS Console
  2. click Create endpoint in endpoint homepage
  3. click target endpoint
  4. in endpoint identifier, fill Datalake
  5. in target engine, choose S3
  6. in service access role ARN, fill the Role ARN.
  7. in bucket name, fill the bucket name of the data lake we have created. yourname-datalake-workshop
  8. in endpoint-specific settings, fill the extra connection attributes addColumnName=true
  9. open test endpoint connection option
  10. select VPC as DMSRDSVPC
  11. for replication instance, choose postgretos3ri
  12. click Run test

you will see the successful status

  1. click Create Endpoint

Now, we need to execute the task.

  1. click Database migration tasks on the left menu
  2. click create task
  3. on task identifier, fill PostgreRDStoS3Rep
  4. for replication instance, choose postgretos3ri
  5. in source database endpoint, choose rdspostgre
  6. in target endpoint, choose datalake
  7. for migration type, choose Migrate existing data and replicate ongoing changes
  8. in task settings, change the target table preparation mode to Do nothing
  9. click Enable CloudWatch logs
  10. in table mappings, click Add new selection rule
  11. in schema, choose Enter a schema
  12. in schema name, enter public
  13. click create task

it will take several minutes to complete.

Once it's completed, we need to check the data lake

  1. go to S3 Console
  2. click your datalake name yourname-datalake-workshop
  3. you will see the folder public and contains every table inside.

BACK TO WORKSHOP GUIDE