Now, we are going to use DMS (Database Migration Service) as our tool for transferring data in real time from our database to our data lake.
First, we need to make the subnet group.

- Go to the DMS Console.
- Click **Subnet groups**, then click **Create subnet group**.
- On the subnet group page, fill in the name as `PostgreToS3SG`.
- Fill in the description as `Subnet group for Postgre to Datalake S3`.
- For the VPC, choose `DMSRDSVPC`.
- Under Add subnets, choose the private subnets (`Private1` and `Private2`).
- Click **Create subnet group**.
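The console steps above map directly onto the DMS API. As a minimal sketch (assuming boto3; the subnet IDs below are placeholders for `Private1` and `Private2`, and the actual call needs AWS credentials, so it is left commented out):

```python
# Parameters for the replication subnet group created in the console above.
subnet_group_params = {
    "ReplicationSubnetGroupIdentifier": "PostgreToS3SG",
    "ReplicationSubnetGroupDescription": "Subnet group for Postgre to Datalake S3",
    # Placeholder IDs: substitute the private subnets of DMSRDSVPC.
    "SubnetIds": ["subnet-0aaa1111", "subnet-0bbb2222"],
}

# With AWS credentials configured, this would create the subnet group:
# import boto3
# boto3.client("dms").create_replication_subnet_group(**subnet_group_params)
print(subnet_group_params["ReplicationSubnetGroupIdentifier"])
```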
Now, we need to make the replication instance.
- In the left menu, click **Replication instances**, then click **Create replication instance**.
- For the name, fill in `PostgreToS3RI`.
- For the description, fill in `Instance for replicating Postgre DB to Datalake`.
- For the VPC, choose `DMSRDSVPC`.
- Scroll down and click **Create**.
It will take a while to provision it.
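The equivalent API call, sketched with boto3 (the instance class is an assumption since the walkthrough keeps the console default; the call itself needs AWS credentials, so it stays commented out):

```python
# Parameters mirroring the replication instance created above. DMS stores
# identifiers in lower case, which is why the console later shows the
# instance as postgretos3ri.
instance_params = {
    "ReplicationInstanceIdentifier": "postgretos3ri",
    # Assumption: the walkthrough keeps the default instance class.
    "ReplicationInstanceClass": "dms.t3.micro",
    "ReplicationSubnetGroupIdentifier": "PostgreToS3SG",
    "PubliclyAccessible": False,  # the instance lives in the private subnets
}

# With AWS credentials configured:
# import boto3
# boto3.client("dms").create_replication_instance(**instance_params)
print(instance_params["ReplicationInstanceIdentifier"])
```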
Next, the endpoints, which act as connectors between the database and the data lake. We will make the source endpoint first.
- Click **Endpoints** in the left menu, then click **Create endpoint**.
- Select **Source endpoint**.
- Check **Select RDS DB instance** and, for the RDS instance, choose `rdspostgre`.
- For access to the endpoint database, choose **Provide access information manually**.
- In the database description, fill in the password as `master123`.
- Open the **Test endpoint connection** option.
- For the VPC, select `DMSRDSVPC`; for the replication instance, choose `postgretos3ri`.
- Click **Run test**. You will see a successful connection.
- Click **Create endpoint**.
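For reference, the same source endpoint can be defined through the API. A sketch assuming boto3, where the server name, username, and database name are placeholders for your `rdspostgre` instance's actual values (the call needs AWS credentials, so it is commented out):

```python
# Parameters mirroring the source endpoint above. ServerName, Username,
# and DatabaseName are placeholders -- use your rdspostgre values.
source_params = {
    "EndpointIdentifier": "rdspostgre",
    "EndpointType": "source",
    "EngineName": "postgres",
    "ServerName": "rdspostgre.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder
    "Port": 5432,
    "Username": "postgres",      # placeholder master username
    "Password": "master123",     # as set in the walkthrough
    "DatabaseName": "postgres",  # placeholder
}

# With AWS credentials configured:
# import boto3
# boto3.client("dms").create_endpoint(**source_params)
print(source_params["EndpointIdentifier"])
```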
Before going to the target endpoint, we need to create a role that gives DMS permission to access the data lake.
- Go to the IAM Console.
- Click **Create role**.
- Under AWS services, choose **DMS**, then click **Next: Permissions**.
- Under Attach permissions policies, search for `AmazonS3FullAccess` and click the checkbox on its left side.
- Click **Next: Review**.
- On the review page, fill in the role name as `DMSAccessS3Role` and create the role.
- On the Roles page in IAM, search for `DMSAccessS3Role` and click the name of the role; note its Role ARN, which we will need for the target endpoint.
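Choosing DMS as the trusted service in the console generates a trust policy like the one below, which lets the DMS service assume the role. A sketch assuming boto3 (the IAM calls need AWS credentials, so they are commented out):

```python
import json

# Trust policy the console generates when DMS is picked as the service.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "dms.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# With AWS credentials configured:
# import boto3
# iam = boto3.client("iam")
# iam.create_role(RoleName="DMSAccessS3Role",
#                 AssumeRolePolicyDocument=json.dumps(trust_policy))
# iam.attach_role_policy(RoleName="DMSAccessS3Role",
#                        PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess")
print(trust_policy["Statement"][0]["Principal"]["Service"])
```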
Now, we are going to create the target endpoint.
- Go to the DMS Console.
- Click **Create endpoint** on the Endpoints page.
- Select **Target endpoint**.
- For the endpoint identifier, fill in `Datalake`.
- For the target engine, choose **S3**.
- For the service access role ARN, fill in the Role ARN of `DMSAccessS3Role`.
- For the bucket name, fill in the name of the data lake bucket we have created, `yourname-datalake-workshop`.
- In the endpoint-specific settings, fill in the extra connection attributes with `addColumnName=true`.
- Open the **Test endpoint connection** option.
- For the VPC, select `DMSRDSVPC`; for the replication instance, choose `postgretos3ri`.
- Click **Run test**. You will see a successful status.
- Click **Create endpoint**.
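The equivalent API definition of the target endpoint, sketched with boto3 (the role ARN below is a placeholder for the `DMSAccessS3Role` ARN; the call needs AWS credentials, so it is commented out):

```python
# Parameters mirroring the target endpoint above. The role ARN is a
# placeholder -- paste the ARN of DMSAccessS3Role from IAM.
target_params = {
    "EndpointIdentifier": "Datalake",
    "EndpointType": "target",
    "EngineName": "s3",
    "ExtraConnectionAttributes": "addColumnName=true",
    "S3Settings": {
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/DMSAccessS3Role",  # placeholder
        "BucketName": "yourname-datalake-workshop",
    },
}

# With AWS credentials configured:
# import boto3
# boto3.client("dms").create_endpoint(**target_params)
print(target_params["S3Settings"]["BucketName"])
```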
Now, we need to create and run the migration task.
- Click **Database migration tasks** in the left menu, then click **Create task**.
- For the task identifier, fill in `PostgreRDStoS3Rep`.
- For the replication instance, choose `postgretos3ri`.
- For the source database endpoint, choose `rdspostgre`; for the target endpoint, choose `datalake`.
- For the migration type, choose **Migrate existing data and replicate ongoing changes**.
- In the task settings, change the target table preparation mode to **Do nothing** and check **Enable CloudWatch logs**.
- In the table mappings, click **Add new selection rule**; for the schema, choose **Enter a schema** and enter `public` as the schema name.
- Click **Create task**. It will take several minutes to complete.
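Behind the console, the selection rule is stored as a JSON table-mapping document. A sketch assuming boto3, with placeholder ARNs (the call needs AWS credentials, so it is commented out):

```python
import json

# Table mapping equivalent to the "Enter a schema -> public" selection
# rule above: include every table in the public schema.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# With AWS credentials configured (the ARNs below are placeholders):
# import boto3
# boto3.client("dms").create_replication_task(
#     ReplicationTaskIdentifier="PostgreRDStoS3Rep",
#     SourceEndpointArn="arn:aws:dms:...:endpoint/SOURCE",    # placeholder
#     TargetEndpointArn="arn:aws:dms:...:endpoint/TARGET",    # placeholder
#     ReplicationInstanceArn="arn:aws:dms:...:rep/INSTANCE",  # placeholder
#     MigrationType="full-load-and-cdc",  # existing data + ongoing changes
#     TableMappings=json.dumps(table_mappings),
# )
print(json.dumps(table_mappings["rules"][0]["object-locator"]))
```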

Once the task completes, we can check the data lake:
- Go to the S3 Console.
- Click your data lake bucket, `yourname-datalake-workshop`.
- You will see a folder named `public` that contains a subfolder for every replicated table.
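DMS lays the full load out as CSV files under `<schema>/<table>/` prefixes in the bucket. A small helper that recovers the table names from a listing of object keys (the sample keys below are illustrative, not from a real bucket):

```python
# DMS writes full-load files under <schema>/<table>/ in the target bucket.
# Group object keys from the public/ prefix by table name.
def tables_from_keys(keys):
    tables = set()
    for key in keys:
        parts = key.split("/")
        if len(parts) >= 3 and parts[0] == "public":
            tables.add(parts[1])
    return sorted(tables)

# Illustrative keys matching the DMS full-load naming convention.
sample_keys = [
    "public/customers/LOAD00000001.csv",
    "public/orders/LOAD00000001.csv",
]
print(tables_from_keys(sample_keys))  # → ['customers', 'orders']
```

In practice you would feed this function the keys returned by an S3 listing of the bucket (e.g. boto3's `list_objects_v2` with `Prefix="public/"`), which requires AWS credentials.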


