To do the Workshop, we will create the database and the network automatically using the Infrastructure-as-code service, called CloudFormation on AWS
- Download this Infrastructure Template
Then, after that, we are going to setup the Bastion host. This will be used to connect to the database
- go to CloudFormation Console
- click
Create Stack - click
Template is Ready - click
Upload a template fileon specify template - upload the template
- click
Next
- fill the stack name as
DataAnalyticsWorkshopStack - click
Next
- scroll down and click
Next - scroll down and click
Create Stack
This will take a while.
Once you are done, we are going to create the EC2, which will act as a bridge to the database.
- go to EC2 Console
- Click
Launch Instances - choose
Ubuntu Server 18.04 LTS (HVM), SSD Volume Typeand clickselect - In instance type, choose
t2.micro, and clickNext: configure instance details - In Instance details, choose the network as
DMSRDSVPC - in subnet, choose
Public1 - in Auto-assign Public IP, choose
Enable
- Scroll down, and open Advanced details
- in user data, copy this code and paste to the field.
#!/bin/bash
sudo apt update
sudo apt -y install vim bash-completion wget
sudo apt -y upgrade
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" |sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt -y install postgresql-12 postgresql-client-12
sudo apt-get install unzip
This code will be used to install the postgre, so we can automatically use it.
- click
add storage
- in add storage, click
next:add tags - in add tags, click
next:configure security group - in security group name, type
BastionHostSG - in description, type
Firewall for Bastion Host - click
add rule - in type, click
PostgreSQL - in source, type
10.0.0.0/16this is the CIDR for the VPC - click
add rule - in type, click
PostgreSQL - in source, click the dropdown and choose
My IP - click
review and launch
- in review page, click
launch - in key options, choose
create a new key pair - fill the name as
BastionKey - click
download key pair - click
launch instances
- click
view instances
this will take a while.
During EC2 creation, let's create the datalake
- go to S3 Console
- click
create bucket - in bucket name, type
yourname-datalake-workshop - in region, choose
Singapore - scroll down and click
Create bucket
Now, let's go back to your EC2.
- go to EC2 Console
- click your instance and click connect

- go to ssh client
- open your terminal
- go to your
BastionKey.pemfile directory - type
chmod 400 BastionKey.pem - copy the example ssh cli, and paste it on your terminal. it looks like
ssh -iBastionKey.pemubuntu@ec2-000-000-000-000.ap-southeast-1.compute.amazonaws.com
you will be connected to your ec2. Now we need to setup our RDS Postgre
- go to RDS Console
- click
rdspostgre - copy the endpoint of your database. you may paste it on your text editor.

- go back to your terminal
- type
sudo su - postgres
it will make you as a postgres user. Now, we can connect to our postgre RDS
you will be connected to your database. Now, we need to turn on the CDC for the postgres for the migration step later.
in the template, there is a modification on Parameter group on RDS. These are the list that you need to modify if you provision your database manually:
"rds.logical_replication": "1",
"wal_sender_timeout": "0",
"max_wal_senders": "20",
"max_replication_slots": "50",
"shared_preload_libraries": "pglogical"
Now, we need to turn on the CDC on RDS.
- execute this query
create extension pglogical;
select * FROM pg_catalog.pg_extension;
you will see that there is a pglogical running. Means that we have successfully enabled the CDC.
- press
qin keyboard to exit - type
\qto quit postgres
We are going to take sample data.
- In postgres user, type
wget https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip - type
unzip dvdrental.zip
Now, execute the restore command to backup data.
- Execute
pg_restore -h <your Endpoint here> -d dvdrental dvdrental.tar - input password as
master123
Now, let's check the data
you will see dvdrental database. Now, let's check the data
- in postgre, type
\c dvdrental - in postgre, type
\dt
it will display the list of all tables on the database.
Note in the data sample, there are columns that can't be migrated, and thus the data type either need to be changed, or drop the column. In production level case, we suggested to change the data type by using ALTER TABLE and refer to this documentation. For this workshop case, we just need to drop it, for the sake of simplicity.
- Execute this query
ALTER TABLE film
DROP COLUMN fulltext;
ALTER TABLE film
DROP COLUMN special_features;
ALTER TABLE film
DROP COLUMN rating CASCADE;

