This post describes an easy way to import a CSV file stored in Amazon S3 into a table in Amazon Aurora.
Technologies used in this post:
- Amazon Aurora is a cloud-based relational database that is compatible with both MySQL and PostgreSQL (this post uses the MySQL-compatible edition, since LOAD DATA FROM S3 is an Aurora MySQL feature)
- Amazon S3 is a cloud-based object storage service
To get started, we need to download a sample CSV file. In this case I will use this dataset: https://www.stats.govt.nz/assets/Uploads/Births-by-statistical-area-2-and-area-unit-for-comparison.csv
First, you need to upload this file to an S3 bucket.
Then connect to your Aurora database and create a new table:
CREATE TABLE s3_import_test (
    geography_type varchar(100),
    gCode          varchar(100),
    gDescription   varchar(100),
    birth_2013     int,
    birth_2014     int,
    birth_2015     int,
    birth_2016     int,
    birth_2017     int,
    inserted_date  datetime
);
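One prerequisite worth mentioning: Aurora needs permission to read from your bucket. You set this up by attaching an IAM role with S3 read access to the cluster and referencing it in the aurora_load_from_s3_role (or aws_default_s3_role) cluster parameter. Your database user also needs the LOAD FROM S3 privilege. A minimal sketch, with a placeholder user name:

-- 'your_user'@'%' is a placeholder; substitute your own database user.
-- Note: on Aurora MySQL version 3 this privilege is instead granted
-- through the AWS_LOAD_S3_ACCESS role.
GRANT LOAD FROM S3 ON *.* TO 'your_user'@'%';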
To import the data, we will use the LOAD DATA FROM S3 statement:
-- 's3://your-S3-url/...' is a placeholder; use your own bucket name
LOAD DATA FROM S3 's3://your-S3-url/Births-by-statistical-area-2-and-area-unit-for-comparison.csv'
INTO TABLE s3_import_test
CHARACTER SET UTF8
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(geography_type, gCode, gDescription, birth_2013, birth_2014, birth_2015, birth_2016, birth_2017, @inserted_date)
SET inserted_date = CURRENT_TIMESTAMP;
@inserted_date is a user variable: by putting it in the column list instead of the inserted_date column, that column is left free to be assigned in the SET clause at the end of the statement. I use variables like this a lot, either to call custom functions or to apply some of the built-in functions in Aurora before a value is inserted.
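As a sketch of that pattern, here is a hypothetical variation of the statement above that uses the standard MySQL built-in NULLIF, so that empty strings in the CSV become NULL instead of 0 in the int column:

-- Same import, but birth_2013 is read into a variable first so it can
-- be cleaned with NULLIF before it lands in the table.
LOAD DATA FROM S3 's3://your-S3-url/Births-by-statistical-area-2-and-area-unit-for-comparison.csv'
INTO TABLE s3_import_test
CHARACTER SET UTF8
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(geography_type, gCode, gDescription, @birth_2013, birth_2014, birth_2015, birth_2016, birth_2017, @inserted_date)
SET birth_2013 = NULLIF(@birth_2013, ''),
    inserted_date = CURRENT_TIMESTAMP;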
And that is it: the data has been imported into the table.
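To verify, you can run a quick query against the new table:

-- Sanity check: a few imported rows plus the timestamp set during the load
SELECT geography_type, gCode, gDescription, birth_2017, inserted_date
FROM s3_import_test
LIMIT 5;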
You can read more in the documentation here: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html