Getting started with Azure Data Factory

In this post, I will show you how to get started with Azure Data Factory. We will use the sample data from the AdventureWorksLT database. Please read this post on how to get access to it.

First, we will start by creating a table. I am creating this table in my AdventureWorksLT database for simplicity


CREATE TABLE dbo.FactSales
(
FactSalesId int NOT NULL IDENTITY (1, 1),
OrderDate date NULL,
DueDate date NULL,
ShipDate date NULL,
OrderQty smallint NULL,
UnitPrice money NULL,
ProductId int NULL,
ProductName nvarchar(50) NULL
) ON [PRIMARY]

Then we will open a browser and navigate to portal.azure.com. In the search box write “Data factories” and click on it

Create Data Factory 1

Then click on “Add”

Create Data Factory 2

Fill out the name and choose a subscription and resource group. For simplicity, I am not setting up Git in this post

Create Data Factory 3

When you click on “Create” the Data Factory will be created in the background for you. To open your new Data Factory you can either search for it in the Azure Portal or you can go to adf.azure.com. I prefer the latter option.

Create Data Factory 4

Click on “Continue”

To create a connection to our database click on “Connections” and then “New”

Create Data Factory 5

Search for “Azure SQL Database”

Create Data Factory 6

Write a name for the connection, choose SQL Server instance, fill out the user name and password and then click on “Test connection”.

Create Data Factory 7

It will then appear in the list of connections

Create Data Factory 8

In this case, I will only have one connection since I am both reading and writing to the same database.

Create Data Factory 9

To create a dataset click on “+” and then choose “Add Dataset”

Create Data Factory 10

Select “Azure SQL Database”

Create Data Factory 11

Create datasets for the following tables

Create Data Factory 12

Then add a new data flow

Create Data Factory 13

Choose “Mapping Data Flow”

Create Data Factory 14

Click on “Add source” and select DS_SalesOrderHead. Do the same again for DS_SalesOrderDetail

Create Data Factory 15

Click on the “+” after SalesOrderHead. Add a “Join” and configure as below

Create Data Factory 16

Since both datasets have several columns with the same name add a “Select” after the join. Check “Skip duplicate input” and “Skip duplicate output”

Create Data Factory 18.1

Click on “+” after the “Select” and add a “Sink”.

Create Data Factory 25

Choose manual mapping. Remove “ProductName” and “FactSalesId”

To execute the “Data Flow” we need to add a new “Pipeline”

Create Data Factory 20

Then drag “Data Flow (Preview)” on to the canvas. Choose “Mapping Data Flow”.

Create Data Factory 21

Click on “Publish All”.

Create Data Factory 22

Then click on “Trigger Now”

Create Data Factory 23

So if everything works out you can query FactSales and see this result

Create Data Factory 24

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s