In this post, I will show you how to get started with Azure Data Factory. We will use the sample data from the AdventureWorksLT database. Please read this post on how to get access to it.
First, we will create a table. For simplicity, I am creating it in my AdventureWorksLT database:
CREATE TABLE dbo.FactSales (
    FactSalesId int NOT NULL IDENTITY(1, 1),
    OrderDate date NULL,
    DueDate date NULL,
    ShipDate date NULL,
    OrderQty smallint NULL,
    UnitPrice money NULL,
    ProductId int NULL,
    ProductName nvarchar(50) NULL
) ON [PRIMARY]
Then we will open a browser and navigate to portal.azure.com. In the search box, type “Data factories” and click on it
Then click on “Add”
Fill out the name and choose a subscription and resource group. For simplicity, I am not setting up Git in this post.
When you click on “Create”, the Data Factory will be created in the background for you. To open your new Data Factory, you can either search for it in the Azure Portal or go to adf.azure.com. I prefer the latter option.
Click on “Continue”
To create a connection to our database, click on “Connections” and then “New”
Search for “Azure SQL Database”
Write a name for the connection, choose your server and database, fill out the user name and password, and then click on “Test connection”.
It will then appear in the list of connections
In this case, I will only have one connection since I am both reading from and writing to the same database.
To create a dataset, click on “+” and then choose “Add Dataset”
Select “Azure SQL Database”
Create datasets for the following tables: SalesLT.SalesOrderHeader (I named it DS_SalesOrderHead), SalesLT.SalesOrderDetail (DS_SalesOrderDetail), and the dbo.FactSales table we created above, which will be used as the sink
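If you want a quick look at the source data before building the flow, you can run a couple of queries like these (the column names come from the AdventureWorksLT schema):

-- A few orders from the header table
SELECT TOP (5) SalesOrderID, OrderDate, DueDate, ShipDate
FROM SalesLT.SalesOrderHeader;

-- And the corresponding line items
SELECT TOP (5) SalesOrderID, OrderQty, UnitPrice, ProductID
FROM SalesLT.SalesOrderDetail;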
Then add a new data flow
Choose “Mapping Data Flow”
Click on “Add source” and select DS_SalesOrderHead. Do the same again for DS_SalesOrderDetail
Click on the “+” after SalesOrderHead. Add a “Join” and configure it as below; the two streams are joined on SalesOrderID
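If you prefer to think in T-SQL, the join step corresponds roughly to this (a sketch; I am assuming an inner join on SalesOrderID):

-- Combine each order header with its line items
SELECT h.*, d.*
FROM SalesLT.SalesOrderHeader AS h
INNER JOIN SalesLT.SalesOrderDetail AS d
    ON h.SalesOrderID = d.SalesOrderID;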
Since both datasets have several columns with the same name (SalesOrderID, rowguid, and ModifiedDate), add a “Select” after the join. Check “Skip duplicate input columns” and “Skip duplicate output columns”
Click on “+” after the “Select” and add a “Sink”.
Choose manual mapping. Remove “ProductName” and “FactSalesId”: FactSalesId is an IDENTITY column that SQL Server populates on its own, and ProductName has no matching column in the source stream
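For reference, the whole data flow (sources, join, select, and sink) is roughly equivalent to this T-SQL, a sketch under the same SalesOrderID join assumption:

-- FactSalesId is an IDENTITY column and ProductName has no source column,
-- so neither appears in the column list
INSERT INTO dbo.FactSales (OrderDate, DueDate, ShipDate, OrderQty, UnitPrice, ProductId)
SELECT h.OrderDate, h.DueDate, h.ShipDate, d.OrderQty, d.UnitPrice, d.ProductID
FROM SalesLT.SalesOrderHeader AS h
INNER JOIN SalesLT.SalesOrderDetail AS d
    ON h.SalesOrderID = d.SalesOrderID;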
To execute the “Data Flow” we need to add a new “Pipeline”
Then drag “Data Flow (Preview)” onto the canvas. Choose “Mapping Data Flow”.
Click on “Publish All”.
Then click on “Trigger Now”
So if everything works out, you can query dbo.FactSales and see the result.
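For example:

-- Spot-check the loaded rows
SELECT TOP (10) * FROM dbo.FactSales;

-- And check how many rows the pipeline wrote
SELECT COUNT(*) AS LoadedRows FROM dbo.FactSales;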