Using GitHub in Azure Data Factory

Azure Data Factory integrates very well with both GitHub and Azure DevOps. If you have multiple developers working in the same factory, you can even merge changes from different branches. Try doing that with an SSIS package!

Okay, so how do we start?

You can set up GitHub integration either when you create a new Data Factory or after it has been created.

In this example, I will show how to connect an existing Data Factory to GitHub.

If you have never used GitHub before, start by creating an account. Personally, I use a free account, but GitHub offers several plans that you can compare on the GitHub site. Then create a repository. In my case, it is set to “private”. You can read more about repository visibility on the GitHub site.

ADF Git 1

Click “Create repository”. When you then open your new repository, GitHub shows instructions on how to set it up.

ADF Git 2

Command line? Do not panic! It is really easy. Download the Git installer from https://git-scm.com/downloads and install it.

Then copy the commands into a command prompt (or terminal) and run them. If you get any warnings or errors, please Google them; there is plenty of documentation out there.
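If you would rather script this step, the sketch below wraps the kind of commands GitHub shows for a new, empty repository. It is an illustration under assumptions, not GitHub's exact text: the branch name defaults to “master” to match this article, the repository URL is a hypothetical placeholder, and the commit step needs your `user.name` and `user.email` configured in Git. By default it only prints the commands (a dry run).

```python
import subprocess
from pathlib import Path
from typing import List


def first_push_commands(repo_url: str, branch: str = "master") -> List[List[str]]:
    """Git commands (as argv lists) for a first push to an empty GitHub repo."""
    return [
        ["git", "init"],
        ["git", "add", "README.md"],
        ["git", "commit", "-m", "first commit"],
        ["git", "branch", "-M", branch],  # (re)name the current branch
        ["git", "remote", "add", "origin", repo_url],  # point "origin" at GitHub
        ["git", "push", "-u", "origin", branch],  # -u remembers the upstream branch
    ]


def initialise_repo(workdir: str, repo_url: str, branch: str = "master",
                    dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them inside `workdir`."""
    cmds = first_push_commands(repo_url, branch)
    if dry_run:
        for cmd in cmds:
            print(" ".join(cmd))
        return cmds
    readme = Path(workdir) / "README.md"
    if not readme.exists():
        # Git needs at least one commit before there is a branch to push.
        readme.write_text("# " + Path(workdir).resolve().name + "\n", encoding="utf-8")
    for cmd in cmds:
        # check=True stops at the first command that fails.
        subprocess.run(cmd, cwd=workdir, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical repository URL; replace it with the one GitHub shows you.
    initialise_repo(".", "https://github.com/your-account/your-adf-repo.git")
```

Call `initialise_repo(..., dry_run=False)` from the folder you want to push once the printed commands look right.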

Then go to https://adf.azure.com/ and choose the factory you want to connect to GitHub.

ADF Git 3

Enter your repository settings. Please correct any warnings and errors before continuing.

ADF Git 4
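The same settings can also be expressed in code, for example when a factory is deployed from a template instead of through the UI. The sketch below assumes the factory's ARM `repoConfiguration` property with `type` set to `FactoryGitHubConfiguration`; the class name, the account and repository names, and the validation rules are hypothetical and only illustrate the kind of checks the UI asks you to satisfy.

```python
import json
from dataclasses import dataclass


@dataclass
class GitHubRepoSettings:
    """Hypothetical container for the GitHub repository settings entered in ADF."""
    account_name: str                   # GitHub user or organisation that owns the repo
    repository_name: str                # e.g. the private repository created earlier
    collaboration_branch: str = "master"
    root_folder: str = "/"              # folder in the repo where ADF stores its JSON files

    def validate(self) -> None:
        # Illustrative checks only; ADF performs its own validation in the UI.
        for field_name in ("account_name", "repository_name", "collaboration_branch"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")
        if not self.root_folder.startswith("/"):
            raise ValueError("root_folder should start with '/'")

    def to_repo_configuration(self) -> dict:
        # Shape assumed to match the factory's `repoConfiguration` ARM property.
        self.validate()
        return {
            "type": "FactoryGitHubConfiguration",
            "accountName": self.account_name,
            "repositoryName": self.repository_name,
            "collaborationBranch": self.collaboration_branch,
            "rootFolder": self.root_folder,
        }


if __name__ == "__main__":
    settings = GitHubRepoSettings("your-account", "your-adf-repo")
    print(json.dumps(settings.to_repo_configuration(), indent=2))
```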

If this succeeds, you will be asked to create a local branch where you can do your development.

ADF Git 5

When you open your Data Factory, you will see that it is now connected to GitHub.

ADF Git 6

This is the workflow you will typically use when you develop something and want to publish it:

  • Do all development in your local branch
  • Merge your changes into the collaboration branch (in this case, “master”)
  • Publish your changes

You cannot publish changes from any branch other than the collaboration branch; if you try, you just get an error message.
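To make the rules of this workflow concrete, here is a toy model of develop, merge and publish. It is not how ADF is implemented: branches are plain dictionaries of resource name to definition, the merge simply lets the source branch overwrite the target (no conflict detection, and deletions are not carried over), and the resource name `pipeline1` is hypothetical. The one rule it enforces is the one described above: publishing only works from the collaboration branch.

```python
from typing import Dict


class PublishError(Exception):
    """Raised when publishing is attempted from a non-collaboration branch."""


class FactoryRepo:
    """Toy model of the develop -> merge -> publish flow (not ADF's real implementation)."""

    def __init__(self, collaboration_branch: str = "master") -> None:
        self.collaboration_branch = collaboration_branch
        # branch name -> {resource name -> definition}
        self.branches: Dict[str, Dict[str, str]] = {collaboration_branch: {}}
        self.published: Dict[str, str] = {}

    def create_branch(self, name: str) -> None:
        # A new branch starts as a copy of the collaboration branch.
        self.branches[name] = dict(self.branches[self.collaboration_branch])

    def save(self, branch: str, resource: str, definition: str) -> None:
        self.branches[branch][resource] = definition

    def merge(self, source: str, target: str) -> None:
        # Simplified: source entries overwrite target entries; deletions are not merged.
        self.branches[target].update(self.branches[source])

    def publish(self, branch: str) -> None:
        if branch != self.collaboration_branch:
            raise PublishError(
                f"Publishing is only allowed from '{self.collaboration_branch}', not '{branch}'")
        self.published = dict(self.branches[branch])


if __name__ == "__main__":
    repo = FactoryRepo()
    repo.create_branch("sindre_local_branch")
    repo.save("sindre_local_branch", "pipeline1", "Wait activity")
    try:
        repo.publish("sindre_local_branch")  # fails: not the collaboration branch
    except PublishError as err:
        print(err)
    repo.merge("sindre_local_branch", "master")
    repo.publish("master")
    print(repo.published)  # {'pipeline1': 'Wait activity'}
```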

Now I make a small change to my Data Factory on my local branch, “sindre_local_branch”: I add a Wait activity. My two branches are now different, so to merge the change into the master branch I need to create a pull request.

ADF Git 7

This redirects me straight to the GitHub site, where you can review the changes and create the pull request.

ADF Git 8

Then add a comment if you want.

ADF Git 9

Click “Create pull request”.

ADF Git 10

Click on “Merge pull request” and then “Confirm merge”.

ADF Git 11

If everything works out, you will see this message.
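The same two clicks, “Create pull request” and “Merge pull request”, map to two GitHub REST API calls: `POST /repos/{owner}/{repo}/pulls` and `PUT /repos/{owner}/{repo}/pulls/{number}/merge`. The sketch below only builds the requests with Python's standard library so nothing is sent by accident; the owner, repository and token are hypothetical placeholders, and you would need a token with access to your private repository to actually send them.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def _github_request(url: str, method: str, payload: dict,
                    token: Optional[str] = None) -> urllib.request.Request:
    # Build (but do not send) a JSON request to the GitHub REST API.
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), method=method)
    req.add_header("Accept", "application/vnd.github+json")
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def build_create_pr_request(owner: str, repo: str, head: str, base: str, title: str,
                            body: str = "", token: Optional[str] = None) -> urllib.request.Request:
    # POST /repos/{owner}/{repo}/pulls opens a pull request from `head` into `base`.
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls"
    payload = {"title": title, "head": head, "base": base, "body": body}
    return _github_request(url, "POST", payload, token)


def build_merge_pr_request(owner: str, repo: str, number: int,
                           token: Optional[str] = None) -> urllib.request.Request:
    # PUT /repos/{owner}/{repo}/pulls/{number}/merge corresponds to "Merge pull request".
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls/{number}/merge"
    return _github_request(url, "PUT", {}, token)


if __name__ == "__main__":
    req = build_create_pr_request("your-account", "your-adf-repo",
                                  head="sindre_local_branch", base="master",
                                  title="Add Wait activity")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a valid token):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["number"])
```

The GitHub web page remains the simpler route for occasional merges; scripting mainly pays off if you automate the flow.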

If you then go back to your Data Factory and switch to the master branch, you will see that it now contains the same changes as your local branch.

But what if I hire a new developer who needs their own branch? Just create another branch.

ADF Git 12

Enter a name for the new branch.

ADF Git 13

This new branch will be a copy of the master branch.
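A branch can also be created on the GitHub side, for example when onboarding several developers at once. The sketch below assumes the GitHub REST API Git references endpoints: first read the commit SHA that master points to (`GET /repos/{owner}/{repo}/git/ref/heads/master`), then create a new ref at that SHA (`POST /repos/{owner}/{repo}/git/refs`). The branch name `new_developer_branch`, the owner, the repository and the token are hypothetical, and the requests are only built, not sent. It is an assumption that ADF lists a branch created this way; you may need to refresh the branch list in the UI.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def _github_request(url: str, method: str = "GET", payload: Optional[dict] = None,
                    token: Optional[str] = None) -> urllib.request.Request:
    # Build (but do not send) a request to the GitHub REST API.
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Accept", "application/vnd.github+json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def build_get_branch_ref_request(owner: str, repo: str, branch: str,
                                 token: Optional[str] = None) -> urllib.request.Request:
    # GET .../git/ref/heads/{branch} returns the commit the branch points to.
    return _github_request(f"{API_ROOT}/repos/{owner}/{repo}/git/ref/heads/{branch}", token=token)


def sha_from_ref_response(ref_json: dict) -> str:
    # The commit SHA sits under "object" in the reference response.
    return ref_json["object"]["sha"]


def build_create_branch_request(owner: str, repo: str, new_branch: str, sha: str,
                                token: Optional[str] = None) -> urllib.request.Request:
    # POST .../git/refs creates refs/heads/<new_branch> pointing at `sha`.
    payload = {"ref": f"refs/heads/{new_branch}", "sha": sha}
    return _github_request(f"{API_ROOT}/repos/{owner}/{repo}/git/refs", "POST", payload, token)


if __name__ == "__main__":
    get_req = build_get_branch_ref_request("your-account", "your-adf-repo", "master")
    print(get_req.get_method(), get_req.full_url)
    # Live flow (requires a valid token):
    # with urllib.request.urlopen(get_req) as resp:
    #     sha = sha_from_ref_response(json.load(resp))
    # urllib.request.urlopen(build_create_branch_request(
    #     "your-account", "your-adf-repo", "new_developer_branch", sha, token="..."))
```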

Some other useful tips

  • You should probably add some documentation and a description in a file called “README.md”.
  • If you do not like using the GitHub web page, there are many Windows and console apps you can use instead, for example GitHub Desktop.
  • Never worked with Git before? Then I recommend reading a tutorial on the basics first.
