Azure Data Factory integrates very well with both GitHub and Azure DevOps. If you have multiple developers working in the same factory, you can even merge changes from different branches. Try doing that on an SSIS package!
Okay, so how do we start?
You can set up the GitHub integration either when you create a new Data Factory or after it is created.
In this sample, I will show how to integrate with GitHub after the Data Factory is created.
If you have never used GitHub before, start by creating a new account. Personally, I have a free account, but GitHub lists all the plan options on its site. Then you need to create a repository. In my case, this is set to “private”. Please read more about repository visibility on the GitHub site.
Click on “Create repository”. When you then navigate to your new repository you will see guidelines on how to set it up.
Command line? Do not panic! It is really easy. Download the Git installer from https://git-scm.com/downloads and install it.
Copy the commands to a command line and then execute them. If you get any warnings or errors please Google them. There is a lot of documentation out there.
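If you would rather script this step than paste the commands one by one, here is a minimal sketch in Python. The command list is my assumption of what GitHub typically suggests for a new repository, so compare it with what your own repository page shows. The helper names, the remote URL and the “master” branch name are placeholders, and the sketch assumes a README.md already exists in the folder.

```python
import subprocess


def first_push_commands(remote_url, branch="master"):
    """Return the git commands for a first push, as argument lists.

    Assumption: this mirrors the usual "create a new repository on the
    command line" guidance; your repository page may list other commands.
    """
    return [
        ["git", "init"],
        ["git", "add", "README.md"],
        ["git", "commit", "-m", "first commit"],
        ["git", "remote", "add", "origin", remote_url],
        ["git", "push", "-u", "origin", branch],
    ]


def run_all(commands, cwd):
    """Run each command in order inside cwd, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_all(first_push_commands("https://github.com/example-user/example-repo.git"), cwd="my_folder")` would execute them in sequence. Newer Git versions may name the first branch “main” instead of “master”, in which case the branch argument has to match.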
Then go to https://adf.azure.com/ and choose the factory you want to connect to GitHub.
Enter your repository settings. Please correct any warnings and errors before continuing.
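To make the settings a bit more concrete, here is a small sketch of the values you end up entering, written as a Python dictionary. The key names follow what I understand the factory's `repoConfiguration` block to look like in an ARM template; treat them, the helper name and the example values as assumptions and check them against your own factory.

```python
def github_repo_settings(account_name, repository_name,
                         collaboration_branch="master", root_folder="/"):
    """Collect the GitHub settings for a factory and reject empty values.

    Assumption: the keys mirror the ARM 'repoConfiguration' section for a
    GitHub-backed factory; the values below are placeholders.
    """
    settings = {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,
        "repositoryName": repository_name,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }
    # Catch empty fields early, much like the warnings in the setup form.
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise ValueError("missing repository settings: " + ", ".join(missing))
    return settings
```

For example, `github_repo_settings("example-user", "example-repo")` returns a dictionary with “master” as the collaboration branch and “/” as the root folder.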
If this succeeds, you will be asked to create a working branch to do your development in. I call mine my local branch, although it actually lives in the GitHub repository.
And when you open your Data Factory you will see that it is connected to GitHub.
This is the flow you will typically use when you develop and want to publish something:
- Do all development using your local branch
- Merge your changes into the collaboration branch (in this case “master”)
- Publish your changes
It is not possible to publish changes from any branch other than the collaboration branch. Trying to do so just gives an error message.
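The rule above can be written down as a tiny decision function. This is only a toy model of the flow, not anything Data Factory exposes; the function name and the returned strings are made up for illustration.

```python
def next_step(current_branch, changes_merged, collaboration_branch="master"):
    """Return the next action in the develop / merge / publish flow.

    Toy model: publishing is only allowed from the collaboration branch.
    """
    if current_branch == collaboration_branch:
        return "publish"
    if not changes_merged:
        return "create a pull request into " + collaboration_branch
    return "switch to " + collaboration_branch + " and publish"
```

So on “sindre_local_branch” with unmerged changes, the next step is a pull request into “master”, and only on “master” is the answer “publish”.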
Now I make a small change to my Data Factory (on my local branch “sindre_local_branch”): I add a Wait activity. My branches are now different. To merge my change into the master branch, I need to create a pull request.
This redirects me to the GitHub site, where you can see the changes and create the pull request.
Then you can add a comment if you want.
Click on “Create pull request”.
Click on “Merge pull request” and then “Confirm merge”.
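The same two clicks can also be done through the GitHub REST API, which is handy if you want to automate it. Below is a minimal sketch that only builds the two requests (create the pull request, then merge it) without sending them. The owner, repository, title and token values are placeholders, and the function names are mine.

```python
import json
import urllib.request

API = "https://api.github.com"


def _headers(token):
    """Headers for an authenticated GitHub API call (token is a placeholder)."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def build_pull_request(owner, repo, head, base, title, token):
    """Build the request that opens a pull request from head into base."""
    body = json.dumps({"title": title, "head": head, "base": base})
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/pulls",
        data=body.encode("utf-8"),
        method="POST",
        headers=_headers(token),
    )


def build_merge(owner, repo, pull_number, token):
    """Build the request that merges an existing pull request."""
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/pulls/{pull_number}/merge",
        method="PUT",
        headers=_headers(token),
    )
```

Passing either request to `urllib.request.urlopen` would actually send it, so only do that with a real token and a repository you own.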
And if everything works out you’ll see this message.
If you then go back to your Data Factory and switch to the master branch, you will see that it now contains the same change as the local branch.
But what if I hire a new developer and that developer needs their own branch? Just create another branch.
Write a name for the new branch.
And this branch will be a copy of the master branch.
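If you ever need to create branches for several developers at once, the GitHub REST API can do that too: a new branch is just a new reference pointing at an existing commit. Here is a minimal sketch that builds such a request without sending it. The function name, branch name, commit SHA and token are placeholders; you would first look up the SHA of the latest commit on master.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_create_branch(owner, repo, new_branch, source_sha, token):
    """Build the request that creates new_branch pointing at source_sha.

    source_sha is the commit the branch should start from, for example
    the latest commit on master (placeholder value in this sketch).
    """
    payload = {"ref": f"refs/heads/{new_branch}", "sha": source_sha}
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/git/refs",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Because the new reference points at the same commit as master, the branch starts out as an exact copy of it.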
Some other useful tips
- You should probably add some documentation and description in a file called “README.md”
- If you do not like using the functionality on the GitHub page, there are a lot of Windows and console apps out there that you could use instead, for example GitHub Desktop
- Never worked with Git before? Then I recommend reading a tutorial on the basics.