
Azure ML: How to Get Your Data Moving Using Azure Data Factory

By Jeff Lipkowitz

When designing an Azure Machine Learning solution, one of the most critical decisions is how the pipeline will be executed. Within Azure ML there are four major options, each with its own strengths and weaknesses. In this blog I will talk about each option in detail.



Major ML Pipeline Execution Options



Python SDK Schedule 

A user writes code to schedule a pipeline from within the Azure Machine Learning SDK.

Python SDK Trigger 

This option executes a pipeline based on a trigger. For example, when a user uploads data to blob storage, the upload can trigger the pipeline.

Logic App 

Used when a user needs advanced logic to decide when to execute the pipeline.

Azure Data Factory 

Can be used to trigger pipelines using a batch methodology.
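Under the hood, every one of these options ends up submitting a run against a published pipeline's REST endpoint. As a minimal sketch, the snippet below builds the JSON body such a submit request expects; the endpoint URL, experiment name, and parameter names are hypothetical placeholders, and the actual POST (commented out) would additionally need an Azure AD bearer token.

```python
import json

# Hypothetical endpoint -- in practice this comes from the published
# pipeline's `endpoint` property in the Azure ML workspace.
PIPELINE_ENDPOINT = "https://<region>.api.azureml.ms/pipelines/v1.0/<...>/PipelineSubmit/<pipeline-id>"

def build_submit_request(experiment_name, parameters=None):
    """Build the JSON body for a pipeline-submit POST request."""
    body = {"ExperimentName": experiment_name}
    if parameters:
        # Pipeline parameters are passed as a simple name -> value map.
        body["ParameterAssignments"] = parameters
    return json.dumps(body)

# Example: a run kicked off after new data lands in blob storage.
payload = build_submit_request("blob-upload-run", {"input_path": "raw/2024/"})

# The actual call would look like (requires an AAD token):
# requests.post(PIPELINE_ENDPOINT,
#               headers={"Authorization": f"Bearer {token}"},
#               data=payload)
print(payload)
```

Whether the request is sent from a scheduled script, a Logic App, or a Data Factory activity, the body has this same shape.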



Azure Data Factory 

Azure Data Factory can be used as a tool to schedule pipelines with little to no code. This is ideal for anyone from a data engineer to IT support who wants to handle the entire MLOps process in one tool. In addition, it gives a visual representation of what is happening and has built-in monitoring.
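For example, the low-code scheduling is typically done with a Data Factory schedule trigger. The JSON fragment below is an illustrative sketch of such a trigger definition (the trigger name, pipeline name, and start time are made up); it runs the referenced Data Factory pipeline once a day.

```json
{
  "name": "DailyMlTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "RunMlPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

The same definition can be created entirely through the Data Factory UI, which is what makes this approach accessible to non-developers.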


Keeping all data pipelines and notebooks in one tool and one code base is great for organizations that have limited data science resources and want staff who are familiar with general data concepts to be able to operate the solution.


The image below shows an example of how you can execute a pipeline.



The first step is to add the pipeline to your workspace as seen below. 








The next step is to go into the Machine Learning section of Data Factory's activity list and select the Machine Learning Execute Pipeline activity. The first and second activities in this section are for ML Studio (classic).
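Once the activity is added, its JSON definition looks roughly like the sketch below. The linked service name, pipeline ID placeholder, experiment name, and parameter names here are illustrative assumptions; the linked service points at your Azure ML workspace, and the published pipeline ID comes from the pipeline you registered in the first step.

```json
{
  "name": "ExecuteMlPipeline",
  "type": "AzureMLExecutePipeline",
  "linkedServiceName": {
    "referenceName": "AzureMLService1",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "mlPipelineId": "<published-pipeline-id>",
    "experimentName": "adf-triggered-run",
    "mlPipelineParameters": {
      "input_path": "raw/2024/"
    }
  }
}
```

Combined with a schedule trigger on the Data Factory pipeline, this is all that is needed to run the ML pipeline on a batch cadence without writing any SDK code.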