Tuesday, March 21, 2023

Cloud #2: Azure Data Factory

Azure Data Factory (ADF) is Azure's cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF. The Azure-SSIS Integration Runtime is fully managed, so you don't have to worry about infrastructure management.

How many releases of ADF are there?

There have been two releases of ADF: version 1 and version 2, which is the current one. Azure Data Factory is a managed cloud service built for:

  • extract-transform-load (ETL)
  • extract-load-transform (ELT)
  • data integration projects
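
The only difference between ETL and ELT is where the transform step runs: before loading (ETL) or inside the target store after loading (ELT). The Python sketch below shows that ordering with in-memory stand-ins; `extract`, `transform`, and the warehouse dictionary are hypothetical and are not ADF APIs.

```python
# Hypothetical in-memory stand-ins for a source system and a target warehouse.
Row = dict[str, str]

def extract() -> list[Row]:
    # Stand-in for reading raw records from a source system.
    return [{"name": " alice ", "amount": "10"}, {"name": "bob", "amount": "5"}]

def transform(rows: list[Row]) -> list[dict]:
    # Clean up names and cast amounts to integers.
    return [{"name": r["name"].strip().title(), "amount": int(r["amount"])} for r in rows]

def etl(warehouse: dict) -> None:
    # ETL: transform in the pipeline, then load only the cleaned rows.
    warehouse["sales"] = transform(extract())

def elt(warehouse: dict) -> None:
    # ELT: load the raw rows first, then transform them inside the target store.
    warehouse["raw_sales"] = extract()
    warehouse["sales"] = transform(warehouse["raw_sales"])
```

Both approaches end with the same cleaned table; ELT additionally keeps the raw copy in the target, which is useful when the destination (for example a data warehouse) is the better place to run transformations.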

What components is it made up of?

Data engineers work with ADF through the components described in the sections below:

What are connectors?

ADF's power and popularity come from its 90+ readily available connectors to many systems. You cannot write your own connector in ADF, so for those scenarios I wrote a back-office server with background asynchronous tasks and scheduled workers that run as jobs to ingest and process data.
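
The scheduled-worker pattern mentioned above can be sketched with Python's `asyncio`. This is a minimal illustration of the general idea, not the author's back-office server: in-memory lists stand in for the source system and the destination, the doubling step stands in for real processing, and `ingest_batch` and `scheduled_worker` are hypothetical names.

```python
import asyncio

def ingest_batch(source: list[int], sink: list[int]) -> None:
    # Hypothetical ingestion step: drain whatever the source currently holds.
    while source:
        record = source.pop(0)
        sink.append(record * 2)  # placeholder for real processing

async def scheduled_worker(source: list[int], sink: list[int],
                           interval_s: float, runs: int) -> None:
    # Run the ingestion job every `interval_s` seconds, `runs` times.
    for _ in range(runs):
        ingest_batch(source, sink)
        await asyncio.sleep(interval_s)

async def main() -> list[int]:
    source: list[int] = [1, 2, 3]
    sink: list[int] = []
    worker = asyncio.create_task(scheduled_worker(source, sink, 0.01, 3))
    await asyncio.sleep(0.005)
    source.extend([4, 5])  # new data arrives while the worker is running
    await worker
    return sink

if __name__ == "__main__":
    print(asyncio.run(main()))
```

The first run picks up the initial records and a later run picks up the records that arrived in between, which is the behavior a scheduled ingestion job needs. A production version would add error handling, retries, and persistent state.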

Can SSIS packages run in the Azure Cloud?

Yes. SSIS packages can run on-premises inside SQL Server, or in the Azure cloud using the Azure-SSIS Integration Runtime in Azure Data Factory.

What is an Activity?

An activity is a single processing step in a pipeline. ADF ships with built-in activities, and we can also write custom activities to work with our own data. One example of a built-in activity is the Copy activity, which moves data between on-premises and cloud data stores.
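
In a pipeline definition, a Copy activity is a JSON object with `"type": "Copy"`, dataset references under `inputs` and `outputs`, and a `source` and `sink` under `typeProperties`. The Python sketch below builds one in that general shape. The pipeline, activity, and dataset names are hypothetical; `SqlSource` and `BlobSink` are assumed copy source and sink types, and reading from an on-premises SQL Server would also need a self-hosted integration runtime on its linked service.

```python
import json

def copy_activity(name: str, input_ds: str, output_ds: str,
                  source_type: str, sink_type: str) -> dict:
    """Build a Copy activity shaped like ADF's pipeline JSON."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_ds, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_ds, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": source_type},  # where the data is read from
            "sink": {"type": sink_type},      # where the data is written to
        },
    }

pipeline = {
    "name": "CopyOnPremToCloud",  # hypothetical pipeline name
    "properties": {
        "activities": [
            copy_activity("CopySqlServerToBlob", "OnPremSqlTable",
                          "CloudBlobFile", "SqlSource", "BlobSink")
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```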

What are some of the building blocks of ADF?

An Azure subscription might have one or more Azure Data Factory instances (or data factories). Azure Data Factory is composed of the following key components:

  1. Pipelines
  2. Activities
  3. Datasets
  4. Linked services
  5. Data Flows
  6. Integration Runtimes
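
The first four building blocks refer to each other by name: a pipeline groups activities, an activity reads and writes datasets, and each dataset points at a linked service that holds the connection details. The Python sketch below models that chain with plain dictionaries and checks that every reference resolves. The resource names are hypothetical, integration runtimes and data flows are left out, and `dangling_references` is an illustrative helper, not part of any Azure SDK.

```python
# Hypothetical resource definitions, loosely shaped like ADF's JSON.
linked_services = {
    "BlobStorageLS": {"type": "AzureBlobStorage"},
    "AzureSqlLS": {"type": "AzureSqlDatabase"},
}

datasets = {
    "InputCsv": {"type": "DelimitedText",
                 "linkedServiceName": {"referenceName": "BlobStorageLS",
                                       "type": "LinkedServiceReference"}},
    "OutputTable": {"type": "AzureSqlTable",
                    "linkedServiceName": {"referenceName": "AzureSqlLS",
                                          "type": "LinkedServiceReference"}},
}

pipeline = {"activities": [{
    "name": "CopyCsvToSql", "type": "Copy",
    "inputs": [{"referenceName": "InputCsv", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "OutputTable", "type": "DatasetReference"}],
}]}

def dangling_references(pipeline: dict, datasets: dict, linked_services: dict) -> list[str]:
    """Return every dataset or linked-service name that is referenced but not defined."""
    missing = []
    for activity in pipeline["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            ds_name = ref["referenceName"]
            if ds_name not in datasets:
                missing.append(ds_name)
                continue
            ls_name = datasets[ds_name]["linkedServiceName"]["referenceName"]
            if ls_name not in linked_services:
                missing.append(ls_name)
    return missing

print(dangling_references(pipeline, datasets, linked_services))  # []
```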

The best way to learn ADF is to create a data pipeline that ingests some CSV data and stores it in an Azure SQL database. We have a writeup for this located here. You will need an Azure account to create data pipelines.
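
To get a feel for what that pipeline does before building it, the CSV-to-SQL load can be reproduced locally. In the Python sketch below, the standard-library `csv` and `sqlite3` modules stand in for Blob Storage and Azure SQL. It is an analogy to what a Copy activity does, not ADF code, and the table, column, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in ADF this file would sit in Blob Storage.
CSV_DATA = """id,name,amount
1,Alice,10.5
2,Bob,4.0
"""

def load_csv_into_sql(csv_text: str, conn: sqlite3.Connection) -> int:
    """Create the target table, copy every CSV row into it, and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))
    # Cast each column to the type of the target table column.
    rows = [(int(r["id"]), r["name"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales (id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_csv_into_sql(CSV_DATA, conn))                             # 2
    print(conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0])  # 14.5
```

In ADF, the same three concerns map onto separate pieces: the CSV file is described by a dataset, the column casting is handled by the Copy activity's mapping, and the connection to the database lives in a linked service.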

What is Azure Data Factory Studio?

Azure Data Factory Studio is the browser-based interface for a data factory. It is where you author, debug, and monitor pipelines and their components. It is located here.

What is a Blob Source and Blob Sink?

In a Copy activity, the Blob source is the Azure Blob Storage location that ADF reads data from. The Blob sink is the Blob Storage location that ADF writes the data to after copying, and optionally transforming, it.
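
The source/sink direction shows up in the pipeline definition: the input dataset is read with a `BlobSource` and the output dataset is written with a `BlobSink`. The Python sketch below builds two blob datasets and a Copy activity between them. It assumes the older `AzureBlob` dataset type with a `folderPath`; newer format-based datasets (for example `DelimitedText`) use format-specific source and sink types instead. The dataset, folder, and linked-service names are hypothetical.

```python
def blob_dataset(name: str, folder_path: str, linked_service: str) -> dict:
    """Build an Azure Blob dataset in the older `AzureBlob` shape."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": {"referenceName": linked_service,
                                  "type": "LinkedServiceReference"},
            "typeProperties": {"folderPath": folder_path},
        },
    }

# Both datasets use the same (hypothetical) storage account linked service.
raw = blob_dataset("RawBlob", "raw/sales", "BlobStorageLS")              # read from
curated = blob_dataset("CuratedBlob", "curated/sales", "BlobStorageLS")  # written to

copy = {
    "name": "CopyRawToCurated",
    "type": "Copy",
    "inputs": [{"referenceName": raw["name"], "type": "DatasetReference"}],
    "outputs": [{"referenceName": curated["name"], "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},  # reads from the input dataset
        "sink": {"type": "BlobSink"},      # writes to the output dataset
    },
}
```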
