Mastering Data Transformation with Azure Data Factory Derived Activity: A Complete Guide

Azure Data Factory is a powerful cloud-based data integration and transformation service offered by Microsoft. It enables users to create, schedule, and manage data pipelines for moving and transforming data from various sources to various destinations, both on-premises and in the cloud. One of the most useful activities in Azure Data Factory for data transformation is the Derived activity.

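The Derived activity, listed as the Derived Column transformation in the mapping data flow designer, is a transformation that allows users to create new columns, or modify existing ones, in the data flow by using expressions. It suits column-level tasks such as calculating values, mapping or cleaning existing values, concatenating strings, and converting data types. Filtering rows and aggregating data are handled by the separate Filter and Aggregate transformations, which are often placed alongside it in the same data flow. Expressions are built from functions, operators, and references to the incoming columns.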
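To make the idea concrete, here is a minimal sketch in plain Python of what deriving a column means for a single row. It is not Azure Data Factory code: the helper derive_column, the sample row, and the full_name column are hypothetical names chosen for illustration, and the lambda stands in for an expression written in the data flow.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def derive_column(row: Row, name: str, expression: Callable[[Row], Any]) -> Row:
    """Return a copy of `row` with column `name` set to the expression's result."""
    result = dict(row)               # the incoming row itself is left untouched
    result[name] = expression(row)   # the expression reads the incoming columns
    return result


# Hypothetical input row, used only for illustration.
customer = {"first_name": "Ada", "last_name": "Lovelace"}

# The expression combines a function (upper) with an operator (+).
with_full_name = derive_column(
    customer,
    "full_name",
    lambda r: (r["first_name"] + " " + r["last_name"]).upper(),
)
print(with_full_name)
```

The original columns are still present in the result; the derived column is simply added next to them.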

In this article, we will explore the Azure Data Factory Derived activity in detail and provide a complete guide on how to use it to master data transformation.

Benefits of Using Derived Activity in Azure Data Factory

The Azure Data Factory Derived activity offers several benefits for data transformation, including:

1. Flexibility

The Derived activity is flexible: a single activity can define several columns, each with its own expression, and can either add new columns or overwrite existing ones. This covers a wide range of column-level transformation tasks, such as calculating, mapping, cleaning, and converting data.

2. Ease of Use

The Derived activity is easy to use and requires no general-purpose programming. Users define each new column by entering a column name and an expression in the activity's settings, and an expression builder is available to help compose the expressions.

3. Reusability

The logic built with the Derived activity is reusable: a set of expressions can be defined once and reused in multiple data flows, for example by packaging the activity in a flowlet. This saves time and effort in creating and managing data transformation tasks.

4. Compatibility

The Derived activity works on the data stream inside the data flow, so it is compatible with any source or destination the data flow can connect to, including Azure Blob Storage, Azure SQL Database, and Azure Data Lake Storage.

How to Use Derived Activity in Azure Data Factory

The Azure Data Factory Derived activity can be used in three simple steps:

1. Create a Data Flow

The first step is to create a data flow (a mapping data flow) in Azure Data Factory. A data flow is a visual representation of the data transformation tasks that need to be performed, from the source through each transformation to the sink.

2. Add a Derived Activity

The second step is to add a Derived activity to the data flow. To do this, users click the plus (+) sign next to an existing step on the data flow canvas, such as the source, and select Derived Column from the list of transformations.

3. Configure the Derived Activity

The third and final step is to configure the Derived activity. This involves selecting the incoming stream and then, for each column to create, entering a column name and an expression. All incoming columns pass through to the output unchanged, together with the new columns.
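The configuration in step 3 amounts to a list of column names paired with expressions. The sketch below models that in plain Python, not Azure Data Factory code. The function apply_derived_columns, the settings list, and the is_bulk column are hypothetical, and the sketch assumes every expression is evaluated against the incoming row rather than against columns derived earlier in the same list.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
DerivedColumns = List[Tuple[str, Callable[[Row], Any]]]


def apply_derived_columns(row: Row, settings: DerivedColumns) -> Row:
    """Apply each (column name, expression) pair to one incoming row."""
    result = dict(row)  # untouched columns pass through as they are
    for name, expression in settings:
        # Assumption of this sketch: expressions only see the incoming row.
        result[name] = expression(row)
    return result


# Hypothetical settings: one entry modifies a column, the other adds one.
settings: DerivedColumns = [
    ("product_id", lambda r: r["product_id"].strip().upper()),  # existing name: overwritten
    ("is_bulk", lambda r: r["quantity"] >= 10),                 # new name: added
]

incoming = {"transaction_id": 1001, "product_id": " ab-12 ", "quantity": 12, "price": 2.5}
outgoing = apply_derived_columns(incoming, settings)
print(outgoing)
```

Using an existing column name changes that column in place, while a new name appends a column after the incoming ones.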

Example

To illustrate how to use the Derived activity in Azure Data Factory, let us consider an example where we want to create a new column in the data flow that calculates the total sales amount for each transaction. The input data contains columns for transaction ID, product ID, quantity, and price.

The following steps can be followed to achieve this:

  1. Create a data flow in Azure Data Factory.
  2. Add a source that reads the dataset containing the input data.
  3. Add a Derived activity to the data flow after the source.
  4. Select the source as the incoming stream so that the transaction ID, product ID, quantity, and price columns are available to expressions.
  5. Create a new column, named for example total_sales, by using the following expression: quantity * price
  6. Check the output columns: the incoming columns pass through unchanged, and the newly created column is added to them.
  7. Add a sink to the data flow that writes the transformed data to a sink dataset.
  8. Run the data flow from a pipeline to transform the data.
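The steps above can be sketched end to end in plain Python. This is an illustration of the logic only, not code that runs in Azure Data Factory. The sample rows, the function names, the in-memory list used as a sink, and the total_sales column name are all assumptions made for the example. Prices use Decimal so that currency arithmetic stays exact.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def read_source() -> List[Row]:
    """Stand-in for the source step: hypothetical transactions."""
    return [
        {"transaction_id": "T1", "product_id": "P10", "quantity": 3, "price": Decimal("19.99")},
        {"transaction_id": "T2", "product_id": "P20", "quantity": 1, "price": Decimal("5.00")},
        {"transaction_id": "T3", "product_id": "P10", "quantity": 0, "price": Decimal("19.99")},
    ]


def add_total_sales(rows: Iterable[Row]) -> List[Row]:
    """Stand-in for the Derived step: total_sales = quantity * price."""
    return [{**row, "total_sales": row["quantity"] * row["price"]} for row in rows]


def write_sink(rows: Iterable[Row], sink: List[Row]) -> None:
    """Stand-in for the sink step: append the transformed rows to a list."""
    sink.extend(rows)


sink: List[Row] = []
write_sink(add_total_sales(read_source()), sink)

for row in sink:
    print(row["transaction_id"], row["total_sales"])
```

Each output row keeps its four input columns and gains the calculated column, which mirrors what the sink receives in the data flow.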
