Converting Page or Append Blobs to Block Blobs with Azure Data Factory
In this article, SaikumarMandepudi explains how to use Azure Data Factory to convert page or append blobs into block blobs, enabling access tier changes and storage cost optimization.
Introduction
Converting page or append blobs to block blobs can be necessary when optimizing storage costs in Azure. Certain blob types, like page or append blobs, cannot be directly moved to the archive access tier—only block blobs support access tier functionality. This article outlines how to convert page or append blobs into block blobs using Azure Data Factory (ADF), after which any standard method can be used to transition them to the archive tier.
Problem Context
- Some storage accounts have many infrequently accessed page blobs in the hot tier—often kept solely for backup purposes.
- Only block blobs can have their access tier changed in Azure Blob Storage (see documentation).
Azure Data Factory Solution
- The Azure Blob Storage connector in ADF supports copying data from block, append, or page blobs, and copying to block blobs (see ADF connector docs).
- No special configuration is required—the ADF copy activity will create destination blobs as block blobs by default.
Step-by-Step Guide
Step 1: Create Azure Data Factory (ADF) Instance
- In the Azure Portal, create a new Azure Data Factory resource using the quickstart guide.
- After creation, launch the ADF Studio UI.
Step 2: Create Datasets
- Navigate to Author > Datasets > New dataset.
- Select Azure Blob Storage as the data store, then choose the Binary format so blobs are copied as-is, byte for byte, without parsing.
- Create one dataset for the source (the storage account containing page or append blobs) and another for the destination.
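A Binary dataset of this kind can also be viewed in ADF's JSON editor. The sketch below is illustrative only: the dataset, linked service, container, and folder names are placeholders, and the linked service it references is configured in the next step.

```json
{
  "name": "SourcePageBlobs",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "SourceBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "backups",
        "folderPath": "vm-disks"
      }
    }
  }
}
```

The destination dataset has the same shape, pointing at the target container through its own linked service.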
Step 3: Create Linked Services
- Create a new linked service in ADF that points to the storage account containing the source blobs. Each dataset from Step 2 must reference a linked service; ADF prompts you to create one during dataset setup if it does not already exist.
- In the source dataset, set the container and file path of the page (or append) blobs to convert.
- Create a corresponding linked service for the destination storage account (this can be the same account or a different one, as required) and point the destination dataset at it.
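In JSON form, an Azure Blob Storage linked service looks roughly like the sketch below. The name and connection string are placeholders; key-based authentication is shown for brevity, but SAS, service principal, or managed identity authentication can be used instead.

```json
{
  "name": "SourceBlobStorageLS",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```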
Step 4: Configure the Copy Data Pipeline
- Create a new pipeline in ADF.
- From Move and Transform, drag and drop the Copy data activity.
- Assign the previously created source and destination datasets.
- On the Source tab, select the “Recursively” option if you want to include blobs in subfolders.
- Adjust file filters and copy behavior as required to suit your scenario, then publish your changes.
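The published pipeline corresponds to a JSON definition along these lines. This is a minimal sketch assuming the placeholder dataset names used earlier; it shows the Binary source and sink store settings, including the recursive read option.

```json
{
  "name": "ConvertToBlockBlobPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyPageBlobsAsBlockBlobs",
        "type": "Copy",
        "inputs": [
          { "referenceName": "SourcePageBlobs", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "DestinationBlockBlobs", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": {
            "type": "BinarySource",
            "storeSettings": {
              "type": "AzureBlobStorageReadSettings",
              "recursive": true
            }
          },
          "sink": {
            "type": "BinarySink",
            "storeSettings": {
              "type": "AzureBlobStorageWriteSettings"
            }
          }
        }
      }
    ]
  }
}
```

Because the sink writes through the Azure Blob Storage connector, the copied blobs are created as block blobs regardless of the source blob type.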
Step 5: Debug and Validate
- Run the pipeline in debug mode.
- If the run succeeds, the Output pane shows a “Succeeded” status.
- In the destination storage account, verify that the copied blobs now show “Block blob” as the blob type; their access tier will be the account’s default access tier (typically Hot).
Next Steps: Changing Access Tier
Once blobs are converted to block blobs, you can change their access tier to Archive using methods such as:
- Azure Blob Lifecycle Management (LCM) policies
- Azure Storage Actions (storage tasks)
- Azure CLI or PowerShell scripts
See lifecycle management docs or bulk archive docs for references.
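For the lifecycle management option, a policy rule along the following lines would archive the converted block blobs. This is a sketch: the rule name and prefix are placeholders, and daysAfterModificationGreaterThan should be set to match your retention needs. Note that lifecycle policies only act on block blobs, which is why the conversion step is required first.

```json
{
  "rules": [
    {
      "name": "archiveConvertedBackups",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "backups/" ]
        },
        "actions": {
          "baseBlob": {
            "tierToArchive": { "daysAfterModificationGreaterThan": 0 }
          }
        }
      }
    }
  ]
}
```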
Conclusion
Using Azure Data Factory provides a streamlined approach to convert page or append blobs into block blobs, after which standard tools and policies can be used to transition the access tier and optimize storage costs. This approach is more efficient than developing custom scripts or utilities.
References
- Access tiers for blob data
- ADF Azure Blob Storage connector
- ADF Copy Activity Overview
- Azure Storage task quickstart
- Lifecycle management overview
- Bulk change to archive tier
This post appeared first on “Microsoft Tech Community”.