Migration from Azure Synapse to Fabric

Microsoft keeps unveiling new technologies, and one of its latest is Microsoft Fabric – a highly integrated, end-to-end, and easy-to-use platform that offers a comprehensive suite of services, including Data Factory, Data Engineering, Data Warehousing, Data Science, Real-Time Analytics, and Power BI.

Azure Synapse was initially the go-to solution for many companies for handling large volumes of data for data warehousing and big data analytics. But now, Microsoft has released Fabric to handle data tasks more conveniently.

For companies that have not yet adopted Azure Synapse, there is little reason to pick it over Fabric. Fabric is the better option: its Spark engine starts faster, and Power BI integrates more smoothly.

However, if your company has already invested in Azure Synapse and wants to migrate to Fabric, this article provides insights into the ease of migration and explores essential considerations and best practices.

Why You Should Migrate to Microsoft Fabric

Migrating from Azure Synapse to Microsoft Fabric can significantly benefit your data analysis workflow. Here are some of the many reasons why you should migrate to Fabric.

Reusable Assets, Streamlined Workflows

Fabric operates on the “build once, reuse everywhere” philosophy. Your notebooks, pipelines, and other assets will automatically become accessible across all Fabric experiences, eliminating the need for duplication and manual transfer. This saves time, reduces redundancy, and allows your team to focus on extracting insights from data rather than managing fragmented resources.

Centralized Control

Managing data assets across multiple platforms can be a logistical nightmare for many organizations. With Fabric, you get centralized administration and governance for all your data tasks: you can set access controls, define metadata standards, and monitor data usage across the entire platform to ensure consistency, security, and compliance within your organization.

Unified Data Lake (OneLake)

A data lake is, in principle, a single place where all of an organization's data is stored. In reality, many organizations run multiple data lakes, which leads to problems like duplicated data and extra maintenance overhead.

But with Fabric’s OneLake, organizations can have a single, unified data lake, even if they have multiple branches and business domains. The idea is to reduce data duplication and make data management easier.

Simplified Analytics with AI

Microsoft Fabric integrates the Azure OpenAI Service, including the GPT-powered Copilot, which lets you interact with your data in conversational language: you can analyze data, build models, develop dataflows and pipelines, generate code and functions, and visualize results. You can even build your own conversational experiences and publish them as plug-ins.

Migrating from Azure Synapse to Microsoft Fabric

Migrating from Azure Synapse to Fabric requires careful planning and consideration. It is advisable to start small and prepare for multiple migration passes. Conducting proper training helps ensure your teams have the relevant experience and knowledge of the processes required for migration.

Additionally, create an inventory of objects that need to be migrated and document the migration process so that it can be repeated for other SQL pools or workloads.

Furthermore, review Fabric’s performance guidelines and adjust your design before migrating. While some changes can be made after migration, making them before migration will ease the process and also save time.

Here are the different ways to migrate from Azure Synapse to Microsoft Fabric:

Migration with Fabric Data Factory

This migration method suits low-code/no-code users who are familiar with Azure Data Factory and Synapse pipelines. It offers a drag-and-drop UI with simple steps for migrating your Synapse data to Fabric.

You can go about this in different ways:

Using the Data Factory Copy Wizard and ForEach Copy Activity

You can use the Data Factory Copy Wizard to connect to the source dedicated SQL pool, convert its DDL (data definition language) syntax to Fabric, and copy tables to Fabric Warehouse through a simple interface.

There are two phases here – the Staging Phase, where you extract the data from the dedicated SQL pool into Azure Data Lake Storage (ADLS), and the Ingestion Phase, where you ingest the data from ADLS into the Fabric Warehouse.

This method is recommended only for dimension tables; larger fact tables require parallelized read and write operations during the transfer from source to staging (dedicated SQL pool to ADLS), which the partitioned Copy activity described next provides. Therefore, before using the Copy Wizard, carefully assess the size and throughput needs of your data.

Using Copy Activity For Each Fact Table with Partition Option

This method improves the throughput to load larger fact tables using the Fabric data pipeline. It allows you to divide tables into smaller, bite-sized chunks and send multiple data transfer requests simultaneously.

If your source table does not have partitions already, you can create and customize dynamic partitions within Microsoft Fabric. There are available options where you can select the partition column and the value range you need for the data.

Note that a dedicated SQL pool can handle a maximum of 128 active queries at once, and you need to scale to at least DW6000c for all of those queries to execute concurrently.
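If you need to scale up before running a heavily parallelized copy, the service objective can be changed with T-SQL. A minimal sketch, assuming a pool named MyDedicatedPool (the statement is run against the logical server's master database):

```sql
-- Scale the dedicated SQL pool so that up to 128 queries can run
-- concurrently during the partitioned copy. Pool name is illustrative.
ALTER DATABASE MyDedicatedPool
MODIFY (SERVICE_OBJECTIVE = 'DW6000c');
```

Remember to scale back down after the migration window to avoid unnecessary cost.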

Migration with CETAS (Create External Table As Select)

CETAS is a SQL operation that allows you to create an external table based on the result of a SELECT query. It is particularly useful when you want to move or duplicate data without performing a traditional insert operation.

In this scenario, you can use CETAS commands to convert tables into parquet files and write them directly to the ADLS Gen2 container. Parquet files provide efficient data storage and take less bandwidth to move across the network. And since Microsoft Fabric stores data in Delta parquet format, data ingestion will be much faster.

Here’s how you can migrate from Azure Synapse to Fabric using CETAS:

Step 1. Prepare the Data Lake

Begin by creating an Azure Data Lake Storage (ADLS Gen2) container. This container will serve as a temporary staging area to house the extracted data during migration. ADLS Gen2 is scalable and integrates smoothly with both Synapse and Fabric.

Step 2: Extract the Data

Use CETAS commands in Synapse SQL to convert tables into Parquet datasets and write them directly to the ADLS Gen2 container.
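A minimal sketch of the extract step, assuming a hypothetical storage account, credential, and table (a database master key must already exist before creating the scoped credential):

```sql
-- Credential and data source pointing at the ADLS Gen2 staging container.
-- All names (stage_cred, migration_stage, mystorageacct, DimCustomer)
-- are illustrative.
CREATE DATABASE SCOPED CREDENTIAL stage_cred
WITH IDENTITY = 'Managed Service Identity';

CREATE EXTERNAL DATA SOURCE migration_stage
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://staging@mystorageacct.dfs.core.windows.net',
    CREDENTIAL = stage_cred
);

CREATE EXTERNAL FILE FORMAT parquet_format
WITH (FORMAT_TYPE = PARQUET);

-- CETAS: materialize the SELECT result as Parquet files in the container.
CREATE EXTERNAL TABLE dbo.DimCustomer_ext
WITH (
    LOCATION = '/dim_customer/',
    DATA_SOURCE = migration_stage,
    FILE_FORMAT = parquet_format
)
AS
SELECT * FROM dbo.DimCustomer;
```

Repeat the CETAS statement (or script it) for each table in your migration inventory.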

Step 3: Align Schemas

Convert the database schema (DDL) from Synapse to Fabric-compatible T-SQL scripts. You can use data build tool (dbt) or SQL Database Projects to streamline this process and ensure a smooth transition between the two environments.
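The main schema change is that Fabric Warehouse manages storage itself, so Synapse-specific table options are dropped. A hedged before/after sketch with an illustrative table:

```sql
-- Synapse dedicated SQL pool DDL, with distribution and index options:
CREATE TABLE dbo.FactSales
(
    sale_id     BIGINT          NOT NULL,
    customer_id INT             NOT NULL,
    amount      DECIMAL(18, 2)  NULL
)
WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX);

-- Fabric Warehouse equivalent: the engine manages distribution and
-- indexing, so the WITH clause is removed:
CREATE TABLE dbo.FactSales
(
    sale_id     BIGINT          NOT NULL,
    customer_id INT             NOT NULL,
    amount      DECIMAL(18, 2)  NULL
);
```

Also review data types while aligning schemas, since Fabric Warehouse supports a narrower set of T-SQL types than dedicated SQL pools.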

Step 4: Ingest Data into Fabric

Use either COPY commands or Azure Data Factory pipelines to load the Parquet datasets from ADLS Gen2 into Fabric tables.
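The COPY route can be sketched as follows, assuming the same hypothetical staging container and a SAS token for authentication:

```sql
-- Load the staged Parquet files into a Fabric Warehouse table.
-- Storage account, container path, and table name are illustrative;
-- replace <sas-token> with a valid SAS for the staging container.
COPY INTO dbo.DimCustomer
FROM 'https://mystorageacct.dfs.core.windows.net/staging/dim_customer/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```

The target table must already exist with a matching schema (see the schema-alignment step above).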

Now, the data is accessible for analysis and querying within Fabric.

Migration via dbt (data build tool)

dbt is a tool that helps manage and transform data in a data warehouse. It’s commonly used for cleaning up data, creating new tables, and making data easier to analyze. If you are already using dbt in your current Synapse dedicated SQL pool environment, you can reuse the same project and migrate to Fabric with little more than a configuration change.

dbt generates SQL scripts (DDL/DML) on the go every time you run it. So, if you have your data transformations described in dbt using SELECT statements, dbt can quickly translate them to work on different platforms by just changing a few settings.
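As an illustration, a dbt model written as a plain SELECT typically needs no changes when the target switches from the dbt-synapse adapter to the dbt-fabric adapter; the model, source, and column names below are hypothetical:

```sql
-- models/marts/dim_customer.sql — an illustrative dbt model.
-- The same SELECT compiles to Synapse or Fabric T-SQL depending on
-- which adapter the active target in profiles.yml uses.
{{ config(materialized='table') }}

SELECT
    customer_id,
    UPPER(country_code) AS country_code
FROM {{ source('sales', 'raw_customers') }}
```

Switching platforms is then a matter of pointing the profile's target at Fabric and re-running `dbt build`.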

Important considerations when migrating from Azure Synapse to Fabric

Microsoft Fabric is a new technology and should be adopted gradually, especially if you have invested heavily in Synapse. You can start by combining both platforms and strategically aligning them with your business strategy. Microsoft Fabric and Azure Synapse are both built on Azure Data Lake Storage Gen2 (ADLS Gen2) and can easily connect to each other’s data lakes.

Microsoft Fabric OneLake offers a shortcut feature that facilitates the connection to existing data lakes. You can use a OneLake shortcut to connect your current lake folders in Azure Synapse to OneLake, which surfaces the designated data lake folder as accessible files or tables within your Fabric workspace. Since Fabric seamlessly integrates with Power BI, you can query your data on OneLake directly from Power BI, eliminating the need for tedious conversions into CSV/Parquet files for import into a Power BI dataset.

You can also phase the migration by publishing your Gold datasets to OneLake first while Synapse continues managing the Bronze and Silver layers. Adopting a hybrid environment like this is a good approach toward the full adoption of Microsoft Fabric.

When you’ve derived significant value from Fabric and have become very familiar with its operations, you can expand and connect to more external data sources using Dataflows Gen2 and process data efficiently within the Fabric Workspace. You can even incorporate streaming data directly into Fabric or completely migrate your Synapse data warehouse into the Fabric Workspace.

Microsoft Fabric is a fantastic, all-in-one tool that will disrupt the industry, and there are many exciting things to do with it. You can explore Power BI Datamarts and build machine learning models comfortably within your Fabric Workspace. Once you’ve established an operational model for your business on Microsoft Fabric, you can use it as a permanent hub for all your data operations.

Limitations of Microsoft Fabric

Here are some of the limitations of Microsoft Fabric you need to be aware of:

  • Limited Customization Support

Fabric comes with fixed behaviors; one example is its support for authentication. It only supports Azure AD (Microsoft’s authentication service). If you use Azure AD for your apps and all your tools support OAuth, this limitation might not affect you.

However, Fabric might be limiting if you want to connect your on-premises apps to your data warehouse or if your strategy involves multiple cloud providers like AWS and GCP. You might need to create custom solutions on Azure to communicate with OneLake because direct customization of Fabric is not allowed.

  • Regional availability

Not all Azure regions currently support Fabric data warehousing and SQL analytics endpoint. West India, UAE Central, Poland, Israel, and Italy are currently not supported for the Warehouse and SQL analytics endpoint. Therefore, if you operate in any of these regions, you should monitor Microsoft’s Fabric documentation for updates on regional availability, as they may expand support to more regions in the future.

Conclusion

Azure Synapse has held its ground for many years as one of the best platforms for handling big data for data warehousing and analytics. But now, Microsoft Fabric has shifted data analytics workflows into a more unified and convenient environment.

Microsoft Fabric emphasizes simplicity and efficiency. The “build once, reuse everywhere” philosophy liberates you from repetitive tasks and lets you focus on extracting insights from your data rather than managing fragmented resources.

Centralized control gives you the reins for consistency, data security, and compliance within your organization. Fabric OneLake, on the other hand, breaks down the silos of having multiple data warehouses and lake houses and provides a single, unified data lake to store all your data.

In light of these advancements, it is worthwhile for companies seeking a more efficient and future-ready solution for their data analytics needs to consider migrating from Azure Synapse to Fabric.


© 2023 DataGlyphix All Rights Reserved.