Microsoft Fabric

Microsoft Fabric is the latest release of Microsoft’s data and analytics platform, announced in May 2023. The release is significant for organizations and data professionals who use Microsoft’s data stack to build and manage analytics solutions. Microsoft made several announcements that build on existing components and push further toward full integration of the products and services used by different data roles and teams within an organization. This integration aims to make management, collaboration, and data sharing between teams easier.

Notably, Fabric introduces changes that make it a strong choice for organizations seeking a seamless experience in developing and managing their data projects. It also lets them use their data assets to prepare for the next wave of data processing and analytics, particularly with the aid of AI.

This makes the platform an attractive option for organizations looking to stay at the forefront of data-driven innovation and analytics.

In this post, I will be covering the following points:

  1. The exciting changes introduced in this release.
  2. The benefits resulting from these changes.
  3. The migration path.

OneLake – the OneDrive for your data

With Microsoft Fabric now a full-fledged SaaS solution, there is no longer a need to provision multiple storage accounts to meet the data lake requirements of various teams within the organization. Fabric handles all of this in the background. With OneLake, you can skip the lengthy storage design, provisioning, and deployment process, as well as the management of individual storage accounts.

One storage for all teams, a simpler way to access files

Even better, once you set up Fabric, the storage is easily shareable across teams or business units within your organization. The storage is also globally distributed. For larger organizations with multiple geographical locations, OneLake automatically provisions storage in each location on your behalf. Despite being distributed, the storage appears as a single logical unit, so you only ever see one storage account.

Navigating through OneLake is remarkably simple. Microsoft markets it as being like OneDrive for your data, and it organizes content into folders and files, just like the familiar Windows File Explorer. You can even add OneLake to your computer’s directory, as you would with OneDrive, and conveniently access and interact with all the contents that are available to you. Pretty awesome, isn’t it?
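To make the folder-like structure concrete, here is a minimal sketch of how a path to a file in OneLake could be put together. It assumes that OneLake is reachable through an ADLS Gen2-compatible endpoint and that paths follow a workspace / item.itemtype / folder layout; the helper function and the workspace, lakehouse, and file names are all hypothetical, so check the official documentation before relying on the exact format.

```python
# Assumption: OneLake's ADLS Gen2-compatible endpoint and path layout.
# The workspace, item, and file names used below are made up for illustration.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder stored in OneLake."""
    # Normalize the relative path: drop leading, trailing, and doubled slashes.
    relative = "/".join(part for part in path.split("/") if part)
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{relative}"


uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv")
print(uri)
# → abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/SalesLakehouse.Lakehouse/Files/raw/orders.csv
```

The point is that every team addresses the same logical storage, and only the workspace and item parts of the path change.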

Shortcuts

OneLake has also introduced a feature called ‘Shortcuts’, which works much like shortcuts in Windows File Explorer: you reference existing files or folders and access them as if they existed in the location where you need them, without recreating them. This significantly reduces data duplication, minimizes the need for data processing and staging, and simplifies data sharing across teams and organizations. Data remains where it is, under the control of its owner, so there is no need to move it outside of the owner’s domain. It truly provides a data mesh experience!
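As a rough mental model of that idea, the sketch below treats a shortcut as a simple pointer from one path to another. This is a toy illustration only, not the Fabric API, and all paths in it are hypothetical.

```python
# Toy model only: this is NOT how Fabric implements shortcuts, and every
# path here is hypothetical. A shortcut maps a link path to the real location.
SHORTCUTS = {
    "Marketing/Files/sales": "Sales/Files/curated/sales",
}


def resolve(path: str) -> str:
    """Follow a shortcut (if any) to the location where the data really lives."""
    for link, target in SHORTCUTS.items():
        if path == link or path.startswith(link + "/"):
            # Swap the shortcut prefix for its target; no data is copied.
            return target + path[len(link):]
    return path  # not under a shortcut, so the path is already the real one


print(resolve("Marketing/Files/sales/2023/orders.parquet"))
# → Sales/Files/curated/sales/2023/orders.parquet
```

The Marketing team reads the files through its own folder, while the single physical copy stays in the Sales team’s area.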

Delta/Parquet format!

I believe the most significant change of all is that Fabric is shifting away from proprietary formats and has adopted the open Delta format, which stores data in Parquet files and gained popularity through Databricks. This transition to Delta/Parquet enables seamless access to data regardless of the service you are using. Whether you are running SQL queries or using Spark notebooks for machine learning, all compute engines within Fabric can read this open format. This removes the need for costly preprocessing and transformations between services, making data analysis and extracting insights quicker and more efficient.

DirectLake storage mode for Power BI
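To illustrate why an open format lets different engines agree on the same data, here is a heavily simplified sketch of how a Delta table can be read: the data lives in Parquet files, and a transaction log of JSON actions records which files were added or removed. The function and file names are hypothetical, and real Delta logs carry more than this (metadata, statistics, checkpoints), so treat it as an illustration rather than a reader implementation.

```python
import json


def active_files(log_entries: list[str]) -> set[str]:
    """Replay Delta-style commits and return the Parquet files in the current table."""
    files: set[str] = set()
    for commit in log_entries:            # commits are applied in order
        for line in commit.splitlines():  # each line holds one JSON action
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Two hypothetical commits: an initial load, then a rewrite of one file.
commits = [
    '{"add": {"path": "part-0001.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
    '{"remove": {"path": "part-0001.parquet"}}\n{"add": {"path": "part-0003.parquet"}}',
]
print(sorted(active_files(commits)))
# → ['part-0002.parquet', 'part-0003.parquet']
```

Any engine that can replay the log and read Parquet sees the same table, which is what removes the need to convert data between services.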

Another significant advantage of moving to the Delta/Parquet format is the introduction of a new storage mode called DirectLake, which is a fantastic feature in Fabric. This storage mode combines the strengths of the two existing modes in Power BI, namely DirectQuery and Import.

DirectQuery allows for near real-time analysis of data from relational storage without the need to import the data into a Power BI dataset. Queries are issued directly to the database, and the database returns the results to Power BI. However, this approach has limitations on the types of queries that can be issued to the database, and it can become slow depending on the aggregations required by the visualizations within your report.

Import mode, on the other hand, is blazing fast, as the data is stored in tabular form, and the Analysis Services query engine for tabular databases is highly performant. However, as the name suggests, the data must be imported before you can use it. This can be challenging for big data scenarios, where pulling in all the data, both current and historical, can be time-consuming.

With DirectLake, these challenges are gone, and the advantages of the two approaches are combined into one. As the name suggests, DirectLake issues the query directly against the data that sits in OneLake. The table is stored as Delta/Parquet, a columnar format that is efficiently compressed and compatible with the Analysis Services engine, which can process the data the same way it would a tabular database. With this mode, there is no need for scheduled imports. The data can stay right where it is, and Power BI reads it directly. No need to import, plus it’s blazing fast. Magnificent!

One compute for all the services!

Another great update with Fabric is the shared capacity model, where the capacity you have subscribed to is shared across all services within Fabric. This means that an F64 capacity, for example, can be used by any service within Fabric, such as Data Warehouse, Data Factory, Power BI, etc., and there is no need to acquire separate licenses for each. This makes it much simpler to manage and monitor resources, performance, and costs.

What is the migration path?
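The shared pool can be pictured with a small sketch like the one below. It assumes that an F64 capacity corresponds to 64 capacity units and uses made-up consumption figures; the class is hypothetical and ignores how Fabric actually meters and smooths usage, so it only shows the bookkeeping idea of one pool for every workload.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCapacity:
    """Toy model of one capacity pool drawn on by every workload."""
    total_cu: float
    usage: dict[str, float] = field(default_factory=dict)

    def record(self, workload: str, cu: float) -> None:
        # All workloads draw from the same pool, whichever service they run in.
        self.usage[workload] = self.usage.get(workload, 0.0) + cu

    def remaining(self) -> float:
        return self.total_cu - sum(self.usage.values())


f64 = SharedCapacity(total_cu=64)  # assumption: F64 provides 64 capacity units
f64.record("Data Warehouse", 20)   # hypothetical consumption figures
f64.record("Data Factory", 10)
f64.record("Power BI", 12)
print(f64.remaining())
# → 22.0
```

With a single pool there is one number to monitor, instead of a separate allowance per service.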

Given that the product is still in public preview, Microsoft has not announced any official advice or recommendations related to the upgrade or migration path yet. This information will likely be available once the product nears general availability, which is expected to happen later this year. However, since Fabric includes most of the services already existing in the current offerings, the upgrade will likely be straightforward, especially for the most commonly used services such as Data Factory/Synapse Pipelines, Dedicated SQL pool, Spark Notebooks, etc.

The one major piece of work I foresee is the conversion of existing data to the new Delta/Parquet format. Microsoft will probably have a migration tool to assist with this come general availability, but if not, it’s something to factor into your development work if you plan to transition to Fabric right away. I will cover further details on the things to consider when migrating to Fabric in a separate post.

Conclusion

With these exciting capabilities, there is no doubt that moving to Fabric is the way to go. However, as with any newly released product, there will always be caveats, and these must be weighed against the benefits of early adoption. My advice is to wait another six months to a year for the product to be further polished, especially if you are running a production system.

© 2023 DataGlyphix All Rights Reserved.