Self service analytics and insights (Preview)

Note

The Microsoft Purview Data Catalog is changing its name to Microsoft Purview Unified Catalog. All the features will stay the same. You'll see the name change when the new Microsoft Purview Data Governance experience is generally available in your region. Check the name in your region.

Self-service analytics and insights refer to data, tools, and platforms that enable business users to access, analyze, and generate insights from data independently. The Purview data governance application publishes domain and dimension models of metadata and metadata models into OneLake, allowing customers to analyze and generate insights by bringing their own tools and compute. Self-service analytics of data governance metadata is valuable in driving continuous improvement of customers' data estate health management and fostering a data-driven culture across the organization by democratizing access to data estate insights and health management.

Key Components

  • Data Models: 3NF Models with domains and dimensions details
  • Metadata: Data governance metadata that includes:
    • Governance domains
    • data products
    • data assets
    • glossary terms
    • subscription request
    • data quality rules
    • dimensions
    • data quality scores and more are coming soon.

Benefits

  • Empowerment: Empowers data practitioners, data product owners, data stewards, and analysts to explore data governance metadata and link metadata from various sources to derive insights.
  • Flexibility and Efficiency: Customer will able to create custom reports in addition to out-of-the-box reports in health management.
  • Agility: Allows customers organizations to respond more swiftly to health management issues and remediation.
  • Cost-Effective: Reduces the need for setting up platforms and build tools. All data is available in OneLake and customer will able to use available tools (Fabric semantic model, PBI reporting, Data flow, and notebook) in OneLake.

Currently available reports (out-of-the-box)

Here are the available out-of-the-box reports. These reports aren't customizable.

  • Classic assets - an overview of assets by type and collection, and their curation status.
  • Classic catalog adoption - to understand at a glance how your data catalog is being used. your glossary, providing a snapshot of terms and their status.
  • Classic classifications - an overview of assets classified and the types of classifications.
  • Classic data stewardship - an overview of assets classified and the types of classifications.
  • Classic glossary - health and use of glossary terms.
  • Classic sensitivity labels -an overview of assets that have sensitivity labels applied and the types of labels applied.
  • Data governance - The data governance health report allows your team to track your health progress at a glance, and identify areas that need more work.
  • Data quality health - Data quality dimensions and data quality rules performance reporting.

Screenshot that shows purview metadata insight.

Subscribe Microsoft Purview Catalog metadata

You can subscribe to Microsoft Purview's data governance metadata for analytics and to derive insights. To subscribe, follow the below listed steps:

  1. Select Settings from the left pane, select Data catalog, then select solution integrations.

    Screenshot that shows how to navigate subscription page.

  2. Select Edit.

  3. Add Storage type.

  4. Add Location URL

  5. Add folder name at the end of the URL - example: /DEH (see the screenshot)

  6. Grant contributor accesses to Microsoft Purview Manage Service Identity (MSI) to your fabric workspace.

  7. Test connection.

    Screenshot that shows how to configure subscription connection.

Data model of self-serve analytics metadata

The 3NF domain model is part of the normalization process in relational database design, which ensures that the database is free of redundancy and update anomalies. A database schema is in the Third Normal Form if it meets the requirements of the First and Second Normal Forms and all its attributes are functionally dependent only on the primary key. The purpose of 3NF domain model is used to structure data in a way that minimizes duplication and ensures data integrity. It focuses on breaking down data into smaller, related tables where each piece of information is stored only once.

Characteristics:

  • Elimination of Transitive Dependencies: Nonkey attributes shouldn't depend on other nonkey attributes.
  • Logical Grouping: Data is logically grouped into domains based on its function or meaning.
  • Entity-Relationship Diagrams (ERDs): Commonly used to represent 3NF domain models, showing how entities relate to each other.
Table name Description Relationship keys
Access Policy Provisioning State Information about Provisioning state is stored in this table. ProvisioningStateId
Access Policy Resource Type Access porlich resource information is stored in this table. ResourceTypeId
Access Policy Set Overall information about access policy, policy use case details, and where the policy applied one, etc. are stored in this table. AccessPolicySetId, ResourceTypeId, ProvisioningStateId
Classification Data asset classification information is stored in this table. ClassificationId
Custom Access Use Case Access use case information is stored in this table. AccessPolicySetId
Data Asset Data asset name, description, and source information is stored in this table. DataAssetId, AccountId, DataAssetTypeId
Data Asset Column Data asset column name, column description, and references are stored in this table. DataAssetId, ColumnId, DataAssetTypeId, DataTypeId, AccountId
Data Asset Column Classification Assignment Data classification assignment related reference keys are stored in this table. DataAssetId, ColumnId, ClassificationId
Data Asset Domain Assignment Data asset governance domain assignment related information are available in this table. DataAssetId, BusinessDomainId
Data Asset Owner Data asset owner information stored in this table. DataAssetOwnerId
Data Asset Owner Assignment Data asset owner assignment information is stored in this table. DataAssetId, DataAssetOwnerId
Data Asset Type Data Type Data asset type information is stored in this table. DataTypeId, DataAssetTypeId
Data Product Data product name, description, use cases, status, and other relevant information stored in this table. DataProductId, DataProductTypeId, AccountId
Data Product Asset Assignment Data product and data asset assignment information are stored in this table. DataProductId, DataAssetId
Data Product Governance Domain Assignment Data product and governance domain assignment information are stored in this table. DataProductId, BusinessDomainId
Data Product Documentation Data product documentation reference information is stored in this table. DataProductId, DocumentationId
Data Product Owner Data product owner information is stored in this table. DataProductId, DataProductOwnerId
Data Product Status Data product status (like published or draft) related information stored in this table. DataProductStatusId
Data Product Terms Of Use Data product usage terms information is stored in this table. DataProductId, TermOfUsedId, DataAssetId
Data Product Type The information about data product types - Master, Reference, Operational, etc. are stored in this table. DataProductTypeId
Data Product Update Frequency The information about how often this data product's data are updated is stored in this table. UpdateFrequencyId
Data Quality Asset Rule Execution Data quality scanning results stored in this table RuleId, DataAssetId, JobExecutionId
Data Quality Job Execution Data quality job execution status is stored in this table. JobExecutionId
Data Quality Rule Information about data quality rules is stored in this table. RuleId, RuleTypeId, BusinessDomainId, DataProductId, DataAssetId
Data Quality Rule Column Execution Information about data quality rules pass and fail count, data quality score in columns level, and including data quality job execution details are stored in this table. RuleId, DataAssetId, ColumnId
Data Quality Rule Type Data quality rule type and associated dimensions are stored in this table. RuleTypeId
Data Subscription Request Information about data subscribers, policies applied, subscription request status, and other relevant information stored in this table. SubscriberRequestId
Glossary Term Information about glossary term, description, and overall status of the glossary term are stored in this table. GlossaryTermId, ParentGlossaryTermId, AccountId
Glossary Term Governance Domain Assignment Information about glossary term governance domain assignment and statuses are stored in this table. GlossaryTermId, BusinessDomainId
Glossary Term Data Product Assignment Information about glossary term data product assignment is stored in this table. GlossaryTermId, DataProductId
Policy Set Approver Policy set and the approver information are stored in this table. SubscriberRequestId, AccessPolicySetId
Relationship Information about source type and target information are stored in this table. AccountId, SourceId, TargetId

Create a semantic model

A semantic model in the context of data and analytics refers to a structured representation of data that defines the meaning, relationships, and rules within a specific domain. It provides a layer of abstraction that helps users understand and interact with complex data by making it more intuitive and accessible, especially in the context of business intelligence (BI) and analytics platforms. A semantic model is always required before any reports can be built. Within the warehouse, a user can add warehouse objects - tables or views to their default Power BI semantic model. They can also add other semantic modeling properties, such as hierarchies and descriptions. These properties are then used to create the Power BI semantic model's tables. Users can also remove objects from the default Power BI semantic model.

To reate semantic model from the Microsoft Purview Data Governance metadata domain and dimension models:

  1. Open a warehouse in your Fabric workspace.

  2. Publish the domains and dimensions models files to delta tables

    1. Select the ellipsis button (...)
    2. Select **Load to Tables > New table
    3. New table name prepopulated
    4. Select File type parquet

    Screenshot that shows how to publish to delta table from parquet manually.

  3. You can also use shortcut to shortcut the domain and dimensions model from OneLake and within OneLake

    1. Select the ellipsis button (...) of Tables
    2. Select New Shortcut and select Microsoft OneLake from New Shortcut sources page.
    3. Select domains and dimensions models table to shortcut

    Screenshot that shows how to publish to delta table via shortcut.

After you've published all files to delta tables either manually or via shortcut, you're able to add the delta tables to semantic model.

  1. Switch to SQL analytics endpoint page from the Lakehouse page.
  2. Select Reporting from top left corner of the SQL analytics endpoint page.
  3. Select Manage default semantic model.
  4. Select the tables from dbo > Tables that you want to add to the semantic model for reporting.

Screenshot that shows how to add table to semantic model manually.

To add objects such as tables or views to the default Power BI semantic model, select Automatically update semantic model.

Screenshot that shows the self-serve analytics semantic model.

Create a Power BI report

Power BI is natively integrated in the whole Fabric experience. This native integration includes a unique mode, called DirectLake, for accessing the data from the lakehouse to provide the most performant query and reporting experience. DirectLake is a groundbreaking new capability that allows you to analyze very large semantic models in Power BI. With DirectLake, you load parquet-formatted files directly from a data lake without needing to query a data warehouse or lakehouse endpoint, and without needing to import or duplicate data into a Power BI semantic model. DirectLake is a fast path to load the data from the data lake straight into the Power BI engine, ready for analysis.

In traditional DirectQuery mode, the Power BI engine directly queries the data from the source for each query execution, and the query performance depends on the data retrieval speed. DirectQuery eliminates the need to copy data, ensuring that any changes in the source are immediately reflected in query results.

For more details follow the guideline: how to create a Power BI report in Microsoft Fabric.

Important

  • Default refreshing cycle is every 24 hours.
  • Purview MSI need contributor access to your Fabric workspace.
  • Fabric OneLake support is in Public Preview. You can subscribe purview metadata from any supported Azure region to publish into your Fabric workspace for self-service analytics.
  • ADLSg2 support is in private preview. To subscribe to Purview metadata for ADLSg2 storage, please contact your Microsoft account team or Customer Support team to allowlist your tenant. You will need to provide the following information for allowlisting: Tenant ID, Organization name, Purview Account name, Purview Account ID, Azure Region, and Azure Subscription ID.

Coming soon

  • ADLSg2 support is currently in private preview, public preview is planned for November 30, 2024.
  • Scheduling the data refreshing job is not yet supported. Tentative ETA for this feature is 30th November.