Purview: Service-Side Auto-Labeling for data using custom Sensitive Info Type

Sergio Londono 851 Reputation points
2025-04-18T17:59:23.01+00:00

Hello Team,

I am in the process of enabling Purview service-side auto-labeling.

I am located in a region with its own PII identifier, and there is no out-of-the-box (OOB) sensitive info type for our local ID.

We certainly have sensitive data in SharePoint Online (SPO) and OneDrive for Business (ODfB) that contains this local ID and needs to be protected by Purview service-side auto-labeling.

I have put together the scenario below. Could you help me confirm whether old data will be protected by service-side auto-labeling?

Scenario:

  1. On Jan 01 2020, 10 million files containing "LOCALID", which is PII in my region, were created in SharePoint and OneDrive. From Jan 01 2020 to Dec 31 2024, no sensitive info type was able to detect this LOCALID.

  2. On Jan 01 2025, a data privacy officer creates a new custom sensitive info type to detect "LOCALID".

  3. On Mar 01 2025, someone creates file2, which contains "LOCALID" PII.

  4. On Apr 01 2025, the data privacy officer creates a service-side auto-labeling policy that applies the sensitivity label "Internal" to all data in SharePoint and OneDrive.

Questions:
Will the data classification service detect the data created between Jan 01 2020 and Dec 31 2024, and will this data be collected in Content Explorer?

Will service-side auto-labeling apply the label "Internal" to the files created between Jan 01 2020 and Dec 31 2024, given that these files have not been modified since 2024, meaning they are at rest without any modification?

If yes, how long will it take service-side auto-labeling to protect the legacy data?

If not, how can I force the data discovery service to rescan the data from Jan 01 2020 to Dec 31 2024 so that it appears in Content Explorer and is labeled by service-side auto-labeling?

I believe this is a very common scenario: I need to protect legacy data in SPO, and it is not feasible to do it manually, file by file.

Best regards,


Accepted answer
  1. Chandra Boorla 11,750 Reputation points Microsoft External Staff
    2025-04-21T16:37:52.03+00:00

    @Sergio Londono

    Thank you for calling that out; you are absolutely right.

    Toggling -NoCrawl:$false via PowerShell does not trigger a reindex if the site is already set to allow crawling. It's a common misconception, and I appreciate you highlighting that nuance.

    To ensure legacy content is picked up by the crawler and subsequently detected by the Data Classification Service, the recommended approach is to explicitly trigger a reindex using one of the following:

    Best Practice - Use SharePoint UI

    Go to Site Settings --> Search and Offline Availability --> Click Reindex Site

    This marks the whole site for a full reindex during the next scheduled crawl, independent of existing crawl settings.

    For Automation - Use PnP PowerShell

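    # Recent PnP.PowerShell versions require your own Entra ID app registration for -Interactive sign-in
    Connect-PnPOnline -Url <SiteURL> -Interactive -ClientId <ClientId>
    # Marks the connected site's root web for full reindexing during the next crawl
    Request-PnPReIndexWeb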
    

    This is a more reliable method than toggling crawl settings and works well for scripted or automated workflows.
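    When legacy content is spread across many sites, the same two cmdlets can be run in a loop over a list of site URLs. The sketch below is only an illustration under stated assumptions: Invoke-BulkReindex and its -DryRun switch are hypothetical names (not part of PnP.PowerShell), the example URLs and sites.txt file are placeholders, and it assumes PnP.PowerShell is installed and that the client ID belongs to your own Entra ID app registration. It requests a reindex only for the root web of each URL it is given.

```powershell
# Hypothetical helper (not part of PnP.PowerShell): requests a full reindex
# for the root web of each SharePoint Online site URL passed in.
# Assumptions: PnP.PowerShell is installed, and $ClientId is the app (client) ID
# of your own Entra ID app registration used for interactive sign-in.
function Invoke-BulkReindex {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string[]] $SiteUrls,
        [Parameter(Mandatory)] [string] $ClientId,
        # With -DryRun, only list the sites that would be reindexed; nothing is changed.
        [switch] $DryRun
    )

    foreach ($url in $SiteUrls) {
        if ($DryRun) {
            Write-Output $url
            continue
        }
        try {
            # Interactive sign-in to this site (a prompt may appear for each site).
            Connect-PnPOnline -Url $url -Interactive -ClientId $ClientId
            # Mark the root web for full reindexing; the work happens on the next crawl.
            Request-PnPReIndexWeb
            Write-Output $url
        }
        catch {
            # Keep going with the remaining sites if one request fails.
            Write-Warning "Reindex request failed for ${url}: $($_.Exception.Message)"
        }
    }
}

# Example usage (hypothetical file with one site URL per line):
# $sites = Get-Content -Path .\sites.txt
# Invoke-BulkReindex -SiteUrls $sites -ClientId '<your-app-client-id>' -DryRun
# Invoke-BulkReindex -SiteUrls $sites -ClientId '<your-app-client-id>'
```

    Running it with -DryRun first is a cheap way to confirm the site list before anything is marked for reindexing. Note that the request only schedules the reindex, so classification and labeling of the legacy files still depend on the next crawl rather than happening immediately.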

    Summary

    Best Practice – Use the SharePoint UI Reindex Site button for simplicity and reliability.

    Automation – Use Request-PnPReindexWeb if you need to integrate reindexing into a script or pipeline.

    By reindexing via either method, the legacy content should be recrawled, evaluated by Purview’s Data Classification Service, and then labeled according to your service-side auto-labeling policies.

    Thanks again for surfacing this - it's an important distinction that helps clarify how reindexing is actually triggered.

    I hope this information helps. Please do let us know if you have any further queries.

    Thank you.


1 additional answer

  1. Sergio Londono 851 Reputation points
    2025-04-19T17:14:14.5466667+00:00

    Hello @Chandra Boorla

    Crawling is allowed by default, and content in the site is included in:
      • Microsoft Search
      • SharePoint Search
      • Copilot for Microsoft 365 (if licensed and enabled)

    If -NoCrawl is already false and I then run Set-SPOSite -Identity <SiteURL> -NoCrawl:$false, this won't trigger a reindex of the site.

    I found this information about reindexing a SharePoint site:

    https://learn.microsoft.com/en-us/sharepoint/crawl-site-content

    1. On the site, select Settings, and then select Site settings. If you don't see Site settings, select Site information, and then select View all site settings.
    2. Under Search, select Search and offline availability.
    3. In the Reindex site section, select Reindex site.


    Can you please confirm which approach is better from your point of view?

