Secure by default with Microsoft Purview and protect against oversharing - Phase 3

09/20/2024

This guide is divided into four phases:

Introduction
Phase 1: Foundational – Start with default labeling
Phase 2: Managed – Address files with highest sensitivity
Phase 3: Optimized – Expand to your entire Microsoft 365 data estate (this page)
Phase 4: Strategic – Operate, expand, and retroactive actions

In the previous phases, we laid out the security foundation and discussed the priority sites. We covered both client-side and service-side auto-labeling functionalities. For a comparison table, consult: Automatically apply a sensitivity label in Microsoft 365.

Phase 3: Optimized - Expand to your entire Microsoft 365 data estate

In this phase, we explain options to help you iteratively address your entire Microsoft 365 data estate.

Earlier phases recommended initial policies to familiarize users with labeling. In this phase, we apply them progressively to more scenarios. Auto-labeling works best where content needs a higher sensitivity than your default label provides.

We also discuss how to retroactively label existing sites and set default library labels.

Auto-label sensitive files on clients (low thresholds)

Client-side auto-labeling gives users the opportunity to decide whether to apply a recommended label or to report a false positive. It can use any of the 300+ available Sensitive Information Types (SITs) as well as trainable classifiers.

At a high level, we recommend the following approach. The thresholds are provided as examples only.

Identify the relevant SIT for your industry.
Recommend a label with lower SIT thresholds (1-9).
Automatically apply a label with higher thresholds (10+) and/or Trainable classifiers.
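The approach above can be sketched as a small decision function. This is only an illustration of the threshold logic, not how Purview evaluates policies; the function name, constants, and return values are hypothetical, and the thresholds are the example values from the list.

```python
from typing import Optional

# Example thresholds from the list above; tune them for your organization.
RECOMMEND_MIN = 1    # 1-9 SIT matches: recommend a label to the user
AUTO_APPLY_MIN = 10  # 10+ SIT matches: apply the label automatically


def labeling_action(sit_match_count: int,
                    classifier_matched: bool = False) -> Optional[str]:
    """Return 'auto-apply', 'recommend', or None for a single document."""
    # Higher thresholds and/or a trainable classifier match trigger auto-apply.
    if sit_match_count >= AUTO_APPLY_MIN or classifier_matched:
        return "auto-apply"
    # Lower thresholds only recommend, so the user can accept or report
    # a false positive.
    if sit_match_count >= RECOMMEND_MIN:
        return "recommend"
    return None
```

For instance, a document with three matches would get a recommendation, while one with ten or more matches would be labeled automatically.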

Your client default label affects your auto-labeling strategy. While this guide recommends setting this to Confidential\All employees, we also provide alternatives when the Office client defaults to General, and then to Confidential\All employees when saved in SharePoint.

Tip

If your default is set to Confidential\All employees, your auto-labeling strategy is less complex and can focus on Highly Confidential labels.

You can progressively deploy this with more SITs/trainable classifiers over time as you identify more business scenarios. With defaults and client-side auto-labeling, you're now addressing all new and updated content.

Simulate auto-labeling sensitive files at rest

Service-side auto-labeling labels files at rest in SharePoint and OneDrive, and provides more conditions. We currently support auto-labeling up to 100k files per day in your organization.
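When planning a rollout, the daily limit gives a lower bound on how long a policy needs to work through existing content. The sketch below is simple arithmetic under the assumption that the full 100k allowance is used every day; the function name is hypothetical.

```python
DAILY_FILE_LIMIT = 100_000  # service-side auto-labeling limit quoted above


def min_days_to_label(file_count: int,
                      daily_limit: int = DAILY_FILE_LIMIT) -> int:
    """Lower bound on the days needed to label file_count files."""
    if file_count < 0 or daily_limit < 1:
        raise ValueError("file_count must be >= 0 and daily_limit >= 1")
    # Integer ceiling division: a partial day still counts as a day.
    return (file_count + daily_limit - 1) // daily_limit
```

An estate of 2.5 million in-scope files would therefore need at least 25 days, which is one reason to prioritize what gets labeled first.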

Tip

Learn more about auto-labeling with Playbook - Service Side Auto-labeling

While client-side auto-labeling is limited to sensitive content, service-side auto-labeling adds support for contextual conditions such as:

Content is shared
File extension is
Document name contains words or phrases
Document property is
Document size equals or is greater than
Document created by

These conditions, combined with selecting specific sites and/or users' OneDrive accounts, allow your organization to prioritize which content to label first.

For example, if your organization uses templates with document properties or document name prefixes, you can run a policy across all SharePoint sites and OneDrive. You could also prioritize based on file size or documents created by your leadership teams.

You can finish labeling all documents by targeting Office and PDF file extensions in batches of SharePoint sites, with each policy applying the label that matches the site's label. Start with higher-sensitivity sites and progressively work down to General sites.
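One way to plan those batches is sketched below. It assumes you already have an inventory mapping each site URL to its container label; the label ranking, function name, and sample data are hypothetical and only illustrate the ordering.

```python
from typing import Dict, List

# Hypothetical ranking: a higher number means higher sensitivity.
LABEL_RANK = {"Highly Confidential": 3, "Confidential": 2, "General": 1}


def plan_batches(site_labels: Dict[str, str],
                 batch_size: int) -> List[List[str]]:
    """Order sites from most to least sensitive and split into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort by descending sensitivity, then by URL for a stable order.
    ordered = sorted(
        site_labels,
        key=lambda url: (-LABEL_RANK.get(site_labels[url], 0), url),
    )
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]


sites = {
    "/sites/news": "General",
    "/sites/finance": "Highly Confidential",
    "/sites/hr": "Confidential",
    "/sites/legal": "Highly Confidential",
}
batches = plan_batches(sites, batch_size=2)
```

Each batch could then become the site scope of one auto-labeling policy, with the Highly Confidential sites handled first.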

Finally, you can implement more service-side auto-labeling for Highly Confidential content, often with higher thresholds than used in client-side auto-labeling to reduce potential false positives.

Reduce false positives with advanced classifiers

In this section, we cover the basics of advanced classifiers and when to use them.

In the context of this secure by default blueprint, we focus on using classifiers with auto-labeling for highly confidential content, where the available advanced classifiers are limited to trainable classifiers. In most cases, Sensitive Information Types (SITs) are a mix of patterns and keywords. Templates such as Protected Health Information (PHI) and Personally Identifiable Information (PII) can return many false positives because they can't determine context, or because the matches aren't relevant to your organization.

Purview administrators can reduce false positives in several ways:

Increase the required confidence and/or threshold counts.
Look for multiple SITs with the AND operator instead of OR.
Clone a SIT into a custom SIT and fine-tune the requirements.
Use multiple regular expressions instead of a single wide-ranging one.
Force word matching.
Use trainable classifiers, exact data match (EDM), and document fingerprinting.
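To illustrate the regular expression and word matching items in the list above, the sketch below compares one wide pattern with two narrower, word-bounded patterns. It uses Python's re module rather than the Purview SIT engine, and the patterns and sample text are made up for the example.

```python
import re
from typing import List

# Wide pattern: any run of nine digits, even inside a longer number.
WIDE = re.compile(r"\d{9}")

# Narrower patterns, one per expected format. The \b anchors force
# whole-word matching so digits embedded in longer numbers are ignored.
NARROW = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"\b\d{3} \d{2} \d{4}\b"),
]


def wide_matches(text: str) -> List[str]:
    return WIDE.findall(text)


def narrow_matches(text: str) -> List[str]:
    return [m for pattern in NARROW for m in pattern.findall(text)]


sample = "Order 1234567890123 shipped; SSN 123-45-6789 on file."
```

On this sample, the wide pattern only matches digits inside the order number (a false positive), while the narrow patterns find only the formatted identifier.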

Tip

Learn more about these options here: Tips and tricks for maximizing accuracy and reducing false positive detections in MIP and DLP

Trainable classifiers use machine learning to identify document patterns. Microsoft Purview provides several pretrained classifiers such as legal documents, strategic business documents, and financial information. Custom classifiers can also be created and trained from a SharePoint document library.

By using both SITs and trainable classifiers, you can narrow down your scope: for example, content that matches both a credit card SIT and the Financial information trainable classifier.

Exact data match and document fingerprinting aren't currently available to auto-labeling but should be considered in your overall Microsoft Purview Data Loss Prevention (DLP) strategy. Like trainable classifiers, both can help reduce false positives. With EDM, you can, for example, detect content with the out-of-the-box SSN SIT and then check the match against your EDM SIT to confirm that it's the SSN of one of your customers or employees. EDM lets you securely store a hash of the information to look for.
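The idea of verifying a pattern match against a store of hashes can be sketched as follows. This is a conceptual illustration only, not the actual EDM implementation or its hashing scheme; the salted HMAC-SHA256 approach, function names, and sample values are all assumptions.

```python
import hashlib
import hmac
import re
from typing import Iterable, List, Set

# Simplified stand-in for an out-of-the-box SSN pattern.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def _digest(value: str, salt: bytes) -> str:
    """Keyed hash so the store never holds the raw value."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_hash_store(values: Iterable[str], salt: bytes) -> Set[str]:
    """Hash the known sensitive values (for example, customer SSNs)."""
    return {_digest(v, salt) for v in values}


def verified_matches(text: str, store: Set[str], salt: bytes) -> List[str]:
    """Keep only pattern matches whose hash exists in the store."""
    return [c for c in SSN_PATTERN.findall(text)
            if _digest(c, salt) in store]


salt = b"demo-salt"  # hypothetical; keep real salts secret
store = build_hash_store(["123-45-6789"], salt)
hits = verified_matches("Known 123-45-6789, unknown 987-65-4321", store, salt)
```

Both numbers match the pattern, but only the one present in the hashed store is kept, which is how the second check removes false positives.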

Document Fingerprinting operates differently than Trainable Classifiers by identifying document templates and using them in DLP policies. This is most useful if your organization has standardized templates. You can use these templates to create precise fingerprinting.

Automate and improve Microsoft 365 protection to historical and in use data

In the final step of this phase, we review options to retroactively apply labels on your existing SharePoint sites and apply default library labels accordingly.

At this point, we have configured defaults throughout the environment and stopped the proliferation of unlabeled sites and documents. We started labeling sites and libraries manually on priority sites, and we're now looking at scaling this across your complete Microsoft 365 content estate.

There are a few strategies to consider:

Use Site Owners – Communicate to site owners that they must configure a label on their site and default library. If you intend to also run the automation scripts described in the next strategy, mention that unlabeled sites will automatically receive a new default at a target date.
Run automation scripts on remaining unlabeled sites – Use the Graph API to identify unlabeled sites and configure the container label and default library label to "Confidential\All employees".
Optionally, prevent sharing of unlabeled files only – With previous measures such as DLP on unlabeled content and file auto-labeling, you can choose to let sites expire naturally rather than scripting retroactive actions for all sites.
Capture a timeline of unlabeled sites – If you're planning to use service-side auto-labeling for all your historical data based on container labels, capture when container labels are added and progressively add newly labeled sites in your auto-labeling policies.
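The inventory side of these strategies can be sketched as follows. The example works on an in-memory list and makes no Graph API calls; the Site record, its field names, and the sample data are hypothetical stand-ins for whatever your own inventory export contains.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Site:
    url: str
    label: Optional[str] = None        # container label, if any
    labeled_on: Optional[date] = None  # when the label was captured


def unlabeled_sites(sites: List[Site]) -> List[str]:
    """Sites that still need a container label."""
    return [s.url for s in sites if not s.label]


def labeled_after(sites: List[Site], cutoff: date) -> List[str]:
    """Sites labeled after the cutoff, to add to auto-labeling policies."""
    return [s.url for s in sites
            if s.label and s.labeled_on and s.labeled_on > cutoff]


inventory = [
    Site("https://contoso.sharepoint.com/sites/a"),
    Site("https://contoso.sharepoint.com/sites/b", "General", date(2024, 9, 1)),
    Site("https://contoso.sharepoint.com/sites/c",
         "Confidential\\All employees", date(2024, 9, 15)),
]
```

Running the second function with the date of your last policy update as the cutoff gives the newly labeled sites to add in the next iteration.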

Your risk posture determines which of these strategies to choose, or whether to use them progressively. While we recommend securing your entire data estate, it can be a complex task depending on its size. Start small and iterate often.

Scripting Sensitivity Labels to SharePoint sites can be done with 'Set-PnPTenantSite' and the 'SensitivityLabel' parameter.
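For a dry run, it can help to generate the commands for review before running anything. The sketch below only builds PowerShell command strings from a site-to-label mapping. It assumes the site is passed through the cmdlet's -Identity parameter and that the label value is accepted as given; check both against the PnP PowerShell documentation. The helper names are hypothetical.

```python
from typing import Dict, List


def ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string literal."""
    return "'" + value.replace("'", "''") + "'"


def render_label_commands(site_labels: Dict[str, str]) -> List[str]:
    """One Set-PnPTenantSite command per site, sorted by URL."""
    return [
        f"Set-PnPTenantSite -Identity {ps_quote(url)} "
        f"-SensitivityLabel {ps_quote(label)}"
        for url, label in sorted(site_labels.items())
    ]


commands = render_label_commands({
    "https://contoso.sharepoint.com/sites/hr": "Confidential\\All employees",
})
```

The generated lines can be reviewed with site owners and then run from a connected PnP PowerShell session.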

The default library label requires setting 'DefaultSensitivityLabelForLibrary' on a library through the REST API. A sample is provided in this article.
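As a rough illustration, the request might be assembled as shown below. The request shape is an assumption: it follows the general SharePoint REST pattern for updating list properties (a POST with the MERGE override), and it assumes the value is the label's ID. The linked sample remains the authoritative reference. The function only builds the request description and sends nothing.

```python
import json
from typing import Dict


def build_default_label_request(site_url: str, library_title: str,
                                label_id: str) -> Dict[str, object]:
    """Describe a REST call that sets a library's default label (assumed shape)."""
    # Single quotes inside the title are doubled for the OData string literal.
    safe_title = library_title.replace("'", "''")
    return {
        "method": "POST",
        "url": f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{safe_title}')",
        "headers": {
            "Accept": "application/json;odata=verbose",
            "Content-Type": "application/json;odata=verbose",
            "X-HTTP-Method": "MERGE",  # update existing list properties
            "If-Match": "*",
        },
        "body": json.dumps({
            "__metadata": {"type": "SP.List"},
            # Property name from the text above; the value is assumed
            # to be the sensitivity label's ID.
            "DefaultSensitivityLabelForLibrary": label_id,
        }),
    }


request = build_default_label_request(
    "https://contoso.sharepoint.com/sites/hr/",
    "Documents",
    "00000000-0000-0000-0000-000000000000",  # placeholder label ID
)
```

Keeping the request as plain data makes it easy to log and review before handing it to whichever authenticated HTTP client you use.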
