Secure by default with Microsoft Purview and protect against oversharing - Phase 3
This guide is divided into four phases:
- Introduction
- Phase 1: Foundational – Start with default labeling
- Phase 2: Managed – Address files with highest sensitivity
- Phase 3: Optimized – Expand to your entire Microsoft 365 data estate (this page)
- Phase 4: Strategic – Operate, expand, and retroactive actions
In the previous phases, we laid out the security foundation and discussed the priority sites. We covered both client-side and service-side auto-labeling functionalities. For a comparison table, consult: Automatically apply a sensitivity label in Microsoft 365.
Phase 3: Optimized - Expand to your entire Microsoft 365 data estate
In this phase, we explain options to help iteratively address all your Microsoft 365 data estate.
Before, we recommended initial policies to familiarize users. In this phase, we're ready to use them progressively in scenarios. Auto-labeling is best for scenarios where you need higher sensitivity than your default label.
We also discuss how to retroactively label existing sites and set default library labels.
Auto-label sensitive files on clients (low thresholds)
Client-side auto-labeling provides the opportunity for users to decide on applying a recommended label, or to report a false positive. It can be done with the 300+ Sensitive Information Types (SITs) available and Trainable classifiers.
At a high level, we recommend the following approach. The thresholds are provided as examples only.
- Identify the relevant SIT for your industry.
- Recommend a label with lower SIT thresholds (1-9).
- Automatically apply a label with higher thresholds (10+) and/or Trainable classifiers.
Your client default label affects your auto-labeling strategy. While this guide recommends setting this to Confidential\All employees, we also provide alternatives when the Office client defaults to General, and then to Confidential\All employees when saved in SharePoint.
Tip
If your default is set to Confidential\All employees, your auto-labeling strategy is less complex and focused for Highly Confidential labels.
You can progressively deploy this with more SITs/trainable classifiers over time as you identify more business scenarios. With defaults and client-side auto-labeling, you're now addressing all new and updated content.
Simulate auto-labeling sensitive files at rest
Service-side auto-labeling labels files at rest in SharePoint and OneDrive, and provides more conditions. We currently support auto-labeling up to 100k files per day in your organization.
Tip
Learn more about auto-labeling with Playbook - Service Side Auto-labeling
While client-side auto-labeling is limited to sensitive content, service-side auto-labeling adds support for contextual conditions such as:
- Content is shared
- File extension is
- Document name contains words or phrases
- Document property is
- Document size equals or is greater than
- Document created by
These conditions, combined with selecting specific sites and/or user’s OneDrive, allows your organizations to prioritize which content to label first.
For example, if your organization uses templates with document properties or document name prefixes, you can run a policy across all SharePoint sites and OneDrive. You could also prioritize based on file size or documents created by your leadership teams.
You can finalize labeling all documents by using Office/PDF file extensions in batches of SharePoint sites, and set to match their respective site’s label, starting with higher sensitivity sites, progressively catching General sites.
Finally, you can implement more service-side auto-labeling for Highly Confidential content, often with higher thresholds than used in client-side auto-labeling to reduce potential false positives.
Reduce false positives with advanced classifiers
In this section, we cover the basis of advanced classifiers and when to use them.
In context of this secure by default blueprint, we focused the use of classifiers with auto-labeling for highly confidential content, where advanced classifiers are limited to trainable classifiers. In most cases, Sensitive Information Types (SITs) are a mix of patterns and keywords. Templates such as Protected Health Information (PHI) and Personally Identifiable Information (PII) can return many false positive as they aren’t able to determine context or can be false positives for your organization.
Purview Administrators can reduce false positives by:
- Increase required confidence and/or threshold counts.
- Looking for multiple SITs with AND instead of OR operator.
- Clone a SIT into a custom SIT and fine tune the requirements.
- Use multiple Regex expressions instead of a single but wide-ranging one.
- Force word matching.
- Use trainable classifiers, exact data match (EDM), and document fingerprinting.
Tip
Learn more about these options here: Tips and tricks for maximizing accuracy and reducing false positive detections in MIP and DLP
Trainable classifiers use machine learning to identify document patterns. Microsoft Purview provides several pretrained classifiers such as legal documents, strategic business documents, and financial information. Custom classifiers can also be created and trained from a SharePoint document library.
By using both SITs and trainable classifiers, you can narrow down your scope – for example, contains credit cards SITs and Financial information trainable classifier.
Exact data match and document fingerprinting aren't currently available to auto-labeling but should be considered in your overall Microsoft Purview Data Loss Prevention (DLP) strategy. Similar to trainable classifiers, they can both help reduce false positives. With EDM, you can, for example, find contains SSN out of the box SIT, and then verify against your EDM SIT to verify it’s an SSN from one of your customers or employees. EDM allows you to securely store a hash of information to look for.
Document Fingerprinting operates differently than Trainable Classifiers by identifying document templates and using them in DLP policies. This is most useful if your organization has standardized templates. You can use these templates to create precise fingerprinting.
Automate and improve Microsoft 365 protection to historical and in use data
In the final step of this phase, we review options to retroactively apply labels on your existing SharePoint sites and apply default library labels accordingly.
At this point, we have configured defaults throughout the environment and stopped the proliferation of unlabeled sites and documents. We started addressing labeling sites and libraries manually on priority sites and we're looking at scaling this throughout your complete Microsoft 365 content estate.
There are a few strategies to consider:
- Use Site Owners – Communicate to site owners that they must configure a label on their site and default library. If you intend to use #2, include mentions that it will automatically receive a new default at a target date.
- Run automation scripts on remaining unlabeled sites – Use the Graph API to identify unlabeled sites and configure the container label and default library label to "Confidential\All employees"
- Optionally, prevent sharing of unlabeled files only – With previous measures such as DLP on unlabeled content and file auto-labeling, you can choose to let sites expire naturally over scripting retroactive actions for all sites.
- Capture a timeline of unlabeled sites – If you're planning to use service-side auto-labeling for all your historical data based on container labels, capture when container labels are added and progressively add newly labeled sites in your auto-labeling policies.
Your risk posture defines how to best approach between all strategies, or possibly use them progressively. While we recommend securing all your data estate, it can be a complex task depending on its size. Start small and iterate often.
Scripting Sensitivity Labels to SharePoint sites can be done with 'Set-PnPTenantSite' and the 'SensitivityLabel' parameter.
For Default Library Label, it requires setting the 'DefaultSensitivityLabelForLibrary' parameter using REST API on a library. A sample is provided in this article.
Phase 3 - Summary
- Automatically apply a sensitivity label in Microsoft 365
- Minimum versions for sensitivity labels in Microsoft 365 Apps
- Manage sensitivity labels in Office apps
- Automatically apply a sensitivity label in Microsoft 365
- Tips and tricks for maximizing accuracy and reducing false positive detections in MIP and DLP
See also
- Playbook - Service Side Auto-labeling
- Customize a built-in sensitive information type
- Learn about trainable classifiers
- Learn about exact data match based sensitive information types
- About document fingerprinting
- Trainable classifier definitions
- Sensitive information type entity definitions
- Sensitive information type limits
- Set-PnPTenantSite | PnP PowerShell
- "Default" label for a document library - Script sample
- Configure a default sensitivity label for a SharePoint document library
Continue to Phase 4: Strategic – Operate, expand, and retroactive actions