Overview

SCU Cost Estimate

This agent typically consumes 0.01 to 1 Security Compute Units (SCUs) per analysis run, depending on the volume of classification data and the analysis depth you select. Deep analysis mode may consume more SCUs.
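
For rough capacity planning, you can multiply the stated per-run range by the number of runs you expect. The sketch below assumes every run stays within 0.01 to 1 SCU and ignores other Security Copilot usage. The helper name is hypothetical, and deep analysis runs may exceed the upper figure.

```python
def scu_budget_range(runs: int, low_per_run: float = 0.01, high_per_run: float = 1.0) -> tuple[float, float]:
    """Return (minimum, maximum) SCUs for a number of runs, using the stated per-run range.

    Hypothetical helper: deep analysis runs may exceed high_per_run.
    """
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return runs * low_per_run, runs * high_per_run


# Example: a hypothetical schedule of four analysis runs per month
low, high = scu_budget_range(4)
print(f"Estimated usage: {low:.2f} to {high:.2f} SCUs")
```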

Introduction

Classification Optimizer helps you get the most out of your Microsoft Purview data classification. If you've ever wondered why certain sensitive information keeps slipping through, or why you're drowning in false positives, this agent is for you. It analyzes how your Sensitive Information Types (SITs) are actually performing in the real world, finds patterns you didn't know existed, and tells you exactly how to improve your classification accuracy.

What It Does

  • Analyzes real-world SIT performance based on actual detection data, not just theory

  • Finds co-occurrence patterns showing which sensitive types appear together

  • Identifies classification gaps where important data isn't being detected

  • Spots redundant classifiers that overlap and create unnecessary complexity

  • Recommends new composite SITs based on patterns that consistently appear together

  • Suggests parameter tuning to reduce false positives and improve accuracy

  • Identifies trainable classifier candidates for complex, context-dependent patterns

  • Provides prioritized recommendations with statistical backing (support, lift, confidence)
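
Support, confidence, and lift are standard association-rule measures. Assuming the agent uses the conventional definitions, let N be the number of analyzed items (for example, files or messages), n(A) the number of items in which SIT A was detected, and n(A, B) the number of items in which both A and B were detected:

```latex
\text{support}(A, B) = \frac{n(A, B)}{N}, \qquad
\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{n(A, B)}{n(A)}, \qquad
\text{lift}(A, B) = \frac{P(B \mid A)}{P(B)} = \frac{N \, n(A, B)}{n(A)\, n(B)}
```

A lift above 1 means the two SITs appear together more often than they would if they were independent; a lift near 1 means their co-occurrence is what chance alone would predict.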

Use Cases

1. Reducing False Positives

Your DLP policies are firing constantly, but half the alerts aren't real issues. Users are getting frustrated with blocking that doesn't make sense. Classification Optimizer analyzes which SITs are causing problems and recommends specific parameter adjustments (confidence levels, instance counts, thresholds) to improve precision without sacrificing protection.
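
The agent's tuning method isn't documented here, but the trade-off it describes can be sketched as a threshold search: given a sample of matches that an analyst has labeled as real or not, try combinations of minimum confidence level and minimum instance count, and keep the most precise setting that still catches most real issues. Purview expresses low, medium, and high confidence as 65, 75, and 85. The record type, function names, candidate counts, and the 0.9 recall floor are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ReviewedDetection:
    """Hypothetical record: one SIT match that an analyst labeled as a real issue or not."""
    confidence: int        # match confidence level (65 = low, 75 = medium, 85 = high)
    instance_count: int    # number of instances of the SIT found in the item
    is_true_positive: bool


def evaluate(detections, min_confidence, min_count):
    """Return (precision, recall) if only matches meeting both thresholds triggered the rule."""
    fired = [d for d in detections
             if d.confidence >= min_confidence and d.instance_count >= min_count]
    true_total = sum(d.is_true_positive for d in detections)
    true_fired = sum(d.is_true_positive for d in fired)
    precision = true_fired / len(fired) if fired else 0.0
    recall = true_fired / true_total if true_total else 0.0
    return precision, recall


def best_thresholds(detections, confidences=(65, 75, 85), counts=(1, 2, 5, 10), min_recall=0.9):
    """Most precise (precision, confidence, count, recall) setting that keeps recall >= min_recall."""
    candidates = []
    for conf, count in product(confidences, counts):
        precision, recall = evaluate(detections, conf, count)
        if recall >= min_recall:
            candidates.append((precision, conf, count, recall))
    return max(candidates) if candidates else None


# Made-up review sample for illustration only
sample = [
    ReviewedDetection(85, 12, True),
    ReviewedDetection(85, 3, True),
    ReviewedDetection(75, 6, True),
    ReviewedDetection(75, 1, False),
    ReviewedDetection(65, 1, False),
    ReviewedDetection(65, 2, False),
]
print(best_thresholds(sample))
```

Recall here is measured only against the reviewed sample, so it says nothing about sensitive content that the SIT never matched in the first place.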

2. Improving Detection Accuracy

Important sensitive data is getting through your policies. You suspect your classifiers aren't catching everything they should. The agent identifies coverage gaps, analyzes detection patterns, and recommends new SIT combinations or trainable classifiers to catch what you're currently missing.

3. Simplifying Complex Classification Schemes

Over time, you've accumulated dozens or hundreds of SITs, and nobody knows which ones are actually valuable anymore. Classification Optimizer shows you which classifiers are redundant, which consistently appear together (and should be combined), and which aren't detecting anything useful. Finally clean up that classifier sprawl.
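
How the agent measures redundancy isn't documented here. One common way to quantify overlap, used below as a swapped-in technique, is the Jaccard similarity of the sets of items each SIT matched: pairs that almost always fire on the same items are candidates for consolidation, and SITs that matched nothing in the analysis window are candidates for review. The input shape, function names, SIT labels, and the 0.8 threshold are hypothetical.

```python
from itertools import combinations


def redundant_pairs(detections_by_sit, threshold=0.8):
    """Flag SIT pairs whose matched items overlap heavily (Jaccard similarity >= threshold)."""
    flagged = []
    for a, b in combinations(sorted(detections_by_sit), 2):
        items_a, items_b = detections_by_sit[a], detections_by_sit[b]
        union = items_a | items_b
        if not union:
            continue  # neither SIT matched anything, so overlap is undefined
        jaccard = len(items_a & items_b) / len(union)
        if jaccard >= threshold:
            flagged.append((a, b, round(jaccard, 2)))
    return sorted(flagged, key=lambda t: t[2], reverse=True)


def inactive_sits(detections_by_sit):
    """SITs that matched no items in the analysis window."""
    return sorted(sit for sit, items in detections_by_sit.items() if not items)


# Hypothetical input: SIT name -> set of item IDs it matched
detections = {
    "Custom Employee ID": {"doc1", "doc2", "doc3", "doc4"},
    "Legacy Employee ID": {"doc1", "doc2", "doc3", "doc4", "doc5"},
    "Credit Card Number": {"doc6", "doc7"},
    "Old Project Codename": set(),
}
print(redundant_pairs(detections))
print(inactive_sits(detections))
```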

4. Building Better Composite Classifiers

You know certain types of sensitive data tend to appear together (like passport numbers with birth dates), but creating the right composite SITs manually is guesswork. The agent analyzes co-occurrence patterns with statistical metrics, then recommends exactly which SITs should be combined and with what parameters.
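
A minimal sketch of turning co-occurrence counts into composite SIT candidates, assuming each analyzed item is represented by the set of SIT names detected in it. The thresholds (5% support, 0.6 confidence, 1.5 lift), the function name, and the labels "Birth Date" and "Passport Number" are hypothetical illustrations, not the agent's documented settings or built-in SIT names.

```python
from collections import Counter
from itertools import combinations


def composite_candidates(item_sits, min_support=0.05, min_confidence=0.6, min_lift=1.5):
    """Suggest SIT pairs for a composite SIT from co-occurrence statistics.

    item_sits: one set of detected SIT names per analyzed item (file, message, ...).
    Thresholds are hypothetical defaults.
    """
    n = len(item_sits)
    single = Counter()
    pair = Counter()
    for sits in item_sits:
        single.update(sits)                         # items containing each SIT
        pair.update(combinations(sorted(sits), 2))  # items containing each unordered pair
    results = []
    for (a, b), both in pair.items():
        support = both / n
        # Confidence of the stronger direction: A => B or B => A
        confidence = max(both / single[a], both / single[b])
        lift = both * n / (single[a] * single[b])
        if support >= min_support and confidence >= min_confidence and lift >= min_lift:
            results.append({"pair": (a, b), "support": support,
                            "confidence": confidence, "lift": lift})
    return sorted(results, key=lambda r: r["lift"], reverse=True)


# Made-up items for illustration only
items = [
    {"Passport Number", "Birth Date"},
    {"Passport Number", "Birth Date"},
    {"Passport Number", "Birth Date", "Email Address"},
    {"Email Address"},
    {"Email Address"},
    {"Email Address", "Birth Date"},
    set(),
    set(),
]
print(composite_candidates(items))
```

Requiring lift as well as confidence filters out pairs that co-occur only because one of the SITs is very common.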

5. Meeting Regulatory Requirements More Effectively

Compliance frameworks require specific data protections, but your current SITs aren't aligned with those requirements. Classification Optimizer identifies strategic gaps tied to regulatory needs and recommends new classifiers or trainable classifiers to close those gaps.
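
One way to picture a regulatory gap check is to map each requirement to the SITs expected to cover it, then compare that map with what is configured and what actually produced detections. The sketch below assumes such a mapping exists; the framework labels, SIT names, and function name are hypothetical, and a SIT with no detections may simply reflect the absence of that data, so it is a prompt for review rather than proof of a gap.

```python
# Hypothetical mapping from compliance requirements to the SITs expected to cover them.
REQUIRED_SITS = {
    "Framework A: payment card data": {"Credit Card Number"},
    "Framework B: national identifiers": {"Passport Number", "National ID"},
}


def coverage_gaps(required, configured, active):
    """Per requirement, list SITs that are not configured or that produced no detections."""
    gaps = {}
    for requirement, sits in required.items():
        not_configured = sorted(sits - configured)
        # Configured but silent: may be a real gap, or the data may simply not exist.
        no_detections = sorted((sits & configured) - active)
        if not_configured or no_detections:
            gaps[requirement] = {"not_configured": not_configured, "no_detections": no_detections}
    return gaps


configured = {"Credit Card Number", "Passport Number"}
active = {"Credit Card Number"}  # SITs with at least one detection in the window
print(coverage_gaps(REQUIRED_SITS, configured, active))
```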

Why Classification Optimizer?

| The Problem You're Dealing With | How This Helps |
|---|---|
| False positive overload: DLP alerts everywhere, but most aren't real issues | Precision tuning: specific parameter recommendations that reduce noise while maintaining protection |
| Important data slipping through: your policies aren't catching everything they should | Gap analysis: identifies what's being missed and recommends new detection patterns |
| Classifier chaos: too many SITs, and it's unclear which ones matter | Usage analytics: shows which classifiers are actually valuable and which are redundant |
| Manual guesswork: building composite SITs based on intuition instead of data | Pattern discovery: statistical analysis reveals which SITs consistently co-occur |
| Time-consuming analysis: manually reviewing classification effectiveness takes forever | Automated insights: a complete analysis with prioritized recommendations in minutes |
| No strategic direction: unclear where to focus classification improvement efforts | Prioritized roadmap: recommendations ranked by impact, with supporting metrics |

How It Works

What goes in:

  • Purview alert data and detection events from your specified time range (default 30 days)

  • Existing SIT configurations and detection patterns

  • Classification analytics and usage data

  • Security events showing actual SIT detections

  • SharePoint, Exchange, and file classification data
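
The real Purview event schemas are richer than this, but a simplified sketch shows how detection events from the default 30-day window might be reduced to one set of SITs per item, a convenient shape for co-occurrence analysis. The record type, its fields, the function names, and the choice of (workload, item) as the unit of analysis are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DetectionEvent:
    """Hypothetical, simplified shape of one SIT detection; real Purview schemas differ."""
    timestamp: datetime  # timezone-aware detection time
    workload: str        # e.g. "SharePoint", "Exchange"
    item_id: str         # file or message identifier
    sit_name: str


def events_in_window(events, days=30, now=None):
    """Keep events from the last `days` days (30 by default, matching the agent's default range)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return [e for e in events if start <= e.timestamp <= now]


def group_by_item(events):
    """Collapse events into one set of SIT names per (workload, item) pair (assumed unit of analysis)."""
    items = {}
    for e in events:
        items.setdefault((e.workload, e.item_id), set()).add(e.sit_name)
    return items


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
events = [
    DetectionEvent(datetime(2024, 6, 25, tzinfo=timezone.utc), "SharePoint", "doc1", "Credit Card Number"),
    DetectionEvent(datetime(2024, 6, 25, tzinfo=timezone.utc), "SharePoint", "doc1", "U.S. Bank Account Number"),
    DetectionEvent(datetime(2024, 4, 1, tzinfo=timezone.utc), "Exchange", "msg1", "Credit Card Number"),
]
recent = events_in_window(events, now=now)
print(group_by_item(recent))
```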

What it does:

  • Calculates baseline metrics for each SIT (detection frequency, distribution)

  • Builds a co-occurrence matrix showing which SITs appear together

  • Applies statistical analysis (support, lift, conditional probability)

  • Identifies patterns, gaps, and optimization opportunities

  • Generates recommendations with technical justification and priority ranking
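
The first three steps (baseline metrics, a co-occurrence matrix, and conditional probability) can be sketched in Python as follows, assuming the input is a list of per-item SIT sets. The function names and the nested-dictionary matrix are hypothetical; the agent's internal data structures are not documented, and the SIT names are example labels.

```python
from collections import Counter
from itertools import permutations


def baseline_metrics(item_sits):
    """Per-SIT detection frequency: number of items matched and share of all analyzed items."""
    n = len(item_sits)
    counts = Counter(sit for sits in item_sits for sit in sits)
    return {sit: {"items": c, "share": c / n} for sit, c in counts.items()}


def cooccurrence_matrix(item_sits):
    """Symmetric matrix (nested dicts): matrix[a][b] = number of items containing both a and b."""
    matrix = {}
    for sits in item_sits:
        for a, b in permutations(sits, 2):
            matrix.setdefault(a, Counter())[b] += 1
    return matrix


def conditional_probability(matrix, baseline, a, b):
    """P(B | A): share of the items containing A that also contain B."""
    return matrix.get(a, {}).get(b, 0) / baseline[a]["items"]


items = [
    {"Credit Card Number", "U.S. Bank Account Number"},
    {"Credit Card Number"},
    {"Credit Card Number", "U.S. Bank Account Number"},
    {"U.S. Social Security Number (SSN)"},
]
baseline = baseline_metrics(items)
matrix = cooccurrence_matrix(items)
print(conditional_probability(matrix, baseline, "U.S. Bank Account Number", "Credit Card Number"))
```

Note that conditional probability is asymmetric: here every item with a bank account number also contains a credit card number, but only two of the three credit card items contain a bank account number.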

What you get:

  • Baseline SIT performance metrics

  • Co-occurrence patterns with statistical significance

  • New composite SIT recommendations with suggested parameters

  • Parameter tuning guidance for existing SITs

  • Trainable classifier candidates for complex patterns

  • Policy optimization suggestions (scoping, rule tuning, groupings)

  • Prioritized action plan with expected impact

  • Debug output (optional) showing detailed analysis steps
