Here's something that trips up even experienced security teams: you can't protect what you haven't classified. It sounds obvious, but look around most organizations and you'll find sensitive data scattered across shared drives, cloud buckets, and Slack channels with zero labeling, zero handling rules, and zero accountability. Everyone assumes someone else took care of it.
Data classification is the foundation that every other security control sits on top of. Access controls, encryption policies, retention schedules, incident response playbooks - all of them depend on knowing what kind of data you're dealing with. And yet, it's the step that gets skipped more than any other. Let's fix that.
Why Now? The Mid-Year Compliance Wake-Up Call
If you're reading this in June, you're sitting at a natural checkpoint. GDPR turned eight years old on May 25th. Half the year is behind you. If your organization hasn't done a data inventory or classification review yet this year, this is the perfect time to get it done before the back half gets hectic.
Beyond the calendar, the regulatory landscape keeps getting more specific about how organizations handle different types of data. GDPR, CCPA, HIPAA, PCI DSS - they all assume you know what data you have and where it lives. Classification is how you build that knowledge.
The 4-Tier Classification System
You don't need a complicated framework with a dozen categories. Four tiers cover the vast majority of use cases. Here's the system I recommend to every client.
Tier 1: Public
Data that's intentionally available to anyone. Your marketing website, published blog posts, open-source code, press releases. If it leaked, nobody would care because it was already out there on purpose.
- No access restrictions required
- No encryption required at rest (still use HTTPS in transit)
- Retain as long as it's useful, delete when it's not
Tier 2: Internal
Data meant for employees and authorized contractors, but not the general public. Think internal memos, project plans, meeting notes, org charts, non-sensitive business documents. A leak would be embarrassing or mildly disruptive, but not catastrophic.
- Access limited to authenticated employees and approved contractors
- Encryption at rest recommended, though not strictly required
- Standard backup and retention policies apply
- Share via approved internal platforms only
Tier 3: Confidential
This is where it gets serious. Customer PII, financial records, employee HR data, proprietary business strategies, source code for commercial products. A breach at this level triggers regulatory notification requirements and real business damage.
- Access restricted to specific roles with documented business need
- Encryption required at rest and in transit
- Audit logging on all access
- No sharing via email without encryption, no personal devices
- Defined retention periods with secure deletion procedures
Tier 4: Restricted
The highest sensitivity level. Encryption keys, authentication secrets, payment card data, health records under HIPAA, data subject to legal hold, and anything where unauthorized access could result in severe financial, legal, or safety consequences.
- Access limited to named individuals with explicit approval
- Strong encryption required everywhere, no exceptions
- Multi-factor authentication required for access
- Real-time monitoring and alerting on access
- Strict retention limits with cryptographic deletion
- Regular access reviews (quarterly at minimum)
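The four tiers above can be encoded as a small ordered enum so that tooling can compare sensitivity levels and derive handling requirements. This is a sketch, not a standard; the helper names are my own:

```python
from enum import IntEnum

class Tier(IntEnum):
    """Ordered so a higher value means higher sensitivity."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

def requires_encryption_at_rest(tier: Tier) -> bool:
    # Per the tier rules above: mandatory from Confidential upward.
    return tier >= Tier.CONFIDENTIAL

def requires_mfa(tier: Tier) -> bool:
    # MFA is mandatory only at the Restricted tier.
    return tier == Tier.RESTRICTED
```

Using an ordered type means rules like "Confidential and above" become a single comparison instead of a list of special cases.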
"The goal isn't to classify everything as Restricted. The goal is to know what actually needs that level of protection so you can focus your resources where they matter."
How to Inventory Your Data Assets
Before you can classify anything, you need to know what you have. This is where most teams stall out because the task feels overwhelming. Here's how to make it manageable.
Start with Data Sources, Not Individual Files
Don't try to catalog every file in your organization. Instead, identify the systems and repositories where data lives:
- Databases (production, staging, analytics)
- Cloud storage (S3 buckets, Azure Blob, Google Cloud Storage)
- SaaS applications (CRM, HRIS, accounting software)
- File shares and collaboration tools (SharePoint, Google Drive, Confluence)
- Email systems and archives
- Code repositories
- Backup systems
- AI/ML training data stores and model registries
Interview Data Owners
Every system has someone who knows what's in it. Talk to them. Ask three questions: What data goes into this system? Where does it come from? Who accesses it? You'll learn more in a 15-minute conversation than in hours of automated scanning.
Document What You Find
For each data source, record the data types it contains, the approximate volume, who owns it, who has access, and where it flows to. A simple spreadsheet works fine for this. You're not building a data catalog product. You're building a working inventory.
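If you want the inventory in code rather than a spreadsheet, one row per data source is all it takes. A minimal sketch; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    """One row of the working inventory described above."""
    name: str                     # e.g. "production CRM"
    data_types: list[str]         # what kinds of data it holds
    approx_volume: str            # rough size, e.g. "~2M records"
    owner: str                    # accountable person or team
    accessors: list[str]          # roles or groups with access
    flows_to: list[str] = field(default_factory=list)  # downstream systems
```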
Classifying Common Data Types
Once you have your inventory, it's time to assign classification levels. Here's how common data types typically map to the four tiers.
Customer PII (Names, Emails, Addresses)
Classification: Confidential. Any data that can identify a specific individual falls here. This includes names, email addresses, phone numbers, physical addresses, and any combination of data points that could identify someone. GDPR and CCPA both have specific requirements for this category.
Financial Records
Classification: Confidential to Restricted. Revenue figures, invoices, and general accounting data are typically Confidential. Payment card numbers, bank account details, and anything covered by PCI DSS move up to Restricted.
Source Code
Classification: Internal to Confidential. Open-source projects are Public by definition. Internal tooling and scripts are typically Internal. Proprietary product code that represents competitive advantage belongs at Confidential. Code containing embedded secrets or security-critical logic should be treated as Restricted.
AI Training Data
Classification: Varies widely. This is the one that catches people off guard. AI training data inherits the classification of its source material, and often should be classified higher. If you trained a model on customer support tickets containing PII, that training dataset is at minimum Confidential. If the model itself can reproduce sensitive training data through prompt extraction, the model weights may need classification too.
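The inheritance rule can be stated directly: a derived asset (a training set, a report, a model) takes the highest tier among its inputs. A sketch, assuming the four-tier ordering from this post:

```python
# Sensitivity order; higher index = higher tier.
TIERS = ["Public", "Internal", "Confidential", "Restricted"]

def derived_classification(source_tiers: list[str]) -> str:
    """A derived asset inherits the highest classification
    among its source materials."""
    return max(source_tiers, key=TIERS.index)
```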
Employee HR Data
Classification: Confidential to Restricted. General employment records like job titles and department assignments are Confidential. Salary information, performance reviews, medical accommodations, and disciplinary records are Restricted.
Authentication Credentials
Classification: Restricted. Always. API keys, database passwords, encryption keys, service account tokens. No exceptions and no shortcuts here.
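The mappings above can double as a default lookup table for tooling. The type names below are illustrative, and the "Confidential to Restricted" ranges are split into explicit entries; note the fail-safe default for anything unrecognized:

```python
# Default tier per data type, per the mapping above.
DEFAULT_TIER = {
    "customer_pii": "Confidential",
    "financial_records": "Confidential",
    "payment_card_data": "Restricted",
    "source_code_internal": "Internal",
    "source_code_proprietary": "Confidential",
    "hr_general": "Confidential",
    "hr_salary_medical": "Restricted",
    "credentials": "Restricted",
}

def classify(data_type: str) -> str:
    # Unknown types default to Confidential: fail safe, review later.
    return DEFAULT_TIER.get(data_type, "Confidential")
```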
Mapping Tiers to Handling Rules
Classification only matters if each tier comes with specific, enforceable handling rules. Here's a practical mapping you can adapt for your organization.
Encryption
- Public: TLS in transit only
- Internal: TLS in transit, encryption at rest recommended
- Confidential: TLS in transit required, AES-256 at rest required
- Restricted: TLS 1.3 in transit, AES-256 at rest, envelope encryption with HSM-managed keys
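The Restricted row mentions envelope encryption, which is worth unpacking: a fresh data key encrypts the payload, and that data key is itself encrypted ("wrapped") under a key-encryption key. A minimal sketch, assuming the third-party `cryptography` package; in production the KEK would live in an HSM or KMS, not in application memory:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def envelope_encrypt(plaintext: bytes, kek: bytes):
    """Envelope encryption sketch for Restricted data."""
    dek = AESGCM.generate_key(bit_length=256)   # per-object data key
    nonce = os.urandom(12)
    ciphertext = AESGCM(dek).encrypt(nonce, plaintext, None)
    # Wrap the data key under the key-encryption key (KEK).
    wrap_nonce = os.urandom(12)
    wrapped_dek = AESGCM(kek).encrypt(wrap_nonce, dek, None)
    return ciphertext, nonce, wrapped_dek, wrap_nonce
```

The payoff is operational: rotating or destroying the KEK affects every wrapped data key at once, which is also what makes the cryptographic erasure discussed below practical.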
Access Controls
- Public: No restrictions
- Internal: Authentication required, role-based access
- Confidential: Role-based access with documented justification, MFA recommended
- Restricted: Named-individual access lists, MFA required, just-in-time access where possible
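The access rules above reduce to a short per-tier check. A sketch; the `user` fields and `acl` shape are illustrative, and a real system would sit behind your identity provider:

```python
def access_allowed(tier: str, user: dict, acl: dict) -> bool:
    """Minimal per-tier access decision mirroring the rules above."""
    if tier == "Public":
        return True
    if not user.get("authenticated"):
        return False
    if tier == "Internal":
        return True
    if tier == "Confidential":
        # Role-based access; justification is assumed documented elsewhere.
        return bool(set(user.get("roles", [])) & set(acl.get("roles", [])))
    if tier == "Restricted":
        # Named individuals only, MFA enforced.
        return bool(user.get("mfa")) and user.get("id") in acl.get("named_users", [])
    return False
```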
Retention and Disposal
- Public: Retain as needed, standard deletion
- Internal: Defined retention periods, standard deletion
- Confidential: Defined retention periods, secure deletion with verification
- Restricted: Minimum retention periods, cryptographic erasure, deletion certificates
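Cryptographic erasure deserves a concrete illustration: if every Restricted asset is encrypted under its own key, destroying the key renders the ciphertext unrecoverable, so disposal can mean deleting a 32-byte key instead of wiping disks and backups. A toy keystore to show the shape of it (the deletion-certificate format is invented for illustration):

```python
import os

class KeyStore:
    """Toy keystore illustrating cryptographic erasure."""

    def __init__(self):
        self._keys: dict[str, bytes] = {}

    def create(self, asset_id: str) -> bytes:
        key = os.urandom(32)          # per-asset encryption key
        self._keys[asset_id] = key
        return key

    def can_decrypt(self, asset_id: str) -> bool:
        return asset_id in self._keys

    def crypto_erase(self, asset_id: str) -> str:
        """Destroy the key and return a deletion record."""
        del self._keys[asset_id]
        return f"deletion-certificate:{asset_id}"
```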
Sharing and Transfer
- Public: No restrictions
- Internal: Approved internal platforms only
- Confidential: Encrypted transfer only, recipient verification, no personal email
- Restricted: Encrypted transfer with prior approval, data loss prevention controls, audit trail required
"Handling rules without enforcement are just suggestions. And suggestions don't pass audits."
Exercise: Classify Your Top 20 Data Assets
Theory is great, but you need to actually do this. Here's a practical exercise you can run this week.
Step 1: List Your Top 20
From your data inventory, pick the 20 most important data assets. If you don't have an inventory yet, start with the obvious ones: your production database, your CRM, your code repository, your HR system, your financial records, your customer communication logs. Don't overthink the selection. You can always add more later.
Step 2: Build Your Classification Matrix
Create a simple table with these columns:
- Data Asset: What is it?
- Data Types: What kind of information does it contain?
- Current Location: Where does it live?
- Data Owner: Who is responsible for it?
- Classification Tier: Public, Internal, Confidential, or Restricted
- Current Handling: How is it being handled today?
- Required Handling: How should it be handled based on its classification?
- Gap: What needs to change?
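The Gap column is mechanical enough to compute: required controls for the tier, minus the controls currently in place. A sketch with an illustrative subset of the handling rules as control names:

```python
# Required controls per tier (illustrative subset of the handling rules).
REQUIRED_CONTROLS = {
    "Public": set(),
    "Internal": {"authentication"},
    "Confidential": {"authentication", "encryption_at_rest", "audit_logging"},
    "Restricted": {"authentication", "encryption_at_rest", "audit_logging",
                   "mfa", "access_review"},
}

def gap(asset: dict) -> list[str]:
    """The 'Gap' column: required controls minus current controls."""
    missing = REQUIRED_CONTROLS[asset["tier"]] - set(asset["current_controls"])
    return sorted(missing)
```

Run it over all 20 rows and you have a prioritized remediation backlog instead of a spreadsheet.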
Step 3: Fill In the Gaps
The "Gap" column is where the real value lives. For each asset, compare current handling against the required handling rules for its classification tier. Every gap is an action item. Prioritize by risk: Restricted data with gaps gets fixed first, then Confidential, and so on.
Step 4: Assign Owners and Deadlines
Every gap needs an owner and a timeline. "We should probably encrypt that database" is not a plan. "Sarah will enable TDE on the customer database by June 30th" is a plan. Be specific. Be accountable.
Common Mistakes to Avoid
Having done this with dozens of organizations, I see the same pitfalls come up again and again.
- Over-classifying everything: If every document is Confidential, nothing is. Your team will develop classification fatigue and start ignoring the labels entirely. Be honest about what's truly sensitive and what's just internal.
- Classifying once and forgetting: Data changes. A document that was Internal during planning might become Public after launch, or it might become Confidential once customer data gets added. Build review cycles into your process.
- Ignoring derived data: A report that aggregates Confidential data is itself Confidential. An AI model trained on Restricted data inherits that classification. Don't let derived assets fly under the radar.
- No enforcement mechanism: Classification labels without corresponding technical controls are decoration. If data is marked Confidential but anyone can access it without authentication, you have a classification policy, not a classification program.
Making Classification Stick
The organizations that succeed with data classification treat it as a living process, not a one-time project. Here's what that looks like in practice:
- Quarterly reviews: Every quarter, review your classification matrix. Has anything changed? New data sources? New regulations? Updated risk assessments?
- Onboarding integration: Every new employee should learn your classification tiers during their first week. Make it part of the onboarding checklist.
- Automated scanning: Use DLP tools to detect when classified data appears in unauthorized locations. You can't watch everything manually, so let the tools help.
- Annual policy updates: Review and update your classification policy at least once a year. The June GDPR anniversary is a natural anchor point for this.
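To make the automated-scanning point concrete, here is the core of what a DLP detector does: pattern-match for data that shouldn't be where it is. The regexes below are deliberately crude illustrations; real DLP tools use validation logic (checksums, context, proximity rules) on top of patterns like these:

```python
import re

# Illustrative detectors only; real DLP rules are far richer.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
}

def scan(text: str) -> list[str]:
    """Return the names of PII-like patterns found in a blob of text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))
```

Pointed at an export of a shared drive or a Slack channel, even a crude scanner like this surfaces Confidential data sitting in Internal-tier locations.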
Data classification isn't glamorous. Nobody gets excited about spreadsheets and labeling exercises. But every security control you care about depends on it. Encryption means nothing if you don't know what needs to be encrypted. Access controls are guesswork without classification tiers to map them to. Incident response is slower when you don't know what kind of data was exposed.
Start with the exercise above. Classify your top 20 assets this week. It'll take a few hours, and it'll make every security decision you make for the rest of the year more informed and more effective.