Pseudonymizer: Protect Your Sensitive Data

The Pseudonymizer is an on-device personal data filter. It detects sensitive information in what you type (names, emails, phone numbers, addresses, IDs, organizations, dates, ages, and more) and swaps each value for a realistic fake before any of it reaches the AI model. The model never sees your real values. Substitutions are unmasked locally before tool calls run, so your output stays accurate.

How to Use It

Click the shield icon next to the send button in the composer to pick a mode:

  • Gray: Off
  • Blue: Personal Data Filter (standard privacy mode)
  • Teal: PHI / Limited Data Set (HIPAA-grade filtering)
  • Emerald: PHI / Safe Harbor (strict de-identification for shared datasets)

Or go to Settings > Privacy > Pseudonymizer to configure your default mode.

What You See

While the Pseudonymizer is on, a thin colored ring around the composer matches the active privacy mode. Under each message you send, a chip shows "🛡 pseudonymized — N substitutions" with a link to view the real → fake pairs and their categories.

Example:

🛡 pseudonymized — 3 substitutions

Click the chip to expand the full list and see which replacements were made.

Privacy Modes

Personal Data Filter (Blue)

Standard on-device filtering for everyday privacy. Detects and replaces:

  • Names (first, last, full)
  • Email addresses
  • Phone numbers
  • Physical addresses (street, city, state)
  • ID numbers (SSN, driver's license, passport, etc.)
  • Organization names and legal entities
  • Dates (the substitute for a specific date keeps the original year)
  • Ages

Best for: General conversations, brainstorming, sharing context without exposing personal details.

PHI Limited Data Set (Teal)

HIPAA-compliant filtering for healthcare and medical discussions. Replaces the 16 categories of identifiers covered by HIPAA's limited data set rule, including:

  • Individual names
  • Geographic subdivisions smaller than state (city, neighborhood, ZIP codes more specific than first 3 digits)
  • All dates except year (birth year, admission year, etc.)
  • Phone numbers, email addresses, fax numbers
  • Medical record numbers, health insurance numbers, account numbers
  • License plate numbers
  • Vehicle identifiers
  • Device serial numbers
  • URLs and IP addresses
  • Biometric identifiers
  • Photos and images (masked)
  • Any unique identifiers or codes

Best for: Working with healthcare data, patient records, medical research, or any HIPAA-regulated content that you want to keep private but still use with the AI.

PHI Safe Harbor (Emerald)

Strict de-identification under HIPAA's Safe Harbor rule (§164.514(b)(2)). This removes all identifiers that could reasonably identify an individual or their household members:

  • Everything in Limited Data Set, plus:
  • All dates reduced to the year only (birth dates, admission dates, etc.)
  • Full geographic addresses reduced to the state only (city and ZIP are removed)
  • Ages over 89 (shown as "89+")

Best for: Preparing data to share with researchers, creating de-identified datasets for analysis, or sharing data where no Data Use Agreement is in place.
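
To make the Safe Harbor rules listed above concrete, here is a minimal Python sketch of the three generalizations. The function names are hypothetical and the logic follows only the bullets in this guide; it is an illustration, not the Pseudonymizer's actual code.

```python
from datetime import date


def generalize_date(d: date) -> str:
    """Keep only the year of a date (illustrative helper)."""
    return str(d.year)


def generalize_age(age: int) -> str:
    """Collapse ages over 89 into the "89+" label used in this guide."""
    return "89+" if age > 89 else str(age)


def generalize_location(city: str, state: str, zip_code: str) -> str:
    """Drop city and ZIP, keeping only the state."""
    return state
```

For example, generalize_date(date(1984, 3, 17)) gives "1984", and generalize_age(93) gives "89+".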

Multilingual Substitutions

The Pseudonymizer generates culturally and linguistically appropriate substitutes:

  • Spanish names stay Spanish-sounding
  • Japanese names stay Japanese-sounding
  • Arabic names stay Arabic-sounding
  • Organization names follow locale conventions (Inc., LLC, GmbH, S.A., etc.)
  • City names are geographically plausible

It also handles complex cases: mixed-script names, CJK names with middle initials, honorifics (Dr., Mr., Ms., Prof., etc.), and organization legal suffixes.
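
One way to picture locale-aware substitution is a pool of fake names per locale. The sketch below is an assumption for illustration only: the pools, the locale codes, and the pick_fake_name helper are all hypothetical, and the real generator is likely more sophisticated.

```python
import hashlib

# Hypothetical name pools keyed by locale code (illustrative data only).
NAME_POOLS = {
    "es": ["Lucia Fernandez", "Mateo Ruiz", "Carmen Ortega"],
    "ja": ["Haruto Sato", "Yui Tanaka", "Ren Kobayashi"],
    "ar": ["Omar Haddad", "Layla Nasser", "Yusuf Khalil"],
}


def pick_fake_name(real_name: str, locale: str) -> str:
    """Pick a fake from the pool for the given locale.

    The choice is derived from a hash of the real name, so the same
    input maps to the same fake every time it is seen.
    """
    pool = NAME_POOLS[locale]
    digest = hashlib.sha256(real_name.encode("utf-8")).digest()
    return pool[digest[0] % len(pool)]
```

Choosing from a pool that matches the original locale is what keeps a Spanish name Spanish-sounding, and deriving the choice from the input keeps the substitute stable across a conversation.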

Hallucination Inspector

The AI sometimes invents realistic-sounding names that don't match any substitution in your session. The Pseudonymizer detects these and flags them with a warning: "The assistant mentioned names that weren't in your original message. These might be made up. Check them before using."

This helps you catch cases where the model generated new fake names when it should have reused your substitutions.
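
The idea can be sketched as a simple check: collect name-like phrases from the assistant's reply and flag any that are not known substitutes. This Python example is a rough illustration under stated assumptions. The regular expression (two capitalized words in a row) and the unknown_names helper are hypothetical stand-ins, far cruder than a real name detector.

```python
import re

# Naive stand-in for a name detector: two capitalized words in a row.
NAME_LIKE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def unknown_names(response: str, known_fakes: set[str]) -> list[str]:
    """Return name-like phrases in the response that are not known fakes."""
    found: list[str] = []
    for candidate in NAME_LIKE.findall(response):
        if candidate not in known_fakes and candidate not in found:
            found.append(candidate)
    return found
```

With known_fakes set to {"Maria Lopez"} and a reply of "Maria Lopez will meet David Chen on Friday.", the function returns ["David Chen"], which is the kind of name the warning would flag.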

How It Works Behind the Scenes

  1. Detection: When you hit send, the Pseudonymizer scans your message for sensitive patterns (regex + ML-backed detection).
  2. Generation: For each detected value, it generates a culturally appropriate fake in the same category.
  3. Substitution: Your message gets rewritten with the fakes before sending to the model.
  4. Storage: The real → fake mapping stays local in your session. Only the pseudonymized message is sent.
  5. Unmasking: When tools run (web search, API calls, file operations), the real values are restored so your output stays accurate and usable.
  6. Audit: You see the substitution log under each message.
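
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the actual implementation: it uses two hypothetical regex patterns only (the ML-backed detection is left out), and the fake_for, pseudonymize, and unmask helpers are invented for this example.

```python
import re

# Hypothetical patterns; the real detector combines regex with an ML model.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def fake_for(category: str, index: int) -> str:
    """Return a placeholder fake in the same category (illustrative only)."""
    if category == "email":
        return f"user{index}@example.com"
    return f"555-010-{index:04d}"


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Detect values, generate fakes, and rewrite the text.

    Returns the rewritten text and the real -> fake mapping, which
    would stay on the device.
    """
    mapping: dict[str, str] = {}
    for category, pattern in PATTERNS.items():
        for real in pattern.findall(text):
            if real not in mapping:
                mapping[real] = fake_for(category, len(mapping) + 1)
    for real, fake in mapping.items():
        text = text.replace(real, fake)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore real values, for example in tool-call arguments."""
    for real, fake in mapping.items():
        text = text.replace(fake, real)
    return text
```

Calling pseudonymize("Email ana@corp.io or call 415-555-0134.") returns "Email user1@example.com or call 555-010-0002." together with the two-entry mapping, and the length of that mapping is the N shown in the substitution chip. Passing the rewritten text and mapping to unmask restores the original.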

Limitations

First-time use: The first time you enable the Pseudonymizer, it downloads a local detector model (a few MB). Progress is shown while it downloads.

iOS progress logging: On iOS, the model download shows a heartbeat progress log so you know it's working.

Perplexity searches: The Pseudonymizer will not leak real values to Perplexity AI. If a Perplexity search query would contain pseudonymized identifiers (which could reveal the substitution mapping), the search is blocked with this explanation: "Pseudonymizer blocks this search to prevent leaking real values. Try rephrasing without sensitive details, or disable the Pseudonymizer for this message."
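
A guard like this could be sketched as follows. The SearchBlocked exception and guard_search function are hypothetical names for illustration; the check shown (does the query contain any substitute from the session mapping?) is an assumption about how such a block might work, not a description of the real code.

```python
BLOCK_MESSAGE = (
    "Pseudonymizer blocks this search to prevent leaking real values. "
    "Try rephrasing without sensitive details, or disable the "
    "Pseudonymizer for this message."
)


class SearchBlocked(Exception):
    """Raised when a search query contains pseudonymized identifiers."""


def guard_search(query: str, mapping: dict[str, str]) -> str:
    """Return the query unchanged, or raise if it contains any substitute.

    mapping is the session's real -> fake table.
    """
    leaked = [fake for fake in mapping.values() if fake in query]
    if leaked:
        raise SearchBlocked(BLOCK_MESSAGE)
    return query
```

A query with no substitutes passes through untouched, while one that mentions a substitute raises SearchBlocked with the message shown to the user.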

Manual review: The Pseudonymizer catches most PII patterns, but no detector is 100% accurate. For highly sensitive data, always review the message before sending.

Availability

The Pseudonymizer is free and works on all platforms: browser extension, desktop app (macOS, Windows, Linux), and mobile apps (iOS, Android).

See Also

  • Privacy & Data — How data is stored and handled
  • Platform & Setup — Pseudonymizer on every platform
  • Settings > Privacy — Configure Pseudonymizer defaults

This guide is maintained by the Caiioo team using Slate, our built-in editor.