Image Anonymization

Available on: Basic, Pro, and Business plans

Last Updated: February 2026

Overview#

Image Anonymization lets you upload photos, screenshots, or scanned documents and automatically detect and redact personally identifiable information (PII) found in the image text. The feature combines Tesseract OCR for text extraction with Microsoft Presidio for PII detection, then maps each detected entity back to pixel coordinates for precise redaction.

How It Works#

The image anonymization pipeline has four stages:

1. Upload Image#

Upload your image in any supported format (JPEG, PNG, TIFF, BMP, WebP, or GIF). The image is sent to the server for processing.

2. OCR Text Extraction#

Tesseract OCR scans the image and extracts all visible text. Each word is returned with:

The extracted text
Bounding box coordinates (pixel-level)
Confidence score (0–100%)
Language detection (48 languages supported)
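To make the per-word output concrete, here is a minimal sketch that parses Tesseract's standard TSV output into word records. It assumes the service's OCR stage produces something equivalent to that TSV layout. The `OcrWord` class, the `parse_tesseract_tsv` helper, and the sample data are illustrative names and values, not part of the product API.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    text: str
    left: int          # x of the top-left corner, in pixels
    top: int           # y of the top-left corner, in pixels
    width: int
    height: int
    confidence: float  # 0-100, as reported by Tesseract


def parse_tesseract_tsv(tsv: str, min_confidence: float = 0.0) -> list[OcrWord]:
    """Return the word-level rows (level 5) from Tesseract TSV output."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Levels 1-4 describe pages, blocks, paragraphs, and lines; skip them.
        if row["level"] != "5" or not text:
            continue
        confidence = float(row["conf"])
        if confidence < min_confidence:
            continue
        words.append(OcrWord(text, int(row["left"]), int(row["top"]),
                             int(row["width"]), int(row["height"]), confidence))
    return words


# Hypothetical sample: one line row followed by two word rows.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t120\t45\t180\t30\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t120\t45\t80\t30\t96.5\tJohn\n"
    "5\t1\t1\t1\t1\t2\t210\t45\t90\t30\t91.0\tDoe\n"
)

words = parse_tesseract_tsv(SAMPLE_TSV)
```

Passing a `min_confidence` value drops low-confidence words before PII detection. That can reduce noise, but it can also hide text that should have been redacted.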

3. PII Detection#

The extracted text is passed to Microsoft Presidio Analyzer, which combines NLP models with pattern matching to detect 25+ entity types, including:

Category	Entity Types
Personal	PERSON, DATE_OF_BIRTH, AGE, NRP (nationality, religious, or political group)
Contact	EMAIL_ADDRESS, PHONE_NUMBER, URL
Financial	CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER
Identity	US_SSN, US_PASSPORT, US_DRIVER_LICENSE, UK_NHS, SG_NRIC_FIN
Location	LOCATION, IP_ADDRESS
Medical	MEDICAL_LICENSE
Other	CRYPTO, DATE_TIME, DOMAIN_NAME

4. Redaction#

PII locations are mapped back to pixel coordinates on the original image. Detected PII regions are covered with solid-color rectangles. You can choose from 6 fill colors:

Black (default)
White
Red
Blue
Green
Yellow

The redacted image is returned in the same format as the original.
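The mapping from detected text back to pixels can be sketched as follows. This is an illustration under assumptions, not the service's actual implementation. It assumes the OCR words are joined with single spaces before detection and that an entity is reported as a character range in that joined text. `Box`, `index_words`, `boxes_for_span`, and `union` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


def index_words(words: list[tuple[str, Box]]) -> tuple[str, list[tuple[int, int, Box]]]:
    """Join words with single spaces and record each word's character span."""
    parts, spans, offset = [], [], 0
    for text, box in words:
        spans.append((offset, offset + len(text), box))
        parts.append(text)
        offset += len(text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def boxes_for_span(spans: list[tuple[int, int, Box]], start: int, end: int) -> list[Box]:
    """Return the box of every word that overlaps the [start, end) range."""
    return [box for s, e, box in spans if s < end and e > start]


def union(boxes: list[Box]) -> Box:
    """Smallest rectangle covering all boxes (assumes they share one text line)."""
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.width for b in boxes)
    y1 = max(b.y + b.height for b in boxes)
    return Box(x0, y0, x1 - x0, y1 - y0)


ocr_words = [
    ("Contact", Box(20, 45, 90, 30)),
    ("John", Box(120, 45, 80, 30)),
    ("Doe", Box(210, 45, 90, 30)),
]
text, spans = index_words(ocr_words)

# Suppose the detector reports "John Doe" as a PERSON entity.
start = text.index("John Doe")
region = union(boxes_for_span(spans, start, start + len("John Doe")))
```

An entity that wraps across lines would need one rectangle per line instead of a single union; the sketch leaves that case out.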

Supported Image Formats#

Format	Extensions	Notes
JPEG	`.jpg`, `.jpeg`	Most common photo format
PNG	`.png`	Screenshots, graphics with transparency
TIFF	`.tif`, `.tiff`	Scanned documents, high quality
BMP	`.bmp`	Uncompressed bitmap images
WebP	`.webp`	Modern web image format
GIF	`.gif`	Static images; for animated GIFs, only the first frame is processed

Maximum file size: Depends on your plan (Basic: 5 MB, Pro: 10 MB, Business: 20 MB)
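A client can check these constraints before uploading. The sketch below is a hedged example: `check_upload` is a hypothetical helper, and it assumes 1 MB means 1,048,576 bytes, which this page does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp", ".gif"}

MB = 1024 * 1024  # assumption: the plan limits use binary megabytes
MAX_BYTES = {"basic": 5 * MB, "pro": 10 * MB, "business": 20 * MB}


def check_upload(filename: str, size_bytes: int, plan: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    limit = MAX_BYTES[plan.lower()]
    if size_bytes > limit:
        problems.append(f"{size_bytes} bytes exceeds the {plan} limit of {limit} bytes")
    return problems


ok = check_upload("scan.TIFF", 4 * MB, "basic")
rejected = check_upload("scan.pdf", 6 * MB, "Basic")
```

The check only looks at the file name and size. The server may still reject a file whose contents do not match its extension.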

API Usage#

Analyze Image (detect PII without redacting)#

POST /analyze-image
Content-Type: multipart/form-data

Parameters:
  - image: (file) The image to analyze
  - language: (string, optional) Language hint for OCR (default: "en")
  - score_threshold: (float, optional) Minimum PII detection confidence score, from 0.0 to 1.0 (default: 0.3)

Response:

{
  "entities": [
    {
      "entity_type": "PERSON",
      "text": "John Doe",
      "score": 0.95,
      "bbox": { "x": 120, "y": 45, "width": 180, "height": 30 }
    }
  ],
  "ocr_text": "Full extracted text...",
  "image_dimensions": { "width": 800, "height": 600 }
}
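As a rough illustration of consuming this response, the sketch below filters entities by score and type and checks that each bounding box lies inside the reported image dimensions. The field names come from the response shown above. The second entity in the sample and the helper names `select_entities` and `inside_image` are made up for the example.

```python
import json

SAMPLE_RESPONSE = json.loads("""
{
  "entities": [
    {"entity_type": "PERSON", "text": "John Doe", "score": 0.95,
     "bbox": {"x": 120, "y": 45, "width": 180, "height": 30}},
    {"entity_type": "EMAIL_ADDRESS", "text": "j.doe@example.com", "score": 0.4,
     "bbox": {"x": 120, "y": 90, "width": 260, "height": 30}}
  ],
  "ocr_text": "John Doe j.doe@example.com",
  "image_dimensions": {"width": 800, "height": 600}
}
""")


def select_entities(response: dict, min_score: float = 0.3, entity_types=None) -> list[dict]:
    """Keep entities at or above min_score, optionally restricted to given types."""
    wanted = set(entity_types) if entity_types else None
    return [
        e for e in response["entities"]
        if e["score"] >= min_score and (wanted is None or e["entity_type"] in wanted)
    ]


def inside_image(entity: dict, dims: dict) -> bool:
    """True if the entity's bounding box lies fully within the image."""
    b = entity["bbox"]
    return (b["x"] >= 0 and b["y"] >= 0
            and b["x"] + b["width"] <= dims["width"]
            and b["y"] + b["height"] <= dims["height"])


confident = select_entities(SAMPLE_RESPONSE, min_score=0.5)
emails = select_entities(SAMPLE_RESPONSE, entity_types=["EMAIL_ADDRESS"])
```

Reviewing the analysis result this way before calling the redaction endpoint makes it easier to pick a `score_threshold` that suits a given document type.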

Redact Image (detect and redact PII)#

POST /redact-image
Content-Type: multipart/form-data

Parameters:
  - image: (file) The image to redact
  - language: (string, optional) Language hint for OCR (default: "en")
  - fill_color: (string, optional) Redaction color, one of the six fill colors listed under Redaction (default: "black")
  - score_threshold: (float, optional) Minimum PII detection confidence score, from 0.0 to 1.0 (default: 0.3)
  - entities: (string[], optional) Specific entity types to redact

Response: The redacted image file (same format as input)
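The following sketch builds a `multipart/form-data` request for this endpoint using only the Python standard library. Several details are assumptions rather than documented behavior: the base URL is a placeholder, authentication headers are omitted because this page does not describe them, color names are assumed to be lowercase strings, and the `entities` array is assumed to be sent as repeated form fields. `build_redact_request` is a hypothetical helper.

```python
import mimetypes
import urllib.request
import uuid

FILL_COLORS = {"black", "white", "red", "blue", "green", "yellow"}


def build_redact_request(base_url: str, filename: str, image_bytes: bytes, *,
                         language: str = "en", fill_color: str = "black",
                         score_threshold: float = 0.3,
                         entities: tuple = ()) -> urllib.request.Request:
    """Build (but do not send) a POST /redact-image request."""
    if fill_color not in FILL_COLORS:
        raise ValueError(f"unsupported fill color: {fill_color}")

    boundary = uuid.uuid4().hex
    fields = [("language", language), ("fill_color", fill_color),
              ("score_threshold", str(score_threshold))]
    # Assumption: a string[] parameter is encoded as one form field per value.
    fields += [("entities", entity) for entity in entities]

    body = bytearray()
    for name, value in fields:
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n").encode()

    image_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body += (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
             f"Content-Type: {image_type}\r\n\r\n").encode()
    body += image_bytes + b"\r\n" + f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        f"{base_url}/redact-image", data=bytes(body), method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder host and image bytes, for illustration only.
req = build_redact_request("https://api.example.com", "scan.png", b"\x89PNG",
                           entities=("PERSON", "EMAIL_ADDRESS"))
```

Sending the request with `urllib.request.urlopen(req)` and writing the response body to disk would yield the redacted image, provided the service accepts this field encoding.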

Processing Time#

Image anonymization typically takes 3–20 seconds depending on:

Image resolution and file size
Amount of text in the image
Number of PII entities detected
Server load

Higher resolution images with more text content will take longer to process.

Limitations#

Text-only PII detection: The system detects PII in text extracted via OCR. It does not detect faces, license plates, or other visual PII.
OCR quality depends on image quality: Blurry, low-resolution, or heavily stylized text may not be extracted accurately. For best results:
- Use images with at least 150 DPI
- Ensure text is clearly legible
- Avoid heavy background patterns behind text
Handwritten text: OCR accuracy is significantly lower for handwritten text compared to printed or digital text.
Multi-column layouts: Complex document layouts (newspapers, multi-column PDFs saved as images) may affect OCR accuracy.
Token cost: Image anonymization costs tokens based on the amount of text extracted and entities detected, similar to text anonymization.

Best Practices#

Use high-quality images: Higher resolution images produce better OCR results and more accurate PII detection.
Crop to relevant areas: If only part of the image contains sensitive text, crop it first to reduce processing time and improve accuracy.
Check results: Always review the redacted output to ensure all PII was detected, especially for unusual formats or low-quality scans.
Choose appropriate fill colors: Use a fill color that contrasts with the background to make redactions clearly visible.
Set appropriate thresholds: Lower the confidence threshold (e.g., 0.2) to catch more potential PII at the cost of more false positives. Raise it to reduce false positives at the risk of missing some PII.

FAQ#

Can I redact specific entity types only?#

Yes. Use the `entities` parameter of the redact-image endpoint to specify which entity types to redact (e.g., only PERSON and EMAIL_ADDRESS).

Does image anonymization work offline?#

No. Image anonymization requires server-side processing with Tesseract OCR and Presidio. The Desktop App does not currently support image anonymization.

What happens to my original image?#

Your original image is processed in memory and never stored on our servers. Only the redacted output is returned to you. All processing follows our zero-knowledge principles.

Can I undo a redaction?#

No. Image redaction is permanent and irreversible. In the redacted output, the pixel data under the redaction rectangles is overwritten and cannot be recovered. Always keep a backup of your original image.

How many tokens does image anonymization cost?#

Token costs depend on the amount of text extracted from the image and the number of PII entities detected. A typical document scan costs 5–15 tokens.