Image Anonymization
Available on: Basic, Pro, and Business plans
Last Updated: February 2026
Overview#
Image Anonymization lets you upload photos, screenshots, or scanned documents and automatically detect and redact personally identifiable information (PII) found in the image text. The feature combines Tesseract OCR for text extraction with Microsoft Presidio for PII detection, then maps each detected entity back to pixel coordinates so that only the affected regions are redacted.
How It Works#
The image anonymization pipeline has four stages:
1. Upload Image#
Upload your image in any supported format (JPEG, PNG, TIFF, BMP, WebP, or GIF). The image is sent to the server for processing.
2. OCR Text Extraction#
Tesseract OCR scans the image and extracts all visible text. Each word is returned with:
- The extracted text
- Bounding box coordinates (pixel-level)
- Confidence score (0–100%)
- Language detection (48 languages supported)
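As a rough illustration of the per-word output described above, the sketch below converts a column-oriented dictionary, shaped like the result of pytesseract's `image_to_data(..., output_type=Output.DICT)`, into word records. The `OcrWord` class, the helper name, and the sample values are hypothetical; the service's internal representation is not documented here, and language detection is not covered.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    left: int
    top: int
    width: int
    height: int
    confidence: float  # 0-100, as reported by Tesseract


def words_from_tesseract_dict(data: dict) -> list[OcrWord]:
    """Convert a pytesseract-style column dict into per-word records."""
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        # Tesseract reports conf = -1 for layout rows (page, block, line)
        # that carry no word; skip those and any whitespace-only text.
        if conf < 0 or not text.strip():
            continue
        words.append(OcrWord(text, data["left"][i], data["top"][i],
                             data["width"][i], data["height"][i], conf))
    return words


# Hand-written sample (hypothetical values), not real OCR output.
sample = {
    "text": ["", "John", "Doe", " "],
    "left": [0, 120, 210, 0],
    "top": [0, 45, 45, 0],
    "width": [800, 80, 90, 0],
    "height": [600, 30, 30, 0],
    "conf": ["-1", "96.5", "93.1", "-1"],
}
words = words_from_tesseract_dict(sample)
for w in words:
    print(w.text, (w.left, w.top, w.width, w.height), w.confidence)
```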
3. PII Detection#
The extracted text is passed to Microsoft Presidio Analyzer, which combines NLP models with pattern-based recognizers to detect more than 25 entity types, including:
| Category | Entity Types |
|---|---|
| Personal | PERSON, DATE_OF_BIRTH, AGE, NRP |
| Contact | EMAIL_ADDRESS, PHONE_NUMBER, URL |
| Financial | CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER |
| Identity | US_SSN, US_PASSPORT, US_DRIVER_LICENSE, UK_NHS, SG_NRIC_FIN |
| Location | LOCATION, IP_ADDRESS |
| Medical | MEDICAL_LICENSE |
| Other | CRYPTO, DATE_TIME, DOMAIN_NAME |
4. Redaction#
Detected PII is mapped back to pixel coordinates on the original image, and each region is covered with a solid-color rectangle. You can choose from six fill colors:
- Black (default)
- White
- Red
- Blue
- Green
- Yellow
The redacted image is returned in the same format as the original.
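One way to picture the mapping from detected text back to pixels is sketched below. It assumes the OCR words are joined with single spaces before detection, so every word has a known character span; an entity's character range then selects the overlapping words, and their boxes are merged into one rectangle. The function names and sample boxes are hypothetical, and this is not necessarily how the service implements the step.

```python
def build_text_with_offsets(words):
    """Join OCR words with single spaces and record each word's character span."""
    spans, pos = [], 0
    for w in words:
        end = pos + len(w["text"])
        spans.append((pos, end))
        pos = end + 1  # skip the joining space
    return " ".join(w["text"] for w in words), spans


def boxes_for_entity(words, spans, ent_start, ent_end):
    """Return the boxes of all words overlapping the range [ent_start, ent_end)."""
    return [w["box"] for w, (s, e) in zip(words, spans)
            if s < ent_end and e > ent_start]


def union_box(boxes):
    """Smallest (x, y, width, height) rectangle that covers every box."""
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


# Hypothetical OCR words; each box is (x, y, width, height) in pixels.
words = [
    {"text": "Name:", "box": (40, 45, 70, 30)},
    {"text": "John", "box": (120, 45, 80, 30)},
    {"text": "Doe", "box": (210, 45, 90, 30)},
]
text, spans = build_text_with_offsets(words)
start = text.index("John Doe")  # stand-in for a detector's start offset
rect = union_box(boxes_for_entity(words, spans, start, start + len("John Doe")))
print(rect)
```

Merging boxes this way only makes sense for words on the same line; an entity that wraps across lines would need one rectangle per line to avoid covering unrelated text.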
Supported Image Formats#
| Format | Extensions | Notes |
|---|---|---|
| JPEG | .jpg, .jpeg | Most common photo format |
| PNG | .png | Screenshots, graphics with transparency |
| TIFF | .tif, .tiff | Scanned documents, high quality |
| BMP | .bmp | Uncompressed bitmap images |
| WebP | .webp | Modern web image format |
| GIF | .gif | Static GIF images (first frame only) |
Maximum file size: Depends on your plan (Basic: 5 MB, Pro: 10 MB, Business: 20 MB)
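A client can check the format and size limits above before uploading. The following is a minimal sketch; the helper name is hypothetical, and it assumes 1 MB means 1,048,576 bytes, which this page does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff",
                        ".bmp", ".webp", ".gif"}
PLAN_LIMIT_MB = {"basic": 5, "pro": 10, "business": 20}
BYTES_PER_MB = 1024 * 1024  # assumption: the docs do not define "MB"


def check_upload(filename, size_bytes, plan):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    limit = PLAN_LIMIT_MB[plan.lower()] * BYTES_PER_MB
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; {plan} limit is {limit}")
    return problems


print(check_upload("scan.TIFF", 4_000_000, "Basic"))
print(check_upload("report.pdf", 100, "Pro"))
```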
API Usage#
Analyze Image (detect PII without redacting)#
POST /analyze-image
Content-Type: multipart/form-data
Parameters:
- image: (file) The image to analyze
- language: (string, optional) Language hint for OCR (default: "en")
- score_threshold: (float, optional) Minimum PII detection confidence, from 0 to 1 (default: 0.3)
Response:
{
  "entities": [
    {
      "entity_type": "PERSON",
      "text": "John Doe",
      "score": 0.95,
      "bbox": { "x": 120, "y": 45, "width": 180, "height": 30 }
    }
  ],
  "ocr_text": "Full extracted text...",
  "image_dimensions": { "width": 800, "height": 600 }
}
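The response shown above can be consumed with the standard `json` module. This sketch counts entities by type and checks that every bounding box lies inside the reported image dimensions; the helper names are hypothetical, and the body is the sample from this page rather than live output.

```python
import json
from collections import Counter

# Sample body copied from the documented response.
body = """
{
  "entities": [
    {
      "entity_type": "PERSON",
      "text": "John Doe",
      "score": 0.95,
      "bbox": {"x": 120, "y": 45, "width": 180, "height": 30}
    }
  ],
  "ocr_text": "Full extracted text...",
  "image_dimensions": {"width": 800, "height": 600}
}
"""


def box_fits(bbox, dims):
    """True if the box lies entirely within the image."""
    return (bbox["x"] >= 0 and bbox["y"] >= 0
            and bbox["x"] + bbox["width"] <= dims["width"]
            and bbox["y"] + bbox["height"] <= dims["height"])


def summarize(raw):
    """Return (entity counts by type, entities whose box falls outside the image)."""
    payload = json.loads(raw)
    dims = payload["image_dimensions"]
    counts = Counter(e["entity_type"] for e in payload["entities"])
    stray = [e for e in payload["entities"] if not box_fits(e["bbox"], dims)]
    return counts, stray


counts, stray = summarize(body)
print(dict(counts), len(stray))
```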
Redact Image (detect and redact PII)#
POST /redact-image
Content-Type: multipart/form-data
Parameters:
- image: (file) The image to redact
- language: (string, optional) Language hint for OCR (default: "en")
- fill_color: (string, optional) Redaction color (default: "black")
- score_threshold: (float, optional) Minimum PII detection confidence, from 0 to 1 (default: 0.3)
- entities: (string[], optional) Specific entity types to redact
Response: The redacted image file (same format as input)
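To show what a request with these parameters might look like, the sketch below assembles a multipart/form-data body using only the Python standard library and prepares a POST request without sending it. Several details are assumptions: the host `api.example.com` is a placeholder, authentication is omitted because this page does not describe it, and the `entities` array is encoded as repeated form fields, which may not match the server's expectations.

```python
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, content_type):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields:
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    body += f"--{boundary}\r\n".encode()
    body += (f'Content-Disposition: form-data; name="{file_field}"; '
             f'filename="{filename}"\r\n').encode()
    body += f"Content-Type: {content_type}\r\n\r\n".encode()
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


fields = [
    ("language", "en"),
    ("fill_color", "black"),
    ("score_threshold", "0.3"),
    ("entities", "PERSON"),         # assumption: arrays sent as repeated fields
    ("entities", "EMAIL_ADDRESS"),
]
fake_image = b"\x89PNG placeholder bytes"  # stand-in for real file contents
body, header_value = build_multipart(fields, "image", "scan.png",
                                     fake_image, "image/png")

# Placeholder URL; the request is prepared but deliberately not sent.
request = urllib.request.Request(
    "https://api.example.com/redact-image",
    data=body,
    headers={"Content-Type": header_value},
    method="POST",
)
print(request.get_method(), request.full_url, len(body))
```

Passing the prepared request to `urllib.request.urlopen` would send it; per the description above, the response body would then be the redacted image bytes.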
Processing Time#
Image anonymization typically takes 3–20 seconds depending on:
- Image resolution and file size
- Amount of text in the image
- Number of PII entities detected
- Server load
Higher resolution images with more text content will take longer to process.
Limitations#
- Text-only PII detection: The system detects PII in text extracted via OCR. It does not detect faces, license plates, or other visual PII.
- OCR quality depends on image quality: Blurry, low-resolution, or heavily stylized text may not be extracted accurately. For best results:
  - Use images with at least 150 DPI
  - Ensure text is clearly legible
  - Avoid heavy background patterns behind text
- Handwritten text: OCR accuracy is significantly lower for handwritten text compared to printed or digital text.
- Multi-column layouts: Complex document layouts (newspapers, multi-column PDFs saved as images) may affect OCR accuracy.
- Token cost: Image anonymization costs tokens based on the amount of text extracted and entities detected, similar to text anonymization.
Best Practices#
- Use high-quality images: Higher resolution images produce better OCR results and more accurate PII detection.
- Crop to relevant areas: If only part of the image contains sensitive text, crop it first to reduce processing time and improve accuracy.
- Check results: Always review the redacted output to ensure all PII was detected, especially for unusual formats or low-quality scans.
- Choose appropriate fill colors: Use a fill color that contrasts with the background to make redactions clearly visible.
- Set appropriate thresholds: Lower score_threshold (for example, to 0.2) to catch more potential PII at the cost of more false positives. Raise it to reduce false positives at the risk of missing some PII.
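The threshold trade-off can be seen with a small filter over scored detections. The detections and scores below are invented for illustration, and the sketch assumes a detection is kept when its score is greater than or equal to the threshold; the service's exact comparison is not documented here.

```python
# Hypothetical detections: (entity_type, text, score).
detections = [
    ("PERSON", "John Doe", 0.95),
    ("PHONE_NUMBER", "555 0100", 0.40),
    ("DATE_TIME", "May", 0.25),
    ("LOCATION", "Reading", 0.21),
]


def keep(detections, score_threshold):
    """Keep detections scoring at or above the threshold (assumed comparison)."""
    return [d for d in detections if d[2] >= score_threshold]


for threshold in (0.3, 0.2):
    kept = [entity_type for entity_type, _, _ in keep(detections, threshold)]
    print(threshold, kept)
```

With these made-up values, the default of 0.3 keeps two detections, while 0.2 also keeps the two low-scoring ones, which could be real PII or ordinary words such as a month name.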
FAQ#
Can I redact specific entity types only?#
Yes. Use the entities parameter in the API to specify which entity types to redact (e.g., only PERSON and EMAIL_ADDRESS).
Does image anonymization work offline?#
No. Image anonymization requires server-side processing with Tesseract OCR and Presidio. The Desktop App does not currently support image anonymization.
What happens to my original image?#
Your original image is processed in memory and never stored on our servers. Only the redacted output is returned to you. All processing follows our zero-knowledge principles.
Can I undo a redaction?#
No. Image redaction is permanent and irreversible. The original pixel data under redaction rectangles is destroyed. Always keep a backup of your original image.
How many tokens does image anonymization cost?#
Token costs depend on the amount of text extracted from the image and the number of PII entities detected. A typical document scan costs 5–15 tokens.