The challenge: Unlock medical data value while safeguarding privacy
To unlock the value in its terabytes of medical images and clinical notes, one of the world’s largest medical technology companies had to anonymize and format all of its data while maintaining HIPAA compliance. That meant identifying and anonymizing every instance of personally identifiable information (PII) and protected health information (PHI) in every image and text file.
However, identifying PII and PHI is difficult because sensitive information gets embedded in unpredictable ways (e.g., a physician’s handwritten notes about a patient, machine metadata, a half-visible hospital or manufacturer logo). Moreover, the sensitive and confidential nature of medical data prohibited the use of publicly available large language models (LLMs) to develop a solution.
The medical technology company needed a partner with expertise in machine learning (ML) data recognition models, optical character recognition (OCR) algorithms and custom AI models. They chose TELUS Digital.

Our approach: Custom ML models detect and blur sensitive information in medical images and text files
Our healthcare client needed an ML system sophisticated enough to identify and protect sensitive information while preserving the scientific value of medical data. We helped them develop a single system combining three powerful components:
- An advanced image processing system using Tesseract OCR to detect and blur sensitive information within image files.
- A natural language processing (NLP) engine built on the spaCy framework to identify and anonymize PHI within text files.
- A data review and annotation system that generates training data for future models, so the ML models can keep improving in accuracy and performance.
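As a rough illustration of the text anonymization step, the sketch below replaces detected entity spans with label placeholders. It is not the client’s implementation: the `redact` function and the hard-coded spans are hypothetical, standing in for the character offsets and labels that a named-entity recognizer such as spaCy exposes on each detected entity (`start_char`, `end_char`, `label_`). It also assumes the spans do not overlap.

```python
from typing import List, Tuple

# (start_char, end_char, label) for one detected entity.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each non-overlapping span with a [LABEL] placeholder."""
    out = text
    # Work from the last span to the first so that earlier offsets
    # stay valid after each replacement changes the string length.
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


# Hypothetical clinical note and hand-written spans for illustration.
note = "Patient John Doe was seen on 2021-03-04."
spans = [(8, 16, "PERSON"), (29, 39, "DATE")]
redacted = redact(note, spans)
print(redacted)
```

Replacing spans with a label rather than deleting them keeps the sentence structure readable, which is one way to preserve the scientific value of a note while removing the identifying detail.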
A human-in-the-loop mechanism adds a further safeguard while also improving performance. The system tags each detection with a confidence score, which signals when manual review may be needed. Developer feedback then helps the system learn, which drives better performance over time.
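The confidence-score routing described above could look something like the following minimal sketch. The `Detection` class, the `route` function and the 0.85 threshold are all assumptions made for illustration; the case study does not state how scores are represented or what cutoff is used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    text: str          # the detected string, e.g. a name or date
    label: str         # entity type, e.g. "PERSON"
    confidence: float  # model score between 0.0 and 1.0


# Hypothetical cutoff; a real system would tune this on reviewed data.
REVIEW_THRESHOLD = 0.85


def route(
    detections: List[Detection], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into (auto_accepted, needs_manual_review)."""
    auto: List[Detection] = []
    review: List[Detection] = []
    for d in detections:
        # Scores at or above the threshold are accepted automatically;
        # anything below is queued for a human reviewer.
        (auto if d.confidence >= threshold else review).append(d)
    return auto, review


detections = [
    Detection("John Doe", "PERSON", 0.97),
    Detection("St. Mary", "ORG", 0.62),
]
auto, review = route(detections)
print(len(auto), "auto-accepted;", len(review), "sent for review")
```

Reviewer decisions on the queued items could then be fed back as labeled examples for retraining, which is the kind of feedback loop the paragraph above describes.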

The results: Accelerated R&D plus new potential revenue streams
By successfully developing a HIPAA-compliant data anonymization system for our client’s image and text medical files, we made it possible for them to integrate data anonymization into new and existing healthcare solutions.
This also allows them to share valuable health data with research partners faster and more securely, accelerating innovation while protecting patient privacy.
Data anonymization also creates new potential revenue opportunities for our client. For instance, our client could use their anonymized data to:
- Train new AI models for detecting diseases
- Apply predictive analytics for drug discovery
- Sell the data for similar research and development purposes
