Terabytes of Medical Data Unlocked: HIPAA-Compliant Anonymization Accelerates R&D

See how a global medical technology company achieved HIPAA-compliant data anonymization by detecting and blurring PII and PHI in its image and text files.

The challenge: Unlock medical data value while safeguarding privacy

To unlock the value in its terabytes of medical images and clinical notes, one of the world’s largest medical technology companies had to anonymize and format all of its data while maintaining HIPAA compliance. That meant identifying and anonymizing every instance of personally identifiable information (PII) and protected health information (PHI) in every image and text file.

However, identifying PII and PHI is difficult because sensitive information gets embedded in unpredictable ways (e.g., a physician’s handwritten notes about a patient, machine metadata, a half-visible hospital or manufacturer logo). Moreover, the sensitive and confidential nature of medical data prohibited the use of publicly available large language models (LLMs) to develop a solution.

The medical technology company needed a partner with expertise in machine learning (ML) data recognition models, optical character recognition (OCR) algorithms and custom AI models. It chose TELUS Digital.

Our approach: Custom ML models detect and blur sensitive information in medical images and text files

Our healthcare client needed an ML system sophisticated enough to identify and protect sensitive information while preserving the scientific value of medical data. We helped them develop a single system combining three powerful components:

  • An advanced image processing system using Tesseract OCR to detect and blur sensitive information within image files.
  • A natural language processing (NLP) engine built on the spaCy framework to identify and anonymize PHI within text files.
  • A data review and annotation system that generates training data for future models, so the ML models keep improving in accuracy and performance.
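The first two components can be illustrated with a minimal sketch. This is not the client's system: the regex patterns below are a hypothetical stand-in for a trained PHI model, and the function names are invented for illustration. The text path works on (start, end, label) character spans, the same shape spaCy entities expose through `start_char`, `end_char` and `label_`. The image path assumes OCR output in the column layout that `pytesseract.image_to_data(..., output_type=Output.DICT)` returns.

```python
import re
from typing import Callable

# Hypothetical detector: two regexes standing in for a trained PHI model.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,}\b"),
}


def find_phi_spans(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, label) character spans, sorted by start offset."""
    spans = [
        (match.start(), match.end(), label)
        for label, pattern in PHI_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans)


def redact_text(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Spans are applied right to left so earlier offsets stay valid.
    Overlapping spans are not handled in this sketch.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def boxes_to_blur(
    ocr: dict, is_sensitive: Callable[[str], bool]
) -> list[tuple[int, int, int, int]]:
    """Pick (left, top, right, bottom) boxes for OCR words flagged as sensitive.

    `ocr` is assumed to hold parallel lists under the keys
    "text", "left", "top", "width" and "height".
    """
    boxes = []
    for i, word in enumerate(ocr["text"]):
        if word.strip() and is_sensitive(word):
            left, top = ocr["left"][i], ocr["top"][i]
            boxes.append((left, top, left + ocr["width"][i], top + ocr["height"][i]))
    return boxes


note = "Seen 03/14/2021, MRN: 12345678, stable."
print(redact_text(note, find_phi_spans(note)))
```

Each returned box could then be blurred with an image library, for example by cropping the region with Pillow, applying `ImageFilter.GaussianBlur` and pasting it back. Checking one OCR word at a time misses identifiers that span several words, which is one reason a production system would rely on trained models rather than patterns like these.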

A human-in-the-loop mechanism deepens safeguards while also optimizing performance. The system tags each identification with a confidence score, signaling when manual review may be needed. Developer feedback then helps the system learn, another mechanism for driving better performance over time.
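The confidence-based review step might look something like the sketch below. The threshold value, the `Detection` fields and the `route` function are all assumptions made for illustration; the case study does not state how scores are computed or what cutoff is used.

```python
from dataclasses import dataclass

# Assumed cutoff; the source does not give an actual threshold.
REVIEW_THRESHOLD = 0.85


@dataclass
class Detection:
    file_id: str       # hypothetical identifier of the image or text file
    label: str         # e.g. "NAME", "DATE"
    confidence: float  # model score in [0, 1]


def route(
    detections: list[Detection], threshold: float = REVIEW_THRESHOLD
) -> tuple[list[Detection], list[Detection]]:
    """Split detections into (auto_accepted, needs_review) by confidence."""
    auto_accepted, needs_review = [], []
    for detection in detections:
        if detection.confidence >= threshold:
            auto_accepted.append(detection)
        else:
            needs_review.append(detection)
    return auto_accepted, needs_review


detections = [
    Detection("scan_001", "NAME", 0.97),
    Detection("scan_001", "DATE", 0.62),
]
auto_accepted, needs_review = route(detections)
print(len(auto_accepted), "auto-accepted;", len(needs_review), "sent to manual review")
```

Reviewer corrections on the low-confidence items are the kind of feedback that could be fed back as training data for later model versions.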

The results: Accelerated R&D plus new potential revenue streams

By successfully developing a HIPAA-compliant data anonymization system for our client’s image and text medical files, we made it possible for them to integrate data anonymization into new and existing healthcare solutions.

This also allows them to share valuable health data with research partners faster and more securely, accelerating innovation while protecting patient privacy.

Data anonymization also creates new potential revenue opportunities for our client. For instance, our client could use their anonymized data to:

  • Train new AI models for detecting diseases
  • Apply predictive analytics for drug discovery
  • Sell the data for similar research and development purposes
“The future of healthcare innovation lies in how well organizations manage and use the vast quantity of data generated every day. With the right partner, medical technology companies can turn ‘byproduct’ data into innovation and commercialization opportunities, leveraging advanced data management and analytics to power research and deliver real-world impact.”

Sydnor Gammon, VP, Business Development, TELUS Digital

Multi-part machine learning (ML) system for the de-identification and anonymization of medical data

  1. The challenge: A global medical technology company needed to unlock the value of its health data by anonymizing terabytes of medical images and clinical notes.
  2. Our approach: We created a multi-part machine learning (ML) system to identify and blur all personally identifiable information (PII) or protected health information (PHI) in image and text files.
  3. The results: Data anonymization integrated into new and existing solutions enabled faster, safer data sharing with development partners and new data monetization opportunities.
