Oncology Data Precision
Licensing, De-identification, & Annotation for NLP Model Innovation
Revolutionizing Cancer Care with Cutting-Edge NLP Technologies
The client, a major player in the healthcare industry, required an advanced NLP solution to process a substantial volume of oncology medical records. The project was part of a pivotal initiative to refine oncology research, so detailed data analysis had to be balanced against stringent privacy standards. This case study outlines how we supported the client’s research through high-fidelity data annotation, rigorous de-identification practices, and the application of Natural Language Processing (NLP) techniques, all within the regulatory framework provided by HIPAA.
Challenges
The project required a nuanced understanding of clinical documentation, precise identification of medical entities, and the ability to apply negation labels accurately, all within a secure framework that protects patient privacy according to HIPAA regulations. The endeavor demanded not only technical expertise in handling large volumes of complex data but also a strategic approach to incorporate feedback and maintain quality across all stages of the annotation process.
Detailed Description of Services
Category | Description |
---|---|
Comprehensive Clinical Data Coverage | Spanning various note types, care settings, and oncological subspecialties, ensuring a robust dataset reflective of diverse clinical scenarios. |
Rigorous De-identification | Ensuring all labeled records are de-identified in compliance with HIPAA’s Safe Harbor method, assuring client confidence in data privacy and security. |
Annotation Guidelines | Creation and implementation of standard data annotation guidelines for preparing Labeled Records in line with HIPAA standards. |
Advanced Annotation Strategies | Manual annotation of 10,000 pages of oncology-related records was carried out with a detailed focus on identifying negation statuses and other pertinent information in accordance with established guidelines. |
Rigorous Quality Assurance | Attainment of the quality standard specified in the annotation guidelines. |
Solution
Our approach involved the following key strategies:
Customized Oncology Dataset Compilation
From an archive of over 5 million electronic health records (EHRs), we extracted a carefully chosen subset to meet the client’s specialized requirements for oncology data, with a focus on genomic entities. The collection process involved compiling an exhaustive list of tumor markers, genes, variants, and TNM stages, then running keyword searches to pinpoint documents rich in this data. Regular expressions were used to identify a range of genetic variations and cancer stages. Combined with broad coverage of document types, specialties, care settings, and physicians, this approach ensured a comprehensive and relevant oncology dataset.
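The keyword-and-regex screening described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the keyword list, the patterns, and the function name `find_oncology_signals` are all assumptions made for illustration, and the patterns are deliberately simplified.

```python
import re

# Hypothetical keyword list for illustration only; the exhaustive list
# used in the project is not reproduced in this case study.
KEYWORDS = ["EGFR", "KRAS", "ALK", "BRCA1", "BRCA2", "HER2"]

# Simplified TNM pattern (an assumption), e.g. T3N2M0 or pT2N0M0.
TNM_RE = re.compile(
    r"\b[cpy]?T(?:[0-4][a-d]?|is|X)N(?:[0-3][a-c]?|X)M(?:[01][a-c]?|X)\b"
)

# Overall stage, e.g. "Stage IIIB" or "stage 2".
STAGE_RE = re.compile(r"\bstage\s+(?:IV|III|II|I|[0-4])[ABC]?\b", re.IGNORECASE)

# Short protein-level variants, e.g. G12C, L858R, p.L858R.
PROTEIN_VARIANT_RE = re.compile(r"\b(?:p\.)?[A-Z]\d{1,4}[A-Z]\b")

# Exon-level events, e.g. "exon 19 deletion".
EXON_EVENT_RE = re.compile(
    r"\bexon\s+\d{1,2}\s+(?:deletion|insertion)\b", re.IGNORECASE
)


def find_oncology_signals(text: str) -> dict:
    """Collect keyword and regex hits that suggest a genomics-rich oncology note."""
    return {
        "keywords": [k for k in KEYWORDS if re.search(rf"\b{re.escape(k)}\b", text)],
        "tnm": TNM_RE.findall(text),
        "stages": STAGE_RE.findall(text),
        "variants": EXON_EVENT_RE.findall(text) + PROTEIN_VARIANT_RE.findall(text),
    }


note = (
    "Stage IIIB NSCLC, classified as T3N2M0. "
    "An EGFR exon 19 deletion was identified; KRAS G12C negative."
)
hits = find_oncology_signals(note)
print(hits)
```

Documents could then be ranked by the number of hits to prioritize those richest in genomic and staging information. Any real pipeline would need far broader patterns and manual review of the matches.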
Rigorous De-identification
The process adhered strictly to HIPAA’s Safe Harbor method for de-identification, giving the client confidence in data privacy and security. All Protected Health Information (PHI) was removed and replaced with labeled placeholders, preserving the utility of the data while protecting patient confidentiality.
De-identification Variables
Category | Subcategory |
---|---|
Name | Patient name, Physician name, Nurse practitioner name, Family member name, Medical center name, Clinic name, Nursing home name, Company name, University name |
Age | |
Date | Date pattern, Month Year pattern, Day Month pattern, Day Year pattern, Day, Month, Year, Season |
Location | Country, State, City, Street, ZIP Code, Room number, Suite number, Floor number |
ID | Social security number, Medical record number, Health plan beneficiary number, Account number, Certificate/License number, Biometric ID, Record ID, Accession number, Vehicle identification number, License plate number, Device identifiers and serial number |
Contact | Telephone number, Fax number, Email address, Web URL, IP address |
On September 25, 2106, at 11:00 am, Mr. Harry Pace, aged 90, was admitted to Forrest General Hospital for a scheduled hip surgery, previously consulted by his primary care physician Dr. Jose Martin, and attended by Kendra Reith, MD. During his stay, he was under the care of Mary Hu, N.P., and Suzan Ray, R.N., with R. Charles Melancon, PA, also being consulted. His operation, conducted on the same day as admission, was successful with no complications reported. Following surgery, Mr. Pace was transferred to Room 202, Floor 2, for recovery. His wife, Emma Pace, was present throughout and was provided with all necessary updates. During his brief stay, his medical records, including MRN MR99062619 and Account KV000014764, were handled according to the standard protocols of Gracewood Nursing Home, his previous residence. He was discharged later the same day to the care of Oakland Outpatient Clinic for further recuperation. Throughout the process, all procedures were documented and secured with adherence to confidentiality standards.
On [Date Pattern], at 11:00 am, Mr. [Patient Name], aged [Age], was admitted to [Medical Center Name] for a scheduled hip surgery, previously consulted by his primary care physician Dr. [Physician Name], and attended by [Physician Name] MD. During his stay, he was under the care of [Nurse Practitioner], N.P., and [Nurse Practitioner], R.N., with [Physician Name], P.A., also being consulted. His operation, conducted on the same day as admission, was successful with no complications reported. Following surgery, Mr. [Patient Name] was transferred to Room no. [Room Number], Floor no. [Floor Number], for recovery. His wife, [Family Member Name], was present throughout and was provided with all necessary updates. During his brief stay, his medical records, including MRN [Medical Record Number] and Account [Account Number], were handled according to the standard protocols of [Nursing Home Name], his previous residence. He was discharged later the same day to the care of [Clinic Name] for further recuperation. Throughout the process, all procedures were documented and secured with adherence to confidentiality standards.
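The two passages above show a note before and after PHI is replaced with labeled placeholders. The following Python sketch illustrates that placeholder substitution on a small scale. It is a hedged illustration, not the project's pipeline: the rules, the `known_names` lookup, and the function `deidentify` are assumptions, and a handful of regular expressions is nowhere near sufficient for Safe Harbor compliance on real records.

```python
import re

MONTHS = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)

# Illustrative rules only (assumptions). Each rule keeps the cue word
# (group 1, where present) and swaps the identifier for a placeholder.
RULES = [
    (re.compile(rf"\b{MONTHS}\s+\d{{1,2}},\s+\d{{4}}\b"), "[Date Pattern]"),
    (re.compile(r"\b(MRN )[A-Z]{0,3}\d{6,}\b"), r"\g<1>[Medical Record Number]"),
    (re.compile(r"\b(Account )[A-Z]{0,3}\d{6,}\b"), r"\g<1>[Account Number]"),
    (re.compile(r"\b(aged )\d{1,3}\b"), r"\g<1>[Age]"),
    (re.compile(r"\b(Room )\d+\b"), r"\g<1>[Room Number]"),
    (re.compile(r"\b(Floor )\d+\b"), r"\g<1>[Floor Number]"),
]


def deidentify(text: str, known_names: dict) -> str:
    """Replace pattern-based identifiers, then known name strings, with placeholders.

    `known_names` maps a surface string to its placeholder. In practice such a
    list would come from an NER model or manual review (hypothetical here).
    """
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Replace longer names first so "Harry Pace" is handled before "Pace".
    for name in sorted(known_names, key=len, reverse=True):
        placeholder = known_names[name]
        text = re.sub(rf"\b{re.escape(name)}\b", lambda _m, p=placeholder: p, text)
    return text


sample = (
    "On September 25, 2106, at 11:00 am, Mr. Harry Pace, aged 90, was admitted "
    "to Forrest General Hospital. Mr. Pace was transferred to Room 202, Floor 2. "
    "MRN MR99062619 and Account KV000014764 were recorded."
)
names = {
    "Harry Pace": "[Patient Name]",
    "Pace": "[Patient Name]",
    "Forrest General Hospital": "[Medical Center Name]",
}
print(deidentify(sample, names))
```

Keeping the placeholder labels aligned with the de-identification variable table means downstream annotators can still see what kind of information was removed.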
Annotation Guidelines & Advanced Annotation Techniques
Insights AI established and implemented standard data annotation guidelines, ensuring that all Labeled Records were prepared consistently and in compliance with HIPAA standards. In addition, 10,000 pages from various medical records were meticulously annotated, with a focus on detailed labeling of negation statuses and other clinically relevant entities across oncology subspecialties. The annotation was carried out by a team of expert annotators with specialized knowledge of oncology and data privacy regulations.
Complex Annotation Criteria
Category | Subcategory |
---|---|
Date Annotation (Oncology) | Diagnosis Date, Stage Date, Onset, Procedure Date, Med Date Started, Med Date Ended, Radiation Date Started, Radiation Date Ended |
Disease (Oncology) | Cancer Problem, Histology, Clinical Status, Body Site, Behaviour, Grade, Cancer Stage, TNM stage, Tumour Marker Test, Dimensions, Code |
Treatment (Oncology) | Cancer Medicine, Drug Dosage, Frequency, Cancer Surgery, Surgery Result, Radiation Modality, Radiation Dosage |
Genomics | Variation Code, Gene Studied, Method, Specimen |
Negation | Negative, Possible Negative, Uncertain, Possible Positive |
Clinical NER Relationships | Cancer Problem – Body Site, Histology – Body Site, Behaviour – Body Site, Cancer Surgery – Body Site, Radiation Modality – Body Site, Histology – Grade, Cancer Problem – Dimension |
Example:
“Patient Jane Doe was diagnosed with Stage IIIB non-small cell lung cancer (NSCLC), specifically adenocarcinoma, on 03/05/2023. The cancer is located in the right lower lobe of the lung. It is classified as T3N2M0 according to the TNM staging system, with a tumor size of 5 cm x 3 cm. An EGFR exon 19 deletion was identified through PCR analysis of the tumor biopsy specimen. Chemotherapy with Carboplatin AUC 5 and Pemetrexed 500 mg/m² was initiated on 03/20/2023 and is to be administered every 3 weeks. External beam radiation therapy (EBRT) at a dose of 60 Gy in 30 fractions commenced on 04/01/2023. The patient’s treatment is ongoing, and there is no evidence of brain metastases on the recent MRI. The possibility of lymphovascular invasion is yet to be determined, and the patient’s tolerance for the full chemotherapy regimen remains uncertain.”
Oncology Clinical Note Statement
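To make the annotation criteria more concrete, here is a minimal Python sketch of how labeled spans, negation statuses, and entity relationships for a note like the one above might be represented. The `Entity` and `Relation` classes, the helper `make_entity`, and the specific label assignments are assumptions for illustration; the case study does not describe the actual annotation schema or tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    label: str                      # e.g. "Cancer Problem", "Body Site"
    text: str                       # surface form as it appears in the note
    start: int                      # character offset, inclusive
    end: int                        # character offset, exclusive
    negation: Optional[str] = None  # e.g. "Negative", "Uncertain"


@dataclass(frozen=True)
class Relation:
    label: str                      # e.g. "Cancer Problem - Body Site"
    head: Entity
    tail: Entity


def make_entity(note: str, surface: str, label: str,
                negation: Optional[str] = None) -> Entity:
    """Locate the first occurrence of `surface` in `note` and build a span."""
    start = note.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} not found in note")
    return Entity(label, surface, start, start + len(surface), negation)


# Abridged from the example note above.
note = (
    "Patient was diagnosed with Stage IIIB non-small cell lung cancer (NSCLC), "
    "specifically adenocarcinoma. The cancer is located in the right lower lobe "
    "of the lung. It is classified as T3N2M0. There is no evidence of brain "
    "metastases on the recent MRI."
)

# Label assignments below are illustrative guesses, not the project's output.
stage = make_entity(note, "Stage IIIB", "Cancer Stage")
problem = make_entity(note, "non-small cell lung cancer", "Cancer Problem")
histology = make_entity(note, "adenocarcinoma", "Histology")
site = make_entity(note, "right lower lobe of the lung", "Body Site")
tnm = make_entity(note, "T3N2M0", "TNM stage")
mets = make_entity(note, "brain metastases", "Cancer Problem", negation="Negative")

entities = [stage, problem, histology, site, tnm, mets]
relation = Relation("Cancer Problem - Body Site", head=problem, tail=site)

for e in entities:
    print(f"{e.label:15} {e.text!r:32} negation={e.negation}")
```

Storing character offsets alongside the surface text lets reviewers verify each span against the source note, and attaching the negation status to the entity keeps "no evidence of brain metastases" from being read as a positive finding.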
Rigorous Quality Assurance
We implemented a flexible project management framework that allowed client feedback to be integrated effectively while upholding stringent quality standards. A comprehensive quality assurance protocol, aligned with the annotation guidelines, was enforced to reach the required quality benchmarks. This protocol featured successive rounds of review and verification to secure the precision and dependability of the annotated data. Such quality oversight is crucial to building a dependable NLP solution that supports informed clinical decision-making and research.
Outcome
We successfully delivered 10,000 high-quality, de-identified Labeled Records, providing a secure and valuable dataset for the client’s NLP model development. The careful application of NLP and adherence to HIPAA de-identification standards resulted in a highly refined dataset that will underpin the client’s ongoing and future oncology research efforts, with the ultimate aim of improving oncology patient outcomes and care delivery efficiency.
The success of the project illustrates our ability to handle complex medical data with precision, contributing to the client’s aim of improving patient care outcomes and accelerating the pace of healthcare innovation.
Our partnership with Insights AI has been instrumental in advancing our NLP capabilities within the oncology domain. The professional handling of 10,000 medical records, annotated with detailed negation and other clinical entities, demonstrated their commitment to excellence and compliance. Moreover, their commitment to privacy standards like HIPAA has provided us with invaluable resources to drive forward our AI initiatives for developing cutting-edge oncological treatments and diagnostics.