One-line solution summary:
We speed the identification of rare disease patients by using innovative machine learning with electronic health records.
Pitch your solution.
Rare diseases are under-recognized and misdiagnosed because of a lack of awareness among front-line physicians. As a result, many patients with rare diseases face profound delays in diagnosis and treatment. We sought to accelerate the diagnosis of rare diseases by algorithmically identifying key phenotypic features in patients' electronic health records (EHRs). Building on published and preliminary data, we developed a machine-learning approach that combines advanced methods with a training set of well-established, already-diagnosed (ground-truth) patient data at UCLA and across the University of California to generate a “risk assessment” based on phenotype data. Our work addresses an issue common to all rare diseases: delays in identifying patients for definitive diagnosis. We could expand our impact by building our algorithm into the clinical-care workflow of EHR systems like EPIC and Cerner.
What specific problem are you solving?
We propose to specifically tackle the problem of identifying patients with rare immune diseases, called inborn errors of immunity (IEI) or primary immunodeficiency diseases, which affect around 0.4% of all people. Because rare genetic immune diseases present with infections, autoimmunity, and inflammation, patients are often referred to specialists in infectious diseases, rheumatology, pulmonology, or hematology for clinical care that addresses symptoms but not the genetic and immunological underpinnings. Thus, patients with IEIs “hide” in the medical system, traveling from one specialist to another, which delays the identification of a medically actionable, genetic diagnosis. Delays in the diagnosis of immune deficiencies range from 5 to 15 years. Our preliminary data show that our approach can, in some cases, reach a diagnosis 1-5 years earlier than the current standard of care. IEI patients who face delays in getting appropriate treatment, for example, immunoglobulin infusions, cost the healthcare system an extra $50-80K/year/patient. Prompt identification of patients with IEIs is paramount to reduce the risk of irrevocable sequelae such as bronchiectasis, encephalitis, or kidney failure. Proper diagnosis improves outcomes, and a high proportion of diagnoses are medically actionable. Furthermore, making diagnoses faster has positive psychosocial outcomes: patients lacking a definitive diagnosis report low empowerment, frustration, and confusion.
What is your solution?
To identify rare disease patients, the traditional approach calculates “risk scores” for each patient in the EHR as a weighted linear combination of counts of EHR-derived phenotypes. This approach has significant limitations, including ignoring the recurrence of phenotypes (e.g., multiple pneumonias) and key aspects of disease trajectory (e.g., bronchiectasis occurs after pneumonias). We propose to leverage non-linear machine learning methods that use the set of “ground-truth” IEI patients to train discriminative models that distinguish IEI cases from controls. We will use the curated list of “ground-truth” patients within an 80%-20% cross-validation framework to estimate the accuracy of our method. We focus primarily on the random forests approach, a machine learning method that has been shown to be highly accurate for EHR-based classification. As input to the approach, we will empirically identify EHR features (PheCodes and laboratory values) and jointly estimate a random forest discriminative model from the EHR data within the cross-validation framework. We have applied our method to the UCLA dataset of 1.5M patients, and are expanding to all five University of California sites.
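The limitation of the traditional score can be illustrated with a minimal sketch. It assumes the score sees only which phenotypes are present for a patient; the phenotype labels, the weights, and the function name `linear_risk_score` are hypothetical and are not taken from our pipeline.

```python
# Hypothetical phenotype weights; labels and values are illustrative only.
WEIGHTS = {"pneumonia": 1.0, "sinusitis": 0.5, "bronchiectasis": 2.0}


def linear_risk_score(events, weights=WEIGHTS):
    """Sum the weights of the distinct phenotypes recorded for one patient.

    `events` is a list of phenotype labels in chart order. Converting it to
    a set discards repeats and ordering, which is the limitation at issue.
    """
    return sum(weights.get(phenotype, 0.0) for phenotype in set(events))


one_pneumonia = ["pneumonia"]
three_pneumonias = ["pneumonia", "pneumonia", "pneumonia"]

# Both patients receive the same score despite very different histories.
print(linear_risk_score(one_pneumonia), linear_risk_score(three_pneumonias))
```

A non-linear model trained on features that include recurrence and ordering can separate these two patients, which this presence-only score cannot.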
Who does your solution serve, and in what ways will the solution impact their lives?
Inborn errors of immunity (IEIs) (also called primary immunodeficiency diseases, PIDs) comprise over 400 individually rare genetic disorders that phenotypically impact the immune system, resulting in susceptibility to infections, autoimmunity, or dysregulated inflammation. Though each disease is individually rare, we now recognize that some kind of IEI affects around 0.4% of all people. These patients go unrecognized because of our healthcare model, and as a result their diagnoses are delayed by 5-15 years, leading to profound and irrevocable morbidities like lung scarring (bronchiectasis), unnecessary testing, and excess healthcare costs of $50-80k/year compared to after diagnosis. Our preliminary data show that we can already speed diagnosis by up to 5 years. Our team includes a senior immunologist (Dr. Butte) who has worked in the field of IEIs for 15 years, and who is a close partner of the Jeffrey Modell Foundation and the Immune Deficiency Foundation. He is a proud advocate for patients with IEIs, and a much sought-after speaker in the IEI community at the local, national, and international levels. Dr. Butte has highlighted to the community the major need to speed the diagnosis of patients with IEIs.
Which dimension of the Challenge does your solution most closely address?
Leverage big data and analytics to improve the detection and diagnosis of rare diseases
Explain how the problem you are addressing, the solution you have designed, and the population you are serving align with the Challenge.
Improving the diagnosis of IEIs simply by increasing the awareness of frontline physicians is unlikely to succeed, because IEI patients are managed by doctors across primary care, pulmonology, infectious diseases, rheumatology, hematology, GI, and other specialties. We believe the best approach to speeding the diagnosis of rare immune diseases lies in algorithmic analysis of electronic health record data. We propose an automated method that surveils the EHR, scores patients based on phenotypic features of IEIs, and flags the highest-scoring ones for review. These patients will be brought to attention quickly and referred to an immunologist for proper genetic diagnosis and access to necessary, life-extending treatments.
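The flagging step can be sketched as follows. This is an illustration under stated assumptions, not our production code: the function name `flag_top_fraction`, the synthetic cohort, and the default cutoff are hypothetical. The 0.1% default mirrors the figure quoted elsewhere in this application, but the operating threshold for review is a choice to be made with the clinical team.

```python
import math


def flag_top_fraction(scores, fraction=0.001):
    """Return the patient IDs whose scores fall in the top `fraction`.

    `scores` maps a de-identified patient ID to a risk score. At least one
    patient is always flagged so that small cohorts still yield a review list.
    """
    if not scores:
        return []
    n_flag = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_flag]


# Toy cohort of 2,000 synthetic patients with made-up scores.
cohort = {f"pt{i:04d}": float(i % 97) for i in range(2000)}
cohort["pt0042"] = 500.0
cohort["pt1234"] = 400.0

print(flag_top_fraction(cohort))
```

The returned list is ordered from highest to lowest score, so reviewers can work through it from the top.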
In what city, town, or region is your solution team headquartered?
Los Angeles, CA, USA
Explain why you selected this stage of development for your solution.
We have developed our approach in the EHR database for UCLA, one of the largest and best databases in the country. Next, we would expand to the other four University of California sites, which already share an electronic health record system. Then we would propose to scale internationally to all health systems that base their EHRs on EPIC and Cerner systems.
What is your solution’s stage of development?
Pilot: An organization deploying a tested product, service, or business model in at least one community.
Who is the Team Lead for your solution?
Manish Butte, MD PhD
Which of the following categories best describes your solution?
A new technology
What makes your solution innovative?
Our solution is innovative in combining phenotype codes (PheCodes), laboratory values, and chart review into risk scoring for rare diseases. There are many criticisms of using EHRs and ICD codes to define conditions or categorize patients. We overcome these limitations while benefiting from efficiency and scale through the use of PheCodes, which offer better replication rates and p-values than ICD codes.
Empirical phenotype gathering surpasses current approaches that use phenotypes garnered from OMIM. Clinical exome analysis uses HPO terms garnered from OMIM to make gene lists that are used for filtering. OMIM lists, however, are gathered from expert opinions and one-off publications, at best. Empirically deriving phenotypes and their frequencies from ground-truth rare disease patients is technically and conceptually innovative and will allow for a more accurate picture of disease phenotypes.
We have developed an approach that links patient age to phenotypes. In the same way that pediatric medicine is associated with different exposures, pathogens, and phenotypes than adult medicine, the presentation of rare diseases also differs across age groups. Our approach is technically innovative in incorporating information on patient age, sex, and ethnicity into the risk scores we generate using PheCodes.
Because our approach tags every EHR phenotype and laboratory value with the patient's age, we can follow disease trajectories in “pseudotime” as patients accumulate phenotypes, and predict which phenotypes occur “after” other phenotypes. This is a further technical innovation.
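A minimal sketch of how age-tagged events can yield trajectory features is shown here. It assumes each event is an (age in years, phenotype) pair; the function name `temporal_features`, the phenotype labels, and the example patient are hypothetical, and this simple ordering is a stand-in for illustration rather than our full pseudotime method.

```python
from collections import Counter
from itertools import combinations


def temporal_features(events):
    """Derive recurrence counts and ordered phenotype pairs for one patient.

    `events` is a list of (age_in_years, phenotype) tuples. Sorting by age
    gives a simple per-patient timeline.
    """
    timeline = sorted(events, key=lambda event: event[0])
    recurrence = Counter(phenotype for _, phenotype in timeline)

    # Record the first age at which each phenotype appears.
    first_seen = {}
    for age, phenotype in timeline:
        first_seen.setdefault(phenotype, age)

    # A pair (a, b) means a was first recorded strictly before b.
    by_onset = sorted(first_seen, key=first_seen.get)
    ordered_pairs = {
        (a, b)
        for a, b in combinations(by_onset, 2)
        if first_seen[a] < first_seen[b]
    }
    return recurrence, ordered_pairs


# Synthetic patient: three pneumonias in childhood, then bronchiectasis.
patient = [
    (4.0, "pneumonia"),
    (2.5, "pneumonia"),
    (9.0, "bronchiectasis"),
    (6.0, "pneumonia"),
]
recurrence, ordered_pairs = temporal_features(patient)
print(recurrence["pneumonia"], ("pneumonia", "bronchiectasis") in ordered_pairs)
```

Both outputs can be passed to a classifier as additional features alongside the plain presence of each phenotype.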
Describe the core technology that powers your solution.
To identify rare disease patients, our technological approach entails calculating a “risk score” for each patient in the EHR. Individual diagnostic entries in the EHR are converted to phenotype codes (PheCodes). We empirically estimate weights for the various PheCodes from the data itself. We will focus primarily on the random forests approach, a machine learning method that has been shown to be highly accurate for EHR-based classification. We will use a curated list of “ground-truth” patients within an 80%-20% cross-validation framework. We have shown that our risk score captures all our IEI patients in the top 0.1% of scores. We further refine the method by adding temporal features into the scores, including the recurrence of phenotypes (e.g., multiple pneumonias or sinus infections) and the order in which they occur. In preliminary data, we can accelerate the diagnosis of IEIs by up to 5 years. Our approach also includes a classifier that discerns whether patients are likely to have an “infection only” phenotype versus autoimmunity and inflammatory complications, which profoundly reduce survival.
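One common reading of an 80%-20% cross-validation framework is five-fold cross-validation, in which each fold holds out about 20% of patients for testing. The sketch below assumes that reading and shows only the splitting step; the function name `five_fold_splits` and the synthetic IDs are hypothetical, and model fitting (for example, a random forest) is omitted.

```python
import random


def five_fold_splits(case_ids, control_ids, seed=0):
    """Yield (train, test) ID lists where each test fold holds about 20%.

    Cases and controls are shuffled and dealt into folds separately so that
    every fold keeps a similar case-to-control ratio (a stratified split).
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(5)]
    for group in (list(case_ids), list(control_ids)):
        rng.shuffle(group)
        for i, patient_id in enumerate(group):
            folds[i % 5].append(patient_id)
    for k in range(5):
        test = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, test


# Synthetic cohort: 10 ground-truth cases and 90 controls.
cases = [f"case{i}" for i in range(10)]
controls = [f"ctrl{i}" for i in range(90)]

for train, test in five_fold_splits(cases, controls):
    print(len(train), len(test))
```

Stratifying matters here because ground-truth cases are rare: an unstratified split could leave a test fold with no cases at all.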
Provide evidence that this technology works. Please cite your sources.
Our work was recently awarded an NIH R01 grant (2021-2026) based on preliminary data. Those early data are as yet unpublished, but we expect to submit them for publication in the next few weeks. We are happy to share a late-stage manuscript as needed.
Does this technology introduce any risks? How are you addressing or mitigating these risks in your solution?
Our algorithm has been approved by the UCLA IRB and the IRBs of UCSF, UC Irvine, UC Davis, and UC San Diego. Our code queries the de-identified database of UCLA, and after identification of subjects with high “risk scores,” we use an honest broker to reveal medical record numbers in bulk. The immunology team then performs manual chart review. The honest broker handles any further contact with primary care providers and patients. This approach eliminates privacy concerns at the front end (by working only with de-identified data) and addresses other ethical concerns.
Please select the technologies currently used in your solution:
Which of the UN Sustainable Development Goals does your solution address?
Select the key characteristics of your target population.
In which countries do you currently operate?
In which countries will you be operating within the next year?
How many people does your solution currently serve? How many will it serve in one year? In five years?
Our current approach targets the ~5M patients at UCLA and, within the next five years, the 15M patients across the University of California health system. We will expand after that to all EHRs running EPIC and Cerner, which cover over 200M patients.
What are your impact goals for the next year and the next five years, and -- importantly -- how will you achieve them?
Our milestones include algorithm improvements; improving the link between EHR phenotypes and rare and common genetic variants; and improving the EHR client to flag patients with high “risk scores” for rare immune diseases. We have built a strong team at UCLA to launch this effort, and partnered with immunologists and data scientists at each of the five UCs and Vanderbilt. Our initial goals include replication across sites while improving the algorithm.
How are you measuring your progress toward your impact goals?
Rare diseases affect individuals from all walks of life, from infancy through elderly years. Our approach accelerates the diagnosis of rare immune diseases, saving costs and reducing morbidity and mortality. We will measure progress through a variety of quantitative approaches. In algorithm development, we employ a ground-truth dataset and 80%-20% cross-validation techniques to ensure our approach accurately finds IEI patients. To assess patient outcomes, we survey patients about quality of life using validated instruments. We survey frontline providers about awareness of rare diseases. These approaches are based on our experience as leaders in the NIH-funded Undiagnosed Diseases Network and in rare disease care.
What type of organization is your solution team?
Nonprofit
How many people work on your solution team?
We have two leading faculty, one in immunology and one in computational medicine. The team includes one nurse, two graduate students, two staff programmers, and a clinical research coordinator. All personnel split their time with other projects and clinical activities at UCLA.
How long have you been working on your solution?
2
How are you and your team well-positioned to deliver this solution?
Our team is led by two faculty members. Bogdan Pasaniuc, PhD, is an Associate Professor of Computational Medicine, Human Genetics, and Pathology and Laboratory Medicine at the Geffen School of Medicine at UCLA. He has a Ph.D. in computer science and extensive experience in the development of scalable methods for the analysis of large-scale genetic variation data. He has a track record of developing publicly available methods for the genetics community. He is actively involved in the Institute of Precision Health at UCLA, which aims to genotype over 150k patients linked to their electronic health records (EHR) and to use computation to improve medical outcomes. He has published over 50 papers as a senior author (out of more than 80 co-authored papers).
Our immunology lead is Manish Butte, MD PhD, a practicing physician who maintains board certification in both Pediatrics and Allergy & Immunology. His clinical focus is on primary immunodeficiency diseases and other rare, genetic, inborn errors of immunity. He sees children and adults with these disorders and trains clinical and research fellows in this area. He has funded clinical research projects to study T-cell responses to invasive fungal infections and to develop a pipeline approach for validating novel mutations in genetic immune diseases (including a new R01 in 2021). He is a funded co-investigator in the NIH-funded Undiagnosed Diseases Network site at UCLA and a co-founder of the UCLA California Center for Rare Diseases. He has published over 100 papers and has over 13,000 citations.
What is your approach to building a diverse, equitable, and inclusive leadership team?
UCLA is committed to Justice, Equity, Diversity, and Inclusion (JEDI) in its hiring and employment practices. Dr. Butte sits on the Pediatrics JEDI council, and all faculty are trained in hiring practices that follow JEDI principles.
Is your team led or managed by a person with a rare disease?
No. Dr. Butte has been caring for patients with rare inborn errors of immunity as his primary clinical role since finishing fellowship training in 2006. He is a well-known advocate for patients with IEIs.
Do you primarily provide products or services directly to individuals, to other organizations, or to the government?
Individual consumers or stakeholders (B2C)
Why are you applying to Solve?
We at UCLA have been addressing many problems faced by patients with rare diseases. Here we are applying for the Horizon Prize to further our efforts to use computational approaches to accelerate the identification and diagnosis of patients with rare immune diseases, who range from infants to adults and come from all walks of life.
In which of the following areas do you most need partners or support?
Please explain in more detail here.
We would use funding to hire staff programmers and graduate students to expand our preliminary technological approaches.
What organizations would you like to partner with, and how would you like to partner with them?
We would love to partner with rare disease organizations across the MIT Solve network.
Solution Team
Solution Name:
UCLA