By Richard Gliklich, MD, CEO, OM1  |   May 22, 2023

Integrated evidence generation (IEG) is a framework for generating evidence to support decision-making in healthcare. It involves the integration of multiple sources of real-world data (RWD), including electronic health records (EHR), claims data, and patient-generated data, to provide a more comprehensive and accurate view of the effectiveness, safety, and value of healthcare interventions.

To make these data fit for purpose, the data used and analyses generated must meet the evidentiary standard for the particular use case. For example, a regulatory use case will have a different evidentiary bar for fit than a peer-reviewed publication use case.

FDA Guidance Documents 

A useful starting point is Considerations for the Use of Real-World Data and Real-World Evidence to Support Regulatory Decision-Making for Drug and Biological Products: Guidance for Industry, a guidance document from the FDA. It defines RWD as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources”. It also goes to great lengths to distinguish an interventional study (or clinical trial), a study in which participants are “assigned to one or more interventions, according to a study protocol, to evaluate the effects of those interventions on subsequent health-related biomedical or behavioral outcomes”, from a non-interventional study (or observational study), a study in which “patients received the marketed drug of interest during routine medical practice and are not assigned to an intervention according to a protocol”.

Transparency regarding data collection and analysis is where the distinction between using RWD for a regulatory purpose and using it for a research purpose becomes essential. One suggestion the document makes is that, if certain RWD are owned by third parties, sponsors should have agreements in place to ensure that all relevant patient-level data can be provided to FDA and that source data necessary to verify the RWD are made available for inspection. This suggests that the FDA intends to inspect sites that participate in the provision of RWD and expects data to be auditable by an investigator visiting a site. The source data need to be available for comparison with the data in the database.
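To make the idea of comparing source data with the database concrete, here is a minimal sketch of a field-by-field check. The record layout, the field names, and the `verify_against_source` helper are all hypothetical and are used only for illustration; they do not describe any FDA-specified procedure or OM1 tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    patient_id: str
    field: str
    source_value: object
    database_value: object


def verify_against_source(source_records, database_records, fields):
    """Compare each database record with its source record, field by field."""
    discrepancies = []
    for patient_id, db_row in database_records.items():
        src_row = source_records.get(patient_id)
        if src_row is None:
            # A database record without a source document cannot be verified.
            discrepancies.append(Discrepancy(patient_id, "<record>", None, "present"))
            continue
        for field in fields:
            if src_row.get(field) != db_row.get(field):
                discrepancies.append(
                    Discrepancy(patient_id, field, src_row.get(field), db_row.get(field))
                )
    return discrepancies


# Hypothetical example data: source (site EMR) versus the research database.
source = {
    "P001": {"hba1c": 7.2, "visit_date": "2023-01-10"},
    "P002": {"hba1c": 6.5, "visit_date": "2023-02-03"},
}
database = {
    "P001": {"hba1c": 7.2, "visit_date": "2023-01-10"},
    "P002": {"hba1c": 6.8, "visit_date": "2023-02-03"},  # value differs from source
    "P003": {"hba1c": 8.1, "visit_date": "2023-03-15"},  # no source record
}

issues = verify_against_source(source, database, ["hba1c", "visit_date"])
for issue in issues:
    print(issue)
```

In this sketch, any mismatch or unverifiable record is surfaced as a discrepancy for follow-up rather than silently corrected, which is the kind of audit trail an on-site review would likely want to see.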


Technology Platforms & Specialized Networks

We think about this technology in three categories:

  1. OM1 Origin™: connecting and sourcing platform

This technology connects to the source record systems such as EMR systems, labs, and existing registries. This meets the FDA requirement for maintaining the reliability of the RWD and data integrity, beginning with extraction of the data from its origin (i.e., data accrual). Automation is a game changer in these programs because it reduces burden on the practices and lowers cost.

  2. OM1 EMR Engine™: data processing and enrichment platform

The Engine brings in data from Origin, then aggregates and de-identifies it so we can follow each person as a unique patient within the system while reducing the risk of exposure. Once this is done, additional data can be derived, including from unstructured data, and the record can be linked to additional data sources. When data are linked, both the source of the data and the linkage parameters that will be used need to be included in the protocol as well. Data traceability (including all transformations) and validation of derived endpoints are key areas for review in an FDA submission. One of the most important uses of the OM1 EMR Engine™ is to generate validated structured variables from unstructured clinical notes and reports as described below. The Engine is also available independently as a service for processing data from EMRs or processing text and other unstructured data for validated endpoint extraction.
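As an illustration of how de-identification and traceability can coexist, the sketch below derives a stable pseudonymous token with a keyed hash and logs each transformation. This is one common approach, not a description of how the OM1 EMR Engine™ works; the key, the field names, and the `deidentify` helper are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical key; in practice it would be held in a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical set of direct identifiers to strip from each record.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth"}


def patient_token(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonymous token from a normalized identifier."""
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def deidentify(record: dict, provenance: list) -> dict:
    """Strip direct identifiers, attach a token, and log the transformation."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_token"] = patient_token(record["mrn"])
    provenance.append({
        "step": "deidentify",
        "source_system": record.get("source_system"),
        "removed_fields": sorted(record.keys() & DIRECT_IDENTIFIERS),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return cleaned


provenance_log = []
visit_a = deidentify(
    {"mrn": "12345", "name": "Jane Doe", "date_of_birth": "1980-05-01",
     "source_system": "clinic_emr", "systolic_bp": 128},
    provenance_log,
)
visit_b = deidentify(
    {"mrn": " 12345 ", "name": "Jane Doe", "date_of_birth": "1980-05-01",
     "source_system": "lab_feed", "ldl": 96},
    provenance_log,
)

# The same person receives the same token in both records.
print(visit_a["patient_token"] == visit_b["patient_token"])
```

Because the token is derived deterministically from the identifier, records from different feeds can be followed as one patient without retaining the identifier itself, and the provenance log keeps a record of what was changed and when.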

  3. OM1 Apps: advanced analytics platform

We leverage the clinical narrative in two ways. One is where we use medical language processing (MLP) to convert unstructured data to structured data, extracting variables and identifying key outcomes within a disease. Our process is validated against the ‘gold standard’ of clinician abstraction.
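Validation against clinician abstraction is typically summarized with agreement metrics such as sensitivity and positive predictive value. The following is a minimal sketch of that calculation for a single binary variable; the labels are invented for illustration and the `validation_metrics` helper is hypothetical, not OM1's validation procedure.

```python
def validation_metrics(extracted, abstracted):
    """Compare extracted binary labels with clinician-abstracted reference labels."""
    if len(extracted) != len(abstracted):
        raise ValueError("Both label lists must cover the same records.")

    tp = fp = tn = fn = 0
    for predicted, reference in zip(extracted, abstracted):
        if predicted and reference:
            tp += 1
        elif predicted and not reference:
            fp += 1
        elif not predicted and reference:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (no cases in the denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example: did each note document the outcome of interest?
mlp_labels = [True, True, False, True, False, False, True, False]
clinician_labels = [True, True, False, False, False, True, True, False]

metrics = validation_metrics(mlp_labels, clinician_labels)
print(metrics)
```

With three true positives, three true negatives, one false positive, and one false negative, each of the four metrics in this toy example comes out to 0.75.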

The second use of unstructured data is endpoint estimation. Using the clinical narrative, key endpoints can be estimated or imputed using validated models even though they may not have been recorded during the evaluation. For example, EDSS, SLEDAI, CDAI, BASDAI, PHQ9I, and NYHA are all endpoints for which we have published validated estimation models.

A third use of the data through OM1 Apps is to enable identification of subjects who might qualify for a particular study or trial at the site.
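Identifying potentially eligible subjects amounts to applying a protocol's inclusion and exclusion criteria to structured patient data. The sketch below shows the general shape of such a screen; the `Patient` fields and the criteria (adults aged 18 to 75 with a lupus diagnosis code and no end-stage renal disease) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Patient:
    token: str
    birth_date: date
    diagnoses: set = field(default_factory=set)  # ICD-10-CM codes


def age_on(birth_date: date, as_of: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(patient: Patient, as_of: date) -> bool:
    """Apply hypothetical inclusion and exclusion criteria."""
    age_ok = 18 <= age_on(patient.birth_date, as_of) <= 75
    has_lupus = any(code.startswith("M32") for code in patient.diagnoses)
    has_esrd = "N18.6" in patient.diagnoses  # exclusion criterion
    return age_ok and has_lupus and not has_esrd


patients = [
    Patient("A", date(1980, 5, 1), {"M32.10"}),
    Patient("B", date(2010, 1, 1), {"M32.9"}),            # too young
    Patient("C", date(1970, 3, 3), {"M32.14", "N18.6"}),  # excluded
    Patient("D", date(1965, 12, 31), {"E11.9"}),          # no qualifying diagnosis
]

screening_date = date(2023, 5, 22)
candidates = [p.token for p in patients if is_eligible(p, screening_date)]
print(candidates)
```

A screen like this only flags candidates for site review; final eligibility would still be confirmed against the full protocol by the site.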

Automated Data Collection: Reducing Site Burden & Decreasing Costs

To create reusable sites for evidence generation, OM1 Origin™ connects directly to site EMR and other systems. We then utilize our extensive library of adapters specific to each site’s EMR instance to identify a sub-network within a network for a specific disease. If the program were a regulatory study, the sites would then sign off on a protocol, which makes explicit that they will be subject to FDA inspection and will have monitoring set up.

On top of this, some programs require ancillary data collection, including clinical outcomes assessments, patient-reported outcomes (PRO), images, or biospecimens, coupled with automated EMR extraction. According to the guidance documents, studies with this kind of ancillary collection are not considered clinical investigations under part 312, but applicable requirements for the protection of human subjects must still be met.

Regulatory-focused Research using Automated Networks

Automation is changing the paradigm by reducing site burden, reducing cost and often enabling a waiver of informed consent. The traditional model is to build studies one at a time with unique data collection efforts. In an automated model, the data aggregation infrastructure is intended to be reusable and largely passive for the research team. New studies or substudies can be more efficiently added because the heavy lifting of data origination and processing has already been put in place. This can save enormous cost in large single programs or programs with multiple planned studies. In all programs, it reduces site burden, incentivizing participation and reducing time to full enrollment and database lock.