Chapter 4: Selecting and Defining Outcome Measures for Registries

1. Introduction

As discussed in Chapter 3, the outcomes captured in a registry-based study should be selected primarily based on the research questions of interest, with consideration given to the feasibility of capturing the desired outcomes within the study scope and budget. It is also important to consider the perspectives of multiple stakeholders when determining which outcomes are most relevant.

The selection and definition of patient outcomes of interest is a critical step in designing a patient registry. The outcomes of interest, together with the exposure(s) of interest, drive many of the decisions regarding the study duration, the necessary data elements, and the source(s) of the data. For example, in determining the study duration and frequency of followup, registry developers should consider when the outcomes of interest may be observed (e.g., three months after treatment, one year after treatment). When selecting data elements, it is important to define the critical data elements to capture the outcomes of interest, along with any information that is necessary for risk adjustment. In evaluating potential data sources, registry developers should consider whether the outcomes of interest are available in the data source and the reliability of such data. Decisions about data management, such as the need for adjudication or validation of outcomes, are also informed by the specific outcomes of interest.

This chapter describes a framework that can be used to guide the selection and definition of outcome measures for use within patient registries. Types of outcome measures and considerations in defining outcome measures are discussed, as well as the rationale for standardization of outcome measures and resources for finding standardized outcome measures. Considerations related to study design, data collection and management, and analysis are addressed in Chapters 3, 11, and 13, respectively.

2. Outcome Measures Framework

2.1 Development of the Outcome Measures Framework

Over the past eight years, the Agency for Healthcare Research and Quality (AHRQ) has supported a series of projects to understand how registries select and define outcome measures and to develop tools to support harmonization of outcome measures. This work launched in 2011 with a series of stakeholder meetings designed to gather information on how outcome measures were collected in existing patient registries and how stakeholders would like to see information on outcome measures presented. In parallel, background research was conducted to identify existing models or systems designed to categorize and/or present information on data elements, outcome measures, or quality measures. Based on the background research and stakeholder feedback, the initial Outcome Measures Framework (OMF) was created in early 2012 and revised following a series of web-based meetings and document review cycles with stakeholders. The OMF was finalized in December 2012.¹

The second phase of the OMF project began in 2013 with a systematic literature review of systems used to standardize language and definitions for outcome measures and other data elements, including systems for registries, clinical trials, electronic health records (EHRs), and quality reporting systems. The literature review identified 61 publications on three major topics: harmonizing data elements, key components of outcome measures, and governance plans for existing models. Many of the publications described efforts to harmonize data elements or create core sets of outcome measures; these efforts were identified as useful models for developing standardized outcome measures through a consensus-driven process. At the time this review was completed (2014), no existing efforts with the same or substantially similar goals as the OMF project were identified.²

In 2015, a qualitative analysis was conducted to test the robustness of the OMF and identify any areas for improvement. Outcome measures from four diverse condition areas – depression, asthma, rheumatoid arthritis, and cardiac surgery – were abstracted from patient registries listed on ClinicalTrials.gov in June 2015 and mapped to the OMF. The condition areas were selected to represent different types of conditions, treatment options, providers, care settings, and patient populations. Two of the condition areas (rheumatoid arthritis and cardiac surgery) were selected for further analysis, and additional outcome measures were abstracted from patient registry-run websites and the published literature and mapped to the OMF. Across the four condition areas, 416 outcome measures were identified and reviewed. Most measures mapped directly to the OMF; analysis of the measures that did not map directly to the OMF resulted in minor modifications to the framework. The analysis demonstrated the robustness of the OMF for classifying a diverse group of outcome measures and highlighted its potential for supporting the development of standardized outcome measures in a range of condition areas.³

Throughout each phase of the development of the OMF, stakeholder feedback has been actively sought and incorporated into the framework. Over 400 stakeholders representing registry stewards, healthcare provider organizations, professional societies, academia, research and consulting organizations, government agencies, patient/consumer organizations, journal editors, payers, and pharmaceutical and medical device companies have participated in the various meetings and review activities.

2.2 Structure of the Outcome Measures Framework

The OMF (Figure 4-1) is a hierarchy with three levels: domains, subcategories of data elements, and data elements. The domains – characteristics, treatments, and outcomes – represent the process by which characteristics of the participant, disease, and provider influence treatment, and by which characteristics and treatment together influence outcomes. The process may be iterative, in that outcomes of one treatment may determine additional courses of treatment. At the second level, subcategories of data elements are presented to help guide the definition of an outcome measure. For example, information on the intent of a treatment (palliative vs. curative vs. management) is important when determining the appropriate outcomes to measure. Lastly, at the third level are the categories of data elements that would be used to define an outcome measure, such as those that capture the patient demographics and diagnosis. These categories are intentionally broad so that the framework can be used across condition areas; not all categories will be relevant in a specific condition area.

Figure 4-1. Outcome Measures Framework*

*Modified from Gliklich RE, Leavy MB, Karl J, et al. J Comp Eff Res. 2014;3(5):473-80.

In the Outcomes domain, outcome measures are grouped into five main categories: survival, clinical response or status, events of interest, patient-reported, and resource utilization. These categories represent both final outcomes, such as mortality, and intermediate outcomes, such as clinical response. While final outcomes may be most important in some condition areas, inclusion of intermediate outcomes such as clinical response makes the framework applicable to chronic conditions such as asthma or diabetes, where tracking patient-reported outcomes and disease progression over time is critical. It is also important to note that outcome measures may fit in more than one category. As an example, patient-reported outcomes may be used to assess clinical response (or status) for some conditions (e.g., depression).

Finally, two categories – Experience of Care and Impact on Non-Participant – are included below the Outcomes domain section. These measures fall outside of the structure of the OMF, in that they do not reflect an outcome of treatment for an individual patient; however, these are important concepts to capture in some condition areas. For example, a registry may wish to capture a birth outcome for a woman receiving treatment during pregnancy. Registries also may wish to understand patients’ experiences of care, particularly as they relate to specific issues encountered during treatment, such as care coordination and provider communication in oncology. These categories are discussed in more detail in the “Types of Outcome Measures” section below.

The Characteristics domain describes attributes of the patient, disease, and provider that may be important for risk adjustment. The framework provides examples that can be modified for specific clinical areas. For example, in asthma, key characteristics to collect include the patient’s age, race, ethnicity, age of onset of symptoms, history of near fatal asthma exacerbation, comorbidities, and type of provider.

The framework is a common model intended to be applied to specific conditions in potentially differing ways. For that reason, recommendations for measurement frequency are not specified in the model, but should be specified when applying the OMF to specific condition areas. Different timeframes and measurement frequencies may be appropriate depending on the condition area and outcome measure of interest. Further, some decisions regarding frequency of measurement are made by registries with a goal to minimize administrative and respondent burden. As these data elements are incorporated into interoperable health IT systems, those limitations may become fewer, allowing for new time points for some measures to be added (e.g., longer followup). Chapters 2 and 3 discuss considerations related to determining the duration of observation and the frequency of followup.

3. Types of Outcome Measures

As shown in the OMF, outcome measures can be grouped into five major categories that are relevant across a broad range of condition areas.

3.1 Survival Measures

Survival measures are important endpoints for many registries. Some survival measures, such as all-cause mortality, can be defined and captured consistently across many types of registries. All-cause mortality is broadly relevant for most condition areas and is useful for registry operations (e.g., determining that a patient has died instead of classifying the patient as lost to followup). Cause-specific mortality can be more challenging to capture because of the difficulty of ascertaining cause of death in a consistent and accurate fashion. For example, in lung cancer, pneumonia may be the immediate cause of death, while lung cancer is the underlying cause of death. Other causes of death, such as suicide, may be underreported. Because of these issues, registries interested in cause-specific mortality may consider capturing all-cause mortality as well. In registries that focus on a specific procedure or treatment, treatment-related mortality may be of interest. For example, the definition of procedure-related deaths following catheter ablation is “all-cause mortality within 30 days of the procedure or during the index procedure hospitalization (if the postoperative length of stay is > than 30 days). Procedure-related deaths include those related to a complication of the procedure or treatment for a complication of the procedure.”⁴
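A definition of this kind is specific enough to be applied programmatically. The following is a minimal sketch in Python, not an implementation taken from the cited measure set. The function name, its parameters, and the handling of a missing discharge date are assumptions made for illustration. The sketch encodes only the timing rule in the first sentence of the quoted definition; it does not attempt to attribute a death to a complication of the procedure.

```python
from datetime import date
from typing import Optional


def is_procedure_related_death(
    procedure_date: date,
    death_date: Optional[date],
    index_discharge_date: Optional[date],
) -> bool:
    """Timing rule only: death from any cause within 30 days of the procedure,
    or during the index hospitalization when that stay lasts longer than 30 days.

    All parameter names are hypothetical; a registry would map them to its own
    data elements.
    """
    # No recorded death, or a death dated before the procedure, does not qualify.
    if death_date is None or death_date < procedure_date:
        return False

    # Death within 30 days of the procedure always qualifies.
    if (death_date - procedure_date).days <= 30:
        return True

    # Beyond 30 days, the death qualifies only if the patient was still in the
    # index hospitalization. Assumption: a missing discharge date is treated
    # as "cannot confirm" and therefore returns False.
    if index_discharge_date is None:
        return False
    return death_date <= index_discharge_date
```

A registry applying such a rule would still need adjudication or source verification for deaths that occur after discharge, since the rule depends on complete and accurate dates.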

In some condition areas, such as oncology, survival measures include the concepts of progression-free survival and disease-free survival. In the context of cancer drugs and biologics, the U.S. Food and Drug Administration (FDA) defines progression-free survival (PFS) as the time from randomization until objective tumor progression or death. Disease-free survival (DFS) is defined as the time from randomization until recurrence of tumor or death from any cause.⁵ Overall survival is a critical outcome in oncology research, but it presents challenges in some contexts, such as when the natural course of the disease is lengthy or when a new treatment results in only incremental improvements in survival. Other survival measures, such as DFS and PFS, can be important endpoints in these circumstances. These endpoints are also useful in studies that examine multiple rounds of treatment (e.g., first line, second line, etc.), as each treatment can be examined individually. However, unlike overall survival, PFS and DFS are not precisely measured, can be subject to assessment bias (e.g., measurement of tumor size), and may be defined differently in different studies.⁶ The FDA has developed guidance on the use of PFS and DFS in the context of oncology clinical trials; much of this information is relevant for registry developers as well.⁵
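The PFS and DFS definitions above share one structure: the time from an index date to the first of several qualifying events. The Python sketch below illustrates that structure under stated assumptions. The FDA definitions use the date of randomization; a registry would have to choose an analogous index date (for example, start of treatment), which is an assumption here. Censoring patients without an event at their last followup date is also an assumption, and all function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def time_to_first_event(
    index_date: date,
    event_dates: Sequence[Optional[date]],
    last_followup: date,
) -> Tuple[int, bool]:
    """Return (days from index_date, event_observed).

    If no qualifying event is recorded, the patient is censored at
    last_followup (an assumption of this sketch).
    """
    observed = [d for d in event_dates if d is not None]
    if observed:
        return (min(observed) - index_date).days, True
    return (last_followup - index_date).days, False


def progression_free_survival(index_date, progression_date, death_date, last_followup):
    # PFS: first of objective tumor progression or death.
    return time_to_first_event(index_date, [progression_date, death_date], last_followup)


def disease_free_survival(index_date, recurrence_date, death_date, last_followup):
    # DFS: first of tumor recurrence or death from any cause.
    return time_to_first_event(index_date, [recurrence_date, death_date], last_followup)
```

The calculation is simple; the difficulty noted in the text lies in the inputs, because the progression or recurrence date depends on how and when the tumor was assessed.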

3.2 Clinical Response or Status

Clinical response measures capture the clinician’s assessment of whether the patient is responding to treatment – meaning improving, worsening, or remaining stable – or, for patients not receiving treatment, whether the patient’s clinical status is changing. These measures can be challenging to capture for several reasons. First, for many condition areas, a uniform approach to assessing clinical response has not been clearly articulated by the providers who treat those conditions. Moreover, clinicians freely admit that it can be difficult to date the onset and resolution of exacerbations of chronic diseases. It should also be noted that in some condition areas, different outcomes may be used depending on the intent of treatment. For example, in atrial fibrillation, recurrence of atrial fibrillation is an important outcome for patients undergoing ablation procedures, but this outcome is not relevant for patients receiving anticoagulation therapy. In some cases, clinical response is best measured using patient-reported outcomes (e.g., improvement or worsening in pain, asthma control). In general, clinical response measures should be valid and reproducible across different care settings and different providers and should be relevant to patients and providers.

3.3 Events of Interest

Events of interest typically include complications, adverse events related to treatment, or events associated with disease progression. For example, stroke is an important event for studies of atrial fibrillation, while exacerbation is an important event in studies of asthma. Clear, unambiguous definitions are critical for capturing events of interest consistently across sites.

3.4 Patient-Reported Outcomes

Patient-reported outcomes (PROs) reflect the patients’ perceptions of their status and their perspective on health and disease. PROs have become an increasingly important avenue of investigation in many condition areas, and their importance is widely recognized. However, identification and selection of specific PROs for use within registries can be challenging. These challenges are discussed further in the “Selecting PROs” section below.

3.5 Resource Utilization

Resource utilization measures capture the patient’s interactions with the healthcare system. In some cases, the outcomes of interest are specific events (e.g., hospitalizations), while in other cases, the overall economic burden of the condition is important to capture (e.g., office visits, medications, hospitalizations, etc.). For some conditions, impact on work productivity and missed days of school are also outcomes of interest.

3.6 Composite Endpoints

Composite endpoints are composed of a specified set of outcomes of interest and are often used when the individual outcomes of interest are rare and/or when the outcomes are related clinically.⁷ In a composite endpoint, the patient is considered to have reached the endpoint if any of the individual outcomes occurs. An example of a composite endpoint is major adverse cardiovascular or neurological events (MACNE), defined as a composite of cardiovascular death, myocardial infarction, stroke/non-central nervous system (CNS) systemic embolism, or transient ischemic attack.⁸
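The "any component" rule can be illustrated with a short Python sketch. The component labels below are hypothetical identifiers chosen to mirror the MACNE definition quoted above; they are not codes from any standard terminology, and the function is an illustration rather than the method used in the cited study.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical labels for the four MACNE components named in the text.
MACNE_COMPONENTS = frozenset({
    "cardiovascular_death",
    "myocardial_infarction",
    "stroke_or_non_cns_systemic_embolism",
    "transient_ischemic_attack",
})


def first_composite_event(
    events: Iterable[Tuple[str, date]],
    components: frozenset = MACNE_COMPONENTS,
) -> Optional[Tuple[str, date]]:
    """Return the earliest (event_name, event_date) that belongs to the
    composite, or None if the patient has not reached the endpoint."""
    qualifying = [(name, when) for name, when in events if name in components]
    if not qualifying:
        return None
    # The patient reaches the endpoint at the first qualifying event.
    return min(qualifying, key=lambda pair: pair[1])
```

For example, a patient with a hospitalization for pneumonia followed by a transient ischemic attack and a later myocardial infarction would reach the endpoint on the date of the transient ischemic attack.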

A related approach to tracking outcomes over time is the characterization of the patient’s condition by a set of scores that leverage a range of patient data and can be assessed as repeat measures over time. Disease activity indices in rheumatoid arthritis are one example.⁹,¹⁰

4. Selecting Patient-Reported Outcomes

The process of choosing which PRO measure(s) to include in a registry can be challenging, largely because the number of available measures is overwhelming. As discussed in Chapter 3, clear and careful definition of the target population, concept to be measured, and purpose of the registry is an important first step. In addition, when selecting measures, burden on the participant is a major consideration. The inclusion of multiple PROs can be tempting, but they may deter patient participation if the burden is excessive.

As a first step, researchers should search for existing PRO instruments that will assess the outcomes of interest. Traditional literature searches can yield results, but may be quite time-consuming. The Mapi Institute maintains the Patient-Reported Outcome and Quality of Life Instruments Database (https://eprovide.mapi-trust.org/about/about-proqolid), allowing users to search a large and relatively comprehensive database for PRO instruments that best address the specific needs identified. The Online Guide to Quality-of-life Assessment (http://www.olgaqol.com/) is another database of existing QOL instruments. Additionally, the U.S. National Institutes of Health PROMIS Initiative (http://www.healthmeasures.net/) is developing rigorously tested item banks across a broad range of domains and subdomains (functioning, disability, symptoms, distress, and role participation).¹¹ The PROMIS Initiative is also actively evaluating methods to achieve brevity in instruments through techniques such as computer adaptive testing. Importantly, these measures are publicly available.

Item banks represent another option for developing PRO surveys. In general, item banks contain comprehensive collections of items that pertain to a particular construct (e.g., dyspnea).¹² Item banks generally rely on item response theory (IRT), in which the unit of focus is the item rather than the entire instrument. As such, instruments can be constructed using IRT that employ only those items that provide the most useful and relevant information, eliminating questions with little added value, without compromising psychometric qualities.¹³ The PROMIS Initiative is an example of an item bank. A Computer Adaptive Test (CAT) is the dynamic application of an item bank using an algorithm that can narrow the number of items that need to be presented to a patient in order to arrive at a scale score. This can be a useful tool for limiting respondent burden for some PRO uses, although CAT scales require a continuous connection to the internet. The CAT versions of the PROMIS scales are examples of this approach.

Many properties of PRO instruments should be considered when choosing the appropriate instrument for a specific registry. These include the developmental history and conceptual framework; psychometric properties; content, construct, and criterion validity; reliability; and ability to detect change. The interpretability of the scores and the availability of alternate forms (e.g., different languages, different modalities for administration) are also important. Extensive literature exists on these topics; in particular, the COSMIN study (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist is a useful tool for helping to guide the selection of a measurement instrument.¹⁴ Registries should also consider the intended use of the data; for example, registries that are intended to inform regulatory decision making should follow the U.S. Food and Drug Administration (FDA) guidance on PROs.¹⁵ Case Examples 8 and 9 provide examples of the use of PRO instruments in registries.

5. Standardized Outcome Measures

5.1 Rationale for Standardization

Currently, registries, clinical trials, quality improvement initiatives, and other data collection efforts frequently measure different outcomes or use different definitions of the same outcome measure. For example, a technology assessment to determine the safety and efficacy of retinal prosthesis systems for halting disease progression in patients with retinitis pigmentosa reported 74 different outcome measures used in 11 studies.¹⁶ Only three of the 74 outcome measures were reported by three or more studies, and only four of the outcome measures had evidence of validity and reliability. This type of variation in the selection of outcome measures is common across condition areas and has been well documented in the literature.¹⁷⁻²⁰

Variation in the definition of a specific outcome measure is equally problematic. Consider, for example, the definitions of bleeding that are used in cardiovascular research. A systematic review and meta-analysis published in 2014 found that 10 different definitions of major bleeding are currently used in clinical trials and patient registries for patients undergoing percutaneous coronary intervention (PCI).²¹ The definitions include different clinical events (e.g., blood transfusion, hemorrhage), different laboratory parameters, and different outcomes (e.g., mortality), and the incidence of major bleeding, naturally, varies depending on the definition used by the study. In one example cited by the authors, non-coronary artery bypass graft-related major bleeding occurred in 0.87% of patients according to one definition but in 3.1% of the same population according to another definition. While PCI studies are measuring the same outcome, “major bleeding,” comparison across studies is challenging because of the variations in definition. An earlier review, published in 2007, identified the same issue with bleeding definitions in PCI studies, leading the authors to conclude that “different bleeding definitions can lead to markedly different conclusions about the safety of an antithrombotic regimen.”²²
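The effect of definition choice on measured incidence can be made concrete with a small Python sketch. Everything in it is hypothetical: the two definitions and their thresholds are invented for illustration and are not the definitions examined in the cited reviews, and the four records are made-up toy data, not study results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BleedRecord:
    """Hypothetical per-patient data elements."""
    hemoglobin_drop_g_dl: float
    transfused: bool
    intracranial: bool


def definition_a(r: BleedRecord) -> bool:
    # Hypothetical stricter definition: intracranial bleed or large hemoglobin drop.
    return r.intracranial or r.hemoglobin_drop_g_dl >= 5.0


def definition_b(r: BleedRecord) -> bool:
    # Hypothetical broader definition: also counts transfusion and a smaller drop.
    return r.intracranial or r.transfused or r.hemoglobin_drop_g_dl >= 3.0


def incidence(records: Iterable[BleedRecord],
              is_major_bleed: Callable[[BleedRecord], bool]) -> float:
    """Proportion of records classified as major bleeding."""
    records = list(records)
    if not records:
        raise ValueError("at least one record is required")
    return sum(1 for r in records if is_major_bleed(r)) / len(records)


# Made-up toy cohort used only to show that the same patients can yield
# different incidence figures under different definitions.
toy_cohort = [
    BleedRecord(0.5, False, False),
    BleedRecord(3.4, False, False),
    BleedRecord(1.0, True, False),
    BleedRecord(5.6, False, False),
]
```

In this toy cohort, one of four patients meets the stricter definition while three of four meet the broader one, which is the same pattern, in exaggerated form, as the discrepancy described above.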

To address these issues, many consensus-based efforts with different intended uses and scopes have been launched. For example, the National Institutes of Health (NIH) has focused on harmonization of data elements by supporting multiple efforts to develop common data elements (CDEs), both for specific disease areas and for general use. The Office of Rare Diseases Research (ORDR), within NIH, has developed CDEs for use in any rare disease registry in conjunction with the Global Rare Diseases Patient Registry (GRDR) being developed through the ORDR. NIH also has launched a repository to facilitate access to CDE resources.²⁴ The Pew Charitable Trusts is also working on a collaborative project with the Duke Clinical Research Institute to develop registry data standards for concepts collected frequently in registries.²⁵

At the outcome measure level, some efforts have focused on standardizing the definition of a single outcome, such as myocardial infarction,²⁶ while others have focused on harmonizing the outcome measure concepts captured across studies in a specific disease area. OMERACT (Outcome Measures in Rheumatology), a long-standing, independent, and international initiative, is an example of the latter type of effort. Over the past 20 years, OMERACT has developed core sets of outcome measures for use in rheumatoid arthritis, osteoarthritis, psoriatic arthritis, fibromyalgia, and other rheumatic disease research through a well-documented, repeatable process that has served as a model for other efforts.²⁷ The International Consortium for Health Outcomes Measurement (ICHOM) also develops standard sets of outcome measures in different clinical areas, with the goal of improving healthcare quality and patient outcomes.²⁸ Finally, some efforts have focused on improving the methodology used to develop and report on consensus-based standards²⁹,³⁰ or increasing access to standards that have already been developed.³¹ A full review of existing efforts is beyond the scope of this chapter; more information can be found in a 2014 literature review on this topic published by AHRQ,² the COMET Initiative database,³¹ and the NIH CDE Repository.³²

The use of established standardized outcome measures or other data standards, when available, is essential so that registries can maximally contribute to evolving medical knowledge. Standard terminologies—and to a greater degree, higher level groupings into core datasets for specific conditions—not only improve efficiency in establishing registries but also promote more effective sharing, combining, or linking of datasets from different sources. Furthermore, the use of well-defined standards for data elements and data structure ensures that the meaning of information captured in different systems is the same. This is critical for “semantic” interoperability between information systems and to maximize the value of registries as tools in learning health systems and a national research infrastructure.

Yet, despite many efforts, many new registries do not use existing standardized measures or data elements. Researchers may not be aware of existing standards, may disagree with the standards or wish to measure different outcomes, or may be uncertain about the quality or value of using the existing standards.¹⁷ A 2016 report from The Pew Charitable Trusts examined barriers to use of existing data standards in patient registries and found that registry stewards frequently have not participated in the development of data standards, resulting in standards that may not meet the needs of registries and their stakeholders.³³

5.2 OMF Standardized Measures

Recently, AHRQ supported an effort to develop minimum sets of harmonized outcome measures in five condition areas using the OMF as a conceptual model.³⁴ These minimum measure sets contain outcome measures that are feasible to capture in registries and routine clinical practice and that are important to providers, patients, payers, and other stakeholders. In addition to narrative definitions, the outcome measures were mapped to standardized terminologies to facilitate consistent collection and implementation within electronic health records and other systems.

For this project, standardized outcome measures were developed for the five condition areas using a reproducible process involving registry sponsors and other stakeholders, such as clinicians and representatives from patient advocacy organizations, payers, funding agencies, regulatory bodies, and research organizations. The five condition areas – atrial fibrillation, asthma, depression, non-small cell lung cancer, and lumbar spondylolisthesis – were selected to represent different types of conditions (chronic, acute, mental health), treatment modalities, care providers and care settings, and patient populations. Within each condition area, workgroups made up of registry sponsors and other stakeholders produced a minimum set of standardized measures that could be captured in future registries as well as in clinical practice in the condition area of interest; workgroups also identified characteristics of the patient, disease, and provider that are necessary to support appropriate risk adjustment for the measures included in the minimum set. Measure sets for atrial fibrillation and asthma³⁵ have been published, and publications describing the other measure sets are forthcoming.

6. Conclusions

The selection of outcome measures is a critical step in designing a patient registry. When selecting and defining outcome measures, consideration should be given to the outcome’s relevance to patients, providers, and other key stakeholders; whether it can be collected accurately and consistently across participating registry sites; and whether it is feasible to capture within the registry scope and budget. The OMF offers a useful model for selecting and defining outcome measures within registries. In addition, the use of standardized outcome measures is encouraged whenever feasible to facilitate consistency in data collection and comparability of results across registries and other efforts in learning health systems.

References for Chapter 4

1. Gliklich RE, Leavy MB, Karl J, et al. A framework for creating standardized outcome measures for patient registries. J Comp Eff Res. 2014;3(5):473-80. PMID: 25350799. DOI: 10.2217/cer.14.38.
2. L&M Policy Research, LLC, Quintiles Outcome. Registry of Patient Registries Outcome Measures Framework: Literature Review Findings and Implications. OMF Literature Review Report. (Prepared under Contract No. 290-2014-00004-C.) AHRQ Publication No. 16-EHC036-EF. Rockville, MD: Agency for Healthcare Research and Quality; September 2016. www.effectivehealthcare.ahrq.gov/reports/final/cfm.
3. Gliklich RE BK, Eisenberg F, Hanna J, Leavy MB, Campion D, Christian JB. Registry of Patient Registries Outcome Measures Framework: Information Model Report. Methods Research Report. (Prepared by L&M Policy Research, LLC, under Contract No. 290-2014-00004-C.) AHRQ Publication No. 17(18)-EHC012-EF. Rockville, MD: Agency for Healthcare Research and Quality; February 2018. www.effectivehealthcare.ahrq.gov/reports/final/cfm. DOI: https://doi.org/10.23970/AHRQROPRMETHODS.
4. Calkins H, Gliklich RE, Leavy MB, et al. Harmonized outcome measures for use in atrial fibrillation patient registries and clinical practice: Endorsed by the Heart Rhythm Society Board of Trustees. Heart Rhythm. 2019;16(1):e3-e16. PMID: 30449519. DOI: 10.1016/j.hrthm.2018.09.021.
5. U.S. Food and Drug Administration. Guidance for Industry. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics. December 2018. https://www.fda.gov/media/71195/download. Accessed June 10, 2019.
6. Gutman SI, Piper M, Grant MD, et al. Progression-Free Survival: What Does It Mean for Psychological Well-Being or Quality of Life? Methods Research Report. (Prepared by the Blue Cross and Blue Shield Association Technology Evaluation Center Evidence-based Practice Center under Contract No. 290-2007-10058-I.) AHRQ Publication No. 13-EHC074-EF. Rockville, MD: Agency for Healthcare Research and Quality. April 2013. www.effectivehealthcare.ahrq.gov/reports/final.cfm.
7. U.S. Food and Drug Administration. Multiple Endpoints in Clinical Trials Guidance for Industry. DRAFT Guidance for Industry. January 2017. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry. Accessed June 10, 2019.
8. Inohara T, Shrader P, Pieper K, et al. Association of Atrial Fibrillation Clinical Phenotypes With Treatment Patterns and Outcomes: A Multicenter Registry Study. JAMA Cardiol. 2018;3(1):54-63. PMID: 29128866. DOI: 10.1001/jamacardio.2017.4665.
9. Salaffi F, Cimmino MA, Leardini G, et al. Disease activity assessment of rheumatoid arthritis in daily practice: validity, internal consistency, reliability and congruency of the Disease Activity Score including 28 joints (DAS28) compared with the Clinical Disease Activity Index (CDAI). Clin Exp Rheumatol. 2009;27(4):552-9. PMID: 19772784.
10. Anderson J, Caplan L, Yazdany J, et al. Rheumatoid arthritis disease activity measures: American College of Rheumatology recommendations for use in clinical practice. Arthritis Care Res (Hoboken). 2012;64(5):640-7. PMID: 22473918. DOI: 10.1002/acr.21649.
11. Cella D, Yount S, Rothrock N, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3-S11. PMID: 17443116. DOI: 10.1097/01.mlr.0000258615.42478.55.
12. Flynn KE, Dombeck CB, DeWitt EM, et al. Using item banks to construct measures of patient reported outcomes in clinical trials: investigator perceptions. Clin Trials. 2008;5(6):575-86. PMID: 19029206. DOI: 10.1177/1740774508098414.
13. Cella D, Riley W, Stone A, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol. 2010;63(11):1179-94. PMID: 20685078. DOI: 10.1016/j.jclinepi.2010.04.011.
14. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539-49. PMID: 20169472. DOI: 10.1007/s11136-010-9606-8.
15. U.S. Food and Drug Administration. Guidance for Industry: Patient Reported Outcome Measures: Use in Medical Product Development and Labeling Claims. December 2009. https://www.fda.gov/media/77832/download. Accessed June 5, 2019.	16. Fontanarosa J TJ, Samson DJ, VanderBeek BL, Schoelles, K. Retinal Prostheses in the Medicare Population. AHRQ Project ID: RPST0515. Rockville, MD: Agency for Healthcare Research and Quality; 2016.
17. Tunis SR, Clarke M, Gorst SL, et al. Improving the relevance and consistency of outcomes in comparative effectiveness research. J Comp Eff Res. 2016;5(2):193-205. PMID: 26930385. DOI: 10.2217/cer-2015-0007.
18. Curtis JR, Jain A, Askling J, et al. A comparison of patient characteristics and outcomes in selected European and U.S. rheumatoid arthritis registries. Semin Arthritis Rheum. 2010;40(1):2-14 e1. PMID: 20674669. DOI: 10.1016/j.semarthrit.2010.03.003.
19. Kirkham JJ, Clarke M, Williamson PR. A methodological approach for assessing the uptake of core outcome sets using ClinicalTrials.gov: findings from a review of randomised controlled trials of rheumatoid arthritis. BMJ. 2017;357:j2262. PMID: 28515234. DOI: 10.1136/bmj.j2262.
20. Maddox TM, Albert NM, Borden WB, et al. The Learning Healthcare System and Cardiovascular Care: A Scientific Statement From the American Heart Association. Circulation. 2017;135(14):e826-e57. PMID: 28254835. DOI: 10.1161/CIR.0000000000000480.
21. Kwok CS, Rao SV, Myint PK, et al. Major bleeding after percutaneous coronary intervention and risk of subsequent mortality: a systematic review and meta-analysis. Open Heart. 2014;1(1):e000021. PMID: 25332786. DOI: 10.1136/openhrt-2013-000021.
22. Steinhubl SR, Kastrati A, Berger PB. Variation in the definitions of bleeding in clinical trials of patients with acute coronary syndromes and undergoing percutaneous coronary interventions and its impact on the apparent safety of antithrombotic drugs. Am Heart J. 2007;154(1):3-11. PMID: 17584547. DOI: 10.1016/j.ahj.2007.04.009.
23. Rubinstein YR, McInnes P. NIH/NCATS/GRDR® Common Data Elements: A leading force for standardized data collection. Contemp Clin Trials. 2015;42:78-80. PMID: 25797358. DOI: 10.1016/j.cct.2015.03.003.
24. Sheehan J, Hirschfeld S, Foster E, et al. Improving the value of clinical research through the use of Common Data Elements. Clin Trials. 2016;13(6):671-6. PMID: 27311638. DOI: 10.1177/1740774516653238.
25. Registry Data Standards. Duke Clinical Research Institute. https://dcri.org/registry-data-standards/. Accessed June 10, 2019.
26. Thygesen K, Alpert JS, Jaffe AS, et al. Third universal definition of myocardial infarction. Journal of the American College of Cardiology. 2012;60(16):1581-98. PMID: 22958960. DOI: 10.1016/j.jacc.2012.08.001.
27. OMERACT. Outcome Measures in Rheumatology. https://omeract.org/. Accessed June 10, 2019.
28. International Consortium for Health Outcomes Measurement (ICHOM). https://www.ichom.org/. Accessed June 10, 2019.
29. Kirkham JJ, Davis K, Altman DG, et al. Core Outcome Set-STAndards for Development: The COS-STAD recommendations. PLoS Med. 2017;14(11):e1002447. PMID: 29145404. DOI: 10.1371/journal.pmed.1002447.
30. Kirkham JJ, Gorst S, Altman DG, et al. Core Outcome Set-STAndards for Reporting: The COS-STAR Statement. PLoS Med. 2016;13(10):e1002148. PMID: 27755541. DOI: 10.1371/journal.pmed.1002148.
31. The COMET (Core Outcome Measures in Effectiveness Trials) Initiative. http://www.comet-initiative.org/. Accessed June 4, 2019.
32. NIH Common Data Elements (CDE) Repository. National Library of Medicine. National Institutes of Health. https://cde.nlm.nih.gov/. Accessed June 4, 2019.
33. Next Steps to Encourage Adoption of Data Standards for Clinical Registries. The Pew Charitable Trusts. November 2016. https://www.pewtrusts.org/en/research-and-analysis/fact-sheets/2016/11/next-steps-to-encourage-adoption-of-data-standards-for-clinical-registries. Accessed June 10, 2019.
34. Agency for Healthcare Research and Quality. Outcome Measures Framework. https://effectivehealthcare.ahrq.gov/topics/registry-of-patient-registries/outcome-measures-framework. Accessed June 10, 2019.
35. Gliklich RE, Castro M, Leavy MB, et al. Harmonized outcome measures for use in asthma patient registries and clinical practice. J Allergy Clin Immunol. 2019 Sep;144(3):671-681.e1. PMID: 30857981. DOI: 10.1016/j.jaci.2019.02.025.

Case Examples for Chapter 4

Case Example 9. Developing and validating a patient-administered questionnaire

Description	The Benign Prostatic Hypertrophy (BPH) Registry and Patient Survey was a multicenter, prospective, observational registry that examined the patient management practices of primary care providers and urologists and assessed patient outcomes, including symptom amelioration and disease progression. The registry collected patient-reported and clinician-reported data at multiple clinical visits.
Sponsor	sanofi-aventis
Year Started	2004
Year Ended	2007
No. of Sites	403
No. of Patients	6,928

Challenge

Lower urinary tract symptoms associated with benign prostatic hyperplasia (LUTS/BPH) have a strong relationship to sexual dysfunction in aging males. Sexual dysfunction includes both erectile dysfunction (ED) and ejaculatory dysfunction (EjD), and healthcare providers treating patients with symptoms of BPH should evaluate men for both types of dysfunction. Providers can use the Male Sexual Health Questionnaire (MSHQ), a validated, self-administered, sexual function scale, to assess dysfunction, but the 25-item scale can be perceived as too long. To assess EjD more efficiently, it was necessary to develop a brief, patient-administered, validated questionnaire.

Proposed Solution

The team used representative, population-based samples to develop a short-form scale for assessing EjD. The team administered the 25-item MSHQ to three populations: a sample of men from the Men’s Sexual Health Population Survey, a subsample of men from the Urban Men’s Health Study, and a sample of men enrolled in the observational registry.

Using the data from the sample populations, the team conducted a series of analyses to develop the scale. The team used factor analysis to select the items from the scale that had the highest correlations with the principal factors. Using conventional validation methods, the team examined reliability (both internal consistency and test-retest repeatability). To assess validity, the team used tests of repeatability and discriminant/convergent validity to determine that the short form successfully discriminated between men with no to mild LUTS/BPH and those with moderate to severe LUTS/BPH. Lastly, the team examined the correlation between the 7-item ejaculation domain of the 25-item MSHQ and the new short-form scale, using data from the observational registry.
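
As an illustration of the internal-consistency check described above, the following minimal Python sketch computes Cronbach's alpha, a commonly used internal-consistency statistic. The case example does not state which statistic the team used, so the choice of Cronbach's alpha, the function name `cronbach_alpha`, and the response values are assumptions made purely for illustration; the data are invented and are not registry data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Return Cronbach's alpha for a respondents-by-items score table.

    item_scores: list of rows, one per respondent; each row holds that
    respondent's score on each of the k items (hypothetical layout).
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]

    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])

    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses from five respondents to a four-item short form.
hypothetical_responses = [
    [5, 4, 5, 4],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
    [4, 4, 3, 5],
    [2, 1, 2, 1],
]

alpha = cronbach_alpha(hypothetical_responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is what a short form measuring a single construct would be expected to show. A full validation would also cover test-retest repeatability and validity, which this sketch does not attempt.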

Results

Based on the results of these analyses, the team selected three ejaculatory function items and one ejaculation bother item for inclusion in the new MSHQ-EjD Short Form. The new scale demonstrates a high degree of internal consistency and reliability, and it discriminates between men with no to mild LUTS/BPH and those with moderate to severe LUTS/BPH.

Key Point

Developing new instruments for collecting patient-reported outcomes requires careful testing of the new tool in representative populations to ensure validity and reliability. Registries can provide a large sample population for validating new instruments.

For More Information

Rosen RC, Catania JA, Althof SE, et al. Development and validation of four-item version of Male Sexual Health Questionnaire to assess ejaculatory dysfunction. Urology. 2007;69(5):805–9. PMID: 17482908. DOI: 10.1016/j.urology.2007.02.036.

Rosen R, Altwein J, Boyle P, et al. Lower urinary tract symptoms and male sexual dysfunction: the Multinational Survey of the Aging Male. Eur Urol. 2003;44:637–49. PMID: 14644114.

Case Example 10. Using validated measures to collect patient-reported outcomes

Description	The Study to Help Improve Early evaluation and management of risk factors Leading to Diabetes (SHIELD) is a household panel registry designed to assess the prevalence and incidence of diabetes mellitus and cardiovascular disease; disease burden and progression; risk predictors; and knowledge, attitudes, and behaviors regarding health in the U.S. population. The study involves three distinct phases: an initial screening survey, a baseline survey, and yearly followup surveys for 5 years.
Sponsor	AstraZeneca Pharmaceuticals LP
Year Started	2004
Year Ended	2009
No. of Sites	Not applicable
No. of Patients	More than 211,000 individuals were included in the screening survey; approximately 15,000 individuals were followed for 5 years.

Challenge

The SHIELD registry used survey methodologies to collect health information from a large sample of adults. The goal of the study was to capture participants’ perspectives and views on diabetes and cardiovascular disease, risk factors for the diseases, and burden of the diseases. The study investigators, noting that treatment for diabetes and cardiovascular disease relies heavily on patient self-management, felt that it was particularly important to gather information on activities, weight control, health attitudes, quality of life, and other topics directly from the participant, without a physician as an intermediary. The investigators also wanted to follow participants over time to better understand disease progression and changes in health behaviors or activities.

To achieve the study goals, the registry needed to collect health-related data directly from participants in such a way that the data would be reliable, valid, and comparable across participant groups and over time.

Proposed Solution

The investigators decided to use validated patient-reported outcome measures (PROs) to collect information on health status and behaviors. The PROs allowed the data from the registry to be compared with data collected in other registries to assess the generalizability of data on the study population. In addition, the PROs already took into account issues such as recall bias and interpretability of the questions, and self-administered instruments eliminated the possibility of introducing interviewer bias.

The registry included seven PROs: (1) the 12-item Short Form Health Survey (SF-12) and European Quality of Life (EuroQoL) EQ-5D instrument, to assess health-related quality of life; (2) the Sheehan Disability Scale, to assess the level of disruption in work, social life, and family/home life; (3) the 9-item Patient Health Questionnaire, to assess depression; (4) the Work Productivity and Activity Impairment Questionnaire: General Health, to assess work productivity and absenteeism; (5) the Diet and Health Knowledge Survey; (6) the Press-Ganey Satisfaction questionnaire; and (7) the International Physical Activity Questionnaire, to assess health-related physical activity and sedentary behaviors.

The investigators considered many factors, such as length, ease of use, format, and scoring system, when selecting the PROs to include in the survey. For example, a major reason for selecting the SF-12 rather than the SF-36 as a measure of quality of life was the length of the forms (12 vs. 36 items). The survey was entirely paper-based, with participants mailing back completed forms. The validated scoring algorithms were used to account for missing or illegible values on the completed forms. All participants were able to read and write English.

Results

The registry had a generally high response rate for the surveys. The response rates were 63.7 percent for the screening survey, 71.8 percent for the baseline survey, and between 71 and 75 percent for the annual surveys. In terms of missing data, participants who returned the survey forms tended to complete all of the questions in the appropriate manner. However, the registry was missing longitudinal data from some participants. For example, a participant may have returned the completed form in 2005, failed to return the form in 2006, and returned the form again in 2007. In such a case, the investigators must account for the missing 2006 values when conducting longitudinal analyses. The data from the survey were sufficient to support comparisons over time and across participant groups, leading to several publications.
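
The intermittent pattern described above (a form returned in 2005, missed in 2006, and returned again in 2007) can be flagged before a longitudinal analysis is run. The following minimal Python sketch shows one way to do so. It is not the SHIELD investigators' method; the function name `classify_followup`, the category labels, and the participant records are hypothetical and invented for illustration.

```python
def classify_followup(returned_years, survey_years):
    """Classify a participant's pattern of returned annual surveys.

    returned_years: years in which the participant returned a form.
    survey_years: all years in which a survey was sent, in order.
    Labels (hypothetical): "complete", "no response", "intermittent"
    (a missed wave followed by a later returned wave), and "dropout"
    (every missed wave comes after the last returned wave).
    """
    returned = set(returned_years)
    flags = [year in returned for year in survey_years]

    if all(flags):
        return "complete"
    if not any(flags):
        return "no response"

    # Position of the last wave that was returned.
    last_returned = max(i for i, flag in enumerate(flags) if flag)

    # Any missed wave up to that position is a gap inside the followup.
    if not all(flags[: last_returned + 1]):
        return "intermittent"
    return "dropout"


# Invented participant records for three annual survey waves.
survey_years = [2005, 2006, 2007]
hypothetical_participants = {
    "A": [2005, 2006, 2007],
    "B": [2005, 2007],
    "C": [2005],
}

patterns = {
    pid: classify_followup(years, survey_years)
    for pid, years in hypothetical_participants.items()
}
print(patterns)
```

Separating intermittent gaps from dropout matters because the two patterns may call for different analytic handling; the choice of method for handling the missing values is left to the analysis plan.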

Key Point

Utilization of standardized, validated instruments in a registry can offer many benefits, including enhanced scientific rigor, the ability to compare patient views over time, and the ability to compare registry data with data from other sources to assess the representativeness of the registry population. It should be noted that significant initial planning is necessary to identify appropriate PROs, obtain the necessary permissions, and include them in a registry. Issues with missing data must be considered in the planning phases for a registry. This registry considered missing data within returned survey questionnaires. In addition, an acceptable followup rate should be stated a priori so that response rates can be better interpreted with respect to their potential for introducing bias.

For More Information

Gavin JR III, Rodbard HW, Fox KM. Association of overweight and obesity with health status, weight management, and exercise behaviors among individuals with type 2 diabetes mellitus or with cardiometabolic risk factors. Risk Management and Healthcare Policy. 2009;2:1–7. PMID: 22312203. DOI: 10.2147/RMHP.S4562.

Grandy S, Chapman RH, Fox KM, et al. Quality of life and depression of people living with type 2 diabetes mellitus and those at low and high risk for type 2 diabetes: findings from the Study to Help Improve Early evaluation and management of risk factors Leading to Diabetes (SHIELD). Int J Clin Pract. 2008;62:562–8. PMID: 18266708. DOI: 10.1111/j.1742-1241.2008.01703.x.

Grandy S, Fox KM. EQ-5D visual analog scale and utility index values in individuals with diabetes and at risk for diabetes: findings from the Study to Help Improve Early evaluation and management of risk factors Leading to Diabetes (SHIELD). Health Qual Life Outcomes. 2008;6:18. PMID: 18304340. DOI: 10.1186/1477-7525-6-18.

Grandy S, Fox KM, Bazata DD, et al. Association of self-reported weight change and quality of life, and exercise and weight management behaviors among adults with type 2 diabetes mellitus. Cardiol Res Pract. 2012;2012:892564. PMID: 22645696. DOI: 10.1155/2012/892564.

Rodbard HW, Bays HE, Gavin JR III, et al. Rate and risk predictors for development of self-reported type 2 diabetes mellitus over a 5-year period: the SHIELD study. Int J Clin Pract. 2012;66:684–91. PMID: 22698420. DOI: 10.1111/j.1742-1241.2012.02952.x.