How will AI be used in health care settings?
Artificial intelligence (AI) shows tremendous promise for applications in health care. Tools such as machine learning algorithms, artificial neural networks, and generative AI (e.g., Large Language Models) have the potential to aid with tasks such as diagnosis, treatment planning, and resource management. Advocates have suggested that these tools could benefit large numbers of people by increasing access to health care services (especially for populations that are currently underserved), reducing costs, and improving quality of care.
This enthusiasm has driven the burgeoning development and trial application of AI in health care by some of the largest players in the tech industry. To give just two examples, Google Research has been rapidly testing and improving upon its “Med-PaLM” tool, and NVIDIA recently announced a partnership with Hippocratic AI that aims to deploy virtual health care assistants for a variety of tasks to address a current shortfall in the health care workforce.
What are some challenges or potential negative consequences to using AI in health care?
Technology adoption can happen rapidly, with tools going from prototypes used by a small number of researchers to products affecting the lives of millions or even billions of people. Given the significant impact health care system changes could have on Americans’ health as well as on the U.S. economy, it is essential to identify potential pitfalls before scale-up takes place and carefully consider policy actions that can address them.
One area of concern arises from the recognition that the ultimate impact of AI on health outcomes will be shaped not only by the sophistication of the technological tools themselves but also by external “human factors.” Broadly speaking, human factors could blunt the positive impacts of AI tools in health care—or even introduce unintended, negative consequences—in two ways:
- If developers train AI tools with data that don’t sufficiently mirror diversity in the populations in which they will be deployed. Even tools that are effective in the aggregate could create disparate outcomes. For example, if the datasets used to train AI have gaps, they can cause AI to provide responses that are lower quality for some users and situations. This might lead to the tool systematically providing less accurate recommendations for some groups of users or experiencing “catastrophic failures” more frequently for some groups, such as failure to identify symptoms in time for effective treatment or even recommending courses of treatment that could result in harm.
- If patterns of AI use systematically differ across groups. Many potential users may initially be reluctant to trust AI for consequential decisions that affect their health. Attitudes may differ within the population based on attributes such as age and familiarity with technology, which could affect who uses AI tools, who understands and interprets the AI’s output, and who adheres to treatment recommendations. Further, people’s impressions of AI health care tools will be shaped over time based on their own experiences and what they learn from others.
In recent research, we used simulation modeling to study a large range of hypothetical populations of users and AI health care tool specifications. We found that social conditions such as initial attitudes toward AI tools within a population and how people change their attitudes over time can potentially:
- Lead to a modestly accurate AI tool having a negative impact on population health. This can occur because people’s experiences with an AI tool may be filtered through their expectations and then shared with others. For example, if an AI tool’s capabilities are objectively positive (in expectation, the AI won’t give recommendations that are harmful or completely ineffective) but sufficiently lower than expectations, users who are disappointed will lose trust in the tool. This could make them less likely to seek future treatment, or to adhere to recommendations if they do seek it, and could lead them to pass along negative perceptions of the tool to friends, family, and others with whom they interact.
- Create health disparities even after the introduction of a high-performing and unbiased AI tool (i.e., one that performs equally well for all users). Specifically, when there are initial differences between groups within the population in their trust of AI-based health care (for example, because of one group’s systematically negative previous experiences with health care or because the AI tool was poorly communicated to one group), differential use patterns alone can translate into meaningful differences in health outcomes across groups. These use patterns can also exacerbate differential effects on health across groups when AI training deficiencies cause a tool to provide better quality recommendations for some users than others.
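To make the two mechanisms above concrete, here is a deliberately simplified sketch. It is not the simulation model used in the research; the `Group` fields, the `simulate` function, the trust-update rule, and every parameter value are illustrative assumptions. It tracks only the share of each group willing to use a tool and lets that share rise or fall depending on whether the tool beats or falls short of what users expected.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    trust: float        # assumed: share of the group willing to use the tool, in [0, 1]
    expectation: float  # assumed: benefit that users expect from each use


def simulate(groups, tool_quality, periods=10, learning_rate=0.5):
    """Toy model: return (cumulative benefit per capita, final trust) by group.

    The tool is unbiased here: every user receives the same `tool_quality`.
    """
    totals = {g.name: 0.0 for g in groups}
    trust = {g.name: g.trust for g in groups}
    for _ in range(periods):
        for g in groups:
            t = trust[g.name]
            # Only the trusting share of the group uses the tool and benefits.
            totals[g.name] += t * tool_quality
            # Experience is judged against expectations, not in absolute terms.
            surprise = tool_quality - g.expectation
            # Scaled by t because only current users have experiences to share.
            updated = t + learning_rate * t * surprise
            trust[g.name] = min(1.0, max(0.0, updated))
    return totals, trust


groups = [Group("high_trust", 0.8, 0.6), Group("low_trust", 0.4, 0.6)]
totals, final_trust = simulate(groups, tool_quality=0.6)
print(totals)       # the low-trust group accumulates less benefit from the same tool
print(final_trust)  # trust is unchanged because quality matches expectations
```

In this toy setup, an identical tool yields half the cumulative benefit for the group that starts with half the trust. Raising `expectation` above `tool_quality` makes trust shrink every period even though the tool’s quality is positive, which mirrors the disappointment effect described in the first bullet.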
Barriers to positive health impacts associated with systematic and shifting use patterns are largely beyond individual developers’ direct control but can be overcome with strategically designed policies and practices.
What could a regulatory framework for AI in health care look like?
Disregarding how human factors intersect with AI-powered health care tools can create outcomes that are costly in terms of life, health, and resources. There is also the potential that without careful oversight and forethought, AI tools can maintain or exacerbate existing health disparities or even introduce new ones. Guarding against negative consequences will require specific policies and ongoing, coordinated action that goes beyond the usual scope of individual product development. Based on our research, we suggest that any regulatory framework for AI in health care should accomplish three aims:
- Ensure that AI tools are rigorously tested before they are made fully available to the public and are subject to regular scrutiny afterward. Those developing AI tools for use in health care should carefully consider whether the training data are matched to the tasks that the tools will perform and representative of the full population of eventual users. Characteristics of users to consider include (but are certainly not limited to) age, gender, culture, ethnicity, socioeconomic status, education, and language fluency. Policies should encourage and support developers in investing time and resources into pre- and post-launch assessments, including:
  - pilot tests to assess performance across a wide variety of groups that might experience disparate impact before large-scale application
  - monitoring whether and to what extent disparate use patterns and outcomes are observed after release
  - identifying appropriate corrective action if issues are found.
- Require that users be clearly informed about what tools can do and what they cannot. Neither health care workers nor patients are likely to have extensive training or sophisticated understanding of the technical underpinnings of AI tools. It will be essential that plain-language use instructions, cautionary warnings, or other features designed to inform appropriate application boundaries are built into tools. Without these features, users’ expectations of AI capabilities might be inaccurate, with negative effects on health outcomes. For example, a recent report outlines how overreliance on AI tools by inexperienced mushroom foragers has led to cases of poisoning; it is easy to imagine how this might be a harbinger of patients misdiagnosing themselves with health care tools that are made publicly available and missing critical treatment or advocating for treatment that is contraindicated. Similarly, tools used by health care professionals should be supported by rigorous use protocols. Although advanced tools will likely provide accurate guidance an overwhelming majority of the time, they can also experience catastrophic failures (such as those referred to as “hallucinations” in the AI field), so it is critical for trained human users to be in the loop when making key decisions.
- Proactively protect against medical misinformation. False or misleading claims about health and health care, whether the result of ignorance or malicious intent, have proliferated in digital spaces and become harder for the average person to distinguish from reliable information. This type of misinformation about health care AI tools presents a serious threat, potentially leading to mistrust or misapplication of these tools. To discourage misinformation, guardrails should be put in place to ensure consistent transparency about what data are used and how continuous verification of training data accuracy takes place.
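As one illustration of the pilot testing and post-release monitoring described under the first aim, a developer or auditor might compare how often a tool’s recommendations are correct for each group of users. The sketch below is a minimal example under stated assumptions: the record format, the function names `accuracy_by_group` and `flag_gaps`, and the 0.05 gap threshold are all hypothetical, and a real assessment would need larger samples, uncertainty estimates, and clinically meaningful outcome measures.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute the share of correct recommendations per group.

    `records` is an iterable of (group_label, was_correct) pairs; this
    format is an assumption made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, was_correct in records:
        total[group] += 1
        correct[group] += int(was_correct)
    return {group: correct[group] / total[group] for group in total}


def flag_gaps(rates, max_gap=0.05):
    """Return groups whose accuracy trails the best group by more than max_gap."""
    best = max(rates.values())
    return sorted(group for group, rate in rates.items() if best - rate > max_gap)


# Hypothetical pilot data: 10 cases per group.
pilot = [("A", True)] * 9 + [("A", False)] + [("B", True)] * 7 + [("B", False)] * 3
rates = accuracy_by_group(pilot)
print(rates)
print(flag_gaps(rates))
```

Running the same check on data collected after release would show whether gaps seen in a pilot persist, narrow, or widen once the tool is in wider use.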
How can regulation of AI in health care keep pace with rapidly changing conditions?
In addition to developers of tools themselves, there are important opportunities for unaffiliated researchers to study the impact of AI health care tools as they are introduced and recommend adjustments to any regulatory framework. Two examples of what this work might contribute are:
- Social scientists can learn more about how people think about and engage with AI tools, as well as how perceptions and behaviors change over time. Rigorous data collection and qualitative and quantitative analyses can shed light on these questions, improving understanding of how individuals, communities, and society adapt to shifts in the health care landscape.
- Systems scientists can consider the co-evolution of AI tools and human behavior over time. Whether building on recent research or working alongside it, systems science can be used to explore the complex interactions that determine how multiple health care AI tools deployed across diverse settings might affect long-term health trends. Using longitudinal data collected as AI tools come into widespread use, prospective simulation models can provide timely guidance on how policies might need to be course-corrected.
Acknowledgements and disclosures
The authors gratefully acknowledge helpful contributions of Carol Graham, Bobby Innes-Gold, and Chris Miller.
The Brookings Institution is financed through the support of a diverse array of foundations, corporations, governments, and individuals, as well as an endowment. A list of donors can be found in our annual reports, which are published online. The findings, interpretations, and conclusions in this report are solely those of its author(s) and are not influenced by any donation.
Commentary
Why and how should we regulate the use of AI in health care?
September 5, 2024