By Sarah C. Baldwin
Whether comparing patient outcomes from different treatment approaches or identifying the consequences of broader health policies, researchers must grapple not only with vast amounts of data but also with a multitude of variables that might have a bearing on their question of interest. At Heller, machine learning is being harnessed to address that complexity.
In 2024, the National Institutes of Health (NIH) funded a one-year extension to the Substance Use and Psychological Injury Combat study, or SUPIC, a multiyear project based at Heller’s Institute for Behavioral Health (IBH). Led by principal investigators Mary Jo Larson, PhD’92, senior scientist at IBH, and Rachel Sayko Adams, PhD’13, formerly senior scientist at IBH, SUPIC studies pain management treatments and health outcomes among military service members.
Consolidating data
The extension project uses machine learning to summarize enormous amounts of county-level data on social determinants of health (SDOH) — race, economic status, education, access to health care and environmental pollution — into subgroups, or clusters of counties. Those clusters can then be applied to other data the team is looking at, resulting in much more nuanced findings about, for example, how the impact of an intervention might vary in communities with different patterns of SDOH characteristics.
William Crown, formerly a distinguished senior scientist at Heller, who conceived the project, worked with Nick Huntington, an IBH-based research scientist who specializes in statistical methods. Together, they used machine-learning methods to consolidate data from the more than 3,000 counties in the U.S. and over 600 variables into cohesive clusters, making it possible to control more quickly and easily for differences among the places where subjects in a particular study live.
Controlling for other factors “strengthens causal inference,” Huntington says, referring to researchers’ ability to claim that a change in an outcome (e.g., health status) is actually caused by a change in the variable under study (e.g., a treatment), rather than reflecting some other factor.
In addition to providing the military with actionable information about which treatments work better for whom, the team contributed their dataset to the Inter-university Consortium for Political and Social Research, or ICPSR, a publicly available data archive of research in the social and behavioral sciences. The team also published a paper in Therapeutic Advances in Drug Safety that Crown says is like a tutorial for researchers on creating these clusters: “This work can benefit anyone who is interested in the effectiveness of a particular drug or medical intervention.”
Helping to find causation
Rajan Sonik, PhD’17, associate research professor at Heller, is leading three NIH-funded studies that explore how social welfare policies affect health care outcomes. In particular, Sonik studies the effects of changes to the Supplemental Nutrition Assistance Program (SNAP), the nation’s largest food-benefit program for low-income families.
To do so, Sonik must confront the interconnectedness of the myriad policies affecting this population at the same time. “When you’re isolating the effect of any one policy decision, accounting for concurrent policy changes is a critical challenge,” he explains. He and Crown are using machine-learning tools to identify clusters within a database of pandemic-era policies that his team is creating.
Project findings recently suggested a possible relationship between a loss of SNAP benefits and a loss of health insurance, independent of income changes. “This implies connections between benefit programs beyond overlaps in eligibility rules,” Sonik says. “So if you’re cutting multiple supports at once, like SNAP and Medicaid, effects may be even more far-reaching than intended.”
Uncovering the unintended consequences of any policy decision, he adds, “should be valuable to policymakers as they think about the full gamut of potential effects of their choices.”