Causal models can extend what we learn from trials

by Adi Rotem

Jun 10, 2026

8 Min Read

Blog Post

White Papers

Atezolizumab Trial Simulator

Atezolizumab Trial Simulator: Treated vs. Simulated Control

Cohort

Immune-phenotype proportions

Set two of the three; the third is computed so they sum to 1.

Desert (immune cold) 0.30

Excluded (T cells in stroma) 0.40

Inflamed = 0.30 (= 1 − desert − excluded)

Tumour mutation burden

scale_M (median, mut/MB) 5.0

Implied mean TMB ≈ 6.9 mut/MB

n = 1000 · 50/50 randomised T=0/T=1 · seed 0 · all other params fixed at IMvigor210-calibrated defaults

Control response

—

— responders / —

Treated response

—

— responders / —

Δ Response

—

treated − control

Model: causal DAG calibrated to IMvigor210 (Mariathasan et al., Nature 2018; Balar et al., Lancet 2017).

Oncology Phase 2 studies are often small, split across biomarker-defined subgroups, and frequently single-arm, leaving a fragmented view of disease biology and treatment response. To make sense of that fragmentation, we need a way to connect patient measurements, biological mechanisms, treatment, and outcomes. Causal models provide a framework for connecting data to biological mechanisms and exploring why patients respond differently, how outcomes vary across cohorts, and what might happen under alternative biological or treatment conditions. That missing layer can fundamentally change how the results are interpreted.

Why a model, and why causal

To make this concrete, we focus on a case study of IMvigor210, a real single-arm Phase 2 trial of atezolizumab in metastatic urothelial carcinoma, an advanced form of bladder cancer. Atezolizumab is an immunotherapy that helps T cells attack tumors by blocking PD-L1, a signal tumors can use to dampen the immune response.

All patients in the trial received atezolizumab, so the study does not directly show what would have happened without treatment. What makes it useful for this example is that it collected rich pre-treatment biomarker data, giving us a detailed snapshot of each patient’s tumor biology before therapy. We use those measurements to calibrate the causal model.

To estimate a treatment effect we would ideally want to compare each patient's outcome under the drug to what would have happened to them without it. The single-arm structure gives us the first; it gives us nothing about the second.

A standard predictive model trained on the trial would not close that gap. A regression of response on baseline biomarkers learns correlations between features and the observed (treated) outcome — it has nothing to say about what those same patients would have looked like untreated, because no untreated patients appear in its training data. A causal model is different: instead of leveraging associations for predictions, it encodes hypotheses about the mechanism that generates the data — which variable depends causally on which. Once the mechanism is laid out, the model can be interrogated under interventions it never observed. For example, one may ask how the same biological system behaves under alternative treatment conditions, including the hypothetical no treatment scenario. Such analyses do not reveal the true counterfactual, but they provide a mechanistically grounded prediction conditional on the assumptions encoded in the model.

Causal models are powerful, but they come with an important caveat. A causal graph alone does not determine a credible counterfactual. The difficulty lies in specifying the quantitative relationships along each edge, validating that the resulting model generates biologically plausible patient profiles and reproduces key patterns in the observed data, and identifying which biological processes are directly affected by treatment. When simulating a counterfactual scenario, for example a control arm, it is not enough to switch a treatment indicator from on to off; one must also account for how treatment changes the parameters governing tumour growth, immune activity, or other downstream processes. Those assumptions must be justified biologically and tested against available evidence.

In the worked example below, we use the model to ask what outcomes might have looked like for the same trial population under a hypothetical no-treatment scenario. The example is intended as an illustration of the process rather than a fully validated virtual control arm. Its purpose is to show how biological knowledge can be encoded into a causal model, calibrated against patient-level data, and then used to explore counterfactual scenarios. In practice, the credibility of any estimated control arm depends on careful model validation and scrutiny of the treatment-dependent assumptions that drive the predictions.

A Probabilistic Graph: How Biology is Encoded in the Model

Building the model starts before any equations are written. The first step is a focused literature review around the specific causal question: for this drug, in this disease setting, which biological processes are most likely to determine response? The goal is not to include every known feature of tumour immunology. A useful model should be detailed enough to capture the mechanisms that matter for the question, but simple enough that the main assumptions remain visible.

At the first pass we concentrate on building a scaffold that includes the treatment, treatment target and biological outcomes, and we add details and nodes as needed. In the example below, the treatment is atezolizumab, an anti-PD-L1 checkpoint inhibitor; its target is PD-L1 on the tumour cell surface; the outcome we model is tumour growth C after treatment.

After identifying the treatment, treatment target and biological outcome, we consider which biological entities are mediators leading from the treatment to the biological outcome, and add them as nodes. Each candidate node is chosen deliberately. Some are included because they are plausible drivers of response; others are included because they are measured in the trial and help anchor the model to data. In our example, the mechanism connecting the treatment to the outcome is the PD-1 / PD-L1 brake: CD8 T-cells can kill tumour cells, but tumours often express PD-L1, which binds PD-1 on T-cells and switches off their killing function. Atezolizumab blocks that interaction and restores T-cell function — but only when a T-cell is present and active.

But the same drug against the same target produces very different outcomes patient to patient, depending on the pre-treatment immune state of the tumour. To capture that patient-to-patient variability, the model includes the pre-treatment biomarkers from Mariathasan et al. (Nature 2018) that determines whether PD-L1 blockade reduces tumor growth. CD8 T-effector activity measures whether killer T-cells are present and active to begin with — the drug has nothing to unleash if they're absent. Tumour mutation burden counts how many neoantigens those T-cells might recognise as foreign — more mutations means a more visible tumour. TGF-β signalling can physically exclude T-cells from the tumour by building a stromal barrier — the brake-release doesn't help if T-cells can't reach the cancer cells. PD-L1 expression measures how much brake there is to release in the first place. Mariathasan also groups patients into three immune phenotypes that summarise the spatial pattern of T-cell infiltration: inflamed tumours have CD8 T-cells already inside the tumour, ready to attack, and respond best to checkpoint blockade. Excluded tumours have T-cells trapped outside the tumour by TGF-β-driven stromal barriers, so brake-release doesn't help much. Desert tumours have no T-cell infiltration at all, leaving the drug with nothing to act on.

The mechanism leads us to add two upstream inputs — immune phenotype and tumour mutation burden — which drive three observable pre-treatment markers PD-L1 expression, CD8 score, TGF-β level. These markers then combine with the treatment indicator T to determine post-treatment tumour growth C. The parameters of the model, such as the strength and direction of each effect, is calibrated against the patient-level biomarker data from IMvigor210 Cohort 2.

When selecting the nodes, each measurement is scrutinized. A biomarker is not automatically the biological process it is meant to represent. PD-L1 staining, CD8 expression, TGF-β signatures, and tumour mutation burden are all operational measurements, produced by specific assays, cutoffs, sampling procedures, and tissue contexts. For each one, the model builder has to ask: does this measurement directly reflect the biological node we care about, or does the experimental procedure introduce another source of variation that should be represented in the graph? In some cases, the assay can be treated as a reasonable proxy. In others, the measurement process itself may warrant an additional edge or node.

Before using the model to explore counterfactuals, we check that it reproduces the data it was built from. We simulate a cohort the size of the trial population and compare it against the real patients, biomarker by biomarker (figure). The simulated distributions match the observed ones, and each biomarker varies across the three immune phenotypes the way it does in the trial — CD8 activity and PD-L1 rising from desert to inflamed tumours, TGF-β lowest in desert. The simulated cohort is distributed like the real one, confirming the model is faithfully calibrated to the patient-level data before we ask it anything counterfactual.

Simulated cohort — Left: baseline biomarkers from a simulated cohort. Right: the same five biomarkers Top: baseline biomarkers from a simulated cohort. Bottom: the same five biomarkers in the observed IMvigor210 data (Mariathasan et al., Nature 2018), shown for comparison.

Observed — IMvigor210 Cohort 2 — Left: baseline biomarkers from a simulated cohort. Right: the same five biomarkers Top: baseline biomarkers from a simulated cohort. Bottom: the same five biomarkers in the observed IMvigor210 data (Mariathasan et al., Nature 2018), shown for comparison.

With the graph in place, the model can be used to explore alternative interventions. The same simulated patient can be propagated through the graph under different treatment assumptions, biological states, or parameter settings. One possible analysis is to compare outcomes under T = 1 and T = 0, generating a model-based estimate of how treatment changes the predicted outcome for that patient. Other analyses might examine the consequences of changing biomarker levels, modifying specific biological pathways, or testing competing mechanistic hypotheses. The goal is not to replace the experiment, but to make the most informed estimate possible about the missing arm when no untreated comparison was actually observed. The quality of that estimate depends on how well the underlying biology is understood, but it allows the comparison to be grounded in mechanism and available data.

Try It Yourself

The figure embeds the simulator. The default cohort matches the empirical biomarker profile of the trial data; the plot shows one possible counterfactual analysis: the distribution of tumour growth `C` for the simulated treated and control arms side by side. The gap between them — in medians, 50% intervals, and response rates — is the model's estimate of the treatment effect for that cohort.

Move the sliders and both arms re-run. Push the cohort towards immune-desert and the gap narrows; towards inflamed and it widens; raise median tumor burden and the treated arm pulls further away.

The simulated treatment effect is not evidence of efficacy — only an experiment delivers that. What the model offers is the best prediction the available evidence permits, combining patient-level data from past trials with biological mechanisms. This provides more rigorous interpretation than either a model or data can alone. That makes this approach useful for designing follow-up trials, interpreting published response rates, and surfacing the biological assumptions driving the predictions — not for substituting an experiment with a randomized control arm.

The figure above illustrates one possible application. More broadly, causal models bring biological mechanisms and patient-level data together — carrying knowledge across trials, answering questions the trials themselves never directly asked, and pointing to where the next experiment is most likely to pay off.

References

Take the Next Step

Want to Learn More?

For more information including detailed platform application guides, live platform demos, or additional case studies, please contact a member of our team

Take the Next Step

Looking to learn more?

Whether you're looking for comprehensive guides, expert tips, or detailed tutorials, we have exactly what you need to succeed.

Talk to Our Team

Featured Blog Posts

Apr 3, 2026

6 Min Read

Blog Post

What the latest FDA updates mean for the future of Clinical Trial Design

Recent FDA updates are signaling a meaningful shift in how clinical trials can be designed, analyzed, and approved. A recent PhaseV webinar brought together industry leaders to unpack these changes—from Bayesian methods to AI-enabled review processes—and what they mean for sponsors navigating an increasingly complex regulatory landscape.

Apr 8, 2026

6 Min Read

Blog Post

We simulated 60 million trials. Here's what we found.

Adaptive trials are not new, but what’s changing is our ability to quantify their value with precision. By combining rigorous statistical frameworks with large-scale simulation, we can now make more informed decisions before a single patient is enrolled.In an industry where every trial counts, that’s a powerful advantage.And as this analysis shows, when you let trials learn and adapt, the results speak for themselves.

Apr 16, 2026

3 Min Read

Blog Post

Doing Good While Doing Business

In clinical development, we spend a lot of time talking about data, trial design, endpoints, and probability of success. It’s the language of how we make progress. But for many of us, the reason we’re here isn’t abstract. It’s personal.

Featured White Papers

11 Min Read

White Papers

Advancing Oncology Trials with Covariate Adjustment

PhaseV and iOMEDICO combine causal inference and rich oncology RWD to enable advanced covariate adjustment, reducing variability and bias, enhancing trial power, and accelerating accurate treatment effect estimation in oncology clinical trials

2 Min Read

White Papers

Optimizing Clinical Trial Design: A Novel Adaptive Enrichment Design for Heterogeneous Diseases

PhaseV’s novel adaptive enrichment design dynamically identifies responder subgroups during the trial while rigorously controlling type I error and avoiding selection bias, delivering superior efficiency, reduced risk, and faster drug development

8 Min Read

White Papers

Unlocking the Future of Clinical Research: The Transformative Power of Adaptive Trials

Adaptive clinical trials accelerate drug development by enabling data-driven design changes, improving efficiency, patient outcomes, and success rates. PhaseV’s platform makes even the most complex designs accessible and actionable for clinical teams.

Causal models can extend what we learn from trials

Atezolizumab Trial Simulator: Treated vs. Simulated Control

Cohort

Why a model, and why causal

A Probabilistic Graph: How Biology is Encoded in the Model

Try It Yourself

References

Want to Learn More?

Looking to learn more?

Featured Blog Posts

What the latest FDA updates mean for the future of Clinical Trial Design

We simulated 60 million trials. Here's what we found.

Doing Good While Doing Business

Featured White Papers

Advancing Oncology Trials with Covariate Adjustment

Optimizing Clinical Trial Design: A Novel Adaptive Enrichment Design for Heterogeneous Diseases

Unlocking the Future of Clinical Research: The Transformative Power of Adaptive Trials

Frequently Asked Questions