Count data example • outstandR

Introduction

Example analysis

First, let us load necessary packages.

library(boot)      # non-parametric bootstrap in MAIC and ML G-computation
library(copula)    # simulating BC covariates from Gaussian copula
library(rstanarm)  # fit outcome regression, draw outcomes in Bayesian G-computation
library(outstandR)
library(simcovariates)

Data

First, use the gen_data() function available in the simcovariates package.

library(dplyr)
library(MASS)

N <- 200
allocation <- 2/3      # active treatment vs. placebo allocation ratio (2:1)
b_trt <- log(0.17)     # conditional effect of active treatment vs. common comparator
b_X <- -log(0.5)       # conditional effect of each prognostic variable
b_EM <- -log(0.67)     # conditional interaction effect of each effect modifier
meanX_AC <- c(0.45, 0.45)       # mean of normally-distributed covariate in AC trial
meanX_BC <- c(0.6, 0.6)         # mean of each normally-distributed covariate in BC
meanX_EM_AC <- c(0.45, 0.45)    # mean of normally-distributed EM covariate in AC trial
meanX_EM_BC <- c(0.6, 0.6)      # mean of each normally-distributed EM covariate in BC
sdX <- c(0.4, 0.4)     # standard deviation of each covariate (same for AC and BC)
sdX_EM <- c(0.4, 0.4)  # standard deviation of each EM covariate
corX <- 0.2            # covariate correlation coefficient  
b_0 <- -0.6            # baseline intercept coefficient  ##TODO: fixed value

ipd_trial <- gen_data(N, b_trt, b_X, b_EM, b_0,
                      meanX_AC, sdX, 
                      meanX_EM_AC, sdX_EM, 
                      corX, allocation,
                      family = poisson(link = "log"))

ipd_trial$trt <- factor(ipd_trial$trt, labels = c("C", "A"))

Similarly, for the aggregate data but with the additional summarise step.

BC.IPD <- gen_data(N, b_trt, b_X, b_EM, b_0,
                   meanX_BC, sdX, 
                   meanX_EM_BC, sdX_EM, 
                   corX, allocation,
                   family = poisson(link = "log"))

cov.X <- BC.IPD %>%
  summarise(across(starts_with("X"),
                   list(mean = mean, sd = sd),
                   .names = "{fn}.{col}"))

out.C <- dplyr::filter(BC.IPD, trt == 1) %>%
  summarise(y.B.sum = sum(y),
            y.B.bar = mean(y),
            N.B = n())

out.B <- dplyr::filter(BC.IPD, trt == 0) %>%
  summarise(y.C.sum = sum(y),
            y.C.bar = mean(y),
            N.C = n())

ald_trial <- cbind.data.frame(cov.X, out.C, out.B)

# By definition, the true log-OR relative effect in the AC population between AC is `r b_trt`. Between BC is `r b_trt` and between AB is `r b_trt - b_trt`.
# A naive indirect comparison, ignoring the presence of effect modifiers, would calculate the C vs B effect in the AC population as
# `b_trt - b_trt`.

Output statistics

We will implement for MAIC, STC, and G-computation methods to obtain the marginal variance, defined as

and the marginal treatment effect, defined as the log-odds ratio,

where $\bar{C}$ is the compliment of $C$ so e.g. $n_{\bar{C}} = N_C - n_c$ .

Model fitting in R

The outstandR package has been written to be easy to use and essential consists of a single function, outstandR(). This can be used to run all of the different types of model, which we will call strategies. The first two arguments of outstandR() are the individual and aggregate-level data, respectively.

A strategy argument of outstandR takes functions called strategy_*(), where the wildcard * is replaced by the name of the particular method required, e.g. strategy_maic() for MAIC. Each specific example is provided below.

MAIC

Using the individual level data for AC firstly we perform non-parametric bootstrap of the maic.boot function with R = 1000 replicates. This function fits treatment coefficient for the marginal effect for A vs C. The returned value is an object of class boot from the {boot} package. We then calculate the bootstrap mean and variance in the wrapper function maic_boot_stats().

The formula used in this model is

$y = X_3 + X_4 + \beta_t X_1 + \beta_t X_2$

which corresponds to the following R formula object passed as an argument to the strategy function.

lin_form <- as.formula("y ~ X3 + X4 + trt*X1 + trt*X2")

outstandR_maic <-
  outstandR(ipd_trial, ald_trial,
            strategy = strategy_maic(
              formula = lin_form,
              family = poisson(link = "log")))

The returned object is of class outstandR.

outstandR_maic

We see that this is a list object with 3 parts, each containing statistics between each pair of treatments. These are the mean contrasts, variances and confidence intervals (CI), respectively. The default CI is for 95% but can be altered in outstandR with the CI argument.

STC

STC is the conventional outcome regression method. It involves fitting a regression model of outcome on treatment and covariates to the IPD. IPD effect modifiers are centred at the mean BC values.

$g(\mu_n) = \beta_0 + (\boldsymbol{x}_n - \boldsymbol{\theta}) \beta_1 + (\beta_z + (\boldsymbol{x_n^{EM}} - \boldsymbol{\theta^{EM}}) \boldsymbol{\beta_2}) \; \mbox{I}(z_n=1)$

where $\beta_0$ is the intercept, $\beta_1$ are the covariate coefficients, $\beta_z$ and $\beta_2$ are the effect modifier coefficients, $z_n$ are the indicator variables of effect alternative treatment. $g(\cdot)$ is the link function e.g. $\log$ .

As already mentioned, running the STC analysis is almost identical to the previous analysis but we now use the strategy_stc() strategy function instead and a formula with centered covariates.

$y = X_3 + X_4 + \beta_t(X_1 - \bar{X_1}) + \beta_t(X_2 - \bar{X_2})$

However, outstandR() knows how to handle this so we can simply pass the same (uncentred) formula as before.

outstandR_stc <-
  outstandR(ipd_trial, ald_trial,
            strategy = strategy_stc(
              formula = lin_form,
              family = poisson(link = "log")))

outstandR_stc

For the last two approaches, we perform G-computation firstly with a frequentist MLE approach and then a Bayesian approach.

Parametric G-computation with maximum-likelihood estimation

G-computation marginalizes the conditional estimates by separating the regression modelling from the estimation of the marginal treatment effect for A versus C. First, a regression model of the observed outcome $y$ on the covariates $x$ and treatment $z$ is fitted to the AC IPD:

$g(\mu_n) = \beta_0 + \boldsymbol{x}_n \boldsymbol{\beta_1} + (\beta_z + \boldsymbol{x_n^{EM}} \boldsymbol{\beta_2}) \; \mbox{I}(z_n = 1)$

In the context of G-computation, this regression model is often called the “Q-model.” Having fitted the Q-model, the regression coefficients are treated as nuisance parameters. The parameters are applied to the simulated covariates $x*$ to predict hypothetical outcomes for each subject under both possible treatments. Namely, a pair of predicted outcomes, also called potential outcomes, under A and under C, is generated for each subject.

By plugging treatment C into the regression fit for every simulated observation, we predict the marginal outcome mean in the hypothetical scenario in which all units are under treatment C:

$\hat{\mu}_0 = \int_{x^*} g^{-1} (\hat{\beta}_0 + x^* \hat{\beta}_1 ) p(x^*) \; \text{d}x^*$

To estimate the marginal or population-average treatment effect for A versus C in the linear predictor scale, one back-transforms to this scale the average predictions, taken over all subjects on the natural outcome scale, and calculates the difference between the average linear predictions:

$\hat{\Delta}^{(2)}_{10} = g(\hat{\mu}_1) - g(\hat{\mu}_0)$

outstandR_gcomp_ml <-
  outstandR(ipd_trial, ald_trial,
            strategy = strategy_gcomp_ml(
              formula = lin_form,
              family = poisson(link = "log")))

outstandR_gcomp_ml

Bayesian G-computation with MCMC

The difference between Bayesian G-computation and its maximum-likelihood counterpart is in the estimated distribution of the predicted outcomes. The Bayesian approach also marginalizes, integrates or standardizes over the joint posterior distribution of the conditional nuisance parameters of the outcome regression, as well as the joint covariate distribution.

Draw a vector of size $N^*$ of predicted outcomes $y^*_z$ under each set intervention $z^* \in \{0, 1\}$ from its posterior predictive distribution under the specific treatment. This is defined as $p(y^*_{z^*} \mid \mathcal{D}_{AC}) = \int_{\beta} p(y^*_{z^*} \mid \beta) p(\beta \mid \mathcal{D}_{AC}) d\beta$ where $p(\beta \mid \mathcal{D}_{AC})$ is the posterior distribution of the outcome regression coefficients $\beta$ , which encode the predictor-outcome relationships observed in the AC trial IPD. This is given by:

$p(y^*_{^z*} \mid \mathcal{D}_{AC}) = \int_{x^*} p(y^* \mid z^*, x^*, \mathcal{D}_{AC}) p(x^* \mid \mathcal{D}_{AC})\; \text{d}x^*$

$= \int_{x^*} \int_{\beta} p(y^* \mid z^*, x^*, \beta) p(x^* \mid \beta) p(\beta \mid \mathcal{D}_{AC})\; d\beta \; \text{d}x^*$

In practice, the integrals above can be approximated numerically, using full Bayesian estimation via Markov chain Monte Carlo (MCMC) sampling.

The average, variance and interval estimates of the marginal treatment effect can be derived empirically from draws of the posterior density.

We can draw a vector of size $N^*$ of predicted outcomes $y^*_z$ under each set intervention $z^*$ from its posterior predictive distribution under the specific treatment.

outstandR_gcomp_bayes <-
  outstandR(ipd_trial, ald_trial,
            strategy = strategy_gcomp_bayes(
              formula = lin_form,
              family = poisson(link = "log")))

outstandR_gcomp_bayes

Multiple imputation marginalisation

ref

$equation here$

outstandR_mim <-
  outstandR(ipd_trial, ald_trial,
            strategy = strategy_mim(
              formula = lin_form,
              family = poisson(link = "log")))
outstandR_mim

Model comparison

Combine all outputs for log-odds ratio table of all contrasts and methods.

knitr::kable(
  data.frame(
    `MAIC` = unlist(outstandR_maic$contrasts$means),
    `STC` = unlist(outstandR_stc$contrasts$means),
    `Gcomp ML` = unlist(outstandR_gcomp_ml$contrasts$means),
    `Gcomp Bayes` = unlist(outstandR_gcomp_bayes$contrasts$means),
    `MIM` = unlist(outstandR_mim$contrasts$means))
) |> 
  round(2)