| Title: | Marginalization over Incomplete Auxiliaries |
|---|---|
| Description: | Implements methods to estimate conditional outcome means in settings with missingness-not-at-random and incomplete auxiliary variables. Specifically, this package implements the marginalization over incomplete auxiliaries (MIA) method proposed by Mathur et al. (2026) <doi:10.13140/RG.2.2.30750.19524>. The package supports continuous and binary outcomes, and supports auxiliary variables that are normal, binary, and categorical. |
| Authors: | Sean McGrath [aut, cre] (ORCID: <https://orcid.org/0000-0002-7281-3516>), Shaun Seaman [aut] (ORCID: <https://orcid.org/0000-0003-3726-5937>), Willi Zhang [aut] (ORCID: <https://orcid.org/0009-0006-9087-479X>), Ilya Shpitser [aut] (ORCID: <https://orcid.org/0000-0003-2571-7326>), Maya Mathur [aut] (ORCID: <https://orcid.org/0000-0001-6698-2607>) |
| Maintainer: | Sean McGrath <[email protected]> |
| License: | GPL (>=3) |
| Version: | 0.1.0 |
| Built: | 2026-05-27 07:19:46 UTC |
| Source: | https://github.com/stmcg/miapack |
This data set was simulated to reflect a setting with missingness-not-at-random and an incomplete auxiliary variable.
dat.sim
A data frame that contains 9,297 rows and the following columns:
Y |
A continuous outcome variable. |
X1 |
A binary predictor variable. |
X2 |
A binary predictor variable. |
W |
A binary auxiliary variable. |
Variable dependencies: The underlying values of the variables were generated as follows:
X1 is generated independently.
X2 depends on X1.
W depends on X2.
Y depends on X1, X2, and their interaction.
Missingness patterns: The missingness patterns were generated as follows:
Missingness in X1, X2, and Y depends on the underlying (potentially unobserved) values of W.
Missingness in W is generated independently.
Rows where all variables are missing are removed from the dataset.
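The data-generating steps listed above can be sketched in base R as follows. This is only an illustration of the described structure, not the code used to create dat.sim: the sample size, all coefficients, and the names toy and obs_* are assumptions made for this sketch.

```r
set.seed(2026)
n <- 10000  # hypothetical sample size before removing all-missing rows

# Underlying (fully observed) values; all coefficients are assumed
X1 <- rbinom(n, 1, 0.5)                          # X1 generated independently
X2 <- rbinom(n, 1, plogis(-0.5 + X1))            # X2 depends on X1
W  <- rbinom(n, 1, plogis(-0.5 + X2))            # W depends on X2
Y  <- 1 + X1 + X2 + 0.5 * X1 * X2 + rnorm(n)     # Y depends on X1, X2, interaction

# Observation indicators (TRUE = observed)
obs_X1 <- rbinom(n, 1, plogis(1.5 - W)) == 1     # depends on underlying W
obs_X2 <- rbinom(n, 1, plogis(1.5 - W)) == 1     # depends on underlying W
obs_Y  <- rbinom(n, 1, plogis(1.0 - W)) == 1     # depends on underlying W
obs_W  <- rbinom(n, 1, 0.8) == 1                 # generated independently

toy <- data.frame(
  Y  = ifelse(obs_Y,  Y,  NA),
  X1 = ifelse(obs_X1, X1, NA),
  X2 = ifelse(obs_X2, X2, NA),
  W  = ifelse(obs_W,  W,  NA)
)

# Remove rows where all variables are missing
toy <- toy[rowSums(!is.na(toy)) > 0, ]
```

Because missingness in X1, X2, and Y depends on W, and W itself is sometimes missing, a complete-case analysis of toy would generally be biased, which is the setting the package targets.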
This function applies the nonparametric bootstrap to construct confidence intervals around the conditional mean estimates obtained by mia. It is a wrapper for the boot and boot.ci functions from the boot package.
get_CI(
  mia_res,
  n_boot = 1000,
  type = "bca",
  conf = 0.95,
  boot_args = list(),
  boot.ci_args = list(),
  show_progress = TRUE
)
mia_res |
Output from the |
n_boot |
Numeric scalar specifying the number of bootstrap replicates to use |
type |
Character string specifying the type of confidence interval. The options are |
conf |
Numeric scalar specifying the level of the confidence interval. The default is |
boot_args |
A list of additional arguments to pass to the |
boot.ci_args |
A list of additional arguments to pass to the |
show_progress |
Logical scalar indicating whether to show a progress bar during bootstrap. Default is |
An object of class "mia_ci". This object is a list with the following elements:
ci_1 |
An object of class "boot.ci" which contains the output of the |
ci_2 |
An object of class "boot.ci" which contains the output of the |
ci_contrast |
An object of class "boot.ci" which contains the output of the |
bres |
An object of class "boot" which contains the output of the |
... |
additional elements |
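Since get_CI is described as a wrapper for boot and boot.ci, the general pattern it builds on can be sketched as follows. This is not the internal code of get_CI: the toy data, the statistic function stat_fun, and the use of simple group means in place of MIA estimates are all assumptions made for illustration.

```r
library(boot)

set.seed(1)
# Hypothetical toy data: a binary predictor and a continuous outcome
toy <- data.frame(x = rep(c(0, 1), each = 100))
toy$y <- 1 + 2 * toy$x + rnorm(200)

# Statistic evaluated on each bootstrap resample:
# two group means and their difference (a stand-in for the two
# conditional mean estimates and their contrast)
stat_fun <- function(d, idx) {
  d_b <- d[idx, ]
  m0 <- mean(d_b$y[d_b$x == 0])
  m1 <- mean(d_b$y[d_b$x == 1])
  c(m0, m1, m1 - m0)
}

# Nonparametric bootstrap with 200 replicates
bres <- boot(data = toy, statistic = stat_fun, R = 200)

# Percentile interval for each component of the statistic
ci_1        <- boot.ci(bres, conf = 0.95, type = "perc", index = 1)
ci_2        <- boot.ci(bres, conf = 0.95, type = "perc", index = 2)
ci_contrast <- boot.ci(bres, conf = 0.95, type = "perc", index = 3)

# The interval limits are the last two entries of the 'percent' element
ci_contrast$percent[4:5]
```

Percentile intervals are used here because they work with a small number of replicates; the "bca" type generally needs more replicates than observations.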
Mathur MB, Seaman S, Zhang W, McGrath S, Shpitser I. (2026). Estimating conditional means under missingness-not-at-random with incomplete auxiliary variables. doi.org/10.13140/RG.2.2.30750.19524.
set.seed(1234)
res <- mia(data = dat.sim,
           X_names = c("X1", "X2"),
           X_values_1 = c(0, 1),
           X_values_2 = c(0, 0),
           Y_model = Y ~ W + X1 + X2,
           W_model = W ~ X1 + X2)
res_ci <- get_CI(mia_res = res, n_boot = 50, type = 'perc')
res_ci

## Example with parallelization
res_par <- get_CI(res, n_boot = 100, type = 'perc',
                  boot_args = list(parallel = "snow", ncpus = 2))
This function implements the marginalization over incomplete auxiliaries (MIA) method (Mathur et al. 2026). For an outcome variable $Y$, predictor variable $X$, and auxiliary variable $W$, this function estimates the conditional outcome mean $E[Y \mid X = x]$ identified by
$$\int E[Y \mid X = x, W = w, M = 1] \, p(w \mid X = x, R_W = R_X = 1) \, dw,$$
where $R_W$ and $R_X$ are indicators of non-missing values of $W$ and $X$, respectively, and $M$ is an indicator of a complete case pattern (i.e., $Y$, $X$, and $W$ are non-missing).
The function supports estimating the identifying functionals of $E[Y \mid X = x_1]$ and $E[Y \mid X = x_2]$ as well as contrasts between them (differences, ratios).
mia(
  data,
  X_names,
  X_values_1,
  X_values_2 = NULL,
  contrast_type,
  Y_model,
  Y_type,
  W_model,
  W_type,
  n_mc = 10000,
  return_simulated_data = FALSE
)
data |
Data frame containing the observed data. |
X_names |
Vector of character strings specifying the name(s) of the predictor variable(s) |
X_values_1 |
Numeric vector specifying the value of the predictor variable(s) |
X_values_2 |
(Optional) Numeric vector specifying an additional value of the predictor variable(s) |
contrast_type |
(Optional) Character string specifying the type of contrast to use when comparing |
Y_model |
Formula for the outcome model. |
Y_type |
(Optional) Character string specifying the "type" of the outcome variable. Options are |
W_model |
Formula for the auxiliary variable model. If the auxiliary variable is multivariate, this argument should be a list of model formulas, one for each component. The components will be simulated in the order they appear in the list. |
W_type |
(Optional) Vector of character strings specifying the "type" of each auxiliary variable. Options are |
n_mc |
Integer specifying the number of Monte Carlo samples to use. |
return_simulated_data |
Logical scalar indicating whether to return the simulated data set(s) containing the predictors and simulated auxiliary variable. Setting this argument to |
Estimation algorithm:
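Step 1: One fits a model for the conditional outcome mean among complete cases, $E[Y \mid X, W, M = 1]$, and the conditional density of the auxiliary variables among units with $W$ and $X$ observed, $p(w \mid X, R_W = R_X = 1)$. When $W$ is multivariate, i.e., $W = (W_1, \dots, W_K)$, one uses the decomposition
$$p(w \mid X, R_W = R_X = 1) = \prod_{k=1}^{K} p(w_k \mid X, w_1, \dots, w_{k-1}, R_W = R_X = 1)$$
and fits models for the components $p(w_k \mid X, w_1, \dots, w_{k-1}, R_W = R_X = 1)$.
Step 2: Monte Carlo integration is used to compute the integral in the identifying functional for $E[Y \mid X = x]$ based on the fitted models in the first step. More specifically, for each iteration $i$ (of n_mc in total), the following algorithm is performed. The value of $W$ is first simulated from its estimated conditional distribution. When $W$ is multivariate, the components of $W$ are simulated sequentially from their fitted models. That is, $W_1$ is simulated conditional on $X = x$, $W_2$ is simulated conditional on $X = x$ and $W_1$, and so on. Then, the mean of $Y$ is estimated conditional on $X = x$ and the simulated value of $W$. Finally, the average of the estimated means (across all iterations $i$) is taken as the estimate of $E[Y \mid X = x]$.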
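The two steps above can be sketched in base R for a single binary auxiliary variable. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data, the logistic model for W, the linear model for Y, and the names toy, sim, x_val, and mean_est are all hypothetical, and missingness is generated completely at random only to keep the example short.

```r
set.seed(42)
n <- 2000

# Hypothetical toy data with a binary predictor X and binary auxiliary W
X <- rbinom(n, 1, 0.5)
W <- rbinom(n, 1, plogis(-0.5 + X))
Y <- 1 + 0.5 * X + 2 * W + rnorm(n)
toy <- data.frame(X = X, W = W, Y = Y)

# Introduce some missing values (completely at random, for simplicity)
toy$W[sample(n, 200)] <- NA
toy$Y[sample(n, 200)] <- NA

# Step 1: fit the auxiliary and outcome models.
# glm() and lm() drop rows with missing values by default, so fit_W uses
# rows with W and X observed and fit_Y uses complete cases.
fit_W <- glm(W ~ X, family = binomial, data = toy)
fit_Y <- lm(Y ~ W + X, data = toy)

# Step 2: Monte Carlo integration at a fixed predictor value
n_mc  <- 10000
x_val <- 1
sim <- data.frame(X = rep(x_val, n_mc))

# Simulate W from its fitted conditional distribution given X = x_val
p_W <- predict(fit_W, newdata = sim, type = "response")
sim$W <- rbinom(n_mc, 1, p_W)

# Predict the outcome mean for each simulated W, then average
mean_est <- mean(predict(fit_Y, newdata = sim))
mean_est
```

With a multivariate auxiliary variable, the simulation step would be repeated once per component, adding each simulated column to sim before simulating the next.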
An object of class "mia". This object is a list with the following elements:
mean_est_1 |
conditional outcome mean estimate under |
mean_est_2 |
conditional outcome mean estimate under |
contrast_est |
contrast of conditional outcome mean estimates between |
fit_W |
a list of fitted model(s) for W |
fit_Y |
fitted model for Y |
simulated_data |
a list, where the first element is the simulated data set under |
... |
additional elements |
Mathur MB, Seaman S, Zhang W, McGrath S, Shpitser I. (2026). Estimating conditional means under missingness-not-at-random with incomplete auxiliary variables. doi.org/10.13140/RG.2.2.30750.19524.
set.seed(1234)
mia(data = dat.sim,
    X_names = c("X1", "X2"),
    X_values_1 = c(0, 1),
    X_values_2 = c(0, 0),
    Y_model = Y ~ W + X1 + X2,
    W_model = W ~ X1 + X2)
Print method for objects of class "mia"
## S3 method for class 'mia'
print(x, digits = 4, ...)
x |
Object of class "mia". |
digits |
Integer specifying the number of decimal places to display. |
... |
Other arguments (ignored). |
No value is returned.
res <- mia(data = dat.sim,
           X_names = c("X1", "X2"),
           X_values_1 = c(0, 1),
           X_values_2 = c(0, 0),
           Y_model = Y ~ W + X1 + X2,
           W_model = W ~ X1 + X2)
print(res)
Print method for objects of class "mia_ci"
## S3 method for class 'mia_ci'
print(x, digits = 4, ...)
x |
Object of class "mia_ci". |
digits |
Integer specifying the number of decimal places to display. |
... |
Other arguments (ignored). |
No value is returned.
set.seed(1234)
res <- mia(data = dat.sim,
           X_names = c("X1", "X2"),
           X_values_1 = c(0, 1),
           X_values_2 = c(0, 0),
           Y_model = Y ~ W + X1 + X2,
           W_model = W ~ X1 + X2)
res_ci <- get_CI(res, n_boot = 100, type = 'perc')
print(res_ci)