Introduction to MarZIC

Introduction

MarZIC estimates and tests the marginal mediation effects of zero-inflated compositional mediators. It can be applied to microbiome studies to estimate and test the marginal mediation effects of relative abundances (RA). Let \(Y\), \(M\) and \(X\) denote the outcome, mediator (RA) and the independent variable respectively. And let \(1_{(M>0)}\) denote the binary indicator variable indicating whether \(M\) is positive. MarZIC decomposes the mediation effect into two components of which one is the total mediation effect going thru the continuous scale of the mediator, denoted by NIE\(_1\) and the other is the mediation effect going thru the binary change (from zero to non-zero status) of the mediator, denoted by NIE\(_2\). For a microbiome data, let \(M_j\) denote the RA of the \(j\)th taxon/OTU/ASV. And let \(Z\) denote confounder variable (if any) and \(Z\) could contain multiple variables. The marginal outcome model for \(Y\) is: \[ Y=\beta_0+\beta_1M_j+\beta_21_{M_j>0}+\beta_3X+\beta_4X1_{M_j>0}+\beta_5XM_j+\beta_6Z+\epsilon \] The mean of \(M_j\) depends on \(X\) and \(Z\) through the following equation: \[ \log \Bigg(\frac{E(M_j)}{1-E(M_j)}\Bigg)=\alpha_0 + \alpha_1X+\alpha_2Z \] Let \(\Delta_j=P(M_j=0)\) which is the probability of a structural zero (ie, true zero). We use the following logistic model for this probability:
\[ \log\bigg(\frac{\Delta_j}{1-\Delta_j}\bigg)=\gamma_0 + \gamma_1X+\gamma_2Z \]

Input for MarZIC() function

Typically, users just need to feed the first six inputs to the function: Experiment_dat, lib_name, y_name, x_name, conf_name and taxa_of_interest.

Output for IFAA() function

A list of 4 datasets containing the results for NIE1, NIE2, NDE, and NIE. Each dataset has row representing each taxon, 6 columns for Estimates, Standard Error, Lower bound for 95% Confidence Interval, Upper bound for 95% Confidence Interval, Adjusted p value, Significance indicator.

Example

## A make up example with 1 taxon and 100 subjects.
set.seed(1)
nSub <- 200
nTaxa <- 10
## generate covariate of interest X
X <- rbinom(nSub, 1, 0.5)
## generate confounders
conf1<-rnorm(nSub)
conf2<-rbinom(nSub,1,0.5)

## generate mean of each taxon. All taxon are having the same mean for simplicity.
mu <- exp(-5 + X + 0.1 * conf1 + 0.1 * conf2) / 
  (1 + exp(-5 + X + 0.1 * conf1 + 0.1 * conf2))
phi <- 10

## generate true RA
M_taxon<-t(sapply(mu,function(x) dirmult::rdirichlet(n=1,rep(x*phi,nTaxa))))

P_zero <- exp(-3 + 0.3 * X + 0.1 * conf1 + 0.1 * conf2) / 
  (1 + exp(-3 + 0.3 * X + 0.1 * conf1 + 0.1 * conf2))

non_zero_ind <- t(sapply(P_zero,function(x) 1-rbinom(nTaxa,1,rep(x,nTaxa))))

True_RA<-t(apply(M_taxon*non_zero_ind,1,function(x) x/sum(x)))

## generate outcome Y based on true RA
Y <- 1 + 100 * True_RA[,1] + 5 * (True_RA[,1] > 0) + X + conf1 + conf2 + rnorm(nSub)

## library size was set to 10,000 for all subjects for simplicity.
libsize <- 10000

## generate observed RA
observed_AA <- floor(M_taxon*libsize*non_zero_ind)

observed_RA <- t(apply(observed_AA,1,function(x) x/sum(x)))
colnames(observed_RA)<-paste0("rawCount",seq_len(nTaxa))
CovData <- cbind(Y = Y, X = X, libsize = libsize, conf1 = conf1, conf2 = conf2)

Above steps could be ignored if a SummarizedExperiment object is already available. Suppose we’re interested in the mediation effects of rawCount1, rawCount2, and rawCount3, the analysis could be done as:

## run the analysis
res <- MarZIC(
  MicrobData = observed_RA,
  CovData = CovData,
  lib_name = "libsize",
  y_name = "Y",
  x_name = "X",
  conf_name = c("conf1","conf2"),
  taxa_of_interest = c("rawCount1","rawCount2","rawCount3"),
  num_cores = 1,
  mediator_mix_range = 1
)

The results contain 4 datasets for NIE\(_1\), NIE\(_2\), NDE, NIE, respectively. The NIE\(_1\), for example, could be extracted by:

NIE1 <- res$NIE1_save

The significant result could be extracted by:

subset(NIE1, significance == TRUE)
#>                est       se   CI low    CI up p value unadj  p value adj
#> rawCount1 7.138202 1.666013 3.872816 10.40359  1.830673e-05 5.492018e-05
#>           significance
#> rawCount1         TRUE

From the result, we can find that rawCount1 is an significant mediator of mediating the effect of \(X\) on \(Y\).

Reference

Wu et al.(2022) MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data. Genes 2022, 13, 1049.

#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MarZIC_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.11       pracma_2.4.2      bslib_0.5.1       compiler_4.2.1   
#>  [5] jquerylib_0.1.4   mathjaxr_1.6-0    iterators_1.0.14  tools_4.2.1      
#>  [9] digest_0.6.33     jsonlite_1.8.7    evaluate_0.22     lattice_0.21-8   
#> [13] rlang_1.1.1       foreach_1.5.2     cli_3.6.1         rstudioapi_0.14  
#> [17] parallel_4.2.1    yaml_2.3.7        NlcOptim_0.6      xfun_0.40        
#> [21] fastmap_1.1.1     knitr_1.44        sass_0.4.7        stats4_4.2.1     
#> [25] lmtest_0.9-40     grid_4.2.1        nnet_7.3-19       R6_2.5.1         
#> [29] flexmix_2.3-19    rmarkdown_2.25    Formula_1.2-5     codetools_0.2-19 
#> [33] htmltools_0.5.6.1 modeltools_0.2-23 MASS_7.3-60       dirmult_0.1.3-5  
#> [37] betareg_3.1-4     quadprog_1.5-8    sandwich_3.0-2    doParallel_1.0.17
#> [41] cachem_1.0.8      zoo_1.8-12