Reproduction of manuscript’s results

Introduction

This vignette serves the purpose to reproduce the results shown in the section “Real data example” of the manuscript describing collaborative graphical lasso. The computations carried out by the following code are heavy and could take a considerable amount of time. On our machine (RAM 48 Gb, CPU Intel(R) Xeon(R) W-2235 CPU @ 3.80GHz) it took almost 3 hours. Hence, we refer the user who wants to get started with the package to the introductory vignette: vignette("coglasso"). Collaborative graphical lasso is an algorithm to reconstruct multi-omics networks based on graphical lasso (Friedman, Hastie and Tibshirani, 2008) and collaborative regression (Gross and Tibshirani, 2015). The coglasso package implements and R interface to this algorithm. We will use it to reproduce the results obtained in the original manuscript.

Let us first attach coglasso.

library(coglasso)

We will use the multi-omics data set provided by the package, in its largest version: multi_omics_sd. Let’s inspect its description in the manual.

help(multi_omics_sd)

We notice that this data set has 162 transcripts and 76 metabolites. For further information on the data set sources and generation, we refer the user to the script multi_omics_sd.R on the GitHub repository of the package.

Reproducing the results

For the multi-omics network reconstruction carried out in the manuscript we explored 30 automatically generated \(λ_w\) and \(λ_b\) values and 9 given \(c\) values.

nlambda_w <- 30
nlambda_b <- 30
cs <- c(0.01, 0.05, 0.1, 0.2, 1, 5, 10, 20, 100)

We can now build the networks, one per combination of the chosen hyperparameters, with coglasso().

cg <- coglasso(multi_omics_sd,
  pX = 162,
  nlambda_w = nlambda_w,
  nlambda_b = nlambda_b,
  c = cs
)

We proceed now to select the most stable, yet sparse network, using an adaptation to coglasso of StARS (Liu, Roeder and Wasserman, 2010). The function implementing this adapted version is stars_coglasso().

set.seed(42)
sel_cg <- stars_coglasso(cg)

We display now the selected network using the package igraph, reproducing the one displayed in Figure 3 of the manuscript. The orientation of the network may vary.

# To create the igraph object from the selected adjacency matrix:
sel_graph <- igraph::graph.adjacency(sel_cg$sel_adj, mode = "undirected")

# Setting some graphical parameters
igraph::V(sel_graph)$label <- NA
igraph::V(sel_graph)$color <- c(rep("#00ccff", 162), rep("#ff9999", 76))
igraph::V(sel_graph)$frame.color <- c(rep("#002060", 162), rep("#800000", 76))
igraph::V(sel_graph)$frame.width <- 2
igraph::V(sel_graph)$size <- 4
igraph::E(sel_graph)$width <- 0.3

# Setting a force-based network layout and removing disconnected nodes from the graph
lo <- igraph::layout_with_fr(sel_graph)
diconnected <- which(igraph::degree(sel_graph) == 0)
sel_graph2 <- igraph::delete.vertices(sel_graph, diconnected)
lo2 <- lo[-diconnected, ]

# Plotting
plot(sel_graph2, layout = lo2)

We now plot the subnetwork of the gene Cirbp and of its neighborhood. This reproduces the network displayed in Figure 4A.

igraph::V(sel_graph)$label<-colnames(sel_cg$data)
cirbp_index <- which(colnames(sel_cg$data) == "Cirbp")
subnetwork <- igraph::subgraph(
  sel_graph,
  c(cirbp_index, igraph::neighbors(sel_graph, cirbp_index))
)
plot(subnetwork)

A useful way to extract information from a network is community discovery. This technique allows to simplify a network by breaking it in functional units where nodes share several connections, enough to be recognized as separate communities. In the manuscript we used the greedy algorithm described in Clauset, Newman and Moore, 2004, part of the igraph toolkit. We focused and inspected the second largest community, shown in Figure 4B. Here is the code to reproduce it.

communities <- igraph::cluster_fast_greedy(sel_graph)
community2 <- communities[[2]]
community2_graph<-igraph::subgraph(sel_graph, community2)

# Focusing on nodes of interest
fosjun_erg_AA <- c("Fos", "Fosb", "Junb", "Fosl2", "Egr1", "Egr2", "Egr3", "Ala", "Arg", "Asn","His","Ile","Leu","Lys","Met","Orn","Phe","Ser","Thr","Tyr","Val")
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$color<-c(rep("#3bd8ff", 29), rep("#ffadad", 5))
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$frame.color<-NA
igraph::V(community2_igraph)[!(label %in% fosjun_erg_AA)]$label<-NA

lo_community2 <- igraph::layout_with_fr(community2_graph)
plot(community2_graph, layout=lo_community2)

References

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441. https://doi.org/10.1093/biostatistics/kxm045

Gross, S. M., & Tibshirani, R. (2015). Collaborative regression. Biostatistics, 16(2), 326–338. https://doi.org/10.1093/biostatistics/kxu047

Liu, H., Roeder, K., & Wasserman, L. (2010). Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models (arXiv:1006.3316). arXiv. https://doi.org/10.48550/arXiv.1006.3316

Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70(6), 066111. https://doi.org/10.1103/PhysRevE.70.066111