FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.
This vignette demonstrates:
Let’s simulate a simple polysome profiling experiment:
# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12
# Generate count matrix with varying expression levels
total_counts <- matrix(
rnbinom(n_genes * 4, mu = 100, size = 10),
nrow = n_genes,
dimnames = list(
paste0("Gene", 1:n_genes),
paste0("Sample", 1:4)
)
)
prob=c(1/4,1/4,1/2)
# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1)) # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)
counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])
# Create annotation data frame
annotation <- data.frame(
Sample = colnames(counts),
Condition = rep(c("Control", "Treatment"), each = 6),
Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)
print(head(annotation))
#> Sample Condition Type Replicate
#> 1 Sample1 Control Total Rep1
#> 2 Sample2 Control Total Rep2
#> 3 V1_p2 Control Light_Polysome Rep1
#> 4 V1_p3 Control Heavy_Polysome Rep1
#> 5 V2_p2 Control Light_Polysome Rep2
#> 6 V2_p3 Control Heavy_Polysome Rep2# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)
#> Setting up parallel processing...
#> Filtering transcripts based on Total samples...
#> Retained 100 transcripts present in Total samples
#>
#> Processing condition 1/2: Control
#> Selecting transcripts with 60.0 - 99.9% quantiles (range: 201.8 - 358.8)
#> Selected 39 transcripts for regression
#> Processing replicate 1/2: Rep1
#> Processing replicate 2/2: Rep2
#>
#> Processing condition 2/2: Treatment
#> Selecting transcripts with 60.0 - 99.9% quantiles (range: 208.4 - 335.4)
#> Selected 39 transcripts for regression
#> Processing replicate 1/2: Rep1
#> Processing replicate 2/2: Rep2
#>
#> Finalizing results...
#> FracFixR analysis complete!
# View the structure of results
names(results)
#> [1] "OriginalData" "Annotation" "Propestimates" "NewData"
#> [5] "Coefficients" "Fractions" "plots"The FracFixR output contains several components:
# 1. Fraction proportions for each replicate
print(results$Fractions)
#> Light_Polysome Heavy_Polysome Lost Replicate
#> 1 0.18955381 0.08374122 0.7267050 Control_Rep1
#> 2 0.03637074 0.13166057 0.8319687 Control_Rep2
#> 3 0.08583560 0.06639776 0.8477666 Treatment_Rep1
#> 4 0.02474643 0.14819955 0.8270540 Treatment_Rep2
# 2. Regression coefficients
print(results$Coefficients)
#> Lost Light_Polysome Heavy_Polysome Replicate
#> 1 186.8989 1.4650526 0.6729095 Control_Rep1
#> 2 207.0308 0.3022741 1.0571047 Control_Rep2
#> 3 208.1736 0.6734116 0.2764326 Treatment_Rep1
#> 4 203.7555 0.1008105 0.5727541 Treatment_Rep2
# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
#> Sample1 Sample2 V1_p2 V1_p3 V2_p2 V2_p3
#> Gene1 266 256 0.3913497 0.14748701 0.04140741 0.28961774
#> Gene2 220 228 0.1177932 0.04734036 0.03189827 0.06905709
#> Gene3 253 232 0.1671537 0.05193597 0.01927251 0.06385196
#> Gene4 254 250 0.2408306 0.21662154 0.04554815 0.24617508
#> Gene5 236 233 0.1367382 0.07775843 0.01612128 0.09866311The plot shows: - The proportion of RNA in each fraction - The “Lost” fraction (grey) representing unrecoverable material - Consistency across replicates
Now let’s identify genes with differential polysome association between conditions:
# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
NormObject = results,
Conditions = c("Control", "Treatment"),
Types = "Heavy_Polysome",
Test = "Logit"
)
#> Performing Logit test comparing Conditions Control and Treatment in the Heavy_Polysome fraction
#> Applying FDR correction...
#> Analysis complete. Found 1 transcripts with padj < 0.01
# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
#> transcript mean_success_cond1 mean_success_cond2 mean_diff log2FC
#> 13 Gene13 0.23013011 0.06347892 -0.16665119 -2.349775
#> 6 Gene6 0.04840396 0.19092543 0.14252147 1.878592
#> 64 Gene64 0.13833735 0.05312658 -0.08521077 -1.546880
#> 82 Gene82 0.11560510 0.20245948 0.08685438 1.273309
#> 40 Gene40 0.08224187 0.14030679 0.05806492 1.386000
#> 65 Gene65 0.14134238 0.05856196 -0.08278042 -1.272163
#> 72 Gene72 0.09611483 0.18364472 0.08752988 1.115993
#> 81 Gene81 0.17475680 0.09982287 -0.07493392 -1.144928
#> 86 Gene86 0.13551103 0.06077879 -0.07473223 -1.401098
#> 89 Gene89 0.16866573 0.07038866 -0.09827706 -1.358102
#> pval padj
#> 13 3.510741e-06 0.0003510741
#> 6 2.081895e-04 0.0104094772
#> 64 6.469782e-04 0.0215659404
#> 82 1.191512e-03 0.0297878057
#> 40 5.868677e-03 0.0896933828
#> 65 7.887182e-03 0.0896933828
#> 72 8.111811e-03 0.0896933828
#> 81 8.969338e-03 0.0896933828
#> 86 7.630853e-03 0.0896933828
#> 89 5.519371e-03 0.0896933828Positive mean_diff indicates higher proportion in Treatment.
volcano <- PlotComparison(
diff_heavy,
Conditions = c("Control", "Treatment"),
Types = "Heavy_Polysome",
cutoff = 20
)
print(volcano)For real experiments, ensure your data meets these requirements:
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
#> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Australia/Sydney
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] FracFixR_1.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] future.apply_1.20.0 gtable_0.3.6 jsonlite_2.0.0
#> [4] dplyr_1.1.4 compiler_4.4.2 tidyselect_1.2.1
#> [7] parallel_4.4.2 tidyr_1.3.1 jquerylib_0.1.4
#> [10] globals_0.18.0 scales_1.4.0 yaml_2.3.10
#> [13] fastmap_1.2.0 ggplot2_4.0.1 R6_2.6.1
#> [16] labeling_0.4.3 generics_0.1.4 knitr_1.50
#> [19] future_1.68.0 tibble_3.3.0 bslib_0.9.0
#> [22] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.6
#> [25] cachem_1.1.0 xfun_0.54 nnls_1.6
#> [28] sass_0.4.10 S7_0.2.1 aod_1.3.3
#> [31] cli_3.6.5 withr_3.0.2 magrittr_2.0.4
#> [34] digest_0.6.39 grid_4.4.2 lifecycle_1.0.4
#> [37] vctrs_0.6.5 evaluate_1.0.5 glue_1.8.0
#> [40] farver_2.1.2 listenv_0.10.0 codetools_0.2-20
#> [43] parallelly_1.45.1 purrr_1.2.0 rmarkdown_2.30
#> [46] matrixStats_1.5.0 tools_4.4.2 pkgconfig_2.0.3
#> [49] htmltools_0.5.8.1Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.
For more examples and advanced usage, see the FracFixR GitHub repository.