Introduction to FracFixR

Alice Cleynen, Agin Ravindran, Nikolay Shirokikh

2026-05-11

Introduction

FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.

This vignette demonstrates:

  1. Basic usage with polysome profiling data
  2. Differential proportion testing
  3. Visualization of results
  4. Advanced applications

Installation

# From CRAN
install.packages("FracFixR")

# From GitHub (development version)
devtools::install_github("Arnaroo/FracFixR")

Quick Start Example

Let’s simulate a simple polysome profiling experiment:

library(FracFixR)

# Set seed for reproducibility
set.seed(123)

Creating Example Data

# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12

# Generate count matrix with varying expression levels
total_counts <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("Gene", 1:n_genes),
    paste0("Sample", 1:4)
  )
)
prob=c(1/4,1/4,1/2)

# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1))  # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)

counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])

# Create annotation data frame
annotation <- data.frame(
  Sample = colnames(counts),
  Condition = rep(c("Control", "Treatment"), each = 6),
  Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
  Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)

print(head(annotation))
#>    Sample Condition           Type Replicate
#> 1 Sample1   Control          Total      Rep1
#> 2 Sample2   Control          Total      Rep2
#> 3   V1_p2   Control Light_Polysome      Rep1
#> 4   V1_p3   Control Heavy_Polysome      Rep1
#> 5   V2_p2   Control Light_Polysome      Rep2
#> 6   V2_p3   Control Heavy_Polysome      Rep2

Running FracFixR

# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)
#> Setting up parallel processing...
#> Filtering transcripts based on Total samples...
#> Retained 100 transcripts present in Total samples
#> 
#> Processing condition 1/2: Control
#>   Selecting transcripts with 60.0 - 99.9% quantiles (range: 201.8 - 358.8)
#>   Selected 39 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Processing condition 2/2: Treatment
#>   Selecting transcripts with 60.0 - 99.9% quantiles (range: 208.4 - 335.4)
#>   Selected 39 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Finalizing results...
#> FracFixR analysis complete!

# View the structure of results
names(results)
#> [1] "OriginalData"  "Annotation"    "Propestimates" "NewData"      
#> [5] "Coefficients"  "Fractions"     "plots"

Understanding the Output

The FracFixR output contains several components:

# 1. Fraction proportions for each replicate
print(results$Fractions)
#>   Light_Polysome Heavy_Polysome      Lost      Replicate
#> 1     0.18955381     0.08374122 0.7267050   Control_Rep1
#> 2     0.03637074     0.13166057 0.8319687   Control_Rep2
#> 3     0.08583560     0.06639776 0.8477666 Treatment_Rep1
#> 4     0.02474643     0.14819955 0.8270540 Treatment_Rep2

# 2. Regression coefficients
print(results$Coefficients)
#>       Lost Light_Polysome Heavy_Polysome      Replicate
#> 1 186.8989      1.4650526      0.6729095   Control_Rep1
#> 2 207.0308      0.3022741      1.0571047   Control_Rep2
#> 3 208.1736      0.6734116      0.2764326 Treatment_Rep1
#> 4 203.7555      0.1008105      0.5727541 Treatment_Rep2

# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
#>       Sample1 Sample2     V1_p2      V1_p3      V2_p2      V2_p3
#> Gene1     266     256 0.3913497 0.14748701 0.04140741 0.28961774
#> Gene2     220     228 0.1177932 0.04734036 0.03189827 0.06905709
#> Gene3     253     232 0.1671537 0.05193597 0.01927251 0.06385196
#> Gene4     254     250 0.2408306 0.21662154 0.04554815 0.24617508
#> Gene5     236     233 0.1367382 0.07775843 0.01612128 0.09866311

Visualizing Fraction Proportions

# Create fraction plot
frac_plot <- PlotFractions(results)
print(frac_plot)
Fraction proportions across replicates
Fraction proportions across replicates

The plot shows: - The proportion of RNA in each fraction - The “Lost” fraction (grey) representing unrecoverable material - Consistency across replicates

Differential Proportion Testing

Now let’s identify genes with differential polysome association between conditions:

# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
  NormObject = results,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  Test = "Logit"
)
#> Performing Logit test comparing Conditions Control and Treatment in the Heavy_Polysome fraction
#> Applying FDR correction...
#> Analysis complete. Found 1 transcripts with padj < 0.01

# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
#>    transcript mean_success_cond1 mean_success_cond2   mean_diff    log2FC
#> 13     Gene13         0.23013011         0.06347892 -0.16665119 -2.349775
#> 6       Gene6         0.04840396         0.19092543  0.14252147  1.878592
#> 64     Gene64         0.13833735         0.05312658 -0.08521077 -1.546880
#> 82     Gene82         0.11560510         0.20245948  0.08685438  1.273309
#> 40     Gene40         0.08224187         0.14030679  0.05806492  1.386000
#> 65     Gene65         0.14134238         0.05856196 -0.08278042 -1.272163
#> 72     Gene72         0.09611483         0.18364472  0.08752988  1.115993
#> 81     Gene81         0.17475680         0.09982287 -0.07493392 -1.144928
#> 86     Gene86         0.13551103         0.06077879 -0.07473223 -1.401098
#> 89     Gene89         0.16866573         0.07038866 -0.09827706 -1.358102
#>            pval         padj
#> 13 3.510741e-06 0.0003510741
#> 6  2.081895e-04 0.0104094772
#> 64 6.469782e-04 0.0215659404
#> 82 1.191512e-03 0.0297878057
#> 40 5.868677e-03 0.0896933828
#> 65 7.887182e-03 0.0896933828
#> 72 8.111811e-03 0.0896933828
#> 81 8.969338e-03 0.0896933828
#> 86 7.630853e-03 0.0896933828
#> 89 5.519371e-03 0.0896933828

Interpreting Results

Positive mean_diff indicates higher proportion in Treatment.

Creating a Volcano Plot

volcano <- PlotComparison(
  diff_heavy,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  cutoff = 20
)
print(volcano)
Volcano plot of differential polysome association
Volcano plot of differential polysome association

Data Requirements

For real experiments, ensure your data meets these requirements:

  1. Count Matrix: Raw counts (not normalized)
  2. Annotation: Must include:
    • Sample names matching count matrix columns
    • At least one “Total” sample per condition-replicate
    • Consistent fraction names across replicates

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
#>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Australia/Sydney
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] FracFixR_1.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] future.apply_1.20.0 gtable_0.3.6        jsonlite_2.0.0     
#>  [4] dplyr_1.1.4         compiler_4.4.2      tidyselect_1.2.1   
#>  [7] parallel_4.4.2      tidyr_1.3.1         jquerylib_0.1.4    
#> [10] globals_0.18.0      scales_1.4.0        yaml_2.3.10        
#> [13] fastmap_1.2.0       ggplot2_4.0.1       R6_2.6.1           
#> [16] labeling_0.4.3      generics_0.1.4      knitr_1.50         
#> [19] future_1.68.0       tibble_3.3.0        bslib_0.9.0        
#> [22] pillar_1.11.1       RColorBrewer_1.1-3  rlang_1.1.6        
#> [25] cachem_1.1.0        xfun_0.54           nnls_1.6           
#> [28] sass_0.4.10         S7_0.2.1            aod_1.3.3          
#> [31] cli_3.6.5           withr_3.0.2         magrittr_2.0.4     
#> [34] digest_0.6.39       grid_4.4.2          lifecycle_1.0.4    
#> [37] vctrs_0.6.5         evaluate_1.0.5      glue_1.8.0         
#> [40] farver_2.1.2        listenv_0.10.0      codetools_0.2-20   
#> [43] parallelly_1.45.1   purrr_1.2.0         rmarkdown_2.30     
#> [46] matrixStats_1.5.0   tools_4.4.2         pkgconfig_2.0.3    
#> [49] htmltools_0.5.8.1

References

Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.

For more examples and advanced usage, see the FracFixR GitHub repository.