The RFplus package implements a novel spatial extrapolation and bias correction framework, integrating Random Forest (RF) and Quantile Mapping (QM) in a multi-stage process to improve the accuracy of satellite precipitation estimates. The methodology consists of three key stages:
1. Spatial Extrapolation of Precipitation: The first stage uses a Random Forest model to extrapolate the spatial distribution of precipitation. The model is trained with in-situ measurements as the response variable and a set of satellite precipitation products and environmental covariates as predictors. The result is an initial precipitation field that extends observed precipitation patterns into unmonitored areas, and the approach can be applied at different temporal scales (e.g., daily, monthly, or annual).
2. Residual Correction through a Secondary RF Model: To improve predictive accuracy, a second Random Forest model is trained to estimate the residual errors of the initial predictions. The residuals are defined as the difference between observed and modeled precipitation at the station locations. By modeling these residuals as a function of the same covariates used in the first stage, systematic biases are identified and corrected iteratively. The corrected estimate is obtained by adding the predicted residuals to the initial RF-based estimates, giving a refined precipitation product with reduced bias and improved spatial coherence.
3. Bias Adjustment via Non-Parametric Quantile Mapping (QM): In the third stage, non-parametric quantile mapping adjusts the distribution of each time series to the in-situ observations of the nearest station. The correction is applied only to pixels that meet the proximity criterion, i.e., pixels that lie within a predefined radius of influence of a station (e.g., ≤15 km).
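As a rough illustration of the third stage, the sketch below applies empirical quantile mapping to a single pixel series using base R only. It is not the RFplus implementation: the helper names `haversine_km` and `qm_correct`, the synthetic gamma-distributed series, the coordinates, and the 15 km radius are all assumptions chosen for the example.

```r
# Illustrative sketch only; not the RFplus internals.
# haversine_km() and qm_correct() are hypothetical helper names.

# Great-circle distance in km between two lon/lat points given in degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Empirical (non-parametric) quantile mapping: each modelled value is
# replaced by the observed value at the same cumulative probability.
qm_correct <- function(modelled, observed) {
  p <- ecdf(modelled)(modelled)  # cumulative probability of each value
  as.numeric(quantile(observed, probs = p, type = 7, names = FALSE))
}

set.seed(123)
observed <- rgamma(365, shape = 0.6, scale = 8)  # synthetic station series
modelled <- rgamma(365, shape = 0.9, scale = 4)  # synthetic pixel series

# Proximity criterion: correct the pixel only if it lies within the radius.
radius_km <- 15                                   # assumed radius of influence
d <- haversine_km(-79.00, -2.90, -79.05, -2.95)   # pixel vs. nearest station
corrected <- if (d <= radius_km) qm_correct(modelled, observed) else modelled
```

Because the mapping goes through cumulative probabilities, the rank order of the modelled values is preserved while their distribution is pulled toward that of the station record; pixels outside the radius are left unchanged.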
The RFplus package is designed to be highly adaptable and can be utilized across various satellite precipitation products and geographic regions. Although initially developed for precipitation bias correction, its methodology is applicable to other environmental variables such as temperature, wind speed, and soil moisture. This versatility makes RFplus a powerful tool for enhancing the accuracy of remote sensing-based estimations across diverse environmental conditions.
# Load the required packages
library(RFplus)
## RFplus 1.4-0
## Type RFplusNews() to see new features/changes/bug fixes.
library(terra)
## terra 1.8.29
library(data.table)
# Load the in-situ data and the coordinates
data("BD_Insitu", package = "RFplus")
data("Cords_Insitu", package = "RFplus")
# Load the covariates
MSWEP = terra::rast(system.file("extdata/MSWEP.nc", package = "RFplus"))
CHIRPS = terra::rast(system.file("extdata/CHIRPS.nc", package = "RFplus"))
DEM = terra::rast(system.file("extdata/DEM.nc", package = "RFplus"))
# Combine the individual covariates into the named list that RFplus expects
Covariates = list(MSWEP = MSWEP, CHIRPS = CHIRPS, DEM = DEM)
# Apply the RFplus -----------------------------------------------------------
# 1. Define the rainfall intensity categories used in the validation
Rain_threshold = list(
  no_rain = c(0, 1),
  light_rain = c(1, 5),
  moderate_rain = c(5, 20),
  heavy_rain = c(20, 40),
  violent_rain = c(40, 100)
)
# 2. Apply the model
model = RFplus(
  BD_Insitu, Cords_Insitu, Covariates,
  n_round = 1, wet.day = 0.1, ntree = 2000, seed = 123, training = 0.8,
  Rain_threshold = Rain_threshold, method = "RQUANT", ratio = 5,
  save_model = FALSE, name_save = NULL
)
## The training parameter has been introduced. The model will be trained with: 80 % data and validated with: 20 %
## Analysis in progress: Stage 1 of 2. Please wait...
## Analysis in progress: Stage 2 of 2. Correction by: RQUANT. Please wait...
## Applying correction method. This may take a while...
## Analysis completed.
## Validation process in progress. Please wait.
# Precipitation results within the study area
modelo_rainfall = model$Ensamble
# Validation results
# Goodness-of-fit metrics
metrics_gof = model$Validation$gof
# Categorical metrics
metrics_categorical = model$Validation$categorical_metrics
# Note: In the above example we used 80% of the data for training and 20% for model validation.
The Rain_threshold parameter is used only during point-to-pixel validation of the model. It classifies rainfall values into intensity categories so that categorical performance metrics can be computed, such as the Probability of Detection (POD), the False Alarm Ratio (FAR), and the Critical Success Index (CSI).
This parameter should be defined as a list, where each category corresponds to a range of precipitation values. For example:
Rain_threshold = list(
  no_rain = c(0, 1),         # No precipitation
  light_rain = c(1, 5),      # Light rainfall
  moderate_rain = c(5, 20),  # Moderate rainfall
  heavy_rain = c(20, 40),    # Heavy rainfall
  violent_rain = c(40, 100)  # Violent rainfall
)
This parameter should only be specified when training is set to a value other than 1 (e.g., 0.8), because only then does the algorithm validate its results.
When training = 1, the model is trained with 100% of the available data and no validation is performed, so Rain_threshold is not used.
Finally, users are free to define one or more categories as they see fit, which makes it possible to adapt the classification of precipitation events to different regions.
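To make the role of the categories concrete, the sketch below classifies synthetic observed and estimated values with a Rain_threshold-style list and computes POD, FAR, and CSI for each category from hit, miss, and false-alarm counts. The helper names `classify_rain` and `categorical_scores` are hypothetical, the data are invented, and the boundary handling (lower bound inclusive, upper bound exclusive) is an assumption; RFplus may treat boundaries and compute the metrics differently.

```r
# Illustrative sketch only; not the RFplus validation code.
Rain_threshold <- list(
  no_rain = c(0, 1), light_rain = c(1, 5), moderate_rain = c(5, 20),
  heavy_rain = c(20, 40), violent_rain = c(40, 100)
)

# Assign each value to the category whose range [lower, upper) contains it.
# Values outside every range stay NA.
classify_rain <- function(x, thresholds) {
  out <- rep(NA_character_, length(x))
  for (nm in names(thresholds)) {
    rng <- thresholds[[nm]]
    out[x >= rng[1] & x < rng[2]] <- nm
  }
  out
}

# Per-category scores from hits, misses, and false alarms.
categorical_scores <- function(obs, est, thresholds) {
  obs_cat <- classify_rain(obs, thresholds)
  est_cat <- classify_rain(est, thresholds)
  rows <- lapply(names(thresholds), function(nm) {
    hits <- sum(obs_cat == nm & est_cat == nm, na.rm = TRUE)
    misses <- sum(obs_cat == nm & est_cat != nm, na.rm = TRUE)
    false_alarms <- sum(obs_cat != nm & est_cat == nm, na.rm = TRUE)
    data.frame(
      category = nm,
      POD = hits / (hits + misses),
      FAR = false_alarms / (hits + false_alarms),  # NaN if never estimated
      CSI = hits / (hits + misses + false_alarms)
    )
  })
  do.call(rbind, rows)
}

obs <- c(0.0, 0.5, 2.0, 3.0, 8.0, 12.0, 25.0, 45.0)  # synthetic observations
est <- c(0.2, 1.5, 2.5, 6.0, 9.0, 4.0, 22.0, 30.0)   # synthetic estimates
scores <- categorical_scores(obs, est, Rain_threshold)
print(scores)
```

A category that never appears in the estimates yields an undefined FAR (NaN in this sketch), which is one reason to choose categories that are actually represented in the region being validated.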
The RFplus method improves satellite rainfall estimates by correcting biases with machine learning (Random Forest) and statistical distribution fitting (Quantile Mapping). These corrections are intended to align the satellite data with observations not only in their mean values but also in their underlying distribution, which is particularly useful for capturing extreme events such as heavy precipitation. RFplus can also be applied to satellite data products for variables other than precipitation, including temperature and wind speed, which makes it a versatile tool for extrapolation where weather stations are not available.