| Type: | Package |
| Title: | Check Threatened Plant Species Status Against Peru's DS 043-2006-AG |
| Version: | 0.2.0 |
| Maintainer: | Paul E. Santos Andrade <paulefrens@gmail.com> |
| Description: | Provides tools to match plant species names against the official threatened species list of Peru (Supreme Decree DS 043-2006-AG, 2006). Implements a hierarchical matching pipeline with exact, fuzzy, and suffix matching algorithms to handle nomenclatural variations and taxonomic changes. Supports both the original 2006 nomenclature and updated taxonomic names, allowing users to check protection status regardless of nomenclatural changes since the decree's publication. Threat categories follow IUCN standards (CR, EN, VU, NT). |
| Encoding: | UTF-8 |
| URL: | https://github.com/PaulESantos/peruflorads43, https://paulesantos.github.io/peruflorads43/ |
| BugReports: | https://github.com/PaulESantos/peruflorads43/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | assertthat (≥ 0.2.1), dplyr (≥ 1.1.0), fuzzyjoin (≥ 0.1.6), memoise (≥ 2.0.1), progress (≥ 1.2.2), purrr (≥ 1.0.0), readr (≥ 2.1.0), stringr (≥ 1.5.0), tibble (≥ 3.1.0), tidyr (≥ 1.3.0) |
| Suggests: | gt, scales, stringdist, ggplot2, forcats, covr, knitr, rmarkdown, testthat (≥ 3.0.0), withr (≥ 2.5.0) |
| VignetteBuilder: | knitr |
| Language: | en-US |
| RoxygenNote: | 7.3.3 |
| Config/testthat/edition: | 3 |
| License: | MIT + file LICENSE |
| NeedsCompilation: | no |
| Packaged: | 2025-10-27 01:27:00 UTC; PC |
| Author: | Paul E. Santos Andrade |
| Repository: | CRAN |
| Date/Publication: | 2025-10-27 15:00:09 UTC |
Backup Ambiguous Match Attributes
Description
Extracts and consolidates ambiguous match attributes from multiple objects. This prevents attribute loss during dplyr transformations.
Usage
.backup_ambiguous_attrs(...)
Arguments
... |
One or more data frames or tibbles that may contain ambiguous match attributes |
Value
A named list with consolidated ambiguous match attributes:
- genera
Tibble with ambiguous genus matches
- species
Tibble with ambiguous species matches
- infraspecies
Tibble with ambiguous infraspecies level 1 matches
- infraspecies_2
Tibble with ambiguous infraspecies level 2 matches
Consolidate Ambiguous Match Attributes
Description
Collects ambiguous match attributes from intermediate pipeline results and attaches them to the final output. This ensures that ambiguous match information created during fuzzy matching is preserved through all transformations and available to the user via get_ambiguous_matches().
Usage
.consolidate_ambiguous_attrs(output_f, pipe_1_5, infra_out)
Arguments
output_f |
Final output tibble from the matching pipeline |
pipe_1_5 |
List containing results from nodes 1-5 (genus/species matching) |
infra_out |
List containing results from nodes 6-7 (infraspecies matching) |
Details
This function solves the problem of attributes being lost during dplyr transformations (left_join, mutate, bind_rows, etc.). It retrieves attributes created in earlier stages of the pipeline and re-attaches them to the final output.
Value
output_f with attached ambiguous match attributes:
- attr(*, "ambiguous_genera")
- attr(*, "ambiguous_species")
- attr(*, "ambiguous_infraspecies")
Final Validation of Matching Results
Description
Validates that the output maintains integrity with the original input, including proper handling of duplicate names.
Usage
.final_assertions(splist_class, output_f)
Arguments
splist_class |
Tibble. Original classified species list |
output_f |
Tibble. Final formatted output |
Value
Invisible TRUE if all checks pass; otherwise throws an error
Restore Ambiguous Match Attributes
Description
Attaches previously backed-up ambiguous match attributes to a tibble.
Usage
.restore_ambiguous_attrs(tbl, backup)
Arguments
tbl |
A tibble to which attributes should be attached |
backup |
A named list of ambiguous match attributes (output from '.backup_ambiguous_attrs()') |
Value
The input tibble with ambiguous match attributes attached
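The backup and restore pattern described in these three help pages can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper names backup_attrs() and restore_attrs() and the toy tables are hypothetical, and only the attribute names are taken from the documentation above.

```r
# Hypothetical helpers for illustration; not the package's internals.
ambiguous_attr_names <- c(
  "ambiguous_genera", "ambiguous_species", "ambiguous_infraspecies"
)

backup_attrs <- function(...) {
  out <- list()
  for (obj in list(...)) {
    for (nm in ambiguous_attr_names) {
      value <- attr(obj, nm, exact = TRUE)
      # Keep the first non-NULL value found for each attribute name
      if (!is.null(value) && is.null(out[[nm]])) out[[nm]] <- value
    }
  }
  out
}

restore_attrs <- function(tbl, backup) {
  for (nm in names(backup)) attr(tbl, nm) <- backup[[nm]]
  tbl
}

# An intermediate result carrying an ambiguous-match attribute (toy data)
stage1 <- data.frame(name = c("Cattleya maxima", "Polylepis incana"))
attr(stage1, "ambiguous_genera") <- data.frame(
  input = "Catleya", candidate = "Cattleya"
)

# Rebuilding the table creates a new object without the custom attribute
final <- data.frame(name = toupper(stage1$name))
lost <- is.null(attr(final, "ambiguous_genera"))

# Re-attach the attribute saved from the earlier stage
final <- restore_attrs(final, backup_attrs(stage1))
attr(final, "ambiguous_genera")
```

Taking the backup before the transformations and restoring it at the end keeps the attribute handling independent of which intermediate steps happen to drop attributes.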
Simplified wrapper for consolidated matching
Description
Simplified interface for checking DS 043-2006-AG status with automatic consolidation of original and updated nomenclature.
Usage
check_ds043(splist, return_simple = FALSE)
Arguments
splist |
Character vector of species names |
return_simple |
Logical. If TRUE, returns only "Protected" or "Not protected" |
Value
Character vector with protection status
Examples
## Not run:
species <- c("Brassia ocanensis", "Persea americana")
check_ds043(species)
## End(Not run)
Create comparison table between original and updated results
Description
Creates a side-by-side comparison table useful for understanding nomenclatural changes and their impact on DS 043-2006-AG status.
Usage
comparison_table_ds043(splist)
Arguments
splist |
Character vector of species names |
Value
Tibble with a side-by-side comparison of the original and updated results for each input name
Direct Match Species Names
Description
Performs direct matching of species names against the threatened species database. Matches binomial names (genus + species), trinomial names (+ infraspecies level 1), and quaternomial names (+ infraspecies level 2) when applicable.
Usage
direct_match(df, target_df = NULL, source = "original")
Arguments
df |
A tibble containing the species data to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
source |
Character string specifying which database version to use. Options are "original" (default) or "updated". |
Value
A tibble with an additional logical column 'direct_match' indicating whether the name was successfully matched ('TRUE') or not ('FALSE').
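As a rough illustration of exact matching on the full name, the sketch below builds a key from genus, species, and infraspecies and flags inputs whose key appears in a reference table. The reference table is toy data, the input column names are assumptions, and make_key() is a hypothetical helper; the package's own implementation may differ.

```r
# Toy reference table; column names follow the database structure described
# in this manual, but the rows are illustrative only.
db <- data.frame(
  genus = c("CATTLEYA", "HAAGEOCEREUS"),
  species = c("MAXIMA", "ACRANTHUS"),
  infraspecies = c(NA_character_, "OLOWINSKIANUS")
)

# Input names already split into components (column names are assumptions)
input <- data.frame(
  genus = c("CATTLEYA", "HAAGEOCEREUS", "PERSEA"),
  species = c("MAXIMA", "ACRANTHUS", "AMERICANA"),
  infraspecies = NA_character_
)

# Hypothetical helper: one key per name, with an empty infraspecies slot
# for binomials so that binomials and trinomials never collide
make_key <- function(d) {
  infra <- ifelse(is.na(d$infraspecies), "", d$infraspecies)
  paste(d$genus, d$species, infra, sep = "|")
}

input$direct_match <- make_key(input) %in% make_key(db)
input
```

In this toy version a binomial does not match a database entry that exists only as a trinomial, which mirrors the idea of comparing names at the same rank.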
Direct Match Infraspecific Rank within Species
Description
Performs direct matching of infraspecific rank (VAR., SUBSP., F., etc.) within an already matched species. This is a prerequisite before fuzzy matching the infraspecific epithet, as the rank category must match exactly.
Usage
direct_match_infra_rank_within_species(
df,
target_df = NULL,
source = "original"
)
Arguments
df |
A tibble containing the species data to be matched. |
target_df |
A tibble representing the threatened species database. |
source |
Character string specifying which database version to use. Options are "original" (default) or "updated". |
Details
This function ensures that the infraspecific category (e.g., VAR., SUBSP., F.) matches exactly before attempting fuzzy matching on the infraspecific epithet. This prevents inappropriate matches like "var. alba" matching with "subsp. alba" which, despite having similar epithets, are taxonomically different entities.
The function automatically uses the correct column name based on use_infraspecies_2:
- TRUE: Uses 'tag' column (original DS 043-2006-AG database)
- FALSE: Uses 'tag_acc' column (updated nomenclature database)
Value
A tibble with an additional logical column 'direct_match_infra_rank' indicating whether the infraspecific rank was successfully matched ('TRUE') or not ('FALSE').
Helper: Direct Match Infraspecific Rank within Species
Description
Helper function that performs the actual matching of infraspecific ranks for a single matched species. Automatically handles both original and updated databases by using the appropriate column name (tag or tag_acc).
Usage
direct_match_infra_rank_within_species_helper(
df,
target_df,
source = "original"
)
Arguments
df |
A tibble containing data for a single matched species. |
target_df |
A tibble representing the threatened species database. |
source |
Character string specifying which database version to use. Options are "original" (default) or "updated". |
Details
The function performs the following steps:
1. Determines which column to use based on use_infraspecies_2
2. Extracts infraspecific ranks from the database for the matched species
3. Standardizes rank names to uppercase
4. Performs exact matching on the rank category
5. Returns matched and unmatched records with boolean indicator
Value
A tibble with match results and logical indicator.
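The steps above can be sketched for a single species as follows. This is a simplified illustration under stated assumptions: match_infra_rank() is a hypothetical function, it selects the rank column from a source argument rather than from the internal flag, and the one-row table is toy data.

```r
# Hypothetical sketch; argument and column handling are assumptions.
match_infra_rank <- function(input_rank, db_species,
                             source = c("original", "updated")) {
  source <- match.arg(source)
  # Step 1: pick the rank column for the chosen database
  rank_col <- if (source == "original") "tag" else "tag_acc"
  # Steps 2-3: extract the ranks recorded for this species, in uppercase
  db_ranks <- toupper(db_species[[rank_col]])
  # Steps 4-5: exact comparison, returned as a logical indicator
  toupper(input_rank) %in% db_ranks
}

# Toy database rows for one already matched species
db_species <- data.frame(tag = "SUBSP.", infraspecies = "ALBA")

rank_var <- match_infra_rank("var.", db_species)      # rank differs
rank_subsp <- match_infra_rank("subsp.", db_species)  # rank agrees
c(var = rank_var, subsp = rank_subsp)
```

Because the rank is compared exactly, "var. alba" is rejected here even though its epithet is identical to the one recorded under "subsp.".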
Direct Match Species within Genus
Description
This function directly matches specific epithets, within an already matched genus, against the list of threatened species in the database.
Usage
direct_match_species_within_genus_helper(df, target_df)
Arguments
df |
A tibble containing the species data to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
Value
A tibble with an additional logical column indicating whether the specific epithet was successfully matched within the matched genus ('TRUE') or not ('FALSE').
Fuzzy Match Genus Name
Description
This function performs a fuzzy match of genus names against the threatened species database, using string-distance matching from the fuzzyjoin package to account for slight variations in spelling.
Usage
fuzzy_match_genus(df, target_df = NULL)
Arguments
df |
A tibble containing the genus names to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
Details
If multiple genera match with the same string distance (ambiguous matches),
a warning is issued and the first match is automatically selected. To
examine ambiguous matches in detail, use get_ambiguous_matches
on the result object.
**IMPROVED**: Ambiguous match attributes now include database information such as family and representative species for better manual curation.
Value
A tibble with two additional columns:
- fuzzy_match_genus
A logical column indicating whether the genus was successfully matched ('TRUE') or not ('FALSE').
- fuzzy_genus_dist
A numeric column representing the distance for each match.
See Also
get_ambiguous_matches to retrieve ambiguous match details
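To make the notion of an ambiguous (tied) match concrete, here is a small base R sketch. It uses utils::adist() as a stand-in for the package's fuzzyjoin-based matching; fuzzy_pick(), the distance threshold of 1, and the three-genus reference vector are all assumptions made for illustration.

```r
# Hypothetical helper: returns the first best candidate and flags ties.
fuzzy_pick <- function(name, candidates, max_dist = 1) {
  d <- as.vector(utils::adist(toupper(name), toupper(candidates)))
  best <- which(d == min(d) & d <= max_dist)
  if (length(best) == 0) {
    return(list(match = NA_character_, dist = NA_real_,
                ambiguous = FALSE, tied = character(0)))
  }
  list(
    match = candidates[best[1]],   # first candidate wins, as described above
    dist = d[best[1]],
    ambiguous = length(best) > 1,  # TRUE when several candidates tie
    tied = candidates[best]
  )
}

genera <- c("IDA", "AA", "CATTLEYA")  # toy reference list
res <- fuzzy_pick("Ada", genera)
res$match
res$tied
```

"ADA" is one edit away from both "IDA" (a substitution) and "AA" (a deletion), so the sketch keeps the first candidate and records both as tied. The package additionally issues a warning in this situation.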
Fuzzy Match Infraspecies Level 2 within Infraspecies Level 1
Description
Fuzzy Match Infraspecies Level 2 within Infraspecies Level 1
Usage
fuzzy_match_infraspecies2_within_infraspecies(df, target_df = NULL)
Helper function for fuzzy matching infraspecies level 2
Description
Helper function for fuzzy matching infraspecies level 2
Usage
fuzzy_match_infraspecies2_within_infraspecies_helper(df, target_df)
Fuzzy Match Infraspecific Epithet within Species
Description
Fuzzy Match Infraspecific Epithet within Species
Usage
fuzzy_match_infraspecies_within_species(
df,
target_df = NULL,
source = "original"
)
Helper: Fuzzy Match Infraspecific Epithet within Species (IMPROVED)
Description
Helper: Fuzzy Match Infraspecific Epithet within Species (IMPROVED)
Usage
fuzzy_match_infraspecies_within_species_helper(
df,
target_df,
source = "original"
)
Fuzzy Match Species within Genus
Description
This function attempts to fuzzy match species names within a genus against the threatened species database, using string-distance matching from the fuzzyjoin package.
Usage
fuzzy_match_species_within_genus(df, target_df = NULL)
Arguments
df |
A tibble containing the species data to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
Details
If multiple species match with the same string distance (ambiguous matches),
a warning is issued and the first match is automatically selected. To
examine ambiguous matches in detail, use get_ambiguous_matches
on the result object with type = "species".
**IMPROVED**: Ambiguous match attributes now include threat category and accepted names for better decision-making.
Value
A tibble with an additional logical column fuzzy_match_species_within_genus, indicating whether the specific epithet was successfully fuzzy matched within the matched genus ('TRUE') or not ('FALSE').
See Also
get_ambiguous_matches to retrieve ambiguous match details
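The "within genus" restriction can be illustrated with a short base R sketch: only epithets recorded under the already matched genus are considered as candidates. The function name, the threshold of 1, and the reference rows are hypothetical, and utils::adist() again stands in for the package's fuzzyjoin-based matching.

```r
# Toy reference table (illustrative rows only)
db <- data.frame(
  genus = c("CATTLEYA", "CATTLEYA", "POLYLEPIS"),
  species = c("MAXIMA", "REX", "INCANA")
)

# Hypothetical helper: fuzzy match an epithet against one genus only
fuzzy_species_in_genus <- function(genus, epithet, db, max_dist = 1) {
  pool <- db$species[db$genus == genus]
  if (length(pool) == 0) return(NA_character_)
  d <- as.vector(utils::adist(epithet, pool))
  if (min(d) > max_dist) return(NA_character_)
  pool[which.min(d)]
}

hit <- fuzzy_species_in_genus("CATTLEYA", "MAXINA", db)
miss <- fuzzy_species_in_genus("POLYLEPIS", "MAXINA", db)
c(hit = hit, miss = miss)
```

The misspelled epithet is accepted under CATTLEYA, where a close candidate exists, and rejected under POLYLEPIS, where none does.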
Fuzzy Match Species within Genus - Helper
Description
Fuzzy Match Species within Genus - Helper
Usage
fuzzy_match_species_within_genus_helper(df, target_df)
Match Genus Name
Description
This function performs a direct match of genus names against the genus names listed in the threatened species database.
Usage
genus_match(df, target_df = NULL)
Arguments
df |
A tibble containing the genus names to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
Value
A tibble with an additional logical column genus_match indicating whether the genus was successfully matched ('TRUE') or not ('FALSE').
Retrieve Ambiguous Match Information
Description
Extracts information about ambiguous matches (multiple candidates with tied distances) from matching results. This is useful for quality control and manual curation of uncertain matches.
Usage
get_ambiguous_matches(
match_result,
type = c("genus", "species", "infraspecies", "all"),
save_to_file = FALSE,
output_dir = tempdir()
)
Arguments
match_result |
A tibble returned by matching functions such as matching_threatenedperu(). |
type |
Character. Type of ambiguous matches to retrieve: "genus", "species", "infraspecies", or "all". |
save_to_file |
Logical. If TRUE, saves results to a CSV file. Default is FALSE (CRAN compliant - no automatic file writing). |
output_dir |
Character. Directory to save the file if save_to_file = TRUE. Defaults to tempdir(). |
Details
During fuzzy matching, multiple candidates may have identical string distances, making the choice of match ambiguous. The matching algorithm automatically selects the first candidate, but this function allows you to:
Review all ambiguous matches for quality control
Export them for manual curation
Make informed decisions about match quality
Value
A tibble with ambiguous match details, or NULL if no ambiguous matches exist. Columns depend on the match type but typically include original names, matched names, and distance metrics.
File Output
When save_to_file = TRUE, a timestamped CSV file is created:
Filename format: "threatenedperu_ambiguous_[type]_[timestamp].csv"
Location: output_dir (defaults to tempdir())
Contains all ambiguous matches with metadata
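A file of this kind could be written along the following lines. This is only a sketch: write_ambiguous_csv() is a hypothetical helper, the timestamp format is an assumption (the manual does not specify it), and the table is toy data.

```r
# Hypothetical helper; the timestamp format is an assumption.
write_ambiguous_csv <- function(ambiguous, type, output_dir = tempdir()) {
  timestamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  filename <- sprintf("threatenedperu_ambiguous_%s_%s.csv", type, timestamp)
  path <- file.path(output_dir, filename)
  utils::write.csv(ambiguous, path, row.names = FALSE)
  path  # return the path so the caller can locate the file
}

# Toy table of tied candidates
ambiguous <- data.frame(
  input = c("Ada", "Ada"),
  candidate = c("Ida", "Aa"),
  distance = c(1, 1)
)

csv_path <- write_ambiguous_csv(ambiguous, type = "genus")
basename(csv_path)
```

Writing to tempdir() by default means nothing is saved to the user's working directory unless a directory is given explicitly.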
Get Database Summary Statistics
Description
Provides summary statistics for the threatened species databases.
Usage
get_database_summary(type = c("both", "original", "updated"))
Arguments
type |
Character string: "original", "updated", or "both" (default). |
Value
A tibble with summary statistics.
Examples
# Get summary of both databases
summary_stats <- get_database_summary()
print(summary_stats)
# Get summary of just the original
summary_original <- get_database_summary("original")
print(summary_original)
Get Threatened Species Database
Description
Retrieves the threatened plant species database for Peru. This function provides controlled access to the internal datasets used by the package.
Usage
get_threatened_database(type = c("original", "updated"))
Arguments
type |
Character string specifying which database version to retrieve. Options are "original" or "updated". |
Value
A tibble containing the threatened species database.
Database Structure
**Original Database** (type = "original"):
~777 species as listed in DS 043-2006-AG
Supports quaternomial names (Rank 4)
Includes both accepted names and synonyms
Columns: scientific_name, genus, species, tag, infraspecies, tag_2, infraspecies_2, threat_category, accepted_name_author, taxonomic_status, accepted_name, family, protected_ds_043
**Updated Database** (type = "updated"):
Updated nomenclature using WCVP and POWO
Supports trinomial names (Rank 3 maximum)
Only accepted names (synonyms resolved)
Columns: scientific_name, genus, species, tag_acc, infraspecies, threat_category, accepted_name_author, taxonomic_status, accepted_name, family, protected_ds_043
Threat Categories
- CR
Critically Endangered
- EN
Endangered
- VU
Vulnerable
- NT
Near Threatened
Legal Context
Data based on Supreme Decree DS 043-2006-AG, Ministry of Agriculture, Peru (July 13, 2006), which establishes the official list of threatened wild flora species in Peru.
Note
This function is primarily for advanced users who need direct access
to the database structure. For most use cases, use the higher-level
functions: is_threatened_peru or is_ds043_2006_ag.
See Also
is_threatened_peru to check threat status of species
is_ds043_2006_ag to check DS 043 protection status
Examples
# Get original database
db_original <- get_threatened_database(type = "original")
str(db_original)
nrow(db_original)
# Get updated database
db_updated <- get_threatened_database(type = "updated")
str(db_updated)
# Compare number of species
n_original <- nrow(db_original)
n_updated <- nrow(db_updated)
cat("Original:", n_original, "| Updated:", n_updated, "\n")
# Count by threat category
table(db_original$threat_category)
# Find critically endangered orchids
orchids <- db_original[db_original$family == "ORCHIDACEAE" &
db_original$threat_category == "CR", ]
head(orchids$scientific_name)
Matching for DS 043-2006-AG Species
Description
Performs consolidated matching that searches species names in both the original DS 043-2006-AG list (2006 names) and the updated nomenclature database. This ensures that users with updated names can still identify whether their species are protected under DS 043-2006-AG, even if the nomenclature has changed since 2006.
Usage
is_ds043_2006_ag(splist, prioritize = "original", return_details = FALSE)
Arguments
splist |
Character vector of species names to check |
prioritize |
Character. Which result to prioritize when both databases match: "original" (default) or "updated" |
return_details |
Logical. Return detailed matching information |
Details
The function performs a two-stage search and then consolidates the results:
1. Searches in original DS 043-2006-AG (names as listed in 2006)
2. Searches in updated nomenclature database (current accepted names)
3. Consolidates results with clear indication of which database provided the match
4. Identifies if original names are now synonyms
This approach handles cases where:
- User provides original name from 2006: Found in original database
- User provides updated name: Found in updated database and linked to DS 043-2006-AG list
- Name matches in both: Returns most relevant result based on priority
- Original name is now a synonym: Indicated with "(synonym)" marker
Value
If return_details = FALSE: Character vector with consolidated threat status.
If return_details = TRUE: Tibble with detailed reconciliation information.
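The consolidation step with a priority setting can be sketched as follows. The function consolidate_status(), the use of NA for "not found", and the example categories are assumptions made for illustration; the actual status strings returned by is_ds043_2006_ag() are not reproduced here.

```r
# Hypothetical sketch: combine two per-name results, NA meaning "not found".
consolidate_status <- function(original, updated,
                               prioritize = c("original", "updated")) {
  prioritize <- match.arg(prioritize)
  found_orig <- !is.na(original)
  found_upd <- !is.na(updated)
  # Use the original result when it is preferred or is the only one found
  use_original <- if (prioritize == "original") {
    found_orig
  } else {
    found_orig & !found_upd
  }
  use_updated <- !use_original & found_upd
  status <- rep(NA_character_, length(original))
  status[use_original] <- original[use_original]
  status[use_updated] <- updated[use_updated]
  source <- rep(NA_character_, length(original))
  source[use_original] <- "original"
  source[use_updated] <- "updated"
  data.frame(status = status, source = source)
}

# Toy results for four names: found in original only, updated only,
# both, and neither
original <- c("EN", NA, "VU", NA)
updated <- c(NA, "CR", "VU", NA)

by_original <- consolidate_status(original, updated, prioritize = "original")
by_updated <- consolidate_status(original, updated, prioritize = "updated")
by_original
```

Recording the source next to the status makes it clear which database provided each match, which is the point of the consolidation described above.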
Examples
## Not run:
# Species with nomenclatural changes
species <- c(
"Haageocereus acranthus subsp. olowinskianus", # Original name
"Brassia ocanensis", # Updated name (was Ada)
"Ida locusta", # Updated name
"Lycaste locusta", # Now a synonym
"Persea americana" # Not threatened
)
# Get consolidated status
status <- is_ds043_2006_ag(species)
# Get detailed information
details <- is_ds043_2006_ag(species, return_details = TRUE)
View(details)
## End(Not run)
Check whether species are listed as threatened in DS 043-2006-AG (Peru)
Description
This function checks whether the species in a list of names are threatened according to the Peruvian threatened species database. Species names are fuzzy matched, subject to a maximum distance threshold, to handle potential typos or spelling variations.
Usage
is_threatened_peru(splist, source = "original", return_details = FALSE)
Arguments
splist |
A character vector containing the list of species names to be checked for threatened status in Peru. |
source |
Character string specifying which database version to use. Options are "original" (default) or "updated". |
return_details |
Logical. If TRUE, returns detailed matching results. If FALSE (default), returns only the threat status vector. |
Value
If return_details = FALSE: A character vector indicating the threat status of each species ("Not threatened", "Threatened - CR", "Threatened - EN", "Threatened - VU", "Threatened - NT", or "Threatened - Unknown category").
If return_details = TRUE: A tibble with detailed matching results including matched names, threat categories, and matching process information.
Examples
# Example 1: Basic usage with valid species names
species_list <- c("Cattleya maxima", "Polylepis incana", "Fake species")
# Simple status check
threat_status <- tryCatch(
is_threatened_peru(species_list),
error = function(e) {
message("Error in matching: ", e$message)
rep("Error", length(species_list))
}
)
print(threat_status)
# Example 2: Detailed results
detailed_results <- tryCatch(
is_threatened_peru(species_list, return_details = TRUE),
error = function(e) {
message("Error in detailed matching: ", e$message)
NULL
}
)
if (!is.null(detailed_results)) {
print(detailed_results)
}
# Example 3: Handling NA values gracefully
species_with_na <- c("Cattleya maxima", NA, "Polylepis incana")
status_with_na <- is_threatened_peru(species_with_na)
print(status_with_na)
# Example 4: Empty input handling
empty_result <- is_threatened_peru(character(0))
print(empty_result) # Should return character(0)
# Example 5: Using updated database
updated_results <- tryCatch(
is_threatened_peru(species_list, source = "updated"),
error = function(e) {
message("Error with updated database: ", e$message)
rep("Error", length(species_list))
}
)
print(updated_results)
Map with optional progress bar
Description
Internal wrapper for purrr::map_dfr with optional progress tracking. Progress bars are only shown in interactive sessions.
Usage
map_dfr_progress(.x, .f, ..., .id = NULL, .progress = interactive())
Arguments
.x |
A list or vector to iterate over |
.f |
A function to apply |
... |
Additional arguments passed to .f |
.id |
Column name for row identification |
.progress |
Logical. Show progress bar? Default is interactive() |
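A wrapper of this shape can be approximated in base R. The sketch below is not the package's implementation: it replaces purrr::map_dfr() with a loop plus rbind(), uses utils::txtProgressBar() instead of the progress package, and the name map_rows_progress() is hypothetical.

```r
# Hypothetical base R analogue of a row-binding map with optional progress.
map_rows_progress <- function(.x, .f, ..., .progress = interactive()) {
  n <- length(.x)
  show <- .progress && n > 0
  if (show) {
    pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
    on.exit(close(pb), add = TRUE)
  }
  pieces <- vector("list", n)
  for (i in seq_len(n)) {
    pieces[[i]] <- .f(.x[[i]], ...)
    if (show) utils::setTxtProgressBar(pb, i)
  }
  # Bind the per-element data frames into one
  do.call(rbind, pieces)
}

name_lengths <- map_rows_progress(
  c("Cattleya maxima", "Polylepis incana"),
  function(x) data.frame(name = x, n_char = nchar(x)),
  .progress = FALSE
)
name_lengths
```

Defaulting .progress to interactive() keeps logs from scripts and checks free of progress-bar output.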
Match Species Names to Threatened Plant List of Peru
Description
This function matches given species names against the internal database of threatened plant species in Peru. It uses a hierarchical matching strategy that includes direct matching, genus-level matching, fuzzy matching, and suffix matching to maximize successful matches while maintaining accuracy.
Usage
matching_threatenedperu(
splist,
source = c("original", "updated"),
quiet = TRUE
)
Arguments
splist |
A character vector containing the species names to be matched. Can include duplicate names - results will be expanded to match the input. |
source |
Character string specifying which database version to use. Options are "original" or "updated". |
quiet |
Logical, default TRUE. If FALSE, prints informative messages. |
Details
**Duplicate Handling:** When the input contains duplicate names, the function automatically:
Detects duplicates and creates a tracking column (sorters)
Processes only unique names (efficient matching)
Expands results to restore all original positions
Preserves original input order via sorter column
The duplicate handling uses a 'sorters' column that concatenates all original sorter values for duplicate names (e.g., "1 - 3" for a name appearing at positions 1 and 3), enabling accurate result expansion.
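The duplicate handling described above can be sketched in base R. The column names sorter, sorters, and Orig.Name come from this manual; the matching step is replaced by a toy rule, so this is an illustration of the bookkeeping only, not of the package's code.

```r
splist <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")
input <- data.frame(sorter = seq_along(splist), Orig.Name = splist)

# One row per unique name, with all original positions kept in 'sorters'
unique_names <- unique(input$Orig.Name)
sorters <- vapply(
  unique_names,
  function(nm) paste(input$sorter[input$Orig.Name == nm], collapse = " - "),
  character(1)
)
unique_tbl <- data.frame(Orig.Name = unique_names, sorters = unname(sorters))

# Stand-in for the matching step (hypothetical toy rule)
unique_tbl$matched <- unique_tbl$Orig.Name %in% "Cattleya maxima"

# Expand back to one row per input position, in the original order
expanded <- unique_tbl[match(input$Orig.Name, unique_tbl$Orig.Name), ]
expanded$sorter <- input$sorter
rownames(expanded) <- NULL
expanded
```

Only two names are matched even though the input has three entries, and the expanded result still has one row per input position.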
**Matching Strategy:**
1. Direct exact matching
2. Genus-level matching (exact and fuzzy)
3. Species-level matching within genus
4. Infraspecies-level matching (up to 2 levels for original database)
**Rank Validation:** The algorithm implements strict rank validation to prevent false positives.
Value
A tibble with detailed matching results including:
- sorter
Integer. Original position in input vector
- Orig.Name
Character. Original input name (standardized)
- Matched.Name
Character. Matched name from database or "—"
- Threat.Status
Character. IUCN threat category or "Not threatened"
- Rank
Integer. Input taxonomic rank (1-4)
- Matched.Rank
Integer. Matched taxonomic rank
- Comp.Rank
Logical. Whether ranks match exactly
- Match.Level
Character. Description of match quality
- matched
Logical. Whether a match was found
See Also
is_threatened_peru for a simplified interface
get_ambiguous_matches to retrieve ambiguous match details
get_threatened_database to access the raw databases
Examples
## Not run:
# Basic usage
species_list <- c("Cattleya maxima", "Polylepis incana")
results <- matching_threatenedperu(species_list, source = "original")
# With duplicates
species_dup <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")
results_dup <- matching_threatenedperu(species_dup)
nrow(results_dup) == 3 # TRUE - preserves duplicates
# Access metadata
attr(results, "match_rate")
# Check for ambiguous matches
get_ambiguous_matches(results, type = "infraspecies")
## End(Not run)
Suffix Match Species within Genus
Description
Matches the specific epithet within an already matched genus in the threatened species database by exchanging common suffixes.
Usage
suffix_match_species_within_genus_helper(df, target_df)
Arguments
df |
A tibble containing the species data to be matched. |
target_df |
A tibble representing the threatened species database containing the reference list of threatened species. |
Value
Returns a tibble with the additional logical column suffix_match_species_within_genus, indicating whether the specific epithet was successfully matched within the matched genus ('TRUE') or not ('FALSE').
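Suffix exchange can be illustrated with a short sketch. The suffix groups below (Latin gender endings) are an assumption chosen for the example, since the manual does not list the suffixes the package uses, and suffix_variants() is a hypothetical helper.

```r
# Assumed suffix groups; the package's actual list is not documented here.
suffix_groups <- list(c("A", "US", "UM"), c("IS", "E"))

# Hypothetical helper: all spellings obtained by swapping the ending
# for another ending from the same group
suffix_variants <- function(epithet, groups = suffix_groups) {
  variants <- character(0)
  for (group in groups) {
    for (s in group) {
      if (endsWith(epithet, s)) {
        stem <- substr(epithet, 1, nchar(epithet) - nchar(s))
        variants <- c(variants, paste0(stem, setdiff(group, s)))
      }
    }
  }
  unique(variants)
}

# Epithets recorded under the already matched genus (toy data)
genus_epithets <- c("INCANA", "RACEMOSA")

variants <- suffix_variants("INCANUS")
suffix_hit <- intersect(variants, genus_epithets)
suffix_hit
```

Here "INCANUS" is not in the reference list, but exchanging its ending yields "INCANA", which is.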