fastymd

Overview

fastymd is a package for working with Year-Month-Day (YMD) style date objects. It provides extremely fast parsing of character strings and numeric values into date objects, as well as fast decomposition of these into their year, month and day components. The underlying algorithms follow Howard Hinnant’s approach for converting Gregorian calendar dates to a count of days from the UNIX epoch and vice versa.
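To make the conversion concrete, here is a minimal base R sketch of Hinnant-style day counting. It is a transcription of the published algorithm for illustration only, not the package's internal code, and the function names days_from_civil() and civil_from_days() are borrowed from Hinnant's write-up rather than from fastymd.

```r
# Days since 1970-01-01 for a proleptic Gregorian date (vectorised).
days_from_civil <- function(y, m, d) {
    y   <- y - (m <= 2)                      # count Jan/Feb as part of the previous year
    era <- y %/% 400                         # 400-year cycle; %/% floors, so negative years work
    yoe <- y - era * 400                     # year of era, in [0, 399]
    mp  <- ifelse(m > 2, m - 3, m + 9)       # month index with March = 0
    doy <- (153 * mp + 2) %/% 5 + d - 1      # day of the March-based year, in [0, 365]
    doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy  # day of era, in [0, 146096]
    era * 146097 + doe - 719468              # shift so that 1970-01-01 is day 0
}

# Inverse: year, month and day for a count of days since 1970-01-01.
civil_from_days <- function(z) {
    z   <- z + 719468
    era <- z %/% 146097
    doe <- z - era * 146097
    yoe <- (doe - doe %/% 1460 + doe %/% 36524 - doe %/% 146096) %/% 365
    doy <- doe - (365 * yoe + yoe %/% 4 - yoe %/% 100)
    mp  <- (5 * doy + 2) %/% 153
    d   <- doy - (153 * mp + 2) %/% 5 + 1
    m   <- ifelse(mp < 10, mp + 3, mp - 9)
    y   <- yoe + era * 400 + (m <= 2)        # undo the March-based year shift
    data.frame(year = y, month = m, day = d)
}
```

Because every step is integer arithmetic on whole vectors, with no lookup tables or system calls, the same idea translates directly to compiled code. A quick sanity check is that days_from_civil(2025, 4, 16) agrees with as.numeric(as.Date("2025-04-16")).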

The API should hold no surprises:

library(fastymd)
cdate <- c("2025-04-16", "2025-04-17")
(res <- fymd(cdate))
#> [1] "2025-04-16" "2025-04-17"
res == as.Date(cdate)
#> [1] TRUE TRUE
get_ymd(res)
#>   year month day
#> 1 2025     4  16
#> 2 2025     4  17
fymd(2025, 4, 16) == res[1L]
#> [1] TRUE

Invalid dates return NA with a warning:

fymd(2021, 02, 29) # not a leap year
#> NAs introduced due to invalid month and/or day combinations.
#> [1] NA
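The check behind this kind of result can be sketched in a few lines of base R. The helper is_valid_ymd() below is hypothetical and only illustrates the Gregorian leap-year rule and the month-length lookup; how fastymd performs its validation internally is not shown here.

```r
# Hypothetical helper: TRUE where (y, m, d) is a real proleptic Gregorian date.
is_valid_ymd <- function(y, m, d) {
    # Leap years: divisible by 4, except centuries not divisible by 400.
    leap  <- (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)
    mdays <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
    ok_month <- !is.na(m) & m >= 1 & m <= 12
    # Last valid day of the month; out-of-range months use a dummy index of 1.
    last <- mdays[ifelse(ok_month, m, 1)] + (ok_month & m == 2 & leap)
    ok_month & !is.na(d) & d >= 1 & d <= last
}

is_valid_ymd(c(2021, 2024), 2, 29) # FALSE for 2021, TRUE for the leap year 2024
```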

More interesting is the handling of any input that follows a valid date. Consider the following timestamp:

timelt <- as.POSIXlt(Sys.time(), tz = "UTC")
(timestamp <- strftime(timelt, "%Y-%m-%dT%H:%M:%S%z"))
#> [1] "2025-04-24T21:51:27+0000"

By default the time element is ignored:

(res <- fymd(timestamp))
#> [1] "2025-04-24"
res == as.Date(timestamp, tz = "UTC")
#> [1] TRUE

Ignoring trailing content in this way is both good and bad. For timestamps it makes perfect sense, but you may have plain dates and be concerned that some are corrupted. For these cases we can use the strict argument:

cdate <- "2025-04-16nonsense "
fymd(cdate)
#> [1] "2025-04-16"
fymd(cdate, strict = TRUE)
#> NAs introduced due to invalid date strings.
#> [1] NA

Benchmarks

The character method of fymd() parses input strings in a fixed year, month, day order. Each component must consist of digits, but the components can be separated by any non-digit character. This is similar in spirit to the fastDate() function in Simon Urbanek’s fasttime package, using pure text parsing and no system calls for maximum speed.
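The parsing rule can be approximated with a regular expression in base R. This is a rough sketch only: parse_ymd_sketch() is a hypothetical helper, the four-digit year and one-or-two-digit month and day are assumptions made for the illustration, and a regex is far slower than the hand-written text parsing described above.

```r
# Hypothetical helper: split strings into year, month and day components.
# Assumes a four-digit year and one- or two-digit month and day.
parse_ymd_sketch <- function(x, strict = FALSE) {
    # digits, any single non-digit separator, digits, separator, digits
    pattern <- "^([0-9]{4})[^0-9]([0-9]{1,2})[^0-9]([0-9]{1,2})"
    # In strict mode nothing may follow the day component.
    if (strict) pattern <- paste0(pattern, "$")
    hits  <- regmatches(x, regexec(pattern, x))
    parts <- vapply(
        hits,
        function(h) if (length(h) == 4L) as.integer(h[2:4]) else rep(NA_integer_, 3L),
        integer(3L)
    )
    data.frame(year = parts[1L, ], month = parts[2L, ], day = parts[3L, ])
}

x <- c("2025-04-16", "2025/04/17T10:00:00", "2025-04-16nonsense ")
parse_ymd_sketch(x)                # trailing content is ignored
parse_ymd_sketch(x, strict = TRUE) # only the first string survives
```

The sketch only extracts components; checking that they form a real calendar date is a separate step.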

For extremely fast parsing of POSIX style timestamps you will struggle to beat the performance of fasttime. It works fantastically for timestamps that do not need validation and that fall within the date range supported by the package (currently 1970-01-01 through to the year 2199).

fymd() fills the, admittedly small, niche where you want fast parsing of YMD strings along with date validation and support for a wider range of dates from the Proleptic Gregorian calendar (currently we support years in the range [-9999, 9999]). This additional capability does come with a small performance penalty but, hopefully, this has been kept to a minimum and the implementation remains competitive.

library(microbenchmark)

# 1970-01-01 (UNIX epoch) to "2199-01-01"
dates <- seq.Date(from = .Date(0), to = fymd("2199-01-01"), by = "day")

# comparison timings for fymd (character method) 
cdates  <- format(dates)
(res_c <- microbenchmark(
    fasttime  = fasttime::fastDate(cdates),
    fastymd   = fymd(cdates),
    ymd       = ymd::ymd(cdates),
    lubridate = lubridate::ymd(cdates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq     mean    median       uq       max neval
#>   fasttime  530.626  535.6445  590.102  540.0530  546.215  3573.341   100
#>    fastymd  759.244  769.6580  797.582  777.3625  784.922  2186.300   100
#>        ymd 4420.079 4514.3255 4617.550 4535.2805 4614.213  5878.654   100
#>  lubridate 5016.447 5180.8155 6189.872 5331.4980 6651.158 36841.468   100
# comparison timings for fymd (numeric method)
ymd  <- get_ymd(dates)
(res_n <- microbenchmark(
    fastymd   = fymd(ymd[[1]], ymd[[2]], ymd[[3]]),
    lubridate = lubridate::make_date(ymd[[1]], ymd[[2]], ymd[[3]]),
    check     = "equal"
))
#> Unit: microseconds
#>       expr     min       lq     mean   median       uq      max neval
#>    fastymd 343.595 345.2425 367.1786 348.3880  370.139  462.517   100
#>  lubridate 537.478 542.0665 769.6049 547.9725 1061.470 3038.528   100
# comparison timings for year getter
(res_get_year <- microbenchmark(
    fastymd   = get_year(dates),
    ymd       = ymd::year(dates),
    lubridate = lubridate::year(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min       lq      mean    median        uq       max neval
#>    fastymd  482.164  496.652  574.9981  503.0590  507.7575  1960.116   100
#>        ymd  499.396  506.810  573.2474  511.8395  517.6060  3277.326   100
#>  lubridate 7609.029 7619.865 8252.9036 7625.8210 7706.4820 39390.940   100
# comparison timings for month getter
(res_get_month <- microbenchmark(
    fastymd   = get_month(dates),
    ymd       = ymd::month(dates),
    lubridate = lubridate::month(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq     mean    median        uq       max neval
#>    fastymd  449.323  464.4660  506.680  469.0645  472.5910  2012.544   100
#>        ymd  532.418  537.7585  550.637  540.5080  544.4505   786.986   100
#>  lubridate 8219.113 8271.7070 8832.283 8300.4755 9651.9500 11677.499   100
# comparison timings for mday getter
(res_get_mday <- microbenchmark(
    fastymd   = get_mday(dates),
    ymd       = ymd::mday(dates),
    lubridate = lubridate::day(dates),
    check     = "equal"
))
#> Unit: microseconds
#>       expr      min        lq     mean    median       uq      max neval
#>    fastymd  450.595  462.7430  570.538  467.9580  473.047 2562.345   100
#>        ymd  537.688  542.0065  568.018  543.6995  546.725 1931.322   100
#>  lubridate 7552.984 7571.8345 7757.858 7582.7800 7631.016 9653.524   100