fastymd


Overview

fastymd is a package for working with Year-Month-Day (YMD) style date objects. It provides extremely fast conversion of character strings and numeric values to date objects, as well as fast decomposition of these into their year, month and day components. The underlying algorithms follow Howard Hinnant's approach for converting Gregorian calendar dates to days since the UNIX epoch and vice versa.
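Hinnant's approach maps a (year, month, day) triple to a day count by treating March as the first month of the year, so the leap day always falls at the end of a year and the month lengths follow a simple arithmetic pattern. The block below is a minimal, vectorised base-R transcription of his published `days_from_civil()` algorithm, offered only as an illustration; it is not fastymd's C implementation, and the R function name is our own.

```r
# Illustrative R transcription of Howard Hinnant's days_from_civil() algorithm.
# Not part of fastymd. R's %/% is floored division, so negative years need no
# special-casing (Hinnant's C++ version adjusts for truncating division).
days_from_civil <- function(y, m, d) {
  y <- y - (m <= 2)                      # Jan/Feb count as months 11/12 of the previous year
  era <- y %/% 400                       # 400-year era containing y
  yoe <- y - era * 400                   # year of era, in [0, 399]
  mp <- ifelse(m > 2, m - 3, m + 9)      # month index with March = 0
  doy <- (153 * mp + 2) %/% 5 + d - 1    # day of (March-based) year, in [0, 365]
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy  # day of era, in [0, 146096]
  era * 146097 + doe - 719468            # shift so that 1970-01-01 is day 0
}

days_from_civil(2025, 4, 16)             # returns 20194
.Date(days_from_civil(2025, 4, 16))      # the same count viewed as an R Date
```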

The API won’t give any surprises:

library(fastymd)
cdate <- c("2025-04-16", "2025-04-17")
(res <- fymd(cdate))

#> [1] "2025-04-16" "2025-04-17"

res == as.Date(cdate)

#> [1] TRUE TRUE

get_ymd(res)

#>   year month day
#> 1 2025     4  16
#> 2 2025     4  17
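get_ymd() performs the reverse conversion, from a day count to its components. Hinnant's inverse algorithm, `civil_from_days()`, can be transcribed into R in the same spirit. This is again a hypothetical illustration rather than the package's code; it returns a data frame shaped like the get_ymd() output above so the two are easy to compare.

```r
# Illustrative R transcription of Howard Hinnant's civil_from_days() algorithm.
# Not part of fastymd. z is the number of days since 1970-01-01.
civil_from_days <- function(z) {
  z <- z + 719468                        # shift the origin to 0000-03-01
  era <- z %/% 146097                    # 400-year era (floored division)
  doe <- z - era * 146097                # day of era, in [0, 146096]
  yoe <- (doe - doe %/% 1460 + doe %/% 36524 - doe %/% 146096) %/% 365  # year of era, in [0, 399]
  doy <- doe - (365 * yoe + yoe %/% 4 - yoe %/% 100)  # day of March-based year, in [0, 365]
  mp <- (5 * doy + 2) %/% 153            # March-based month index, in [0, 11]
  d <- doy - (153 * mp + 2) %/% 5 + 1    # day of month, in [1, 31]
  m <- ifelse(mp < 10, mp + 3, mp - 9)   # convert to January-based month, in [1, 12]
  y <- yoe + era * 400 + (m <= 2)        # Jan/Feb belong to the following civil year
  data.frame(year = y, month = m, day = d)
}

civil_from_days(as.numeric(as.Date(c("2025-04-16", "2025-04-17"))))
```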

fymd(2025, 4, 16) == res[1L]

#> [1] TRUE

Invalid dates will return NA and a warning:

fymd(2021, 02, 29) # not a leap year

#> NAs introduced due to invalid month and/or day combinations.

#> [1] NA
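A year-month-day triple is invalid when the month lies outside 1 to 12 or the day exceeds the length of that month, with February gaining a day in leap years. The check below is a minimal base-R sketch of that rule under the proleptic Gregorian leap-year convention; `is_leap_year()` and `is_valid_ymd()` are hypothetical helpers, and fastymd's own validation may differ in detail.

```r
# Hypothetical helpers, not part of fastymd's API.
is_leap_year <- function(y) {
  # Gregorian rule: divisible by 4, except centuries not divisible by 400
  (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0
}

is_valid_ymd <- function(y, m, d) {
  month_ok <- !is.na(m) & m >= 1 & m <= 12
  # Substitute a dummy month for invalid ones so the lookup below stays in range
  mm <- ifelse(month_ok, m, 1)
  month_len <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)[mm]
  month_len <- month_len + (mm == 2 & is_leap_year(y))  # add Feb 29 in leap years
  month_ok & !is.na(d) & d >= 1 & d <= month_len
}

# 2021 and 1900 are not leap years; 2024 and 2000 are; month 13 never exists
is_valid_ymd(c(2021, 2024, 1900, 2000, 2025), c(2, 2, 2, 2, 13), 29)
# returns FALSE TRUE FALSE TRUE FALSE
```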

More interesting is how fymd() handles characters that follow a valid date. Consider the following timestamp:

timelt <- as.POSIXlt(Sys.time(), tz = "UTC")
(timestamp <- strftime(timelt, "%Y-%m-%dT%H:%M:%S%z"))

#> [1] "2025-05-12T19:11:06+0000"

By default the time element is ignored:

(res <- fymd(timestamp))

#> [1] "2025-05-12"

res == as.Date(timestamp, tz = "UTC")

#> [1] TRUE

Ignoring everything after the date cuts both ways. For timestamps it makes perfect sense, but if your data should contain plain dates and you are concerned that some entries are corrupted, use the strict argument to reject strings with trailing characters:

cdate <- "2025-04-16nonsense "
fymd(cdate)

#> [1] "2025-04-16"

fymd(cdate, strict = TRUE)

#> NAs introduced due to invalid date strings.

#> [1] NA

Benchmarks

The character method of fymd() parses input strings in a fixed year, month, day order. Each component must consist of digits, but the components can be separated by any non-digit character. This is similar in spirit to the fastDate() function in Simon Urbanek’s fasttime package, which uses pure text parsing and no system calls for maximum speed.
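The parsing rule just described can be imitated in a few lines of base R with a regular expression. The sketch is hypothetical (`parse_ymd_sketch()` is not part of fastymd) and rests on stated assumptions: each field is a run of digits, each separator is exactly one non-digit character, negative years are not handled, and no calendar validation is performed. Its `strict` argument mimics the behaviour of fymd(strict = TRUE) shown earlier by requiring the string to end right after the day digits.

```r
# Hypothetical sketch, not fastymd's implementation: split "Y<sep>M<sep>D" into
# integer components, assuming <sep> is a single non-digit character.
parse_ymd_sketch <- function(x, strict = FALSE) {
  pattern <- "^([0-9]+)[^0-9]([0-9]+)[^0-9]([0-9]+)"
  # Strict mode rejects anything that follows the day digits
  if (strict) pattern <- paste0(pattern, "$")
  matches <- regmatches(x, regexec(pattern, x))
  # A successful match holds the full match plus three captured groups
  parts <- vapply(
    matches,
    function(g) if (length(g) == 4L) as.integer(g[2:4]) else rep(NA_integer_, 3L),
    integer(3)
  )
  data.frame(year = parts[1, ], month = parts[2, ], day = parts[3, ])
}

parse_ymd_sketch(c("2025-04-16", "2025/04/17", "2025-04-16nonsense "))
parse_ymd_sketch("2025-04-16nonsense ", strict = TRUE)  # all components NA
```

A regular expression keeps the sketch short but is far slower than a hand-written character scanner, which is presumably why packages in this space parse the text directly.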

For extremely fast parsing of POSIX-style timestamps you will struggle to beat fasttime. It works very well for timestamps that need no validation and fall within the date range the package supports (currently 1970-01-01 through to the year 2199).

fymd() fills the admittedly small niche where you want fast parsing of YMD strings together with date validation and support for a wider range of dates in the proleptic Gregorian calendar (currently years in the range [-9999, 9999]). This additional capability comes with a small performance penalty, but it has hopefully been kept to a minimum and the implementation remains competitive.

library(microbenchmark)

# 1970-01-01 (UNIX epoch) to "2199-01-01"
dates <- seq.Date(from = .Date(0), to = fymd("2199-01-01"), by = "day")

# comparison timings for fymd (character method) 
cdates  <- format(dates)
(res_c <- microbenchmark(
    fasttime  = fasttime::fastDate(cdates),
    fastymd   = fymd(cdates),
    ymd       = ymd::ymd(cdates),
    lubridate = lubridate::ymd(cdates),
    check     = "equal"
))

#> Unit: microseconds
#>       expr      min       lq      mean   median       uq       max neval
#>   fasttime  528.702  533.631  560.6187  536.527  540.419  1939.710   100
#>    fastymd  775.045  780.199  794.6313  784.697  788.530  1130.412   100
#>        ymd 4444.390 4487.957 4607.3698 4507.409 4555.168  5968.801   100
#>  lubridate 4956.522 5065.686 6075.4734 5176.645 6562.420 37051.120   100
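To put the "small performance penalty" in relative terms, the median column of the character-method table can be divided by fasttime's median. On this run fymd() takes about 1.5 times as long as fasttime, while ymd and lubridate take roughly 8 to 10 times as long; these ratios come from a single benchmark run and will vary by machine.

```r
# Median timings in microseconds, copied from the character-method table above
medians <- c(fasttime = 536.527, fastymd = 784.697, ymd = 4507.409, lubridate = 5176.645)
# Express each median as a multiple of the fasttime median
relative <- round(medians / medians[["fasttime"]], 2)
relative
```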

# comparison timings for fymd (numeric method)
ymd  <- get_ymd(dates)
(res_n <- microbenchmark(
    fastymd   = fymd(ymd[[1]], ymd[[2]], ymd[[3]]),
    lubridate = lubridate::make_date(ymd[[1]], ymd[[2]], ymd[[3]]),
    check     = "equal"
))

#> Unit: microseconds
#>       expr     min      lq     mean  median      uq      max neval
#>    fastymd 373.440 375.765 425.1393 378.861 381.897 2289.366   100
#>  lubridate 534.272 542.949 660.0808 547.262 552.196 2261.223   100

# comparison timings for year getter
(res_get_year <- microbenchmark(
    fastymd   = get_year(dates),
    ymd       = ymd::year(dates),
    lubridate = lubridate::year(dates),
    check     = "equal"
))

#> Unit: microseconds
#>       expr      min        lq      mean    median        uq      max neval
#>    fastymd  483.708  497.4185  567.4244  501.1405  506.2355 2148.081   100
#>        ymd  498.245  505.6640  521.0937  510.2580  514.2100  775.766   100
#>  lubridate 7593.239 7605.0920 7891.8327 7618.5575 7650.4620 9812.684   100

# comparison timings for month getter
(res_get_month <- microbenchmark(
    fastymd   = get_month(dates),
    ymd       = ymd::month(dates),
    lubridate = lubridate::month(dates),
    check     = "equal"
))

#> Unit: microseconds
#>       expr      min       lq      mean    median        uq       max neval
#>    fastymd  449.674  465.243  569.8989  468.5995  473.1580  2142.421   100
#>        ymd  532.589  536.547  550.2292  539.6225  543.1995   805.552   100
#>  lubridate 8202.122 8243.695 8996.7768 8268.0460 9351.0135 40174.112   100

# comparison timings for mday getter
(res_get_mday <- microbenchmark(
    fastymd   = get_mday(dates),
    ymd       = ymd::mday(dates),
    lubridate = lubridate::day(dates),
    check     = "equal"
))

#> Unit: microseconds
#>       expr      min       lq      mean   median       uq      max neval
#>    fastymd  451.738  462.253  522.9238  465.584  469.496 1871.983   100
#>        ymd  535.996  539.823  558.3011  541.927  545.193 1798.335   100
#>  lubridate 7530.001 7550.941 7771.7465 7564.346 7592.257 9609.052   100