| Title: | R Interface to the Data Retriever | 
| Description: | Provides an R interface to the Data Retriever https://retriever.readthedocs.io/en/latest/ via the Data Retriever's command line interface. The Data Retriever automates the tasks of finding, downloading, and cleaning public datasets, and then stores them in a local database. | 
| Version: | 3.1.1 | 
| BugReports: | https://github.com/ropensci/rdataretriever/issues | 
| URL: | https://docs.ropensci.org/rdataretriever/ (website), https://github.com/ropensci/rdataretriever/ | 
| Depends: | R (≥ 3.4.0) | 
| Imports: | reticulate (≥ 1.16), semver | 
| Suggests: | testthat (≥ 1.0.0), DBI, devtools, RSQLite, RPostgreSQL, knitr, rmarkdown, dbplyr, raster, ggplot2 | 
| VignetteBuilder: | knitr | 
| SystemRequirements: | Python (>= 3.7), retriever (>= 3.0.0) (version must be listed to patch to allow parsing) | 
| License: | MIT + file LICENSE | 
| RoxygenNote: | 7.2.3 | 
| Encoding: | UTF-8 | 
| NeedsCompilation: | no | 
| Packaged: | 2024-07-25 21:03:58 UTC; henrysenyondo | 
| Author: | Henry Senyondo  | 
| Maintainer: | Henry Senyondo <henrykironde@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-07-25 22:00:02 UTC | 
Check for updates
Description
Check for updates
Usage
check_for_updates(repo = "")
Arguments
repo | 
 path to the repository  | 
Value
No return value, checks the repository for updates
Examples
## Not run: 
rdataretriever::check_for_updates()
## End(Not run)
Check to see if minimum version of retriever Python package is installed
Description
Check to see if minimum version of retriever Python package is installed
Usage
check_retriever_availability()
Value
boolean
Examples
## Not run: 
rdataretriever::check_retriever_availability()
## End(Not run)
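A minimal sketch of how the logical value returned by check_retriever_availability() might be used to guard other calls. The helper name run_if_available and its is_available argument are hypothetical and not part of rdataretriever; the default checker is only evaluated when the helper is called without an override.

```r
# Hypothetical helper (not part of rdataretriever): run `action` only when
# the availability check returns TRUE.
run_if_available <- function(action,
                             is_available = rdataretriever::check_retriever_availability) {
  if (!isTRUE(is_available())) {
    message("retriever Python package not available; skipping")
    return(invisible(NULL))
  }
  action()
}

# Intended use (requires rdataretriever and the retriever Python package):
# run_if_available(function() rdataretriever::dataset_names())
```

Passing the checker as an argument keeps the guard easy to exercise without a working Python installation.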
Commit a dataset
Description
Commit a dataset
Usage
commit(dataset, commit_message = "", path = NULL, quiet = FALSE)
Arguments
dataset | 
 name of the dataset  | 
commit_message | 
 commit message for the commit  | 
path | 
 path to save the committed dataset, if no path given save in provenance directory  | 
quiet | 
 logical, if true retriever runs in quiet mode  | 
Value
No return value, provides confirmation for commit
Examples
## Not run: 
rdataretriever::commit("iris")
## End(Not run)
See the log of committed dataset stored in provenance directory
Description
See the log of committed dataset stored in provenance directory
Usage
commit_log(dataset)
Arguments
dataset | 
 name of the dataset stored in provenance directory  | 
Value
No return value, prints message after commit
Examples
## Not run: 
rdataretriever::commit_log("iris")
## End(Not run)
Get Data Retriever version
Description
Get Data Retriever version
Usage
data_retriever_version(clean = TRUE)
Arguments
clean | 
 logical, if TRUE returns a cleaned version string appropriate for semver  | 
Value
returns a string with the version information
Examples
## Not run: 
rdataretriever::data_retriever_version()
## End(Not run)
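The returned version string can be compared against the minimum retriever version listed under SystemRequirements (3.0.0). The sketch below is one possible way to do that with base R's numeric_version() rather than the semver package; the helper meets_minimum is hypothetical, and it assumes the cleaned version string is a plain dotted version such as "3.1.0".

```r
# Hypothetical helper: compare a version string against a minimum using
# base R's numeric_version(). The default minimum is taken from the
# SystemRequirements field of this package.
meets_minimum <- function(version, minimum = "3.0.0") {
  numeric_version(version) >= numeric_version(minimum)
}

# Illustrative version strings, not actual output of the package
meets_minimum("3.1.0")
meets_minimum("2.4.0")

# Intended use (requires a working retriever installation):
# meets_minimum(rdataretriever::data_retriever_version(clean = TRUE))
```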
Name all available dataset scripts.
Description
Additional information on the available datasets can be found at url https://retriever.readthedocs.io/en/latest/datasets.html
Usage
dataset_names()
Value
returns a character vector with the available datasets for download
Examples
## Not run: 
rdataretriever::dataset_names()
## End(Not run)
Name all available dataset scripts.
Description
Additional information on the available datasets can be found at url https://retriever.readthedocs.io/en/latest/datasets.html
Usage
datasets(keywords = "", licenses = "")
Arguments
keywords | 
 search all datasets by keywords  | 
licenses | 
 search all datasets by licenses  | 
Value
returns a character vector with the available datasets for download
Returns the names of all available dataset scripts
Examples
## Not run: 
rdataretriever::datasets()
## End(Not run)
Displays the list of rdataset names present in the list of packages provided
Description
Can take a list of packages, or NULL or a string 'all' for all rdataset packages and datasets
Usage
display_all_rdataset_names(package_name = NULL)
Arguments
package_name | 
 package(s) whose datasets should be printed; by default the rdataset datasets are printed, and 'all' prints all packages and datasets  | 
Value
No return value, displays the list of rdataset names present
Examples
## Not run: 
rdataretriever::display_all_rdataset_names()
## End(Not run)
Download datasets via the Data Retriever.
Description
Directly downloads data files with no processing, allowing downloading of non-tabular data.
Usage
download(
  dataset,
  path = "./",
  quiet = FALSE,
  sub_dir = "",
  debug = FALSE,
  use_cache = TRUE
)
Arguments
dataset | 
 the name of the dataset that you wish to download  | 
path | 
 the path where the data should be downloaded to  | 
quiet | 
 logical, if true retriever runs in quiet mode  | 
sub_dir | 
 downloaded dataset is stored into a custom subdirectory.  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 Setting FALSE reinstalls scripts even if they are already installed  | 
Value
No return value, downloads the raw dataset
Examples
## Not run: 
rdataretriever::download("plant-comp-ok")
# downloaded files will be copied to your working directory
# when no path is specified
dir()
## End(Not run)
Fetch a dataset via the Data Retriever
Description
Each datafile in a given dataset is downloaded to a temporary directory and then imported as a data.frame as a member of a named list.
Usage
fetch(dataset, quiet = TRUE, data_names = NULL)
Arguments
dataset | 
 the name(s) of the dataset(s) that you wish to download  | 
quiet | 
 logical, if true retriever runs in quiet mode  | 
data_names | 
 the names you wish to assign to the elements of the list that stores the fetched data frames. This is only relevant if you are downloading more than one dataset.  | 
Value
Returns a named list containing the data files of the dataset as data frames
Examples
## Not run: 
## fetch the portal Database
portal <- rdataretriever::fetch("portal")
class(portal)
names(portal)
## preview the data in the portal species datafile
head(portal$species)
vegdata <- rdataretriever::fetch(c("plant-comp-ok", "plant-occur-oosting"))
names(vegdata)
names(vegdata$plant_comp_ok)
## End(Not run)
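Because the result is a named list of data frames, the usual list tools apply. The sketch below uses a small toy list in place of a real fetch() result, so the element names and values are purely illustrative assumptions.

```r
# Toy stand-in for a fetch() result: a named list of data frames.
# The names and contents here are illustrative, not real retriever data.
fetched <- list(
  table_one = data.frame(id = 1:2),
  table_two = data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
)

# Number of rows in each data frame, keyed by list element name
row_counts <- vapply(fetched, nrow, integer(1))
print(row_counts)

# Column names of each data frame
column_names <- lapply(fetched, names)
print(column_names)
```

The same calls should work unchanged on an object returned by fetch() for a single dataset.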
Returns metadata for the following dataset id
Description
Returns metadata for the following dataset id
Usage
find_socrata_dataset_by_id(dataset_id)
Arguments
dataset_id | 
 id of the dataset  | 
Value
No return value, shows metadata for the following dataset id
Examples
## Not run: 
rdataretriever::find_socrata_dataset_by_id("<dataset_id>")
## End(Not run)
Get dataset names from upstream
Description
Get dataset names from upstream
Usage
get_dataset_names_upstream(keywords = "", licenses = "", repo = "")
Arguments
keywords | 
 filter datasets based on keywords  | 
licenses | 
 filter datasets based on license  | 
repo | 
 path to the repository  | 
Value
No return value, gets dataset names from upstream
Examples
## Not run: 
rdataretriever::get_dataset_names_upstream(keywords = "", licenses = "", repo = "")
## End(Not run)
Returns a list of all the available RDataset names present
Description
Returns a list of all the available RDataset names present
Usage
get_rdataset_names()
Value
No return value, list of all the available RDataset
Examples
## Not run: 
rdataretriever::get_rdataset_names()
## End(Not run)
Get retriever citation
Description
Get retriever citation
Usage
get_retriever_citation()
Value
No return value, outputs citation of the Data Retriever Python package
Examples
## Not run: 
rdataretriever::get_retriever_citation()
## End(Not run)
Get citation of a script
Description
Get citation of a script
Usage
get_script_citation(dataset = NULL)
Arguments
dataset | 
 dataset to obtain citation  | 
Value
No return value, gets citation of a script
Examples
## Not run: 
rdataretriever::get_script_citation(dataset = "")
## End(Not run)
Get scripts upstream
Description
Get scripts upstream
Usage
get_script_upstream(dataset, repo = "")
Arguments
dataset | 
 name of the dataset  | 
repo | 
 path to the repository  | 
Value
No return value, gets upstream scripts
Examples
## Not run: 
rdataretriever::get_script_upstream("iris")
## End(Not run)
Update the retriever's dataset scripts to the most recent versions.
Description
This function will check if the version of the retriever's scripts in your local directory ‘~/.retriever/scripts/’ is up-to-date with the most recent official retriever release. Note it is possible that even more updated scripts exist at the retriever repository https://github.com/weecology/retriever/tree/main/scripts that have not yet been incorporated into an official release, and you should consider checking that page if you have any concerns.
Usage
get_updates()
Value
No return value, updates the retriever's dataset scripts to the most recent versions
Examples
## Not run: 
rdataretriever::get_updates()
## End(Not run)
Install datasets via the Data Retriever (deprecated).
Description
Data is stored in either CSV files or one of the following database management systems: MySQL, PostgreSQL, SQLite, or Microsoft Access.
Usage
install(
  dataset,
  connection,
  db_file = NULL,
  conn_file = NULL,
  data_dir = ".",
  log_dir = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to download  | 
connection | 
 what type of database connection should be used. The options include: mysql, postgres, sqlite, msaccess, or csv  | 
db_file | 
 the name of the database file the dataset should be loaded into  | 
conn_file | 
 the path to the .conn file that contains the connection configuration options for mysql and postgres databases. This defaults to mysql.conn or postgres.conn respectively. The connection file is a file that is formatted in the following way:  | 
data_dir | 
 the location where the dataset should be installed. Only relevant for csv connection types. Defaults to current working directory  | 
log_dir | 
 the location where the retriever log should be stored if the progress is not printed to the console  | 
Value
No return value, main install function
Examples
## Not run: 
rdataretriever::install("iris", "csv")
## End(Not run)
Install datasets via the Data Retriever.
Description
Data is stored in CSV files
Usage
install_csv(
  dataset,
  table_name = "{db}_{table}.csv",
  data_dir = getwd(),
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
table_name | 
 the name of the database file to store data  | 
data_dir | 
 the dir path to store data, defaults to working dir  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 Setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into CSV
Examples
## Not run: 
rdataretriever::install_csv("iris")
## End(Not run)
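The default table_name of "{db}_{table}.csv" contains two placeholders. A rough sketch of how such a template could expand into file names, assuming {db} is replaced by the dataset name and {table} by each table name. The helper expand_table_name and the sample names are hypothetical; the actual substitution is performed by the retriever itself and may differ in detail.

```r
# Hypothetical helper: fill the {db} and {table} placeholders of a
# table_name template. This only illustrates the naming pattern.
expand_table_name <- function(template, db, table) {
  out <- sub("{db}", db, template, fixed = TRUE)
  sub("{table}", table, out, fixed = TRUE)
}

# Illustrative dataset and table names (assumptions, not real scripts)
file_names <- vapply(
  c("sites", "species"),
  function(tbl) expand_table_name("{db}_{table}.csv", "my_dataset", tbl),
  character(1),
  USE.NAMES = FALSE
)
print(file_names)
```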
Install datasets via the Data Retriever.
Description
Data is stored in JSON files
Usage
install_json(
  dataset,
  table_name = "{db}_{table}.json",
  data_dir = getwd(),
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
table_name | 
 the name of the database file to store data  | 
data_dir | 
 the dir path to store data, defaults to working dir  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into JSON
Examples
## Not run: 
rdataretriever::install_json("iris")
## End(Not run)
Install datasets via the Data Retriever.
Description
Data is stored in MSAccess database
Usage
install_msaccess(
  dataset,
  file = "access.mdb",
  table_name = "[{db} {table}]",
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
file | 
 file name for database  | 
table_name | 
 table name for installing of dataset  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 Setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into MSAccess database
Examples
## Not run: 
rdataretriever::install_msaccess(dataset = "iris", file = "access.mdb")
## End(Not run)
Install datasets via the Data Retriever.
Description
Data is stored in MySQL database
Usage
install_mysql(
  dataset,
  user = "root",
  password = "",
  host = "localhost",
  port = 3306,
  database_name = "{db}",
  table_name = "{db}.{table}",
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
user | 
 username for database connection  | 
password | 
 password for database connection  | 
host | 
 hostname for connection  | 
port | 
 port number for connection  | 
database_name | 
 database name in which dataset will be installed  | 
table_name | 
 table name specified especially for datasets containing one file  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into MySQL database
Examples
## Not run: 
rdataretriever::install_mysql(dataset = "portal", user = "root", password = "abcdef")
## End(Not run)
Install datasets via the Data Retriever.
Description
Data is stored in PostgreSQL database
Usage
install_postgres(
  dataset,
  user = "postgres",
  password = "",
  host = "localhost",
  port = 5432,
  database = "postgres",
  database_name = "{db}",
  table_name = "{db}.{table}",
  bbox = list(),
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
user | 
 username for database connection  | 
password | 
 password for database connection  | 
host | 
 hostname for connection  | 
port | 
 port number for connection  | 
database | 
 the database name, default is postgres  | 
database_name | 
 database schema name in which dataset will be installed  | 
table_name | 
 table name specified especially for datasets containing one file  | 
bbox | 
 optional extent values used to fetch data from the spatial dataset  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into PostgreSQL database
Examples
## Not run: 
rdataretriever::install_postgres(dataset = "portal", user = "postgres", password = "abcdef")
## End(Not run)
Install the Python module 'retriever'
Description
Install the Python module 'retriever'
Usage
install_retriever(method = "auto", conda = "auto")
Arguments
method | 
 Installation method. By default, "auto" automatically finds a method that will work in the local environment. Change the default to force a specific installation method. Note that the "virtualenv" method is not available on Windows.  | 
conda | 
 The path to a conda executable  | 
Value
No return value, installs the Python module 'retriever'
Install datasets via the Data Retriever.
Description
Data is stored in SQLite database
Usage
install_sqlite(
  dataset,
  file = "sqlite.db",
  table_name = "{db}_{table}",
  data_dir = getwd(),
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
file | 
 Sqlite database file name or path  | 
table_name | 
 table name for installing of dataset  | 
data_dir | 
 the dir path to store the db, defaults to working dir  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into SQLite database
Examples
## Not run: 
rdataretriever::install_sqlite(dataset = "iris", file = "sqlite.db")
## End(Not run)
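Once a dataset has been installed, the resulting SQLite file can be inspected with DBI and RSQLite, both of which are listed under Suggests. The sketch below is one possible way to list the tables in such a file; the helper list_installed_tables is hypothetical, and it returns an empty character vector when the file or the two packages are missing.

```r
# Hypothetical helper: list the tables in a SQLite file such as the one
# written by install_sqlite(). Returns character(0) if the file does not
# exist or DBI/RSQLite are not installed.
list_installed_tables <- function(db_path = "sqlite.db") {
  has_pkgs <- requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)
  if (!has_pkgs || !file.exists(db_path)) {
    return(character(0))
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  # Close the connection when the function exits, even on error
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbListTables(con)
}

# Intended use after rdataretriever::install_sqlite(dataset = "iris"):
# list_installed_tables("sqlite.db")
```

With the default table_name of "{db}_{table}", the listed tables would be expected to follow that naming pattern.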
Install datasets via the Data Retriever.
Description
Data is stored in XML files
Usage
install_xml(
  dataset,
  table_name = "{db}_{table}.xml",
  data_dir = getwd(),
  debug = FALSE,
  use_cache = TRUE,
  force = FALSE,
  hash_value = NULL
)
Arguments
dataset | 
 the name of the dataset that you wish to install or path to a committed dataset zip file  | 
table_name | 
 the name of the database file to store data  | 
data_dir | 
 the dir path to store data, defaults to working dir  | 
debug | 
 setting TRUE helps in debugging in case of errors  | 
use_cache | 
 Setting FALSE reinstalls scripts even if they are already installed  | 
force | 
 setting TRUE doesn't prompt for confirmation while installing committed datasets when changes are discovered in environment  | 
hash_value | 
 the hash value of committed dataset when installing from provenance directory  | 
Value
No return value, installs datasets into XML
Examples
## Not run: 
rdataretriever::install_xml("iris")
## End(Not run)
Update the retriever's global_script_list with the scripts present in the ~/.retriever directory.
Description
Update the retriever's global_script_list with the scripts present in the ~/.retriever directory.
Usage
reload_scripts()
Value
No return value, fetches most recent scripts
Examples
## Not run: 
rdataretriever::reload_scripts()
## End(Not run)
Reset the scripts or data(raw_data) directory or both
Description
Reset the scripts or data(raw_data) directory or both
Usage
reset(scope = "all")
Arguments
scope | 
 "all" resets both the scripts and data directories  | 
Value
No return value, resets the scripts and the data directory
Examples
## Not run: 
rdataretriever::reset("iris")
## End(Not run)
Returns the list of dataset names after autocompletion
Description
Returns the list of dataset names after autocompletion
Usage
socrata_autocomplete_search(dataset)
Arguments
dataset | 
 the name of the dataset  | 
Value
No return value, show dataset names after autocompletion
Examples
## Not run: 
rdataretriever::socrata_autocomplete_search("<dataset>")
## End(Not run)
Get socrata dataset info
Description
Get socrata dataset info
Usage
socrata_dataset_info(dataset_name)
Arguments
dataset_name | 
 dataset name to obtain info  | 
Value
No return value, shows socrata dataset info
Examples
## Not run: 
rdataretriever::socrata_dataset_info("<dataset_name>")
## End(Not run)
Updates the datasets_url.json from the github repo
Description
Updates the datasets_url.json from the github repo
Usage
update_rdataset_catalog(test = FALSE)
Arguments
test | 
 flag set when testing  | 
Value
No return value, updates the datasets_url.json
Examples
## Not run: 
rdataretriever::update_rdataset_catalog()
## End(Not run)
Setting path of retriever
Description
Setting path of retriever
Usage
use_RetrieverPath(path)
Arguments
path | 
 location of retriever in the system  | 
Value
No return value, setting path of retriever
Examples
## Not run: 
rdataretriever::use_RetrieverPath("/home/<system_name>/anaconda2/envs/py27/bin/")
## End(Not run)