Getting started with ccdR:
Introduction Vignette

Center for Computational Toxicology and Exposure

Introduction to the CompTox Chemicals Dashboard (CCD)

Accessing chemical data is a vital step in many workflows related to chemical, biological, and environmental modeling. While there are many resources available from which one can pull data, the CompTox Chemicals Dashboard (CCD), built and maintained by the United States Environmental Protection Agency, is particularly well-designed and suitable for these purposes. Originally introduced in "The CompTox Chemistry Dashboard: a community data resource for environmental chemistry", the CCD contains information on over 1.2 million chemicals as of December 2023.

The CCD includes chemical information from many different domains, including physicochemical, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay data. For information on data sources and current versions, please review the CCD Release Notes.

The CCD can be queried for one chemical at a time or using batch search.

Application Programming Interfaces (APIs) for Automated Batch Search of the CCD

Recently, the Center for Computational Toxicology and Exposure (CCTE) developed a set of Application Programming Interfaces (APIs) that allow programmatic access to the CCD, bypassing the manual steps of the web-based batch search workflow. APIs effectively automate the process of accessing and downloading the data that populates the CCD.

The CCTE APIs are publicly available at no cost to the user. However, in order to use the CCTE APIs, users must have an individual API key. The API key uniquely identifies the user to the CCD servers and verifies that they have permission to access the database. Getting an API key is free, but requires contacting the API support team at ccte_api@epa.gov.

The APIs are organized into sets of “endpoints” by data domains: Chemical, Hazard, and Bioactivity. A view from the Chemical APIs web interface is pictured below.

Figure 4: CCTE API Chemical Endpoints

On the left side of each domain’s web interface page, there will be several different tabs listed depending on information requests available within the domain. In Figure 4, the Chemical Details Resource endpoint provides basic chemical information; the Chemical Property Resource endpoint provides more comprehensive physico-chemical property information; the Chemical Fate Resource endpoint provides chemical fate and transport information; and so on.

Authentication

Authentication, found in the upper-left tab on each web interface page, is required to use the APIs. To authenticate in the API web interface, users must input their unique API key.

Figure 5: API Key Authentication

Request Construction

The APIs work via requests that use the Hypertext Transfer Protocol (HTTP), which enables communication between clients (e.g. your computer) and servers (e.g. the CCD).

In the CCTE API web interface, the colored boxes next to each endpoint indicate the type of the associated HTTP method. GET is used to request data from a specific web resource (e.g. a specific URL); POST is used to send data to a server to create or update a web resource. For the CCTE APIs, POST requests are used to perform multiple (batch) searches in a single API call; GET requests are used for non-batch searches.
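To make the GET/POST distinction concrete, the following minimal sketch describes (without sending) the kind of request that would be used for a given set of identifiers. The helper name describe_request() and the structure of the returned list are hypothetical and exist only for illustration; they are not part of ccdR or the CCTE APIs.

```r
# Hypothetical helper: choose the HTTP method based on how many
# identifiers are searched. Nothing is sent over the network.
describe_request <- function(dtxsids) {
  stopifnot(is.character(dtxsids), length(dtxsids) >= 1)
  if (length(dtxsids) == 1) {
    # Single search: the identifier travels in the URL of a GET request
    list(method = "GET", body = NULL)
  } else {
    # Batch search: the identifiers travel in the body of a POST request
    list(method = "POST", body = dtxsids)
  }
}

single <- describe_request("DTXSID7020182")
# The second identifier below is made up purely for illustration
batch <- describe_request(c("DTXSID7020182", "DTXSID0000001"))

single$method
batch$method
```

Here single$method is "GET" and batch$method is "POST", mirroring the non-batch and batch cases described above.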

You do not need to understand the details of POST and GET requests in order to use the API. Let’s consider constructing an API request to Get data by dtxsid under the Chemical Details Resource.

Figure 6: Get Details by DTXSID

The web interface has two subheadings:

  • Path Parameters contain user-specified parameters that are required in order to tell the API what URL (web address) to access. In this case, the required parameter is a string for the DTXSID identifying the chemical to be searched.
  • Query-String Parameters contain user-specified parameters (usually optional) that tell the API what specific type(s) of information to download from the specified URL. In this case, the optional parameter is a projection parameter, a string that can take one of five values (chemicaldetailall, chemicaldetailstandard, chemicalidentifier, chemicalstructure, ntatoolkit). Depending on the value of this string, the API can return different sets of information about the chemical. If the projection parameter is left blank, then a default set of chemical information is returned.
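As a rough illustration of how the two parameter types combine into a request URL, the sketch below assembles one in base R. The base URL and path are placeholders (assumptions, not the real CCTE endpoint addresses), build_details_url() is a hypothetical helper, and the DTXSID format check is likewise an assumption. Only the five projection values come from the list above.

```r
# The five projection values listed above
projections <- c("chemicaldetailall", "chemicaldetailstandard",
                 "chemicalidentifier", "chemicalstructure", "ntatoolkit")

# Hypothetical helper. `base_url` and `path` are placeholders and are
# NOT the real CCTE endpoint addresses.
build_details_url <- function(dtxsid, projection = NULL,
                              base_url = "https://example.org",
                              path = "chemical/detail") {
  # Assumption: a DTXSID is the prefix "DTXSID" followed by digits
  stopifnot(grepl("^DTXSID[0-9]+$", dtxsid))
  # Path parameter: the identifier becomes part of the URL path
  url <- paste(base_url, path, dtxsid, sep = "/")
  # Query-string parameter: optional, appended after "?"
  if (!is.null(projection)) {
    projection <- match.arg(projection, projections)
    url <- paste0(url, "?projection=", projection)
  }
  url
}

url_default  <- build_details_url("DTXSID7020182")
url_standard <- build_details_url("DTXSID7020182",
                                  projection = "chemicaldetailstandard")
```

Leaving projection as NULL produces a URL with no query string, which corresponds to the default set of information described above.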

The default return format is displayed below and includes a variety of fields with data types represented.

Figure 7: Get Details by DTXSID, Return Format

Pictured below is an example of returned Details for Bisphenol A with the chemicaldetailstandard value for projection selected.

Figure 8: Returned Details for Bisphenol A

Introduction to ccdR

Formatting an HTTP request is not necessarily intuitive, and learning to do so may not be worth the time for someone unfamiliar with the process. For many users, working with these endpoints directly would therefore require a significant investment of time and energy. The R package ccdR offers a solution.

ccdR was developed to streamline the process of accessing the information available through the CCTE APIs without requiring prior knowledge of how to use APIs.

Package Settings

Users can run install.packages("ccdR") to install from CRAN and library(ccdR) to load the package, or install the development version of ccdR like so:

# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_github("USEPA/ccdR")

API Key Storage

As previously described, a user must have an API key in order to access the CCTE APIs. A key can be obtained by emailing the CCTE API admins at ccte_api@epa.gov. In the example code, the API key will be stored as the variable my_key.

my_key <- 'YOUR_CCTE_API_key'
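Hard-coding a key in a script makes it easy to share by accident. One common alternative, sketched here, is to read the key from an environment variable. The variable name CCTE_API_KEY and the helper key_from_env() are assumptions made for illustration, not ccdR conventions.

```r
# Hypothetical helper: read the API key from an environment variable
# (the name CCTE_API_KEY is an assumption, not a ccdR convention).
key_from_env <- function(var = "CCTE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}

# Usage, once the variable has been set outside the script
# (for example in ~/.Renviron):
# my_key <- key_from_env()
```

Failing loudly when the variable is missing avoids sending requests with an empty key.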

For general use of the package, the user may use the function register_ccdr() to store the API key in the current session or more permanently for access across sessions.

# This stores the key in the current session
register_ccdr(key = '<YOUR API KEY>')

# This stores the key across multiple sessions and only needs to be run once. If the key changes, rerun this with the new key.
register_ccdr(key = '<YOUR API KEY>', write = TRUE)

Once the API key is stored, its display is turned off by default for protection. To change this, use the following functions as demonstrated.

# To show the API key
ccdr_show_api_key()
getOption('ccdr')$display_api_key

# To hide the API key
ccdr_hide_api_key()
getOption('ccdr')$display_api_key

Finally, to access the key, use the ccte_key() function.

ccte_key()

Quick Start Examples

As quick start examples, we demonstrate how easily information for Bisphenol A can be retrieved across endpoints using ccdR, in contrast to the approach using the CCD or API web interface.

For additional examples and more comprehensive documentation on each endpoint, consider reviewing the other ccdR vignettes for the data domain of interest.

Chemical APIs

In this section, several ccdR functions are used to access different types of information from the CCTE Chemical APIs.

Chemical Details Resource

The function get_chemical_details() takes either the DTXSID or the DTXCID of a chemical, along with the user-specific API key. Relevant chemical details for Bisphenol A, which has DTXSID “DTXSID7020182”, are retrieved and converted to a data.table below:

bpa_details <- get_chemical_details(DTXSID = 'DTXSID7020182',
                                    API_key = my_key)
bpa_details <- data.table::as.data.table(bpa_details)
head(bpa_details)

Chemical Property Resource

The function get_chem_info() returns phys-chem properties for the selected chemical, and can be filtered to ‘experimental’ or ‘predicted’ if desired.

Here all phys-chem properties are returned for Bisphenol A:

bpa_info <- get_chem_info(DTXSID = "DTXSID7020182",
                          API_key = my_key)
bpa_info <- data.table::as.data.table(bpa_info)

head(bpa_info)

The request can be filtered to return experimental results only:

bpa_info_experimental <- get_chem_info(DTXSID = "DTXSID7020182",
                                       type = 'experimental',
                                       API_key = my_key)
bpa_info_experimental <- data.table::as.data.table(bpa_info_experimental)

head(bpa_info_experimental)

Hazard APIs

In this section, several ccdR functions are used to access different types of information from the CCTE Hazard APIs.

Hazard Resource

The function get_hazard_by_dtxsid() retrieves hazard data (all human or ecological toxicity data) for a given chemical based on input DTXSID. get_human_hazard_by_dtxsid() and get_ecotox_hazard_by_dtxsid() can filter returned hazard results for the given chemical to human or ecological toxicity data, respectively.

Here all hazard data is returned for Bisphenol A:

bpa_hazard <- get_hazard_by_dtxsid(DTXSID = 'DTXSID7020182',
                                   API_key = my_key)
bpa_hazard <- data.table::as.data.table(bpa_hazard)
head(bpa_hazard)

The request can be refined to return human hazard results only:

bpa_human_hazard <- get_human_hazard_by_dtxsid(DTXSID = 'DTXSID7020182',
                                               API_key = my_key)
bpa_human_hazard <- data.table::as.data.table(bpa_human_hazard)
head(bpa_human_hazard)

It can likewise be refined to return ecological toxicity (EcoTox) results only:

bpa_eco_hazard <- get_ecotox_hazard_by_dtxsid(DTXSID = 'DTXSID7020182',
                                              API_key = my_key)
bpa_eco_hazard <- data.table::as.data.table(bpa_eco_hazard)
head(bpa_eco_hazard)

Bioactivity APIs

In this section, several ccdR functions are used to access different types of information from the CCTE Bioactivity APIs.

Bioactivity Resource

The function get_bioactivity_details() retrieves all bioactivity data for a given chemical based on input DTXSID.

bpa_bioactivity <- get_bioactivity_details(DTXSID = 'DTXSID7020182',
                                           API_key = my_key)

bpa_bioactivity <- data.table::as.data.table(bpa_bioactivity)
head(bpa_bioactivity)

The function get_bioactivity_details() can also be used to retrieve all bioactivity data for a given endpoint, based on input AEID (assay endpoint identifier).

assay_id_search <- get_bioactivity_details(AEID = 42,
                                           API_key = my_key)
assay_id_search <- data.table::as.data.table(assay_id_search)
head(assay_id_search)
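The calls in this vignette each look up a single identifier. When the same call is repeated over several identifiers, a small wrapper can keep one failed request from stopping the rest. The sketch below is one way to do this in base R; fetch_each() is a hypothetical helper and not part of ccdR, which may offer its own batch functions (see the package documentation).

```r
# Hypothetical helper: call `fetch_fun` once per identifier and keep
# going if an individual request fails.
fetch_each <- function(ids, fetch_fun, ...) {
  results <- lapply(ids, function(id) {
    tryCatch(
      fetch_fun(id, ...),
      error = function(e) {
        warning("Request failed for ", id, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- ids
  # Drop identifiers whose request failed
  Filter(Negate(is.null), results)
}

# Usage sketch (requires a valid API key and network access):
# hazards <- fetch_each(
#   c("DTXSID7020182"),
#   function(id, ...) get_hazard_by_dtxsid(DTXSID = id, ...),
#   API_key = my_key
# )
```

The result is a named list with one element per successful identifier, so individual results can be inspected or combined afterwards.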

Conclusion

The ccdR package provides a streamlined approach to accessing data from the CCD for users with little or no prior experience using APIs.

For additional examples and more comprehensive documentation on each endpoint, consider reviewing the other ccdR vignettes for the data domain of interest.