% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_cheminfo.R
\name{get_cheminfo}
\alias{get_cheminfo}
\title{Retrieve chemical information available from HTTK package}
\usage{
get_cheminfo(
  info = "CAS",
  species = "Human",
  fup.lod.default = 0.005,
  model = "3compartmentss",
  default.to.human = FALSE,
  median.only = FALSE,
  fup.ci.cutoff = TRUE,
  clint.pvalue.threshold = 0.05,
  physchem.exclude = TRUE,
  class.exclude = TRUE,
  suppress.messages = FALSE
)
}
\arguments{
\item{info}{A single character string (or vector of character strings)
from "Compound", "CAS", "DTXSID", "logP", "pKa_Donor", "pKa_Accept", "MW", "Clint",
"Clint.pValue", "Funbound.plasma", "Structure_Formula", or "Substance_Type". info="all"
gives all information for the model and species.}

\item{species}{Species desired (either "Rat", "Rabbit", "Dog", "Mouse", or
default "Human").}

\item{fup.lod.default}{Default value used for fraction unbound in plasma for
chemicals where the measured value was below the limit of detection. Default
value is 0.005.}

\item{model}{Model used in calculation, 'pbtk' for the multiple compartment
model, '1compartment' for the one compartment model, '3compartment' for
three compartment model, '3compartmentss' for the three compartment model
without partition coefficients, or 'schmitt' for chemicals with logP and
fraction unbound (used in predict_partitioning_schmitt).}

\item{default.to.human}{If TRUE, substitutes human values for missing
species-specific values (default FALSE).}

\item{median.only}{Use median values only for fup and clint.  Default is FALSE.}

\item{fup.ci.cutoff}{Boolean eliminating uncertain fup estimates.
If TRUE, fup values whose 95\% credible interval
spans 0.1 to 0.9 (or more) are eliminated. (Default value is TRUE.)}

\item{clint.pvalue.threshold}{Hepatic clearance for chemicals where the in
vitro clearance assay result has a p-value greater than the threshold is
set to zero (default 0.05).}

\item{physchem.exclude}{Exclude chemicals on the basis of physico-chemical
properties (currently only Henry's law constant) as specified by 
the relevant modelinfo_[MODEL] file (default TRUE).}

\item{class.exclude}{Exclude chemical classes identified as outside of 
domain of applicability by the relevant modelinfo_[MODEL] file (default TRUE).}

\item{suppress.messages}{Whether or not the output messages are suppressed 
(default FALSE).}
}
\value{
\item{vector/data.table}{Table (if info has multiple entries) or 
vector containing a column for each valid entry 
specified in the argument "info" and a row for each chemical with sufficient
data for the model specified by argument "model":
\tabular{lll}{
\strong{Column} \tab \strong{Description} \tab \strong{units} \cr
Compound \tab The preferred name of the chemical compound \tab none \cr 
CAS \tab The preferred Chemical Abstracts Service Registry Number \tab none \cr  
DTXSID \tab DSSTox Structure ID 
(\url{https://comptox.epa.gov/dashboard}) \tab none \cr 
logP \tab The log10 octanol:water partition coefficient \tab log10 unitless ratio \cr 
MW \tab The chemical compound molecular weight \tab g/mol \cr 
pKa_Accept \tab The hydrogen acceptor equilibria concentrations 
\tab logarithm \cr   
pKa_Donor \tab The hydrogen donor equilibria concentrations 
 \tab logarithm \cr   
[SPECIES].Clint \tab (Primary hepatocyte suspension) 
intrinsic hepatic clearance. \emph{Entries with comma-separated values are Bayesian estimates of
the Clint distribution, displayed as the median, the bounds of the 95\% credible interval
(that is, the 2.5 and 97.5 percentiles, respectively), and the p-value.} \tab uL/min/10^6 hepatocytes \cr    
[SPECIES].Clint.pValue \tab Probability that there is no clearance observed.
Values close to 1 indicate clearance is not statistically significant. \tab none \cr  
[SPECIES].Funbound.plasma \tab Chemical fraction unbound in presence of 
plasma proteins (fup). \emph{Entries with comma-separated values are Bayesian estimates of
the fup distribution, displayed as the median and the bounds of the 95\% credible interval
(that is, the 2.5 and 97.5 percentiles, respectively).} \tab unitless fraction \cr 
[SPECIES].Rblood2plasma \tab Chemical concentration blood to plasma ratio \tab unitless ratio \cr  
}
}
}
\description{
This function lists information on all the chemicals within HTTK for which 
there are sufficient data for the specified model and species. 
By default the function returns only CAS (that is, info="CAS"). 
The type of information available includes chemical identifiers 
("Compound", "CAS", "DTXSID"), in vitro
measurements ("Clint", "Clint.pValue", "Funbound.plasma", "Rblood2plasma"), 
and physico-chemical information ("Formula", "logMA", "logP", "MW",
"pKa_Accept", "pKa_Donor"). The argument "info" can be a single type of 
information, "all" information, or a vector of specific types of information.
The argument "model" defaults to 
"3compartmentss" and the argument "species" defaults to "Human".  
Since different models have different 
requirements and not all chemicals have complete data, this function will 
return different numbers of chemicals depending on the model specified. If
a chemical is not listed by get_cheminfo then either the in vitro or
physico-chemical data needed are currently missing (but could potentially
be added using \code{\link{add_chemtable}}).
}
\details{
When default.to.human is set to TRUE, and the species-specific data,
Funbound.plasma and Clint, are missing from 
\code{\link{chem.physical_and_invitro.data}}, human values are given instead.

In some cases the rapid equilibrium dialysis method (Waters et al., 2008)
fails to yield detectable concentrations for the free fraction of chemical. 
In those cases we assume the compound is highly bound (that is, Fup approaches
zero). For some calculations (for example, steady-state plasma concentration)
there is precedent (Rotroff et al., 2010) for using half the average limit 
of detection, that is, 0.005 (this value is configurable via the argument
fup.lod.default). For such chemicals, we do not recommend using other models where 
quantities like partition coefficients must be predicted using Fup. We also
do not recommend including the value 0.005 in training sets for Fup predictive
models.

\strong{Note} that in some cases the \strong{Funbound.plasma} (fup) and the 
\strong{intrinsic clearance} (clint) are
\emph{provided as a series of numbers separated by commas}. These values are the 
result of Bayesian analysis and characterize a distribution: the first value
is the median of the distribution, while the second and third values are the 
lower and upper bounds of the 95\% credible interval (that is, the 2.5 and 97.5 percentiles), respectively.
For intrinsic clearance a fourth value indicating a p-value for a decrease is
provided. Typically 4000 samples were used for the Bayesian analysis, such
that a p-value of "0" is equivalent to "<0.00025". See Wambaugh et al. (2019)
for more details. If argument median.only == TRUE then only the median is
reported for parameters with Bayesian analysis distributions. If the 95\% 
credible interval spans the range of 0.1 to 0.9 and fup.ci.cutoff is set to TRUE
(the default setting), then the Fup is treated as too uncertain and
the value NA is given.
}
\examples{

\donttest{
# List all CAS numbers for which the 3compartmentss model can be run in humans: 
get_cheminfo()

# Request several types of information for chemicals with sufficient data for
# the pbtk model:
get_cheminfo(info=c('compound','funbound.plasma','logP'),model='pbtk') 

# See all the data for humans:
get_cheminfo(info="all")

TPO.cas <- c("741-58-2", "333-41-5", "51707-55-2", "30560-19-1", "5598-13-0", 
"35575-96-3", "142459-58-3", "1634-78-2", "161326-34-7", "133-07-3", "533-74-4", 
"101-05-3", "330-54-1", "6153-64-6", "15299-99-7", "87-90-1", "42509-80-8", 
"10265-92-6", "122-14-5", "12427-38-2", "83-79-4", "55-38-9", "2310-17-0", 
"5234-68-4", "330-55-2", "3337-71-1", "6923-22-4", "23564-05-8", "101-02-0", 
"140-56-7", "120-71-8", "120-12-7", "123-31-9", "91-53-2", "131807-57-3", 
"68157-60-8", "5598-15-2", "115-32-2", "298-00-0", "60-51-5", "23031-36-9", 
"137-26-8", "96-45-7", "16672-87-0", "709-98-8", "149877-41-8", "145701-21-9", 
"7786-34-7", "54593-83-8", "23422-53-9", "56-38-2", "41198-08-7", "50-65-7", 
"28434-00-6", "56-72-4", "62-73-7", "6317-18-6", "96182-53-5", "87-86-5", 
"101-54-2", "121-69-7", "532-27-4", "91-59-8", "105-67-9", "90-04-0", 
"134-20-3", "599-64-4", "148-24-3", "2416-94-6", "121-79-9", "527-60-6", 
"99-97-8", "131-55-5", "105-87-3", "136-77-6", "1401-55-4", "1948-33-0", 
"121-00-6", "92-84-2", "140-66-9", "99-71-8", "150-13-0", "80-46-6", "120-95-6",
"128-39-2", "2687-25-4", "732-11-6", "5392-40-5", "80-05-7", "135158-54-2", 
"29232-93-7", "6734-80-1", "98-54-4", "97-53-0", "96-76-4", "118-71-8", 
"2451-62-9", "150-68-5", "732-26-3", "99-59-2", "59-30-3", "3811-73-2", 
"101-61-1", "4180-23-8", "101-80-4", "86-50-0", "2687-96-9", "108-46-3", 
"95-54-5", "101-77-9", "95-80-7", "420-04-2", "60-54-8", "375-95-1", "120-80-9",
"149-30-4", "135-19-3", "88-58-4", "84-16-2", "6381-77-7", "1478-61-1", 
"96-70-8", "128-04-1", "25956-17-6", "92-52-4", "1987-50-4", "563-12-2", 
"298-02-2", "79902-63-9", "27955-94-8")
# Subset the rat and human tables to the chemicals listed in TPO.cas:
httk.TPO.rat.table <- subset(get_cheminfo(info="all",species="rat"),
 CAS \%in\% TPO.cas)
 
httk.TPO.human.table <- subset(get_cheminfo(info="all",species="human"),
 CAS \%in\% TPO.cas)
 
# Create a data.frame with all the fup values. We ask for model="schmitt" since
# that model is for chemicals with logP and fup (see argument model), and for
# median.only=TRUE because we don't care about uncertainty intervals here:
fup.tab <- get_cheminfo(info="all",median.only=TRUE,model="schmitt")
# calculate the median, making sure to convert to numeric values:
median(as.numeric(fup.tab$Human.Funbound.plasma),na.rm=TRUE)
# calculate the mean:
mean(as.numeric(fup.tab$Human.Funbound.plasma),na.rm=TRUE)
# count how many non-NA values we have (should be the same as the number of 
# rows in the table, but just in case we ask for non-NA values):
sum(!is.na(fup.tab$Human.Funbound.plasma))
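
# A minimal sketch, using base R only and not httk's internal code, of how the
# comma-separated entries described in Details can be unpacked. The strings
# and object names below are made-up illustrations, not values taken from the
# httk tables:
clint.entry <- "15.2,8.1,24.9,0.001"
clint.parts <- as.numeric(strsplit(clint.entry, ",")[[1]])
names(clint.parts) <- c("median", "lower95", "upper95", "pvalue")
clint.parts
# Mimic the rule documented for clint.pvalue.threshold (documented default
# 0.05): clearance is treated as zero when the p-value exceeds the threshold.
clint.used <- if (clint.parts[["pvalue"]] > 0.05) 0 else clint.parts[["median"]]
# Mimic the rule documented for fup.ci.cutoff: a 95 percent credible interval
# spanning 0.1 to 0.9 (or more) is treated as too uncertain and gives NA.
fup.entry <- "0.5,0.05,0.95"
fup.parts <- as.numeric(strsplit(fup.entry, ",")[[1]])
fup.used <- if (fup.parts[2] <= 0.1 && fup.parts[3] >= 0.9) NA else fup.parts[1]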
}

}
\references{
\insertRef{rotroff2010incorporating}{httk}

\insertRef{waters2008validation}{httk}

\insertRef{wambaugh2019assessing}{httk}
}
\author{
John Wambaugh, Robert Pearce, and Sarah E. Davidson
}
\keyword{Retrieval}
