| Title: | Query the FDA Global Substance Registration System (GSRS) API |
|---|---|
| Description: | Provides functions to query the FDA Global Substance Registration System (GSRS) REST API (<https://gsrs.ncats.nih.gov/api/v1/>). Enables programmatic access to substance records, UNII identifiers, synonyms, external codes, and chemical structures for over 170,000 registered substances. |
| Authors: | Claudio Zanettini [aut, cre] (ORCID: <https://orcid.org/0000-0001-5043-8033>) |
| Maintainer: | Claudio Zanettini <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-05 08:53:30 UTC |
| Source: | https://github.com/c1au6i0/rgsrs |
Convenience wrapper that calls gsrs_substance(), gsrs_names(),
gsrs_codes(), gsrs_structure(), and gsrs_hierarchy() in sequence and
returns a named list containing all five data frames. Each sub-function uses
with_graceful_exit internally, so partial failures return NULL for that
element without aborting the whole call.
gsrs_all(unii, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups.
Default |
A named list with five elements:
substance: Data frame from gsrs_substance().
names: Data frame from gsrs_names().
codes: Data frame from gsrs_codes().
structure: Data frame from gsrs_structure().
hierarchy: Data frame from gsrs_hierarchy().
Returns NULL on error (with a warning).
gsrs_substance(), gsrs_names(), gsrs_codes(),
gsrs_structure(), gsrs_hierarchy()
Sys.sleep(2)
out <- gsrs_all("R16CO5Y76E") # aspirin
if (!is.null(out)) {
  print(out$substance)
  print(head(out$names))
  print(head(out$codes))
  print(out$structure[, c("smiles", "formula", "mwt", "inchi_key")])
  print(out$hierarchy[, c("depth", "type", "approval_id", "name")])
}
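Because a failed sub-call leaves NULL in its slot of the list, downstream code may want to drop the missing elements before using the result. The following is a minimal sketch that uses a mock list in place of a real gsrs_all() result; the data values and the name column are illustrative assumptions, and no API call is made.

```r
# Mock result standing in for gsrs_all(); values are illustrative only.
out <- list(
  substance = data.frame(approval_id = "R16CO5Y76E"),
  names     = data.frame(name = "ASPIRIN"),  # hypothetical column name
  codes     = NULL,                          # pretend gsrs_codes() failed
  structure = data.frame(formula = "C9H8O4"),
  hierarchy = NULL                           # pretend gsrs_hierarchy() failed
)

# Record which elements are missing, then keep only the data frames.
failed    <- names(out)[vapply(out, is.null, logical(1))]
available <- Filter(Negate(is.null), out)

print(failed)
print(names(available))
```

A list filtered this way contains only data frames, which is the shape write_dataframes_to_excel() expects for df_list.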
Retrieves a paginated list of all substance records from
GET /api/v1/substances. Useful for bulk workflows or building a local
catalogue. Use top and skip to page through the roughly 170,000 available
records, or set top = Inf to fetch them all (slow; use with care).
gsrs_browse(top = 10L, skip = 0L, verbose = TRUE, delay = 0.5)
top |
Integer. Maximum number of records to return per request.
Default |
skip |
Integer. Number of records to skip (offset). Default |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between paginated requests when
|
A data frame with the same columns as gsrs_search().
Returns NULL on error (with a warning).
gsrs_search(), gsrs_substance()
Sys.sleep(2)
# Fetch the first 5 substance records
out <- gsrs_browse(top = 5, verbose = FALSE)
if (!is.null(out)) print(out[, c("approval_id", "preferred_name", "substance_class")])
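Paging manually with top and skip comes down to computing a sequence of offsets. The sketch below shows one way to do that; page_offsets() is a hypothetical helper, not part of the package, and the loop that would call gsrs_browse() is left commented out because it needs network access.

```r
# Offsets (values for `skip`) needed to cover `total` records
# in pages of `page_size` records each.
page_offsets <- function(total, page_size) {
  stopifnot(total >= 0, page_size >= 1)
  if (total == 0) return(integer(0))
  as.integer(seq(0, total - 1, by = page_size))
}

offsets <- page_offsets(total = 25, page_size = 10)
print(offsets)

# Hypothetical paging loop (not run here):
# pages <- lapply(offsets, function(s) {
#   gsrs_browse(top = 10L, skip = s, verbose = FALSE)
# })
# all_records <- do.call(rbind, Filter(Negate(is.null), pages))
```

Dropping NULL pages before binding keeps a single failed request from aborting the whole collection, at the cost of silently missing those records.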
A convenience wrapper that resolves one or more substance identifiers to GSRS UNIIs and then fetches the embedded chemical structure data for each substance. The result is one wide row per input identifier containing both the resolved metadata and the full structure record.
gsrs_chem_info(
  identifiers,
  type = c("name", "cas", "unii", "inchikey", "smiles"),
  verbose = TRUE,
  delay = 0.5
)
identifiers |
Character vector of substance identifiers. |
type |
Character scalar. The identifier type. One of:
|
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual API calls.
Default |
A data frame with one row per input identifier and columns:
The identifier supplied by the caller.
The identifier type (one of "name", "cas", "unii", "inchikey", or "smiles").
Resolved UNII / approval ID.
Preferred display name in GSRS.
Substance class (e.g., "chemical").
Canonical SMILES string.
Molecular formula (e.g., "C9H8O4").
Molecular weight (numeric).
Standard InChIKey.
Full InChI string.
Stereochemistry descriptor.
Optical activity descriptor.
Formal charge (integer).
Number of stereocenters.
Number of defined stereocenters.
Number of E/Z double-bond stereocenters.
MDL molfile as a string.
Date the structure response was received.
Unresolved identifiers or non-chemical substances produce a row of NAs
with query and type set. Returns NULL on error (with a warning).
gsrs_structure(), gsrs_unii_from_name(), gsrs_codes(),
gsrs_structure_search()
Sys.sleep(2)
out <- gsrs_chem_info(c("aspirin", "ibuprofen"), type = "name")
if (!is.null(out)) print(out[, c("query", "unii", "formula", "mwt")])

Sys.sleep(2)
out_cas <- gsrs_chem_info(c("50-78-2", "15687-27-1"), type = "cas")
if (!is.null(out_cas)) print(out_cas[, c("query", "unii", "formula", "mwt")])

Sys.sleep(2)
out_unii <- gsrs_chem_info("R16CO5Y76E", type = "unii")
if (!is.null(out_unii)) print(out_unii[, c("query", "formula", "mwt")])

Sys.sleep(2)
out_ik <- gsrs_chem_info("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", type = "inchikey")
if (!is.null(out_ik)) print(out_ik[, c("query", "unii", "formula")])

Sys.sleep(2)
out_smi <- gsrs_chem_info("CC(=O)Oc1ccccc1C(=O)O", type = "smiles")
if (!is.null(out_smi)) print(out_smi[, c("query", "unii", "formula")])
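Since each identifier costs at least one API call, it can be worth screening CAS numbers locally before passing them with type = "cas". The sketch below applies the public CAS check-digit rule: the last digit equals the weighted sum of the preceding digits, counted from the right, modulo 10. is_valid_cas() is a hypothetical helper and is not provided by the package.

```r
# TRUE if `cas` has the CAS Registry Number layout and a correct check digit.
is_valid_cas <- function(cas) {
  vapply(cas, function(x) {
    if (is.na(x) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x)) return(FALSE)
    digits <- as.integer(strsplit(gsub("-", "", x), "")[[1]])
    check  <- digits[length(digits)]
    body   <- rev(digits[-length(digits)])  # rightmost body digit first
    sum(body * seq_along(body)) %% 10 == check
  }, logical(1), USE.NAMES = FALSE)
}

# The last entry has a deliberately wrong check digit.
valid <- is_valid_cas(c("50-78-2", "15687-27-1", "50-78-3"))
print(valid)
```

A passing check digit only shows that the number is well formed; it does not guarantee that GSRS holds a record for it.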
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/codes and
returns all registered cross-references as a tidy data frame. These include
CAS numbers, PubChem CIDs, ChEMBL IDs, WHO-ATC codes, NDF-RT codes,
DrugBank IDs, and many more.
gsrs_codes(unii, code_system = NULL, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes. |
code_system |
Character vector of code systems to filter on
(e.g., |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups when
|
A data frame with columns:
External database / code system name
(e.g., "CAS", "PUBCHEM", "ChEMBL", "WHO-ATC").
The identifier in that system.
"PRIMARY" or "ALTERNATIVE".
URL to the external record (when available).
Additional context for the code (e.g., ATC path).
Logical; TRUE for classification codes.
Internal GSRS UUID for the code record.
Date the response was received.
The UNII supplied by the caller.
Returns NULL on error (with a warning).
gsrs_substance(), gsrs_names(), gsrs_search()
Sys.sleep(2)
# All codes for aspirin
out <- gsrs_codes("R16CO5Y76E")
if (!is.null(out)) print(head(out))

Sys.sleep(2)
# Only CAS and PubChem codes
out_cas <- gsrs_codes("R16CO5Y76E", code_system = c("CAS", "PUBCHEM"))
if (!is.null(out_cas)) print(out_cas)
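The code_system filter can be pictured as a row subset on the code-system column. The sketch below illustrates that idea on a mock data frame; the column names code_system and code, the exact-match (case-sensitive) behaviour, and the helper filter_codes() are all assumptions made for illustration, not the package's confirmed internals.

```r
# Mock codes table; column names and rows are illustrative assumptions.
codes <- data.frame(
  code_system = c("CAS", "PUBCHEM", "ChEMBL"),
  code        = c("50-78-2", "2244", "CHEMBL25"),
  stringsAsFactors = FALSE
)

# Keep all rows when no filter is given, otherwise match exactly.
filter_codes <- function(df, code_system = NULL) {
  if (is.null(code_system)) return(df)
  df[df$code_system %in% code_system, , drop = FALSE]
}

subset_codes <- filter_codes(codes, c("CAS", "PUBCHEM"))
print(subset_codes)
```

Under this assumption a filter value such as "chembl" would match nothing, so it is safest to copy code-system names exactly as they appear in an unfiltered result.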
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/@hierarchy
and returns the flat parent/child relationship tree as a tidy data frame.
This is useful for navigating relationships such as salt forms to free base,
active metabolites, or component substances.
gsrs_hierarchy(unii, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups when
|
A data frame with columns:
Node identifier within the hierarchy tree (string index).
Parent node identifier ("#" for root nodes).
Depth in the tree (0 = root).
Node type (e.g., "ROOT", "ACTIVE MOIETY",
"SALT/SOLVATE").
Human-readable label including UNII and name.
Logical; TRUE if node has children.
UNII of the substance at this node.
Preferred name at this node.
Internal GSRS UUID of the related substance.
Substance class at this node.
Logical; TRUE if the node substance is deprecated.
Date the response was received.
The UNII supplied by the caller.
Returns NULL on error (with a warning).
Sys.sleep(2)
out <- gsrs_hierarchy("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out[, c("depth", "type", "approval_id", "name")])
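A flat parent/child table like the one described above can be walked from any node back to its root by following the parent identifier until it reaches "#". The sketch below does this on a mock tree; the column names id and parent and the placeholder substance names are assumptions, and path_to_root() is a hypothetical helper.

```r
# Mock hierarchy; `id` and `parent` are assumed column names.
tree <- data.frame(
  id     = c("0", "1", "2"),
  parent = c("#", "0", "1"),
  depth  = c(0L, 1L, 2L),
  name   = c("SUBSTANCE A", "SUBSTANCE B", "SUBSTANCE C"),
  stringsAsFactors = FALSE
)

# Follow parent links from `id` until the root marker "#" is reached.
path_to_root <- function(tree, id) {
  path <- character(0)
  while (!identical(id, "#")) {
    row <- match(id, tree$id)
    if (is.na(row)) stop("unknown node id: ", id)
    path <- c(path, tree$name[row])
    id <- tree$parent[row]
  }
  path
}

lineage <- path_to_root(tree, "2")
print(lineage)
```

The result lists the starting node first and the root last, which is convenient for questions such as which substance a given salt form ultimately derives from.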
For each supplied UNII, calls GET /api/v1/substances(<UNII>)/names and
returns a tidy data frame with one row per registered name record.
gsrs_names(unii, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups when
|
A data frame with columns:
The name string.
Standardised (uppercased) name.
Name type code (e.g., "bn" brand name, "cn" common name,
"sys" systematic name, "of" official name).
Logical; TRUE when this is the preferred name.
Logical; TRUE when this name is shown by default.
Semicolon-separated language codes.
Semicolon-separated domain tags.
Internal GSRS UUID for the name record.
Date the response was received.
The UNII supplied by the caller.
Returns NULL on error (with a warning).
gsrs_substance(), gsrs_codes(), gsrs_search()
Sys.sleep(2)
out <- gsrs_names("R16CO5Y76E") # aspirin
if (!is.null(out)) print(head(out))
Searches the FDA Global Substance Registration System (GSRS) using a free-text or Lucene-style field query. Returns a tidy data frame of matching substance records with key metadata fields.
gsrs_search(query, top = 10L, skip = 0L, verbose = TRUE, delay = 0.5)
query |
Character string. The search query. Supports:
|
top |
Integer. Maximum number of records to return per request.
Default |
skip |
Integer. Number of records to skip (offset). Default |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between paginated requests.
Default |
A data frame with columns:
Internal GSRS UUID of the substance.
FDA UNII / approval ID.
Preferred display name.
Substance class (e.g., "chemical",
"structurallyDiverse").
Record status (e.g., "approved").
"PRIMARY" or "ALTERNATIVE".
"COMPLETE" or "INCOMPLETE".
Record version string.
URL to retrieve all names for this substance.
URL to retrieve all codes for this substance.
Full URL for this substance record.
Date the response was received from the server.
Returns NULL on error (with a warning).
gsrs_substance(), gsrs_names(), gsrs_codes()
Sys.sleep(2)
out <- gsrs_search("aspirin", top = 5)
if (!is.null(out)) print(head(out))
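Field queries follow the field:value pattern, as in the root_approvalID:<unii> and root_names:<name> forms used elsewhere in this package. The sketch below builds such strings; field_query() is a hypothetical helper, and quoting multi-word values as a phrase follows general Lucene syntax, which is assumed rather than confirmed to be what gsrs_search() expects.

```r
# Build a "field:value" query; values containing whitespace are
# wrapped in double quotes so they are treated as one phrase.
field_query <- function(field, value) {
  needs_quotes <- grepl("\\s", value)
  value[needs_quotes] <- paste0('"', value[needs_quotes], '"')
  paste0(field, ":", value)
}

q1 <- field_query("root_approvalID", "R16CO5Y76E")
q2 <- field_query("root_names", "acetylsalicylic acid")
print(c(q1, q2))

# Percent-encode a query before placing it in a URL by hand.
encoded <- utils::URLencode(q2, reserved = TRUE)
print(encoded)
```

The resulting strings could then be passed as the query argument, for example gsrs_search(q1, top = 1).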
For each supplied UNII, fetches the full substance record from
GET /api/v1/substances(<UNII>) and extracts the embedded structure
object, returning chemical identifiers and properties as a tidy data frame.
gsrs_structure(unii, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes. |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups when
|
A data frame with columns:
Canonical SMILES string.
Molecular formula (e.g., "C9H8O4").
Molecular weight (numeric).
Standard InChIKey.
Full InChI string.
Stereochemistry descriptor (e.g., "ACHIRAL",
"RACEMIC", "ABSOLUTE").
Optical activity (e.g., "UNSPECIFIED", "(+)",
"(-)").
Formal charge (integer).
Number of stereocenters.
Number of defined stereocenters.
Number of E/Z double-bond stereocenters.
MDL molfile as a string.
Date the response was received.
The UNII supplied by the caller.
Non-chemical substances (proteins, polymers, etc.) return a row of NAs
with query set. Returns NULL on error (with a warning).
gsrs_substance(), gsrs_structure_search(), gsrs_names(),
gsrs_codes()
Sys.sleep(2)
out <- gsrs_structure("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out[, c("smiles", "formula", "mwt", "inchi_key")])
Searches the FDA Global Substance Registration System for substances matching a chemical structure query supplied as a SMILES string. Supports substructure, similarity, exact-match, and flexible (disconnected moiety) search types.
gsrs_structure_search(
  smiles,
  type = c("sub", "sim", "exact", "flex"),
  cutoff = 0.8,
  top = 10L,
  verbose = TRUE
)
smiles |
Character string. A valid SMILES or SMARTS string describing
the query structure (e.g., |
type |
Character string. Search type. One of:
|
cutoff |
Numeric in |
top |
Integer. Maximum number of records to return. Default |
verbose |
Logical. If |
A data frame with the same columns as gsrs_search(), plus a
query_smiles column recording the input SMILES. Returns NULL on error
(with a warning).
gsrs_structure(), gsrs_search()
Sys.sleep(2)
# Exact match for aspirin
out <- gsrs_structure_search("CC(=O)Oc1ccccc1C(=O)O", type = "exact")
if (!is.null(out)) print(out[, c("approval_id", "preferred_name")])

Sys.sleep(2)
# Similarity search
out_sim <- gsrs_structure_search("CC(=O)Oc1ccccc1C(=O)O", type = "sim", cutoff = 0.7, top = 5)
if (!is.null(out_sim)) print(out_sim[, c("approval_id", "preferred_name")])
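A signature of the form type = c("sub", "sim", "exact", "flex") is the usual R idiom for a choice argument resolved with match.arg(). The sketch below shows how that idiom behaves; resolve_search_args() is a hypothetical stand-in, and both the use of match.arg() and the [0, 1] range check on cutoff are assumptions rather than confirmed details of gsrs_structure_search().

```r
# Stand-in showing how a choice argument and a cutoff could be validated.
resolve_search_args <- function(type = c("sub", "sim", "exact", "flex"),
                                cutoff = 0.8) {
  type <- match.arg(type)  # first choice ("sub") when `type` is not supplied
  stopifnot(is.numeric(cutoff), length(cutoff) == 1,
            cutoff >= 0, cutoff <= 1)  # assumed valid range
  list(type = type, cutoff = cutoff)
}

default_args <- resolve_search_args()
sim_args     <- resolve_search_args(type = "sim", cutoff = 0.7)
print(default_args$type)
print(sim_args)
```

With this idiom an unsupported value such as type = "bogus" fails immediately with an error, before any request is sent.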
Retrieves the top-level metadata for one or more substances, each identified
by its UNII (Unique Ingredient Identifier / approval ID). Internally this
performs a filtered search using root_approvalID:<unii>.
gsrs_substance(unii, verbose = TRUE, delay = 0.5)
unii |
Character vector of one or more UNII codes
(e.g., |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups when
|
A data frame with the same columns as gsrs_search(), with one row
per input UNII. Rows for unrecognised UNIIs will contain NA except for
the query column (which is always set to the input UNII). Returns
NULL on error (with a warning).
gsrs_search(), gsrs_names(), gsrs_codes()
Sys.sleep(2)
out <- gsrs_substance("R16CO5Y76E") # aspirin
if (!is.null(out)) print(out)
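Since unrecognised UNIIs come back as NA rows with only query filled in, a batch result usually needs to be split into resolved and unresolved inputs. The sketch below does this on a mock data frame that mimics a subset of the columns; the rows are illustrative and "NOTAREALID" is a made-up input.

```r
# Mock result with one recognised and one unrecognised UNII.
res <- data.frame(
  approval_id    = c("R16CO5Y76E", NA),
  preferred_name = c("ASPIRIN", NA),
  query          = c("R16CO5Y76E", "NOTAREALID"),
  stringsAsFactors = FALSE
)

# Rows with a non-missing approval_id are treated as resolved.
resolved   <- res[!is.na(res$approval_id), , drop = FALSE]
unresolved <- res$query[is.na(res$approval_id)]

print(resolved)
print(unresolved)
```

Keeping the unresolved inputs as a character vector makes it easy to report them or retry them with a different lookup, such as gsrs_unii_from_name().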
For each supplied name, queries GSRS using root_names:<name> and returns
the best-matching UNII together with the preferred substance name and
substance class. This is useful for converting common or systematic names
to the canonical FDA UNII identifier.
gsrs_unii_from_name(names, top = 1L, verbose = TRUE, delay = 0.5)
names |
Character vector of substance names to resolve. |
top |
Integer. Maximum number of candidate records to consider per
name query. Default |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between individual lookups.
Default |
A data frame with columns:
The UNII / approval ID of the matched substance.
Preferred display name in GSRS.
Substance class (e.g., "chemical").
Record status.
Internal GSRS UUID.
Date the response was received.
The name supplied by the caller.
Unresolved names produce a row of NAs with query set. Returns NULL
on error (with a warning).
gsrs_substance(), gsrs_search(), gsrs_names()
Sys.sleep(2)
out <- gsrs_unii_from_name(c("aspirin", "ibuprofen"))
if (!is.null(out)) print(out)
Fetches all (or a page of) controlled vocabulary entries from
GET /api/v1/vocabularies. The result is one row per vocabulary term,
with the parent domain and type attached to every row. This is useful for
understanding allowed values for fields such as name type, substance class,
relationship type, code system, and more.
gsrs_vocabularies(top = NULL, verbose = TRUE, delay = 0.5)
top |
Integer. Maximum number of vocabulary domains to return per
request. Default |
verbose |
Logical. If |
delay |
Numeric. Seconds to wait between paginated requests.
Default |
A data frame with columns:
Vocabulary domain name (e.g., "NAME_TYPE",
"SUBSTANCE_CLASS", "RELATIONSHIP_TYPE").
Vocabulary term type identifier.
Logical; TRUE if the vocabulary can be extended.
Logical; TRUE if the vocabulary supports filtering.
The controlled term value (used in the API/data).
Human-readable display label for the term.
Logical; TRUE if the term is hidden from the UI.
Logical; TRUE if the term is selected by default.
Date the response was received.
Returns NULL on error (with a warning).
Sys.sleep(2)
vocab <- gsrs_vocabularies(verbose = FALSE)
if (!is.null(vocab)) {
  # See all name type values
  print(vocab[vocab$domain == "NAME_TYPE", c("value", "display")])
}
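With one row per term, the vocabulary table converts naturally into a lookup of allowed values per domain. The sketch below builds such a lookup from a mock data frame using the domain, value, and display columns; the rows and display labels are illustrative, and is_allowed() is a hypothetical helper.

```r
# Mock vocabulary table; rows are illustrative only.
vocab <- data.frame(
  domain  = c("NAME_TYPE", "NAME_TYPE", "SUBSTANCE_CLASS"),
  value   = c("bn", "cn", "chemical"),
  display = c("Brand Name", "Common Name", "Chemical"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of allowed values per domain.
allowed <- split(vocab$value, vocab$domain)

# Check whether a value belongs to a given domain.
is_allowed <- function(value, domain) value %in% allowed[[domain]]

print(allowed)
print(is_allowed("bn", "NAME_TYPE"))
print(is_allowed("xyz", "NAME_TYPE"))
```

A lookup like this can be built once per session and reused to validate filter values before sending requests.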
Each element of df_list is written to its own sheet. Requires the
openxlsx package (listed in Suggests).
write_dataframes_to_excel(df_list, filename)
df_list |
A named list of data frames. |
filename |
Character string. Path to the output |
Invisible filename.
tmp <- tempfile(fileext = ".xlsx")
write_dataframes_to_excel(list(sheet1 = mtcars, sheet2 = iris), tmp)
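For readers curious how a one-sheet-per-data-frame export can be put together, the sketch below shows one possible approach with openxlsx. It is not the package's actual implementation: safe_sheet_names() and write_sheets() are hypothetical names, and the sanitising step is an added assumption based on Excel's limits on sheet names (at most 31 characters, none of [ ] : * ? / \).

```r
# Replace characters Excel forbids in sheet names and truncate to 31 chars.
# Note: uniqueness after truncation is not handled in this sketch.
safe_sheet_names <- function(x) {
  x <- gsub("[\\[\\]:*?/\\\\]", "_", x, perl = TRUE)
  substr(x, 1, 31)
}

# Write each element of a named list of data frames to its own sheet.
write_sheets <- function(df_list, filename) {
  if (!requireNamespace("openxlsx", quietly = TRUE)) {
    stop("Package 'openxlsx' is required.")
  }
  wb <- openxlsx::createWorkbook()
  sheets <- safe_sheet_names(names(df_list))
  for (i in seq_along(df_list)) {
    openxlsx::addWorksheet(wb, sheets[i])
    openxlsx::writeData(wb, sheet = sheets[i], x = df_list[[i]])
  }
  openxlsx::saveWorkbook(wb, filename, overwrite = TRUE)
  invisible(filename)
}

print(safe_sheet_names(c("codes/CAS", "a very long sheet name that exceeds the limit")))
```

Returning the file name invisibly mirrors the documented return value and lets the call sit at the end of a pipeline without printing.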