Glossary

There are a variety of terms that you’ll come across throughout the NAMHub and these help docs. Refer back to this page if you’re ever unsure about what something means or the difference between terms.


Access Request

An electronic application submitted via Synapse by a user seeking permission to access and use controlled access data. An access request may be submitted by a single user on behalf of several collaborating Synapse users from the same institution.


Annotation

Descriptive information added to an entity such as a project, file, folder, table, or view. This information, in the form of controlled vocabulary, provides context and additional details about the contents of an entity, making it easier for users to understand what the entity contains. Annotations are what allow users to systematically search for and find specific data or other resources of interest.
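
As a minimal sketch of why annotations make systematic search possible, the Python snippet below filters a small in-memory list of files by annotation key and value. The file names, the annotation keys (assay, species), and the find_files helper are hypothetical; they are not part of Synapse or the NAMHub.

```python
# Hypothetical files with annotations; names and keys are illustrative only.
files = [
    {"name": "sample_A.fastq", "annotations": {"assay": "RNA-seq", "species": "Human"}},
    {"name": "sample_B.fastq", "annotations": {"assay": "RNA-seq", "species": "Mouse"}},
    {"name": "image_C.ome.tiff", "annotations": {"assay": "imaging", "species": "Human"}},
]


def find_files(files, **criteria):
    """Return names of files whose annotations match every key/value given."""
    return [
        f["name"]
        for f in files
        if all(f["annotations"].get(key) == value for key, value in criteria.items())
    ]


human_rnaseq = find_files(files, assay="RNA-seq", species="Human")
print(human_rnaseq)
```

Because every file uses the same keys and the same spelling of each value, a single exact-match filter is enough to find the relevant files.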


Bug

Term used to describe an error occurring in a computer program or hardware.


Complement-ARIE

Acronym for the Complement Animal Research in Experimentation program funded by the United States National Institutes of Health (NIH). This is an initiative to promote the development, standardization, validation, and use of human-based New Approach Methodologies (NAMs) in place of animal models to study diseases or test new drugs.


Controlled Access Data

Some human data on the portal is considered controlled access and requires you to request access by reading and electronically agreeing to data-specific terms. Read more about controlled access data here.


Controlled Value

A pre-formatted value that must be used as defined. Example: True instead of yes; female instead of woman.


Data Model

A model that organizes data elements and describes their relationships to one another, usually in a graph-based form such as a flowchart diagram. In other words, the data model dictates how information is structured and related in a particular context. In the context of the NAMHub, the data model typically refers to JSON-LD files used by SCHEMATIC (Schema Engine for Manifest Ingress and Curation).


Data Sharing Plan (DSP)

A document associated with a project or study that includes information such as anticipated types of data, estimated number of samples, expected data upload dates, data access and licensing requirements, and inclusion of human data. DSPs are used to facilitate data sharing through portals such as the NAMHub and to ensure that all necessary regulatory approvals and data access and use restrictions are in place before data is shared.


Dataset

Multiple similar data files bundled together into one large bulk file that can be downloaded at once. NAMHub datasets range in size from a few files to hundreds of files. Sometimes, datasets have also been processed or harmonized to improve their usefulness.


Digital Object Identifier (DOI)

A string of alphanumeric characters used to provide a digital link to an entity such as a journal article, abstract, or dataset.


File

Any type of individual, downloadable content within the NAMHub, such as individual data outputs from raw sequencing runs (e.g., FASTQ files).


File annotations

A set of controlled vocabulary associated with files that describes properties of the content of the files (often data) and enables queries. Also known as metadata, these annotations are essentially extra information about the files so that you can properly search and filter them.


File Schema

The JSON schema associated with a Synapse File Entity.


General Research Use (GRU)

A designation that indicates that data can be used for broad research purposes without limitations such as disease-specific research or institution-specific research. GRU is the broadest standard NIH consent group option for controlled access data.


Governance

A system of policies, procedures, and tools for managing and protecting data. Due to the open-access nature of the platform, Synapse operates under comprehensive governance policies that define the rights and responsibilities of Synapse users. This includes our standard operating procedures (SOPs), privacy policy, code of conduct, community standards, and more.


Grant

A sponsored project that is represented by a contract number and/or digital object identifier.


HIPAA

Acronym for the Health Insurance Portability and Accountability Act of 1996, a United States federal law intended to prevent disclosure without consent of sensitive patient health information, to define standards for electronic health data security, and to enable insurance coverage portability across healthcare providers.


HIPAA-Limited Data

Data that excludes all protected health information (PHI, as defined by HIPAA) except for at least one of the following:

  • dates such as date of admission, discharge, or service; date of birth; or date of death

  • city, state, or five-digit or extended ZIP code

  • age in years, months, days, or hours

Note: HIPAA-limited data should always be categorized as controlled access data.


Individual ID

The identifier for a specific individual (human subject or single animal).


Individual Level Data

A designation applied to any data file that contains values for an individual, as opposed to aggregate data, which is summary data from multiple individuals.
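
The distinction can be illustrated with a short Python sketch. The individual IDs and measurement values below are invented for illustration.

```python
from statistics import mean

# Individual-level data: one value per individual (IDs and values are invented).
individual_level = {"IND-001": 4.0, "IND-002": 6.0, "IND-003": 8.0}

# Aggregate data: a summary across individuals, with no per-individual values.
aggregate = {
    "n_individuals": len(individual_level),
    "mean_value": mean(list(individual_level.values())),
}
print(aggregate)
```

The aggregate record no longer contains any individual ID, which is why the two kinds of data are treated differently.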


Initiative

A group of projects that were funded under the same grant mechanism.


Intended Data Use Statement (IDU)

A detailed description submitted through an access request identifying the requester's research purpose for accessing and using controlled access data via the NAMHub. The IDU is reviewed by a Data Access Committee (DAC) to determine whether access to the data should be allowed. An IDU should address the following questions: What do you want to do with the data? Why are you doing it? How do you want to do it? It must also list the NAMHub studies you want to access.


Key data

Specific data that fulfills one or more of the following criteria:

  • a dataset that contains data generated using one or more high-throughput methods that output raw data presented in a widely used systematic format and has more than one or two samples

  • a dataset that is considered to be validation data for a new method that is being developed in the funded program

  • a dataset that is specifically deemed of interest by an investigator for some other reason, such as particularly unique data or data that is difficult to recreate

  • a dataset that is specifically deemed of interest by the program’s funder for some other reason


Manifest

A list of files and their metadata. There are several different types of manifests used throughout Synapse:

  • Upload manifest: This is used to upload metadata for a batch of files. More details are provided here.

  • Download manifest: This is used when downloading data programmatically. The template is provided by the Synapse Python Client.

  • File Schema Driven Manifest: This is based on the File Schema.

  • Portals Manifest: This is currently provided when exporting data.
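
A manifest can be pictured as a table with one row per file and one column per metadata attribute. The Python sketch below parses a small, made-up manifest in CSV form. The column names (path, assay, specimenID) are assumptions for illustration and do not reflect the required columns of any particular Synapse manifest type.

```python
import csv
import io

# A made-up manifest: one row per file, one column per metadata attribute.
manifest_text = """path,assay,specimenID
data/run1.fastq,RNA-seq,SPEC-01
data/run2.fastq,RNA-seq,SPEC-02
"""

# Map each file path to its metadata (every column except the path itself).
metadata_by_file = {}
for row in csv.DictReader(io.StringIO(manifest_text)):
    path = row.pop("path")
    metadata_by_file[path] = dict(row)

print(metadata_by_file["data/run1.fastq"])
```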


Metadata

Standardized information included alongside data (or other resources) to give it context (i.e., data about the data). Metadata is what allows data in the portal to be searchable, discoverable, accessible, reusable, and understandable to others, including those who were not involved in the data generation process. Metadata can be descriptive (e.g., the name of the file), administrative (e.g., provenance information), or research-based (e.g., information about the sampling and handling of data).


Metadata dictionary

Documentation of metadata fields and values.


Metadata validation

The act of checking metadata for expected values and formatting.
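
A minimal Python sketch of such a check is shown below. The attribute names, the allowed values, and the specimen ID pattern are hypothetical; actual NAMHub validation rules come from the data model and may differ.

```python
import re

# Hypothetical rules: allowed (controlled) values and an expected ID format.
ALLOWED_VALUES = {"sex": {"female", "male"}, "isControl": {"True", "False"}}
ID_PATTERN = re.compile(r"SPEC-\d{2}")


def validate(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    if not ID_PATTERN.fullmatch(record.get("specimenID", "")):
        problems.append("specimenID: does not match the expected format")
    return problems


good = {"sex": "female", "isControl": "True", "specimenID": "SPEC-01"}
bad = {"sex": "woman", "isControl": "yes", "specimenID": "spec1"}
print(validate(good))
print(validate(bad))
```

The first record passes with no problems reported. The second fails on all three fields: two values are not controlled values, and the ID does not follow the assumed format.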


NAM(s)

Acronym for New Approach Methodology (Methodologies), which are human-based alternatives to animal models, such as organ-on-chip devices and organoids grown from patient cells. NAMs hold promise as more reliable tools than existing animal models for modeling human health and disease.


NAMHub

Acronym for the New Approach Methodologies Hub of the Complement Animal Research in Experimentation (Complement-ARIE) program funded by the United States National Institutes of Health (NIH). Complement-ARIE is an initiative to promote the development, standardization, validation, and use of human-based New Approach Methodologies (NAMs) in place of animal models to study diseases or test new drugs. The NAMHub is the central portal to collect, organize, and share data on these new methods.


TDC Data Manager

A person responsible for managing contributed data through annotation, validation, quality checking, etc.


NDHCC

Acronym for the NYU-Sage NAM Data Hub and Coordinating Center, the central coordinating center for the overall Complement-ARIE program and the team responsible for the NAMHub.


NIH

Acronym for the United States National Institutes of Health.


Open Data / Open Science

Transparent and accessible knowledge that is shared and developed through collaborative networks.

The goal of open science is to make scientific research – including publications, data, physical samples, and software – and its dissemination accessible to all levels of an inquiring society, whether amateur or professional.

The general driving idea behind open data and open science is that scientific research can and should be accessible to anyone – because, well, why not? This system benefits all parties involved: researchers gain wider-reaching recognition and appreciation for their work, study subjects get to witness the palpable value of providing their personal data, scientists and other professionals are able to use previously funded research to aid in their own research or work, and the general public gains helpful information and knowledge from trusted sources. This is truly a win-win situation. Collective consciousness is a global good!

Read more about data sharing here.


PHI

Acronym for protected health information, as defined by HIPAA.


Publication

A preprint or peer-reviewed journal article generated from funded studies that contribute data to or use data on the NAMHub. If you have produced a publication using data available on the NAMHub and do not see it on the portal, please let us know at namhub@sagebase.org.


Raw Data

The initial, unmodified information collected directly from sources, not yet processed or analyzed. For instance, in biological imaging, raw data is often in OME-TIFF format (.ome.tiff), preserving all details and metadata from the microscopy instrumentation. In genomics, raw data typically appears as .fastq files.


Schema (Metadata Schema)

Rules and standardization for a data model. A metadata schema outlines additional requirements governing the management of metadata through constraints such as the optionality or valid values of attributes.
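
As a sketch of how such constraints can be written down, the Python snippet below builds a small schema using standard JSON Schema keywords ("required" for optionality, "enum" for valid values). The attribute names and allowed values are hypothetical and are not taken from the NAMHub data model.

```python
import json

# Hypothetical metadata schema using standard JSON Schema keywords.
schema = {
    "type": "object",
    "properties": {
        "sex": {"type": "string", "enum": ["female", "male"]},
        "specimenID": {"type": "string"},
        "notes": {"type": "string"},
    },
    "required": ["sex", "specimenID"],
}

# Attributes that are defined but not listed under "required" are optional.
optional = sorted(set(schema["properties"]) - set(schema["required"]))
print(json.dumps(schema, indent=2))
print(optional)
```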


Sensitive Data

Data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization. This includes human data at risk of re-identification.

Note: “De-identified” data (data maintained in a way that does not allow association with a specific person) is not considered sensitive.


SOP

Acronym for a standard operating procedure, which is a document that serves as a step-by-step guide to accomplish a particular task in a consistent manner.


Specimen ID

The identifier for a sample from a specific individual (e.g., a brain tissue sample or a blood sample).


Study

The primary unit of data organization in the NAMHub. Essentially, each study represents an individual research project with specific objectives and focus. One project can operate multiple studies. A study can represent data generated from a specific human cohort, data from experiments on a model system, cross-consortium data processing and analysis efforts, or data associated with a specific publication.

In our context, a study is typically associated with a grant, so the terms study and grant are often used interchangeably. However, some studies span multiple grants or are led by program partners that are not grant-funded.

On the NAMHub, a study bundles multiple pieces of project information together, including a study title, summary, lead investigator, access requirements, acknowledgement statements, files (data, metadata, others), tools, publications, and/or related studies. Not all of these components will necessarily be present, particularly if the study is currently active.


Synapse

An online software system (synapse.org) developed by Sage Bionetworks that allows users to upload, store, analyze, and track data in a private or public space.


Template (Manifest Template)

A document, usually an Excel spreadsheet, that outlines a collection of specific metadata attributes pertaining to a data type (or other resource type). The columns of the template refer to the metadata attributes to be collected for a set of corresponding data. In other words, the metadata template describes a set of key-value pairs that can be assigned to a data file of the specified data type.


TDC

Acronym for a Comprehensive NAMs Technology Development Center, a team tasked with developing combinatorial NAMs as part of the Complement-ARIE program.


Tool

Resources, often based on experimental models or consisting of software, that are available to the research community to assist with data research and analysis.


VQN

Acronym for the Validation and Qualification Network, the component of the Complement-ARIE program that is tasked with working with regulatory authorities to accelerate the regulatory use of NAMs by developing standards and procedures for validating and qualifying NAMs.