HEAL Stewards Glossary


This glossary is intended to provide clarity around common terms used in the HEAL Data Ecosystem, especially those pertaining to the structure of the ecosystem and FAIR data principles. The HEAL Stewards provide links, when possible, to NIH resources. Where NIH reference sources are not available, other reputable resources are referenced to provide a definition or further details of interest. No reference is intended to promote or endorse a website or service. Where no reference is provided, the HEAL Stewards authored the definition.

Application Programming Interface (API)


A set of definitions and protocols for building and integrating application software that lets a product or service communicate with others, without having to know how they’re implemented (Red Hat). APIs help connect the HEAL Data Platform to data repositories where researchers deposited their study data, enabling researchers to work with data in the Platform.


Source: Red Hat

Authentication (AuthN)


Validation of a given user’s identity.

Authorization (AuthZ)


Permissions granted to a user to access resources, e.g., (meta)data and files.
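The difference between authentication and authorization can be sketched in a few lines of Python. The token store, user names, dataset IDs, and permission table below are all hypothetical stand-ins for a real identity provider and access-control service; a production system would never keep plaintext tokens in a dictionary.

```python
from typing import Optional

# Hypothetical identity store: maps a login token to a user ID (used for AuthN).
TOKENS = {"token-abc": "alice", "token-xyz": "bob"}

# Hypothetical access-control list: maps a user to the datasets they may read (used for AuthZ).
PERMISSIONS = {"alice": {"study-001"}, "bob": set()}


def authenticate(token: str) -> Optional[str]:
    """Authentication: establish who the caller is, or return None if unknown."""
    return TOKENS.get(token)


def authorize(user: str, dataset: str) -> bool:
    """Authorization: decide whether an already identified user may access a resource."""
    return dataset in PERMISSIONS.get(user, set())


def read_dataset(token: str, dataset: str) -> str:
    user = authenticate(token)
    if user is None:
        return "401 Unauthorized: identity not established"
    if not authorize(user, dataset):
        return "403 Forbidden: identity known, access not granted"
    return f"200 OK: {user} may read {dataset}"
```

The HTTP status codes mirror the distinction: 401 signals a failed or missing authentication, while 403 signals an authenticated user who lacks permission.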

Basic Research


Systematic study directed toward greater knowledge or understanding of the fundamental aspects of phenomena and of observable facts without specific applications towards processes or products in mind (NIH Grants Glossary). There are a variety of study types in the HEAL Data Ecosystem, including studies on the mechanisms of pain and Opioid Use Disorder (OUD) in in vitro systems and cell lines, as well as development of novel assay methods, computational methods for diagnosis, and research on biomarkers.

Brain Imaging Data Structure (BIDS)


A neuroimaging standard for data generated by MRI, EEG, and other experimental modalities. The HEAL-compliant repository OpenNeuro, for example, only accepts data in the BIDS format.


Source: BIDS

The Biomedical Research Integrated Domain Group (BRIDG) Model


A biomedical standard that acts as a bridge between many domains of research (e.g. pre-clinical, clinical, translational).

Case Report Forms (CRFs)


A standardized tool used in clinical research to systematically collect data from study participants. It serves as the primary tool for documenting key information required by the research protocol, such as participant demographics, study procedures, observations, outcomes, and adverse events. CRFs can be electronic (eCRFs) or physical (paper forms), depending on the study setup. They are often tailored to the specific needs of a study, project, or institution, which means formats and content may vary across research efforts.

Clinical Data Interchange Standards Consortium (CDISC)


A global nonprofit standards development organization that has established many data standards for the medical research community, some of which have been adopted by the FDA.


Source: CDISC

CDISC Clinical Data Acquisition Standards Harmonization (CDASH)


Provides a standardized way of collecting clinical data to be stored in the Study Data Tabulation Model (SDTM) format (See Clinical Data Interchange Standards Consortium).

CDISC Shared Health And Research Electronic library (SHARE)


A metadata repository for working with CDISC standards (See Clinical Data Interchange Standards Consortium).

CDISC Study Data Tabulation Model (SDTM)


A format for storing clinical data, which is required for data submissions to the FDA (See Clinical Data Interchange Standards Consortium).

Clinical Document Architecture (CDA)


An older but still widely used healthcare standard developed by Health Level Seven (HL7) prior to the creation of Fast Healthcare Interoperability Resources (FHIR).

Clinical Trial


A clinical trial is a research study that tests the safety or effectiveness of a medical, behavioral, or surgical intervention in people. Trials range in scope from Phase I studies, which enroll a small number of healthy participants to explore in vivo tolerance and pharmacokinetics, to Phase II and III trials, which explore efficacy in randomized controlled studies with longitudinal participant tracking. The HEAL Data Ecosystem includes a variety of study types, including clinical trials.

Clinical Vocabularies


A terminology that primarily serves to provide a systematized and controlled vocabulary of clinically relevant phrases that can be used during data capture to provide a more precise and shareable expression than might be obtained by using free text (Park, H. and Hardiker, N., 2009). Using clinical vocabularies makes data more findable and interoperable within the HEAL Data Ecosystem.

Cloud Computing


Internet-based computing, wherein computing power, networking, storage, or applications running on computers outside an organization are presented to that organization in a secure, services-oriented way. The HEAL Data Platform is a cloud-based web interface and provides workspaces for secure data analysis environments in the cloud.

Common Data Elements (CDEs)


Standardized terms for the collection and exchange of data, which may be used to form a vocabulary (e.g., the HEAL CDE Library) or may be deployed within data capture tools such as a Case Report Form. Using CDEs facilitates comparing and integrating data across research studies. Pain clinical studies are required to use HEAL Pain CDEs. All HEAL studies are encouraged to use CDEs to broaden opportunities for future data analyses.

Creative Commons License


A license that gives everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law (Creative Commons). Some HEAL-compliant data repositories, such as Figshare, have adopted a Creative Commons License as the default tool for researchers to share their datasets.

Data Access/Security


Security measures put into place on information accessible through the network. Access controls can be implemented for a variety of reasons, including to protect sensitive or personal information.

Data Administrator


A person working in the areas of information systems and computer science who plans, organizes, describes, and controls data resources.

Data Architecture


A multifaceted area of study concerning the way data is used, shaped, and stored, including: 1) the physical manifestation of data, 2) the logical linkage of data, 3) the internal format of data, and 4) the file structure of data (Inmon et al., 2019). The data architecture underlying the HEAL Data Ecosystem centers on making HEAL study data FAIR (see FAIR Guiding Principles).

Data Curation


The ongoing processing and maintenance of data throughout its lifecycle to ensure long-term accessibility, sharing, and preservation (NNLM Glossary). The HEAL Stewards develop resources that aid research data and metadata curation. Some HEAL-compliant repositories offer data curation services.


Source: NNLM Glossary

Data Dictionary


A centralized, file-based collection of information about data, such as its meaning, relationship to other data, origination source, use, and format; a file containing variable-level or study-level metadata. Data dictionaries are crucial to making HEAL data findable, accessible, and reusable via both HEAL Semantic Search and the HEAL Data Platform.
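As a minimal sketch, a data dictionary can be stored as a CSV file with one row of variable-level metadata per variable and then used to check incoming records. The column layout (`name`, `description`, `type`, `allowed_values`), the variable names, and the `|` separator for allowed values are hypothetical choices for illustration, not the HEAL data dictionary format.

```python
import csv
import io

# Hypothetical data dictionary: one row of variable-level metadata per variable.
DICTIONARY_CSV = """name,description,type,allowed_values
participant_id,Unique participant identifier,string,
age,Age at enrollment in years,integer,
pain_score,Self-reported pain intensity (0-10),integer,0|1|2|3|4|5|6|7|8|9|10
"""


def load_dictionary(text: str) -> dict[str, dict[str, str]]:
    """Index the dictionary rows by variable name."""
    return {row["name"]: row for row in csv.DictReader(io.StringIO(text))}


def check_record(record: dict[str, str], dictionary: dict[str, dict[str, str]]) -> list[str]:
    """Return the problems found when comparing one record to the dictionary."""
    problems = []
    for name, value in record.items():
        entry = dictionary.get(name)
        if entry is None:
            problems.append(f"{name}: not defined in data dictionary")
            continue
        if entry["type"] == "integer" and not value.lstrip("-").isdigit():
            problems.append(f"{name}: expected integer, got {value!r}")
        allowed = entry["allowed_values"]
        if allowed and value not in allowed.split("|"):
            problems.append(f"{name}: {value!r} not in allowed values")
    return problems
```

A record such as `{"age": "thirty", "pain_score": "11", "bmi": "22"}` produces three problems: a non-integer age, an out-of-range pain score, and a variable the dictionary does not define.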

Data Ecosystem


The programming languages, packages, algorithms, cloud-computing services, and general infrastructure an organization uses to collect, store, analyze, and leverage data (5 Key Elements of a Data Ecosystem, 2021). The NIH HEAL Initiative elements function as a data ecosystem, composed of different components like the HEAL Data Platform, HEAL Stewards, HEAL Connections, and other project stakeholders.

Data Governance


The activities necessary to manage data integrity (Inmon et al., 2019). The HEAL Data Ecosystem allows for distributed governance by having investigators submit data to their HEAL-compliant repository of choice. HEAL leadership and NIH have governance practices in place to support HEAL data quality.

Data Harmonization


All efforts to combine data from different sources and provide users with a comparable view of data from different studies.
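A toy illustration of harmonization, assuming two hypothetical studies that record the same concept (participant age) under different field names and units. Each study gets its own mapping function into a shared, hypothetical target layout, so the combined records can be compared directly.

```python
# Hypothetical records from two studies that describe the same concept differently.
study_a = [{"subj": "A1", "age_years": 40}, {"subj": "A2", "age_years": 52}]
study_b = [{"id": "B1", "age_months": 366}, {"id": "B2", "age_months": 612}]


def harmonize_a(row: dict) -> dict:
    # Study A already records age in years; only the field names change.
    return {"study": "A", "participant": row["subj"], "age_years": row["age_years"]}


def harmonize_b(row: dict) -> dict:
    # Study B records age in months; convert to completed years.
    return {"study": "B", "participant": row["id"], "age_years": row["age_months"] // 12}


# A single, comparable view across both studies.
combined = [harmonize_a(r) for r in study_a] + [harmonize_b(r) for r in study_b]
```

Real harmonization also has to reconcile differing instruments, response scales, and collection time points; the per-source mapping function is the part this sketch shows.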

Data Management


The development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their life cycles (DAMA-DMBOK, 2nd edition).

Data Management and Sharing Plan (DMSP)


A written document that describes the data you expect to acquire or generate during the course of a research project, how you will manage, describe, analyze, and store those data, and what mechanisms you will use at the end of your project to share and preserve your data (HEAL DMSP Guidance). The 2023 NIH Data Management and Sharing policy requires prospective HEAL studies to submit a DMSP as part of their award application.

Data Model


A set of systematically organized data elements and relationships that describes some entity or process.

Data Permissions


The specific rights and privileges granted to individuals or entities regarding the access, use, and manipulation of data. Data permissions ensure sensitive or confidential information is only available to authorized individuals, while also allowing data to be shared and utilized by those who need it.

Data Portal


The point of access to request data from a network. The HEAL Data Platform serves as the NIH HEAL Initiative data portal, where users can explore HEAL datasets and conduct in-platform analysis.

Data Standard


The guidelines (for format, meaning, and beyond) by which data are described and recorded to enable the sharing, exchange, combining, and understanding of data (U.S. Geological Survey). Metadata and community standards resources are available on the HEAL Stewards website.

Data Transfer and Use Agreement (DTUA)


A legally binding document setting out the terms for transferring nonpublic research data between organizations (Montclair State University).

Digital Imaging and Communications in Medicine (DICOM)


A standard format for viewing, storing and transmitting medical images and related data (DICOM).


Source: DICOM

Digital Object Identifier (DOI)


A string of numbers, letters, and symbols used to uniquely identify an article or document and to provide it with a permanent web address (URL). DOIs provide a standard mechanism for retrieving metadata about the object and, generally, a means to access the data object itself. Some HEAL-compliant repositories issue a DOI for submitted research datasets or objects.
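A DOI name has a prefix beginning with "10.", a slash, and a suffix chosen by the registrant, and it can be resolved through the https://doi.org/ proxy. The sketch below splits a DOI into those parts and builds a resolver URL; it does not contact any registry, and `10.1000/182` is used only as an example string.

```python
from urllib.parse import quote


def parse_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into prefix and suffix (syntax check only, no registry lookup)."""
    doi = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the doi.org URL that redirects to the object's current location."""
    prefix, suffix = parse_doi(doi)
    # Percent-encode characters that are unsafe in URLs; keep any "/" inside the suffix.
    return "https://doi.org/" + prefix + "/" + quote(suffix, safe="/")
```

Because the resolver redirects to wherever the object currently lives, the DOI stays stable even if the repository changes its own URLs.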

Distributed Architecture


A form of data architecture that can be applied across the scope of a network, ranging from distributed governance to distributed database infrastructure, in which a collection of independent computing elements appears to its users as a single coherent system (van Steen and Tanenbaum, 2016, p. 968). The HEAL Data Ecosystem functions as a distributed system: HEAL investigators submit metadata to the HEAL Data Platform and submit both data and metadata to HEAL-compliant repositories for storage and archiving.

Dug


Dug is an open source semantic search engine developed by RENCI and RTI International. It is the underlying technology that powers HEAL Semantic Search - a tool that allows users to search studies, datasets and variables to identify novel relationships.

Electronic Health Record (EHR)


An electronic version of a patient's medical history, maintained by the provider over time; includes key administrative and clinical data relevant to that person's care under a particular provider, including demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data, and radiology reports (Centers for Medicare and Medicaid Services).

Extract-Transform-Load (ETL)


The process for data migration from some source to a final destination (for example, taking data out of structured text files and entering that data into a database).
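The three ETL steps can be sketched with Python's standard library, assuming a hypothetical CSV export in which missing values are written as `NA` and an in-memory SQLite database as the destination. The column names and table name are illustrative only.

```python
import csv
import io
import sqlite3

# Extract source: a hypothetical structured text export from a data capture tool.
RAW = """participant,visit_date,pain_score
P01,2023-01-05, 7
P02,2023-01-06,NA
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read the rows out of the structured text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple]:
    """Transform: trim whitespace, convert types, and turn "NA" into a real missing value."""
    out = []
    for r in rows:
        score = r["pain_score"].strip()
        out.append((r["participant"], r["visit_date"], None if score == "NA" else int(score)))
    return out


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the destination database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (participant TEXT, visit_date TEXT, pain_score INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
```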

Fast Healthcare Interoperability Resources (FHIR)


A modern standards framework to enable interoperability of healthcare data, developed and maintained by Health Level Seven (HL7), for exchanging biomedical information, including high-level data models (“resources” such as “Patient”, “AdverseEvent”, and “Observation”) that map to data in the EHR.


Source: HL7 FHIR
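FHIR resources are commonly exchanged as JSON. The sketch below reads a minimal "Patient" resource; the field names (`resourceType`, `id`, `gender`, `birthDate`, `name` with `family` and `given`) follow the FHIR R4 Patient resource, but every value is made up, and `display_name` is a hypothetical helper, not part of any FHIR library.

```python
import json

# A minimal FHIR R4 "Patient" resource with made-up values.
PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example-001",
  "gender": "female",
  "birthDate": "1980-04-12",
  "name": [{"family": "Doe", "given": ["Jane"]}]
}
"""

# Every FHIR resource declares its type in the "resourceType" field.
resource = json.loads(PATIENT_JSON)


def display_name(patient: dict) -> str:
    """Join the first recorded given and family names (ignores the name's 'use' field)."""
    name = patient["name"][0]
    return " ".join(name.get("given", []) + [name.get("family", "")]).strip()
```

In practice such resources would come from a FHIR server's REST API rather than a string literal; the parsing step is the same.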

FAIR Guiding Principles


The Findability, Accessibility, Interoperability, and Reusability of digital assets, with an emphasis on data being both human-readable and machine-actionable (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) (Go FAIR). The NIH HEAL Initiative’s compliance requirements for data are centered around the FAIR Guiding Principles.


Source: Go FAIR

FAIR Maturity


The degree to which a digital asset adheres to each of the FAIR Guiding Principles.

Federated Architecture


A form of data architecture where the constituent databases are interconnected via a (geographically decentralized) computer network, with the constituent database systems remaining autonomous (McLeod & Heimbigner, 1980).

Federated Governance


A form of data governance where the schema of each component is defined and controlled by a component data administrator (McLeod & Heimbigner, 1980).

FHIR Argonaut Project


A private sector initiative to accelerate the use and adoption of FHIR, sponsored by prominent EHR vendors, Apple, and others.

FHIRCap


Software for converting data in a REDCap database into the FHIR format.

Globally Unique Identifier (GUID)


A way to uniquely and permanently identify any digital object (e.g. data set, file, application), potentially assigning metadata to allow GUIDs and their objects to be searched and discovered in compliance with FAIR principles. Some HEAL-compliant repositories issue a GUID for research objects submitted to them.
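One generic way to mint such identifiers is a random version-4 UUID, which Python's standard `uuid` module provides. The `dg.EXAMPLE` namespace prefix below is hypothetical and only shows how a prefix can be combined with a UUID; real repositories define their own identifier formats and registration rules.

```python
import uuid


def mint_guid(prefix: str = "dg.EXAMPLE") -> str:
    """Combine a hypothetical namespace prefix with a random version-4 UUID."""
    return f"{prefix}/{uuid.uuid4()}"


def is_valid_guid(guid: str, prefix: str = "dg.EXAMPLE") -> bool:
    """Check the prefix and that the remainder parses as a version-4 UUID."""
    head, sep, tail = guid.partition("/")
    if head != prefix or not sep:
        return False
    try:
        return uuid.UUID(tail).version == 4
    except ValueError:
        return False
```

Random UUIDs make collisions vanishingly unlikely without a central counter; permanence and searchability still depend on the issuing repository registering the identifier with its metadata.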

HEAL Collective Board


A formal group of representative HEAL Community members working together as subject matter experts, opinion leaders, and arbitrators to guide the overall strategy and direction of the HEAL Data Ecosystem.

HEAL Collective Board Coordination Group


Representatives of the HEAL Data Ecosystem Program Team, the HEAL Stewards, HEAL Connections, and the HEAL Data Platform who work together as fair brokers, facilitating consensus among and providing support to the Collective Board.

HEAL Community


A dynamic and diverse group of people, organizations, and communities who participate in, or will benefit from, HEAL research to address the opioid crisis.

HEAL Data Asset Inventory (DAI)


A record describing HEAL award-generated data assets, maintained by the HEAL program, and providing insights into metadata, data assets and data types, which can also include information about volume, storage, and sharing practices.

HEAL Data Community


Investigators, data generators and processors, HEAL Stewards, HEAL Data Platform members, and those who access the HEAL Data Platform, as well as anyone who works with HEAL data and shares the common goal of making that data FAIR to enable faster science and improved patient outcomes.

Data Coordinating Centers (DCCs)


Data coordinating centers serve as central hubs for critical functions of clinical and survey research, managing details such as study design, data collection, verification, and storage, and supporting administrative requirements (RTI).


Source: RTI

HEAL Data Ecosystem


The HEAL Data Ecosystem is part of the NIH HEAL Initiative®, an NIH-wide effort to speed scientific solutions to stem the evolving national opioid public health crisis. The goal of the HEAL Data Ecosystem is to accelerate sharing HEAL-generated data and results among the broad community of researchers, health care providers, community leaders, policy makers, and other HEAL stakeholders who can benefit from learning the results of initiative research. The HEAL Data Ecosystem connects the HEAL community, enabling HEAL data to be searched, analyzed, and used to make new discoveries. By empowering researchers to make their HEAL-generated data FAIR (findable, accessible, interoperable, and reusable), the HEAL Data Ecosystem promotes data sharing (NIH About the HEAL Data Ecosystem).

HEAL Data Ecosystem Program Team (HEAL Program Team)


A liaison between NIH HEAL Leadership and the Collective Board that conveys NIH HEAL Initiative priorities and timelines and informs the Collective Board about relevant events and initiatives. The Program Team also works with the NIH program staff leading the nine targeted program areas, acting as a nominating committee to select Collective Board members.

HEAL Data Platform


A secure data access and computing environment that provides a searchable web interface to discover and analyze HEAL results and data. It will be used by the HEAL Community and is being developed by two expert data resource teams: one from the University of Chicago and the other from the Renaissance Computing Institute at the University of North Carolina at Chapel Hill/RTI International (RENCI/RTI).

HEAL External Ethics Expert (Panel)


An individual (or group of individuals) selected to advise the Collective Board about complex ethical challenges surrounding HEAL data; in consultation with the Coordination Group, the Ethics Expert may form a small panel of external ethicists to advise on specific matters as needed.

HEAL Investigators/HEAL Investigator Cohort


The principal investigators funded by the NIH HEAL Initiative.

HEAL Program Officers (POs)


NIH staff scientists who administer HEAL grants, assigned by the Administering Institute or Center.

HEAL Public Access and Data Sharing Policy


An NIH policy that seeks to create an infrastructure addressing the need for researchers, clinicians, and patients to collaborate on sharing their collective data and knowledge about opioid misuse and pain to provide scientific solutions to the opioid crisis (NIH HEAL Public Access and Data Sharing Policy).

HEAL Research Programs


The NIH HEAL Initiative is organized into seven research focus areas. Within those focus areas, most NIH Institutes and Centers are leading more than 30 research programs to find scientific solutions to the opioid crisis (NIH HEAL Initiative Research).

HEAL Data Stewardship Group (HEAL Stewards)


An NIH-funded organization that works with research teams throughout the HEAL Initiative to provide solutions for managing and coordinating diverse HEAL data, with guidance in implementing FAIR data management and sharing practices for the diverse datasets generated by HEAL-funded projects (HEAL Data Sharing, HEAL Data Stewardship Group); can be abbreviated as HEAL Stewards (S is always capitalized).

HEAL Semantic Search


A semantic search tool that incorporates knowledge regarding biomedical concepts, synonyms and relationships to enable users to explore the HEAL research landscape; discover related biomedical concepts, studies, and variables; and identify datasets and variables for further analysis (HEAL Semantic Search).

HEAL Studies


HEAL-funded research studies. The number of HEAL awards is significantly higher than the number of studies since the same study may receive multiple awards (renewals or extensions).

Health Level Seven (HL7)


A set of healthcare standards, including FHIR and CDA, developed by HL7 International.

Implementation Plan (IP)


Outlines how the various elements from the planning phase of a project will come together to form a concrete, operationalized platform. The HEAL project has an implementation plan that serves to guide the planning of work and document the organization of the ecosystem.

Intellectual Property (IP)


A ‘creation of the mind’ such as inventions, literary or artistic works, designs or symbols, and names or images. In biomedical research this includes patents, trademarks, and copyrights that are protected to safeguard inventions, processes, materials, and ideas, particularly when the outcome has potential commercial value (NIH Tribal Health Research Office Fact Sheet & World Intellectual Property Organization (WIPO)).

International Classification of Diseases (ICD)


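A set of ID codes for diseases and medical conditions (for example, the ICD-10-CM code H01.111 refers to “allergic dermatitis of the right upper eyelid”). ICD-10 and its U.S. clinical modification (ICD-10-CM) remain in wide use, although the World Health Organization’s ICD-11 took effect in 2022.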

Interoperability


The ability of data and systems to work together. Interoperability is achieved when investigators use machine-actionable ontologies and controlled vocabularies to support connectivity to other similarly open data systems, greatly increasing the utility of individual data assets. The HEAL Data Ecosystem is designed to facilitate interoperability by suggesting standards that enable HEAL datasets to work together.

JavaScript Object Notation (JSON)


A file format commonly used for transmitting data, serving a similar purpose to XML. Data is stored as attributes (names) and their associated values (e.g., “phoneNumber”: “240-555-5555”, “age”: 45, “receivingTreatment”: true).
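The attribute/value pairs in the example can be parsed and re-serialized with Python's standard `json` module. The sketch wraps them in a JSON object (curly braces) and shows that JSON numbers and booleans keep their types, unlike plain text.

```python
import json

# The attribute/value pairs from the example, wrapped in a JSON object.
text = '{"phoneNumber": "240-555-5555", "age": 45, "receivingTreatment": true}'

record = json.loads(text)                 # JSON object -> Python dict
print(type(record["age"]).__name__)       # int: JSON numbers stay numeric
print(record["receivingTreatment"])       # True: JSON true becomes Python True

# Serialize back to JSON text; sort_keys makes the key order deterministic.
print(json.dumps(record, sort_keys=True))
```

The last line prints `{"age": 45, "phoneNumber": "240-555-5555", "receivingTreatment": true}`.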

Knowledge Graph


A data model that integrates diverse and heterogeneous data across multiple domains in a network structure, with nodes representing entity types such as disease, gene, and chemical exposure, and edges providing predicates that describe the relationship between entity types such as ‘causes’, ‘is associated with’, and ‘is expressed in.’ HEAL Semantic Search utilizes knowledge graphs to combine otherwise disparate data types to answer new questions.
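A knowledge graph can be sketched as a list of (subject, predicate, object) triples indexed as an adjacency list. The entities and relationships below are illustrative placeholders, not curated findings, and this is not how HEAL Semantic Search stores its graphs; it only shows how following edges can connect facts that came from different sources.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; illustrative only.
TRIPLES = [
    ("opioid use disorder", "is associated with", "chronic pain"),
    ("chronic pain", "is associated with", "sleep disturbance"),
    ("gene X", "is expressed in", "dorsal root ganglion"),
]

# Index edges in both directions so neighbors can be found from either end.
graph = defaultdict(list)
for subj, pred, obj in TRIPLES:
    graph[subj].append((pred, obj))
    graph[obj].append((f"inverse of {pred}", subj))


def neighbors(node: str) -> list[tuple[str, str]]:
    """Return (predicate, node) pairs directly connected to a node."""
    return graph.get(node, [])


def two_hop(node: str) -> set[str]:
    """Nodes reachable in exactly two edges, excluding the starting node."""
    return {n2 for _, n1 in neighbors(node) for _, n2 in neighbors(n1) if n2 != node}
```

Here `two_hop("opioid use disorder")` returns `{"sleep disturbance"}`, a link that no single triple states directly.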

Logical Observation Identifiers Names and Codes (LOINC)


A collection of names and ID codes for clinical and laboratory tests (for example, the code 67293-1 refers to “Other MRI scan”).

Machine Readable


Describes data presented in a structured format that machines can process (Wikipedia).


Source: Wikipedia

Metadata


Data that describes other data and is often used for search purposes, such as data describing a scientific data set (for example: author, organization, journal, date of creation, and beyond). There are two kinds of metadata that are important to the HEAL Data Ecosystem: study-level metadata (see Study-Level Metadata) and variable-level metadata (see Variable-Level Metadata). Together, they describe the study and the data the study captures.

Metadata Schema


A structured framework or set of rules that defines how data or information should be organized, described, and recorded. Metadata, in this context, refers to data about data. It provides information about the characteristics, attributes, and properties of the data, making it easier to manage, search, retrieve, and understand. The HEAL study-level metadata schema, for example, describes the properties of a study. Examples of the elements that make up the metadata schema are study type and data availability.

Meta-vocabulary


A joined superset of vocabularies created by a set of mappings between individual vocabularies. For example, the UMLS meta-vocabulary is a set of mappings between roughly 200 different vocabularies.
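The core idea of a meta-vocabulary, a shared concept ID that links equivalent codes from several source vocabularies, can be sketched as a crosswalk table. Every vocabulary name, concept ID, and code below is made up; real meta-vocabularies such as UMLS publish their own identifiers and mapping files.

```python
from typing import Optional

# Hypothetical crosswalk: each shared concept ID links codes from several source vocabularies.
CONCEPTS = {
    "CONCEPT-0001": {"VOCAB_A": "A-100", "VOCAB_B": "B-7", "VOCAB_C": "C.42"},
    "CONCEPT-0002": {"VOCAB_A": "A-200", "VOCAB_C": "C.43"},
}

# Reverse index: (vocabulary, code) -> shared concept ID.
CODE_TO_CONCEPT = {
    (vocab, code): concept
    for concept, codes in CONCEPTS.items()
    for vocab, code in codes.items()
}


def translate(vocab: str, code: str, target: str) -> Optional[str]:
    """Map a code in one vocabulary to the equivalent code in another via the shared concept."""
    concept = CODE_TO_CONCEPT.get((vocab, code))
    if concept is None:
        return None
    return CONCEPTS[concept].get(target)  # None if the target vocabulary has no equivalent
```

Routing every translation through the shared concept means each vocabulary needs only one mapping to the hub rather than a separate mapping to every other vocabulary.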

Neuroimaging Informatics Technology Initiative (NIfTI)


An imaging format common in neuroscience research.

NIH HEAL Executive Committee


A committee that reviews program and funding plans for NIH HEAL Initiative research, ensures coordination among the multiple NIH HEAL Initiative programs, and considers the input from the external expert advisory groups; made up of NIH IC Directors supporting the initiative, co-chaired by the NIH Director and the Directors of the National Institute on Drug Abuse and the National Institute of Neurological Disorders and Stroke (NIH HEAL: About Page).

NIH HEAL Leadership


The NIH HEAL Initiative is jointly managed by NIH’s National Institute on Drug Abuse (NIDA) and National Institute of Neurological Disorders and Stroke (NINDS) in close collaboration with other NIH Institutes and Centers (ICs) and Offices (NIH HEAL: Leadership Page).

Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)


A meta-vocabulary of medical terms with mappings for SNOMED, ICD-9, ICD-10, and LOINC; also a data model that describes medical observations, for example from an EHR (OMOP CDM).


Source: OMOP CDM

Observational Studies


A study of participant treatments and outcomes over a period of time with no intervention made to affect the outcomes (important for understanding the differences between populations experiencing similar and differing environmental or community-level stressors).

Ontology


A collection of terms, their meanings, and any relationships between those terms which may be represented as lists or as tree structures (e.g., the Plant Ontology); sometimes used interchangeably with a vocabulary. Utilizing domain-relevant ontologies in data collection makes data more findable and interoperable within the HEAL Data Ecosystem.

Preclinical Research


For HEAL, a study in human subjects and animal models on the mechanisms of pain and opioid use disorder (OUD), frequently focused on drug target identification and validation, the study of novel biological mechanisms, and discovery and validation of potential biomarkers for pain and OUD.

Provenance


Information about a digital object’s origins and history, including licensing information articulating how assets can be shared; facts about the object’s creation, such as the instruments used to take measurements, the author of documents, and times of modification; and the biomedical semantic content of the data, such as the chemicals, genes, diseases, phenotypes, and biological processes involved. Machine-readable provenance is essential for the semantic linking of data.

Query


A request for data or information from a database table or combination of tables.
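A query over a combination of tables can be illustrated with Python's built-in `sqlite3` module. The two tables, their columns, and their rows are hypothetical; the query joins them to list survey datasets alongside the title of the study that produced them.

```python
import sqlite3

# A throwaway in-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE studies (study_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE datasets (dataset_id TEXT, study_id TEXT, data_type TEXT);
INSERT INTO studies VALUES ('S1', 'Pain study'), ('S2', 'OUD study');
INSERT INTO datasets VALUES ('D1', 'S1', 'survey'), ('D2', 'S1', 'imaging'), ('D3', 'S2', 'survey');
""")

# Combine the two tables and filter with a parameter (the "?" placeholder).
rows = conn.execute(
    """
    SELECT d.dataset_id, s.title
    FROM datasets AS d
    JOIN studies AS s ON s.study_id = d.study_id
    WHERE d.data_type = ?
    ORDER BY d.dataset_id
    """,
    ("survey",),
).fetchall()
```

Passing the filter value as a parameter instead of pasting it into the SQL string avoids SQL injection.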

Repository


A centralized storage location (generally digital) that collects, maintains, aggregates, and organizes data for future mining, reporting, and analysis. The HEAL Stewards curated a list of HEAL-compliant data repositories for researchers to consult when deciding where to deposit their study data.

Reproducibility Crisis


The widely reported, large-scale phenomenon of replication attempts failing to reproduce the results of published research studies. Recent literature has noted that more open science could help counter this trend (Stanford Encyclopedia of Philosophy).

Risks


The potential for harm. Researchers have an ethical obligation to do no harm. It is important to consider multiple factors when conducting research with animals or humans. It is also important to consider how the research data will be protected from risks, such as data breaches (University of Virginia).

Search Engine Indexing (Indexing)


Collecting, parsing, and storing data to facilitate fast and accurate information retrieval; may also be called web indexing or just indexing (Wikipedia).


Source: Wikipedia
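The collect, parse, and store steps can be shown with an inverted index, a common index structure that maps each term to the documents containing it. The documents are hypothetical, and this sketch omits ranking, stemming, and stop words that production search engines add.

```python
import re
from collections import defaultdict

# Hypothetical documents to index, keyed by ID.
DOCS = {
    "study-1": "Chronic low back pain in adults",
    "study-2": "Opioid use disorder treatment retention",
    "study-3": "Pain and sleep in adults with opioid use disorder",
}


def tokenize(text: str) -> list[str]:
    """Parse: lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Store: map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Retrieve: documents containing every query term, without rescanning any text."""
    sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*sets) if sets else set()
```

With `idx = build_index(DOCS)`, the query `search(idx, "opioid adults")` returns `{"study-3"}` by intersecting two precomputed sets, which is what makes indexed retrieval fast.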

Search Engine Indexing Design


The process of creating the structure for search engine indexing by incorporating interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science (Wikipedia).


Source: Wikipedia

Semantic Search


A search that returns relevant query results based on the meaning of the query rather than the lexical terms used (HEAL Metadata Overview). The HEAL Data Ecosystem uses semantic search to allow users to search across disparate HEAL datasets, allowing for opportunities to uncover relationships between concepts and variables within the landscape of pain and addiction research, and enable the development of novel insights into pain and opioid use disorder (OUD).
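A toy contrast between lexical and semantic matching: query terms and variable descriptions are both mapped to shared concepts through a small, hand-written synonym table, and results are matched on concepts rather than literal words. The synonym table and variables are hypothetical; tools such as Dug draw on ontologies and knowledge graphs rather than a fixed dictionary like this.

```python
import re

# Hypothetical synonym table mapping surface terms to a shared concept label.
SYNONYMS = {
    "oud": "opioid use disorder",
    "opioid addiction": "opioid use disorder",
    "opioid use disorder": "opioid use disorder",
    "analgesic": "pain reliever",
    "painkiller": "pain reliever",
    "pain reliever": "pain reliever",
}

# Hypothetical variable descriptions to search.
VARIABLES = {
    "var_dx": "Diagnosis of opioid addiction at baseline",
    "var_med": "Current painkiller use",
    "var_sleep": "Hours of sleep per night",
}


def concepts_in(text: str) -> set[str]:
    """Find known synonyms (whole words only) in the text and return their concepts."""
    lowered = text.lower()
    return {
        concept
        for term, concept in SYNONYMS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def semantic_search(query: str) -> list[str]:
    """Return variables whose descriptions share at least one concept with the query."""
    wanted = concepts_in(query)
    return sorted(v for v, desc in VARIABLES.items() if concepts_in(desc) & wanted)
```

A query for "analgesic" finds `var_med` even though that word never appears in its description; a term missing from the synonym table (for example, "sleep") finds nothing, which is the gap that richer concept resources are meant to close.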

SMART


A platform (with variable user categories for healthcare workers, patients, pharmacists, or researchers) for creating software that can integrate with EHR and other parts of the health IT system, including a diverse array of applications such as allowing users to connect health records from multiple sites, monitor patients, track medications, and assess cardiovascular risks.

Standard


A uniform way of conducting technical processes or structuring components, with some (e.g., FHIR, SDTM) created by standards organizations (e.g., HL7, CDISC) and others ratified by widely known standards-issuing bodies such as ISO and ANSI.

Study-Level Metadata (SLMD)


The highest level of described metadata, which includes information like a study’s name, description, unique identifier, and investigators. This metadata enables searches on the HEAL Data Platform for data collections at the study level.

System Performance


The amount of work accomplished by a computer system.

Systematized Nomenclature of Medicine (SNOMED)


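A comprehensive clinical terminology (now maintained as SNOMED CT) whose codes each refer to a specific clinical concept, such as a finding, disorder, procedure, or observable entity. SNOMED CT concept identifiers are purely numeric; hyphenated codes such as 57021-8 (a complete blood count panel) come from LOINC (see Logical Observation Identifiers Names and Codes).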

Therapeutics Development


A research study working towards the validation of new drugs, treatments, or devices on the path to approval for human use, which may focus on human subjects and animal models.

Topology


Characteristics of the interconnection between a network and its processing nodes (Schnorr et al., 2013), or the study of what the connected components of a space are; more generally, the study of connectivity in information (Carlsson, 2009).

Unified Medical Language System (UMLS)


A meta-vocabulary that maps or cross-matches terms from different biomedical vocabularies, including MeSH, SNOMED, and LOINC.

Upper Ontology


A top-level or foundational ontology that consists of general terms used across a variety of domains to support semantic interoperability across domain-specific ontologies.

Usability


The degree of ease with which products such as software and Web applications can be used to achieve required goals effectively and efficiently.

Variable-Level Metadata (VLMD)


Information about each dataset variable, including the variable name, description (or label), type, format, terminology, source, and beyond; helps ecosystem users understand each dataset variable and enables semantic search; may be recorded, in part or in full, in file types such as data dictionaries, codebooks, and READMEs.

Virtual Machine


An isolated computing environment with its own operating system.

Vocabulary


A collection of terms and their meaning, sometimes used interchangeably with ontology. Utilizing domain-relevant vocabularies in data collection makes data more findable and interoperable within the HEAL Data Ecosystem.

XML (eXtensible Markup Language)


A text-based, hierarchical format for representing data. XML-based file formats, like CDISC’s ODM, are useful for transmitting data, in a fashion similar to the JSON file format. While not explicitly XML-based, other standards like HL7’s FHIR are capable of being represented in XML.
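For comparison with the JSON entry, the same attribute/value example can be written as XML and read with Python's standard `xml.etree.ElementTree` module. The `participant` root element and child element names are hypothetical. Unlike JSON, XML element text is always a string, so numbers and booleans must be converted explicitly.

```python
import json
import xml.etree.ElementTree as ET

# The attribute/value example from the JSON entry, expressed as hypothetical XML.
XML_TEXT = """
<participant>
  <phoneNumber>240-555-5555</phoneNumber>
  <age>45</age>
  <receivingTreatment>true</receivingTreatment>
</participant>
"""

root = ET.fromstring(XML_TEXT.strip())

# Element text is always a string, so restore the intended types by hand.
record = {
    "phoneNumber": root.findtext("phoneNumber"),
    "age": int(root.findtext("age")),
    "receivingTreatment": root.findtext("receivingTreatment") == "true",
}

# The same data re-expressed as JSON.
as_json = json.dumps(record)
```

Schema languages such as XML Schema can declare element types so that tools perform these conversions consistently rather than by hand.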