These examples are from HEAL-funded studies that have submitted data to a HEAL-compliant repository. The datasets are publicly accessible, and the Principal Investigators have given the HEAL Stewards permission to link to their data packages. Some data types below do not have data package examples available yet. Examples will be added as they become available. In the meantime, general data sharing guidance materials are provided to help investigators prepare their data packages. While reviewing the examples below, look for the symbols that identify which core (✅) and additional (✔️) components each data package includes.
Animal Behavioral, Imaging, and Observational Data
This data package has a clear folder structure and file naming convention that makes it easy to find the data files and understand their relationships. The data package uses repository-specific (SPARC) requirements, documenting the study protocol using Protocols.io.
Components in this Data Package:
✅ Data files
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Code used to conduct analyses
✔️ Study Protocol
This dataset contains EEG and EMG recordings conducted in non-human primates. In addition to all the core components, this data package includes study methodology and links to the software used in analysis.
Components in this Data Package:
✅ Data files
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Code used to conduct analyses
✔️ Publication Citations
✔️ Context or explanatory documents
The Brain Imaging Data Structure (BIDS) community is currently developing BEP 032: Microelectrode electrophysiology, a proposed standard for organizing, describing, and sharing animal electrophysiology data. It aims to provide clear structure and documentation practices that promote transparency, interoperability, and long-term reuse.
Animal Audiovisual Data
Best Practices:
For non-human subjects data, such as video recordings of mouse behavior, data repository options may be limited by large file sizes. Audiovisual recordings involving animals may raise ethical, regulatory, or programmatic considerations that extend beyond standard identifiability concerns. Research teams should carefully assess these factors when determining what to share and how to document the materials. Researchers may choose to share processed data from audiovisual recordings in addition to, or instead of, the raw recordings.
Sharing data behind access controls is an appropriate safeguard to manage sensitive data concerns.
Biomedical Imaging Data
Human EEGs
Best Practices:
EEG data sharing should follow community standards, such as the Brain Imaging Data Structure (BIDS), which aligns files and metadata with FAIR principles to support interoperability and reproducibility. Investigators should convert proprietary formats into open, widely supported ones, such as the European Data Format (EDF), and ensure all accompanying metadata are complete and machine-readable. Metadata files (in .json and .tsv formats) should describe hardware details, such as sampling rate and electrode placement, along with task design, event markers, and de-identified participant information.
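The metadata files described above can be sketched as follows. This is a minimal illustration, not a validated BIDS dataset: the field names are assumed to match the BIDS EEG specification and should be confirmed against its current version, and every value, folder name, and the `age_group` column are hypothetical placeholders.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_eeg_metadata(root: Path) -> tuple[Path, Path]:
    """Write a minimal EEG sidecar (.json) and a participant table (.tsv)."""
    eeg_dir = root / "sub-01" / "eeg"
    eeg_dir.mkdir(parents=True, exist_ok=True)

    # Hardware and task details. All values are placeholders (assumptions).
    sidecar = {
        "TaskName": "rest",
        "SamplingFrequency": 500,
        "EEGReference": "Cz",
        "PowerLineFrequency": 60,
        "Manufacturer": "ExampleVendor",
    }
    sidecar_path = eeg_dir / "sub-01_task-rest_eeg.json"
    sidecar_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")

    # De-identified participant information: coded IDs and coarse categories only.
    participants_path = root / "participants.tsv"
    with participants_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["participant_id", "age_group"])
        writer.writerow(["sub-01", "30-39"])

    return sidecar_path, participants_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sidecar_file, participants_file = write_eeg_metadata(Path(tmp))
        print(sidecar_file.name, participants_file.name)
```

Keeping hardware details in a .json sidecar and participant details in a .tsv table makes both machine-readable without opening the recording itself.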
Neuroimaging data are typically shared in standardized formats such as DICOM (.dcm) and NIfTI (.nii or .nii.gz). DICOM files include structured headers containing image metadata, while NIfTI files store image data with an optional .json metadata file, often generated during conversion (e.g., using dcm2niix). For sharing and curation, investigators should de-identify data, removing facial features and any “burned-in” text containing protected health information (PHI). When possible, organize files according to the Brain Imaging Data Structure (BIDS) to ensure consistency and interoperability. Common tools for viewing or validating files include ImageJ, MRIcron, and AFNI.
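Because the .json metadata file is optional for NIfTI, a quick check before sharing can flag images that lack one. The sketch below assumes each sidecar sits next to its image and shares its base name. This is stricter than BIDS, which also allows metadata to be inherited from higher-level files. The function names and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def nifti_stem(path: Path) -> str:
    """Return the file name without its .nii or .nii.gz extension."""
    name = path.name
    for suffix in (".nii.gz", ".nii"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def missing_sidecars(root: Path) -> list[Path]:
    """List NIfTI files under root that have no same-named .json file beside them."""
    images = sorted(root.rglob("*.nii")) + sorted(root.rglob("*.nii.gz"))
    return [
        image
        for image in images
        if not (image.parent / f"{nifti_stem(image)}.json").exists()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub-01_T1w.nii.gz").touch()
        (root / "sub-01_T1w.json").touch()
        (root / "sub-01_bold.nii").touch()  # no sidecar for this image
        print([p.name for p in missing_sidecars(root)])  # ['sub-01_bold.nii']
```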
Body MRI data cover multiple anatomical regions, including abdominal, cardiovascular, and musculoskeletal areas, and should be carefully documented to capture acquisition parameters, such as body region, coil type, contrast timing, and patient positioning. DICOM is the standard format for preserving image metadata, but when converting to shareable formats like NIfTI, investigators should retain all relevant sequence parameter and anatomical coverage information. While community standards for body MRI data organization are still developing, investigators should apply consistent folder structures and include sidecar metadata files (e.g., .json and .tsv) describing imaging series, de-identification methods, and protocol details.
MRI, fMRI, and body MRI Resources:
Neuroimaging DICOM and NIfTI Primer (Data Curation Network) is a comprehensive primer on DICOM and NIfTI neuroimaging data, developed jointly by the Data Curation Network, the National Institute of Mental Health, and the National Institute of Neurological Disorders and Stroke.
The COBIDAS Report (PDF), developed by the Organization for Human Brain Mapping (OHBM), outlines best practices for fMRI data analysis and sharing. While not a required standard, these community recommendations are widely recognized for promoting neuroimaging research transparency, reliability, and reproducibility.
For body MRI data, the American College of Radiology’s best practices for data sharing emphasize protecting patient anonymity, curating clinically relevant information to support reuse, and ensuring transparency, compliance, and responsible body MRI data sharing.
Microscopy (Cellular Imaging)
Best Practices:
Preparing microscopy data for long-term use and reproducibility involves converting proprietary image files into open formats such as OME-TIFF or the cloud-optimized OME-Zarr, which keep complex image data accessible and reusable. Each data package should include detailed metadata—both the technical acquisition information captured by the microscope and the broader experimental context summarized in a README file. Using clear, descriptive file names and a consistent folder structure to separate raw, standardized, and processed data helps others (and the original study team, in the future) navigate the package easily.
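One way to apply the folder and file naming advice above is sketched below. The three folder names come from the text. The naming pattern (sample, stain, magnification, running index) is a hypothetical convention chosen for illustration, not a HEAL or OME requirement.

```python
import tempfile
from pathlib import Path

# Processing stages named in the guidance above.
STAGES = ("raw", "standardized", "processed")


def create_layout(root: Path) -> list[Path]:
    """Create one subfolder per processing stage and return their paths."""
    folders = [root / stage for stage in STAGES]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def image_name(sample: str, stain: str, magnification: int, index: int) -> str:
    """Build a descriptive, sortable file name for an OME-TIFF image."""
    return f"{sample}_{stain}_{magnification}x_{index:03d}.ome.tiff"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_layout(Path(tmp)):
            print(folder.name)
    print(image_name("mouse01-drg", "dapi", 20, 3))
    # mouse01-drg_dapi_20x_003.ome.tiff
```

Zero-padding the index keeps files in acquisition order when sorted by name.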
Microscopy Resources:
Open Microscopy Environment includes free, open tools and metadata specifications to support microscopy data management.
Ultrasound data are often organized and shared in the Digital Imaging and Communications in Medicine (DICOM) format, which preserves standardized image data and acquisition details. For research applications, DICOM files can be converted to NIfTI to support analysis and visualization. Investigators should keep both the original manufacturer files and standardized exports. Metadata should include details about the imaging device, acquisition settings, and imaging mode, but all Protected Health Information (PHI) must be removed or de-identified. Organizing files in a BIDS-like structure with clear, descriptive names supports readability and automated processing. Any details not captured in the DICOM header should be added in accompanying .json or .tsv files. Documenting software, processing steps, and data provenance helps maintain transparency and reproducibility.
NIH-funded clinical trials must be registered on ClinicalTrials.gov, and the assigned NCT number should be included in all data packages. For human subjects studies focused on pain, investigators must incorporate the HEAL Pain Common Data Elements (CDEs) to ensure data collection consistency, enable cross-study comparisons, and support integration with other HEAL-funded research. To promote consistency, interoperability, and reuse, clinical trial data may be organized using standardized formats such as Clinical Data Interchange Standards Consortium (CDISC) standards, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), or another standard that fits the data type. Each data package should include de-identified participant-level data, a detailed data dictionary, study protocols, analysis code, and informed consent materials that describe data use and sharing. Maintaining traceability from raw data to analyzed results ensures transparency, reproducibility, and alignment with NIH HEAL data sharing requirements.
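A data dictionary is only useful if it covers every variable in the participant-level data. The sketch below shows one way to check that before submission. It assumes both files are CSV and that the dictionary lists variable names in a column called `variable`. That column name and all example values are hypothetical.

```python
import csv
import io
from typing import TextIO


def undocumented_columns(data_file: TextIO, dictionary_file: TextIO) -> list[str]:
    """Return data columns that have no entry in the data dictionary."""
    data_header = next(csv.reader(data_file))
    documented = {row["variable"] for row in csv.DictReader(dictionary_file)}
    return [column for column in data_header if column not in documented]


if __name__ == "__main__":
    # In-memory stand-ins for the real files; contents are invented.
    data = io.StringIO("participant_id,pain_score,visit_date\nP001,4,2024-01-15\n")
    dictionary = io.StringIO(
        "variable,description\n"
        "participant_id,Coded participant identifier\n"
        "pain_score,Self-reported pain intensity\n"
    )
    print(undocumented_columns(data, dictionary))  # ['visit_date']
```

With real files, pass handles opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` objects.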
Rethinking Clinical Trials: A Living Textbook is an expert-curated resource for pragmatic clinical trial design, conduct, and reporting, offering best practices, case studies, and tools to help researchers improve trial execution and data sharing.
This dataset is openly accessible (but repository access requires a name and email) and includes thorough documentation describing data preparation and de-identification. It also provides a data dictionary, null value file, and study-level metadata with the protocol. Together, these elements make it a clear, well-structured example of a tabular data package that aligns with HEAL data sharing expectations.
Components in the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Code used to transform raw data to analytic dataset
✔️ Code used to conduct analyses
✔️ Publication Citation(s)
✔️ Study Protocol
✔️ Blank data collection instruments
✔️ Context or explanatory documents
This dataset contains a well-structured experimental data package supporting transparency and reuse across multiple research domains. It combines diverse data types (biochemical assays, imaging, electrophysiology, and behavioral studies) with detailed documentation that links methods, results, and analysis workflows. By providing both raw and processed data under an open data license, it demonstrates best practices for sharing complex, multimodal research in alignment with HEAL and FAIR principles.
Components in the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Context or explanatory documents
This SPARC dataset provides an example of how long-read sequencing data can be packaged for transparency and reuse. The data include both raw and processed files, detailed documentation of experimental methods and sequencing workflows, and clear metadata describing instruments, file types, and analysis tools.
Components of the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Study Protocol
✔️ Context or explanatory documents
These connected data packages demonstrate short-read sequencing data sharing through linked GEO and SRA records. The GEO submission includes both raw and processed files with clear metadata describing experimental design and analysis methods, while the SRA record provides access to the underlying sequencing reads in an open, standardized format. Together, they illustrate how coordinated repository submissions can support transparency, reproducibility, and long-term reuse.
Components of the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Publication Citation(s)
GEO Submission Guidance provides step-by-step instructions for submitting functional genomics data to the Gene Expression Omnibus (GEO) repository, including file preparation, metadata, and repository-specific requirements.
GEO Templates offers downloadable spreadsheet templates to help researchers organize and format GEO submissions in a consistent, machine-readable structure.
SRA Submission Guidance explains how to prepare, validate, and submit sequencing data to the Sequence Read Archive (SRA), covering accepted file types, metadata, and submission tools.
Proteomic Data
Best Practices:
Share raw and processed data files, organized in a consistent folder structure, using open formats such as mzML or mzIdentML and accompanied by complete metadata, describing instruments, software, and analytical methods. Include version information, a descriptive README, and identifiers that link related files.
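A shared run identifier in the file name is one simple way to link related files. The sketch below groups files by that identifier so a raw file can be matched to its mzML and mzIdentML outputs. The folder names, the `run01` style identifiers, and the choice of the file stem as the linking key are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_run(filenames: list[str]) -> dict[str, list[str]]:
    """Group file paths by run identifier, taken here to be the file stem."""
    runs: dict[str, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        runs[PurePath(name).stem].append(name)
    return dict(runs)


if __name__ == "__main__":
    files = [
        "raw/run01.raw",
        "processed/run01.mzML",
        "processed/run01.mzid",
        "raw/run02.raw",
    ]
    for run, members in group_by_run(files).items():
        print(run, len(members))
```

A run with only one member, such as `run02` here, points to a raw file whose processed outputs are missing from the package.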
MassIVE, a HEAL-compliant repository, offers a dedicated platform to archive, browse, and re-analyze mass-spectrometry proteomics data, supporting community reuse and transparency through structured submission workflows.
Qualitative and Social Science Data
Audiovisual Data
Best Practices:
Each package should include metadata describing recording conditions, equipment, and any processing or editing steps, along with documentation noting consent status and collection context. When sharing audiovisual data, researchers should ensure consent aligns with NIH and HEAL data sharing expectations and should remove or obscure identifiable information, such as names, faces, or voices, to protect participant privacy.
Sharing data behind access controls is an appropriate safeguard to manage sensitive data concerns. Researchers may choose to share processed data derived from audiovisual recordings in addition to or rather than the raw recording files. For example, processed data may include redacted transcripts, coding schemas, or observational battery data.
A well-prepared qualitative data package should include de-identified transcripts or text files, documentation describing the study design and analytic approach, and supporting materials, such as codebooks and interview guides, that provide essential reuse context. These components help ensure qualitative data are clearly documented, ethically shared, and useful for future research. When investigators worry that transforming the data files does not sufficiently de-identify them, access controls offer an additional layer of protection against deductive disclosure risk.
Qualitative Data Resources:
Qualitative Data Curation Primer (Data Curation Network) provides practical guidance to prepare, document, and share qualitative research data in alignment with NIH and HEAL expectations for responsible, transparent, and reusable data.
Guide for Sharing Qualitative Data at ICPSR outlines best practices to prepare and document qualitative data, so it can be shared responsibly, understood by others, and reused in ways that align with HEAL’s transparency and data stewardship goals.
This dataset provides a well-documented example of questionnaire data, including multiple rounds of expert responses, clearly defined variables, and comprehensive supporting documentation. It illustrates how structured data can be shared in a trusted and access-controlled repository to promote discoverability, transparency, and reuse.
Components in the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Blank data collection instruments
✔️ Context or explanatory documents
This data package includes clear metadata, data dictionaries, and organized folders for raw data, processed outputs, and analysis code, making the workflow transparent and reproducible. By providing open access under an MIT license and including both documentation and code, it aligns with HEAL accessibility, transparency, and responsible data sharing principles.
Components in the Data Package:
✅ Data file(s)
✅ README or Summary file
✅ Variable-level Metadata documentation
✅ Repository-specific documentation
✔️ Code used to transform raw data to analytic dataset
✔️ Code used to conduct analyses
✔️ Publication Citation(s)
✔️ Study Protocol
✔️ Context or explanatory documents
✔️ Clear terms of reuse