Technical Data Standards

Technical data standards are an important element of the data governance strategy, as they ensure that data are Findable and Interoperable and that comparisons and mappings between different data sources can be established.

Standards for data, including metadata terms, attributes, structure and standardized naming conventions, enable data to be catalogued and insights to be drawn from data assets. For example, standard, well-defined naming conventions allow data to be queried consistently, so that we can keep track of what types of data we have and in which areas more data are required. With standard vocabularies shared between data sources, mappings between sources can be created, allowing us to understand the relationships between data sources and providing insight into their quality and trustworthiness.

Technical data standards exist in each of the following categories: (meta)data structure, (meta)data content (vocabularies) and (meta)data format.

Metadata Content

To annotate metadata terms, standard vocabularies are employed (see Appendix D). For example, all dates reported in the metadata are...

Metadata Terms

Metadata terms are adapted from the

An index of metadata standards can be found at the Metadata Standards Catalog.

GBADs will further extend and refine terms to fulfill specific needs for the governance and provenance of data collected by or handled by the GBADs data portal system.

Metadata Structure

Metadata schemas will be stored in the JSON-LD format. JSON-LD is a structured data format that supports the implementation of linked data. Google’s structured metadata tool will be used as metadata is created, revised and edited through our data lifecycle.

There are three main types of metadata, each serving a specific purpose in the lifecycle of the data (Riley, 2017) (Table 1). Descriptive metadata includes information about the data source that is used for finding and understanding a resource. This includes information about the context of the data, including provenance, if applicable. Descriptive metadata provides the information that allows the data to be Findable and Interoperable. Administrative metadata, which Riley (2017) describes as metadata used for the management of a given data resource, includes two subsets: technical metadata and rights metadata. Rights metadata provides information about the licensing and copyright of a given source, while technical metadata provides information about the file type and size of the data. Structural metadata describes the relationships of parts of resources to one another. This might include information about how data are arranged or viewed, hierarchical relationships or the sequence of data objects. In the context of GBADs, structural metadata is also important for storing information about the Accessibility of private data sources. This is discussed in section [FIXME ADD SECTION] of the Handbook.

Table 1: Types of metadata and definitions. Table adapted from Riley, 2017.

| Type of Metadata | Definition | Example Properties | Primary Uses |
| :--------------- | ---------- | ------------------ | ------------ |
| Descriptive metadata | For finding and understanding a resource; for data quality. | Title, Author, Subject, Genre, Publication Date | Findability, Display, Interoperability |
| Administrative metadata: Technical metadata, Rights metadata | For decoding and rendering files; long term management of files; intellectual property rights attached to content | Technical: file type, file size, creation date/time. Rights: Copyright status, License terms, Rights holder | Interoperability, Digital object management, Preservation |
| Structural metadata | Relationships of parts of resources to one another. | Sequence, Place in hierarchy | Navigation |

Metadata can include different terms depending on the context of the source. Table 2 lists all possible terms that metadata could include, together with their URIs. When a metadata term is used, the URI of that term must be declared in the @context section of the JSON-LD script. This allows the term to be traced back to a standardized definition.
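The rule that every term in use needs a URI in @context can be checked mechanically. The following is a minimal sketch in Python, not part of the GBADs tooling. The function name `terms_missing_from_context`, the sample record and the namespace `http://purl.org/dc/terms/` are assumptions made for illustration; the authoritative URIs are those listed in Table 2.

```python
# Assumed namespace, for illustration only; Table 2 holds the actual URIs.
ASSUMED_NS = "http://purl.org/dc/terms/"


def terms_missing_from_context(record: dict) -> list:
    """Return the metadata terms used in a record that have no URI in @context."""
    context = record.get("@context", {})
    missing = []
    for key in record:
        # Keys starting with "@" are JSON-LD keywords, not metadata terms.
        if key.startswith("@"):
            continue
        # A term counts as declared only if it maps to a non-empty URI.
        if not context.get(key):
            missing.append(key)
    return sorted(missing)


# Hypothetical record: "format" is used but not declared in @context.
record = {
    "@context": {
        "title": ASSUMED_NS + "title",
        "language": ASSUMED_NS + "language",
    },
    "title": "Example livestock population dataset",
    "language": "en",
    "format": "csv",
}

print(terms_missing_from_context(record))  # ['format']
```

A check like this could run whenever metadata is created or revised, so that an undeclared term is caught before the record is stored.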

Each data asset in the system must be at minimum described with the following terms (the first set of terms before the blank line in Table 2): URI, accessRights, accrualMethod, contributor, coverage, date, dateSubmitted, description, format, identifier, language, mediator, provenance, spatial, temporal, title, type, subjectThesaurus, subjectClassification.
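As a sketch of how the minimum requirement could be enforced, the snippet below reports which required terms are absent or empty in a metadata record. The function name `missing_required_terms` and the sample asset are hypothetical; the list of terms is copied from the paragraph above.

```python
# Required terms, as listed in the paragraph above.
REQUIRED_TERMS = [
    "URI", "accessRights", "accrualMethod", "contributor", "coverage",
    "date", "dateSubmitted", "description", "format", "identifier",
    "language", "mediator", "provenance", "spatial", "temporal",
    "title", "type", "subjectThesaurus", "subjectClassification",
]


def missing_required_terms(metadata: dict) -> list:
    """Return required terms that are absent or empty, in the order listed above."""
    return [
        term for term in REQUIRED_TERMS
        if metadata.get(term) in (None, "", [])
    ]


# Hypothetical, deliberately incomplete asset description.
asset = {
    "title": "Example livestock population dataset",
    "date": "2021-06-01",
    "description": "",  # present but empty, so still reported as missing
}

missing = missing_required_terms(asset)
print(len(missing), "required terms missing:", ", ".join(missing))
```

Treating empty strings and empty lists as missing is a design choice in this sketch; a stricter or looser rule may suit the actual data portal.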

When data are modified by GBADs or are the output of a model, this must be acknowledged in the metadata. In these cases, the following terms may be used: modified, alternative, replaces, isPartOf, isFormatOf, hasFormat, hasPart, isRequiredBy, isVersionOf, hasVersion, provenance, requires. For example, if GBADs has acquired a data asset that was converted from one vocabulary to another, the metadata terms isVersionOf, provenance, requires and modified would be used to reflect the changes.
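The vocabulary-change scenario can be illustrated with a small Python sketch that assembles the four terms named in the example. Everything concrete here is an assumption: the function name `vocabulary_change_terms`, the `example.org` identifiers, the wording of the provenance note and the use of ISO 8601 dates (the date convention is not confirmed in this section).

```python
from datetime import date


def vocabulary_change_terms(original_id: str, vocabulary_id: str,
                            changed_on: date, note: str) -> dict:
    """Build the metadata terms that record a change of vocabulary."""
    return {
        "isVersionOf": original_id,          # the asset this one was derived from
        "requires": vocabulary_id,           # the vocabulary needed to interpret it
        "modified": changed_on.isoformat(),  # assumed ISO 8601 date format
        "provenance": note,                  # free-text statement of what changed
    }


# Hypothetical identifiers, for illustration only.
terms = vocabulary_change_terms(
    original_id="https://example.org/data/original-asset",
    vocabulary_id="https://example.org/vocab/target-vocabulary",
    changed_on=date(2021, 6, 1),
    note="Species names converted from the source vocabulary to the target vocabulary.",
)

for term, value in terms.items():
    print(f"{term}: {value}")
```

These terms would be merged into the asset's existing metadata record, alongside the required terms.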

Table 2: All possible metadata terms and respective URIs.

"@context": {
  "URI": "",
  "accessRights": "",
  "accrualMethod": "",
  "contributor": "",
  "coverage": "",
  "date": "",
  "dateSubmitted": "",
  "description": "",
  "format": "",
  "identifier": "",
  "language": "",
  "mediator": "",
  "provenance": "",
  "rights": "",
  "spatial": "",
  "temporal": "",
  "title": "",
  "type": "",
  "subjectThesaurus": "",
  "subjectClassification": "",

  "accrualPeriodicity": "",
  "accrualPolicy": "",
  "alternative": "",
  "audience": "",
  "available": "",
  "bibliographicCitation": "",
  "conformsTo": "",
  "created": "",
  "creator": "",
  "dateAccepted": "",
  "dateCopyrighted": "",
  "extent": "",
  "hasFormat": "",
  "isFormatOf": "",
  "hasPart": "",
  "isPartOf": "",
  "hasVersion": "",
  "isVersionOf": "",
  "isReferencedBy": "",
  "replaces": "",
  "isReplacedBy": "",
  "requires": "",
  "isRequiredBy": "",
  "license": "",
  "modified": "",
  "references": "",
  "rightsHolder": "",
  "source": "",
  "subject": "",
  "tableOfContents": ""
}

Rich metadata, with terms describing how the original data were obtained, recorded and collected, provides context about data quality and about which data sources can be combined with others. Table 3 provides metadata terms that were adapted from the GATHER guidelines. When a data contributor provides a data source, they will be required to fill out a form to provide context about the data provided. The responses on the form will generate the descriptive metadata for that data source. The terms that are used to describe data assets will depend on the nature of the data source; for example, not all data sources will contain information about disease.

As more metadata is created, controlled vocabularies will be created or adapted for each term.

Table 3: Metadata terms and definitions adapted from GATHER guidelines

| Term | Definition |
| :--- | ---------- |
| species | Species classification of organism |
| disease | If based on reported disease, provide diseases that were included in study |
| inclusionCriteria | Demographic, clinical or geographic inclusion |
| exclusionCriteria | Demographic, clinical, or geographic exclusion |
| dataCollectionMethod | How was data collected? Survey? Field study? |
| ageRange | Age of study subjects. |
| sex | Provide sex(es) of species included. |
| sampleSize | Sample size |
| measurementMethod | How the outcome of interest was measured. |
| diagnosticCriteria | How the disease was diagnosed. |
| definedBias | Identify and describe any categories of input data that have potentially important biases. |
| fileFormat | Format of data file (csv, json etc.) |
| analysisMethod | How the outcome of interest was analyzed - should also provide link to GitHub if applicable |
| studyDesign | Information about research design. Terms such as cohort study, randomized trial, cost benefit analysis, descriptive study, diagnostic study etc. are expected. |
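The step from contributor form to descriptive metadata could look something like the following Python sketch. It is an illustration under assumptions, not the GBADs implementation: the function name `form_to_metadata`, the shape of the form responses and the sample answers are hypothetical. Only the term names come from Table 3.

```python
# Term names taken from Table 3.
GATHER_TERMS = (
    "species", "disease", "inclusionCriteria", "exclusionCriteria",
    "dataCollectionMethod", "ageRange", "sex", "sampleSize",
    "measurementMethod", "diagnosticCriteria", "definedBias",
    "fileFormat", "analysisMethod", "studyDesign",
)


def form_to_metadata(responses: dict) -> dict:
    """Turn form responses into metadata, skipping unanswered questions."""
    metadata = {}
    for term, answer in responses.items():
        if term not in GATHER_TERMS:
            raise ValueError(f"Unknown metadata term: {term}")
        if isinstance(answer, str):
            answer = answer.strip()
        # Not every source has, for example, disease information,
        # so blank answers are left out rather than stored as empty values.
        if answer in (None, ""):
            continue
        metadata[term] = answer
    return metadata


# Hypothetical responses from a data contributor.
responses = {
    "species": "Gallus gallus domesticus",
    "disease": "   ",  # left blank: this source has no disease information
    "dataCollectionMethod": "Survey",
    "sampleSize": 120,
    "fileFormat": "csv",
}

print(form_to_metadata(responses))
```

Rejecting unknown terms keeps the generated metadata aligned with Table 3; once controlled vocabularies exist for each term, the same function would be a natural place to validate the answers against them.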