
Data Licenses, Privacy and Security

Data on a spectrum

GBADs disseminates and in some cases stores data that has various access, usage and reuse restrictions. Not all data can be open, and data privacy isn't as simple as having either open or private data. To encourage sharing, it is important that data contributors can select how they would like their data to be used, what they want it to be used for and who they would like it to be used by. Data licensing agreements make the permitted uses of data unambiguous, and tell our system who can see, download or use the data.

**Even data that is defined as "Open" needs a license!** When you use Open data you still need to determine how to properly attribute (or cite) the data set. In addition, data can be considered Open but may still have restrictions on what it can be used for. For example, some Open data licenses restrict the use of data for commercial purposes. 

The Open Data Institute communicates this idea by putting data on a spectrum from closed to open data.

Categories on the data spectrum

We used the spectrum to come up with four discrete data licensing categories:

Open data: “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”

Public access data: The data is protected by a licensing agreement that limits the use and dissemination of the data and/or the models the data can be used for. This could include the way the data can be used and for what purposes, attribution requirements etc.

Group-based access data: Authentication is required to access the data. Like public access data, the data is also protected by a licensing agreement that limits the use and dissemination of the data and/or the models the data can be used for.

Named access data and internal access data: A special contract will be required to articulate the use, attribution and access restrictions of the data. This will be explicitly assigned by a contract and/or NDA, which will require direct contact with the GBADs legal team. We grouped these two because both need a data contract and require named (and authenticated) access.

- How will users be authenticated? 
- How will groups of users be authenticated?
- What license will we use for models generated by GBADs and data outputs generated by the models?

Personal Identifiable Information (PII)

Personal Identifiable Information (PII) is any information that can be used to identify a person, residence or farm. This could include, for example, names, email addresses, geolocation or vet records. No matter the type of PII, data containing it should be managed carefully.

PII should be protected and secure, with restricted access requirements. Depending on the use case, the data may be able to be transformed to protect the PII. For example, geolocations can be coarsened so that data is provided by region, zone or country rather than by exact location. Email addresses, phone numbers and names of farms can be encrypted on ingest and removed from data tables.
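The transformations described above can be sketched in Python. This is a minimal illustration, not the GBADs ingest pipeline: the field names (`farm_name`, `email`, `phone`, `latitude`, `longitude`), the grid cell size and the `farm_token` column are all hypothetical. It also swaps in a keyed hash (HMAC-SHA256) in place of reversible encryption, so that records from the same farm can still be linked without the original value being stored in the table.

```python
import hashlib
import hmac
import math

# Hypothetical column names; a real table may use different ones.
PII_FIELDS = ("farm_name", "email", "phone")


def coarsen_coordinate(value: float, cell_size_degrees: float = 1.0) -> float:
    """Snap a latitude or longitude to the lower edge of its grid cell."""
    return math.floor(value / cell_size_degrees) * cell_size_degrees


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (assumption: not reversible encryption)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record: dict, secret_key: bytes, cell_size_degrees: float = 1.0) -> dict:
    """Return a copy of the record with PII columns removed and location coarsened."""
    protected = {k: v for k, v in record.items() if k not in PII_FIELDS}
    protected["latitude"] = coarsen_coordinate(record["latitude"], cell_size_degrees)
    protected["longitude"] = coarsen_coordinate(record["longitude"], cell_size_degrees)
    # Keep a stable token so rows from the same farm can still be grouped.
    protected["farm_token"] = pseudonymize(record["email"], secret_key)
    return protected


example = {
    "farm_name": "Example Farm",
    "email": "owner@example.org",
    "phone": "555-0100",
    "latitude": 43.5448,
    "longitude": -80.2482,
    "herd_size": 120,
}
safe = protect_record(example, secret_key=b"replace-with-a-managed-secret")
```

In practice the secret key would need to be stored separately from the data, and the appropriate cell size depends on how sparsely farms are distributed in the region.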

As the GBADs Knowledge Engine is a cloud service, any data that includes PII will be stored in a secure bucket, such as an [Amazon S3](https://aws.amazon.com/s3/) bucket.

Licensing

Licenses specify who can access data, how and by whom it can be used, for what purposes, and how to properly attribute the data.

License uses

Licenses have three uses for GBADs, each of which is informed by the CARE principles:

**Any time data is contributed to GBADs, data holders will be required to select a license for their data.** 

This is a CARE sharing mechanism because licenses give data contributors the Authority to Control their data throughout its lifecycle. Because licenses dictate the usage restrictions of the data, the data can be used for the Collective Benefit of the data holder individually, or of the group that the data holder represents.

Publicly available licenses will be linked to in the metadata, and the citation/attribution information will be disseminated alongside the dataset.
Each data set will be licensed and the licensing and citation information will be available in the data set's metadata. Therefore, data users will be informed of how they can use the data that they access and the attribution that they must use. 
Open and public data will be available to any user who enters the site, but group or named access data will need authentication, and will therefore be inaccessible by default.

In other words, the view of the GBADs Knowledge Engine will be informed by the licensing agreement. In some cases, this may mean that even the metadata will not be shown to unauthorized users. In other cases, the descriptive metadata may be available and users could request access. What the public, or certain users and groups, can see will be governed by the choices of the data holder.
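One way to picture how the licensing category could drive what a user sees is a small access check. The sketch below is an assumption-laden illustration, not the Knowledge Engine's implementation: the class names, the `metadata_visible` flag and the group and user lists are all hypothetical, and how users and groups are actually authenticated is still an open question.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccessCategory(Enum):
    OPEN = "open"
    PUBLIC = "public"
    GROUP = "group"
    NAMED = "named"  # also covers internal access data


@dataclass
class Dataset:
    name: str
    category: AccessCategory
    allowed_groups: Set[str] = field(default_factory=set)
    allowed_users: Set[str] = field(default_factory=set)
    # Hypothetical flag: the data holder decides whether metadata is shown to everyone.
    metadata_visible: bool = True


@dataclass
class User:
    username: str
    groups: Set[str] = field(default_factory=set)


def can_access(dataset: Dataset, user: Optional[User] = None) -> bool:
    """Open and public data are visible to anyone; the rest is denied by default."""
    if dataset.category in (AccessCategory.OPEN, AccessCategory.PUBLIC):
        return True
    if user is None:  # unauthenticated visitor
        return False
    if dataset.category is AccessCategory.GROUP:
        return bool(dataset.allowed_groups & user.groups)
    return user.username in dataset.allowed_users


def can_see_metadata(dataset: Dataset, user: Optional[User] = None) -> bool:
    """Metadata is shown if the holder allows it or the user can access the data."""
    return dataset.metadata_visible or can_access(dataset, user)
```

A visitor who can see the metadata but not the data is the case where the interface could offer a "request access" option.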

License selection

Data holders contributing Open or public access data must choose a licensing agreement for their data. There is a suite of data licensing agreements that data holders can choose from. These include:

GBADs is exploring how to make licenses machine readable, so that data that flows through the knowledge engine and is stored in GBADs repositories can be more FAIR. In addition, we use data privacy restrictions to inform system views for different users to help protect the usage requirements set out by data contributors.
GBADs must determine whether data contributors can change the license on their data after they submit it and, if so, how to communicate the change to individuals who may have downloaded the data in question. In addition, GBADs must decide how the retraction of data affects pre-existing models.
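As a rough illustration of what a machine-readable license could look like in a data set's metadata, the sketch below builds a small JSON record. The function and every field name in it are hypothetical and do not describe a GBADs schema. It assumes the license is recorded as an SPDX identifier (`CC-BY-4.0` is used only as an example of a publicly available license) together with its URL, and it uses placeholder values for the data set title and holder.

```python
import json

# Hypothetical category labels mirroring the four licensing categories.
ACCESS_CATEGORIES = {"open", "public", "group", "named"}


def build_license_metadata(title: str, holder: str, spdx_id: str,
                           license_url: str, access_category: str) -> dict:
    """Assemble license, access and citation details as a JSON-serializable dict."""
    if access_category not in ACCESS_CATEGORIES:
        raise ValueError(f"unknown access category: {access_category}")
    return {
        "title": title,
        "rightsHolder": holder,
        "license": {"spdx": spdx_id, "url": license_url},
        "accessCategory": access_category,
        # Attribution text that can be disseminated alongside the data set.
        "citation": f"{holder}. {title}. Licensed under {spdx_id} ({license_url}).",
    }


record = build_license_metadata(
    title="Example livestock population table",
    holder="Example Data Holder",
    spdx_id="CC-BY-4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    access_category="open",
)
print(json.dumps(record, indent=2))
```

Keeping the license as a structured field rather than free text would let the system filter views by access category and generate attribution text automatically.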