Data Licenses, Security, and Privacy
There are 3 core instances where data licensing, privacy, and security are important:
- Ensuring that we use, store, and share data according to the licensing agreement, and that we are citing data properly
- Securely storing private data to protect PII
- Attributing a license to any output data created by GBADs
Sound processes for securing and accessing data in GBADs foster a community of trust with data contributors and users.
Note on Private Data:
We are currently only using public data for models in GBADs. In anticipation of private data, we have conceptualized and created the infrastructure that will keep it secure.
Working Group 1 should be aware of licensing and privacy when creating partnerships and alliances with potential data contributors.
Data 'Openness' On a Spectrum
GBADs disseminates, and in some cases stores, data with various access, usage and reuse restrictions. Not all data can be open, and data privacy isn't as simple as having either open or private data. To encourage sharing, it is important that data contributors can select how they would like their data to be used, what they want it to be used for and who they would like it to be used by. Data licensing agreements remove ambiguity about how data may be used, and tell our system who can see, download or use data.
**Even data that is defined as "Open" needs a license!** When you use Open data you still need to determine how to properly attribute (or cite) the data set. In addition, data can be considered Open but may still have restrictions on what it can be used for. For example, some Open data licenses restrict the use of data for commercial purposes.
The Open Data Institute communicates this idea by putting data on a spectrum from closed to open data.
Categories on the data spectrum
We used the spectrum to come up with four discrete data licensing categories:
Open data: “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”
Public access data: The data is protected by a licensing agreement that limits the use and dissemination of the data and/or the models the data can be used for. This could include the way the data can be used and for what purposes, attribution requirements etc.
Group-based access data: Authentication is required to access the data. Like public access data, the data is also protected by a licensing agreement that limits the use and dissemination of the data and/or the models the data can be used for.
Named access data and internal access data: A special contract will be required to articulate the use, attribution and access restrictions of the data. These will be explicitly assigned by a contract and/or NDA, which will require direct contact with the GBADs legal team. We grouped these two because both need a data contract and require named (and authenticated) access.
- How will users be authenticated?
- How will groups of users be authenticated?
- What license will we use for models generated by GBADs and data outputs generated by the models?
Personally Identifiable Information (PII)
Personally Identifiable Information (PII) is any information that can be used to identify a person, residence or farm. This could include, for example, names, email addresses, geolocation or vet records. No matter the type of PII, data containing it should be managed carefully.
PII should be protected and secure, with restricted access requirements. Depending on the use case, the data may be transformed to protect the PII. For example, geolocations can be reduced in spatial granularity, so that data is provided by region, zone or country. Email addresses, phone numbers and names of farms can be encrypted on ingest and removed from data tables.
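The transformations described above could be sketched roughly as follows. This is a minimal illustration, not the GBADs ingest pipeline: the field names, the `SECRET_KEY` value and all three helper functions are hypothetical. It also swaps in keyed hashing (HMAC-SHA256) where the text mentions encryption, so the resulting identifiers are one-way and cannot be decrypted.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical names of columns that hold direct identifiers.
PII_FIELDS = ("farm_name", "email", "phone")


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest so the raw identifier is never stored."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def coarsen_coordinate(value: float, decimals: int = 1) -> float:
    """Round a coordinate; one decimal place is about 11 km of latitude."""
    return round(value, decimals)


def protect_record(record: dict) -> dict:
    """Return a copy with identifiers replaced and the location coarsened."""
    safe = dict(record)
    for name in PII_FIELDS:
        if name in safe:
            # Drop the raw value and keep only its pseudonymous digest.
            safe[name + "_id"] = pseudonymize(str(safe.pop(name)))
    for name in ("latitude", "longitude"):
        if name in safe:
            safe[name] = coarsen_coordinate(float(safe[name]))
    return safe
```

Because the same input always yields the same digest under a given key, records from one farm can still be linked to each other without exposing the farm's name. Moving from rounded coordinates to named regions or countries would need a lookup against boundary data, which is outside this sketch.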
Secure Data Storage Infrastructure
As the GBADs Knowledge Engine is a cloud service, any data that includes PII will be stored in a secure bucket, such as an Amazon S3 bucket.
Licenses inform who can access data, how data can be used, who it can be used by and for what purposes, and how to properly attribute the data.
Licenses have 3 utilities for GBADs, each of which is informed by the CARE principles:
- Protect Data Contributors: Any time data is contributed to GBADs, data holders will be required to select a license for their data.
This is a CARE sharing mechanism because licenses give data contributors the Authority to Control their data throughout its lifecycle. Because licenses dictate the usage restrictions of the data, the data can be used for the Collective Benefit of the data holder individually, or of the group that the data holder represents.
Publicly available licenses will be linked to in the metadata, and the citation/attribution information will be disseminated alongside the dataset.
- Inform Data Users: Each data set will be licensed, and the licensing and citation information will be available in the data set's metadata. Data users will therefore be informed of how they can use the data they access and the attribution they must use.
- Inform System View: Open and public data will be available to any user who enters the site, but group or named access data will need authentication, and will therefore be inaccessible by default.
In other words, the view of the GBADs Knowledge Engine will be informed by the licensing agreement. In some cases, this may mean that even the metadata will not be shown to unauthorized users. In other cases, the descriptive metadata may be available and users could request access. What the public, or certain users and groups, can see will be governed by the choices of the data holder.
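One way to picture how the licensing category could drive the system view is the sketch below. It is an assumption-laden illustration rather than the Knowledge Engine's actual access logic: the `Access`, `User` and `Dataset` types, the `metadata_public` flag and both functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Access(Enum):
    """The four licensing categories described earlier in this section."""
    OPEN = "open"
    PUBLIC = "public"
    GROUP = "group"
    NAMED = "named"  # also covers internal access data


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)


@dataclass
class Dataset:
    title: str
    access: Access
    allowed_groups: Set[str] = field(default_factory=set)
    allowed_users: Set[str] = field(default_factory=set)
    # Hypothetical flag: the data holder lets everyone see the description.
    metadata_public: bool = False


def can_view_data(dataset: Dataset, user: Optional[User] = None) -> bool:
    """Open and public data are visible to all; the rest is denied by default."""
    if dataset.access in (Access.OPEN, Access.PUBLIC):
        return True
    if user is None:  # not authenticated
        return False
    if dataset.access is Access.GROUP:
        return bool(dataset.allowed_groups & user.groups)
    return user.name in dataset.allowed_users


def can_view_metadata(dataset: Dataset, user: Optional[User] = None) -> bool:
    """Metadata is shown if the holder made it public or the data is visible."""
    return dataset.metadata_public or can_view_data(dataset, user)
```

For example, a group access data set with `metadata_public=True` would show its description to an anonymous visitor, who could then request access, while the data itself stays hidden until the visitor authenticates as a member of an allowed group.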
Data holders contributing Open or public access data must choose a licensing agreement for their data. There is a suite of data licensing agreements that data holders can choose from. These include:
- Creative Commons Licenses. The Creative Commons license selector tool allows individuals to select the features of usage, adaptation and sharing, and provides a license that reflects these preferences.
- Open Data Commons Licenses, including the Open Data Commons Open Database License (ODbL), the Open Data Commons Attribution License and the Open Data Commons Public Domain Dedication and License (PDDL).
In some cases, private data agreements will have to be made with a legal team to ensure that the usage restrictions, security, and licensing information are properly agreed upon between the data holder and GBADs.
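As a rough illustration of how a chosen license and its citation could travel with a data set's metadata, consider the sketch below. The `LicenseInfo` type, the `build_metadata` function, the metadata keys and the citation wording are all hypothetical; only the license names and their SPDX-style short identifiers refer to real licenses, and the table is a small subset rather than the full suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseInfo:
    identifier: str  # SPDX-style short identifier
    name: str


# Illustrative subset of licenses a data holder might choose from.
LICENSES = {
    "CC-BY-4.0": LicenseInfo(
        "CC-BY-4.0", "Creative Commons Attribution 4.0 International"),
    "ODbL-1.0": LicenseInfo(
        "ODbL-1.0", "Open Data Commons Open Database License v1.0"),
    "ODC-By-1.0": LicenseInfo(
        "ODC-By-1.0", "Open Data Commons Attribution License v1.0"),
    "PDDL-1.0": LicenseInfo(
        "PDDL-1.0",
        "Open Data Commons Public Domain Dedication and License v1.0"),
}


def build_metadata(title: str, holder: str, year: int, license_id: str) -> dict:
    """Attach license and citation information to a data set description."""
    if license_id not in LICENSES:
        # Anything outside the known suite needs a separate agreement.
        raise ValueError(f"Unknown license identifier: {license_id}")
    lic = LICENSES[license_id]
    return {
        "title": title,
        "holder": holder,
        "year": year,
        "license": lic.identifier,
        "license_name": lic.name,
        # Hypothetical citation format, shown only to make the idea concrete.
        "citation": f"{holder} ({year}). {title}. Licensed under {lic.name}.",
    }
```

Rejecting unknown identifiers keeps unlicensed data out of the catalogue and matches the requirement that every contributed data set carries a license.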