Data Quality

The quality of data used by GBADs is assessed and communicated in line with Data Goal 4. Our definition of data quality is rooted in the six dimensions of data quality.

Data quality matters for GBADs because the quality of input data directly affects the quality of the models built on it and of the outputs those models produce. Input data must therefore be assessed for quality, and the results must be reported to modellers and to users of GBADs' data and models. Although we have no direct influence on the quality of the input data, we can report our confidence in the data and flag potential quality issues that may affect the certainty of estimates.

Similarly, the output data produced by models and estimates must be of high quality, so that users can trust GBADs outputs and use them with confidence for decision-making.

Data quality falls under the responsibility of Working Groups 2 and 3; see the Data Governance Operating Model for more details on data responsibilities.

Defining Data Quality

There is no single agreed definition of data quality; the definition depends on the context in which the data will be used. Many dimensions of data quality have been proposed, but which of them are relevant to an organization again depends on the context of use.

In this section, we outline the dimensions of data quality that are relevant to GBADs and the processes, metrics, and tools used to assess the quality of the data that GBADs uses and produces.

Dimensions of Data Quality

Tools to Support Data Quality Analysis

GBADs Informatics is working on a number of data quality tools. These tools cover the following:

  • Data “stories” that visualise and provide commentary on data from the Ethiopian Central Statistical Agency Livestock reports at both the national and regional levels

  • Analysis of data sources such as FAOSTAT and WOAH in terms of internal and external agreement (visualisations, measurements, and metrics)

  • Ontology and SHACL for validating data and assessing data quality relating to the categories of species in the WOAH population data (software demonstration of this capability to be available in the 1st quarter of 2023)
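
To illustrate what a measurement of external agreement between two sources could look like, here is a minimal Python sketch. It is not the GBADs tooling: the function names, the symmetric relative difference metric, the 5% tolerance, and all figures are assumptions made up for illustration. It compares the values that two sources report for the same species and year, and flags the pairs that differ by more than the tolerance.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (species, year); a hypothetical record key


def relative_difference(a: float, b: float) -> float:
    """Symmetric relative difference between two non-negative values."""
    if a == 0 and b == 0:
        return 0.0
    return abs(a - b) / ((abs(a) + abs(b)) / 2)


def external_agreement(
    source_a: Dict[Key, float],
    source_b: Dict[Key, float],
    tolerance: float = 0.05,  # assumed threshold, not a GBADs value
) -> Dict[Key, Tuple[float, bool]]:
    """For each key present in both sources, return the relative
    difference and whether it falls within the tolerance."""
    report = {}
    for key in sorted(source_a.keys() & source_b.keys()):
        diff = relative_difference(source_a[key], source_b[key])
        report[key] = (diff, diff <= tolerance)
    return report


# Made-up population figures; they do not come from FAOSTAT or WOAH.
source_a = {("cattle", 2020): 1000.0, ("sheep", 2020): 500.0}
source_b = {
    ("cattle", 2020): 1020.0,
    ("sheep", 2020): 650.0,
    ("goats", 2020): 300.0,
}

report = external_agreement(source_a, source_b)
for key, (diff, agrees) in report.items():
    status = "agree" if agrees else "disagree"
    print(key, f"{diff:.3f}", status)
```

Records that appear in only one source (goats in this example) are left out of the comparison. In practice they would probably be reported separately, since a missing record is a completeness issue rather than a disagreement.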

While it is up to the organization or data contributor to correct their data at source, we have procedures in place to communicate the results of quality assessments.

Roles in Data Quality