Data User Guide

The GBADs data portal is a prototype dashboard that allows users to access datasets from FAOSTAT and OIE WAHIS. The prototype allows you to select data of interest, visualize it on bar and line graphs, download the data in .csv and .json format, and see the Application Programming Interface (API) call.

**What is `.json`?**

[JSON](https://en.wikipedia.org/wiki/JSON) stands for JavaScript Object Notation. JSON is a text-based file format that stores data in a standard structure, making the data both human- and machine-readable. JSON data can be read by most programming languages and can be imported into Excel. JSON is also generally easier to read and parse than other file formats such as [XML](https://en.wikipedia.org/wiki/XML).
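
To make this concrete, here is a minimal Python sketch that parses a small piece of JSON text and writes it back out. The record shown is a made-up placeholder, not actual output from the GBADs API.

```python
import json

# A small JSON document as text (placeholder content, not real API output)
text = '{"item": "Chickens", "years": [2005, 2006], "official": true}'

# Parse the text into ordinary Python objects (dict, list, str, int, bool)
record = json.loads(text)
print(record["item"])
print(record["years"][0])

# Convert the Python objects back into nicely indented JSON text
print(json.dumps(record, indent=2))
```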

```{admonition} We want to hear from you! 
:class: tip

Please note that this page will be updated as we continue to improve our data portal and gain access to more data sources. We welcome feedback on what you like about the system, what you'd like to see, and anything that you think could be clearer!
```
* Readers should understand what an API is, how it works, and why GBADs is using APIs
* Readers should understand how to use the GBADs API to get FAOSTAT and OIE WAHIS data
* Readers should understand how to use the GBADs API to read data into their `R` and `python` programs

Getting started with our API

What is an API?

An Application Programming Interface (API) is a machine-to-machine way to ask a server for data, have the server retrieve and interpret that data, and return it to your machine. APIs are everywhere; they allow applications to 'talk' to each other. For example, when you check the weather on a weather app, the app is using an API to grab the data and present it in a usable and interpretable fashion on your phone. Because the data is requested on demand, APIs give you the most up-to-date data without you having to store it on your own machine.

For the data needs of GBADs, APIs work like this:

  1. You/your program requests data through the API call
  2. The webserver looks through its internal database for the data that you asked for
  3. The database gives the server the data that you asked for
  4. The data is returned to you/your program
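
The four steps above can be mimicked in a few lines of Python. This is a toy sketch, not the GBADs server: the `DATABASE` rows, their values, and the `handle_request` function are made-up placeholders used only to show the request, lookup, and response cycle.

```python
import json

# Hypothetical in-memory "database"; the values are placeholders, not real data.
DATABASE = [
    {"year": 2005, "item": "Chickens", "value": 1},
    {"year": 2006, "item": "Chickens", "value": 2},
    {"year": 2006, "item": "Goats", "value": 3},
]


def handle_request(item, year_start, year_end):
    """Steps 2-3: the server looks up matching rows and serialises them as JSON."""
    rows = [
        row for row in DATABASE
        if row["item"] == item and year_start <= row["year"] <= year_end
    ]
    return json.dumps(rows)


# Step 1: the client asks for data.
payload = handle_request("Chickens", 2005, 2006)

# Step 4: the client receives JSON text and parses it into Python objects.
records = json.loads(payload)
print(records)
```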

Your workflow and APIs

To explain the concept of an API in more depth, we will discuss the common workflow to access data for modelling and where APIs come into play.

Manual data accrual method

If you are not using APIs in your current workflow, accessing data likely consists of navigating to a data portal or source, looking through the data catalogue or searching for a data set of interest, and then downloading the data. Each time the data is updated you have to repeat the process: find the data, download it again, import the file into your model, and rerun with the updated numbers. While this workflow works, using APIs can eliminate the manual work of going to the website and getting the data every time you need it.

When you interact with a website to get data, you are likely interacting indirectly with an API, which works in the backend to fetch the data you selected and present it on the webpage. However, you can also make an API call yourself to request data from the server where the data of interest resides ({numref}`APIcall`).

An **API call** is the way that you ask a server for data. 

Using APIs to get data

Instead of manually downloading data from a website each time, you can incorporate API calls into your workflow to request the most up-to-date data from the source. This allows you to rerun your code on fresh data without having to change it.

*Figure `APIcall`: Simple breakdown of how an API works.*

Once you have this 'API call', you can simply put it into the program of your choice to automate your workflows and access the data without searching through data catalogues each time. APIs are built on HTTP, which provides another plus: you can use them with virtually any programming language, including R and Python, which are the most popular among our current users. This means that instead of loading data files into your R or Python program each time, you can call the API right in your program. An added benefit is that you can rerun your programs without downloading the data again each time a source is updated or modified.

Some data sources that GBADs uses, such as FAOSTAT and The World Bank, have their own APIs that can be used to get data. However, GBADs handles the API management for you by developing a single API that can request data from these other APIs ({numref}`GBADsAPI`).

*Figure `GBADsAPI`: Overview of GBADs API infrastructure. The GBADs API can communicate with various other open APIs to access data from other data stores, such as FAOSTAT. The GBADs API also allows data to be requested from the GBADs data store. Users can access data from various sources through an API call to the GBADs API.*
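
The idea of one API sitting in front of several data sources can be illustrated with a small dispatch table. This is only a sketch of the general pattern and says nothing about how the GBADs API is actually implemented: the function names, the `SOURCES` table, and the returned dictionaries are all hypothetical, and no network requests are made.

```python
def fetch_from_faostat(query):
    """Stand-in for a request to an external API (hypothetical, no network)."""
    return {"source": "FAOSTAT", "query": query}


def fetch_from_gbads_store(query):
    """Stand-in for a lookup in an internal data store (hypothetical)."""
    return {"source": "GBADs data store", "query": query}


# One entry point that knows which backend serves which source
SOURCES = {
    "faostat": fetch_from_faostat,
    "gbads": fetch_from_gbads_store,
}


def gateway(source, query):
    """Route a single user request to the backend that holds the data."""
    if source not in SOURCES:
        raise ValueError(f"Unknown source: {source}")
    return SOURCES[source](query)


result = gateway("faostat", {"item": "Chickens"})
print(result)
```

The user only ever talks to `gateway`; which backend answers is an internal detail.
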
```{admonition} Special Access Data
:class: tip

_Please note that some data is not publicly available, and therefore is not available to all users_. See [the chapter on Data Licenses, Privacy and Security](http://www.gbadske.org/Documentation/DataGovernanceHandbook/dataOwnership.html) for more information about how GBADs handles confidential and sensitive data.
```

Using the GBADs API

You can check out our more extensive API documentation [FIXME here]. However, this section will show you the basics of using our API to fetch some data.

We will provide two examples of API calls to the GBADs API: one in Python and one in R. In both examples we will use the same API call, which returns the stocks (the number of live animals) of chickens in Ethiopia from 2005 to 2018, as reported by the FAO. Our API call for this type of data is: http://35.183.203.15:8000/gbads/LiveAnimals/?year_start=2005&year_end=2018&element=Stocks&item=Chickens

If you paste the API call directly into your browser, you will be brought to a page with the data in JSON format. You'll also notice that the API call specifies the category (LiveAnimals), the start and end year, the element (Stocks, the number of live animals), and the item (Chickens). Currently our portal only supports the retrieval of Ethiopian data, as that is the focus of our pilot study.
We are still developing our metadata API. 
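
Rather than typing the call by hand, you can assemble it from its parts. The sketch below uses only Python's standard library; `build_api_call` is a hypothetical helper name, and the base address, category, and parameters are simply the ones from the call shown above. Which other categories or parameter values the API accepts is not covered here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_api_call(base_url, category, **params):
    """Assemble an API call from a category and its query parameters."""
    return f"{base_url}/{category}/?{urlencode(params)}"


api_call = build_api_call(
    "http://35.183.203.15:8000/gbads",
    "LiveAnimals",
    year_start=2005,
    year_end=2018,
    element="Stocks",
    item="Chickens",
)
print(api_call)

# Going the other way: split an existing call back into its parts
parts = urlsplit(api_call)
print(parts.path)
print(parse_qs(parts.query))
```

Keeping the parameters separate from the address makes it easy to change, say, the year range without editing a long string.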

In our Python use case you will need the requests, pandas, seaborn and matplotlib libraries installed (json is part of the Python standard library).

import json
import requests
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Create apiCall
apiCall = "http://15.223.72.239:8000/gbads/LiveAnimals/?year_start=2005&year_end=2018&element=Stocks&item=Chickens"

response = requests.get(apiCall).json()

# Print the response so we can see what we got.
print(response)
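
The request above assumes the server is reachable and returns valid JSON. As a hedged sketch of a more defensive version, the helper below uses only the standard library (`urllib` instead of `requests`) and returns `None` when something goes wrong. The name `fetch_json`, the 10-second timeout, and the `opener` parameter (included so the function can be tried without a network connection) are illustrative choices, not part of the GBADs API.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10, opener=urlopen):
    """Return the parsed JSON body of `url`, or None if the request fails."""
    try:
        with opener(url, timeout=timeout) as reply:
            return json.loads(reply.read().decode("utf-8"))
    except (OSError, ValueError) as error:
        # OSError covers connection problems, timeouts and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        print(f"Could not fetch data: {error}")
        return None
```

You would call it as `response = fetch_json(apiCall)` and check for `None` before continuing.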

In some cases, you may want to convert your response to a pandas dataframe, visualize the result, or save the result to a CSV file. Below we will demonstrate how you can accomplish each of these:

# Create pandas dataframe from api response
response = pd.DataFrame(response)

# What is our result? Print the first 10 rows of the dataframe.
print(response.head(10))

Before we go ahead and graph this data, we can use pandas to get a general overview of the data that we got from the API call.

# Which columns do we have? 
print(response.columns)

We can also see summaries of the columns:

print(response.describe())

# Graph time!
response.plot.scatter(x='Year', y='Value', c='DarkBlue')

And for fun, let's visualize a linear relationship with seaborn's `regplot` function, which draws a regression line on the plot along with a 95% confidence interval.


ax = sns.regplot(x="Year", y="Value", data=response)

# Set axis labels
ax.set(xlabel='Year', ylabel='Number of Live Animals (1000 Heads)')

# Add a title
plt.title("Number of Live Chickens in Ethiopia")

# Show the result (plt.show() takes no axes argument)
plt.show()

As you can see, with very little work we have gathered the data from the API, converted into a pandas dataframe, and plotted a regression.

We could also plot the data and visualize which points correspond to official data and which were imputed:

# Different colours for the flag descriptions; keep the returned axes so we can label them
ax = sns.scatterplot(x="Year", y="Value", hue="Flag Description", data=response)

# Set axis labels
ax.set(xlabel='Year', ylabel='Number of Live Animals (1000 Heads)')

# Add a title
plt.title("Number of Live Chickens in Ethiopia")

# Show the result
plt.show()

If you are interested in simply gathering the data from the API and saving it as a CSV file, you can use the code below to do so.

import json
import requests
import pandas as pd

# Create apiCall
apiCall = "http://15.223.72.239:8000/gbads/LiveAnimals/?year_start=2005&year_end=2018&element=Stocks&item=Chickens"

response = requests.get(apiCall).json()

# The parsed JSON is a plain Python object, so convert it to a
# pandas dataframe before writing it out as csv
response = pd.DataFrame(response)

# Name of outfile. Replace this with the path to where you would like to store the file, and the filename.
outfile = 'path/to/outfile/outfilename.csv'

# Save to outfile using pandas
response.to_csv(outfile, index=False)
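
If pandas is not available, the same result can be sketched with Python's built-in `csv` module. This assumes the API response is a list of flat records that all share the same keys, which is an assumption about the response shape rather than something documented here. `records_to_csv` is a hypothetical helper, and the sample records are made-up placeholders.

```python
import csv
import io


def records_to_csv(records, handle):
    """Write a list of flat dictionaries to an open text file as CSV."""
    if not records:
        return 0
    # Use the keys of the first record as the column headers
    writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return len(records)


# Placeholder records standing in for an API response (values are made up)
sample = [{"Year": 2005, "Value": 1}, {"Year": 2006, "Value": 2}]

# Write to an in-memory buffer so the example leaves no file behind
buffer = io.StringIO()
count = records_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a real file: `with open(outfile, "w", newline="") as handle: records_to_csv(response, handle)`.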

Here's our R implementation:

You will need to make sure that you have the httr and jsonlite R packages installed.

# Uncomment the line below if you don't already have the libraries 
# install.packages(c("httr", "jsonlite"))

# Load in libraries
library(httr)
library(jsonlite)

# Create API call
apiCall = "http://15.223.72.239:8000/gbads/LiveAnimals/?year_start=2005&year_end=2018&element=Stocks&item=Chickens"

# Send request
response = GET(apiCall)

# See what the response gives us
response

# Create a dataframe from the API response
data = fromJSON(rawToChar(response$content))

# Check to make sure that worked
class(data)

# See what the first 5 rows of the data look like
head(data, 5)

Creating a User Profile

The guide above shows you how to use the API to access open data. In the future, we anticipate adding private data sources, which you will only be able to access if you have been given permission. Our system will support verified user logins, which will give you access to the private sources you have been granted via a personalized portal and API key.