Introduction

Today the petition to revoke Article 50 and for the UK to remain in the European Union has passed 3 million signatures. I took a look at the petition webpage and saw that they have the signature data available as a JSON file. Not needing an excuse to get some practice with shapefiles and geopandas, I thought we could look at something non-football-related for a change.

Packages

For this tutorial we will be using geopandas to handle geographic information and shapefiles, numpy to add in some NaNs, pandas for our dataframes, Bokeh for interactivity, and json to convert our dataset into a Bokeh-friendly format. I have also imported requests to get our data from the petition URL.

In [1]:
import pandas as pd
import numpy as np
import json

#The following packages and dependencies can be installed with pip from cmd / terminal

# $ pip install geopandas
# $ pip install bokeh

import geopandas as gpd

from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import viridis


import requests

Styling

In [2]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border-style: solid;
    border-width: thin;
    border-color: black;
}

table.dataframe th {
    background-color: grey;
    color: white;
    
}

table.dataframe tr:hover {
    background-color: rgba(46, 139, 87, 0.8);
    color: white;
    
}
</style>

Getting the geo data

The dataset I am using for this tutorial is downloaded from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/

On that page, click on "Download Countries" from Admin 0 - Countries. This will download a folder which should be placed in whatever directory is handiest for directing your Python script to. I just put it in the same folder as this notebook.

Below I am importing the shapefile and adding the columns I am interested in to a dataframe using geopandas. You'll notice we then have 3 columns which I have named "name", "code", and "geometry". The geometry column holds the shapefile information we need to draw our map. It contains the coordinates that will be used to draw the borders of our countries and, later, of our constituencies in the United Kingdom. For the sake of brevity, I have only shown the first 10 rows.

In [3]:
shapefile = 'ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp' # path to folder and file #

gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.sort_values("ADM0_A3",inplace=True)
gdf.columns=["name","code","geometry"]
gdf[:10]
Out[3]:
name code geometry
103 Afghanistan AFG POLYGON ((66.51860680528867 37.36278432875879,...
74 Angola AGO (POLYGON ((12.99551720546518 -4.78110320396188...
125 Albania ALB POLYGON ((21.0200403174764 40.84272695572588, ...
84 United Arab Emirates ARE POLYGON ((51.57951867046327 24.24549713795111,...
9 Argentina ARG (POLYGON ((-68.63401022758323 -52.636370458874...
109 Armenia ARM POLYGON ((46.50571984231797 38.77060537368629,...
159 Antarctica ATA (POLYGON ((-48.66061601418252 -78.047018731598...
23 French Southern and Antarctic Lands ATF POLYGON ((68.935 -48.62500000000001, 69.58 -48...
137 Australia AUS (POLYGON ((147.6892594748842 -40.8082581520226...
114 Austria AUT POLYGON ((16.97966678230404 48.12349701597631,...

Getting the petition data

You can go to the petition webpage and follow the link to the JSON data if you like, or you can access it directly here: https://petition.parliament.uk/petitions/241584.json


I will use requests and json to read the data rather than copy-pasting it so I can automatically update the dataset as new signatures are added.

In [4]:
url = "https://petition.parliament.uk/petitions/241584.json"
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
all_data = json.loads(response.text)  # parse the JSON text into a dictionary

We can work our way through the json data now by first looking at how it is structured.

In [5]:
all_data.keys()
Out[5]:
dict_keys(['links', 'data'])

We are interested in the 'data' key, so we will filter to that by calling the key in square brackets. Again, for the sake of brevity I will skip ahead, but you can find the sub-keys by repeating the above, e.g. all_data['data'].keys()
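
If you would rather see the whole nesting at once instead of calling .keys() level by level, a small helper can walk the dictionary for you. This is only a sketch: the sample below is hand-written to mirror the nesting described above, and show_keys is a hypothetical helper name, not something from the petition site or from any library.

```python
import json

# Hypothetical, hand-written sample that mirrors the nesting described above;
# the real response contains many more fields.
sample = json.loads("""
{
  "links": {},
  "data": {
    "attributes": {
      "signatures_by_country": [
        {"name": "Afghanistan", "code": "AF", "signature_count": 14}
      ]
    }
  }
}
""")

def show_keys(node, depth=0, max_depth=3):
    """Print the keys of nested dictionaries, indented by depth."""
    if not isinstance(node, dict) or depth > max_depth:
        return
    for key, value in node.items():
        print("  " * depth + key)
        show_keys(value, depth + 1, max_depth)

show_keys(sample)

# Once the path is known, index straight down to the list we want
countries = sample["data"]["attributes"]["signatures_by_country"]
```

The helper stops at lists on purpose, so it shows the shape of the data without printing every country.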

Below I have filtered down to signatures by country and am showing the first 10 rows in the data.

In [6]:
country_data = all_data['data']['attributes']['signatures_by_country']
country_data[:10]
Out[6]:
[{'code': 'AF', 'name': 'Afghanistan', 'signature_count': 14},
 {'code': 'AL', 'name': 'Albania', 'signature_count': 11},
 {'code': 'DZ', 'name': 'Algeria', 'signature_count': 9},
 {'code': 'AS', 'name': 'American Samoa', 'signature_count': 3},
 {'code': 'AD', 'name': 'Andorra', 'signature_count': 24},
 {'code': 'AO', 'name': 'Angola', 'signature_count': 7},
 {'code': 'AI', 'name': 'Anguilla', 'signature_count': 8},
 {'code': 'AG', 'name': 'Antigua and Barbuda', 'signature_count': 16},
 {'code': 'AR', 'name': 'Argentina', 'signature_count': 93},
 {'code': 'AM', 'name': 'Armenia', 'signature_count': 4}]

Now we have a list of dictionaries that we can use to build a new dataframe. I have noticed that the codes from our geodata do not match those from the petition data, so I have dropped them. There are also some differences in country names, which I will fix below.

In [7]:
country_data = pd.DataFrame(country_data)

# Map petition country names to the names used in the geo data (exact matches only)
name_fixes = {
    "United States": "United States of America",
    "Congo (Democratic Republic)": "Democratic Republic of the Congo",
    "Congo": "Republic of the Congo",
    "The Gambia": "Gambia",
    "The Occupied Palestinian Territories": "Palestine",
    "Serbia": "Republic of Serbia",
    "Tanzania": "United Republic of Tanzania",
}
# Guinea and Somalia are deliberately not renamed: the geo data lists them as their
# own entries, separate from Equatorial Guinea and Somaliland
country_data['name'] = country_data['name'].replace(name_fixes)

del country_data['code']
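
A quick way to find which names need fixing, rather than eyeballing both lists, is to take the set difference and then list possible counterparts for each leftover name. The sketch below uses made-up name lists and a hypothetical helper called candidates; in the notebook the lists would come from gdf['name'] and the petition data. It only suggests matches, and the final choice still needs a human, because a short name such as "Congo" is contained in more than one longer name.

```python
# Hypothetical name lists for illustration only
geo_names = [
    "Gambia",
    "Republic of Serbia",
    "United Republic of Tanzania",
    "Republic of the Congo",
    "Democratic Republic of the Congo",
    "France",
]
petition_names = ["The Gambia", "Serbia", "Tanzania", "Congo", "France"]

def candidates(name, pool):
    """Return pool entries that contain `name` or are contained in it (case-insensitive)."""
    needle = name.lower()
    return [p for p in pool if needle in p.lower() or p.lower() in needle]

# Petition names with no exact counterpart in the geo data
unmatched = sorted(set(petition_names) - set(geo_names))

# Possible counterparts for each unmatched name, to be reviewed by hand
suggestions = {name: candidates(name, geo_names) for name in unmatched}

for name, options in suggestions.items():
    print(name, "->", options)
```

Any name that comes back with more than one option, or with none, is one to check manually before renaming.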

For the first chart we will make, I am interested in those who have signed from outside the United Kingdom. However, I would still like the UK to be visible in the visualisation. To deal with this, I have removed the UK's signatures from the total count and set the count and percentage of signatures for the UK to NaN (Not a Number). The reason I have done this is that the signatures are overwhelmingly from the UK, around 96% at the time of writing, and a viz that shows only the UK coloured is pretty useless. Again, I will look exclusively at the UK shortly.

Having looked at the datasets, I have noted that a few countries are missing from the petition data. The following code will look for countries that are not in our petition dataset and append them to it.

In [8]:
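df = gdf.merge(country_data)  # inner merge on the shared "name" column
ndf = gdf[~gdf['name'].isin(df.name.values)].copy()  # countries with no petition entry

# I am being lazy and not looking up the best way to partial match names;
# instead I replaced strings above so they conform with the other dataset

# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat([df, ndf], ignore_index=True)
df['signature_count'] = df['signature_count'].fillna(0)

# Percentages are calculated against the total with the UK's signatures removed
uk_count = df.loc[df['name'] == "United Kingdom", 'signature_count'].item()
non_uk_total = df['signature_count'].sum() - uk_count
df['perc_sig'] = (df['signature_count'] / non_uk_total * 100).round(3)
df['perc_sig'] = np.where(df.name == "United Kingdom", np.nan, df.perc_sig)
df['signature_count'] = np.where(df.name == "United Kingdom", np.nan, df.signature_count)
df['s_perc_sig'] = df.perc_sig.astype(str) + "%"
df[:10]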
Out[8]:
code geometry name signature_count perc_sig s_perc_sig
0 AFG POLYGON ((66.51860680528867 37.36278432875879,... Afghanistan 14.0 0.010 0.01%
1 AGO (POLYGON ((12.99551720546518 -4.78110320396188... Angola 7.0 0.005 0.005%
2 ALB POLYGON ((21.0200403174764 40.84272695572588, ... Albania 11.0 0.007 0.007%
3 ARE POLYGON ((51.57951867046327 24.24549713795111,... United Arab Emirates 1485.0 1.011 1.011%
4 ARG (POLYGON ((-68.63401022758323 -52.636370458874... Argentina 93.0 0.063 0.063%
5 ARM POLYGON ((46.50571984231797 38.77060537368629,... Armenia 4.0 0.003 0.003%
6 AUS (POLYGON ((147.6892594748842 -40.8082581520226... Australia 10524.0 7.162 7.162%
7 AUT POLYGON ((16.97966678230404 48.12349701597631,... Austria 1882.0 1.281 1.281%
8 AZE (POLYGON ((46.40495079934882 41.86067515722731... Azerbaijan 10.0 0.007 0.007%
9 BDI POLYGON ((30.46967364576122 -2.41385475710134,... Burundi 5.0 0.003 0.003%
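
To make the arithmetic easier to follow, here is the same idea on a toy dataset. The names and counts are made up, and it takes a slightly different route from the cell above: a single left merge keeps every country from the shapes table, so there is no separate step to append the missing ones. Treat it as a sketch rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the counts are made up for illustration
shapes = pd.DataFrame({"name": ["United Kingdom", "France", "Spain", "Chad"]})
counts = pd.DataFrame({
    "name": ["United Kingdom", "France", "Spain"],
    "signature_count": [960, 30, 10],
})

# A left merge keeps every row of `shapes`; countries missing from `counts`
# get NaN, which is then replaced with 0
toy = shapes.merge(counts, on="name", how="left")
toy["signature_count"] = toy["signature_count"].fillna(0)

# Percentage of non-UK signatures: the UK is left out of the denominator
is_uk = toy["name"] == "United Kingdom"
non_uk_total = toy.loc[~is_uk, "signature_count"].sum()
toy["perc_sig"] = (toy["signature_count"] / non_uk_total * 100).round(3)

# Mask the UK row with NaN so the colour mapper can give it a neutral colour
toy.loc[is_uk, ["signature_count", "perc_sig"]] = np.nan

print(toy)
```

With 40 non-UK signatures in total, France's 30 works out at 75% and Spain's 10 at 25%, while the UK row holds NaN in both numeric columns.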

The next step is to convert our dataframe to JSON, parse it, and then dump it back to a string, which is the form we will pass to Bokeh's GeoJSONDataSource. After that, we can look at plotting our first interactive viz.

In [9]:
df_json = json.loads(df.to_json())
json_data = json.dumps(df_json)
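
If the two json calls look confusing, this small sketch shows what each one does. The GeoJSON string is a hand-written, hypothetical stand-in for what df.to_json() returns; as far as I know geopandas writes missing values as null by default, which is why perc_sig is null here.

```python
import json

# Hypothetical stand-in for the string returned by df.to_json():
# a tiny GeoJSON FeatureCollection with a single feature
geojson_text = (
    '{"type": "FeatureCollection", "features": [{"type": "Feature", '
    '"properties": {"name": "Example", "perc_sig": null}, '
    '"geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}]}'
)

parsed = json.loads(geojson_text)  # str -> dict, handy for inspecting the data
as_text = json.dumps(parsed)       # dict -> str, the form handed on to Bokeh

first = parsed["features"][0]["properties"]
print(type(parsed).__name__, type(as_text).__name__, first)
```

The parsed dictionary is useful for checking that each feature carries the properties you expect before handing the string on.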

Plotting the data

Now that we have our data ready, we can use Bokeh and its interactive features to create our interactive visualisation. Bokeh has its own way of working, as it aims to combine Python functionality with HTML and JS elements. This makes it a ... pain. But there is plenty of hard-to-read documentation available for you to sink your teeth into if you are looking to learn more, which you can find here: https://bokeh.pydata.org/en/latest/

The cool thing about Bokeh is the set of interactive tools we can use. I have added a hover tool that will show information when you mouse over a country. There are also tools you can use to zoom in on areas of the map for a clearer view, and a reset button to take you back to the original viz.

In [10]:
geosource = GeoJSONDataSource(geojson = json_data)
palette = viridis(256)

color_mapper = LinearColorMapper(palette = palette, low = 0, high = 20,nan_color = '#d9d9d9')
tick_labels = {'0': '0%', '2': '2%', '4':'4%', '6':'6%', '8':'8%', '10':'10%','12':'12%','14':'14%','16':'16%','18':'18%','20':'>20%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 10, major_label_text_font_size="14pt",
border_line_color="white",location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)



hover = HoverTool(
        tooltips=[
            ("Country", "@name"),
            ("Signatures", "@signature_count"),
            ("Percentage", "@s_perc_sig"),
        ])

# Note: plot_height / plot_width were removed in Bokeh 3.0; use height / width there
p = figure(title = 'Revoke Article 50 - Signatures per Country Excluding UK', 
           plot_height = 500 , plot_width = 850, tools=[hover,"pan,wheel_zoom,box_zoom,reset"])


p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :'perc_sig', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)

p.add_layout(color_bar, 'below')
p.axis.visible = False
p.title.text_font_size = '21pt'
output_notebook()

show(p)
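
The colour mapper in the cell above has low = 0 and high = 20, so anything over 20% gets the top colour and NaN values get the grey nan_color. To make that behaviour concrete, here is a plain Python sketch of the idea. It is a simplified illustration and not Bokeh's actual implementation; linear_color and the five-colour palette are hypothetical.

```python
def linear_color(value, palette, low, high, nan_color="#d9d9d9"):
    """Pick a palette entry for `value` on a linear scale between low and high.

    A simplified illustration of the idea, not Bokeh's actual implementation.
    """
    if value is None or value != value:  # None or NaN (NaN is never equal to itself)
        return nan_color
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    fraction = (value - low) / (high - low)
    index = int(fraction * len(palette))
    return palette[min(index, len(palette) - 1)]

# Hypothetical five-colour palette running from dark to light
palette = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]

values = (0, 7.162, 20.9, float("nan"))
colors = [linear_color(v, palette, low=0, high=20) for v in values]
print(colors)
```

This is why the top tick on the colour bar is labelled ">20%": every value past the high end is clamped to the last colour.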

Results

As we can see from the above, most of the signatures from outside the UK come from France, Spain, and Germany, with 20.9%, 11.6%, and 9.2% of non-UK signatures respectively. The Russian bots haven't fired up yet, with just 28 total signatures coming from there. In the English-speaking world, almost 4000 people from the US, 1600 from Canada, 1800 from Ireland, 2500 from Australia, and 1000 from New Zealand have signed.

Let's now take a look at the UK. I won't go over the code again, but you can get the shapefiles for constituency boundaries from here: http://geoportal.statistics.gov.uk/datasets/5ce27b980ffb43c39b012c2ebeab92c0_2

In [11]:
shapefile = 'UK_Constituencies/Westminster_Parliamentary_Constituencies_December_2017_Generalised_Clipped_Boundaries_in_the_UK.shp' # path to folder and file #

uk_gdf = gpd.read_file(shapefile)[['pcon17nm', 'pcon17cd', 'geometry']]
uk_gdf.columns = ['name','ons_code','geometry']
uk_data = all_data['data']['attributes']['signatures_by_constituency']
uk_data = pd.DataFrame(uk_data)
uk_df = uk_gdf.merge(uk_data)


uk_df['perc_sig'] = round((uk_df.signature_count / sum(uk_df.signature_count)*100),3)
uk_df['s_perc_sig'] = uk_df.perc_sig.astype(str) + "%"




uk_df_json = json.loads(uk_df.to_json())
json_data = json.dumps(uk_df_json)

In [12]:
geosource = GeoJSONDataSource(geojson = json_data)

palette = viridis(256)
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 10000,nan_color = '#d9d9d9')

color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 10, major_label_text_font_size="14pt",
border_line_color=None,location = (0,0), orientation = 'horizontal')#, major_label_overrides = tick_labels)



hover = HoverTool(
        tooltips=[
            ("Constituency", "@name"),
            ("Signatures", "@signature_count"),
            ("Percentage", "@s_perc_sig"),
        ])

# Note: plot_height / plot_width were removed in Bokeh 3.0; use height / width there
p = figure(title = 'Revoke Article 50 - Signatures per Constituency', 
           plot_height = 950 , plot_width = 750, tools=[hover,"pan,wheel_zoom,box_zoom,reset"])


p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

p.patches('xs','ys', source = geosource,fill_color = {'field' :'signature_count', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
p.axis.visible = False
p.title.text_font_size = '21pt'
#output_notebook()
                    #uncomment output_notebook() & show(p) to show
#show(p)

Below is a screenshot of the UK results. The interactive version takes way too long to load, so I have commented out the output of the viz. If you want to see it fully interactive, you can follow along with this tutorial and remove the # from output_notebook() and show(p).