Introduction

So far in our codethroughs we've primarily dealt with pandas dataframes created from .csv files. However, we won't always have our data in csv format, particularly if we are getting data from websites. As such, it's key to get used to working with different file formats, and one of the most common we will deal with is JSON.

What is JSON?

JSON is short for JavaScript Object Notation. It is an easy-to-read format that transfers easily between servers and webpages. It is stored as plain text, which keeps it lightweight on memory and storage.
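To make the "stored as text" point a bit more concrete, here is a minimal sketch using python's built-in json module. The string and the variable names (raw_text, parsed, back_to_text) are made up for illustration; json.loads turns JSON text into python objects and json.dumps goes the other way.

```python
import json

# A JSON document is just text. Here it is held in an ordinary python string.
raw_text = '{"name": "Peter", "age": 28, "languages": ["English", "Irish"], "retired": false, "club": null}'

# json.loads parses the text into python objects (dict, list, str, int, bool, None).
parsed = json.loads(raw_text)
print(type(raw_text), type(parsed))
print(parsed["languages"][0])

# json.dumps converts python objects back into JSON text.
back_to_text = json.dumps(parsed)
print(back_to_text)
```

Note how JSON's false and null become python's False and None once the text has been parsed.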

Today we will take a look at understanding, reading, and using JSON files. For this tutorial we will be using Statsbomb's publicly available data, which you can access after accepting their T&Cs here:

https://statsbomb.com/resource-centre/

An example:
example = {"person": [
                        {"name": "Peter",
                         "age": 28,
                         "nationality": "Irish"},

                        {"name": "Paul",
                         "age": 44,
                         "nationality": "German"},

                        {"name": "Stephen",
                         "age": 22,
                         "nationality": "English"},

                        {"name": "Paul",
                         "age": 26,
                         "nationality": "Scottish"}
                         ]          
                  }


If you've taken a look at some of my tutorials before or if you've gone through any intro to python course some things will look familiar to you here.

As a quick recap, in python we create dictionaries by bounding them in { }. Every dictionary has key-value pairs. In the example variable we have the key "person" and the information that follows the : is the value. Within "person", the pattern repeats. "name" is a key, and "Peter" is the corresponding value.

You'll also notice above we have square brackets [ ]. When square brackets wrap a group of values like this in python, they bound a list. So while JSON can look daunting when approaching it for the first time, we are essentially dealing with dictionaries and lists of dictionaries. We can show this by using type:

In [1]:
print(type(example))
example
<class 'dict'>
Out[1]:
{'person': [{'age': 28, 'name': 'Peter', 'nationality': 'Irish'},
  {'age': 44, 'name': 'Paul', 'nationality': 'German'},
  {'age': 22, 'name': 'Stephen', 'nationality': 'English'},
  {'age': 26, 'name': 'Paul', 'nationality': 'Scottish'}]}

As we can see, we are working with a dictionary (class 'dict').

We can quickly see what keys and values we have in our dictionary by using example.keys() and example.values():
In [2]:
example.keys(), example.values()
Out[2]:
(dict_keys(['person']),
 dict_values([[{'name': 'Peter', 'age': 28, 'nationality': 'Irish'}, {'name': 'Paul', 'age': 44, 'nationality': 'German'}, {'name': 'Stephen', 'age': 22, 'nationality': 'English'}, {'name': 'Paul', 'age': 26, 'nationality': 'Scottish'}]]))
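If you want the keys and values together, one option is .items(), which hands back each key-value pair in turn. This is just a small sketch on a cut-down copy of the example dictionary; first_person is a name I've introduced for illustration.

```python
example = {"person": [{"name": "Peter", "age": 28, "nationality": "Irish"},
                      {"name": "Paul", "age": 44, "nationality": "German"}]}

# .items() yields (key, value) pairs, so we can inspect both in one loop.
for key, value in example.items():
    print(key, "->", type(value).__name__, "with", len(value), "items")

# The same works one level down, on a single person dictionary.
first_person = example["person"][0]
for key, value in first_person.items():
    print(key, "=", value)
```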

Because we are dealing with a dictionary, we can get into the data by using the 'person' key. You'll notice the value stored under this key is wrapped in [ ], so we know we are returning a list:

In [3]:
print(type(example['person'])) # Just checking what type of object we are dealing with
example['person']
<class 'list'>
Out[3]:
[{'age': 28, 'name': 'Peter', 'nationality': 'Irish'},
 {'age': 44, 'name': 'Paul', 'nationality': 'German'},
 {'age': 22, 'name': 'Stephen', 'nationality': 'English'},
 {'age': 26, 'name': 'Paul', 'nationality': 'Scottish'}]

We get items from a list by their index (position), counting from zero. So if we want the first item in the list we use [0], and if we want the next item we use [1]. For longer lists it's not practical to count to find the position of each item, so we'll deal with getting specific information later on, but for now I'll just add that we can get the last item by using [-1], the second-last with [-2], and so on.

We'll test this by getting the first and last item from our example:

In [4]:
example['person'][0]
Out[4]:
{'age': 28, 'name': 'Peter', 'nationality': 'Irish'}
In [5]:
example['person'][-1]
Out[5]:
{'age': 26, 'name': 'Paul', 'nationality': 'Scottish'}
In [6]:
print(type(example['person']))
example['person']
<class 'list'>
Out[6]:
[{'age': 28, 'name': 'Peter', 'nationality': 'Irish'},
 {'age': 44, 'name': 'Paul', 'nationality': 'German'},
 {'age': 22, 'name': 'Stephen', 'nationality': 'English'},
 {'age': 26, 'name': 'Paul', 'nationality': 'Scottish'}]
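The positional indexing we just used also extends to slices, which return several items at once. Here is a quick sketch on a plain list of names (the people list is a simplified stand-in for our example, not part of the original data structure):

```python
people = ["Peter", "Paul", "Stephen", "Paul"]

first = people[0]          # first item
last = people[-1]          # last item
second_last = people[-2]   # second-last item

# A slice returns a new list: the start index is included, the stop index is excluded.
first_two = people[:2]
last_two = people[-2:]

# enumerate gives us the position alongside each item, so we don't need to count by hand.
for position, name in enumerate(people):
    print(position, name)
```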

As we should hopefully already know, we can get the information we need from a dictionary by referencing its key. In our example, if we type example['person'][0]['name'] we should receive "Peter" as our output:

In [7]:
example['person'][0]['name']
Out[7]:
'Peter'
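One thing to watch for with key lookups: if the key isn't there, square brackets raise a KeyError. A small sketch of the alternative, .get(), is below. The "club" key is deliberately one that does not exist in our example.

```python
person = {"name": "Peter", "age": 28, "nationality": "Irish"}

# Square-bracket access raises a KeyError when the key is missing.
try:
    person["club"]
except KeyError as err:
    print("Missing key:", err)

# .get() returns None (or a default we choose) instead of raising.
club = person.get("club")
club_or_default = person.get("club", "unknown")
print(club, club_or_default)
```

This becomes handy with real-world JSON, where not every record is guaranteed to carry every key.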

Let's assume now that we want to get the nationalities for each person called Paul. We can do this one at a time, but again, in larger datasets that's just not practical. Instead, we can run a loop to return either the complete dictionary whenever "Paul" is the value of the "name" key, or just the nationality:

In [8]:
all_Paul = []
the_nation_of_Paul = []
for e in example['person']:
    if e['name'] == "Paul":         
        all_Paul.append(e)        ## This will append each dictionary to the all_Paul list
        
    if e['name'] == "Paul":
        the_nation_of_Paul.append(e['nationality'])  ## This will append the nationality of matching records to 
                                                     ## the_nation_of_Paul list
In [9]:
all_Paul
Out[9]:
[{'age': 44, 'name': 'Paul', 'nationality': 'German'},
 {'age': 26, 'name': 'Paul', 'nationality': 'Scottish'}]
In [10]:
the_nation_of_Paul
Out[10]:
['German', 'Scottish']

You will use one or the other or both depending on your use case. I'm purposely avoiding pandas here, but you should get the idea of how you could turn JSON data into a pandas dataframe from the above examples.
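As an aside, the same filtering can be written more compactly with list comprehensions. This is only an alternative sketch of the loop above, not a replacement for it; the lowercase names all_paul and nation_of_paul are mine.

```python
example = {"person": [
    {"name": "Peter", "age": 28, "nationality": "Irish"},
    {"name": "Paul", "age": 44, "nationality": "German"},
    {"name": "Stephen", "age": 22, "nationality": "English"},
    {"name": "Paul", "age": 26, "nationality": "Scottish"},
]}

# Keep the whole dictionary for every person called Paul.
all_paul = [p for p in example["person"] if p["name"] == "Paul"]

# Keep only the nationality for every person called Paul.
nation_of_paul = [p["nationality"] for p in example["person"] if p["name"] == "Paul"]

print(all_paul)
print(nation_of_paul)
```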

Reading Local JSON files

So far in our tutorials we've primarily imported data using pd.read_csv. We're working with json data this time though, so let's get our data by using with open instead.

In [11]:
import json

with open("sb_example_file.json",'r') as f:
    data = json.load(f)  # the with block closes the file for us when it ends

In the brackets above we give the name of the file (the file is in the same folder as this python script; if it's in another location you need to use the full path, e.g. /User/Downloads/sb_example.json), and we use 'r' to state that we want to read the file. If we wanted to read the file as raw bytes we would use 'rb', and if we wanted to save some data to a file we could do this with "w" or "wb" (write).

We don't need to type f.close() here: the with statement closes the file automatically once the indented block finishes, and by then the contents are already stored in the data variable. Let's take a look at the data:
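Since "w" came up above, here is a rough sketch of the write side using json.dump, followed by reading the file straight back. Everything in it is made up for illustration: the squad data, the file name, and the use of a temporary folder so nothing is left behind in your working directory.

```python
import json
import os
import tempfile

squad = [{"player_name": "Example Player", "jersey_number": 9}]  # made-up data

# Write into a temporary folder so we don't clutter the working directory.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "squad_example.json")

# "w" opens the file for writing; json.dump serialises the python object as JSON text.
with open(path, "w", encoding="utf-8") as f:
    json.dump(squad, f, indent=2)

# "r" opens it for reading; json.load parses the text back into python objects.
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == squad)  # the round trip gives us back the same data
print(f.closed)         # the file is closed once the with block has ended
```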

In [12]:
data
Out[12]:
[{'lineup': [{'country': {'id': 220, 'name': 'Sweden'},
    'jersey_number': 16,
    'player_id': 4633,
    'player_name': 'Magdalena Ericsson'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 24,
    'player_id': 4638,
    'player_name': 'Drew Spence'},
   {'country': {'id': 220, 'name': 'Sweden'},
    'jersey_number': 1,
    'player_id': 4640,
    'player_name': 'Rut Hedvig Lindahl'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 14,
    'player_id': 4641,
    'player_name': 'Francesca Kirby'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 4,
    'player_id': 4642,
    'player_name': 'Millie Bright'},
   {'country': {'id': 121, 'name': 'Korea (South)'},
    'jersey_number': 10,
    'player_id': 4647,
    'player_name': 'So-yun Ji'},
   {'country': {'id': 221, 'name': 'Switzerland'},
    'jersey_number': 23,
    'player_id': 4659,
    'player_name': 'Ramona Bachmann'},
   {'country': {'id': 201, 'name': 'Scotland'},
    'jersey_number': 22,
    'player_id': 4660,
    'player_name': 'Erin Cuthbert'},
   {'country': {'id': 220, 'name': 'Sweden'},
    'jersey_number': 20,
    'player_id': 10222,
    'player_name': 'Jonna Andersson'},
   {'country': {'id': 171, 'name': 'Norway'},
    'jersey_number': 18,
    'player_id': 10395,
    'player_name': 'Maren Nævdal Mjelde'},
   {'country': {'id': 249, 'name': 'Wales'},
    'jersey_number': 5,
    'player_id': 15549,
    'player_name': 'Sophie Ingle'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 15,
    'player_id': 15550,
    'player_name': 'Bethany England'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 8,
    'player_id': 15553,
    'player_name': 'Karen Julia Carney'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 7,
    'player_id': 19422,
    'player_name': 'Jessica Carter'}],
  'team_id': 971,
  'team_name': 'Chelsea LFC'},
 {'lineup': [{'country': {'id': 68, 'name': 'England'},
    'jersey_number': 12,
    'player_id': 4643,
    'player_name': 'Georgia Stanway'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 23,
    'player_id': 4648,
    'player_name': 'Abbie McManus'},
   {'country': {'id': 61, 'name': 'Denmark'},
    'jersey_number': 10,
    'player_id': 4650,
    'player_name': 'Nadia Nadim'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 17,
    'player_id': 4654,
    'player_name': 'Nikita Parris'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 24,
    'player_id': 4658,
    'player_name': 'Keira Walsh'},
   {'country': {'id': 40, 'name': 'Canada'},
    'jersey_number': 11,
    'player_id': 4992,
    'player_name': 'Janine Beckie'},
   {'country': {'id': 22, 'name': 'Belgium'},
    'jersey_number': 25,
    'player_id': 10099,
    'player_name': 'Tessa Wullaert'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 1,
    'player_id': 10170,
    'player_name': 'Karen Bardsley'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 8,
    'player_id': 10172,
    'player_name': 'Jill Scott'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 6,
    'player_id': 10185,
    'player_name': 'Stephanie Houghton'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 7,
    'player_id': 15547,
    'player_name': 'Melissa Lawley'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 4,
    'player_id': 15554,
    'player_name': 'Gemma Bonner'},
   {'country': {'id': 68, 'name': 'England'},
    'jersey_number': 15,
    'player_id': 15555,
    'player_name': 'Lauren Hemp'},
   {'country': {'id': 201, 'name': 'Scotland'},
    'jersey_number': 5,
    'player_id': 17524,
    'player_name': 'Jennifer Beattie'}],
  'team_id': 746,
  'team_name': 'Manchester City WFC'}]

You'll notice that at the very beginning of our file we have a [ and at the very end a ]. As we covered earlier, this means we are dealing with a list:

In [13]:
type(data)
Out[13]:
list

Let's take a look at the first item and its lineup:

In [14]:
print(type(data[0])) #Just checking what type of object we are dealing with

data[0]['lineup']
<class 'dict'>
Out[14]:
[{'country': {'id': 220, 'name': 'Sweden'},
  'jersey_number': 16,
  'player_id': 4633,
  'player_name': 'Magdalena Ericsson'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 24,
  'player_id': 4638,
  'player_name': 'Drew Spence'},
 {'country': {'id': 220, 'name': 'Sweden'},
  'jersey_number': 1,
  'player_id': 4640,
  'player_name': 'Rut Hedvig Lindahl'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 14,
  'player_id': 4641,
  'player_name': 'Francesca Kirby'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 4,
  'player_id': 4642,
  'player_name': 'Millie Bright'},
 {'country': {'id': 121, 'name': 'Korea (South)'},
  'jersey_number': 10,
  'player_id': 4647,
  'player_name': 'So-yun Ji'},
 {'country': {'id': 221, 'name': 'Switzerland'},
  'jersey_number': 23,
  'player_id': 4659,
  'player_name': 'Ramona Bachmann'},
 {'country': {'id': 201, 'name': 'Scotland'},
  'jersey_number': 22,
  'player_id': 4660,
  'player_name': 'Erin Cuthbert'},
 {'country': {'id': 220, 'name': 'Sweden'},
  'jersey_number': 20,
  'player_id': 10222,
  'player_name': 'Jonna Andersson'},
 {'country': {'id': 171, 'name': 'Norway'},
  'jersey_number': 18,
  'player_id': 10395,
  'player_name': 'Maren Nævdal Mjelde'},
 {'country': {'id': 249, 'name': 'Wales'},
  'jersey_number': 5,
  'player_id': 15549,
  'player_name': 'Sophie Ingle'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 15,
  'player_id': 15550,
  'player_name': 'Bethany England'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 8,
  'player_id': 15553,
  'player_name': 'Karen Julia Carney'},
 {'country': {'id': 68, 'name': 'England'},
  'jersey_number': 7,
  'player_id': 19422,
  'player_name': 'Jessica Carter'}]

Let's also look at what our keys are (try this yourself; if you are using a lineups file you should get the same result):

In [15]:
 
Out[15]:
dict_keys(['team_id', 'team_name', 'lineup'])
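When a file is nested a few levels deep, checking types and keys by hand gets tedious. Below is a rough sketch of a helper that walks the structure for us. describe is a hypothetical function of my own, not part of any library, and sample is a one-team, one-player cut of the lineups data shown above. It only looks at the first item of each list, on the assumption that the rest share the same shape.

```python
def describe(obj, indent=0):
    """Return a list of lines summarising the nesting of lists and dicts."""
    pad = " " * indent
    if isinstance(obj, dict):
        lines = [f"{pad}dict with keys {sorted(obj)}"]
        for value in obj.values():
            if isinstance(value, (dict, list)):
                lines += describe(value, indent + 2)
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}list of {len(obj)} item(s)"]
        if obj:
            lines += describe(obj[0], indent + 2)  # only inspect the first item
        return lines
    return [f"{pad}{type(obj).__name__}"]


sample = [{"team_id": 971, "team_name": "Chelsea LFC",
           "lineup": [{"player_id": 4633, "player_name": "Magdalena Ericsson",
                       "jersey_number": 16,
                       "country": {"id": 220, "name": "Sweden"}}]}]

summary = describe(sample)
print("\n".join(summary))
```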

Great, now we should be able to get all of the data we want from our lineups file. Let's create a dictionary (not a multi-dict) with each player's id as the key, and some information as the value in each key-value pair.

In [16]:
player_dict = {} # This creates an empty dictionary.

for team in data: #remember data is a list (each team)
    for player in team['lineup']:
        k = player['player_id']  # setting the key
        v = [player['jersey_number'],player['player_name']] # adding values -> we use [ ] as shorthand for a list
        player_dict.update({k:v})
        
player_dict
        
Out[16]:
{4633: [16, 'Magdalena Ericsson'],
 4638: [24, 'Drew Spence'],
 4640: [1, 'Rut Hedvig Lindahl'],
 4641: [14, 'Francesca Kirby'],
 4642: [4, 'Millie Bright'],
 4643: [12, 'Georgia Stanway'],
 4647: [10, 'So-yun Ji'],
 4648: [23, 'Abbie McManus'],
 4650: [10, 'Nadia Nadim'],
 4654: [17, 'Nikita Parris'],
 4658: [24, 'Keira Walsh'],
 4659: [23, 'Ramona Bachmann'],
 4660: [22, 'Erin Cuthbert'],
 4992: [11, 'Janine Beckie'],
 10099: [25, 'Tessa Wullaert'],
 10170: [1, 'Karen Bardsley'],
 10172: [8, 'Jill Scott'],
 10185: [6, 'Stephanie Houghton'],
 10222: [20, 'Jonna Andersson'],
 10395: [18, 'Maren Nævdal Mjelde'],
 15547: [7, 'Melissa Lawley'],
 15549: [5, 'Sophie Ingle'],
 15550: [15, 'Bethany England'],
 15553: [8, 'Karen Julia Carney'],
 15554: [4, 'Gemma Bonner'],
 15555: [15, 'Lauren Hemp'],
 17524: [5, 'Jennifer Beattie'],
 19422: [7, 'Jessica Carter']}

CAUTION: USING UPDATE WILL OVERWRITE THE VALUE OF ANY KEY THAT ALREADY EXISTS IN THE DICTIONARY

As we are working with one dictionary here rather than a multi-dict, each key can appear only once. If a player id already exists, its value will be replaced by the value from the new key-value pair.
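To see the overwriting in action, here is a small sketch that keys on jersey number instead of player id, since two players in our lineups file both wear number 10. The second half shows one possible workaround, storing a list of values per key with setdefault. The names squad and squad_multi are just for illustration.

```python
squad = {}
squad.update({10: "So-yun Ji"})
squad.update({10: "Nadia Nadim"})  # same key: the earlier value is overwritten
print(squad)

# One way to keep both: store a list of values under each key.
squad_multi = {}
for number, name in [(10, "So-yun Ji"), (10, "Nadia Nadim"), (1, "Karen Bardsley")]:
    # setdefault returns the existing list for this key, or creates an empty one first.
    squad_multi.setdefault(number, []).append(name)
print(squad_multi)
```

Player ids are unique in our file, which is why the simple dictionary above is safe for this data.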

Just as a quick recap, we can get the player jersey number or name by using the player id, and the list position [0] or [1]:

In [17]:
print("jersey number & player name: {}".format(player_dict[4633]))
print("jersey number: {}".format(player_dict[4633][0]))
print("player name: {}".format(player_dict[4633][1]))
jersey number & player name: [16, 'Magdalena Ericsson']
jersey number: 16
player name: Magdalena Ericsson
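Remembering that [0] is the jersey number and [1] is the name gets harder as you store more fields. One alternative, sketched below under the assumption that your file has the same keys as the lineups data above, is to make each value a small dictionary so you can ask for fields by name. player_info and the shortened data list are my own illustrative names.

```python
data = [
    {"team_name": "Chelsea LFC", "lineup": [
        {"player_id": 4633, "player_name": "Magdalena Ericsson", "jersey_number": 16,
         "country": {"id": 220, "name": "Sweden"}}]},
    {"team_name": "Manchester City WFC", "lineup": [
        {"player_id": 4643, "player_name": "Georgia Stanway", "jersey_number": 12,
         "country": {"id": 68, "name": "England"}}]},
]

player_info = {}
for team in data:
    for player in team["lineup"]:
        # Each value is itself a dictionary, so fields are looked up by name.
        player_info[player["player_id"]] = {
            "name": player["player_name"],
            "jersey_number": player["jersey_number"],
            "country": player["country"]["name"],  # one level deeper in the JSON
            "team": team["team_name"],
        }

print(player_info[4633]["name"], "-", player_info[4633]["team"])
```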

Reading online JSON files

In [19]:
from bs4 import BeautifulSoup
import requests
import json


#url =  #GO TO THE STATSBOMB LINK AND FOLLOW THE INSTRUCTIONS TO GET THE LINK, THEN GO TO DESIRED FOLDER #
base_url = "https://raw.githubusercontent.com"
response = requests.get(url).text
soup = BeautifulSoup(response,'lxml')
links = soup.find_all("a")
match_urls = []
for link in links:
    if ".json" in str(link):
        l = link['href']
        l = str(l).replace("blob/","")
        match_urls.append(l)

master_data = []
for match in match_urls:
    data = json.loads(requests.get(base_url+match).text)    
    master_data.append(data)
master_data[0][:5] # just showing the first five results of the first file
Out[19]:
[{'away_score': 0,
  'away_team': {'away_team_id': 970, 'away_team_name': 'Yeovil Town LFC'},
  'competition': {'competition_id': 37,
   'competition_name': "FA Women's Super League",
   'country_name': 'England'},
  'data_version': '1.0.3',
  'home_score': 3,
  'home_team': {'home_team_id': 968, 'home_team_name': 'Arsenal WFC'},
  'kick_off': '19:30:00.000',
  'last_updated': '2019-02-24T01:33:44.140837',
  'match_date': '2019-02-20',
  'match_id': 19797,
  'match_status': 'available',
  'referee_name': None,
  'season': {'season_id': 4, 'season_name': '2018/2019'},
  'stadium_name': None},
 {'away_score': 1,
  'away_team': {'away_team_id': 967, 'away_team_name': 'Everton LFC'},
  'competition': {'competition_id': 37,
   'competition_name': "FA Women's Super League",
   'country_name': 'England'},
  'data_version': '1.0.3',
  'home_score': 3,
  'home_team': {'home_team_id': 746, 'home_team_name': 'Manchester City WFC'},
  'kick_off': '19:00:00.000',
  'last_updated': '2019-02-24T18:34:00.333745',
  'match_date': '2019-02-20',
  'match_id': 19798,
  'match_status': 'available',
  'referee_name': None,
  'season': {'season_id': 4, 'season_name': '2018/2019'},
  'stadium_name': None},
 {'away_score': 1,
  'away_team': {'away_team_id': 970, 'away_team_name': 'Yeovil Town LFC'},
  'competition': {'competition_id': 37,
   'competition_name': "FA Women's Super League",
   'country_name': 'England'},
  'data_version': '1.1.0',
  'home_score': 0,
  'home_team': {'home_team_id': 967, 'home_team_name': 'Everton LFC'},
  'kick_off': '15:00:00.000',
  'last_updated': '2019-04-06T11:40:49.819570',
  'match_date': '2019-03-31',
  'match_id': 19808,
  'match_status': 'available',
  'referee_name': None,
  'season': {'season_id': 4, 'season_name': '2018/2019'},
  'stadium_name': None},
 {'away_score': 0,
  'away_team': {'away_team_id': 967, 'away_team_name': 'Everton LFC'},
  'competition': {'competition_id': 37,
   'competition_name': "FA Women's Super League",
   'country_name': 'England'},
  'data_version': '1.0.3',
  'home_score': 1,
  'home_team': {'home_team_id': 969, 'home_team_name': 'Birmingham City WFC'},
  'kick_off': '15:00:00.000',
  'last_updated': '2018-10-31T21:28:32.325274',
  'match_date': '2018-09-09',
  'match_id': 19718,
  'match_status': 'available',
  'referee_name': None,
  'season': {'season_id': 4, 'season_name': '2018/2019'},
  'stadium_name': 'Damson Park'},
 {'away_score': 0,
  'away_team': {'away_team_id': 970, 'away_team_name': 'Yeovil Town LFC'},
  'competition': {'competition_id': 37,
   'competition_name': "FA Women's Super League",
   'country_name': 'England'},
  'data_version': '1.0.3',
  'home_score': 4,
  'home_team': {'home_team_id': 974, 'home_team_name': 'Reading WFC'},
  'kick_off': '15:00:00.000',
  'last_updated': '2019-01-21T13:36:08.988224',
  'match_date': '2018-09-09',
  'match_id': 19716,
  'match_status': 'available',
  'referee_name': None,
  'season': {'season_id': 4, 'season_name': '2018/2019'},
  'stadium_name': 'Adams Park'}]

We haven't spoken about git yet, so for now this is how we will get our files. In the above I've taken each file in the matches folder and appended it to a master_data list. Each file is itself a list of matches, so master_data is a list of lists: if you run the above without [0][:5] at the end of the last line, you will see two [[ at the start of the output. We can get the first match with the following:

In [20]:
master_data[0][0] # The first zero returns our first list, the second zero returns our first match
Out[20]:
{'away_score': 0,
 'away_team': {'away_team_id': 970, 'away_team_name': 'Yeovil Town LFC'},
 'competition': {'competition_id': 37,
  'competition_name': "FA Women's Super League",
  'country_name': 'England'},
 'data_version': '1.0.3',
 'home_score': 3,
 'home_team': {'home_team_id': 968, 'home_team_name': 'Arsenal WFC'},
 'kick_off': '19:30:00.000',
 'last_updated': '2019-02-24T01:33:44.140837',
 'match_date': '2019-02-20',
 'match_id': 19797,
 'match_status': 'available',
 'referee_name': None,
 'season': {'season_id': 4, 'season_name': '2018/2019'},
 'stadium_name': None}
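If you would rather not juggle two indexes, one option is to flatten the list of lists into a single list of matches. Here is a minimal sketch using itertools.chain from the standard library. The master_data below is a tiny stand-in with only the match_id key, and the id in the second inner list is made up.

```python
from itertools import chain

# Stand-in for master_data: one inner list per file, one dictionary per match.
master_data = [
    [{"match_id": 19797}, {"match_id": 19798}],
    [{"match_id": 1}],  # made-up id, just to have a second "file"
]

# chain.from_iterable joins the inner lists end to end.
all_matches = list(chain.from_iterable(master_data))
match_ids = [match["match_id"] for match in all_matches]

print(len(all_matches))
print(match_ids)
```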

Let's see what teams we have in the first file:

In [21]:
teams = []

for match in master_data[0]:
    teams.append(match['home_team']['home_team_name'])
teams = sorted(list(set(teams))) # we can use set to drop duplicates and sorted to sort our list
teams
Out[21]:
['Arsenal WFC',
 'Birmingham City WFC',
 'Brighton & Hove Albion WFC',
 'Bristol City WFC',
 'Chelsea LFC',
 'Everton LFC',
 'Liverpool WFC',
 'Manchester City WFC',
 'Reading WFC',
 'West Ham United LFC',
 'Yeovil Town LFC']
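The loop above only looks at home teams. If a team could appear only as an away side in your file, a sketch like the one below, using set comprehensions and a union, would catch it too. The matches list is a three-match stand-in that keeps only the keys we need.

```python
matches = [
    {"home_team": {"home_team_name": "Arsenal WFC"},
     "away_team": {"away_team_name": "Yeovil Town LFC"}},
    {"home_team": {"home_team_name": "Manchester City WFC"},
     "away_team": {"away_team_name": "Everton LFC"}},
    {"home_team": {"home_team_name": "Everton LFC"},
     "away_team": {"away_team_name": "Yeovil Town LFC"}},
]

# Curly brackets with a for loop inside build a set, which drops duplicates as it goes.
home = {m["home_team"]["home_team_name"] for m in matches}
away = {m["away_team"]["away_team_name"] for m in matches}

# | is the set union: every team that appears in either set.
teams = sorted(home | away)
print(teams)
```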

We'll use the match we returned by running master_data[0][0] for our next example. We already know that the data is saved in a json format and what the base url is, so this time we can skip BeautifulSoup and use requests to access the data directly:

In [23]:
match_id = str(master_data[0][0]['match_id'])
match_url = base_url+GET_LINK_TO_EVENTS_FOLDER_BY_GOING_TO_STATBOMB_LINK_ABOVE+match_id+".json"
response = requests.get(match_url)
match_data = response.json()

Again, I don't want to show the data for the full match, so I will just show six events. If we run match_data we will return the whole file, which begins with [, meaning we are working with a list. The first few events cover lineups and match start, so I will start from index 4 (the fifth item in the list).

In [24]:
match_data[4:10]
Out[24]:
[{'duration': 1.082172,
  'id': '040083bc-7200-4726-b1a9-76f4144e55ef',
  'index': 5,
  'location': [61.0, 41.0],
  'minute': 0,
  'off_camera': False,
  'pass': {'angle': 2.714965,
   'body_part': {'id': 40, 'name': 'Right Foot'},
   'end_location': [50.0, 46.0],
   'height': {'id': 1, 'name': 'Ground Pass'},
   'length': 12.083046,
   'recipient': {'id': 15716, 'name': 'Emily Syme'},
   'type': {'id': 65, 'name': 'Kick Off'}},
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 21290, 'name': 'Erin Bloomfield'},
  'position': {'id': 23, 'name': 'Center Forward'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['6199cdbc-7f79-4bb9-bfd6-792fc466c745'],
  'second': 1,
  'team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'timestamp': '00:00:01.182',
  'type': {'id': 30, 'name': 'Pass'}},
 {'id': '6199cdbc-7f79-4bb9-bfd6-792fc466c745',
  'index': 6,
  'location': [50.0, 46.0],
  'minute': 0,
  'off_camera': False,
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 15716, 'name': 'Emily Syme'},
  'position': {'id': 13, 'name': 'Right Center Midfield'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['040083bc-7200-4726-b1a9-76f4144e55ef'],
  'second': 2,
  'team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'timestamp': '00:00:02.264',
  'type': {'id': 42, 'name': 'Ball Receipt*'}},
 {'duration': 0.920238,
  'id': '3de4ded6-b208-4fec-9afc-4e57113324ed',
  'index': 7,
  'location': [68.0, 33.0],
  'minute': 0,
  'off_camera': False,
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 15623, 'name': 'Vivianne Miedema'},
  'position': {'id': 23, 'name': 'Center Forward'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['33d0c70d-7c0a-40c9-b540-03c1af99d82a'],
  'second': 2,
  'team': {'id': 968, 'name': 'Arsenal WFC'},
  'timestamp': '00:00:02.433',
  'type': {'id': 17, 'name': 'Pressure'}},
 {'duration': 1.216147,
  'id': '33d0c70d-7c0a-40c9-b540-03c1af99d82a',
  'index': 8,
  'location': [49.0, 46.0],
  'minute': 0,
  'off_camera': False,
  'pass': {'angle': -2.819842,
   'body_part': {'id': 40, 'name': 'Right Foot'},
   'end_location': [34.0, 41.0],
   'height': {'id': 1, 'name': 'Ground Pass'},
   'length': 15.811388,
   'recipient': {'id': 15715, 'name': 'Ellie Mason'}},
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 15716, 'name': 'Emily Syme'},
  'position': {'id': 13, 'name': 'Right Center Midfield'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['3de4ded6-b208-4fec-9afc-4e57113324ed',
   '7653bc09-9fe7-4806-b6b1-1b27e13b3ab3'],
  'second': 2,
  'team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'timestamp': '00:00:02.733',
  'type': {'id': 30, 'name': 'Pass'},
  'under_pressure': True},
 {'id': '7653bc09-9fe7-4806-b6b1-1b27e13b3ab3',
  'index': 9,
  'location': [34.0, 41.0],
  'minute': 0,
  'off_camera': False,
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 15715, 'name': 'Ellie Mason'},
  'position': {'id': 4, 'name': 'Center Back'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['33d0c70d-7c0a-40c9-b540-03c1af99d82a'],
  'second': 3,
  'team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'timestamp': '00:00:03.949',
  'type': {'id': 42, 'name': 'Ball Receipt*'}},
 {'duration': 1.436742,
  'id': '3a1af4e5-3758-443e-bea1-e979b89254c9',
  'index': 10,
  'location': [85.0, 41.0],
  'minute': 0,
  'off_camera': False,
  'period': 1,
  'play_pattern': {'id': 9, 'name': 'From Kick Off'},
  'player': {'id': 15623, 'name': 'Vivianne Miedema'},
  'position': {'id': 23, 'name': 'Center Forward'},
  'possession': 2,
  'possession_team': {'id': 970, 'name': 'Yeovil Town LFC'},
  'related_events': ['e162776b-540e-4080-a05a-efd264b523fe'],
  'second': 4,
  'team': {'id': 968, 'name': 'Arsenal WFC'},
  'timestamp': '00:00:04.516',
  'type': {'id': 17, 'name': 'Pressure'}}]

We can have a look at a random event:

In [25]:
match_data[147]
Out[25]:
{'duration': 1.5818,
 'id': '0943608d-6fd3-42e0-ac23-f3980fe663b4',
 'index': 148,
 'location': [74.0, 54.0],
 'minute': 3,
 'off_camera': False,
 'pass': {'angle': -1.2966288,
  'body_part': {'id': 38, 'name': 'Left Foot'},
  'end_location': [83.0, 22.0],
  'height': {'id': 1, 'name': 'Ground Pass'},
  'length': 33.24154,
  'recipient': {'id': 15619, 'name': 'Bethany Mead'}},
 'period': 1,
 'play_pattern': {'id': 1, 'name': 'Regular Play'},
 'player': {'id': 10658, 'name': 'Danielle van de Donk'},
 'position': {'id': 13, 'name': 'Right Center Midfield'},
 'possession': 9,
 'possession_team': {'id': 968, 'name': 'Arsenal WFC'},
 'related_events': ['319f2cc0-f478-4e6c-8631-94e11b1d840d',
  'dbd8e526-67ea-417b-af6e-d3fdce78a32a'],
 'second': 25,
 'team': {'id': 968, 'name': 'Arsenal WFC'},
 'timestamp': '00:03:25.483',
 'type': {'id': 30, 'name': 'Pass'},
 'under_pressure': True}

and get the x,y locations:

In [26]:
print("x value is: {}".format(match_data[147]["location"][0]))
print("y value is: {}".format(match_data[147]["location"][1]))
x value is: 74.0
y value is: 54.0
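Putting the pieces together, here is a rough sketch of how you might pull out every pass with its player and x,y location. The events list is a trimmed-down stand-in for match_data that keeps only a few of the keys shown above. Not every event carries every key (under_pressure, for example, only appears on some), so .get() is used for that one.

```python
events = [
    {"type": {"name": "Pass"}, "player": {"name": "Erin Bloomfield"},
     "location": [61.0, 41.0]},
    {"type": {"name": "Ball Receipt*"}, "player": {"name": "Emily Syme"},
     "location": [50.0, 46.0]},
    {"type": {"name": "Pressure"}, "player": {"name": "Vivianne Miedema"},
     "location": [68.0, 33.0]},
    {"type": {"name": "Pass"}, "player": {"name": "Emily Syme"},
     "location": [49.0, 46.0], "under_pressure": True},
]

# Keep only the events whose type name is "Pass".
passes = [e for e in events if e["type"]["name"] == "Pass"]

# Build (player, x, y) tuples from the filtered events.
pass_locations = [(e["player"]["name"], e["location"][0], e["location"][1])
                  for e in passes]

# under_pressure is missing on most events, so fall back to False.
pressured = [e["player"]["name"] for e in passes if e.get("under_pressure", False)]

print(pass_locations)
print(pressured)
```

On a full match file you would likely need .get() for player and location as well, since the opening events (lineups, match start) may not include them.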

As I already have a parser for Statsbomb available on GitHub, I won't go further here. But the above should give you a good idea of how to get and work with JSON data. As I mentioned, I want to stay away from pandas here and work only with the data as we got it. As a mini-project, you could create a list of data for a specific player's shots (or for a team), specifically x,y values, player/team name, xG value, and shot type, and then use my other tutorials to plot the data. Please note, though, that Statsbomb use different pitch measurements, and these will need to be adjusted to use my tutorials as they are. Also, if you post any viz from Statsbomb data, take a look again at the T&Cs, always give full credit, and say thank you for making this kind of data open for all!
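On the pitch measurements point, the adjustment is usually just a linear rescale of each coordinate. The sketch below is written under two assumptions you should check for yourself: that the source coordinates sit on a 120 x 80 pitch (confirm this against the Statsbomb documentation for your data version), and that the target pitch is 100 x 100, which is purely a placeholder for whatever your plotting code expects. rescale is a hypothetical helper, not part of any library.

```python
def rescale(x, y, source=(120.0, 80.0), target=(100.0, 100.0)):
    """Linearly rescale a point from one pitch size to another.

    source and target are (length, width) pairs. Both defaults are
    assumptions for illustration; substitute your own dimensions.
    """
    new_x = x * target[0] / source[0]
    new_y = y * target[1] / source[1]
    return new_x, new_y


# The location of the event we looked at earlier.
new_x, new_y = rescale(74.0, 54.0)
print(new_x, new_y)
```

Depending on how your plotting code orients the pitch, you may also need to flip the y-axis, which this sketch does not do.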