A few weeks back I posted an initial concept on Twitter for a data collection app that I want to make publicly available. Since then I’ve redone a lot of it and switched from Bokeh to Django for better scalability and functionality. Today I want to go through some of my thoughts and give you some more information about the app.
so this is still a few weeks away from being available, but I will have a web app that will allow you to collect your own xG data for most leagues. There will also be a scratch pad for non team stuff, like if you want to know what the xG of your Sunday League world was 😉 pic.twitter.com/DpU4Y5cMAv - Peter McKeever Ô_ō (@petermckeever) March 6, 2019
OPEN SOURCE FOOTBALL UPDATE: Not the finished product, but I choose some random inputs to give you all an idea of how it looks. There is an editable table for users incase they make a mistake (like I did) and need to edit a field/delete a row :). pic.twitter.com/jzn7KFW8Jw - Peter McKeever Ô_ō (@petermckeever) March 27, 2019
Over the past ten or fifteen years there has been an explosion in research and data surrounding sport. Football was late to the party compared to American sports, but it is catching up. With companies like Opta, Sportradar, InStat, Wyscout, Statsbomb and Stratabet (RIP), there is a myriad of firms pumping out data for clubs and media. However, when it comes to the fans there is a barrier to entry, and that barrier is a financial one. One season of data for one league from Opta costs in the region of €10,000. Unless you are a club or a broadcaster, that’s not a pill you’d be willing to swallow. However, the demand for data for public consumption is there and is growing. Right now there is a thriving community of (f)analysts on Twitter and message boards spending hours of their own time creating tables and charts with no financial incentive. While Statsbomb have done a great service by making some of their data publicly available, the leagues that they and most other collection companies cover are those that can be monetised through consultancy and data services to clubs. This leaves a huge hole: leagues around the world that fans want data for but nobody covers.
Football is the people’s game. It does not care where you are from, what beliefs you hold or how much money you’ve got in your pocket. It is the great includer. Right now, how many kids around the world are cocking their leg for a shot screaming Messi or Ronaldo as they blast the ball at the jumpers or piles of cut grass for goal posts?
For me it was the left-footed wonder that was Roberto Carlos. We were losing 7 – 5 and we only had 5 minutes before Patrick had to go in for dinner and bring his ball. It was a big match – Ashwood v Alpine – rival estates. There is a cluster of trees on the green between our estates and the winners got the rights to use the goal we made by nailing a plank across two trees to make a crossbar. We’d won a free kick. I picked my spot, took 15 steps back, thought about where to hit the ball on the outside of my foot, imagined the goalkeeper as a fat-headed Fabien Barthez, and, copying the step, the shuffle and the long stride of that free kick, belted the ball.
I skied it into someone’s garden and we lost the match and the goal for that day. But it was fun to imagine yourself as a pro.
I was thinking about this a couple of months ago and wondered what the xG of that chance could have been. Of course I’ll never actually know, but it set me down the path of creating a tool to allow anyone to get an expected goals value for a shot once they have a few pieces of information.
This then started to grow into something bigger. I got to thinking about all of those leagues around the world that have no coverage. They have the fans, the players, the academies, but no one providing this kind of data. I was thinking of teams in local or regional leagues, and how cool it would be to have data on your local side: to tell your star striker not to worry, he/she’s just been unlucky, to show them where they’re shooting from, to help coaches help their players improve their game. I feel I’m not alone in this thinking so decided to put this out publicly. Football is a people’s game and the data should be too.
Introducing Open Source Football
Open Source Football is a platform that seeks to crowdsource data collection and allow anyone to gather their own expected goals data. All you need to know is the location and context surrounding a shot to calculate and compile your own expected goals data. This can be used for any league, for competitive fixtures or even in training, and its goal is to support people doing football analytics as well as players and coaches playing the game we love. Let’s have a look.
The above image is the main interface of the application. Again this is still being developed so the menu and extra features have not been migrated onto the platform yet. Above the pitch there are a series of dropdowns to select the country, league, and home and away team of the match you want to collect data for. Below this is an area for setting the date and the time in the match a shot was taken. In the top right, you can select which side this was for and choose your player. For teams that I do not have squad sheets for, there will be a button to manually add a new player to the database.
Below this are our match contexts. The first one is goal difference. Goal difference here is relative to the team taking the shot. For example, if the home team is leading 1 – 0 and you are recording a shot for the home team, this would be set to 1. However, if you are recording a shot for the away team, this should be set to -1.
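As a rough sketch of that rule, the relative goal difference can be derived from the scoreline and the side taking the shot. The function and argument names here are illustrative only and are not taken from the app itself:

```python
def relative_goal_difference(home_goals: int, away_goals: int, shooting_side: str) -> int:
    """Return the goal difference from the point of view of the shooting team.

    shooting_side is assumed to be either "home" or "away" (hypothetical labels).
    """
    if shooting_side not in ("home", "away"):
        raise ValueError("shooting_side must be 'home' or 'away'")
    diff = home_goals - away_goals
    # Flip the sign when the shot belongs to the away team.
    return diff if shooting_side == "home" else -diff


# Home team leads 1 - 0: the same scoreline gives opposite values per side.
print(relative_goal_difference(1, 0, "home"))  # 1
print(relative_goal_difference(1, 0, "away"))  # -1
```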
The next four categories give us some context for the shot: what situation it occurred from; whether it was taken with the left or right foot, was a header, or bounced off the player’s arse, etc.; how the shot was assisted; and what the outcome of the shot was.
Outcome is not used in calculating the expected goal value of the shot, but it’s important information to have.
The last category is optional and is there to provide further context for a shot – these are things I want to analyse and reflect in the model once there is sufficient data, as I feel they can have a huge impact on the value of an attempt. Did the shot come from a 1 v 1 situation? Did it come from a rebound? Did the player strike it first time, or take a touch before shooting? Did the player dribble past an opposing player before taking the shot? You can select more than one option here, or none if none of them fit.
Next we have the pitch. All you need to do here is click the location where the shot came from and the x, y location will be recorded (you can see this below the pitch on the left-hand side).
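To give an idea of what recording a click involves, here is a minimal sketch that scales a pixel position on the pitch image to pitch coordinates. The 105 x 68 metre pitch, the top-left origin and the function name are all assumptions for illustration; the app's real coordinate system may differ:

```python
def click_to_pitch(px: float, py: float, image_width: float, image_height: float,
                   pitch_length: float = 105.0, pitch_width: float = 68.0):
    """Scale a click in image pixels to pitch coordinates.

    Assumes the image shows the whole pitch with the origin in the top-left
    corner, and a 105 x 68 m pitch (assumed defaults, not the app's values).
    """
    if not (0 <= px <= image_width and 0 <= py <= image_height):
        raise ValueError("click is outside the pitch image")
    x = px / image_width * pitch_length
    y = py / image_height * pitch_width
    return round(x, 2), round(y, 2)


# A click in the centre of an 800 x 500 pixel image lands on the centre spot.
print(click_to_pitch(400, 250, 800, 500))  # (52.5, 34.0)
```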
Once all of this information is collected, when you click Calculate xG a table will appear below.
I’m in the middle of re-running the expected goals models now that I have more data, so this is not yet imported. For now the xG column just reads ‘xG’. So in this hypothetical match example, let’s say I forgot to untick the injury time box for Erdmann’s shot. I can either use the x button on the right to delete the shot entirely, or just edit the cell with the error by clicking on it and making the necessary changes:
You’ll also notice an export button below the table. For now this just outputs the table in JSON format. However, on release there will be one button to download the data as JSON and another to download it as a CSV file.
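The two export formats can be sketched with the Python standard library alone. The field names and values below are placeholders, not the app's real column names or real match data, and the sketch assumes every shot record has the same keys:

```python
import csv
import io
import json


def shots_to_json(shots):
    """Serialise a list of shot records (dicts) to a JSON string."""
    return json.dumps(shots, indent=2)


def shots_to_csv(shots):
    """Serialise a list of shot records to CSV text, one row per shot."""
    if not shots:
        return ""
    # Column order follows the first record; all records are assumed to share keys.
    fieldnames = list(shots[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(shots)
    return buffer.getvalue()


# Placeholder records for illustration only.
example_shots = [
    {"player": "Player A", "minute": 12, "x": 88.0, "y": 30.5, "outcome": "saved"},
    {"player": "Player B", "minute": 47, "x": 94.5, "y": 36.0, "outcome": "goal"},
]
print(shots_to_csv(example_shots))
```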
I’m doing some last bits of work on the backend. There will be a members area where users can access data tables which have been compiled by other users. This is moving along nicely. However, I’m still working on some ideas to help keep the data as valid as possible and am open to suggestions.
My line of thinking at the moment is to accept all games into a database and after a match has been compiled by 5 (?) different users, aggregate the results and set its status to confirmed / verified. Of course this will not be possible for all leagues so I will be looking to establish contact with people within leagues to help out in verifying some data, or creating some kind of data share system like lower level clubs do with video.
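One way this idea could work is sketched below. It is only an assumption about how aggregation might be done: shots are matched across users by minute and player, coordinates are combined with the median, and the threshold of 5 is the tentative figure mentioned above. All names are hypothetical:

```python
from collections import defaultdict
from statistics import median

REQUIRED_COMPILERS = 5  # tentative threshold; still an open question


def match_status(submissions):
    """A match counts as verified once enough distinct users have compiled it."""
    users = {submission["user"] for submission in submissions}
    return "verified" if len(users) >= REQUIRED_COMPILERS else "pending"


def aggregate_shots(submissions):
    """Combine the users' versions of each shot using the median location.

    Shots are matched by (minute, player), which is an assumed matching rule.
    """
    grouped = defaultdict(list)
    for submission in submissions:
        for shot in submission["shots"]:
            grouped[(shot["minute"], shot["player"])].append((shot["x"], shot["y"]))
    aggregated = []
    for (minute, player), coords in sorted(grouped.items()):
        aggregated.append({
            "minute": minute,
            "player": player,
            "x": median(c[0] for c in coords),
            "y": median(c[1] for c in coords),
            "reports": len(coords),  # how many users recorded this shot
        })
    return aggregated
```

The median is used here because it is less sensitive than the mean to one user clicking far from where everyone else did.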
Long term, I would like to expand this to make it possible to record all actions in a match, but that’s a much larger project I’m not prepared to take on by myself just yet.
What about the expected goals model?
Right now I would say that the model is indicative only. It was trained on data from some of the top leagues in the world and some lower leagues. However, I could not say that an average PL player taking a shot from 15 meters would have the same expected outcome as an Isthmian League player. Again, with your help in collecting this data, this will improve over time.
The data collection interface will be completely free of charge. You are free to compile and download as many matches as you like.
If you are an amateur team, a youth set up, or a women’s team then please get in touch as I will have further free access available for you. While not planned to be implemented in version 1, future versions will include a tactics board allowing registered users to save squads, lineups, formations, and a tactics book.
I would also like to set up some form of reward system for users who compile a lot of games. I can’t confirm this until I have a few months of usage data to work with but I’d like to give key compilers access to a selected number of leagues.
What's not free?
Access to the database will be at a set membership fee. This will be tiered, with higher tiers gaining access to more leagues. I want to keep the price down as much as possible; as I said, I would love this to be open source. However, maintaining this will be expensive and the costs will only grow as the user base does. I’d also like to grow this into something bigger, and paid membership will help me do that.
I want this platform to remain completely advertisement free to make the user experience enjoyable for everyone. Again, paid memberships will allow me to do this, and I hope this is something you also want and can support.
Further planned premium features for future versions include a viz maker to make communicating data quick and simple to create and share. The tactics board and lineup templates will also be premium features, but a non-saveable one will be available for scroungers, sorry, free users.
How can I help?
I’m just one guy doing this on my own. I’ve added a donate button at the end of this page. If you are able to, I would be grateful for any support you can give, no matter how small. I want to grow this into something larger and meaningful and your help will allow me to achieve that. Donors will also get a special mention in the about section of the platform if they wish, and I will be in touch with each contributor to personally thank them too.