How I Built an English Premier League Scores Tracker with Streamlit

The English Premier League is the most-watched football competition in the world. It is full of suspense, breathtaking moments, and unpredictable events. Few competitions have witnessed the aggression, excitement, and bursts of energy seen in the Premier League. No wonder it has drawn millions of football fans worldwide.

You can take advantage of that popularity by building an app around the Premier League, be it a prediction app or something else.

💡 In this tutorial, we will build a score tracker app in Python to keep track of Premier League scores. The app will also show the Premier League table, top scorers, and assists.

This app does not use an API. We only use web scraping to get real-time updates, and those updates appear only when you refresh the app. The app also has just a few features. With an API, you could add as many features as you like.

There are several reasons I chose not to use an API. Apart from the restrictions that come with free APIs, using one would defeat the purpose of this tutorial, which is to improve your problem-solving and Python skills. Not only will you learn how to perform web scraping, you will also see how I dealt with the unexpected problems it can cause.

Getting Started

There will be four Python scripts in this tutorial, so I suggest creating a folder to hold them. Many programmers like to create a virtual environment for each project. For this project, that is completely optional.

Create a file and call it app.py. This will be the main file. As always, you can check my GitHub page for the full code. Since I will be explaining everything step by step, each snippet only shows the imports it needs, without repeating earlier ones.

We will start by creating the main() function. It uses Streamlit's selectbox() to let users choose a feature from the sidebar, and each option calls its own function.

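import streamlit as st

def main():
    # sidebar header and welcome message
    st.sidebar.header('English Premier League')
    st.sidebar.info('Welcome to the English Premier League. Choose the options below')
    # each option triggers the function that renders that page
    option = st.sidebar.selectbox('Choose your option', ['Scores', 'Table', 'Stats'])
    if option == 'Scores':
        get_scores()
    elif option == 'Table':
        get_table()
    else:
        get_stats()

# Call main() at the very bottom of app.py, after get_scores(),
# get_table() and get_stats() have been defined:
# if __name__ == '__main__':
#     main()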

We import Streamlit, the library that makes everything we are doing here possible. The app has only three features. You are free to add more, but more features mean more code.
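For three options, the if/elif chain is perfectly fine. If you do add more features, one alternative is a dictionary that maps each option to its function. This is a suggestion, not how the original app.py is written. It keeps the selectbox options and the dispatch logic from drifting apart. The sketch below uses hypothetical placeholder functions that return strings, so it runs without Streamlit. In the real app, they would draw widgets instead.

```python
# Placeholder page functions (hypothetical): in app.py these would be
# get_scores(), get_table() and get_stats(), which draw Streamlit widgets.
def get_scores():
    return 'scores page'

def get_table():
    return 'table page'

def get_stats():
    return 'stats page'

# Map every sidebar option to the function that renders it
PAGES = {
    'Scores': get_scores,
    'Table': get_table,
    'Stats': get_stats,
}

def render(option):
    # Look up the function for the chosen option and call it
    return PAGES[option]()

# In app.py the options could then come straight from the dictionary:
# option = st.sidebar.selectbox('Choose your option', list(PAGES))
# render(option)
```

With this layout, adding a page means adding one dictionary entry, and the sidebar picks it up automatically.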

💡 Recommended: Basketball Statistics – Page Scraping Using Python and BeautifulSoup

Premier League Live Scores

If the ‘Scores’ option is selected, the get_scores() function will be called.

from fixtures import df, club_names

def get_scores():
    st.header('English Premier League LiveScores')
    st.info('Check the latest fixtures')
    op = st.radio('Which fixtures do you want to check?', ['All', 'Team'])
    if op == 'All':
        st.dataframe(df)
    else:
        team = st.selectbox('Select a team', club_names)
        team_select = df[(df.Home == team) | (df.Away == team)]
        st.dataframe(team_select)

The get_scores() function has a radio button with two options. The first displays the most recent EPL scores and fixtures, while the second displays only the scores and fixtures of a selected team. Don't compare this with dedicated football websites. The aim here is to see how Python makes all this possible.

Notice that we import df and club_names from the fixtures.py file. The df variable is a Pandas DataFrame with Home and Away columns. The | operator (an element-wise OR) filters it down to the rows where the selected team's name appears in either the Home or the Away column. To better understand this, let's first examine how we got the df variable.
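Here is a small, self-contained illustration of that filter. The sample rows pair up team names from the scraped output shown later in this tutorial. The chosen team is only an example.

```python
import pandas as pd

# Small sample shaped like the fixtures DataFrame (Home and Away columns)
df = pd.DataFrame({
    'Home': ['Nottingham Forest', 'Aston Villa', 'Brentford'],
    'Away': ['Newcastle United', 'Bournemouth', 'Leicester'],
})

team = 'Bournemouth'
# Each comparison yields a boolean Series; | combines them element-wise,
# so a row is kept when the team plays either at home or away
mask = (df.Home == team) | (df.Away == team)
team_select = df[mask]
```

The parentheses matter because | binds more tightly than ==. Python's or keyword would not work here either. It tries to reduce each Series to a single True or False and raises a ValueError.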

Create another file and call it fixtures.py.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

link = 'https://onefootball.com/en/competition/premier-league-9/fixtures'
# getting the data (the timeout stops the app from hanging if the site is slow)
source = requests.get(link, timeout=10).text
# parsing the HTML (the 'lxml' parser must be installed separately)
page = bs(source, 'lxml')

# searching for the date and time of each match
dateTime = page.find_all('div', class_='simple-match-card__match-content')
date_time = []
for tag in dateTime:
    date_time.append(tag.text.strip())

# searching for the team names
team = page.find_all('span', class_='simple-match-card-team__name')
teams = []
for tag in team:
    teams.append(tag.text.strip())

# searching for the scores; split() returns a list, so each score is wrapped in one
score = page.find_all('span', class_='simple-match-card-team__score')
scores = []
for tag in score:
    scores.append(tag.text.split())

We use the requests module to get the content from the given website. After parsing it with BeautifulSoup, we search for div tags with the given CSS class. Then we extract their text, leaving the HTML tags behind, and append it to the date_time list.
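To see what find_all() with class_ returns without hitting the live site, here is a sketch that runs against a tiny hand-written HTML string. The markup only borrows the class names from fixtures.py. The real OneFootball page is much larger, and its structure can change at any time. The sketch uses Python's built-in html.parser, so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hand-written HTML that only mimics the class names used in fixtures.py
html = """
<div class="simple-match-card__match-content"> 17/03/2023  Full time </div>
<span class="simple-match-card-team__name"> Nottingham Forest </span>
<span class="simple-match-card-team__score"> 1 </span>
<span class="simple-match-card-team__name"> Newcastle United </span>
<span class="simple-match-card-team__score"> 2 </span>
"""

# html.parser ships with Python, so no extra parser package is needed here
page = BeautifulSoup(html, 'html.parser')

# class_ (with the trailing underscore) filters tags by CSS class;
# .text drops the tags and strip() trims surrounding whitespace
date_time = [tag.text.strip() for tag in page.find_all('div', class_='simple-match-card__match-content')]
teams = [tag.text.strip() for tag in page.find_all('span', class_='simple-match-card-team__name')]
scores = [tag.text.strip() for tag in page.find_all('span', class_='simple-match-card-team__score')]
```

class_ also matches tags that carry additional classes, as long as one of them is the class you ask for.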

💡 Note: To find the CSS class name, right-click the element in Chrome and choose Inspect to open the developer tools.

We get something like this:

['17/03/2023  Full time',
 '18/03/2023  Full time',
 '18/03/2023  Full time',
 '18/03/2023  Full time',
 '18/03/2023  Full time',
 '18/03/2023  Full time',
 '19/03/2023  Full time',
 '03/05/2023  20:00',
 '03/05/2023  20:00',
 '04/05/2023  20:00',
 …

We search for the teams, repeat the same process, and store the results in the teams list. Here is what we have:

['Nottingham Forest', 'Newcastle United', 'Aston Villa', 'Bournemouth', 'Brentford', 'Leicester', 'Southampton', 'Tottenham', 'Wolves', 'Leeds', …

Repeating the process for the scores, we append them to the scores list. Because split() returns a list, each score ends up wrapped in its own list:

[['1'], ['2'], ['3'], ['0'], ['1'], ['1'], ['3'], ['3'], …

We have already run into a problem.

The First Problem

When I was building this project, I initially searched for a tag that contained all the values at once. I got something like this for each row:

['Nottingham Forest, 1, Newcastle United, 2, 17/03/2023 Full time'],

Displaying something like this to users is not nice at all. The strip() method only removes leading and trailing whitespace, so it couldn't separate the values. Then I applied the split() method. It did separate them, like this:

['Nottingham', 'Forest', '1', 'Newcastle', 'United', '2', '17/03/2023', 'Full', 'time'],

This made the situation even worse. How was I going to deal with clubs with double names? What about postponed matches? What about matches between a club with a single-word name and one with a double name? How could I dynamically recombine the double names before displaying them?
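A quick experiment shows why neither method could fix this on its own. The row below is a made-up string in the shape of the text I originally scraped.

```python
# A made-up row in the shape of the originally scraped text
row = '  Nottingham Forest 1 Newcastle United 2 17/03/2023 Full time  '

# strip() only removes whitespace at both ends; the inside is untouched
stripped = row.strip()

# split() breaks on every run of whitespace, so two-word club names fall apart
tokens = row.split()
```

The tokens list matches the one shown above. Nothing in it marks where one club name ends and the next value begins. That is why searching for separate HTML elements turned out to be the better fix.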

To add insult to injury, while I was still contemplating how to resolve this issue, the HTML structure changed completely once the matches kicked off.

I almost gave up on this project. Remember, I could have used an API, but I didn't want to. Not in this project. Then an idea came up: why not search for a separate HTML element for each piece of data? That is what I did to get the three lists: date_time, teams, and scores.

The Second Problem

Looking at the teams list, the double names finally stay together instead of being split apart. The list looks like this, which is much easier to work with:

['Nottingham Forest', 'Newcastle United', 'Aston Villa', 'Bournemouth']

But how am I going to know which of these teams are playing each other? What about postponed matches?

The Solution

I had to find a workaround by applying problem-solving skills again, this time programmatically. Let's separate the teams list into home and away teams.

# separating the teams: even indexes are home teams, odd indexes are away teams
home = teams[0::2]
away = teams[1::2]

Since each match lists the home team first and the away team second, slicing with a step of 2 does the job. teams[0::2] picks every name at an even index (the home teams), and teams[1::2] picks every name at an odd index (the away teams). Very easy. If you want to know more about slicing in Python, check out this excellent book by Dr. Christian Mayer. It's a must-have for any Python programmer, and it's also free.
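The sketch below uses the first ten names from the scraped teams list shown earlier. It shows what the two slices produce and uses zip() to pair each home team with its opponent. The odd-length check is a suggested extra safeguard, not part of the original fixtures.py. It catches a single missing name but not every possible misalignment.

```python
# First ten team names from the scraped teams list shown earlier
teams = ['Nottingham Forest', 'Newcastle United', 'Aston Villa', 'Bournemouth',
         'Brentford', 'Leicester', 'Southampton', 'Tottenham', 'Wolves', 'Leeds']

# The pairing only holds if the list strictly alternates home/away;
# an odd length means a name is missing and every later pair would shift
if len(teams) % 2 != 0:
    raise ValueError('teams list has an odd length; home/away pairing is unreliable')

# A step of 2 takes every second element
home = teams[0::2]  # indexes 0, 2, 4, ...
away = teams[1::2]  # indexes 1, 3, 5, ...

# zip() pairs each home team with its opponent
fixtures = list(zip(home, away))
```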

# separating the scores in the same way
home_scores = scores[0::2]
away_scores = scores[1::2]

We did the same with the scores list. Then we combine all the new lists into a Pandas DataFrame.

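# converting to Pandas DataFrame
df = pd.DataFrame({'Home': home, 'HG': home_scores, 'AG': away_scores, 'Away': away, 'Date_Time': date_time})

# app.py also imports club_names; one way to build it is a sorted list of
# unique team names (an assumption: the GitHub version may define it differently)
club_names = sorted(set(home + away))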

We decided to leave the date_time values as they are instead of splitting them into separate columns. Alright, the job is done.
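Here are a few optional refinements, none of which are in the original fixtures.py. They are sketched with the first two scraped matches as sample data. First, because split() wraps every score in a list, the HG and AG columns would otherwise display values like ['1']. Unwrapping them gives plain strings. Second, a length check guards against the lists getting out of step, which would make pd.DataFrame raise a ValueError. Third, if you ever do want separate columns, Date_Time can be split once on whitespace. The unwrap() helper and the Date and Status column names are hypothetical.

```python
import pandas as pd

# Sample values in the same shape the scraper produces (first two matches)
home = ['Nottingham Forest', 'Aston Villa']
away = ['Newcastle United', 'Bournemouth']
home_scores = [['1'], ['3']]  # split() wraps each score in a list
away_scores = [['2'], ['0']]
date_time = ['17/03/2023  Full time', '18/03/2023  Full time']

def unwrap(scores):
    # Hypothetical helper: turn [['1'], ['3']] into ['1', '3']; an empty list becomes ''
    return [s[0] if s else '' for s in scores]

columns = {
    'Home': home,
    'HG': unwrap(home_scores),
    'AG': unwrap(away_scores),
    'Away': away,
    'Date_Time': date_time,
}

# pd.DataFrame raises a ValueError if the lists differ in length; checking
# first gives a clearer message that names each list's length
lengths = {name: len(values) for name, values in columns.items()}
if len(set(lengths.values())) != 1:
    raise ValueError(f'Scraped lists have different lengths: {lengths}')

df = pd.DataFrame(columns)

# Optional: split Date_Time once on whitespace into a date and a status
df[['Date', 'Status']] = df['Date_Time'].str.split(n=1, expand=True)
```

The Status column holds either 'Full time' or, for upcoming fixtures, a kick-off time such as '20:00'.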

The Premier League Table

Back in the main() function, getting the Premier League table is very easy. Please check my GitHub page for the other things I added to the table.py file, which we have to create.

After scraping the website, I got this saved in the tab variable:

[['1', 'Arsenal', '28', '22', '3', '3', '40', '69'],
 ['2', 'Manchester', 'City', '27', '19', '4', '4', '42', '61'],
 ['3', 'Manchester', 'United', '26', '15', '5', '6', '6', '50'],
 ['4', 'Tottenham', '28', '15', '4', '9', '12', '49'],
 ['5', 'Newcastle', 'United', '26', '12', '11', '3', '20', '47'],
...

Here is another problem. How are we to combine a club’s double names?

Let’s find the length of each row.

# print the number of items in each scraped row
for i in tab:
    print(len(i))

This reveals an interesting fact: rows for clubs with double names have a length of 9, while the rest have a length of 8. That makes it easy to combine the double names.

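def clean_table(tab):  # function name is illustrative; see table.py on GitHub
    dd = []  # cleaned rows, without the league position
    df = []  # league positions
    for i in tab:
        if len(i) == 8:
            # single-word club name: keep everything after the position
            dd.append(i[1:])
        elif len(i) == 9:
            # double-word club name: join the two words with an f-string
            dd.append([f'{i[1]} {i[2]}', i[3], i[4], i[5], i[6], i[7], i[8]])
        df.append(i[0])
    return dd, df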

Notice how we use an f-string to combine the two parts of a double name. Part of your job as a developer is learning how to deal with challenges or roadblocks like this and find a workaround.
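The length check works for this data. An alternative that doesn't depend on exact lengths, and is not the approach used in table.py, relies on every row ending with six numeric values. The first item is the position, the last six are the stats, and everything in between is the club name, however many words it has. The sample rows are copied from the scraped output above. The column names are an assumption, since the scraped rows carry no headers.

```python
import pandas as pd

# Sample rows copied from the scraped output shown above
tab = [
    ['1', 'Arsenal', '28', '22', '3', '3', '40', '69'],
    ['2', 'Manchester', 'City', '27', '19', '4', '4', '42', '61'],
    ['3', 'Manchester', 'United', '26', '15', '5', '6', '6', '50'],
    ['4', 'Tottenham', '28', '15', '4', '9', '12', '49'],
    ['5', 'Newcastle', 'United', '26', '12', '11', '3', '20', '47'],
]

# Assumed column names; the scraped rows only contain raw values
COLUMNS = ['Club', 'P', 'W', 'D', 'L', 'GD', 'Pts']
N_STATS = 6  # every row ends with six numeric values

def clean_row(row):
    # Position first, six stats last, club name (any number of words) in between
    position, *rest = row
    name_parts, stats = rest[:-N_STATS], rest[-N_STATS:]
    return position, [' '.join(name_parts), *stats]

positions, rows = zip(*(clean_row(r) for r in tab))
table = pd.DataFrame(list(rows), columns=COLUMNS, index=list(positions))
```

A three-word club name would simply produce a longer row and still be joined correctly, whereas the len(i) == 9 branch would skip it.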

The Stats

In the stats.py file, getting the EPL stats works much like what we did before. However, on the site we scrape for this information, the data is already well arranged in HTML tables.

It is exactly the way we want it, so we simply use pandas.read_html() to display the top scorers, assists, and all-time data.
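pandas.read_html() returns a list with one DataFrame per table element it finds. The sketch below parses a small inline table in which the player and club names are placeholders, not real stats. Recent pandas versions warn when you pass a literal HTML string, so it is wrapped in StringIO. With a real page, you would pass the URL instead. Either way, read_html needs lxml (or BeautifulSoup plus html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Made-up placeholder table standing in for a scraped stats page
html = """
<table>
  <thead><tr><th>Player</th><th>Club</th><th>Goals</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>Club X</td><td>20</td></tr>
    <tr><td>Player B</td><td>Club Y</td><td>15</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table>; numeric cells are converted to numbers
tables = pd.read_html(StringIO(html))
top_scorers = tables[0]
```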

Conclusion

We have finally come to the end of this project. You now have an app ready to be deployed on Streamlit Cloud. Simply push the files from your system to a GitHub repository, create an account on Streamlit, and follow the instructions to deploy the app.

You will see other things on my GitHub page that make the deployment as seamless as possible.

Mine is already deployed, so feel free to check it out. What we did is just the tip of the iceberg of what we can do with Python. However, learning this can be a good start to your programming journey. By working on projects, you sharpen your programming skills.

💡 Recommended: How I Created a Football Prediction App on Streamlit