How I Built a Gold Price Prediction App Using Streamlit 🔮

💡 Try It Yourself: Run this app on Streamlit Cloud and predict tomorrow’s GLD price!

Gold, as we all know, is one of the most valued precious metals. Everybody wants to lay their hands on it. Unfortunately, it is also one of the scarcest resources on earth. Thanks to the financial markets, traders can gain exposure to gold without owning the physical metal, for example through gold exchange-traded funds (ETFs) such as SPDR Gold Shares (GLD).

Because GLD trades on the stock exchange like an ordinary share, forecasts of where the Gold price is headed are in high demand, especially accurate ones. 🥇

In this tutorial, we will create a Streamlit app that uses a Machine Learning model to make such predictions. The model will predict the next trading day’s Gold ETF (GLD) price using past GLD prices.

Creating practical coding projects like this is one of the best ways to improve our Python skills. Do not take this as financial advice: trading is risky and should only be done with a solid understanding of the financial markets.

The Model

Let’s first download the data using yfinance, a Python library that fetches market data from Yahoo Finance.

import matplotlib.pyplot as plt
import pandas as pd
import yfinance as yf
data = yf.download('GLD', '2008-01-01', '2023-01-01', auto_adjust=True)

We fetch 15 years of GLD ETF price data (from 2008 to the start of 2023) and store it in the data variable. Next, we keep only the column we need, Close, in a separate DataFrame and plot the price series.

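# keep only the Close column; .copy() lets us add feature columns later without a SettingWithCopyWarning
df = data[['Close']].copy()
plt.style.use('classic')
data.Close.plot(figsize=(10, 7), color='r')
plt.ylabel("Gold ETF Prices")
plt.title("Gold ETF Price Series")
plt.show()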

Next, let’s derive some extra features from the Close data to help the model perform better: rolling means over several window lengths.

df['weekly_mean'] = df.Close.rolling(window=7).mean()
df['monthly_mean'] = df.Close.rolling(window=30).mean()
df['quarterly_mean'] = df.Close.rolling(window=90).mean()
df['yearly_mean'] = df.Close.rolling(window=365).mean()

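These rolling means summarize the price trend over progressively longer periods, giving the model context about where the price has been. Note that each window counts rows, and each row is a trading day, so the labels are approximate: the 7-row "weekly" window, for example, spans about a week and a half of calendar time.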
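To see why the rolling means leave empty rows at the start, here is a small sketch on made-up prices (not GLD data). With window=3, the first two rows have no complete window and come out as NaN. In the real data, window=365 leaves the first 364 rows without a yearly_mean, and dropna() removes them later.

```python
import pandas as pd

# Toy closing prices standing in for GLD data (made-up values, not real prices)
close = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0])

# A 3-row rolling mean averages each value with the two rows before it;
# the first window - 1 rows have no complete window, so they are NaN
rolling_3 = close.rolling(window=3).mean()
print(rolling_3.tolist())
# [nan, nan, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
```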

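Next, we create the target by shifting the Close column with the shift() method. Calling shift(-1) moves every value up one row, so each row holds the following trading day’s closing price. This pairs each day’s features with the next day’s price, which is what we want to predict. Then, we drop all rows with null values: the first rows lack complete rolling windows, and the last row has no next-day price yet.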

df['next_day_price'] = data.Close.shift(-1)
df = df.dropna()
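A tiny example with made-up prices makes the direction of shift(-1) concrete: each row ends up paired with the following row’s close, and the final row, which has no next day, is dropped.

```python
import pandas as pd

# Made-up closing prices indexed by date (illustration only)
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0]},
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
)

# shift(-1) pulls each value up one row, so every row holds the NEXT day's close
prices["next_day_price"] = prices["Close"].shift(-1)

# The last row has no "tomorrow" yet, so its target is NaN and dropna() removes it
prices = prices.dropna()
print(prices)
```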

Next, we split the dataset into dependent and independent variables. The dependent variable (the target) is the next day’s Gold ETF price we want to predict; the independent variables (the features) are the four rolling means used to predict it.

X = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']]
target = df['next_day_price']

We standardize the features (rescale them to zero mean and unit variance) for stable and fast training of the model. Then, we split the data into training and test sets, using 80% for training and the remaining 20% for testing.

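from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# standardize the four features
scaler = StandardScaler()
features = scaler.fit_transform(X)

# 80% train / 20% test
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=.2, random_state=0)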
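One caveat worth knowing: train_test_split shuffles rows by default, so the test days end up scattered between training days, and the scaler above is fitted on all rows, test rows included, before the split. For time-ordered data like prices, a common alternative is to hold out the most recent 20% of days and fit the scaler on the training rows only. The sketch below shows that variant on synthetic data. It is an alternative, not the setup that produced the scores in this article, and any scores would have to be recomputed under it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four rolling-mean features (100 days x 4 columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).cumsum(axis=0)
target = X[:, 0] + rng.normal(scale=0.1, size=100)

# shuffle=False keeps chronological order: first 80% train, last 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.2, shuffle=False
)

# Fit the scaler on training rows only, then reuse its statistics for the test rows
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```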

We don’t know in advance which model will perform best on this dataset. Hence, we will train several different models and select the one with the lowest Mean Absolute Error (MAE).

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error


# candidate models, evaluated in this order
models = [
    LinearRegression(),
    KNeighborsRegressor(),
    RandomForestRegressor(),
    DecisionTreeRegressor(),
]
# fit each candidate on the training set and print its test-set MAE
for model in models:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(mean_absolute_error(y_test, preds))

Output (one MAE per model, in the order listed: Linear Regression, KNN, Random Forest, Decision Tree):

1.6647659694002805
1.2136806728201068
1.1876351184761191
1.4823716304661763
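The scores are easier to read when each one carries its model’s name. The following sketch uses synthetic data from make_regression rather than GLD prices. It stores each MAE in a dictionary keyed by a name of our choosing and picks the smallest one programmatically. random_state is fixed on the tree-based models so repeated runs give the same scores. Whichever model wins on this synthetic data says nothing about which one wins on GLD.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data stands in for the GLD features (assumption for illustration)
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "Linear Regression": LinearRegression(),
    "KNN": KNeighborsRegressor(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

# Fit each candidate and record its test-set MAE under its name
scores = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, estimator.predict(X_test))

# Pick the name with the smallest MAE and print a ranked table
best_name = min(scores, key=scores.get)
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {mae:.4f}")
print("Best:", best_name)
```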

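The Random Forest model has the lowest MAE (about 1.19), narrowly ahead of KNN (about 1.21), so we will select it. Keep in mind that RandomForestRegressor is randomized: without a fixed random_state, its score will vary slightly from run to run. Create a model.py file and add the following to it.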

import yfinance as yf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import joblib
import warnings

warnings.filterwarnings('ignore')


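# gold.csv holds the GLD data downloaded earlier with yfinance
data = pd.read_csv('gold.csv')
# select the Close data (copy so new columns can be added safely)
df = data[['Close']].copy()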

# add extra features
df['weekly_mean'] = data.Close.rolling(window=7).mean()
df['monthly_mean'] = data.Close.rolling(window=30).mean()
df['quarterly_mean'] = data.Close.rolling(window=90).mean()
df['yearly_mean'] = data.Close.rolling(window=365).mean()

# add the target variable
df['next_day_price'] = data.Close.shift(-1)
df = df.dropna()

# define independent variable
X = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']]

# define dependent variable
target = df['next_day_price']

# normalize the data
scaler = StandardScaler()
features = scaler.fit_transform(X)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=.2, random_state=0)

# define the Random Forest model
model = RandomForestRegressor()
model.fit(X_train, y_train)

joblib.dump(model, 'model.pkl')

We saved the data fetched from Yahoo Finance to gold.csv beforehand so that we can load it easily; check my GitHub page for the full code. Finally, joblib.dump() saves the trained model to model.pkl, which the app will load to make predictions.
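Note that model.py fits a StandardScaler but saves only the model, so the app has no access to the scaling statistics used during training. One option, sketched below as an alternative to the article’s approach, is to wrap the scaler and the model in a scikit-learn Pipeline and dump that single object with joblib. The synthetic data and the file name gold_pipeline.pkl are placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features/target in place of the GLD rolling means (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(loc=150.0, scale=20.0, size=(200, 4))
y = X.mean(axis=1) + rng.normal(scale=1.0, size=200)

# The pipeline stores the scaler's training mean/std together with the fitted forest
pipeline = make_pipeline(StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=0))
pipeline.fit(X, y)

# Dump and reload; "gold_pipeline.pkl" is a hypothetical file name
path = os.path.join(tempfile.mkdtemp(), "gold_pipeline.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# The reloaded pipeline scales raw features with the saved statistics, then predicts
new_rows = X[:3]
print(restored.predict(new_rows))
```

With this design, the app would pass the raw rolling means to restored.predict() and would not need its own StandardScaler.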

Predicting the Gold ETF Price 🔮

To predict the next day’s Gold ETF price, we repeat the feature-engineering steps on freshly downloaded data and then pass the result to the pickled model. Create another file in the same folder and name it app.py. This file holds the Streamlit application.
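Because app.py repeats the feature engineering from model.py, one option is to keep it in a single helper that both files use, so the two copies cannot drift apart. The names add_features, WINDOWS, and FEATURES below are hypothetical and not part of the original code. The windows match the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical shared constants; windows match the article's rolling means
WINDOWS = {"weekly_mean": 7, "monthly_mean": 30, "quarterly_mean": 90, "yearly_mean": 365}
FEATURES = list(WINDOWS)


def add_features(close: pd.Series) -> pd.DataFrame:
    """Return a DataFrame with the Close column plus one rolling mean per window."""
    df = close.to_frame(name="Close")
    for column, window in WINDOWS.items():
        df[column] = df["Close"].rolling(window=window).mean()
    return df


# Usage on synthetic prices: 400 rows leave 400 - 364 = 36 rows with every feature
close = pd.Series(np.linspace(100.0, 140.0, 400))
features = add_features(close).dropna()
print(features[FEATURES].tail(1))
```

model.py would add the next_day_price target on top of this output. The app would feed features[FEATURES] straight to the model.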

import streamlit as st
import pandas as pd
import numpy as np
import joblib
from datetime import datetime
from sklearn.ensemble import RandomForestRegressor
import yfinance as yf
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')


def main():
    option = st.sidebar.selectbox('Make a choice', ['Visualize','Recent Data', 'Predict'])
    data = download_data()
    if option == 'Visualize':
        visualize_data(data)
    elif option == 'Recent Data':
        dataframe(data)
    else:
        predict(data)

@st.cache_data(ttl=3600)  # cache the download; refresh it at most once an hour
def download_data():
    df = yf.download('GLD', start='2008-01-01', end=datetime.now(), progress=False)
    return df

I kept this app simple. The user doesn’t enter any values, only picks an option from the sidebar, and everything else happens behind the scenes. main() first calls download_data() to fetch and return the Gold price data.

It then passes that DataFrame to one of three functions, depending on the option selected.

The @st.cache_data decorator caches the downloaded DataFrame, which saves us from calling the Yahoo Finance API every time Streamlit reruns the script. With ttl=3600, the cache expires after an hour so the data stays reasonably current. We fetch the data from 2008 up to the current date so the model can predict the next day’s price.

scaler = StandardScaler()
model = joblib.load('model.pkl')

Next, we load the pickled model. Remember that we standardized the features before training the model, so we have to standardize the new data as well.
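One subtlety: the predict() function calls fit_transform on the downloaded data. That recomputes the mean and standard deviation from the downloaded data instead of reusing the statistics the model was trained with. The sketch below uses made-up numbers to show the effect. Refitting on a later, higher-priced period re-centers it around zero, so it looks just like the training period. Reusing the training scaler keeps the shift visible. Reusing it in the app assumes the scaler was saved (for example with joblib), which the original model.py does not do.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up "training era" prices and a later, higher-priced period (illustration only)
train_prices = np.array([[100.0], [110.0], [120.0], [130.0]])
recent_prices = np.array([[150.0], [160.0], [170.0], [180.0]])

# Scaler fitted once on the training data, as model.py does
train_scaler = StandardScaler().fit(train_prices)

# Reusing the training statistics keeps the price shift visible to the model
reused = train_scaler.transform(recent_prices)

# Refitting on the new data re-centers it to mean 0, hiding the shift
refit = StandardScaler().fit_transform(recent_prices)

print(reused.ravel())
print(refit.ravel())
```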

The next two functions are quite simple. The first uses a line chart to visualize the Close price.

def visualize_data(data):
    st.header('The Close Price')
    st.line_chart(data.Close)

To see recent data, we call the next function, which displays the last 10 rows.

def dataframe(data):
    st.header('Recent Data')
    st.dataframe(data.tail(10))

The predict() function repeats the feature-engineering steps from model.py on the freshly downloaded data, then uses the loaded model to predict prices.

def predict(data):
    df = data[['Close']].copy()

    df['weekly_mean'] = df.Close.rolling(window=7).mean()
    df['monthly_mean'] = df.Close.rolling(window=30).mean()
    df['quarterly_mean'] = df.Close.rolling(window=90).mean()
    df['yearly_mean'] = df.Close.rolling(window=365).mean()

    df = df.dropna()
    # forecast the price

    features = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']].values

    scaler = StandardScaler()
    features = scaler.fit_transform(features)

    df['predicted_gold_price'] = model.predict(features)
    df['signal'] = np.where(df.predicted_gold_price.shift(1) < df.predicted_gold_price,"Buy","No Position")

    prediction = df.tail(1)[['signal','predicted_gold_price']].T
    st.header('Gold Price Prediction')
    st.write("Today's Price")
    st.dataframe(data.Close.tail(1))
    st.write('Next Day Predicted Price')
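    st.dataframe(prediction)


# Streamlit runs the script from top to bottom, so start the app here
if __name__ == '__main__':
    main()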

We add a signal column that compares each day’s predicted price with the previous day’s. If the prediction has risen, the signal is "Buy"; otherwise it is "No Position".

The above screenshot is taken from the app running on Streamlit Cloud. The app tells us that the Gold price is expected to fall and shows the predicted price. The real price may well move differently, but at least we get a rough idea of the expected direction.
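The Buy / No Position rule can be checked on a handful of made-up predicted prices. The first row is always "No Position" because it has no previous prediction to compare with. A flat prediction also yields "No Position".

```python
import numpy as np
import pandas as pd

# Made-up predicted prices for five consecutive days (illustration only)
df = pd.DataFrame({"predicted_gold_price": [170.0, 171.5, 171.0, 172.0, 172.0]})

# Same rule as predict(): "Buy" when today's prediction exceeds yesterday's
df["signal"] = np.where(
    df.predicted_gold_price.shift(1) < df.predicted_gold_price, "Buy", "No Position"
)
print(df["signal"].tolist())
# ['No Position', 'Buy', 'No Position', 'Buy', 'No Position']
```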

Conclusion

That’s how I created the Gold price prediction app and got it running on Streamlit Cloud. The full code is available on my GitHub page. I hope you found this project tutorial useful. We saw how to train a model on historical GLD prices and use it to make predictions in a Streamlit app.

Use this knowledge to create your own app and improve your Python skills.