💡 Try It Yourself: Run this app on Streamlit Cloud and predict the GLD price for tomorrow!
Gold, as we all know, is one of the most valued precious metals. Everybody wants to lay their hands on it. Unfortunately, it is also one of the scarcest resources on earth. Thanks to financial markets, traders can gain exposure to gold through instruments such as the SPDR Gold Shares ETF (GLD) without holding the physical metal.
Because GLD trades on the open market, forecasts of where the Gold price is headed are in high demand, especially when they are accurate. 🔥
In this tutorial, we will create a Streamlit app that uses a Machine Learning model to make such predictions. The model will predict the next day’s Gold price using the past Gold ETF (GLD) prices.
Creating practical coding projects like this is one of the best ways we can improve our Python skills. Do not take this as financial advice. Trading is risky and should be done with full financial market knowledge.
The Model
Let’s first download the data using the Yahoo Finance Python module.
import matplotlib.pyplot as plt
import pandas as pd
import yfinance as yf

# Download 15 years of daily GLD prices
data = yf.download('GLD', '2008-01-01', '2023-01-01', auto_adjust=True)
We fetch GLD ETF price data for the past 15 years and store it in the data variable. Next, we will take only the column we need and store it in a separate variable.
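# Keep only the Close column; .copy() avoids pandas' SettingWithCopyWarning
# when new columns are added to df later
df = data[['Close']].copy()

# Plot the Close price series
plt.style.use('classic')
data.Close.plot(figsize=(10, 7), color='r')
plt.ylabel("Gold ETF Prices")
plt.title("Gold ETF Price Series")
plt.show()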
Let’s derive some features from the Close data to help the model perform better. We will add rolling means over several window lengths.
# Rolling means of the Close price over 7, 30, 90 and 365 rows
df['weekly_mean'] = df.Close.rolling(window=7).mean()
df['monthly_mean'] = df.Close.rolling(window=30).mean()
df['quarterly_mean'] = df.Close.rolling(window=90).mean()
df['yearly_mean'] = df.Close.rolling(window=365).mean()
With the above, the model can evaluate current prices against recent ones.
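To see what a rolling mean produces, here is a minimal sketch on a made-up five-row series (the values are invented for illustration, not GLD prices). It shows that the first `window - 1` rows have no value yet, which is why rows get dropped later with `dropna()`.

```python
import pandas as pd

# Made-up closing prices, one row per trading day
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Close")

# Each value is the mean of the current row and the two rows before it
rolling_3 = close.rolling(window=3).mean()

print(rolling_3)
# The first two values are NaN; the rest are 11.0, 12.0 and 13.0
```

Note that the window counts rows, not calendar days. Since the data has one row per trading day, `window=7` spans seven trading days rather than one calendar week, and `window=365` leaves the first 364 rows empty.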
Next, we build the target with the DataFrame.shift() method. Calling shift(-1) moves the Close values up one row, so each row holds the following trading day’s price. Remember, we are predicting the next day’s Gold price, so each day’s features must be paired with the next day’s close rather than the same day’s. Then, we drop all null values.
df['next_day_price'] = data.Close.shift(-1)
df = df.dropna()
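The effect of `shift(-1)` is easiest to see on a tiny frame. The sketch below uses invented prices: each row's `next_day_price` is the `Close` of the row after it, and the last row has no next day, so it becomes NaN and is removed by `dropna()`.

```python
import pandas as pd

# Invented closing prices for four consecutive trading days
df = pd.DataFrame({"Close": [100.0, 101.0, 99.0, 102.0]})

# Pair each day with the following day's close
df["next_day_price"] = df["Close"].shift(-1)
print(df)

# The last row has no following day, so it is dropped
cleaned = df.dropna()
print(len(cleaned))  # 3
```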
Next, we divide the dataset into dependent and independent variables. The dependent variable is the next day’s Gold ETF price we want to predict, and the independent variables are the rolling means used to predict it.
X = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']]
target = df['next_day_price']
We will normalize the data for stable and fast training of the model. Then, we split the data into train and test sets, using 80% for training and the remaining 20% for testing.
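from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
features = scaler.fit_transform(X)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=.2, random_state=0)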
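The split above shuffles the rows and fits the scaler on all of them. With time-ordered prices and overlapping rolling windows, neighbouring days can then land in both sets, which may make the test error look better than it would on unseen future data. A possible variation, not what this tutorial does, is to split chronologically and fit the scaler on the training rows only. The sketch below uses synthetic random arrays in place of the real features and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # synthetic stand-in for the four rolling means
y = rng.normal(size=100)       # synthetic stand-in for next_day_price

# shuffle=False keeps the row order, so the test set is the last 20% of rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on test rows

print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```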
We don’t know which model will perform well for this dataset. Hence, we will evaluate several models and select the one with the lowest Mean Absolute Error (MAE) score.
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

models = [
    LinearRegression(),
    KNeighborsRegressor(),
    RandomForestRegressor(),
    DecisionTreeRegressor(),
]

# Fit each model and print its MAE on the test set, in list order
for model in models:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(mean_absolute_error(y_test, preds))
Output:
1.6647659694002805
1.2136806728201068
1.1876351184761191
1.4823716304661763
The Random Forest model (third in the list) has the lowest MAE score, so we will select it. Create a model.py file and add the following to it.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
import joblib
import warnings

warnings.filterwarnings('ignore')

data = pd.read_csv('gold.csv')

# select the Close data
df = data[['Close']].copy()

# add extra features
df['weekly_mean'] = data.Close.rolling(window=7).mean()
df['monthly_mean'] = data.Close.rolling(window=30).mean()
df['quarterly_mean'] = data.Close.rolling(window=90).mean()
df['yearly_mean'] = data.Close.rolling(window=365).mean()

# add the target variable
df['next_day_price'] = data.Close.shift(-1)
df = df.dropna()

# define independent variables
X = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']]

# define dependent variable
target = df['next_day_price']

# normalize the data
scaler = StandardScaler()
features = scaler.fit_transform(X)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=.2, random_state=0)

# define and train the random forest model
model = RandomForestRegressor()
model.fit(X_train, y_train)

# save the trained model to disk
joblib.dump(model, 'model.pkl')
We save the data fetched using Yahoo Finance to gold.csv so that we can easily import it. Check my GitHub page for the full code. We also save the pickled model to model.pkl to be used in making predictions.
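The step that writes gold.csv is not shown in this section, so here is a hedged sketch of how such a round trip might look. It uses a small synthetic DataFrame with a date index as a stand-in for the downloaded data (an assumption; the real frame has more columns) and writes to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for the DataFrame returned by yf.download
dates = pd.date_range("2022-01-03", periods=5, freq="B", name="Date")
data = pd.DataFrame({"Close": [170.0, 169.5, 171.2, 172.0, 171.8]}, index=dates)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gold.csv")
    data.to_csv(path)           # the Date index is written as the first column
    loaded = pd.read_csv(path)  # Date comes back as an ordinary column

print(loaded.columns.tolist())
```

After reloading, `loaded.Close` is still available, which is all that model.py relies on.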
💡 Recommended: How to Save and Load Machine Learning Models in Python
Predicting the Gold ETF Price 🔮
To predict the next day’s Gold ETF price, we repeat the previous feature-engineering steps and then use the pickled model to make the prediction. Create another file in the folder and name it app.py. This is for the Streamlit application.
import streamlit as st
import pandas as pd
import numpy as np
import joblib
from datetime import datetime
from sklearn.ensemble import RandomForestRegressor
import yfinance as yf
from sklearn.preprocessing import StandardScaler
import warnings

warnings.filterwarnings('ignore')


def main():
    # sidebar menu that selects which view to render
    option = st.sidebar.selectbox('Make a choice', ['Visualize', 'Recent Data', 'Predict'])
    data = download_data()
    if option == 'Visualize':
        visualize_data(data)
    elif option == 'Recent Data':
        dataframe(data)
    else:
        predict(data)


@st.cache_resource
def download_data():
    # fetch GLD prices from 2008 up to the current date
    df = yf.download('GLD', start='2008-01-01', end=datetime.now(), progress=False)
    return df
I kept the app simple. The user does not need to input any value, only to select one of the options and view the result. Everything is done behind the scenes. We first call the download_data() function to fetch and return the Gold price data.
We then pass the result as an argument to whichever of the other three functions matches the selected option.
The @st.cache_resource decorator caches the returned result, saving us from making repeated calls to the Yahoo Finance API. We fetch the data from 2008 to the current date to be used to predict the next day’s price.
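The idea behind the caching decorator can be illustrated without Streamlit. The sketch below is only an analogy using the standard library's `functools.lru_cache`, not Streamlit's implementation, and the `download_data` here is a hypothetical stand-in that counts how often its body runs instead of calling Yahoo Finance.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def download_data(ticker):
    """Hypothetical stand-in for the real download; counts how often the body runs."""
    global call_count
    call_count += 1
    return (ticker, "price data")


first = download_data("GLD")
second = download_data("GLD")  # served from the cache, the body does not run again

print(call_count)  # 1
```

Two things are worth keeping in mind. Streamlit's documentation recommends `st.cache_data` for serializable data such as DataFrames and `st.cache_resource` for shared resources such as models or database connections, so `st.cache_data` may be the closer fit here. Also, because the result is cached, the end date is only evaluated when the function body actually runs; both decorators accept a `ttl` argument if the data should be refreshed periodically.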
scaler = StandardScaler()
model = joblib.load('model.pkl')
We then load the pickled model. Remember that we normalized the training data before fitting the model, so we have to normalize this data as well.
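One caveat: a scaler fitted on the app's data will generally learn a different mean and standard deviation than the one used at training time, so the model may see inputs on a different scale. A possible alternative, not what this tutorial does, is to save the fitted scaler alongside the model and reuse it. The sketch below uses synthetic arrays, and scaler.pkl is a hypothetical filename.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_features = rng.normal(loc=120.0, scale=10.0, size=(200, 4))  # synthetic training data
new_features = rng.normal(loc=170.0, scale=5.0, size=(50, 4))      # synthetic later data

# Fit the scaler once, at training time
train_scaler = StandardScaler().fit(train_features)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.pkl")  # hypothetical filename
    joblib.dump(train_scaler, path)
    loaded_scaler = joblib.load(path)

# Same statistics as training: the new data keeps its offset from the training data
same_scale = loaded_scaler.transform(new_features)

# Refitting on the new data recentres it around zero and hides that offset
refit_scale = StandardScaler().fit_transform(new_features)

print(same_scale.mean(axis=0).round(1))
print(refit_scale.mean(axis=0).round(1))
```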
The next two functions are quite simple. We use a line chart to visualize the Close price.
def visualize_data(data):
    st.header('The Close Price')
    st.line_chart(data.Close)
To see recent data, we call the next function which displays the last 10 rows.
def dataframe(data):
    st.header('Recent Data')
    st.dataframe(data.tail(10))
The predict() function is where we repeat the feature-engineering steps used to train the model and then make the prediction.
def predict(data):
    # copy so that adding columns does not trigger SettingWithCopyWarning
    df = data[['Close']].copy()
    df['weekly_mean'] = df.Close.rolling(window=7).mean()
    df['monthly_mean'] = df.Close.rolling(window=30).mean()
    df['quarterly_mean'] = df.Close.rolling(window=90).mean()
    df['yearly_mean'] = df.Close.rolling(window=365).mean()
    df = df.dropna()

    # forecast the price
    features = df[['weekly_mean', 'monthly_mean', 'quarterly_mean', 'yearly_mean']].values
    scaler = StandardScaler()
    features = scaler.fit_transform(features)
    df['predicted_gold_price'] = model.predict(features)

    # "Buy" when the prediction is higher than the previous day's prediction
    df['signal'] = np.where(df.predicted_gold_price.shift(1) < df.predicted_gold_price, "Buy", "No Position")
    prediction = df.tail(1)[['signal', 'predicted_gold_price']].T

    st.header('Gold Price Prediction')
    st.write("Today's Price")
    st.dataframe(data.Close.tail(1))
    st.write('Next Day Predicted Price')
    st.dataframe(prediction)


# entry point: without this call, Streamlit would render nothing
if __name__ == '__main__':
    main()
We add a signal column that compares each predicted price with the previous day’s prediction: if the prediction has risen, the signal is "Buy"; otherwise it is "No Position".
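Here is the same signal logic on four invented predictions, so the comparison is easy to follow. The first row has no previous prediction to compare against, so it falls through to "No Position".

```python
import numpy as np
import pandas as pd

# Invented predicted prices for four consecutive days
df = pd.DataFrame({"predicted_gold_price": [100.0, 101.0, 100.5, 102.0]})

# Compare each prediction with the previous row's prediction
df["signal"] = np.where(
    df.predicted_gold_price.shift(1) < df.predicted_gold_price,
    "Buy",
    "No Position",
)

print(df["signal"].tolist())
# ['No Position', 'Buy', 'No Position', 'Buy']
```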
The above screenshot is taken from the app running on Streamlit Cloud. The signal tells us that the model expects the Gold price to fall, and the app also displays the predicted price. The real price may move differently, but the signal at least gives us an idea of the direction the model expects.
Conclusion
That’s how I created the Gold price prediction app and got it running on Streamlit Cloud. The full code is available on my GitHub page. I hope you have benefited from this project tutorial. We have seen how to use a trained model to make predictions.
Use this knowledge to create your own app and improve your Python skills.
💡 Recommended: How I Built a Readability and Grammar Checker App Using Streamlit