A simple sales forecasting tool using PyCaret


In one of my previous blog posts, I used the NeuralProphet library to predict stock prices. In this post, I am going to predict the future sales of Rossmann stores using the PyCaret Python library.

Rossmann operates over 3,000 drug stores in 7 European countries. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality.

PyCaret

PyCaret is an open-source, low-code machine learning library in Python that lets you go from preparing your data to deploying your model within minutes. It suits experienced data scientists who want to speed up their ML experiments, as well as newcomers to data science with little or no coding background.

Dataset

Here, I have used data from a Kaggle competition. The dataset contains historical sales data for 1,115 Rossmann stores. We will use only the following fields for our task:

  • Date: date of observation
  • Sales: the turnover for any given day (this is what you are predicting)

Install PyCaret

We will use the new time series module, which at the time of writing is available only in a pre-release version of PyCaret (the pycaret-ts-alpha package). You can install it on your machine by running the following commands.

#!/bin/sh

# create a conda environment
conda create --name pycaret python=3.8

# activate conda environment
conda activate pycaret

# install pycaret
# quote the requirement so shells such as zsh do not expand the brackets
pip install "pycaret-ts-alpha[full]"

Install other dependencies

#!/bin/sh

pip install pandas

Import libraries

import pandas as pd
from pycaret.internal.pycaret_experiment import TimeSeriesExperiment

On macOS, the import might fail with a "library not loaded" error. If it does, run the following command to fix it.

#!/bin/sh

brew install libomp

Load data

sales_df = pd.read_csv('./data/train.csv')
sales_df.head()

For now, we will consider only the sales amount for the prediction.

sales_df = sales_df[['Date','Sales']]
sales_df.head()
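
Each row of the dataset is one store on one day, so we sum the sales of all stores per date to get a single daily series.

sales_series = sales_df['Sales'].groupby(sales_df['Date']).sum()
sales_df = pd.DataFrame({'Date':sales_series.index, 'Sales':sales_series.values})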
sales_df.info()
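
To make the aggregation step concrete, here is a minimal sketch on a hypothetical toy frame (the store numbers and sales values are made up for illustration). It uses the equivalent one-line form `groupby(..., as_index=False)`, which keeps Date as a regular column.

```python
import pandas as pd

# Hypothetical toy data: two stores observed on two dates
toy = pd.DataFrame({
    "Store": [1, 2, 1, 2],
    "Date": ["2015-07-30", "2015-07-30", "2015-07-31", "2015-07-31"],
    "Sales": [100, 250, 120, 300],
})

# Sum sales across stores for each date; as_index=False keeps Date as a column
daily = toy.groupby("Date", as_index=False)["Sales"].sum()
print(daily)
```

The result has one row per date, which is the shape the time series module expects.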

Now we will set the Date column as the index of the DataFrame and set its frequency to daily.

sales_df.index = pd.DatetimeIndex(sales_df["Date"])
sales_df.drop(["Date"],axis=1,inplace=True)
sales_df = sales_df.asfreq('D')
sales_df.index
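
It helps to know what `asfreq` does when a date is missing: it reindexes the data onto a gap-free daily calendar and leaves NaN in the rows it had to create. The sketch below shows this on a hypothetical three-row series with one day missing.

```python
import pandas as pd

# Hypothetical toy series in which 2015-01-03 is missing
toy = pd.DataFrame(
    {"Sales": [10.0, 12.0, 15.0]},
    index=pd.DatetimeIndex(["2015-01-01", "2015-01-02", "2015-01-04"], name="Date"),
)

# Reindex onto a regular daily calendar; the missing day becomes a NaN row
regular = toy.asfreq("D")
print(regular)

# Dates that were absent from the original data
missing_dates = regular.index[regular["Sales"].isna()]
print(missing_dates)
```

This is why the null check that follows is worth running after setting the frequency, not before.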

Check whether any column contains null values.

sales_df.isnull().values.any()

Run the following set of commands only if your data has any null values. I am skipping this step for now as the data contains no null values.

# forward-fill: copy the last known value into each gap
sales_df = sales_df.ffill()
sales_df.head()
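
As a small illustration of forward filling, the sketch below uses a hypothetical series with gaps. Note that a gap at the very start has no earlier value to copy, so it stays null.

```python
import pandas as pd

# Hypothetical daily series with a leading gap and a two-day gap in the middle
toy = pd.Series(
    [None, 10.0, None, None, 15.0],
    index=pd.date_range("2015-01-01", periods=5, freq="D"),
    name="Sales",
)

# Each gap takes the most recent earlier observation
filled = toy.ffill()
print(filled)
```

Forward filling is only one option; whether it is appropriate depends on why the values are missing.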

We will now extract the Sales column as a pandas Series for further processing.

data = sales_df.Sales
data

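We will now initialize and set up the TimeSeriesExperiment object. Here, fh is the forecast horizon (the number of days to predict) and fold is the number of cross-validation folds.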

exp = TimeSeriesExperiment()
exp.setup(data = data, seasonal_period = 'W', session_id=10, fold=5, fh=30)
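
To get a feel for how fold and fh interact, here is a rough sketch of time series cross-validation with an expanding window. It assumes an expanding-window strategy (believed to be PyCaret's default, but check the documentation for your version); the function name and the 300-point series length are hypothetical, and this is not PyCaret's own implementation.

```python
def expanding_window_splits(n_obs, fh, folds):
    """Return a list of (train_indices, validation_indices) pairs, oldest fold first.

    Each validation window holds `fh` consecutive points, and each training
    window holds every point that comes before its validation window.
    """
    if n_obs <= fh * folds:
        raise ValueError("not enough observations for the requested folds")
    splits = []
    for k in range(folds, 0, -1):
        val_end = n_obs - (k - 1) * fh
        val_start = val_end - fh
        splits.append((range(0, val_start), range(val_start, val_end)))
    return splits


# Hypothetical series of 300 points, with the same fh and fold as in setup()
splits = expanding_window_splits(n_obs=300, fh=30, folds=5)
for train_idx, val_idx in splits:
    print(f"train: 0..{train_idx[-1]}  validate: {val_idx[0]}..{val_idx[-1]}")
```

The model is never validated on data that precedes its training window, which is what distinguishes this from ordinary shuffled k-fold.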

Statistical Testing

exp.check_stats()

Exploratory Data Analysis

# time-series plot
exp.plot_model(plot = 'ts')
# cross-validation plot
exp.plot_model(plot = 'cv')
# ACF plot
exp.plot_model(plot = 'acf')
# Diagnostics plot
exp.plot_model(plot = 'diagnostics')
# Decomposition plot
exp.plot_model(plot = 'decomp_stl')
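
If the ACF plot is unfamiliar, the sketch below computes the autocorrelation by hand with NumPy on a synthetic series that repeats the same seven values every week. The data and the `autocorr` helper are hypothetical and only illustrate how a weekly pattern shows up as a peak at lag 7.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at the given non-negative lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Synthetic data: the same 7-day pattern repeated for 20 weeks
weekly_pattern = np.array([5, 9, 9, 8, 9, 10, 1], dtype=float)
series = np.tile(weekly_pattern, 20)

acf = [autocorr(series, lag) for lag in range(15)]
for lag, value in enumerate(acf):
    print(lag, round(value, 3))
```

Among lags 1 to 13, the largest value sits at lag 7, which is the kind of spike to look for in the plot above when the data has weekly seasonality.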

Model Training and Selection

The compare_models() function trains all the estimators available in the model library and evaluates their performance using cross-validation. Its output is a score grid of average cross-validated scores. The metrics evaluated during CV can be listed with the get_metrics function, and custom metrics can be added or removed with the add_metric and remove_metric functions. compare_models() returns a trained model or a list of trained models, depending on the n_select parameter. The default value of n_select is 1, so the call below returns the single best model.

best_model = exp.compare_models()
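
The score grid reports error metrics such as MAE, RMSE and MAPE (the exact columns depend on the PyCaret version). As a reminder of what these numbers mean, here is a plain-Python sketch on three hypothetical actual and forecast values. It is not PyCaret's own code.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values for illustration
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print("MAE :", mae(actual, forecast))
print("RMSE:", round(rmse(actual, forecast), 4))
print("MAPE:", round(mape(actual, forecast), 4))
```

MAE and RMSE are in the same unit as sales, while MAPE is a ratio, so MAPE is easier to compare across series of different scale.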

You can also use the create_model() function to create a model manually with a specific algorithm. create_model() in the time series module works just as it does in the other modules.

xgboost_cds_dt = exp.create_model('xgboost_cds_dt')
print(xgboost_cds_dt)
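
A regression model such as XGBoost cannot consume a time series directly, so it is typically applied through a "reduction": the series is turned into a table of lagged values, and the regressor learns to predict the next value from the previous ones. The sketch below illustrates only that idea, with a least-squares fit standing in for XGBoost. The helper name, the window size and the toy series are hypothetical, and the deseasonalizing and detrending that the `_cds_dt` suffix appears to refer to are left out.

```python
import numpy as np


def make_lag_matrix(y, window):
    """Turn a series into (X, target): each row of X holds `window` past values."""
    y = np.asarray(y, dtype=float)
    X = np.array([y[i:i + window] for i in range(len(y) - window)])
    target = y[window:]
    return X, target


# Hypothetical toy series: 0, 1, ..., 9
y = np.arange(10, dtype=float)
X, target = make_lag_matrix(y, window=3)

# Stand-in regressor: ordinary least squares with an intercept column
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)

# One-step-ahead forecast from the last three observations
last_window = np.append(y[-3:], 1.0)
next_value = float(last_window @ coef)
print(round(next_value, 6))
```

Longer horizons are then produced either by feeding predictions back in as inputs or by training one model per step, depending on the strategy the library uses.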

Plotting Prediction

exp.plot_model(best_model, plot = 'forecast')

You can also forecast further into the unknown future by passing an fh value in the data_kwargs parameter, as shown below:

exp.plot_model(best_model, plot = 'forecast', data_kwargs = {'fh' : 50})

Deployment

Now we will finalize and save the model for future predictions.

# finalize model
final_best_model = exp.finalize_model(best_model)
# generate predictions
exp.predict_model(final_best_model, fh = 90)
# save model for future use
exp.save_model(final_best_model, 'final_best_model')
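
The point of saving is that a later script can reload the fitted model and predict without retraining. In PyCaret the counterpart of save_model is load_model; the sketch below does not use PyCaret at all and only illustrates the save, reload, predict pattern with the standard pickle module and a hypothetical seasonal-naive "model" stored as a plain dictionary.

```python
import os
import pickle
import tempfile


def fit_seasonal_naive(history, season_length):
    """Hypothetical stand-in model: remember the last observed season."""
    return {"season_length": season_length,
            "last_season": list(history[-season_length:])}


def predict(state, fh):
    """Repeat the remembered season for fh steps ahead."""
    season = state["last_season"]
    return [season[i % len(season)] for i in range(fh)]


# Hypothetical four weeks of daily sales
history = [5, 9, 9, 8, 9, 10, 1] * 4
state = fit_seasonal_naive(history, season_length=7)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    # Save the fitted state to disk
    with open(path, "wb") as out_file:
        pickle.dump(state, out_file)
    # Later, possibly in another process: reload and predict
    with open(path, "rb") as in_file:
        restored = pickle.load(in_file)

forecast = predict(restored, fh=10)
print(forecast)
```

As with any pickle-based format, only load model files that come from a source you trust.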

Conclusion

Hooray! We have now created a simple sales forecasting system using the PyCaret library. You can experiment with the setup and model parameters to get more accurate predictions. You can find the official PyCaret documentation here. I would love your feedback, so please let me know what you think.

All the code for this article is available on GitHub. This is a Python project, so it should be easy to import and run as is.
