Data Science Primer: Sales Forecasting

Authored by Brian Goodwin, PhD

Limits on humans’ ability to take in vast amounts of data and make connections within a data set mean that conventional sales forecasting methods are limited in their accuracy and usefulness. Those methods ultimately depend on a certain amount of human intuition, whether in choosing what data to consider or in deciding what conclusions to draw from it.
 
Machine learning enables businesses to increase sales forecast accuracy because the techniques integrate millions of data points to find connections no human ever could. In this way, human intuition is eliminated from the equation as much as possible—freeing up people to focus on running and improving the business.
 
Even the best traditional methods are nowhere near sufficient to handle the number of inputs involved in a typical sales scenario. Raw material cost, labor cost, labor availability, lead time, and historic demand are just the beginning. What about the volatility of the price of raw materials, or day-to-day fluctuations in weather, or the number of Google searches for your competitor’s product two months prior?

As data scientists, we want to discover and understand the variables that influence businesses. Modern tools make it possible to move beyond the familiar handful of variables and look at dozens or even hundreds to create a mathematical model.
 
Once a model is built, we turn it into an artificial intelligence that monitors all the inputs. Rather than relying on one simple model such as a moving average, the AI evaluates many different models to find the best fit. It then back-tests its predictions against historic data to assess model efficacy. This previously impossible level of detail is what makes a higher level of accuracy possible.
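To make the idea of evaluating many models and back-testing them concrete, here is a minimal Python sketch. It is an illustration only, not the implementation used in the project described in this article: the three candidate models, the mean-absolute-error score, and the quarterly sales numbers are all assumptions chosen to keep the example small.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def moving_average_forecast(history, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


def seasonal_naive_forecast(history, horizon, season=4):
    """Repeat the value observed one season earlier."""
    return [history[-season + (step % season)] for step in range(horizon)]


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def backtest(models, series, holdout):
    """Forecast the last `holdout` points from the earlier ones and score each model."""
    train, test = series[:-holdout], series[-holdout:]
    return {
        name: mean_absolute_error(test, model(train, holdout))
        for name, model in models.items()
    }


# Made-up quarterly sales with a repeating seasonal pattern (illustrative only).
sales = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]

candidates = {
    "naive": naive_forecast,
    "moving_average": moving_average_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}

scores = backtest(candidates, sales, holdout=4)
best_model = min(scores, key=scores.get)  # lowest error wins
print(best_model, scores)
```

On this toy series the seasonal model has the lowest back-test error, which is the kind of comparison that replaces a hand-picked moving average.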
 
One aspect of that higher degree of accuracy is machine learning’s capacity to customize models for individual SKUs, a process that would be impossible with traditional methods because of the time required.
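Per-SKU customization can be pictured as running the same model search once for every SKU. The Python sketch below assumes hypothetical SKU codes, three simple one-step-ahead models, and invented unit counts; none of these come from the project described here.

```python
from collections import defaultdict
from statistics import mean


def last_value(history):
    return history[-1]


def average_of_last_three(history):
    return mean(history[-3:])


def linear_trend(history):
    """Extend the average step between consecutive observations."""
    steps = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + mean(steps)


MODELS = {
    "last_value": last_value,
    "average_of_last_three": average_of_last_three,
    "linear_trend": linear_trend,
}


def one_step_error(model, series, n_tests=3):
    """Mean absolute one-step-ahead error over the last `n_tests` points."""
    errors = []
    for cut in range(len(series) - n_tests, len(series)):
        errors.append(abs(series[cut] - model(series[:cut])))
    return mean(errors)


def best_model_per_sku(rows):
    """Group (sku, units) rows, assumed to be in time order, and pick a model per SKU."""
    by_sku = defaultdict(list)
    for sku, units in rows:
        by_sku[sku].append(units)
    return {
        sku: min(MODELS, key=lambda name: one_step_error(MODELS[name], series))
        for sku, series in by_sku.items()
    }


# Hypothetical SKUs: one with a steady upward trend, one that oscillates.
rows = [("PEN-01", u) for u in [10, 12, 14, 16, 18, 20]]
rows += [("INK-02", u) for u in [50, 54, 50, 54, 50, 54]]

choices = best_model_per_sku(rows)
print(choices)
```

The trending SKU ends up with the trend model and the oscillating SKU with the averaging model, so no single model is forced on every product.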
 
Once the artificial intelligence has estimated the optimal model for forecasting sales, whether for a division or an individual product, it works much faster than traditional methods. For example, in a recent project for an office supply company, the supply chain and demand forecast now takes less than one hour to generate, compared with more than 18 hours for the client’s previous approach.
 
That project is a good example of upgrading an approach that required substantial employee effort to one that ran seamlessly on an automated basis—and demonstrated a 44% increase in forecast accuracy over the previous process.
 
Getting to positive outcomes such as these may feel like taking a leap of faith initially. That’s because many people perceive machine learning as a “black box,” which is uncomfortable. In fact, the machine learning process is intuitive. It begins with understanding the business, then moves to understanding and collecting data. Next comes modeling and experimentation.
 
This AI was built on cloud architecture in Azure, including Azure Data Factory (for data ingestion, procedure management, and data output) and Virtual Machines (VMs) running Microsoft SQL Server with the R programming language (for data reduction and machine learning). The core of the AI is a custom cross-validation algorithm, in which model results are compared with historical outcomes. That comparison is the heart of machine learning.
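The custom cross-validation algorithm itself is not shown in this article. As a rough illustration of the general idea for time-ordered data, the Python sketch below uses rolling-origin (expanding-window) cross-validation, a common technique swapped in here as an assumption. The project used R; the function names and demand figures below are hypothetical.

```python
from statistics import mean


def rolling_origin_splits(n_points, min_train, horizon):
    """Yield (train_end, test_end) index pairs with an expanding training window."""
    train_end = min_train
    while train_end + horizon <= n_points:
        yield train_end, train_end + horizon
        train_end += horizon


def cross_validate(model, series, min_train, horizon):
    """Average the mean absolute error of `model` across all folds."""
    fold_errors = []
    for train_end, test_end in rolling_origin_splits(len(series), min_train, horizon):
        train, test = series[:train_end], series[train_end:test_end]
        predicted = model(train, horizon)
        fold_errors.append(mean(abs(a - p) for a, p in zip(test, predicted)))
    return mean(fold_errors)


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def drift(history, horizon):
    """Extend the straight line between the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (step + 1) for step in range(horizon)]


# Invented demand series that grows by 4 units per period.
demand = [100, 104, 108, 112, 116, 120, 124, 128, 132, 136]

folds = list(rolling_origin_splits(len(demand), min_train=4, horizon=2))
naive_score = cross_validate(naive, demand, min_train=4, horizon=2)
drift_score = cross_validate(drift, demand, min_train=4, horizon=2)
print(folds, naive_score, drift_score)
```

Each fold trains only on data that precedes its test window, so every prediction is compared with a historical outcome the model had not yet seen.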
 
From there, the process moves from experimentation and evaluation into implementation. The actual sales forecasting process is typically set up to be fully automatic, taking in data, calculating a prediction, and sending the prediction back to the business through Power BI or similar tools.
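The automated loop of taking in data, calculating a prediction, and sending it back can be sketched in a few lines of Python. This is a simplified stand-in, not the Azure pipeline described above: the CSV column names, the moving-average forecast, and the sample rows are all assumptions, and a real job would read from and write to the systems the business already uses.

```python
import csv
import io
from statistics import mean


def run_forecast_job(input_csv, window=3):
    """Read `sku,units` rows in time order and return a CSV of one-step forecasts."""
    history = {}
    for row in csv.DictReader(io.StringIO(input_csv)):
        history.setdefault(row["sku"], []).append(float(row["units"]))

    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["sku", "forecast_units"])
    for sku, units in sorted(history.items()):
        # Placeholder forecast: mean of the most recent `window` observations.
        writer.writerow([sku, round(mean(units[-window:]), 2)])
    return output.getvalue()


# Hypothetical input, standing in for data delivered by an ingestion step.
sample = (
    "sku,units\n"
    "PEN-01,10\nPEN-01,12\nPEN-01,14\n"
    "INK-02,50\nINK-02,54\nINK-02,52\n"
)

report = run_forecast_job(sample)
print(report)
```

The returned CSV text is the kind of flat output a reporting tool could pick up on a schedule.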
 
While machine learning does of course include computationally intense processes, the work to get there is by no means all “black box.” As with any other type of project at Concurrency, our approach begins and ends with a focus on the business, its needs, and how best to accomplish those. When it comes to sales forecasting, machine learning can accomplish a level of ease and accuracy that simply isn’t attainable in any other way.
 
Author

Brian Goodwin, PhD

Data Scientist
