Machine Learning for Your Sales Pipeline Forecasting

On average, more than half of companies miss their monthly revenue forecasts by more than 10 percent. Fewer than one in seven land within the ideal range of 5 percent. The same Forrester report, Revenue Operations And Intelligence Delivers Predictable Growth (August 2021), also revealed that about one-third of companies indicated ineffective technology is a cause of forecast variability. What’s the impact of these mishaps? Missing a forecast can lead to budget restrictions, damage a company’s reputation, and cause turnover in the sales organization.

Sales pipeline forecasting is a leading indicator of future bookings and revenues. It empowers Sales, Marketing, and Finance teams to understand how much pipeline currently exists, the timing of open sales deals, and how well-positioned the organization is to hit its future financial targets. It also helps teams understand where gaps and risks exist to better manage investments and marketing planning.

However, sales pipeline forecasting falls short of its promise. Many companies struggle with the accuracy and objectivity of their forecasting. These challenges point to limitations around what data is used, as well as the lack of flexibility and business context included in technology solutions. 

Company growth is tied directly to an organization’s ability to forecast sales pipeline with speed and accuracy. So, it’s time to move toward a model that uses all relevant data and can be easily customized to match an organization’s needs. These requirements demonstrate the need for machine learning (ML) in pipeline forecasting.

Future forecast: Machine learning delivers fast, accurate pipeline predictions

While the benefits of pipeline forecasting are many, so are the challenges that organizations face. The four most common issues include:

1. Long and complex sales cycles, especially in the B2B enterprise space, which make it hard to predict booking probabilities and timelines;

2. Human-driven processes and nuanced customer journeys, both of which can be challenging to capture and account for in forecasting; 

3. Highly varied data quality due to the multitude of human touch points across systems; and 

4. Disconnected systems and data silos, which impede the ability to conduct analysis on all available data.

While some organizations lack a pipeline forecasting process due to limited data, resources, or awareness, most companies use a CRM platform or commercial solution to deliver their forecasting. 

With CRM-based applications, pipeline forecasting is often configured on the vendor’s platform and delivers predictions based on data captured in the CRM system. As a result, the pipeline forecast is constructed from information entered by sales reps and represents a sum of open deals. The downside of this approach is that organizations can’t leverage all of their valuable data in this analysis and are limited to what the CRM holds.

Commercial solutions tend to provide pipeline forecasting by applying analytics or data science models to an organization’s data. The issue with third-party tools is straightforward: It’s extremely difficult for a vendor to build a one-size-fits-all model or to customize a solution based on each organization’s specific business context. Flexibility is crucial if you want to take a more granular approach to pipeline forecasting. 

What organizations require is machine learning. Specifically, an end-to-end machine learning application that is fully customizable, uses all relevant organizational data, produces near real-time pipeline predictions (as required by business cadence), and is accessible to all teams to ensure organizational alignment.  

With machine learning powering your pipeline forecasting, it’s possible to deliver an accurate picture of your current pipeline position, predictions around the timing of open sales deals, and a forecast for the quantity of new deals you expect to open in each sales geography. 

This ML model can also predict where your pipeline faces challenges and flag those sales regions, which empowers quick responses to improve outcomes. For example, marketing teams can prioritize targeted campaigns in those regions, and finance teams can make investment decisions that optimize needed resources. This cross-functional alignment focuses everyone on achieving desired bookings and revenue targets, which drives business growth.

How to deliver a pipeline forecasting model with machine learning

Three key stages are required to build a pipeline forecasting model with machine learning. 

1. Preparation: Determine what data is required, unify all relevant data and make it instantly accessible, and build features for the model.

2. Modeling: Train the machine learning algorithm and accelerate the modeling process with near-zero operations.

3. Operations: Deliver insights to stakeholders and optimize the business with near real-time pipeline forecasting.

In order to build a powerful ML model and predict sales bookings with accuracy, it’s crucial to manage the requirements of each stage and understand why each is important. 


The preparation stage starts with determining the core data the model requires.

One of your most important assets will be the datasets your team builds in the form of daily snapshots of your opportunities in your CRM system. This “point in time” data enables you to go back and see exactly what opportunities were open on any given day, when they were forecast to close, how much they were forecast for, and who owned them. It’s strongly recommended you capture and store this data to maintain an accurate historical reference.
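As a sketch of what such a daily snapshot job might look like, the following stamps each open opportunity with the snapshot date so any day can be replayed later. The `fetch_open_opportunities()` function and its columns are placeholders for your CRM export; in practice the destination would be a table on your data platform, not an in-memory frame.

```python
from datetime import date

import pandas as pd


def fetch_open_opportunities() -> pd.DataFrame:
    # Stand-in for a CRM API call or report export of currently open deals.
    return pd.DataFrame({
        "opportunity_id": ["opp-1", "opp-2"],
        "owner": ["alice", "bob"],
        "forecast_amount": [50_000, 120_000],
        "forecast_close_date": ["2024-06-30", "2024-09-30"],
    })


def take_snapshot(snapshot_date: date) -> pd.DataFrame:
    """Stamp today's open opportunities so this point in time can be replayed."""
    snap = fetch_open_opportunities()
    snap["snapshot_date"] = snapshot_date.isoformat()
    return snap


# Appending each day's snapshot builds the historical reference.
history = pd.concat(
    [take_snapshot(date(2024, 5, 1)), take_snapshot(date(2024, 5, 2))],
    ignore_index=True,
)

# Replaying "what was open on May 1?" is now a simple filter.
as_of_may_1 = history[history["snapshot_date"] == "2024-05-01"]
```

In a real pipeline the snapshot job would run on a schedule and append to a governed table, but the shape of the data is the same: one row per open opportunity per day.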

What differentiates an in-house ML model from CRM or commercial solutions is that your team can create a bespoke dataset that captures the nuances unique to your business, something an outside team would not have the knowledge to do. Work with your stakeholders to determine which fields in the CRM system are relevant to modeling your sales cycle. You may find that this data becomes incredibly useful as a source of reporting. A good rule of thumb is to prioritize fields that are complex to recreate for a particular point in time or for which you only have access to the current value.

While business context determines which fields to capture, the data that forms the foundation of your ML model will likely fall into one of three categories:

  • Opportunity information, such as forecast category and forecast amount; 
  • Account information, including contacts and firmographics (company size, revenue, industry), the latter of which will likely come from third-party data sources; and
  • Sales engagement information, such as who is working the opportunity and what actions have been taken, including marketing touchpoints.
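To illustrate how these three categories might come together, here is a minimal sketch that joins illustrative opportunity, account, and engagement extracts into one modeling table. All table and column names here are hypothetical, not a prescribed schema.

```python
import pandas as pd

# Hypothetical extracts, one per data category.
opportunities = pd.DataFrame({
    "opportunity_id": ["opp-1", "opp-2"],
    "account_id": ["acct-1", "acct-2"],
    "forecast_category": ["commit", "pipeline"],
    "forecast_amount": [50_000, 120_000],
})
accounts = pd.DataFrame({
    "account_id": ["acct-1", "acct-2"],
    "industry": ["software", "retail"],
    "employee_count": [2_000, 400],  # firmographics, often third-party sourced
})
engagement = pd.DataFrame({
    "opportunity_id": ["opp-1", "opp-1", "opp-2"],
    "touchpoint": ["email", "webinar", "email"],
})

# Aggregate engagement to one row per opportunity, then join all three.
touches = (
    engagement.groupby("opportunity_id")
    .size()
    .rename("touchpoint_count")
    .reset_index()
)
modeling_table = (
    opportunities
    .merge(accounts, on="account_id", how="left")
    .merge(touches, on="opportunity_id", how="left")
)
```

The result is one wide row per opportunity, which is the shape the downstream feature engineering and modeling steps expect.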

Once you determine what data you want to use, the next step is to unify it. To improve data scientists’ productivity, it’s best to bring all data onto a single data platform. This unification includes marketing data (email campaigns, social sources), third-party data, and public data sources, as well as any other relevant data sources required by your model.

Unifying data is best achieved through modern secure data sharing functionality, which enables access to live, governed data without ETL processes. In a similar manner, access to a modern data marketplace can help round out your model with third-party datasets.

The last preparation step is feature engineering, which is when you explore your data and create useful features for the model. Because this work is time-consuming, you may want to use a tool that keeps track of previous work, enables search for previous SQL queries, visualizes data and calculates summary stats, and allows for reuse of previous code. 
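As an example of the kind of features this stage produces, the sketch below derives deal age, time to forecast close, and a close-date "push" flag from point-in-time snapshots like those described earlier. The columns are illustrative; push behavior in particular is often a strong signal that a deal will slip again.

```python
import pandas as pd

# Illustrative snapshot rows for a single opportunity on two dates.
snapshots = pd.DataFrame({
    "opportunity_id": ["opp-1", "opp-1"],
    "snapshot_date": pd.to_datetime(["2024-05-01", "2024-05-15"]),
    "created_date": pd.to_datetime(["2024-03-01", "2024-03-01"]),
    "forecast_close_date": pd.to_datetime(["2024-06-30", "2024-09-30"]),
    "forecast_amount": [50_000, 60_000],
})

feats = snapshots.sort_values(["opportunity_id", "snapshot_date"]).copy()

# Age of the deal as of each snapshot.
feats["days_open"] = (feats["snapshot_date"] - feats["created_date"]).dt.days

# Days until the currently forecast close date (negative means overdue).
feats["days_to_close"] = (
    feats["forecast_close_date"] - feats["snapshot_date"]
).dt.days

# Was the close date pushed out since the previous snapshot?
feats["close_date_pushed"] = (
    feats.groupby("opportunity_id")["forecast_close_date"]
    .diff().dt.days.fillna(0) > 0
)
```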

Ideally, this data will be built in collaboration with other departments or business units within your company that play a role in the sales cycle. Fostering a culture of sharing core business data creates alignment and enables subject matter experts to contribute within their respective domains. A data platform that enables this functionality not only saves time and effort but also reduces the risk of data silos.


The next stage is to build the model, which starts with thinking through the granularity and time horizon you need to forecast. You’ll need to determine if your forecast should go as deep as an individual seller, how to aggregate your company’s product lines, and how far into the future you need to forecast.

The goal is to build the right model for your organization’s needs. Our suggestion is to take a bottom-up approach to forecasting. For most organizations, this will likely require one model focused on predicting the timing of existing pipeline and a second model to handle pipeline generation. While a time series approach may be useful to get a quick baseline of performance, the volatility in the sales cycle can often make this class of models unreliable. The model for existing pipeline will generally be trained at the “opportunity” level while the pipeline generation model can be trained at the level of “sales representative”.

The objective of an opportunity-level model is to determine the most likely future quarter when an opportunity will close. The representative-level model forecasts the number and size of opportunities by each sales rep in each region and the distribution of close dates.

Used together, these two models enable you to forecast where you will land from a bookings perspective in the current quarter, where you’ll start the next quarter, where you’ll land in the next quarter, and so on. In addition, you can see by sales region how much revenue will be booked and how much will be open at the start of a future quarter. 

This methodology is helpful for two reasons:

1. Timing of individual sales deals can often make a significant impact on a forecast, which can be accounted for with this model. 

2. It enables the inclusion of features that can provide context to individual open deals, which can’t be done through time-series modeling. 

Behind this bottom-up ML model is a set of binary classification models trained against multiple target variables. Using the same training data, you can have the model learn the probability that a particular opportunity will close in the current quarter, will remain open at the start of next quarter, will close next quarter, and so on. You can then use these probabilities to calculate the expected value of closed or open pipeline in each of those periods.
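A minimal sketch of this setup, using scikit-learn and synthetic data (the features, targets, and amounts are invented for illustration): one binary classifier is trained per horizon from the same feature matrix, and each classifier's probabilities are converted into an expected pipeline value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training features per historical opportunity snapshot,
# e.g. days open and log forecast amount -- purely illustrative.
X = rng.normal(size=(200, 2))

# One binary target per horizon, learned from the SAME feature matrix.
targets = {
    "close_this_quarter": (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int),
    "open_next_quarter": (X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int),
}

models = {name: LogisticRegression().fit(X, y) for name, y in targets.items()}

# Score today's open pipeline: probability times deal amount, summed,
# gives the expected closed (or open) pipeline for each horizon.
X_open = rng.normal(size=(5, 2))
amounts = np.array([50_000, 80_000, 20_000, 120_000, 60_000])
expected = {
    name: float((m.predict_proba(X_open)[:, 1] * amounts).sum())
    for name, m in models.items()
}
```

In production each target would come from labeling historical snapshots with what actually happened to the deal, but the structure (shared features, one classifier per horizon, probability-weighted amounts) is the same.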

With historical data enhanced with features relevant to the modeling problem, you can set up training data to forecast across a range of time horizons. Oftentimes, Marketing will want to know how much pipeline you will start the quarter with, while Sales wants to know how much pipeline they will close in the quarter. 

You’ll need to do some research to determine the most representative time period to use when training your model. Many B2B companies see strong seasonality, with the fourth quarter having the largest volume of deals won because sales teams are often incentivized to close their deals before the end of the fiscal year. When you are in the fourth quarter, you should use a snapshot from the same quarter last year as your training data to best capture this reality. That said, every business is unique, and you should evaluate model performance across different training time periods so you can build a reliable model.

The beauty of an ML model is that it learns the characteristics of deals that are likely to remain in forecast versus deals that are likely to be pushed or closed. As a result, the model becomes even more precise in its predictions over time. 


The final stage is to modernize the operational process and provide your sales pipeline forecast back to the relevant business units through a single dashboard that updates on a regular cadence. 

When used by Sales, Marketing, and Finance, this dashboard can help align teams to the present pipeline situation and future pipeline predictions. Stakeholders should be able to evaluate the forecast in the context of future bookings goals and drill down into sales geographies for deeper inspection and understanding.  

Ideally, the dashboard should also enable stakeholders to see how the pipeline changed in comparison to previous quarters and provide additional context on sales performance. The latter might include insights into why particular deals have the predicted timeline they do, to help the team understand and address any issues. For example, perhaps a deal hasn’t had any marketing engagement or the customer hasn’t hit key milestones. These insights provide actionable steps that can save a deal or accelerate a booking. 

Ready to accelerate growth? Invest in a cloud data platform 

Building an accurate pipeline forecasting model requires unfettered support for machine learning. There’s only one architecture today that delivers the power you need, and that’s a modern cloud data platform. 

With secure, governed access to all data, a cloud data platform ensures fast modeling and continuous machine learning. Data scientists require virtually unlimited performance and scale and near-zero maintenance, both of which come from investing in a fully managed cloud data platform delivered as a service. 

To unify all ML processes in one location and improve data scientists’ productivity, your cloud data platform should provide:

  • Data warehouse or data lake for a single source of data, available to all users;
  • Secure Data Sharing to access live data from its original location in a controlled, governed manner without any ETL processing; 
  • Data marketplace to discover and access third-party datasets via the same secure data sharing technology, so you don’t have to copy and move data;
  • Data engineering and feature engineering for easy and fast data transformation;
  • Data science for model training in a programming language of choice; and
  • Data applications to capture data consumption and results.

With a modern cloud data platform, your pipeline forecasting model can be built with flexibility and speed. I speak from experience. At Snowflake, I started as a new employee, conceptualized the first model we wanted, built our new pipeline forecasting model, and shared results with stakeholders—all within a four-month period and working as a single resource on the project. 

Having all data available to me in a single platform was a game changer. I was able to jump in and access everything I needed to build and operationalize a powerful model that is now used by our Sales, Marketing, and Finance teams on a daily basis. 

This pipeline forecast has given our go-to-market teams a common understanding and language with which to strategize and tackle pipeline opportunities together. For example, our field marketing team uses the pipeline forecast to prioritize and run campaigns that target specific sales accounts and support the reps in closing those deals. As a result, we’ve been able to shorten our sales cycle and address deals that were at risk of slipping. 

With machine learning and a cloud data platform, the future of sales pipeline forecasting is no longer, as they say, a pipe dream. 
To learn how Snowflake uses the Data Cloud for ML-based lead sales pipeline forecasting, view this link. Or, to speak to someone from Snowflake about how the Data Cloud can transform your marketing intelligence and analytics, click here.
