How to choose a data analytics and machine learning platform

Analytics platforms have evolved considerably over the last decade, adding capabilities that extend far beyond the last generation’s on-premises reporting and business intelligence (BI) tools. Modernized data visualization, dashboarding, analytics, and machine learning platforms serve different business use cases, end-user personas, and data complexities.  

While analytics platforms have reached mainstream adoption, many businesses in lagging industries want to develop their first dashboards and predictive analytics capabilities. They recognize that managing analytics in spreadsheets is slow, error-prone, and hard to scale, while using reporting solutions tied to one enterprise system can be limiting without integrations to other data sources.

Larger enterprises that have allowed departments to select their own analytics tools may find that now is the right time to consolidate on fewer analytics platforms. Many enterprises seek analytics platforms that support collaboration between business users, dataops engineers, data scientists, and others working in the data visualization, analytics, and modelops lifecycle.

Further, as organizations become more data-driven, the ability to address compliance and data governance within analytics workflows becomes a critical requirement.

This article serves as a guide to data visualization, analytics, and machine learning platforms. Here I will discuss the features, use cases, user personas, and differentiating capabilities of these different platform types, and offer my recommended steps for choosing analytics platforms.

How to choose a data analytics and machine learning platform

  1. Identify business use cases for analytics
  2. Review big data complexities
  3. Capture end-user responsibilities and skills
  4. Prioritize functional requirements
  5. Specify non-functional technical requirements
  6. Estimate costs beyond pricing
  7. Evaluate platform types and products

1. Identify business use cases for analytics

Many businesses strive to be data-driven organizations and use data, predictive analytics, and machine learning models to aid decision-making. This overarching goal has driven several use cases:

  • Empower business people to become citizen data scientists, accelerate smarter decision-making, and perform storytelling through data visualizations, dashboards, reports, and other easy-to-build analytics capabilities.
  • Increase the productivity and capabilities of professional data scientists throughout the machine learning lifecycle, including performing discovery on new data sets, evolving machine learning models, deploying models to production, monitoring model performance, and supporting retraining efforts.
  • Enable devops teams to develop analytical products, which includes embedding dashboards in customer-facing applications, building real-time analytics capabilities, deploying edge analytics, and integrating machine learning models into workflow applications.
  • Replace siloed reporting systems built into enterprise systems with analytics platforms connected to integrated data lakes and warehouses.

Two questions that arise are whether organizations need separate platforms for these different use cases and whether supporting multiple solutions is advantageous or costly. 

“Organizations are trying to do more with less and often have to compromise on their data analytics platform, resulting in a myriad of data management challenges, including slow processing times, inability to scale, vendor lock-in, and exponential costs,” says Helena Schwenk, VP in the chief data and analytics office at Exasol. “While business needs will likely dictate which data analytics platform is chosen, finding one that ensures productivity, speed, flexibility, and without sacrificing on cost helps combat these challenges.”

Finding optimal solutions requires further investigation into the data and into organizational, functional, operational, and compliance factors.

2. Review big data complexities

Analytics platforms differ in how flexible they are when working with different data types, databases, and data processing.

“Choice of data analytics platform should be driven by the current and future use cases for data within the organization, particularly in light of the recent advances in deep learning and AI,” says Colleen Tartow, field CTO and head of strategy at VAST Data. “The entire data pipeline for both structured and unstructured data—from storage and ingestion through curation and consumption—must be considered and streamlined, and cannot simply be extrapolated from existing composable, BI-focused data stacks.”

Data science, engineering, and dataops teams should review the current data integration and management architectures and then project an idealized future state. Analytics platforms should address both current and future states while considering what data processing capabilities may be needed within the analytics platforms. Below are several important factors to consider.

  • Are you primarily focused on structured data sources, or are you also looking to perform text analytics on unstructured data?
  • Will you be connected to SQL databases and warehouses, or are you also looking at NoSQL, document, columnar, vector, and other database types?
  • What SaaS platforms do you plan to integrate data from? Do you need the analytics platform to perform these integrations, or do you have other integration and data pipeline tools for these purposes?
  • Is data cleansed and stored in the desired data structures up front, and to what extent will data scientists need analytics tools to support data cleansing, data prepping, and other data wrangling tasks?
  • What are your data provenance, privacy, and security requirements, especially considering SaaS analytics solutions often store or cache data for processing visualizations and training models?
  • What scale is the data, and what time lags are acceptable from data capture, through processing, to availability to analytics platforms?

Because data requirements evolve, reviewing a platform’s data and integration capabilities before other functional and non-functional requirements can help you narrow the candidates more quickly. For example, with growing interest in generative AI capabilities, it’s important to establish a consistent operating model for analytics solutions that may be a source for large language models (LLMs) and retrieval-augmented generation (RAG).
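One way to make this first pass concrete is to record the answers to the questions above as a set of required data capabilities and check each candidate against it. The sketch below is only an illustration: the platform names, the capability labels, and the `shortlist` helper are all hypothetical, and real evaluations would draw on vendor documentation and proof-of-concept results.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A candidate platform and the data capabilities it claims (hypothetical)."""
    name: str
    capabilities: set[str] = field(default_factory=set)


def shortlist(platforms, required):
    """Return (name, missing_capabilities) pairs, fewest gaps first."""
    results = []
    for platform in platforms:
        missing = sorted(required - platform.capabilities)
        results.append((platform.name, missing))
    # Sort by number of gaps, then by name for a stable order.
    return sorted(results, key=lambda item: (len(item[1]), item[0]))


# Hypothetical requirements derived from the data questions above.
required = {"sql_warehouse", "unstructured_text", "vector_db", "data_prep"}

# Hypothetical candidates; the capability sets are made up for illustration.
candidates = [
    Platform("Platform A", {"sql_warehouse", "data_prep"}),
    Platform("Platform B", {"sql_warehouse", "unstructured_text", "vector_db", "data_prep"}),
    Platform("Platform C", {"sql_warehouse", "vector_db", "data_prep"}),
]

for name, missing in shortlist(candidates, required):
    status = "meets all data requirements" if not missing else "missing: " + ", ".join(missing)
    print(f"{name}: {status}")
```

Candidates with gaps are not necessarily eliminated, since other integration or pipeline tools may cover the gap, but the list shows where follow-up questions are needed.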

“Integrating generative AI within a business hinges on a solid foundation of trusted and governed data, and selecting a data analytics platform that can adeptly govern AI policies, processes, and practices with data assets is indispensable,” says Daniel Yu, SVP of solution management and product marketing at SAP Data and Analytics. “This not only provides the needed transparency and accountability for your organization but also ensures that ever-changing data and AI regulatory, compliance, and privacy policies will not bottleneck your need for rapid innovation.”

3. Capture end-user responsibilities and skills

What happens when organizations don’t consider the responsibilities and skills of end users when deploying analytics tools? We have three decades of spreadsheet disasters, duplicate data sources, data leakage, data silos, and other compliance issues that show how important it is to consider organizational responsibilities and data governance.

So, before getting wowed by an analytics platform’s beautiful data visualizations or its gargantuan library of machine learning models, consider the skills, responsibilities, and governance requirements of your organization. Below are some common end-user personas:

  • Citizen data scientists will prize ease of use and the ability to analyze data, create dashboards, and perform enhancements easily and quickly.
  • Professional data scientists prefer working on models, analytics, and visualizations while relying on dataops to handle integrations and data engineers to perform the required prep work. Analytics platforms may offer collaboration and role-based controls for larger organizations, but smaller organizations may seek platforms that empower multi-disciplined data scientists to do data wrangling work efficiently.
  • Developers will want APIs, simple embedding tools, more extensive JavaScript enhancement options, and extension capabilities for integrating dashboards and models into applications.
  • IT operations teams will want tools to identify slow performance, processing errors, and other operational issues.

Some governance considerations:

  • Review current data governance policies, particularly around data entitlements, confidentiality, and provenance, and determine how analytics platforms address them.
  • Evaluate platform flexibilities in creating row, column, and role-based access controls, especially if you will be using the platform for customer-facing analytics capabilities.
  • Decide how data sets will be centralized and cataloged. Some analytics platforms have built-in portals and tools for this purpose, while others offer integration with third-party data catalogs.
  • Ensure analytics platforms meet data security requirements around authorization, encryption, data masking, and auditing.

The bottom line is that analytics platforms should fit the operating model, especially when access is provided to multiple departments and business units.
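To illustrate what row, column, and role-based access controls mean in practice, here is a minimal sketch that filters a small data set according to a user's role. The `Policy` class, the role names, and the sample rows are hypothetical; commercial platforms implement these controls in their own ways, usually through configuration rather than code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: columns a role may see and a row-level filter."""
    allowed_columns: frozenset
    row_filter: Callable[[dict, dict], bool]


def apply_policy(rows, user, policies):
    """Return only the rows and columns the user's role is entitled to see."""
    policy = policies.get(user["role"])
    if policy is None:
        return []  # Deny by default when a role has no policy.
    return [
        {key: value for key, value in row.items() if key in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row, user)
    ]


# Made-up sample data for illustration.
rows = [
    {"region": "EMEA", "customer": "Acme", "revenue": 1200, "margin": 0.31},
    {"region": "APAC", "customer": "Globex", "revenue": 900, "margin": 0.27},
]

policies = {
    # Analysts see every row and column.
    "analyst": Policy(
        frozenset({"region", "customer", "revenue", "margin"}),
        lambda row, user: True,
    ),
    # Regional managers see only their region, and never the margin column.
    "regional_manager": Policy(
        frozenset({"region", "customer", "revenue"}),
        lambda row, user: row["region"] == user["region"],
    ),
}

manager = {"role": "regional_manager", "region": "EMEA"}
print(apply_policy(rows, manager, policies))
```

The deny-by-default branch matters most for customer-facing analytics, where an unmapped role should see nothing rather than everything.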

4. Prioritize functional requirements

Do you really need a doughnut chart type, or are pie charts sufficient? Analytics platforms compete across data processing, visualization, dashboarding, and machine learning capabilities, and all the vendors want to wow customers with their latest capabilities. Having a prioritized functionality list can help you separate the musts from the nice-to-haves.    
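A simple weighted scoring matrix is one way to keep that prioritized list honest. In the sketch below, every feature name, weight, and rating is a hypothetical placeholder. A platform that scores zero on any must-have is disqualified, and the remaining platforms receive a weighted average on a 0 to 5 scale.

```python
def score_platform(ratings, requirements):
    """Score a platform against prioritized requirements.

    ratings: feature -> rating from 0 (absent) to 5 (excellent)
    requirements: feature -> (weight, is_must_have)
    Returns None if any must-have is absent, else a weighted average.
    """
    for feature, (_, is_must) in requirements.items():
        if is_must and ratings.get(feature, 0) == 0:
            return None  # A missing must-have disqualifies the platform.
    total_weight = sum(weight for weight, _ in requirements.values())
    weighted_sum = sum(
        weight * ratings.get(feature, 0)
        for feature, (weight, _) in requirements.items()
    )
    return weighted_sum / total_weight


# Hypothetical priorities: (weight, is_must_have).
requirements = {
    "dashboards": (5, True),
    "embedded_analytics": (4, True),
    "natural_language_query": (3, False),
    "doughnut_charts": (1, False),
}

# Hypothetical ratings gathered during an evaluation.
platform_a = {"dashboards": 5, "embedded_analytics": 4,
              "natural_language_query": 2, "doughnut_charts": 5}
platform_b = {"dashboards": 4, "embedded_analytics": 0,
              "natural_language_query": 5, "doughnut_charts": 5}

print("Platform A:", score_platform(platform_a, requirements))
print("Platform B:", score_platform(platform_b, requirements))
```

Here Platform B is disqualified despite strong ratings on the nice-to-haves, which is exactly the outcome a prioritized list is meant to enforce.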

“In choosing a data analytics platform, it is important to think through the full spectrum of analytic and AI use cases you’ll need to support both now and in the future,” says Dhruba Borthakur, co-founder and CTO of Rockset. “We’re seeing a convergence of analytics, search, and AI, and it’s common to filter on some text before performing aggregations or incorporating geospatial search to limit analytics to regions of interest.”

One area to dive deeply into is the analytics platforms’ generative AI capabilities. Some platforms now let users query data and produce dashboards with prompts and natural language, which can be powerful when rolling a platform out to larger and less-skilled user communities. Another feature to consider is generating text summaries from a data set, dashboard, or model to help identify which trends and outliers to pay attention to.

Generative AI is also increasing organizations’ interest in embedding query and analytics capabilities directly into customer-facing applications and employee workflows.

“The fusion of AI innovation with the growing API economy is leading to a developer-focused shift, enabling intuitive and rich applications with sophisticated analytics embedded into the user experience,” says Ariel Katz, CEO of Sisense. “In this new world, developers become innovators, as they can more easily integrate complex analytics into apps to provide users with insights precisely when needed.”

5. Specify non-functional technical requirements

Non-functional requirements should include setting performance objectives, reviewing machine learning and generative AI model flexibilities, evaluating security requirements, understanding cloud flexibilities, and considering other operational factors.

“Technical leaders should prioritize data platforms that offer multi-cloud and support for various generative AI frameworks,” says Roy Sgan-Cohen, GM of AI, platforms, and data at Amdocs. “Cost-effectiveness, seamless integration with data sources and consumers, low latency, and robust privacy and security features, including encryption and role-based access controls are also essential considerations.”

Cloud infrastructure is one technology consideration, but IT leaders should also weigh in on implementation, integrations, training, and change management considerations.

“When choosing the right analytics platform, consider ease of implementation and level of integration with the rest of the tech stack, and both should not generate unnecessary costs or consume too many resources,” says Piotr Korzeniowski, COO of Piwik PRO. “Consider the onboarding process, available educational materials, and ongoing vendor support.”

Bennie Grant, COO of Percona, adds that portability and vendor lock-in should be considered, and notes that easy options can quickly become expensive. “Open-source solutions reduce exposure to lock-in and favor portability, and having the flexibility of an open-source solution means you can easily scale as your data grows, all while maintaining peak performance.”

6. Estimate costs beyond pricing

Analytics platforms are in a mature but evolving technology category. Some vendors bundle their analytics capabilities as free or inexpensive add-ons to their other capabilities. Pricing factors include the number of end users, data volumes, the quantity of assets (dashboards, models, etc.), and functionality levels. 

Keep in mind that the vendor’s pricing for the platform can be a small component of total cost when you factor in implementation, training, and support. Even more important is understanding productivity factors, as some platforms focus on ease of use while others target comprehensive functionality.
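A rough total-cost estimate can show how small the license line may be. The sketch below uses entirely hypothetical figures and a deliberately simple model (per-user licensing, one-time implementation and training, flat annual support); actual vendor pricing models vary and may also depend on data volumes, asset counts, and functionality tiers.

```python
def total_cost(license_per_user_year, users, years,
               implementation, training_per_user, annual_support):
    """Estimate multi-year total cost and the share spent on licenses."""
    licensing = license_per_user_year * users * years
    training = training_per_user * users      # One-time, per user.
    support = annual_support * years          # Recurring, per year.
    total = licensing + implementation + training + support
    return {
        "licensing": licensing,
        "implementation": implementation,
        "training": training,
        "support": support,
        "total": total,
        "license_share": licensing / total,
    }


# All figures are hypothetical placeholders, not real vendor prices.
estimate = total_cost(
    license_per_user_year=240,
    users=50,
    years=3,
    implementation=60_000,
    training_per_user=500,
    annual_support=15_000,
)

for item, value in estimate.items():
    if item == "license_share":
        print(f"{item}: {value:.0%}")
    else:
        print(f"{item}: {value:,}")
```

Productivity effects, such as hours saved by an easier tool, are not modeled here and would need their own estimate.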