Snowflake Is All in on Python, Data Pipelines, and Apps

Snowflake is committed to helping developers focus on building their apps and businesses rather than on infrastructure management. At this year’s Snowday, Snowflake announced a series of advancements that empower developers to do more with their data, enhancing productivity and unlocking new ways to develop applications, pipelines, and machine learning (ML) models with Snowflake’s unified data platform. From Snowpark for Python becoming generally available, to the integration of Streamlit in Snowflake, to the private preview of Dynamic Tables for simplifying streaming data, Snowflake is making it seamless and more efficient for developers to build in the Data Cloud. 

“Snowflake’s advancements provide developers with the capabilities to build powerful applications, data pipelines, and machine learning models with the utmost confidence, and eliminate complexity so they can drive value across their organizations with the Data Cloud,” said Torsten Grabs, a Director of Product Management at Snowflake.

Python, Python everywhere! 

Python is the most popular language for data scientists and the third most popular language among all developers. To help Python developers derive value from data without managing complex infrastructure or package dependencies, Snowflake developed Snowpark for Python, which is now generally available and ready for production workloads.

With this release, data engineers, data scientists, and developers can collaborate with other data teams while using Python’s familiar syntax and strong ecosystem of open source libraries without manual installs or package dependency management. Developers will also be able to eliminate the data security roadblocks that prevent projects from going into production, since data processing occurs in Snowflake’s governed, secure platform and runs Anaconda-verified, open source packages.
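As a rough illustration of what this looks like in practice, here is a minimal sketch of a Snowpark for Python query. It assumes the snowflake-snowpark-python package and an already-created Session; the ORDERS table and its AMOUNT and REGION columns are hypothetical names chosen for the example, not something from the announcement.

```python
from typing import TYPE_CHECKING, Dict

if TYPE_CHECKING:
    # Only needed for type hints; requires the snowflake-snowpark-python package.
    from snowflake.snowpark import Session


def orders_per_region(session: "Session", min_amount: int) -> Dict[str, int]:
    """Count orders of at least min_amount per region.

    ORDERS, AMOUNT, and REGION are hypothetical names used for illustration.
    """
    df = (
        session.table("ORDERS")                  # reference a table, no data is pulled yet
        .filter(f"AMOUNT >= {int(min_amount)}")  # filter given as a SQL expression string
        .group_by("REGION")
        .count()                                 # one row per region with its row count
    )
    # Snowpark DataFrames are lazy: the query runs in Snowflake when collect()
    # is called, and only the aggregated rows come back to the client.
    return {row[0]: row[1] for row in df.collect()}
```

The function only describes the query in Python; the filtering and grouping happen inside Snowflake, which is the point the paragraph above makes about processing data where it is governed.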

Snowflake also announced enhanced Snowpark capabilities including: 

  • Snowpark-optimized warehouses, in public preview, so Python developers can run large-scale ML training and other memory-intensive operations directly in Snowflake.
  • Unstructured data processing, in private preview for Python, so developers can access and process unstructured files including text files, documents, images, and more. 
  • Python Worksheets, in private preview, to enable the development of applications, data pipelines, and ML models inside Snowsight, Snowflake’s web interface.

In the months since its public preview announcement at Summit this summer, Snowpark for Python has seen 6x growth in adoption, with hundreds of customers including Western Union, NerdWallet, Northern Trust, Sophos, and more building with their data using Snowpark. 

Streamlit Integration: Python-based app development natively in the Data Cloud

With tens of thousands of developers and over 1.5 million data apps built (as of November 2022), Streamlit gives Python practitioners the power to bring their ML models and data to life in the form of interactive apps. Snowflake is now bringing together the ease of use and flexibility of Streamlit with the scalability, scope of data, and governance of the Data Cloud.

Currently in development, with private preview coming soon, the Streamlit integration will give data scientists and Python-savvy data practitioners the power to build Streamlit apps, deploy them on Snowflake’s secure and governed platform within minutes, and securely share interactive applications with their colleagues. This greatly reduces the complexity of building and sharing apps, eliminating the need to define routes, handle HTTP requests, or write HTML, CSS, or JavaScript. 
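To give a sense of why no routes, HTTP handlers, or front-end code are needed, here is a small sketch of an app written with the open source Streamlit library. The delay figures are made-up sample data and the function names are hypothetical; how such an app would be deployed inside Snowflake is not shown, since the integration is still in development.

```python
def delayed_share(delays_minutes, threshold):
    """Return the fraction of flights delayed by more than `threshold` minutes."""
    if not delays_minutes:
        return 0.0
    late = sum(1 for d in delays_minutes if d > threshold)
    return late / len(delays_minutes)


def render():
    """Draw the page. Each call below becomes a widget or chart in the browser."""
    # Imported here so delayed_share() can be reused and tested without Streamlit.
    import streamlit as st

    delays = [0, 5, 12, 45, 90, 3, 0, 30]  # made-up sample data, in minutes
    st.title("Flight delays")
    # The slider returns its current value; Streamlit reruns the script on change.
    threshold = st.slider("Delay threshold (minutes)", 0, 120, 15)
    share = delayed_share(delays, threshold)
    st.metric("Share of flights over threshold", f"{share:.0%}")
    st.bar_chart(delays)
```

Saved as app.py with a final call to render(), this would start with `streamlit run app.py`. The whole interface is ordinary Python: the script is the app.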

“Streamlit serves as the interaction engine for the vast majority of our data science and machine learning models today, actively transforming how our teams build, deploy, and collaborate on powerful applications with other stakeholders across the business,” said Sai Ravuru, GM Data Science & Analytics, JetBlue. “With Snowflake’s Streamlit integration, we can go from data to ML-insights all within the Snowflake ecosystem, where our data is already present, making it easier and more secure for us to create impactful applications to further mitigate the negative impact of flight disruptions, provide more predictability to our operational planning teams, and more customer personalization to give our customers the best possible experience.”

Simplified streaming pipelines and improved automation and observability

Snowflake is reimagining how users build data pipelines, making it easier to work with both batch and streaming data within a single platform, and further eliminating silos for customers. Additionally, the recently announced features improve developer productivity to allow for faster data onboarding, and a better ability to observe and manage pipelines natively in Snowflake.

Enhancements include:

  • Dynamic Tables (private preview): Formerly introduced as Materialized Tables, Dynamic Tables remove the boundaries between streaming and batch pipelines by automating incremental processing through declarative data pipeline development, making pipelines easier and more efficient to code. They also simplify use cases including change data capture and snapshot isolation, and because they are native to Snowflake, they can be shared across all Snowflake accounts while maintaining the security and governance requirements applied to the underlying data.
  • Observability and Experiences: To further meet the needs of developers, Snowflake is investing in native observability and developer experience features so they can build, test, debug, deploy, and monitor data pipelines with increased productivity through alerting (private preview), logging (private preview), event tracing (private preview), task graphs and history (public preview), and more. 
  • Schema Inference (private preview): Improves productivity and opens up broader self-service analytics by allowing business teams to onboard new data easily without the need to define schemas or get additional support from IT teams.
  • Serverless Tasks (generally available): Execute pipelines effortlessly and efficiently with Serverless Tasks. With Serverless Tasks, Snowflake automates and optimizes task management, resulting in improved cost effectiveness and more efficient use of compute, without manual configuration.
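The idea of incremental processing behind Dynamic Tables can be sketched in plain Python. This is a conceptual illustration only, not Snowflake's implementation or syntax: it assumes an append-only source, and the class and function names are hypothetical. The point is that a refresh touches only the rows that arrived since the last one, yet ends up with the same result as recomputing everything.

```python
from collections import defaultdict


class IncrementalTotals:
    """Keeps per-key totals current by processing only newly appended rows."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.offset = 0  # number of source rows already processed

    def refresh(self, source_rows):
        """Apply rows appended since the last refresh; return how many were read."""
        new_rows = source_rows[self.offset:]
        for key, amount in new_rows:
            self.totals[key] += amount
        self.offset = len(source_rows)
        return len(new_rows)


def full_recompute(source_rows):
    """Batch-style baseline: rescan every row on each run."""
    totals = defaultdict(float)
    for key, amount in source_rows:
        totals[key] += amount
    return dict(totals)


rows = [("a", 1.0), ("b", 2.0)]
view = IncrementalTotals()
view.refresh(rows)        # first refresh reads both rows
rows.append(("a", 3.0))   # a new row arrives in the source
view.refresh(rows)        # second refresh reads only the new row
```

In a declarative pipeline the developer states only the desired result (here, totals per key) and the platform is responsible for bookkeeping like the offset above.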

These are just some of the exciting announcements from Snowday 2022. Access the full Snowday sessions on demand here. For more content for the builders out there, we have a two-day virtual Data Cloud Dev Summit, BUILD! Immerse yourself in improving your apps, data pipelines, ML workflows, and more. Then meet with your peers at one of 15+ in-person BUILD.local events. We hope to see you there!

