Google aims for BigLake data lake support for all unstructured data

In its continued bid to support all kinds of data and provide a one-stop data platform in the form of BigLake, Google on Tuesday said that it will add support for the most commonly used open-source table formats in data lakes.

The company, which made the announcement at its annual Cloud Next conference, describes BigLake as a service that allows data analytics and data engineering on both structured and unstructured data.

“Our storage engine, BigLake, will add support for Apache Iceberg, Databricks’ Delta Lake, and Apache Hudi,” Gerrit Kazmaier, vice president of data analytics at Google Cloud, wrote in a blog post. “By supporting these widely adopted data formats, we can help eliminate barriers that prevent organizations from getting the full value from their data.”

It’s part of Google’s ongoing effort to enhance the overall openness of its cloud data services as a strategy to compete with other cloud-based data warehouse and data lake providers.

Support for Apache Iceberg will be available in preview, the company said, adding that support for Hudi and Delta Lake would be coming soon. A specific timeline for the preview and general availability was not announced.

Google has decided to support open-source table formats because they bring transaction management capabilities to data lakes, said Matt Aslett, research director at Ventana Research.

“More than one-half (57%) of data lake adopters are using at least one of these emerging table formats today, which has the potential to increase the use of data lakes as a replacement for data warehousing environments, supporting analytics workloads based on the processing of structured data,” Aslett said.

However, Ventana Research’s recent Data Lakes Dynamics Insights research indicated that less than one-quarter of organizations have adopted a data lake to replace an existing data warehouse environment, and data lake and data warehouse environments co-exist in almost three-quarters of organizations.

“This works in favor of Google’s BigLake as it has the ability to address both data warehousing and data lake approaches with a single environment,” Aslett said.

Google’s addition of support for these open-source table formats seems to be a response to product updates from Snowflake and Databricks, said Doug Henschen, principal analyst at Constellation Research.

“Apache Iceberg is the hot new option gaining traction because it promises openness as well as performance gains, but Google is making it clear it’s not picking sides by promising support for Delta Lake and Hudi as well,” said Henschen.

Google rival Oracle may also announce similar features at its upcoming annual CloudWorld conference, said Tony Baer, principal analyst at dbInsight.

BigQuery supports unstructured data

As part of its Cloud Next announcements, Google has also added new features to its managed enterprise data warehouse, BigQuery, including support for unstructured data.

“Beginning now, data teams can analyze structured and unstructured data in BigQuery, with easy access to Google Cloud’s capabilities in machine learning (ML), speech recognition, computer vision, translation, and text processing, using BigQuery’s familiar SQL interface,” Kazmaier wrote.

According to Google, data teams in most enterprises work mainly with structured data, which accounts for just 10% of all data produced. Structured data includes data from operational databases and SaaS applications such as Adobe, SAP, ServiceNow, and Workday, as well as semistructured data in the form of JSON log files.

Unstructured data, on the other hand, includes video from television archives, audio from call centers or radio, and documents in varied formats.

Google contends that enterprises face increasing demand to work with unstructured data.

Google’s move to add support for unstructured data gives it a differentiating capability among cloud service providers, analysts said.

No other rival cloud service provider is presently addressing the need to support unstructured data as aggressively as Google, Henschen said.

“Addressing all data types on a single platform promises to simplify things for CIOs, data scientists and developers alike,” Henschen added.

Other BigQuery updates at Cloud Next

Google also announced BigQuery support for the open-source unified analytics engine Apache Spark. The move is consistent with the company’s strategy to position its cloud service as a modern lakehouse that supports analytics, warehousing, and data science, analysts said.

The new integration, which will be in private preview, will allow enterprise data teams to create procedures in BigQuery, using Apache Spark, that integrate with their SQL pipelines, the company said.

“By embracing Spark, Google is embracing the most popular choice of data scientists,” Henschen said.

“In contrast with Google, Snowflake is still early in its journey to data science using Python and other languages through its Snowpark offering on top of its database, and it’s relying heavily on partners for support,” Henschen added.

Another rival, Databricks, has also enhanced support for data warehouse and business intelligence (BI) workloads on its platform.

Meanwhile, Google also has integrated its change stream service, dubbed Datastream, with BigQuery.

“The new integration will help organizations more effectively replicate data from all kinds of sources—including real-time data in AlloyDB, PostgreSQL, MySQL and third-party databases like Oracle—directly into BigQuery,” the company said in a blog post.

Further, Google has updated its data unifier service, Dataplex, to automate processes associated with data quality.

“For instance, users will now be able to more easily understand data lineage—where data originates and how it has transformed and moved over time—reducing the need for manual, time consuming processes,” Kazmaier wrote in the blog post.

Looker Studio unifies business intelligence products

At Cloud Next, the company said that it will be unifying its business intelligence products by merging Looker and Data Studio to form Looker Studio, which in turn will be available in three options.

“Looker Studio currently supports more than 800 data sources with a catalog surpassing 600 connectors, making it simple to explore data from different sources,” Kate Wright, senior director of BI product management at Google Cloud, wrote in a blog post.

Looker Studio, which currently offers access to data models in private preview, is also expected to get a new interface, the company said, adding that the base version of Looker Studio will be free.

Before the merger of the products, Looker was a paid service and Data Studio was a free service. The free version, according to Aslett, is not expected to come with support. To get support and added features, enterprises will have to upgrade to Looker Studio Pro.

“Customers who upgrade to Looker Studio Pro will get new enterprise management features, team collaboration capabilities, and SLAs [service level agreements]. This is only the first release, and we’ve developed a roadmap of capabilities, starting with Dataplex integration for data lineage and metadata visibility, that our enterprise customers have been asking for,” Wright said.

Other updates to Looker include support for visualization tools, such as Tableau and Microsoft Power BI, to access data, the company said.

Vertex AI Vision released

In an effort to help developers and data scientists build and deploy computer vision-based applications, Google has added a new feature called Vertex AI Vision to extend the capabilities of its machine learning platform Vertex AI.

The company has been working to ease machine learning (ML) operations since launching the Vertex AI platform in May last year, followed by the introduction of the collaborative development environment Vertex AI Workbench in October.

“The new end-to-end application development environment will help you ingest, analyze, and store visual data,” the company said, claiming that the new service can reduce the time to create computer vision applications from weeks to hours, at one-tenth the cost of current offerings.

Google claims that it achieves these efficiencies by providing a relatively easy-to-use interface and a library of pretrained machine learning models for common tasks such as occupancy counting, product recognition, and object detection.

“It also provides the option to import your existing AutoML or custom ML models, from Vertex AI, into your Vertex AI Vision applications. As always, all of our new AI products also adhere to our AI Principles,” the company said.

Copyright © 2022 IDG Communications, Inc.
