InfluxDB’s next-gen time-series engine is built on Rust, supports SQL

As enterprises see an unprecedented increase in real-time data analytics, InfluxDB on Wednesday said that it was releasing a next-generation time series engine for its managed database service InfluxDB Cloud.

Time series data, according to market research firm IDC, can be defined as a set of data points that are collected at regular time intervals with fixed time stamps.

These types of data sets are mostly used to reveal patterns or seasonality among other trends and can help enterprise analytics teams describe and understand what is happening with the data and why, to make better business decisions, Amy Machado, research manager at IDC, wrote in a research report.  

Time series databases or data sets have recently gained more prominence with the advent of streaming technologies, Machado wrote, adding that in contrast to the earlier practice of uploading such data in a high-latency batch format, streaming technologies allow time series data to flow into the database in real time.

“A time series database and analytics toolset work best to first handle a large influx of continuous data and then successfully mine the massive workloads of data for insights,” Machado wrote in the report.

Developed on Rust for performance, scale

The new engine, which is based on the company’s IOx open source project introduced in 2020, has been developed in the Rust programming language to enhance scale and performance, the company said in a statement.

To speed up storage, the company claims to have reengineered its column-oriented storage, enabling the engine to ingest data in high volumes with unbounded cardinality.

Typically, a column-oriented database is faster than a row-oriented one for analytic queries, because values from the same column are stored together and compress well. This also improves query speeds, as the system needs to read only the columns a query touches rather than every full row.
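The difference between the two layouts can be sketched in a few lines of Python. This is an illustration only, not InfluxDB's storage format: the sample readings, the field names (`time`, `host`, `cpu`, `mem`) and both helper functions are hypothetical, and plain in-memory lists stand in for on-disk storage.

```python
# Illustrative only: plain Python structures stand in for on-disk storage.
# All field names and values below are made up for the example.
rows = [
    {"time": 1, "host": "a", "cpu": 0.50, "mem": 0.30},
    {"time": 2, "host": "a", "cpu": 0.70, "mem": 0.31},
    {"time": 3, "host": "b", "cpu": 0.20, "mem": 0.55},
]

# Column-oriented layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def avg_cpu_row_store(rows):
    """Average the cpu field when data is stored record by record."""
    # A row store loads each whole record, so every field is touched.
    values_touched = sum(len(row) for row in rows)
    total = sum(row["cpu"] for row in rows)
    return total / len(rows), values_touched


def avg_cpu_column_store(columns):
    """Average the cpu field when each column is stored separately."""
    # Only the cpu column has to be read.
    cpu = columns["cpu"]
    return sum(cpu) / len(cpu), len(cpu)


row_avg, row_touched = avg_cpu_row_store(rows)
col_avg, col_touched = avg_cpu_column_store(columns)
```

Both functions return the same average, but the row layout touches all 12 stored values while the column layout touches only the 3 values in the `cpu` column.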

Cardinality in a time series database refers to the number of unique series it stores, typically the distinct combinations of measurement names and tag values. The higher the cardinality an engine can handle, the better it can scale.
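In time series databases, cardinality is usually counted as the number of distinct series, meaning unique combinations of a measurement name and its tag values. A minimal sketch of that count follows; the sample points and the helpers `series_key` and `series_cardinality` are hypothetical and are not part of any InfluxDB API.

```python
# Hypothetical sample points: each has a measurement name and a set of tags.
points = [
    {"measurement": "cpu", "tags": {"host": "a", "region": "us"}},
    {"measurement": "cpu", "tags": {"host": "a", "region": "us"}},
    {"measurement": "cpu", "tags": {"host": "b", "region": "us"}},
    {"measurement": "cpu", "tags": {"host": "b", "region": "eu"}},
]


def series_key(point):
    """Build a hashable key from the measurement and its sorted tag pairs."""
    return (point["measurement"], tuple(sorted(point["tags"].items())))


def series_cardinality(points):
    """Count the distinct series keys present in the data."""
    return len({series_key(p) for p in points})


cardinality = series_cardinality(points)
```

Here the first two points belong to the same series, so the four points produce a cardinality of 3. Each new tag value, such as a new host or container ID, adds further series, which is why workloads with many short-lived identifiers push cardinality up quickly.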

The new engine can process queries across most time series data within milliseconds, the company said, adding that it uses Apache Parquet files for on-disk storage and Apache Arrow as the in-memory data format shared among components.

Writing queries in SQL

With the introduction of the new engine, the company said that it was finally adding support to allow developers to write queries in SQL.

SQL is the most popular database query language, as it is used across most traditional relational databases.

“The SQL capability that InfluxDB newly boasts about has, in fact, been built in from the get-go by Timescale, which has always been based on PostgreSQL,” said Tony Baer, principal analyst at market research firm dbInsight.

Previously, InfluxDB allowed developers to write queries with the help of APIs, Flux and InfluxQL.

Flux, which is built on open source, is a standalone scripting and query language focused on code reuse and optimised for extract, transform and load (ETL), the company said.

InfluxQL, on the other hand, is a query language that has SQL-like syntax.

Adding support for SQL is a growing trend overall for real-time data solutions, Machado said, noting that the number of developers who know SQL is large. “SQL support can boost your adoption rates. You can use existing teams to add new use cases when you offer SQL support.”
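To show the kind of query a developer who already knows SQL might write against time series data, here is a small sketch that averages readings per host in one-minute buckets. It runs against Python's built-in SQLite module purely as a stand-in; it is not InfluxDB's SQL dialect or client API, and the `cpu` table, its columns and the sample values are assumptions made for the example.

```python
import sqlite3

# SQLite is used only as a stand-in SQL engine for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time INTEGER, host TEXT, usage REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [
        (0, "a", 0.25),   # time is in seconds since an arbitrary start
        (30, "a", 0.75),
        (60, "a", 0.5),
        (10, "b", 1.0),
    ],
)

# Group readings into 60-second buckets and average them per host.
query = """
    SELECT (time / 60) * 60 AS bucket, host, AVG(usage) AS avg_usage
    FROM cpu
    GROUP BY bucket, host
    ORDER BY bucket, host
"""
results = conn.execute(query).fetchall()
conn.close()
```

The query returns one row per bucket and host: `(0, 'a', 0.5)`, `(0, 'b', 1.0)` and `(60, 'a', 0.5)`. Time series engines commonly provide dedicated time-bucketing functions for this pattern, but the basic shape of the query is plain SQL.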

All the query languages, according to the company, can be accessed via the DataFusion query engine—which is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

Further, the new engine will add support for observability use cases as enterprises will have access to data needed for observability, such as traces, logs and metrics, the company said.

InfluxDB faces competition

InfluxDB is rated highly when it comes to time series data workloads and competes with the likes of Graphite, Prometheus, TimescaleDB, QuestDB, Apache Druid and DolphinDB among others, according to database recommendation website db-engines.com.

When asked about InfluxDB’s momentum in the market, Baer said: “Out of the gate, InfluxDB became an early favorite with developers, but they wasted the opportunity with incompatible forks that slowed their momentum.”

“In the meantime, time series data has become a checkbox item with many cloud operational and analytic databases,” Baer added.

Time series workloads have been on the rise with the explosion of IoT and are in great demand for use cases around operations within oil and gas, logistics, supply chain, transportation, and healthcare, according to IDC.

Copyright © 2022 IDG Communications, Inc.
