We’re excited to announce the 0.2.0 release of Arroyo, our first since we open-sourced the project last month. Arroyo is a state-of-the-art stream processing engine that lets anyone build complex real-time data pipelines with SQL.

With the 0.2.0 release, we are continuing to push forward on features, stability, and productionization. We’ve added native Kubernetes support and easy deployment via a Helm chart, expanded our SQL support with features like JSON functions and windowless joins, and made many more fixes and improvements detailed below.

We are also thrilled to welcome three new contributors to the project:

@rtyler made their first contribution in #8
@akennedy4155 made their first contribution in #49
@jbeisen made their first contribution in #77

Looking ahead to the 0.3.0 release, we will continue to improve our SQL support, adding the ability to create sources and sinks directly as SQL tables along with support for views, UDFs, and external joins. We will also be adding a native Pulsar connector and making continued improvements in performance and reliability.

Excited to be part of the future of stream processing? Come chat with the team on our Discord, check out a starter issue and submit a PR, and let us know what you’d like to see next in Arroyo!

Features

Native Kubernetes support

As of release 0.2.0, Arroyo can natively target Kubernetes as a scheduler for running pipelines. The Arroyo control plane can now also be deployed to Kubernetes with our new Helm chart.

See the docs for all the details.

Add Kubernetes scheduler by @mwylde in #79
K8s deployment and helm chart by @mwylde in #91

Nomad deployments

Arroyo has long had first-class support for Nomad as a scheduler, where we take advantage of its very low-latency, lightweight scheduling. Nomad is now also an easy deploy target for the control plane via a Nomad Pack.

See the docs for more details.

Support for deploying Arroyo to a nomad cluster by @mwylde in #50

SQL features

With this release we are making big improvements in SQL completeness. Notably, we’ve made our JSON support much more flexible with the introduction of SQL JSON functions including get_json_objects, get_first_json_object, and extract_json_string.
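
The signatures and path syntax of these functions are not spelled out here, so the following is only a conceptual sketch in Python of what pulling a field out of a raw JSON string involves. The helper names `lookup_path` and `extract_string` and the dotted-path syntax are assumptions made for illustration; they are not Arroyo's API.

```python
import json
from typing import Any, Optional


def lookup_path(raw: str, path: str) -> Optional[Any]:
    """Return the value at a dotted path such as "user.name", or None.

    Hypothetical helper: the dotted-path syntax is an assumption,
    not the path syntax Arroyo's JSON functions accept.
    """
    try:
        node = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed input yields None instead of raising.
        return None
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def extract_string(raw: str, path: str) -> Optional[str]:
    """Return the value at `path` only if it is a JSON string."""
    value = lookup_path(raw, path)
    return value if isinstance(value, str) else None
```

For example, `extract_string('{"user": {"name": "Ada"}}', "user.name")` returns `"Ada"`, while a path that is missing or points at a non-string value returns `None`.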

We’ve also added support for windowless joins.
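
As a rough illustration of the idea, a windowless join matches each arriving record against every record seen so far on the other side with the same key, rather than only against records in the same time window. The Python sketch below is conceptual and is not Arroyo's implementation; the class and method names are hypothetical, and how Arroyo bounds or expires join state is not covered here.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple


class WindowlessInnerJoin:
    """Toy inner join over two unbounded streams with no window.

    Every record is kept in per-key state, so state grows without
    bound in this sketch; a real engine would need to expire it.
    """

    def __init__(self) -> None:
        self.left_state: Dict[Hashable, List[Any]] = defaultdict(list)
        self.right_state: Dict[Hashable, List[Any]] = defaultdict(list)

    def on_left(self, key: Hashable, value: Any) -> List[Tuple[Any, Any]]:
        # Remember the record, then pair it with all matching right records.
        self.left_state[key].append(value)
        return [(value, r) for r in self.right_state.get(key, [])]

    def on_right(self, key: Hashable, value: Any) -> List[Tuple[Any, Any]]:
        # Remember the record, then pair it with all matching left records.
        self.right_state[key].append(value)
        return [(l, value) for l in self.left_state.get(key, [])]
```

Feeding a left record for a key emits nothing until a right record with the same key arrives, at which point the pair is emitted regardless of how far apart in time the two records were.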

Here are some of the highlights:

Initial JSON functions and raw Kafka Source by @jacksonrnewhouse in #86
Windowless Joins by @jacksonrnewhouse in #61
String functions by @jacksonrnewhouse in #17
Hashing Functions by @akennedy4155 in #49
Casting between numeric types and strings by @jacksonrnewhouse in #5
Casting timestamps to text by @jacksonrnewhouse in #32
String Concat Operator || in SQL by @akennedy4155 in #55
Add COALESCE, NULLIF, MAKE_ARRAY by @jacksonrnewhouse in #89

Connectors, Web UI, and platform support

Arroyo now supports SASL authentication for Kafka connections, runs on FreeBSD, and lets you change pipeline parallelism from the Web UI.

Add FreeBSD support by @rtyler in #8, #19
SASL authentication support to kafka connections by @jacksonrnewhouse in #20
Add support for changing pipeline parallelism in the Web UI by @jbeisen in #77

Fixes

Fix filter on partition_by parsing by @jacksonrnewhouse in #27
Make parquet state management more reliable by @jacksonrnewhouse in #23
Fix the quoting of types in the sql package by @jacksonrnewhouse in #64

Improvements

SQL macro testing by @jacksonrnewhouse in #10
Add a SQL IR and factor out optimizations by @jacksonrnewhouse in #80
Multi-arch builds for Docker by @jacksonrnewhouse in #11
Prometheus and pushgateway in the docker image for working metrics by @mwylde in #16
Bump datafusion to 23.0, arrow to 37.0 by @jacksonrnewhouse in #92
Run compiler service locally, compile in debug mode if DEBUG is set by @jacksonrnewhouse in #83
Replace shelling out to rustfmt with prettyplease by @jacksonrnewhouse in #87

The full changelog is available at https://github.com/ArroyoSystems/arroyo/commits/release-0.2.0
