Blog
Updates from the Arroyo team
Parsing custom formats with UDFs
User-defined functions (UDFs) allow users to extend Arroyo with new functionality by writing Rust code. In this tutorial, we'll walk through how to use UDFs to parse a custom data format: the Common Log Format used by Apache HTTP and other web servers.

Micah Wylde
CEO of Arroyo


Announcing Arroyo 0.8.0
November 28, 2023
Arroyo 0.8 is now available, with a new FileSystem source, Delta Lake sink, Redis sink, Avro support, global UDFs, and more.

Micah Wylde
CEO of Arroyo

What is streaming SQL?
November 15, 2023
What does it mean to apply SQL—a batch-oriented query language—to streams of data that are never complete? Read on for a deep dive into streaming SQL in Arroyo and other engines.

Micah Wylde
CEO of Arroyo

Running Arroyo on EKS
November 9, 2023
The easiest way to run a highly scaled production Arroyo cluster is on Kubernetes. Setting up a Kubernetes cluster used to be a daunting task, but services like Amazon EKS have made it much easier. This post walks through how to set up an EKS cluster and deploy Arroyo to it.

Micah Wylde
CEO of Arroyo

Can you replace Prometheus with a stream processor?
October 26, 2023
Recent versions of Arroyo have added support for HTTP sources and for treating individual lines of a response as streaming messages. So I wondered: could we use Arroyo to process metrics directly?

Micah Wylde
CEO of Arroyo

Announcing Arroyo 0.7.0
October 17, 2023
Arroyo 0.7.0 is now available, with custom partitioning for S3 writes, message framing, unnest, union, state compaction, and more.

Micah Wylde
CEO of Arroyo

Rust is the best language for data infra
September 27, 2023
Arroyo is written in Rust, a modern systems language. We think it's become the best choice for writing high-performance systems like databases and stream processing engines. Read on for why we chose Rust, and what we've learned along the way.

Micah Wylde
CEO of Arroyo

Announcing Arroyo 0.6.0
September 14, 2023
Arroyo 0.6 brings support for Google Cloud Storage, user-defined aggregate functions, SQL correctness tests, and more.

Micah Wylde
CEO of Arroyo

Streaming data to S3 is surprisingly hard
September 8, 2023
Arroyo 0.5 added the FileSystem connector, a high-performance, transactional sink that lets you write pipeline outputs to file systems and object stores like S3—and makes Arroyo a great tool for performing real-time ETL. This turns out to be surprisingly tricky to do well. Read on for a deep dive into how Arroyo solved this with a new checkpointing strategy and some clever Parquet tricks.

Jackson Newhouse
CTO of Arroyo

Real-time Web Analytics with Arroyo
September 5, 2023
Working with real-time data can be daunting. We're working to solve that by building a new stream processing engine that's easy enough for anyone to use. So how easy is it to solve real-world streaming problems with Arroyo today? I decided to find out.

Micah Wylde
CEO of Arroyo

Arroyo + WarpStream
August 22, 2023
At Arroyo we're building a new stream processing engine to replace legacy Java systems like Flink and KSQL. So we were excited to see a project that's doing the same thing for Kafka. It's called WarpStream, and they're building a replacement for Kafka that's backed directly by S3.

Micah Wylde
CEO of Arroyo

Announcing Arroyo 0.5.0
August 16, 2023
Release 0.5 of Arroyo is all about connectors. We've added a high-performance transactional FileSystem sink, exactly-once Kafka support, a Kinesis connector, and more.

Micah Wylde
CEO of Arroyo

Why Not Flink?
July 18, 2023
Flink is a mature and powerful streaming engine. So why didn't we build Arroyo on top of it?

Micah Wylde
CEO of Arroyo

Announcing Arroyo 0.4.0
July 13, 2023
With the 0.4.0 release, we've added Debezium support and a new REST API, and made the process of contributing connectors much easier.

Micah Wylde
CEO of Arroyo

Announcing Arroyo 0.3.0
June 2, 2023
The Arroyo 0.3.0 release adds UDFs, DDL statements, custom event time and watermarks, web UI improvements, and more.

Micah Wylde
CEO of Arroyo

End-to-end SQL tests with Rust proc macros
May 8, 2023
Testing a complex system like Arroyo is hard. But with Rust's powerful proc macros, we're able to easily produce end-to-end tests of our SQL features.

Jackson Newhouse
CTO of Arroyo

Announcing Arroyo 0.2.0
May 2, 2023
Arroyo 0.2.0 brings native Kubernetes support, new SQL features, and many other fixes and improvements.

Micah Wylde
CEO of Arroyo

Open-sourcing the Arroyo Streaming Engine
April 5, 2023
After launching our state-of-the-art real-time cloud data processor, we're opening up the technology that powers it: the Arroyo streaming engine.

Micah Wylde
CEO of Arroyo

10x faster sliding windows: how our Rust streaming engine beats Flink
March 18, 2023
Arroyo's Rust-based stream processing engine outperforms Apache Flink on sliding window queries thanks to efficient algorithms that maintain near-constant throughput even as slides get smaller and windows get larger.

Jackson Newhouse
CTO of Arroyo