cloud-native
stream processing

Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results

Scale from zero to millions of events per second

No ops team required

Built by streaming experts from Lyft, Splunk, Sift, and Quantcast

Get started

Arroyo ships as a single, compact binary. Run it locally on macOS or Linux for development, and deploy to production with Docker or Kubernetes.

$ curl -LsSf https://arroyo.dev/install.sh | sh
$ arroyo cluster

Release 0.11.3 | August 20, 2024

Apache 2.0 License

Arroyo is a new kind of stream processing engine, built from the ground up to make real-time easier than batch.

Analytical SQL that just works

Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines.

Data scientists and engineers can build end-to-end real-time applications, models, and dashboards—without a separate team of streaming experts.

CREATE VIEW tags AS (
    SELECT btrim(unnest(tags), '"') as tag FROM (
        SELECT extract_json(value, '$.tags[*].name') AS tags
     FROM mastodon)
);

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window
        ORDER BY count DESC) as row_num
    FROM (SELECT count(*) as count,
        tag,
        hop(interval '5 seconds',
          interval '15 minutes') as window
            FROM tags
            group by tag, window)) WHERE row_num <= 5;
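
To make the hop(interval '5 seconds', interval '15 minutes') call above more concrete, here is a minimal Python sketch of hopping-window assignment. It is not Arroyo code. It assumes the first argument is the slide and the second is the width, that windows are half-open [start, end), and that window starts are aligned to the Unix epoch; the function name hop_windows is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hop_windows(ts, slide, width):
    """Return every [start, end) window of the given width, with starts on
    multiples of `slide` since the epoch, that contains timestamp `ts`.

    Assumption: epoch alignment and half-open intervals are illustrative
    choices, not confirmed engine behavior.
    """
    # Latest slide-aligned window start at or before ts.
    start = EPOCH + ((ts - EPOCH) // slide) * slide
    windows = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + width > ts:
        windows.append((start, start + width))
        start -= slide
    windows.reverse()  # oldest window first
    return windows


ts = datetime(2024, 8, 20, 12, 0, 7, tzinfo=timezone.utc)
windows = hop_windows(ts, timedelta(seconds=5), timedelta(minutes=15))
print(len(windows))  # 180
```

Under these assumptions a 15-minute window that slides every 5 seconds places each event in 900 / 5 = 180 overlapping windows, which is why the query can emit a fresh top-5 ranking every 5 seconds.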

Designed for the modern cloud

Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on Kubernetes.

In short: Arroyo is a stateful stream processing engine that operates like a stateless one.

Scales easily to any workload

Arroyo is for everyone who needs to process data in real time. Small use cases can run with just a few MB of RAM and a fractional vCPU.

For larger streams, Arroyo can rescale vertically and horizontally to process tens of millions of events per second while maintaining exactly-once semantics.

Incredible performance

Arroyo is fast. Really, really fast. Written in Rust, a high-performance systems language, and built around the Arrow in-memory analytics format, it outperforms similar systems like Apache Flink by 5x or more.

Features

Well connected

Arroyo ships with tons of connectors, making it easy to integrate into your data stack.

Real-time with Arroyo

With Arroyo, you can build streaming pipelines by writing the same analytical SQL queries you are already running in your data warehouse.

CREATE TABLE mastodon (
    value TEXT
) WITH (
    connector = 'sse',
    format = 'raw_string',
    endpoint = 'http://mastodon.arroyo.dev/api/v1/streaming/public',
    events = 'update'
);

CREATE VIEW tags AS (
    SELECT btrim(unnest(tags), '"') as tag FROM (
        SELECT extract_json(value, '$.tags[*].name') AS tags
     FROM mastodon)
);

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window
        ORDER BY count DESC) as row_num
    FROM (SELECT count(*) as count,
        tag,
        hop(interval '5 seconds', interval '15 minutes') as window
            FROM tags
            group by tag, window)) WHERE row_num <= 5;
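
As a rough illustration of what the pipeline above computes, the following Python sketch mirrors the tags view and the top-5 ranking for the events that fall into a single window. It is a sketch under stated assumptions, not how Arroyo executes the query: the sample payloads are hypothetical and only loosely shaped like Mastodon status updates, and the helper names extract_tags and top_n are invented for this example.

```python
import json
from collections import Counter

# Hypothetical raw SSE payloads, one JSON document per event.
events = [
    '{"tags": [{"name": "rust"}, {"name": "sql"}]}',
    '{"tags": [{"name": "rust"}]}',
    '{"tags": []}',
]


def extract_tags(value):
    """Rough equivalent of the tags view: one output row per tag name.

    json.loads already decodes the strings, so there is no counterpart to
    the btrim(..., '"') step that strips quotes in the SQL.
    """
    return [tag["name"] for tag in json.loads(value).get("tags", [])]


def top_n(values, n=5):
    """Count tags across the events of one window and keep the n most
    frequent, like ROW_NUMBER() ... ORDER BY count DESC with row_num <= n."""
    counts = Counter(tag for value in values for tag in extract_tags(value))
    return counts.most_common(n)


print(top_n(events))  # [('rust', 2), ('sql', 1)]
```

The streaming query produces a ranking like this for every hopping window, whereas the sketch treats all sample events as belonging to one window.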

Recent posts from the blog