cloud-native stream processing

Transform, filter, aggregate, and join streams using SQL, with sub-second results. Autoscale to millions of events per second, no ops team needed. Powered by the open-source Arroyo Streaming Engine.

Built by streaming experts from Lyft, Splunk, Sift, and Quantcast

Backed by Y Combinator

What is stream processing?

Modern businesses operate in the moment. Consumers expect their apps to always be up to date. Threats and attacks can happen in seconds. And operations teams need to respond to issues in real time.

Stream processing operates on events as they come in—providing answers in seconds instead of days or hours.

But today stream processing is too hard. Existing tools like Apache Flink are complex. They require deep expertise to build and operate correct, reliable, and performant pipelines.

Why Arroyo

Arroyo is a new kind of stream processing engine, built to make real-time as easy as batch.

SQL that just works

Optimized from the SQL planner to the storage layer for excellent, unsurprising SQL support. Build reliable, efficient streaming pipelines without specialized streaming knowledge.

Designed for the cloud

Designed from the ground up to run in modern, elastic cloud environments.

Run on the Arroyo Cloud, or self-host with Kubernetes or Nomad.

Operational simplicity

Comes out of the box with an automated control plane, so you don't need to worry about manually managing pipelines. Reliable and efficient state checkpointing prevents data loss.

Get started

How it works

Real-time with Arroyo

Arroyo lets you build streaming pipelines by writing the same analytical SQL queries you are already running in your data warehouse, with a few extensions for real-time. See our SQL docs for the details.

-- Extract tag names from each Mastodon event, keeping only rows with a tag.
CREATE VIEW tags AS (
    SELECT tag FROM (
        SELECT extract_json_string(value, '$.tags[*].name') AS tag
        FROM mastodon)
    WHERE tag IS NOT NULL
);

-- Top 5 tags by count in each 15-minute window, sliding every 5 seconds.
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window
        ORDER BY count DESC) AS row_num
    FROM (
        SELECT count(*) AS count,
            tag,
            hop(interval '5 seconds', interval '15 minutes') AS window
        FROM tags
        GROUP BY tag, window)
) WHERE row_num <= 5;
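
To make the semantics of the query above concrete, here is a rough Python sketch of what it computes: each event is assigned to every sliding (hopping) window that contains it, tags are counted per window, and the top N tags per window are kept. This is only an illustration, not how Arroyo executes the query. The names hop_windows and top_tags are hypothetical, the sketch assumes windows are aligned to the Unix epoch, and it works over a finished list of events rather than incrementally over a live stream.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hop_windows(ts, slide, width):
    """Yield the start of every window [start, start + width) that contains ts.

    Assumption: window starts are aligned to multiples of `slide` since the
    Unix epoch. The engine's real alignment may differ.
    """
    # Latest slide-aligned window start at or before ts.
    start = EPOCH + ((ts - EPOCH) // slide) * slide
    # Walk backwards while the window still covers ts.
    while start > ts - width:
        yield start
        start -= slide


def top_tags(events, slide, width, n=5):
    """Return {window_start: [(tag, count), ...]} with the n most common tags.

    `events` is an iterable of (timestamp, tag) pairs; a tag of None is
    skipped, mirroring the NOT NULL filter in the view.
    """
    counts = defaultdict(Counter)
    for ts, tag in events:
        if tag is None:
            continue
        for start in hop_windows(ts, slide, width):
            counts[start][tag] += 1
    return {start: c.most_common(n) for start, c in sorted(counts.items())}


if __name__ == "__main__":
    base = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        (base + timedelta(seconds=1), "rust"),
        (base + timedelta(seconds=2), "rust"),
        (base + timedelta(seconds=7), "sql"),
    ]
    # Same slide and width as the SQL query: 5 seconds and 15 minutes.
    result = top_tags(sample, timedelta(seconds=5), timedelta(minutes=15))
    for window_start, ranked in list(result.items())[-3:]:
        print(window_start.isoformat(), ranked)
```

Because the slide is much smaller than the width, every event lands in many overlapping windows (180 of them for a 5-second slide and a 15-minute width). A streaming engine has to maintain and emit those results continuously, which this batch sketch does not attempt.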

Try it out

Step 1

Run the docker container

$ docker run -p 5115:5115 \
      ghcr.io/arroyosystems/arroyo:latest

Step 2

Open the web UI at http://localhost:5115

See the getting started guide for more