
End-to-end SQL tests with Rust proc macros

Testing a complex system like Arroyo is hard. But with Rust's powerful proc macros, we're able to easily produce end-to-end tests of our SQL features.

Jackson Newhouse

CTO of Arroyo

Arroyo is a new distributed stream processing engine written in Rust that allows users to build and execute stateful, real-time computations on high-volume streaming data. As is becoming increasingly standard, users of Arroyo build out their pipelines by writing SQL queries against their streaming data. Today I want to dive into the techniques we use to test our SQL processing logic.

After Arroyo parses and plans a SQL query, it generates Rust code that is then compiled into the final pipeline binary. This is a pattern seen in a number of other systems, including Spark, Presto, and Calcite¹. Code generation results in wickedly fast computation, especially with a compiler as powerful as Rust's.

But it also introduces challenges for developing and testing our engine code. Because much of the code is generated at runtime, we can't take advantage of Rust's many compile-time correctness checks. In the worst case we may end up generating invalid code, leading to compilation errors when a user tries to run a query.

Fortunately, Rust has another trick up its sleeve: a flexible macro system, which allows us to generate test cases at compile time from our generated code. By leveraging a procedural macro, we're able to easily write tests to validate that our generated code compiles and behaves correctly.

An Example Test

Let's look at one of the 92 different tests we currently have committed in arroyo-sql-testing, one_modulo_nullable_two:

single_test_codegen!(
  "one_modulo_nullable_two",
  "non_nullable_i32 % nullable_i64",
  arroyo_sql::TestStruct {
    non_nullable_i32: 1,
    nullable_i64: Some(2),
    ..Default::default()
  },
  Some(1i64)
);

This tests how our SQL engine handles the modulo operator % when one input is non-nullable and the other is nullable.

The single_test_codegen macro goes through several steps to turn this into a useful test case.

Parsing the Macro Arguments

As a procedural macro, single_test_codegen takes a TokenStream, processes it, and returns a modified TokenStream. First, it extracts four input arguments, roughly as sketched in the code after this list:

  • test_name: The name of the test case to be generated.
  • calculation_string: A string with the SQL expression whose implementation we wish to test.
  • input_value: The row against which the generated code is executed. Our tests use TestStruct, which has nullable and non-nullable fields for most of our data types.
  • expected_result: A Rust expression that should equal the result of evaluating calculation_string against the input value.
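
A minimal sketch of that parsing step, assuming the syn crate (with its "full" feature) inside a proc-macro crate; the TestArgs struct and its field names are our illustration, not necessarily Arroyo's internals:

use proc_macro::TokenStream;
use syn::parse::{Parse, ParseStream};
use syn::{parse_macro_input, Expr, LitStr, Token};

// Hypothetical container for the four comma-separated macro arguments.
struct TestArgs {
    test_name: LitStr,
    calculation_string: LitStr,
    input_value: Expr,
    expected_result: Expr,
}

impl Parse for TestArgs {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        // Two string literals, then two arbitrary Rust expressions,
        // separated by commas.
        let test_name = input.parse()?;
        input.parse::<Token![,]>()?;
        let calculation_string = input.parse()?;
        input.parse::<Token![,]>()?;
        let input_value = input.parse()?;
        input.parse::<Token![,]>()?;
        let expected_result = input.parse()?;
        Ok(TestArgs { test_name, calculation_string, input_value, expected_result })
    }
}

#[proc_macro]
pub fn single_test_codegen(input: TokenStream) -> TokenStream {
    let args = parse_macro_input!(input as TestArgs);
    // The later steps (planning the SQL, generating the expression, and
    // assembling the test) are sketched in the sections below; this stub
    // just discards the parsed arguments.
    let _ = args;
    TokenStream::new()
}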

Creating an Artificial SQL Query

The macro calls get_test_expression(), which creates an artificial SQL context and plans a generated query² using DataFusion. We then pull the DataFusion expression out of the plan for further processing.
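
To make that concrete, here is a minimal sketch of planning the generated query against an empty in-memory table using DataFusion's public API. The schema mirrors the two columns this test touches; extracting the projection expression this way is our illustration and may differ from what get_test_expression() actually does:

use std::sync::Arc;

use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::datasource::MemTable;
use datafusion::logical_expr::LogicalPlan;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("non_nullable_i32", DataType::Int32, false),
        Field::new("nullable_i64", DataType::Int64, true),
    ]));
    // An empty table is enough: we only need the logical plan.
    let table = MemTable::try_new(schema, vec![vec![]])?;

    let ctx = SessionContext::new();
    ctx.register_table("test_source", Arc::new(table))?;

    let df = ctx
        .sql("SELECT non_nullable_i32 % nullable_i64 FROM test_source")
        .await?;

    // The planned projection carries the DataFusion Expr that gets
    // handed to the code generator.
    if let LogicalPlan::Projection(projection) = df.logical_plan() {
        println!("{:?}", projection.expr[0]);
    }
    Ok(())
}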

Generating the Rust Code

The extracted expression³ is converted to an Arroyo Expression, from which we generate Rust code:

arg.nullable_i64
   .clone()
   .map(|right| arg.non_nullable_i32.clone() as i64 % right)
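
Snippets like this can be produced with the quote crate. Here is a simplified sketch; nullable_binop is a hypothetical helper (it hardcodes the i64 cast from this example), not Arroyo's actual expression lowering:

use proc_macro2::TokenStream;
use quote::quote;

// Emit code for a binary op whose right-hand side is an Option: the
// nullable side drives Option::map, and the non-nullable side is cast
// and used inside the closure.
fn nullable_binop(left: TokenStream, op: TokenStream, right: TokenStream) -> TokenStream {
    quote! {
        #right.clone().map(|right| (#left.clone() as i64) #op right)
    }
}

fn main() {
    let tokens = nullable_binop(
        quote!(arg.non_nullable_i32),
        quote!(%),
        quote!(arg.nullable_i64),
    );
    // Prints the same expression as the generated code above.
    println!("{}", tokens);
}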

Constructing the Test Case

Finally, the generated code is used to build a test function, using the input_value as the initial context for the expression.

#[test]
fn generated_test_one_modulo_nullable_two() {
  assert_eq!({
    let arg = arroyo_sql::TestStruct {
      non_nullable_i32: 1,
      nullable_i64: Some(2),
      ..Default::default()
    };
    arg.nullable_i64
      .clone()
      .map(|right| arg.non_nullable_i32.clone() as i64 % right)
  }, Some(1i64));
}
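
That final assembly step can again be sketched with quote; build_test_case and its parameter names are illustrative, not Arroyo's actual code:

use proc_macro2::TokenStream;
use quote::{format_ident, quote};

fn build_test_case(
    test_name: &str,
    input_value: TokenStream,
    generated_expr: TokenStream,
    expected_result: TokenStream,
) -> TokenStream {
    // e.g. "one_modulo_nullable_two" -> generated_test_one_modulo_nullable_two
    let test_ident = format_ident!("generated_test_{}", test_name);
    quote! {
        #[test]
        fn #test_ident() {
            assert_eq!({
                // Bind the input row, then evaluate the generated
                // expression against it.
                let arg = #input_value;
                #generated_expr
            }, #expected_result);
        }
    }
}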

Conclusion

Despite the intricacy of this approach, it has several practical advantages. First, writing these tests is very easy, which increases the odds that good tests actually get written. Second, because the tests are agnostic to the underlying implementation, we can make significant changes to the engine without extensive test rewrites. Lastly, because each test exercises so much of the query path, it can catch regressions from unexpected places. For instance, this test caught a regression when I upgraded us to DataFusion 23.0.

Footnotes

  1. Calcite is in turn used by a large number of Java-based distributed compute engines, including Flink, Trino, Druid, Hive, and Kylin.
  2. SELECT {calculation_string} FROM test_source
  3. CAST(test_source.non_nullable_i32 AS Int64) % test_source.nullable_i64