Skip to main content

Actyx Query Language (AQL)

The Actyx Query Language (AQL) allows you to select precisely which information you want to extract from the query or subscribe endpoints of the Events API. In its most basic form it selects events based on their tags. This section gives you all the gory details of how that works and what else you can express.

This chapter is normative

If you find Actyx behavior differing from the description below, then you have found a bug. Please tell us about it!

If you discover some behavior that is not documented below, then you have wandered into unspecified territory — the observed behavior may change in future releases. In these cases we very much welcome your questions, comments, and suggestions in the forum!

General structure

AQL is not whitespace sensitive, you can add any amount of whitespace or comments in almost all places (the exceptions are documented below). Comments begin with -- and extend to the end of the line. The overall structure of a query is the following:

PRAGMA features := some features
-- the above is optional
FROM 'mytag1' & 'mytag2' -- the only mandatory part
… -- optional list of transformations
END -- optional

Language features

The language is growing and will continue to do so. Our intention is to keep it backward compatible for as long as we can so that you can keep running your apps also on future Actyx versions of the 2.x series. Since we need to try out extensions and get feedback on them before committing to them, new features start out in alpha or beta status and may then graduate to fully released at a later time.

Alpha features are not documented, are expected to change, and are only revealed to ask for specific feedback on the functionality. If you have questions or ideas, you are always welcome to let us know, perhaps it is easy to add or already on its way.

Beta features are documented and are expected to be more stable, although we reserve the right to change them at any time even during the Actyx 2.x series. You are very welcome to try them out and discuss about them, for example in the forum.

Enabling features

If your query uses a feature that is not yet released, you’ll have to enable this feature using the PRAGMA features := <words> syntax. Each feature’s name is a single word and you can put any number of words after the equals sign, separated by whitespace.

The names of required features are shown below in the form [featureName].

PRAGMA features := interpolation subQuery aggregate fromArray multiEmission
Pre-Pragma syntax

In Actyx versions before 2.13.0 features are enabled using a different syntax:

FEATURES(some features)
FROM …

Event queries

The FROM <tag_expr> part of an AQL query selects the events from which the results shall be computed. Hereby, <tag_expr> is a boolean expression composed from the following basic atoms:

  • 'mytag' or "mytag" matches all events that carry this tag. Tags are arbitrary non-empty Unicode strings. Quoting is only needed for the used delimiter: if your tag is enclosed in single quotes, single quotes within the tag need to be repeated, e.g. 'o''clock'. The analogue goes for double quotes. Tags containing IDs that where created using an Actyx SDK can be queried like mytag:myid.

  • isLocal matches all events that were emitted by the local Actyx node.

  • allEvents matches all events.

  • TIME > <time> matches all events whose timestamp is greater than the given one. Timestamps can be given in UTC using the suffix +00:00 or Z, or they can use a numeric time zone offset (like +02:00 for CEST). Valid formats are 2021-07-20T09:53:07.462Z, or with microseconds, just seconds, or just the date. Omitted components are treated as zero, so 2021-07-20+04:30 marks the beginning of that day in Afghanistan.

    The comparison can analogously be done with TIME >= <time> (or ), TIME < <time>, or TIME <= <time> (or ).

  • KEY > <event ID> matches all events whose event ID is greater then the provided one. An event ID consists of <lamport timestamp>/<node ID>-<stream nr> and has the same sort order as the corresponding event. You may specify only the lamport timestamp, in which case node ID and stream number are treated as zero.

    The comparison can analogously be done with KEY >= <event ID> (or ), KEY < <event ID>, or KEY <= <event ID> (or ).

  • appId(<app ID>) matches all events from the given app ID. No whitespace is allowed between the parentheses. App IDs are valid DNS names, i.e. name components consist of lowercase letters, digits, or dashes and are separated by a single dot. A valid example is appId(com.example.my-app).

    As a special case appId(me) refers to the app ID of the current app, i.e. the app ID in the manifest with which the Actyx connection was made.

Larger expressions are constructed using the and and or combinators:

  • <tag_expr> & <tag_expr> matches all events that match both the left and the right condition
  • <tag_expr> | <tag_expr> matches all events that match at least one of the given conditions

As usual, & takes precedence over |. You can use parentheses to override this: 'a' & ('b' | 'c') is the same as 'a' & 'b' | 'a' & 'c'.

Controlling the order of events

When using the query endpoint (as opposed to subscribe and subscribeMonotonic), the events matching the tag expression are delivered in ascending event key order unless indicated otherwise. One way to indicate this is the order property when making the API call, another can be implied by the AGGREGATE keyword (see below). All these can be overridden by appending an ORDER clause directly after the tag expression:

FROM 'myTag' ORDER DESC  -- the other direction is ASC

[interpolation] String interpolation

The tags given in a tag expression can also be computed using string interpolation; this is done most commonly in sub-queries. Interpolation is enabled by enclosing the tag in backticks instead of single or double quotes. Within the string use {<expr>} to insert the string representation of the result of expr at this location.

FROM `machine:{_.machineId}` …

If you want to insert a literal backtick or opening brace, you’ll need to emit it via an expression:

`x{'`'}y` -- yields the string 'x`y'

[fromArray] Iterating over arrays

Besides streaming events from the event store, a (sub)query can also start from inputs given as an array:

FROM ['a', 1, ...someArray] …

This will feed the items in their given order to the following transformation stages. While events have an event key, logical & physical timestamps, and tags the values 'a' and 1 in the above example have no such metadata. If someArray is a sub-query expression then the sub-query’s results are passed to the following transformation stages one by one, i.e. without first merging their metadata.

Data transformations

Following the FROM <tag_expr> clause you may optionally specify a sequence of transformation steps. The first step will receive the events selected by the event query as input — bound to the variable _ — and compute outputs from them. Each of the outputs is fed into the following step — again bound to the variable _ — where the same principle applies, etc. This allows you to write down the transformation from events into query results in an incremental fashion, doing one step at a time.

note

The processing described above is exactly how Actyx evaluates your query. This means that the order in which you write the steps is significant and affects the performance of your query.

Discarding inputs with FILTER

Whenever an event query is not specific enough, e.g. because not all relevant properties are available as tags, you can filter out undesirable results within Actyx. This makes processing more efficient since it avoids serialization and deserialization plus the filtering in your application code. The syntax for a filter step is

FILTER <simple_expr>

If the provided expression (see below) evaluates to the boolean value TRUE, then the input is passed along as output. Otherwise the input is discarded.

Transforming a single input with SELECT

The events originating in your FROM <tag_expr> clause may contain more information than you need or they may not have the format you desire, e.g. different property names. In these cases, you can use a transformation step that takes a single input and computes at most one output from it using the simple expression language shown further below. The syntax for a transformation step is

SELECT <simple_expr>

Whatever the given expression (see below) evaluates to will be passed on as an output. However, it is possible that no value is computed, for example when accessing non-existent properties in the input. In such cases, no output is generated.

[multiEmission] Emitting multiple results

If your input contains the information for multiple desired query results (e.g. you want all usernames involved and the event contains both an author and a reviewer) then you can turn one input into multiple outputs.

SELECT <expr1>, <expr2>, …

This can be combined with the [spread] feature to emit the elements of an array as individual outputs:

SELECT ...<array>

[aggregate] Aggregating inputs

Perhaps the most powerful feature of AQL is the ability to condense a large number of inputs from many events into one output object before transferring this result to your application. The syntax is similar to SELECT, with some special operators.

AGGREGATE <aggregate_expr>

The given expression can only refer to variables (including the current input _) inside one of the the following operators:

  • LAST(<expr>) yields the expression result with the latest event key
  • FIRST(<expr>) yields the expression result with the earliest event key
  • MIN(<expr>) yields the minimum value computed by the expression (works only for boolean and numbers)
  • MAX(<expr>) yields the maximum value computed by the expression (works only for boolean and numbers)
  • SUM(<expr>) yields the sum of all expression results (works only for boolean and numbers)
  • PRODUCT(<expr>) yields the product of all expression results (works only for boolean and numbers)

The results yielded by these operators are then assembled into the final result using the rules for simple expressions. Usage of only LAST operators indicates that descending event key order is desired; in this case processing will stop immediately when the first value is found. Usage of only FIRST works analogously.

tip

You can use AGGREGATE LAST(...) to efficiently retrieve the latest event about something.

Discarding excess inputs

If you need only the three first events for some query you should use the LIMIT clause:

LIMIT <number>

The given positive number indicates the number of events that may at most pass through this stage, stopping the input event stream immediately upon reaching this number. It is significant where you place this stage: if you place LIMIT 3 before a FILTER, then at most three inputs are presented to the filter, while placing it after the FILTER only stops once three inputs have passed the filter.

Variable bindings

Like in your favorite programming language you can bind a computed value to a name so that you can later refer to it, e.g. to reuse it in multiple places or to build up your final result in a nicely structured fashion.

LET <ident> := <expr>

An identifier starts with a lowercase letter followed by alphanumeric or underscore characters (as defined by Unicode). When referring to a variable by using its identifier, Actyx searches the preceding query stages going backwards from the point of reference and uses the first definition it can find. This means that it is possible to “shadow” a variable by defining it again later — note that this does not change the first binding in any way.

The AQL data model

Before discussing the expression language we need to lay the groundwork: this section describes the data types AQL works with. AQL is dynamically typed, meaning that each computed value does have exactly one type, but these types are not known before the computation is run. This type is not considered when reading the query, it is only checked during the evaluation of expressions. One noteworthy difference to Javascript is that AQL does not know subtyping, and it also doesn’t coerce values from one type to the other implicitly.

  • NULL is the single value of the unit type
  • TRUE and FALSE are both values of the boolean type
  • 42 and -12.34 are examples of the number type, which currently contains either a 64bit unsigned integer or a double-precision finite floating point number
  • 'hello', "world", or `x{1+2}y` are examples of the string type (with quoting rules and interpolation like for tags)
  • TIME(2021-08-13T07:45:03.418-06:00) is an example of the timestamp type
  • [1, 2, 3, 'a', "b", `c`] is an example of the array type
  • { one:1 two:2 } is an example of the object type

Simple expressions

Expressions are built up from literal values using a notation similar to the C family of languages. Whenever the description says that an error is generated, the evaluation of the whole expression stops without a result and you’ll get a diagnostic message in your query response.

  • ! <expr> or ¬ <expr> negates a boolean value.

  • <expr> & <expr>, <expr> | <expr>, and <expr> ~ <expr> compute the logical and, or, and xor of boolean values, respectively (you can also use , , and ).

  • Comparison operators >, >=, <, <=, =, != (or , , ) work between operands of the same type, i.e. comparing a number to a string yields an error.

  • Arithmetic operators +, -, *, / (with alternatives ×, ÷), % (mod), ^ (exponentiation) work between numbers, otherwise yield an error. Natural numbers (64bit integers) are converted to floating point when combined with floating point numbers. All operations yield an error upon overflow or underflow.

  • <expr1> ?? <expr2> evaluates to the result of expr1 if that is not an error, otherwise it evaluates to expr2.

  • Arrays are constructed with [<expr>, ...]. The contents of another array can be copied into a fresh array by using [spread] syntax [<expr1>, ...<expr2>, <expr3>], in which case expr2 must evaluate to an array, otherwise an error is raised.

  • Objects are constructed with {<key>: <expr>, ...}, where the comma separators are optional; each <key> can be either a bare word (in which case it must start with a lowercase letter, followed by letters, numbers, or underscores) or a pair of brackets containing either a natural number, a string, or an expression.

    Valid examples are {asdf: 42}, {[12]: "hello }, {['PascalCase']: TRUE}, {[1 + 1]: 2}.

  • Values are suffixed by an index to dig into arrays or objects, yielding an error if the value is of the wrong type or lacks the desired property; indexes follow the same rules as object keys, with the addition that a bare word is preceded by a dot, like in many object-oriented languages.

    Valid examples are x[0], y.my_property_42, z[2].prop['isDone']. If you want to index into computed sub-expressions, you need to enclose the expression in parentheses, e.g. (['a','b'])[0] (yields 'a').

  • CASE <expr> => <expr> CASE ... ENDCASE allows conditional evaluation; all case clauses are tried one by one until an <expr> yields TRUE, in which case the corresponding second <expr> is used to compute the result. An error is yielded if no case matches.

    This means that FILTER <expr> has the same behavior as SELECT CASE <expr> => _ ENDCASE.

Precedence of the binary operators in increasing order is: or, xor, and, equality, ordering, additive, multiplicative, exponential. Indexing binds more strongly than negation.

[subQuery] Sub-Queries

Wherever you can write a simple value (as per the data model) you can also write a complete query FROM … END. The result of such a sub-query is always an array that contains one item per result returned from the query. If you use a sub-query in array building context (i.e. when constructing an array or for multi-emission) then you can use [spread] syntax to pass on the individual results instead of a single array:

SELECT 42, ...FROM … END, 'the end'

Evaluation context

Each expression is evaluated as part of a processing step when applying this step to one particular input value. This value is available within the expression under the name _.

FROM 'myTag'
LET time := TIME(_)
FILTER _.type = "started"
SELECT { user: _.user_id time: time }

In this example the filter stage checks each incoming event for a type property with string value “started”. All matching events are passed on to the transformation step that extracts the value of the “user_id” property from the current event. The result is packed into the final object together with the event’s timestamp, which has been bound to the variable time in the first processing step.

Query Errors

Error causes

Errors can happen during the execution of an AQL query, common errors may be caused by:

Undefined property of an event payload is accessed

This error commonly occurs because of an incorrect assumption on the shape of an event payload.

FILTER _.count > 0 --- when the 'count' property does not exist in the event payload, this line will err
FILTER _['count'] > 0 --- is equivalent to the above line

As a safeguard for this error, a preceding filter can be added to determine if a property is defined. For example:

FILTER IsDefined(_.count)
FILTER _.count > 0
note

FILTER IsDefined(_.count) & _.count > 0 wouldn't work because boolean evaluations are not short circuited.

Unbound Variable is Accessed

This error is related to the variable binding feature. A common cause is a typo when referring to a previously declared variable binding.

FROM 'some-tag'
LET available := FROM ... END -- a variable contains a `subQuery`
SELECT avalable -- a typo, causing access to unbound variable `avalable`

Binary Operation Data Type Mismatch

This error arises when a comparative binary operation is applied to two incomparable values.

For example, the AQL below will raise an error because a number (1) is incomparable to a string ("1").

FILTER 1 > "1"

Aggregated Data Type Mismatch

Some aggregation expressions have a type constraint. For example, MIN, MAX, SUM, and PRODUCT only works on boolean and numbers. Assigning a data outside the type constraint will yield an error.

AGGREGATE SUM(_.some_string) -- _.some_string is not boolean or numbers; This line will yield error.

An error also arise when an aggregation receives a sequence of incomparable values.

For example, supplying a boolean and a number into a single MIN aggregation causes an error because a MIN aggregation cannot compare a boolean with a number.

AGGREGATE SUM(_.some_number_or_bool) -- May error if in one event the property contains a number and in another a boolean

However, type mismatch error is specific to the aggregation expression. FIRST AND LAST, for example, have no constraints and can compare any sequence of types passed to it.

Aggregated Data is Not Found

An aggregation yields an error if there is no data supplied. For example, the AQL below has a filter that is always false:

FILTER some_variable_that_is_always_false
AGGREGATE FIRST(_)

The filter does not let any event pass. Because of it, the aggregate is not supplied with any value and thus fails.

Error in a Sub-query

There is a minor different in how an error behaves in a sub-query compared to one happening in the top-level query, namely:

  1. An error halts the entire sub-query; in contrast, an error in a top-level query is yielded as an entry.
  2. An error in a sub-query is propagated upwards to its parent query.

For example, the sub-query below is bound to a variable containing an aggregate. If the aggregate fails, the parent query will yield the sub-query's error.

LET subquery := FROM ... AGGREGATE LAST(_) ... END 
note

Sub-query is a beta feature; the behaviors above MAY be subject to change.

Catching an Error in a Sub-query

Sometimes an error in sub-query — such as possible aggregation missing values — is expected. In that case, the error can be caught using the ?? operator and the provided fallback value is returned instead.

For example, take a look into the the AQL below.

FROM 'station-opened'
LET last_departure := FROM `departure:{_.station_id}` AGGREGATE LAST(_) END ?? NULL
FILTER last_departure != NULL

The application executes the AQL to list "ONLY station-opened events that has at least one corresponding last_departure event". By design, not every station-opened event has a corresponding last_departure event. Consequently, the AGGREGATE LAST(_) expression may fail by design. The ?? NULL prevents upward propagation of errors from the sub-query and, in its place, assigns a NULL as a fallback value to the last_departure variable; this covers the path where a station-opened event has no corresponding last_departure event.

PRAGMAS

Like many other languages, AQL supports a generic pragma mechanism. Pragmas can only be given at the beginning of the query and consist of a name and a value.

PRAGMA x := the value -- not a comment
PRAGMA y
the
multiline
value
ENDPRAGMA

In this example the value of pragma x is the value -- not a comment, i.e. the value extends up to the line’s end. The value of pragma y consists of three lines, separated by the two line separators given literally in the query.

Below you can find a list of pragmas currently available.

PRAGMA features

The value of this pragma is split on whitespace into words to obtain the list of enabled features for this query.

PRAGMA events

When testing AQL queries you will often need to control the precise contents of the Actyx event store so that you get reliable results. This can be done by populating a fresh topic by hand or using ax events restore, but it can also be done inside a query. The advantage of the latter is that only this current query sees the synthetic event store, other queries using the same Actyx node at the same time are not affected. The syntax of the value is newline-delimited JSON:

PRAGMA events
{"timestamp":<number>,"time":<string>,"tags":[<string>,...],"appId":<string>,"payload":<value>}
...
ENDPRAGMA

All properties apart from payload are optional, timestamp has higher priority than time (which has the same syntax as TIME() values shown above). Note how this lets you test code that depends on specific timestamps.

Optimizing Query Performance

AQL execution, just like any other computation system, is not magical. A command is bound to perform better than the other. However, figuring out a command's performance in a domain-specific language such as AQL is not as obvious as that of a general-purpose language such as C++, JavaScript, Rust, etc.

This guide provides insight into how AQL queries affect performance without delving too much into the internal details.

Prioritize Filter by Tags

There are two ways to filter events in Actyx:

  • FROM: filters by tags
  • FILTER: filters by content

As a rule of thumb, the filter should be written in a way that as many events as possible are filtered out with tags before those are filtered out with contents for the best performance. The reason is that tag expressions are evaluated solely on the event stream indexes while filtering based on content requires the actual event data to be read, decrypted, decompressed, and decoded.

Examine the example below: querying purchase order events emitted by a certain user in a certain price range.

FROM 'purchase-order'
FILTER _.issuedBy = 'my_id' & _.price >= 1000 & _.price < 2000

Imagine that at the time of the query, FROM 'purchase-order' yields 10000 events, only 300 of which meet the issuedBy criterion. The fact that the issuedBy property is a string to be compared using the = operator makes this an opportunity for optimization. To do it, publish the event with an accompanying tag purchase-order:issued-by:[some_id]; this allows querying using the newly defined tag instead of using the filter directive.

FROM 'purchase-order' & 'purchase-order:issued-by:my_id'
FILTER _.price >= 1000 & _.price < 2000

The new AQL above yields the same result, but the decoding before the FILTER line only occurs to the 300 events instead of the original 10000.

The Effect of the Events' Expected Lifetime to the Migration Strategy

Improvements of this kind require changes in the published tags and the AQL. Changing the AQL may prevent the new version of the application code to read the old events that are not affected by the new accompanying tag.

If the events in question are short-lived (e.g. 1 day at max), new tag can be added first. Later, after all, old events are expected to be discarded, the new AQL can be rolled out.

If the events in question are intended to be persisted long-term, the application needs to be able to query both the old events and the new events. Therefore, retaining the old code is necessary. Appending a version number in the tag (e.g. an event is simultaneously tagged with 'purchase-order' and 'purchase-order:v1') may be necessary, therefore the old query does not accidentally query the new events, and vice versa.

Discard The Most Possible Events Before Sub-Queries

A Sub-Query is a nested AQL query that is executed for each event found by the surrounding AQL query.

FROM 'purchase-order' & 'purchase-order:issued-by:some_id'
LET revocations := FROM `purchase-order:{_.id}` FILTER _.type = 'revoked' END
LET revoked := IsDefined(revocations[0])
FILTER _.price >= 1000 & _.price < 2000 & !revoked

The line "LET revocations ..." invokes the sub-query as many as times as the number of events found in the FROM line. Optimization can be done by splitting the FILTER into two parts.

FROM 'purchase-order' & 'purchase-order:issued-by:some_id'
FILTER _.price >= 1000 & _.price < 2000
LET revocations := FROM `purchase-order:{_.id}` FILTER _.type = 'revoked' END
LET revoked := IsDefined(revocations[0])
FILTER !revoked

With the above AQL, the LET revocation ... line will see a reduced number of events, and consequently perform a reduced number of sub-queries, too.