Actyx Query Language (AQL)
The Actyx Query Language (AQL) allows you to select precisely which information you want to extract from the query or subscribe endpoints of the Events API.
In its most basic form it selects events based on their tags.
This section gives you all the gory details of how that works and what else you can express.
If you find Actyx behavior differing from the description below, then you have found a bug. Please tell us about it!
If you discover some behavior that is not documented below, then you have wandered into unspecified territory — the observed behavior may change in future releases. In these cases we very much welcome your questions, comments, and suggestions in the forum!
General structure
AQL is not whitespace sensitive: you can add any amount of whitespace or comments in almost all places (the exceptions are documented below).
Comments begin with -- and extend to the end of the line.
The overall structure of a query is the following:
PRAGMA features := some features
-- the above is optional
FROM 'mytag1' & 'mytag2' -- the only mandatory part
… -- optional list of transformations
END -- optional
Language features
The language is growing and will continue to do so. Our intention is to keep it backward compatible for as long as we can so that you can keep running your apps on future Actyx versions of the 2.x series. Since we need to try out extensions and get feedback on them before committing to them, new features start out in alpha or beta status and may graduate to fully released at a later time.
Alpha features are not documented, are expected to change, and are only revealed to ask for specific feedback on the functionality. If you have questions or ideas, you are always welcome to let us know; perhaps your idea is easy to add or already on its way.
Beta features are documented and are expected to be more stable, although we reserve the right to change them at any time, even during the Actyx 2.x series. You are very welcome to try them out and discuss them, for example in the forum.
If your query uses a feature that is not yet released, you’ll have to enable this feature using the PRAGMA features := <words> syntax.
Each feature’s name is a single word and you can put any number of words after the equals sign, separated by whitespace.
The names of required features are shown below in the form [featureName].
PRAGMA features := interpolation subQuery aggregate fromArray multiEmission
In Actyx versions before 2.13.0 features are enabled using a different syntax:
FEATURES(some features)
FROM …
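As a rough sketch of how the current PRAGMA syntax fits into a complete query, the following enables the aggregate feature and then uses it. The tags machine and machine:4711 are made up for illustration.

```aql
PRAGMA features := aggregate
-- hypothetical tags; replace them with the tags your app emits
FROM 'machine' & 'machine:4711'
AGGREGATE LAST(_)
END
```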
Event queries
The FROM <tag_expr> part of an AQL query selects the events from which the results shall be computed.
Here, <tag_expr> is a boolean expression composed from the following basic atoms:
- 'mytag' or "mytag" matches all events that carry this tag. Tags are arbitrary non-empty Unicode strings. Quoting is only needed for the delimiter in use: if your tag is enclosed in single quotes, single quotes within the tag need to be repeated, e.g. 'o''clock'. The same goes for double quotes. Tags containing IDs that were created using an Actyx SDK can be queried like mytag:myid.
- isLocal matches all events that were emitted by the local Actyx node.
- allEvents matches all events.
- TIME > <time> matches all events whose timestamp is greater than the given one. Timestamps can be given in UTC using the suffix +00:00 or Z, or they can use a numeric time zone offset (like +02:00 for CEST). Valid formats are 2021-07-20T09:53:07.462Z, or with microseconds, just seconds, or just the date. Omitted components are treated as zero, so 2021-07-20+04:30 marks the beginning of that day in Afghanistan. The comparison can analogously be done with TIME >= <time> (or ≥), TIME < <time>, or TIME <= <time> (or ≤).
- KEY > <event ID> matches all events whose event ID is greater than the provided one. An event ID consists of <lamport timestamp>/<node ID>-<stream nr> and has the same sort order as the corresponding event. You may specify only the lamport timestamp, in which case node ID and stream number are treated as zero. The comparison can analogously be done with KEY >= <event ID> (or ≥), KEY < <event ID>, or KEY <= <event ID> (or ≤).
- appId(<app ID>) matches all events from the given app ID. No whitespace is allowed between the parentheses. App IDs are valid DNS names, i.e. name components consist of lowercase letters, digits, or dashes and are separated by a single dot. A valid example is appId(com.example.my-app). As a special case, appId(me) refers to the app ID of the current app, i.e. the app ID in the manifest with which the Actyx connection was made.
Larger expressions are constructed using the and and or combinators:
- <tag_expr> & <tag_expr> matches all events that match both the left and the right condition
- <tag_expr> | <tag_expr> matches all events that match at least one of the given conditions
As usual, & takes precedence over |. You can use parentheses to override this: 'a' & ('b' | 'c') is the same as 'a' & 'b' | 'a' & 'c'.
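To illustrate how the atoms and combinators compose, here is a sketch of a larger tag expression. The tags machine and robot are hypothetical; the timestamp and app ID are the example values used above.

```aql
-- locally emitted events carrying either tag, from one app, at or after the given time
FROM ('machine' | 'robot') & isLocal & TIME >= 2021-07-20T09:53:07.462Z & appId(com.example.my-app)
```

Without the parentheses, & would bind more tightly than |, so only the 'robot' alternative would be restricted by the remaining conditions.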
Controlling the order of events
When using the query endpoint (as opposed to subscribe and subscribeMonotonic), the events matching the tag expression are delivered in ascending event key order unless indicated otherwise.
One way to indicate this is the order property when making the API call; another can be implied by the AGGREGATE keyword (see below).
All these can be overridden by appending an ORDER clause directly after the tag expression:
FROM 'myTag' ORDER DESC -- the other direction is ASC
…
[interpolation] String interpolation
The tags given in a tag expression can also be computed using string interpolation; this is done most commonly in sub-queries.
Interpolation is enabled by enclosing the tag in backticks instead of single or double quotes.
Within the string use {<expr>} to insert the string representation of the result of expr at this location.
FROM `machine:{_.machineId}` …
If you want to insert a literal backtick or opening brace, you’ll need to emit it via an expression:
`x{'`'}y` -- yields the string 'x`y'
[fromArray] Iterating over arrays
Besides streaming events from the event store, a (sub)query can also start from inputs given as an array:
FROM ['a', 1, ...someArray] …
This will feed the items in their given order to the following transformation stages.
While events have an event key, logical & physical timestamps, and tags, the values 'a' and 1 in the above example have no such metadata.
If someArray is a sub-query expression then the sub-query’s results are passed to the following transformation stages one by one, i.e. without first merging their metadata.
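A minimal sketch of an array-based query, assuming the fromArray feature is enabled as described under "Language features". It feeds three literal numbers through a single transformation stage that doubles each of them.

```aql
PRAGMA features := fromArray
-- the inputs are plain values without event metadata
FROM [1, 2, 3]
SELECT _ * 2
```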
Data transformations
Following the FROM <tag_expr> clause you may optionally specify a sequence of transformation steps.
The first step receives the events selected by the event query as input (bound to the variable _) and computes outputs from them.
Each of the outputs is fed into the following step (again bound to the variable _), where the same principle applies, and so on.
This allows you to write down the transformation from events into query results in an incremental fashion, doing one step at a time.
The processing described above is exactly how Actyx evaluates your query. This means that the order in which you write the steps is significant and affects the performance of your query.
Discarding inputs with FILTER
Whenever an event query is not specific enough, e.g. because not all relevant properties are available as tags, you can filter out undesirable results within Actyx. This makes processing more efficient since it avoids serialization and deserialization plus the filtering in your application code. The syntax for a filter step is
FILTER <simple_expr>
If the provided expression (see below) evaluates to the boolean value TRUE, then the input is passed along as output.
Otherwise the input is discarded.
Transforming a single input with SELECT
The events originating in your FROM <tag_expr> clause may contain more information than you need or they may not have the format you desire, e.g. different property names.
In these cases, you can use a transformation step that takes a single input and computes at most one output from it using the simple expression language shown further below.
The syntax for a transformation step is
SELECT <simple_expr>
Whatever the given expression (see below) evaluates to will be passed on as an output. However, it is possible that no value is computed, for example when accessing non-existent properties in the input. In such cases, no output is generated.
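The following sketch combines a FILTER step with a SELECT step that renames properties. The tag purchase-order and the properties price and issuedBy are assumptions about the event payload, chosen only for illustration.

```aql
FROM 'purchase-order'
-- keep only events whose price is at least 1000
FILTER _.price >= 1000
-- emit a smaller object with different property names
SELECT { buyer: _.issuedBy, amount: _.price }
```

Since FILTER comes first, the SELECT expression is only evaluated for inputs that passed the filter.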
[multiEmission] Emitting multiple results
If your input contains the information for multiple desired query results (e.g. you want all usernames involved and the event contains both an author and a reviewer) then you can turn one input into multiple outputs.
SELECT <expr1>, <expr2>, …
This can be combined with the [spread] feature to emit the elements of an array as individual outputs:
SELECT ...<array>
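As a sketch of the username case mentioned above, assuming events tagged review whose payload has author and reviewer properties (both names are hypothetical):

```aql
PRAGMA features := multiEmission
FROM 'review'
-- each input event yields up to two outputs, one per expression
SELECT _.author, _.reviewer
```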
[aggregate] Aggregating inputs
Perhaps the most powerful feature of AQL is the ability to condense a large number of inputs from many events into one output object before transferring this result to your application.
The syntax is similar to SELECT, with some special operators.
AGGREGATE <aggregate_expr>
The given expression can only refer to variables (including the current input _) inside one of the following operators:
- LAST(<expr>) yields the expression result with the latest event key
- FIRST(<expr>) yields the expression result with the earliest event key
- MIN(<expr>) yields the minimum value computed by the expression (works only for booleans and numbers)
- MAX(<expr>) yields the maximum value computed by the expression (works only for booleans and numbers)
- SUM(<expr>) yields the sum of all expression results (works only for booleans and numbers)
- PRODUCT(<expr>) yields the product of all expression results (works only for booleans and numbers)
The results yielded by these operators are then assembled into the final result using the rules for simple expressions.
Usage of only LAST operators indicates that descending event key order is desired; in this case processing will stop immediately when the first value is found.
Usage of only FIRST works analogously.
You can use AGGREGATE LAST(...) to efficiently retrieve the latest event about something.
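A sketch of an aggregation that assembles several operator results into one object. The tag and the price and issuedBy properties are assumed for illustration and are expected to hold numbers and strings, respectively.

```aql
PRAGMA features := aggregate
FROM 'purchase-order'
AGGREGATE {
  total: SUM(_.price),
  highest: MAX(_.price),
  latestBuyer: LAST(_.issuedBy)
}
```

Because this expression does not consist solely of LAST operators, the early stop described above would presumably not apply and all matching events need to be processed.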
Discarding excess inputs
If you need only the first three events for some query, you should use the LIMIT clause:
LIMIT <number>
The given positive number indicates the number of events that may at most pass through this stage, stopping the input event stream immediately upon reaching this number.
It is significant where you place this stage: if you place LIMIT 3 before a FILTER, then at most three inputs are presented to the filter, while placing it after the FILTER only stops once three inputs have passed the filter.
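The two placements can be sketched as follows, using a hypothetical purchase-order tag and price property. In the first query at most three events reach the filter, so fewer than three results are possible:

```aql
FROM 'purchase-order'
LIMIT 3
FILTER _.price >= 1000
```

In the second query the input stream is only stopped once three events have passed the filter:

```aql
FROM 'purchase-order'
FILTER _.price >= 1000
LIMIT 3
```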
Variable bindings
Like in your favorite programming language you can bind a computed value to a name so that you can later refer to it, e.g. to reuse it in multiple places or to build up your final result in a nicely structured fashion.
LET <ident> := <expr>
An identifier starts with a lowercase letter followed by alphanumeric or underscore characters (as defined by Unicode). When referring to a variable by using its identifier, Actyx searches the preceding query stages going backwards from the point of reference and uses the first definition it can find. This means that it is possible to “shadow” a variable by defining it again later — note that this does not change the first binding in any way.
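A small sketch of variable bindings that build up a result step by step. The tag, the price property, and the factor 1.19 are invented for illustration only.

```aql
FROM 'purchase-order'
-- bind the raw value from the current input
LET net := _.price
-- refer to the earlier binding to derive a second value
LET gross := net * 1.19
SELECT { net: net, gross: gross }
```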
The AQL data model
Before discussing the expression language we need to lay the groundwork: this section describes the data types AQL works with. AQL is dynamically typed, meaning that each computed value has exactly one type, but these types are not known before the computation is run. The type is not considered when reading the query; it is only checked during the evaluation of expressions. One noteworthy difference to JavaScript is that AQL does not know subtyping, and it also doesn’t coerce values from one type to the other implicitly.
- NULL is the single value of the unit type
- TRUE and FALSE are both values of the boolean type
- 42 and -12.34 are examples of the number type, which currently contains either a 64-bit unsigned integer or a double-precision finite floating point number
- 'hello', "world", or `x{1+2}y` are examples of the string type (with quoting rules and interpolation like for tags)
- TIME(2021-08-13T07:45:03.418-06:00) is an example of the timestamp type
- [1, 2, 3, 'a', "b", `c`] is an example of the array type
- { one:1 two:2 } is an example of the object type
Simple expressions
Expressions are built up from literal values using a notation similar to the C family of languages. Whenever the description says that an error is generated, the evaluation of the whole expression stops without a result and you’ll get a diagnostic message in your query response.
- ! <expr> or ¬ <expr> negates a boolean value.
- <expr> & <expr>, <expr> | <expr>, and <expr> ~ <expr> compute the logical and, or, and xor of boolean values, respectively (you can also use ∧, ∨, and ⊻).
- Comparison operators >, >=, <, <=, =, != (or ≥, ≤, ≠) work between operands of the same type, i.e. comparing a number to a string yields an error.
- Arithmetic operators +, -, *, / (with alternatives ×, ÷), % (mod), ^ (exponentiation) work between numbers, otherwise yield an error. Natural numbers (64-bit integers) are converted to floating point when combined with floating point numbers. All operations yield an error upon overflow or underflow.
- <expr1> ?? <expr2> evaluates to the result of expr1 if that is not an error, otherwise it evaluates to expr2.
- Arrays are constructed with [<expr>, ...]. The contents of another array can be copied into a fresh array by using [spread] syntax [<expr1>, ...<expr2>, <expr3>], in which case expr2 must evaluate to an array, otherwise an error is raised.
- Objects are constructed with {<key>: <expr>, ...}, where the comma separators are optional; each <key> can be either a bare word (in which case it must start with a lowercase letter, followed by letters, numbers, or underscores) or a pair of brackets containing either a natural number, a string, or an expression. Valid examples are {asdf: 42}, {[12]: "hello"}, {['PascalCase']: TRUE}, {[1 + 1]: 2}.
- Values are suffixed by an index to dig into arrays or objects, yielding an error if the value is of the wrong type or lacks the desired property; indexes follow the same rules as object keys, with the addition that a bare word is preceded by a dot, like in many object-oriented languages. Valid examples are x[0], y.my_property_42, z[2].prop['isDone']. If you want to index into computed sub-expressions, you need to enclose the expression in parentheses, e.g. (['a','b'])[0] (yields 'a').
- CASE <expr> => <expr> CASE ... ENDCASE allows conditional evaluation; all case clauses are tried one by one until an <expr> yields TRUE, in which case the corresponding second <expr> is used to compute the result. An error is yielded if no case matches. This means that FILTER <expr> has the same behavior as SELECT CASE <expr> => _ ENDCASE.
Precedence of the binary operators in increasing order is: or, xor, and, equality, ordering, additive, multiplicative, exponential. Indexing binds more strongly than negation.
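The following sketch puts several of these expression forms together. The tag and the id, price, and note properties are assumptions made for illustration; the second CASE clause uses the literal TRUE so that it acts as a catch-all.

```aql
FROM 'purchase-order'
SELECT {
  id: _.id,
  -- clauses are tried in order; the literal TRUE always matches
  size: CASE _.price >= 1000 => 'large' CASE TRUE => 'small' ENDCASE,
  -- fall back to a default if the property is missing
  note: _.note ?? 'none'
}
```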
[subQuery] Sub-Queries
Wherever you can write a simple value (as per the data model) you can also write a complete query FROM … END.
The result of such a sub-query is always an array that contains one item per result returned from the query.
If you use a sub-query in array building context (i.e. when constructing an array or for multi-emission) then you can use [spread] syntax to pass on the individual results instead of a single array:
SELECT 42, ...FROM … END, 'the end'
Evaluation context
Each expression is evaluated as part of a processing step when applying this step to one particular input value.
This value is available within the expression under the name _.
FROM 'myTag'
LET time := TIME(_)
FILTER _.type = "started"
SELECT { user: _.user_id time: time }
In this example the filter stage checks each incoming event for a type property with string value “started”.
All matching events are passed on to the transformation step that extracts the value of the “user_id” property from the current event.
The result is packed into the final object together with the event’s timestamp, which has been bound to the variable time in the first processing step.
Query Errors
Error causes
Errors can happen during the execution of an AQL query. Common causes are:
- Undefined property of an event payload is accessed
- Unbound variable is accessed
- Binary Operation data type mismatch
- Aggregated data type mismatch
- Aggregated data is not found
Undefined property of an event payload is accessed
This error commonly occurs because of an incorrect assumption on the shape of an event payload.
FILTER _.count > 0 --- when the 'count' property does not exist in the event payload, this line will err
FILTER _['count'] > 0 --- is equivalent to the above line
As a safeguard for this error, a preceding filter can be added to determine if a property is defined. For example:
FILTER IsDefined(_.count)
FILTER _.count > 0
Combining both checks in a single step, as in FILTER IsDefined(_.count) & _.count > 0, wouldn't work because boolean evaluations are not short-circuited.
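A possible alternative, sketched here under the assumption that a missing count should be treated as zero, is to supply a fallback with the ?? operator described under "Simple expressions". The parentheses are used because the precedence of ?? relative to the comparison operators is not stated in this document.

```aql
-- if _.count yields an error, 0 is compared instead and the event is discarded
FILTER (_.count ?? 0) > 0
```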
Unbound Variable is Accessed
This error is related to the variable binding feature. A common cause is a typo when referring to a previously declared variable binding.
FROM 'some-tag'
LET available := FROM ... END -- a variable contains a `subQuery`
SELECT avalable -- a typo, causing access to unbound variable `avalable`
Binary Operation Data Type Mismatch
This error arises when a comparative binary operation is applied to two incomparable values.
For example, the AQL below will raise an error because a number (1) is incomparable to a string ("1").
FILTER 1 > "1"
Aggregated Data Type Mismatch
Some aggregation expressions have a type constraint.
For example, MIN, MAX, SUM, and PRODUCT only work on booleans and numbers.
Supplying a value outside the type constraint will yield an error.
AGGREGATE SUM(_.some_string) -- _.some_string is not boolean or numbers; This line will yield error.
An error also arises when an aggregation receives a sequence of incomparable values.
For example, supplying a boolean and a number to a single MIN aggregation causes an error because a MIN aggregation cannot compare a boolean with a number.
AGGREGATE SUM(_.some_number_or_bool) -- May error if in one event the property contains a number and in another a boolean
However, type mismatch errors are specific to the aggregation operator. FIRST and LAST, for example, have no constraints and accept any sequence of types passed to them.
Aggregated Data is Not Found
An aggregation yields an error if there is no data supplied. For example, the AQL below has a filter that is always false:
FILTER some_variable_that_is_always_false
AGGREGATE FIRST(_)
The filter does not let any event pass. Because of it, the aggregate is not supplied with any value and thus fails.
Error in a Sub-query
There is a minor difference in how an error behaves in a sub-query compared to one happening in the top-level query, namely:
- An error halts the entire sub-query; in contrast, an error in a top-level query is yielded as an entry.
- An error in a sub-query is propagated upwards to its parent query.
For example, the sub-query below is bound to a variable containing an aggregate. If the aggregate fails, the parent query will yield the sub-query's error.
LET subquery := FROM ... AGGREGATE LAST(_) ... END
Sub-query is a beta feature; the behaviors above MAY be subject to change.
Catching an Error in a Sub-query
Sometimes an error in a sub-query is expected, for example when an aggregation may not receive any values.
In that case, the error can be caught using the ?? operator and the provided fallback value is returned instead.
For example, take a look at the AQL below.
FROM 'station-opened'
LET last_departure := FROM `departure:{_.station_id}` AGGREGATE LAST(_) END ?? NULL
FILTER last_departure != NULL
The application executes this AQL to list only those station-opened events that have at least one corresponding departure event.
By design, not every station-opened event has a corresponding departure event.
Consequently, the AGGREGATE LAST(_) expression may fail by design.
The ?? NULL prevents upward propagation of errors from the sub-query and instead assigns NULL as a fallback value to the last_departure variable; this covers the path where a station-opened event has no corresponding departure event.
PRAGMAS
Like many other languages, AQL supports a generic pragma mechanism. Pragmas can only be given at the beginning of the query and consist of a name and a value.
PRAGMA x := the value -- not a comment
PRAGMA y
the
multiline
value
ENDPRAGMA
In this example the value of pragma x is the value -- not a comment, i.e. the value extends up to the line’s end.
The value of pragma y consists of three lines, separated by the two line separators given literally in the query.
Below you can find a list of pragmas currently available.
PRAGMA features
The value of this pragma is split on whitespace into words to obtain the list of enabled features for this query.
PRAGMA events
When testing AQL queries you will often need to control the precise contents of the Actyx event store so that you get reliable results.
This can be done by populating a fresh topic by hand or using ax events restore, but it can also be done inside a query.
The advantage of the latter is that only this current query sees the synthetic event store; other queries using the same Actyx node at the same time are not affected.
The syntax of the value is newline-delimited JSON:
PRAGMA events
{"timestamp":<number>,"time":<string>,"tags":[<string>,...],"appId":<string>,"payload":<value>}
...
ENDPRAGMA
All properties apart from payload are optional; timestamp has higher priority than time (which has the same syntax as TIME() values shown above).
Note how this lets you test code that depends on specific timestamps.
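A minimal sketch of a self-contained test query, assuming the time strings follow the timestamp formats listed under "Event queries". The tag machine and the payload shape are made up for illustration.

```aql
PRAGMA events
{"time":"2021-07-20T09:53:07.462Z","tags":["machine"],"payload":{"type":"started"}}
{"time":"2021-07-20T10:15:00Z","tags":["machine"],"payload":{"type":"stopped"}}
ENDPRAGMA
FROM 'machine'
FILTER _.type = 'started'
```

With these two synthetic events, only the first one should pass the filter, independent of what the node's real event store contains.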
Optimizing Query Performance
AQL execution, just like any other computation system, is not magical: some queries are bound to perform better than others. However, figuring out the performance of a query in a domain-specific language such as AQL is not as obvious as in a general-purpose language such as C++, JavaScript, or Rust.
This guide provides insight into how AQL queries affect performance without delving too much into the internal details.
Prioritize Filter by Tags
There are two ways to filter events in Actyx:
- FROM: filters by tags
- FILTER: filters by content
As a rule of thumb, for the best performance write the query so that as many events as possible are excluded by tags before the remaining ones are filtered by content. The reason is that tag expressions are evaluated solely on the event stream indexes, while filtering based on content requires the actual event data to be read, decrypted, decompressed, and decoded.
Examine the example below: querying purchase order events emitted by a certain user in a certain price range.
FROM 'purchase-order'
FILTER _.issuedBy = 'my_id' & _.price >= 1000 & _.price < 2000
Imagine that at the time of the query, FROM 'purchase-order' yields 10000 events, only 300 of which meet the issuedBy criterion.
The fact that the issuedBy property is a string compared using the = operator makes this an opportunity for optimization.
To do so, publish the event with an accompanying tag purchase-order:issued-by:[some_id]; this allows querying using the newly defined tag instead of the FILTER step.
FROM 'purchase-order' & 'purchase-order:issued-by:my_id'
FILTER _.price >= 1000 & _.price < 2000
The new AQL above yields the same result, but the decoding before the FILTER line only happens for the 300 events instead of the original 10000.
Improvements of this kind require changes in both the published tags and the AQL. Changing the AQL may prevent the new version of the application code from reading old events that do not carry the new accompanying tag.
If the events in question are short-lived (e.g. one day at most), the new tag can be added first. Later, once all old events are expected to have been discarded, the new AQL can be rolled out.
If the events in question are intended to be persisted long-term, the application needs to be able to query both the old events and the new events. Therefore, retaining the old code is necessary. Appending a version number to the tag (e.g. an event is simultaneously tagged with 'purchase-order' and 'purchase-order:v1') may be necessary so that the old query does not accidentally match the new events, and vice versa.
Discard as Many Events as Possible Before Sub-Queries
A Sub-Query is a nested AQL query that is executed for each event found by the surrounding AQL query.
FROM 'purchase-order' & 'purchase-order:issued-by:some_id'
LET revocations := FROM `purchase-order:{_.id}` FILTER _.type = 'revoked' END
LET revoked := IsDefined(revocations[0])
FILTER _.price >= 1000 & _.price < 2000 & !revoked
The line "LET revocations ..." invokes the sub-query as many times as the number of events found in the FROM line.
Optimization can be done by splitting the FILTER into two parts.
FROM 'purchase-order' & 'purchase-order:issued-by:some_id'
FILTER _.price >= 1000 & _.price < 2000
LET revocations := FROM `purchase-order:{_.id}` FILTER _.type = 'revoked' END
LET revoked := IsDefined(revocations[0])
FILTER !revoked
With the above AQL, the LET revocations ... line will see a reduced number of events, and consequently perform a reduced number of sub-queries, too.