AQL Primer
Actyx’s main function is to persist streams of events on one node and access them on all nodes of a swarm. The Actyx Query Language (AQL) lets you formulate which events you want to see and what information you want to extract from them.
In the following we use the Node Manager’s QUERY pane to try out some code. Note that you’ll need at least Actyx version 2.13.0 and Node Manager 2.8.0. We prime a per-query ephemeral event store by using the following preamble atop all queries shown below (this is a testing feature, see reference):
PRAGMA features := limit interpolation subQuery aggregate binding
PRAGMA events
{"time":"2020-03-12T16:33:05Z","tags":["item","item:someUUID1","itemCreated"],"payload":{"id":"someUUID1","text":"TODO 1","type":"created"}}
{"time":"2022-03-12T16:33:15Z","tags":["item","item:someUUID2","itemCreated"],"payload":{"id":"someUUID2","text":"TODO 2","type":"created"}}
{"time":"2022-03-12T16:34:14Z","tags":["item","item:someUUID3","itemCreated"],"payload":{"id":"someUUID3","text":"TODO 3","type":"created"}}
{"time":"2022-03-12T16:48:05Z","tags":["item","item:someUUID3"],"payload":{"id":"someUUID3","text":"TODO 1","type":"textChanged"}}
{"time":"2022-03-12T16:53:15Z","tags":["item","item:someUUID4","itemCreated"],"payload":{"id":"someUUID4","text":"TODO 1","type":"created"}}
{"time":"2022-03-12T16:54:14Z","tags":["item","item:someUUID4"],"payload":{"id":"someUUID4","text":"TODO 4","type":"textChanged"}}
ENDPRAGMA
This dump contains an event history from a TODO list app, with events like “item created” or “item text changed”.
Each TODO item has a unique ID and all events pertaining to that item are tagged with item:<id>
.
Events of type “item created” are additionally tagged with itemCreated
.
Getting an overview
As a first exploration, we obtain a list of all items every created:
…
ENDPRAGMA
FROM 'itemCreated'
This simple query selects all events tagged with itemCreated
and returns them verbatim.
The list looks like this:
rgepwerk-0 6123446 2022-03-12T16:33:05Z {"id":"someUUID1","text":"TODO 1","type":"created"}
rgepwerk-0 6123452 2022-03-12T16:33:15Z {"id":"someUUID2","text":"TODO 2","type":"created"}
rgepwerk-0 6123488 2022-03-12T16:34:14Z {"id":"someUUID3","text":"TODO 3","type":"created"}
...
The first column shows the stream ID, i.e. on which node the event was written and in which stream it has been placed (the node ID is actually much longer and will be different for you, but like git hashes it usually suffices to show the first few characters). The second and third column show the logical and physical timestamp, respectively. The fourth column is the most interesting one as it contains the event payload data in JSON form.
If we are only interested in getting a list of item IDs we can use AQL to extract that part.
-- this is a comment to the end of the line
FROM 'itemCreated' -- we could also have written SELECT here, AQL ignores whitespace
SELECT _.id
The result of this query looks like this:
rgepwerk-0 6123446 2022-03-12T16:33:05Z "someUUID1"
rgepwerk-0 6123452 2022-03-12T16:33:15Z "someUUID2"
rgepwerk-0 6123488 2022-03-12T16:34:14Z "someUUID3"
...
Each event that is generated by the FROM
clause will be fed to the SELECT
step, one by one.
The expression given after SELECT
is then evaluated, with _
being replaced by the current event.
You can use syntax for working with JSON data like you know from Javascript or similar languages, which is how .id
extracts the "id"
property from the objects we saw in the very first query.
Filtering events
If we wanted to find the item ID for the item with text “TODO 1” we could try the following:
FROM "itemCreated" -- strings can be given in single or double quotes, AQL doesn’t care
FILTER _.text = 'TODO 1'
SELECT { id: _.id, type: _.type } -- we can construct result objects as well
This will return the IDs of all items that were created with the desired text.
Each “item created” event first enters the FILTER
step, and those where the given expression evaluates to TRUE
are handed also to the SELECT
step.
If you access a non-existing property (like _.wat
) or perform an otherwise invalid operation (like comparing a string and a number), then the result is nothing at all.
This means that the filter test would not pass, because nothing at all is not the same as TRUE
.
Similarly, if you SELECT _.whatever
you won’t see any results, but you’ll get diagnostic hints.
See also the ??
operator
But the above query does not correctly answer our question: what about items whose text was changed to “TODO 1” after their creation?
In this case, we need to cast a wider net and select from the item
tag that all item-related events have.
When you try this out you’ll see multiple matching events and item IDs — this is one of the powers of Actyx in that it can provide the full history of the system.
If we are only interested in the most recent appearances of “TODO 1”, we change the query like so:
FROM 'item' ORDER DESC
FILTER _.text = 'TODO 1'
LIMIT 3
The result looks like the following:
rgepwerk-0 6124076 2022-03-12T16:53:15Z {"id":"someUUID4","text":"TODO 1","type":"created"}
rgepwerk-0 6123972 2022-03-12T16:48:05Z {"id":"someUUID3","text":"TODO 1","type":"textChanged"}
rgepwerk-0 6123972 2022-03-12T16:33:05Z {"id":"someUUID1","text":"TODO 1","type":"created"}
The first thing to notice is that results are delivered with decreasing timestamps, which we caused by adding the ORDER DESC
clause — this can only be done before the first processing step.
The FILTER
step still does its duty as before, and it would have delivered more events (we remember that the first created
event should also have matched).
But we placed a LIMIT
step after it that only permits the first three results that pass through it.
Had we placed the LIMIT
before the FILTER
, we would have gotten less results because among the first three events (in descending order) there are mostly events with other TODO item texts.
Enriching events
In the previous query we found that the most recent mention of “TODO 1” was on the item with ID someUUID4
.
Now we step up our game and extract the most recently set item text for each TODO item.
The process for this is to first identify all items and then query Actyx for the desired information.
FROM 'itemCreated'
SELECT {
id: _.id,
texts:
FROM `item:{_.id}`
SELECT _.text -- this discards all events that don’t have this property
}
The result looks like this:
rgepwerk-0 6123446 2022-03-12T16:33:05Z {"id":"someUUID1","texts":["TODO 1"]}
rgepwerk-0 6123452 2022-03-12T16:33:15Z {"id":"someUUID2","texts":["TODO 2"]}
rgepwerk-0 6123488 2022-03-12T16:34:14Z - rgepwerk-0 6123972 2022-03-12T16:48:05Z ↩︎
{"id":"someUUID3","texts":["TODO 3","TODO 1"]}
...
The first thing to notice is that the third line lists multiple timestamps, more precisely it lists a range of them.
This is due to the fact that the item text for the third TODO item has been changed after creation, as we can see from the JSON value:
first it was created as “TODO 3” and then the text was changed to “TODO 1”.
This information was found by the sub-query FROM 'item:someUUID3' ...
that is dynamically constructed in the definition of the texts
property.
Since sub-queries are used to obtain more information about a given subject, their query is usually not hard-coded; the tags with which we want to select events depend on what we are looking for.
Here, we make use of the string interpolation feature of AQL to construct the desired tag name.
“Backticks” enable this mechanism, where everything apart from {...}
is taken as verbatim, and you can add dynamic information by placing expressions inside the braces.
The _
within a FROM
clause refers to the current input in the surrounding context, just like when extracting the item ID for the id
property.
Within the following SELECT
stage, _
refers to the current event in the sub-query.
Another noteworthy observation is that the first two results only have a single timestamp.
This is because the inputs from which the texts
property is computed yielded only a single value each, so we don’t mix information from multiple events.
Aggregating data
The above query did not fully answer our question: instead of listing the most recently set item text we get all item texts ever set for each item.
We can fix this by retrieving only the latest matching event in the sub-query.
As an exercise you’re invited to try this out using the knowledge you have so far, namely using ORDER
and LIMIT
clauses.
We will step up our game another notch by using the more powerful AGGREGATE
step:
FROM 'itemCreated'
LET itemId := _.id -- this is just a convenience to refer to the ID by a variable name
SELECT {
id: itemId,
text:
FROM `item:{itemId}`
AGGREGATE {
latest: LAST(_.text),
changes: SUM(
CASE _.type = 'textChanged' => 1
CASE _.type = 'created' => 0
ENDCASE -- if neither case matches, don’t create a value to sum up
)
}
}
Here we sneaked in another feature of AQL expressions: if-then-else constructs are supported in a fashion similar to SQL, where each CASE
is tried until the first one matches.
But first things first: this is the result.
rgepwerk-0 6123446 2022-03-12T16:33:05Z {"id":"someUUID1","text":[{"latest":"TODO 1","changes":0}]}
rgepwerk-0 6123452 2022-03-12T16:33:15Z {"id":"someUUID2","text":[{"latest":"TODO 2","changes":0}]}
rgepwerk-0 6123488 2022-03-12T16:34:14Z - rgepwerk-0 6123972 2022-03-12T16:48:05Z ↩︎
{"id":"someUUID3","text":[{"latest":"TODO 1","changes":1}]}
As before, the result from the sub-query is wrapped in a array because sub-queries can have any number of results.
In all cases, AGGREGATE
computes a single result, which is an object containing both the last value encountered for the text
property on any event as well as the number of textChanged
events seen throughout the sub-query.
If you are curious — which you should be — then you may remove the second CASE
line to see what happens
(hint: if SUM
doesn’t get any input at all, it can’t produce an output, since it doesn’t know what type of value should have been summed up).
If we wanted to get rid of the array wrapping and write the query down a bit more nicely, we could do the following:
FROM 'itemCreated'
LET itemId := _.id
LET latest :=
FROM `item:{itemId}`
AGGREGATE {
text: LAST(_.text),
changes: SUM(
CASE _.type = 'textChanged' => 1
CASE _.type = 'created' => 0
ENDCASE -- if neither case matches, don’t create a value to sum up
)
}
END -- this is needed here because otherwise the next LET would be applied to the sub-query
LET latest := latest[0] -- only the latest definition of a variable is seen below
SELECT {
id: itemId,
text: latest.text,
changes: latest.changes,
}
Summary
This concludes our AQL tutorial. The key points to take away are:
- queries start by selecting events with
FROM
- sub-queries can appear wherever you need to compute a value
- after that you can apply transformation, filtering, or aggregation steps
- each step gets inputs from above and passes its computed outputs to the next step below
- if no value is computed, nothing is passed on but you get a diagnostic message
- you can bind interesting values to variable names for ease of reference
- expression syntax is mostly similar to Javascript, but if-then-else is done with
CASE ... ENDCASE
For all the gory details please refer to the language reference.