Performance and limits

This conceptual guide explains the limitations of Actyx in real-world factory solutions with regard to latency, the maximum number of nodes, required disk space on devices, and performance.

Latency#

Event communication latencies are extremely hard to predict. Hundreds of things play into this, from the hardware of your edge devices to a forklift passing in front of a wireless access point. The following statements hold under typical conditions most of the time:

  • event delivery latency is below 200 ms unless you are pushing performance limits (see below)
  • in general, Actyx has lower latency volatility than most centralized systems

Keep in mind that Actyx does not offer deterministic real-time services; you should use a PLC for such cases.

Get in touch

As a developer you can always build apps that will bring any system to its knees. If you follow best practices, you should not face any issues. If you do, please get in touch with us and the Actyx community—we love optimizing!

Max. number of nodes#

The number of devices you can reliably run in a single Actyx swarm depends on a large number of factors, including the local-area network setup, the devices themselves, and the apps running on them. With typical rugged tablets (CPUs from around 2013) or other devices with relatively low computing power (e.g. Raspberry Pi 3), 100 devices should not by themselves pose a problem. If you start pushing performance limits (see below), this number can be significantly lower, depending on the event data rate and the computational complexity of your business logic.

In some cases it may be possible to split an overall system into several smaller ones that do not directly interact via event streams. You can use this approach to serve a larger factory while keeping each swarm within the 100-device limit.

You need to connect thousands of edge devices?

Please get in touch with us and the Actyx community: we would love to hear about your use-case and figure out how to make it work with Actyx.

Required disk space of devices#

Actyx is a completely decentralized system that depends on the disk space of every single edge device. At runtime, different types of data are generated and stored throughout the Actyx swarm. Current versions of Actyx store every event on all devices of the same swarm; in the future we will make both the replication and the lifespan of events configurable, allowing you to define when events should be deleted and on which devices they should be replicated.

These are the currently implemented behaviors:

| Type  | Size                | Lifespan                                                  | Replication                           |
|-------|---------------------|-----------------------------------------------------------|---------------------------------------|
| Event | max. 4 KB per event | Events have an infinite lifespan within the swarm.        | Events are replicated on all devices. |
| Log   | max. 4 KB per log   | Logs are cleared automatically when disk space is needed. | Logs are not replicated.              |

The listed maximum sizes are recommendations: neither events nor logs should be used to carry large pieces of data (although larger events may be justified in some circumstances, e.g. defining a production process with many steps, where such definitions happen infrequently).

Until we make event retention and replication configurable, the point at which you run out of disk space depends on the size, number, and compressibility of your events. This always depends on how you define events in your app(s), but here are two examples based on apps running at our customers:

|                             | With machine integrations | Without machine integrations |
|-----------------------------|---------------------------|------------------------------|
| Number of nodes             | 8                         | 10                           |
| Operating time              | 1 year                    | 1 year                       |
| Number of events            | 10,000,000                | 580,000                      |
| Uncompressed size of events | 4,500 MB                  | 1,200 MB                     |
| Compression factor          | 0.03                      | 0.03                         |
| Disk space needed           | 135 MB                    | 36 MB                        |
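
As a back-of-the-envelope check, the disk space needed is roughly the number of events times the average event size times the compression factor. A minimal sketch of such an estimate (the function and the average event size are illustrative, not part of any Actyx API):

```typescript
// Rough per-device disk-space estimate:
// events per year × average event size × compression factor.
// All inputs are illustrative; measure your own event sizes and compression.
const estimateDiskSpaceMB = (
  eventsPerYear: number,
  avgEventSizeBytes: number,
  compressionFactor: number,
): number => (eventsPerYear * avgEventSizeBytes * compressionFactor) / 1e6

// "With machine integrations" column: 10,000,000 events averaging 450 bytes
// compress down to roughly 135 MB per device per year.
console.log(estimateDiskSpaceMB(10_000_000, 450, 0.03)) // ≈ 135
```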

Currently, the only remedies for running out of disk space because of events are clearing the events from your swarm (effectively creating a new swarm and starting over) or increasing the disk space of your edge devices. We are already working on the functionality to configure replication and lifespan of events.

Your solution will produce 100s of millions of events?

Please get in touch with us and the Actyx community: we would love to hear about your use-case and figure out how to make it work with Actyx.

Performance#

The limits given above are formulated under the assumption that the processing of events by business logic does not use significant resources. While there are vast differences between languages, runtimes, CPU architectures, and your choices of business logic algorithms and data structures, this section gives some guidance based on our experience.

Application and Fish startup time#

Bringing a Fish up to date for the first time in an application run (we say “hydrating” or “waking up” the Fish) requires a starting point and the application of all following events. If a Fish requires a large number of events to start up, it will take a correspondingly long time until the first state is available and the first command can be processed: the startup time is proportional to the number of events selected by the subscription (see below for the effect of snapshots).

Our recommendation is to structure your fishes such that each requires fewer than 1,000 events. This means that fishes are best used to model process workflows (which rarely have more than 1,000 steps). Fishes are not ideal for accumulating statistical data over long time periods; a better solution for that use-case is to export the events into a time-series database.

The effect of semantic snapshots#

Semantic snapshots effectively truncate the event history: everything before the recognized snapshot event is ignored. This still means that the snapshot event needs to be found (by searching backwards from the current end of the event log) and that all younger events need to be applied.

Semantic snapshots help a lot if, in the typical case:

  • the previous snapshot event lies less than a month in the past
  • the Fish in question selects fewer than 1,000 events since the previous snapshot event

As such, semantic snapshots can make a Fish viable even though it violates the recommendation in the previous section.

Latency implications of large number of peers#

With current Actyx versions, the internal event publication latency is proportional to the number of connected peers. The effect of this depends very much on the connectivity: for example, 20 connected peers on a cloud network can see an internal latency of more than a second (e.g. the time until a call to publish resolves).

We are working on removing this bottleneck in a future release.

Pathological time travel performance when reconnecting a device that was offline for a long time#

When reconnecting a device that was disconnected for a long time, this device needs to ingest all new events from all other devices in the swarm to become up-to-date. If a Fish is running during this catch-up process, it will experience a lot more time travel than it would if it were woken up after Actyx is up-to-date. Actyx optimizes event delivery by transferring events in chunks; nevertheless, in pathological cases the amount of time travel is not linear in the number of such chunks but quadratic.

As an example: consider 10 devices to catch up with, each contributing 10 chunks of events for your Fish. The optimal scenario would be to first catch up, insert all 100 chunks into the event history, and then perform a single time travel. The worst case would be that every chunk causes a time travel, with each subsequent time travel being more costly than the previous one (due to the added events). In that case the Fish would process some events up to a hundred times, though more likely around ten times (since the chunks from each device arrive in order).

To avoid this issue, we recommend using Pond.waitForSwarmSync() or Actyx.waitForSync() to delay app startup until Actyx is up-to-date with the rest of the swarm.
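
A minimal sketch of this startup sequence with the Pond (the manifest values are placeholders, and the exact waitForSwarmSync parameters may differ between Pond versions):

```typescript
import { Pond } from '@actyx/pond'

const main = async () => {
  // Connect to the local Actyx node (manifest values are placeholders).
  const pond = await Pond.default({
    appId: 'com.example.app',
    displayName: 'Example App',
    version: '1.0.0',
  })

  // Wait until this node has caught up with the rest of the swarm before
  // hydrating fishes, so they do not time-travel through the backlog.
  pond.waitForSwarmSync({
    onSyncComplete: () => {
      // Safe to observe fishes and start the application logic here.
    },
  })
}

main()
```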

Illustrative Examples#

Machine Connectors#

Machine connectors that only make machine state and counters available in the Actyx swarm can be built without running into the above-mentioned issues. The best way to do so is to emit all state and counter updates as events; typically, a complete set is emitted every time. You can then use Actyx.observeLatest to keep track of the latest known machine state.
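
A consumer of these events might track the latest full state with the SDK roughly as follows (a sketch: the tag name and event shape are assumptions, and the observeLatest signature may differ between @actyx/sdk versions):

```typescript
import { Actyx, Tag } from '@actyx/sdk'

// Hypothetical event shape emitted by the machine connector: a complete
// snapshot of machine state and counters in every event.
type MachineSnapshot = {
  state: 'running' | 'stopped' | 'error'
  counters: Record<string, number>
}

// Assumed tag under which the connector publishes its events.
const machineTag = Tag<MachineSnapshot>('machine:M1')

const main = async () => {
  const actyx = await Actyx.of({
    appId: 'com.example.machine-dashboard', // placeholder manifest
    displayName: 'Machine Dashboard',
    version: '1.0.0',
  })

  // Called with the latest known event, and again whenever a newer one arrives.
  actyx.observeLatest<MachineSnapshot>({ query: machineTag }, (snapshot) =>
    console.log('latest machine state:', snapshot),
  )
}

main()
```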

If for some reason you need to write partial updates (e.g. only a subset of counters) in some events, you can use a Fish to construct the full state from the partial updates. It is recommended to still emit the complete state from time to time, and the Fish can treat that as a semantic snapshot, as in the following sketch (event shapes and tag names are illustrative):
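
```typescript
import { Fish, FishId, Tag } from '@actyx/pond'

// Illustrative event shapes: partial counter updates plus an occasional
// complete state that doubles as a semantic snapshot.
type FullState = { type: 'fullState'; counters: Record<string, number> }
type PartialUpdate = { type: 'partialUpdate'; counters: Record<string, number> }
type MachineEvent = FullState | PartialUpdate

type State = { counters: Record<string, number> }

const machineFish: Fish<State, MachineEvent> = {
  fishId: FishId.of('machine', 'M1', 0),
  initialState: { counters: {} },
  where: Tag<MachineEvent>('machine:M1'), // assumed tag name
  onEvent: (state, event) =>
    event.type === 'fullState'
      ? { counters: { ...event.counters } } // complete state replaces everything
      : { counters: { ...state.counters, ...event.counters } }, // merge partial update
  // Treat complete-state events as semantic snapshots: hydration stops
  // searching backwards through the event log as soon as one is found.
  isReset: (event) => event.type === 'fullState',
}
```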
This way, starting up the machine connector’s related Fishes will always be quick.

The other performance consideration for machine connectors is that they produce data at a steady rate that is often higher than the data rates produced by human operators. Therefore you need to plan the available disk space according to update frequency times (compressed) event size for at least one year; a future Actyx version will add ephemeral event streams, which will let you place a fixed limit on disk usage and thus solve this issue permanently.

Production Data Acquisition#

This use-case includes the tracking of state as well as results (finished goods, waste) of production orders. Each of these orders has a finite lifespan; it typically becomes irrelevant after a few weeks. The most challenging part of writing such an app is to construct a process registry fish whose state does not grow too big; this fish will typically need to use state snapshots because it is long-lived and consumes some events from each production order.

We know from project experience that implementing production data acquisition with 50 workstations and 50 machine connectors (see above) works well if you keep the above recommendations in mind.

Data Exporters#

Exporting events or derived information into an ERP or BI system can be done using fishes if the overall workload is “human-scale”: we know from project experience that it works well for booking acquired production data into an ERP system.

Exporting machine or sensor data into a time-series database for further analysis outside Actyx is best done using the HTTP API and writing the exporter in a language that is well-suited to connecting to your database and handling the required amount of streaming data. We know from project experience that this can be done with native code (e.g. Rust or C++) for demanding applications that write many millions of data points per day, e.g. into PostgreSQL with the TimescaleDB extension.