
Performance and limits

This conceptual guide explains the limitations of ActyxOS in real-world factory solutions with regard to latency, the maximum number of nodes, the required disk space of devices, and performance.

Latency#

Event communication latencies are extremely hard to predict. Hundreds of things play into this, from the hardware of your edge devices to a forklift passing in front of a wireless access point. The following statements hold under typical conditions most of the time:

  • event delivery latency is below 200 ms unless you are pushing performance limits (see below)
  • in general, ActyxOS has lower latency volatility than most centralized systems

Keep in mind that ActyxOS does not offer deterministic real-time services; you should use a PLC for such cases.

Get in touch

As a developer you can always build apps that will bring any system to its knees. If you follow best practices, you should not face any issues. If you do, please get in touch with us and the Actyx community—we love optimizing!

Max. number of nodes#

The number of devices you can reliably run in a single ActyxOS swarm depends on a large number of factors, including the local-area network setup, the devices themselves, and the apps running on them. With typical rugged tablets (CPU from around 2013) or other devices with relatively low computing power (e.g. Raspberry Pi 3), 100 devices should by themselves not pose a problem. If you start pushing performance limits (see below), this number can be significantly lower, depending on the event data rate and the computational complexity of your business logic.

In some cases it may be possible to split an overall system into several smaller ones that do not directly interact via event streams. You can use this approach to serve a larger factory while still staying within the 100-device limit.

You need to connect thousands of edge devices?

Please get in touch with us and the Actyx community: we would love to hear about your use-case and figure out how to make it work with ActyxOS.

Required disk space of devices#

ActyxOS is a completely decentralized system and therefore depends on the disk space of every single edge device. At runtime, different types of data are generated and stored throughout the ActyxOS swarm. Current versions of ActyxOS store every event on all devices of the same swarm; in the future we will make both the replication and the lifespan of events configurable, allowing you to define when events should be deleted and on which devices they should be replicated.

These are the currently implemented behaviors:

| Type  | Size                | Lifespan                                                  | Replication                           |
| ----- | ------------------- | --------------------------------------------------------- | ------------------------------------- |
| Event | max. 4 KB per event | Events have an infinite lifespan within the swarm.        | Events are replicated on all devices. |
| Log   | max. 4 KB per log   | Logs are cleared automatically when disk space is needed. | Logs are not replicated.              |

The listed maximum sizes are recommendations: neither events nor logs should be used to carry large pieces of data, although larger events may be justified in some circumstances, e.g. defining a production process with many steps, where such definitions happen infrequently.

Until we make event retention and replication configurable, the point at which you run out of disk space depends on the size, number, and compressibility of your events. This always depends on how you define events in your app(s), but here are two examples based on apps running at our customers:

|                             | With machine integrations | Without machine integrations |
| --------------------------- | ------------------------- | ---------------------------- |
| Number of nodes             | 8                         | 10                           |
| Operating time              | 1 year                    | 1 year                       |
| Number of events            | 10,000,000                | 580,000                      |
| Uncompressed size of events | 4500 MB                   | 1200 MB                      |
| Compression factor          | 0.03                      | 0.03                         |
| Disk space needed           | 135 MB                    | 36 MB                        |
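
If you want a rough estimate for your own solution, multiply the expected number of events by the average event size and the observed compression factor; since every device currently stores every event, the result applies per device. A minimal sketch in TypeScript, using placeholder numbers taken from the first example column above:

```typescript
// Rough per-device disk space estimate, mirroring the table above.
// All inputs are placeholders; substitute your own measurements.
const eventsPerYear = 10_000_000 // events written by the whole swarm per year
const avgEventSizeBytes = 450    // average uncompressed size of one event
const compressionFactor = 0.03   // as observed in the examples above

// Every device currently stores every event of the swarm.
const uncompressedMB = (eventsPerYear * avgEventSizeBytes) / 1_000_000
const onDiskMB = uncompressedMB * compressionFactor

console.log(`~${uncompressedMB.toFixed(0)} MB uncompressed, ~${onDiskMB.toFixed(0)} MB on disk per device and year`)
```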

Currently, if you run out of disk space because of events, the only remedies are clearing the events from your swarm (effectively creating a new swarm and starting over) or increasing the disk space of your edge devices. We are already working on the functionality to configure the replication and lifespan of events.

Your solution will produce 100s of millions of events?

Please get in touch with us and the Actyx community: we would love to hear about your use-case and figure out how to make it work with ActyxOS.

Performance#

The limits given above are formulated under the assumption that the processing of events by business logic does not use significant resources. While there are vast differences between languages, runtimes, CPU architectures, and your choices of business logic algorithms and data structures, this section gives some guidance based on our experience.

Application and Fish startup time#

Bringing a Fish up to date for the first time in an application run (we say “hydrating” or “waking up” the Fish) requires a starting point (which can be a semantic or state snapshot) and the application of all following events. If a Fish requires a large number of events for starting up then it will take a correspondingly long time until the first state is available and the first command can be processed — the startup time is proportional to the number of events selected by the subscription (see below for the effect of snapshots).

Our recommendation is to structure your Fishes such that each requires fewer than 1'000 events. This means that Fishes are best used to model process workflows (which rarely have more than 1'000 steps). Fishes are not ideal for accumulating statistical data over long time periods; a better solution for that use case is to export the events into a time-series database.
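
As an illustration, here is a minimal sketch of a Pond 2 Fish that models a single work order; the tag name and the event and state shapes are invented for this example. Because each work order gets its own Fish instance, every Fish stays far below the 1'000-event recommendation:

```typescript
import { Fish, FishId, Tag } from '@actyx/pond'

// Invented event and state types for a single work order workflow.
type WorkOrderEvent =
  | { type: 'started'; userId: string }
  | { type: 'finished'; goodParts: number }

type WorkOrderState = { status: 'idle' | 'running' | 'done'; goodParts: number }

const workOrderTag = Tag<WorkOrderEvent>('work-order')

// One Fish per work order: each instance only hydrates the few events of its own order.
const workOrderFish = (orderId: string): Fish<WorkOrderState, WorkOrderEvent> => ({
  fishId: FishId.of('workOrder', orderId, 0),
  initialState: { status: 'idle', goodParts: 0 },
  where: workOrderTag.withId(orderId),
  onEvent: (state, event) => {
    switch (event.type) {
      case 'started':
        return { ...state, status: 'running' }
      case 'finished':
        return { status: 'done', goodParts: event.goodParts }
      default:
        return state
    }
  },
})
```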

tip

If you need to create long-running fishes that accumulate many events — like a registry for currently open work orders — then you can mitigate the resulting issues with state snapshots as described below.

The effect of semantic snapshots#

Semantic snapshots effectively truncate the event history: everything before the recognized snapshot event is ignored. The snapshot event still needs to be found (by searching backwards from the current end of the event log), and all younger events still need to be applied.

Semantic snapshots help a lot if, in typical cases:

  • the previous snapshot event lies less than a month in the past
  • the Fish in question subscribes to fewer than 1'000 events since the previous snapshot event

As such, semantic snapshots can make a Fish viable even though it violates the recommendation in the previous section.
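
With Pond 2, a Fish declares which events act as semantic snapshots via the optional isReset predicate. A minimal sketch with invented event shapes, where a full status report completely determines the state:

```typescript
import { Fish, FishId, Tag } from '@actyx/pond'

// Invented events: a self-contained full status report and incremental counter changes.
type StatusEvent =
  | { type: 'fullStatusReported'; status: string; counters: Record<string, number> }
  | { type: 'counterChanged'; name: string; value: number }

type StatusState = { status: string; counters: Record<string, number> }

const workstationTag = Tag<StatusEvent>('workstation')

const workstationFish = (id: string): Fish<StatusState, StatusEvent> => ({
  fishId: FishId.of('workstation', id, 0),
  initialState: { status: 'unknown', counters: {} },
  where: workstationTag.withId(id),
  // Full status reports completely determine the state, so all older events can be ignored.
  isReset: (event) => event.type === 'fullStatusReported',
  onEvent: (state, event) =>
    event.type === 'fullStatusReported'
      ? { status: event.status, counters: event.counters }
      : { ...state, counters: { ...state.counters, [event.name]: event.value } },
})
```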

The effect of state snapshots#

State snapshots are written when the Fish’s event history is longer than 1'000 events and snapshots are enabled by providing a corresponding (de)serialization config. When starting a Fish, the most recent state snapshot is used, thereby limiting the number of events that need to be applied. This can help if the required events are large in number or dispersed over a long time range and many devices, both of which increase the time needed to find and deliver the events to the business logic. It can also help if the business logic is computationally expensive, by reusing previously computed results.

One important aspect to keep in mind is that these snapshots are local to each device. Therefore they don’t help when waking up a Fish for the first time on a given device — plowing through 10'000'000 events will take a long time even when the events can be found and delivered quickly.
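
As a sketch, here is a long-lived registry Fish that opts into state snapshots; to our understanding, Pond 2 enables them when the Fish provides a deserializeState function (check the Pond API reference for your version), and all names below are invented for illustration:

```typescript
import { Fish, FishId, Tag } from '@actyx/pond'

type OrderEvent =
  | { type: 'workOrderCreated'; orderId: string }
  | { type: 'workOrderClosed'; orderId: string }

type RegistryState = { openOrders: string[] }

const workOrderTag = Tag<OrderEvent>('work-order')

// Long-lived registry Fish: with state snapshots enabled, waking it up does not
// require replaying its whole (potentially very long) event history on this device.
const openWorkOrdersFish: Fish<RegistryState, OrderEvent> = {
  fishId: FishId.of('workOrderRegistry', 'all', 0),
  initialState: { openOrders: [] },
  where: workOrderTag,
  onEvent: (state, event) =>
    event.type === 'workOrderCreated'
      ? { openOrders: [...state.openOrders, event.orderId] }
      : { openOrders: state.openOrders.filter((id) => id !== event.orderId) },
  // Providing a deserializer opts the Fish into local state snapshots.
  // The state is plain JSON here, so a simple cast is enough.
  deserializeState: (snapshot: unknown) => snapshot as RegistryState,
}
```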

Large application state#

State snapshots can be problematic when the business logic state is large. If, for example, a Fish holds data structures with a volume of 100 MB in memory, then storing a snapshot requires serializing this live data structure to JSON (which may itself take more than another 100 MB) and sending it to ActyxOS for safekeeping. Restoring a snapshot also requires both the serialized and the in-memory data structures to be live at the same time.

These characteristics can cause problems especially in browser apps, where such memory usage spikes may lead to the app being killed by the JavaScript runtime. The actual limits depend both on the runtime environment (which version of which browser) and on the available system memory, which in turn is handled differently between desktop PCs and tablet computers.

We recommend keeping per-Fish memory usage below 10 MB.

You are seeing much higher memory usage than expected?

Keep in mind that, starting with Pond version 2.1, you can directly modify the state parameter passed to the onEvent function, which reduces memory pressure compared to making a copy.
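
For example, the onEvent handler below mutates the passed-in state instead of spreading it into a fresh copy on every event; the event and state types are invented for illustration:

```typescript
type ProductionEvent = { type: 'partProduced'; workstation: string }
type ProductionStats = { producedByWorkstation: Record<string, number> }

// Pond >= 2.1 allows mutating the state passed to onEvent, avoiding a fresh
// allocation of a potentially large object for every single event.
const onEvent = (state: ProductionStats, event: ProductionEvent): ProductionStats => {
  if (event.type === 'partProduced') {
    const current = state.producedByWorkstation[event.workstation] ?? 0
    state.producedByWorkstation[event.workstation] = current + 1
  }
  return state
}
```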

Latency implications of a large number of Fishes and sources#

With current ActyxOS versions the internal event dispatch latency is proportional to both the number of Fishes and the number of devices in the swarm. The effect of this depends very much on the CPU that is running ActyxOS: for example, 100 sources times 500 Fishes on an Intel Atom processor (rugged Android tablet from around 2015) leads to an internal latency of 3 seconds (e.g. between a button click and the subsequent Fish state update).

We are working on improving this characteristic starting with release 1.1.5 and expect to remove this bottleneck in a future release. Until then we recommend keeping the product of woken-up Fishes and swarm sources below 1'000, at least on tablet computers.

tip

With current Pond versions, Fishes are never stopped and their resources are never released (we are working on a feature in this direction). This may lead to exceeding the recommended limit above or exhausting the available main memory.

Since this happens mostly on tablets with dynamic end-user workflows (like opening and closing work orders), an easy remedy is to restart the app, provided that fewer Fishes are woken up after the restart.

Pathological time travel performance when reconnecting a device that was offline for a long time#

When reconnecting a device that was disconnected for a long time, this device needs to ingest all new events from all other devices in the swarm to become up-to-date. If a Fish is running during this process of catching up with the swarm, then that Fish will experience a lot more time travel than it would if it were woken up after ActyxOS is up-to-date. ActyxOS optimizes event delivery by doing it in chunks; nevertheless, in pathological cases the cost of time travel is not linear in the number of such chunks but quadratic.

As an example, consider 10 devices to catch up with, each contributing 10 chunks of events for your Fish. The optimal scenario would be to first catch up, then insert all 100 chunks into the event history, and then perform one time travel. The worst-case scenario would be that every chunk causes a time travel, with each subsequent time travel being more costly than the previous one (due to the added events). In that case, the Fish would process some events up to a hundred times, more likely around ten times (since the chunks coming from a given device are ordered).

To avoid this issue, we recommend using Pond.waitForSwarmSync() to delay app startup until ActyxOS is up-to-date with the rest of the swarm.
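
A sketch of this pattern follows; note that the exact parameter shape of waitForSwarmSync (here a config object with an onSyncComplete callback) is an assumption in this sketch and should be verified against the Pond API reference for your version:

```typescript
import { Pond } from '@actyx/pond'

// Placeholder: put your pond.observe(...) calls for the app's Fishes here.
const startObservingFishes = (pond: Pond): void => {
  // pond.observe(someFish, (state) => render(state))
}

const main = async (): Promise<void> => {
  const pond = await Pond.default()

  // Assumption: waitForSwarmSync accepts a config with an onSyncComplete callback;
  // check the exact signature in the Pond API reference.
  pond.waitForSwarmSync({
    onSyncComplete: () => startObservingFishes(pond),
  })
}

main()
```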

Illustrative Examples#

Machine Connectors#

Machine connectors that only make machine state and counters available in the ActyxOS swarm can be built without running into the issues mentioned above. The best way to do so is to emit all state and counter updates as events; typically, a complete set is emitted every time. If for some reason you need to write partial updates (e.g. only a subset of counters) in some events, make sure to persist the full set within a single event at regular intervals and mark that event as a semantic snapshot. This way, starting up the machine connector’s related Fishes will always be quick.

The only performance consideration for machine connectors is that they produce data at a steady rate that is often higher than the data rates produced by human operators. Therefore you need to plan the available disk space according to update frequency times event size for at least one year; a future Actyx version with ephemeral event streams will allow you to place a fixed limit on disk usage and thus solve this issue permanently.
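
As a sketch of such a connector, the code below emits the complete state and counter set as one event at a fixed interval; the tag, payload shape, and readMachine integration are invented for this example. A consuming Fish can mark these events as semantic snapshots as shown earlier:

```typescript
import { Pond, Tag } from '@actyx/pond'

type MachineEvent = {
  type: 'fullStateReported'
  state: 'running' | 'stopped'
  counters: Record<string, number>
}

const machineTag = Tag<MachineEvent>('machine')

// Placeholder for the real machine integration (PLC, OPC UA, ...).
const readMachine = async (): Promise<{
  state: 'running' | 'stopped'
  counters: Record<string, number>
}> => ({ state: 'running', counters: { goodParts: 123, scrap: 4 } })

const main = async (): Promise<void> => {
  const pond = await Pond.default()

  // Emit the complete set every 10 seconds; yearly disk usage is then roughly
  // (seconds per year / 10) times the compressed event size, which you can plan for.
  setInterval(async () => {
    const snapshot = await readMachine()
    pond.emit(machineTag.withId('machine-42'), {
      type: 'fullStateReported',
      state: snapshot.state,
      counters: snapshot.counters,
    })
  }, 10_000)
}

main()
```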

Production Data Acquisition#

This use-case includes the tracking of the state as well as the results (finished goods, waste) of production orders. Each of these orders has a finite lifespan; it typically becomes irrelevant after a few weeks. The most challenging part of writing such an app is constructing a process registry Fish whose state doesn’t grow too big; this Fish will typically need to make use of state snapshots because it is long-lived and consumes some events from each production order.

We know from project experience that implementing production data acquisition with 50 workstations and 50 machine connectors (see above) works well if you keep the above recommendations in mind.

Data Exporters#

Exporting events or derived information into an ERP or BI system can be done using Fishes if the overall workload is “human-scale”: we know from project experience that this works well for booking acquired production data into an ERP system.

Exporting machine or sensor data into a time-series database for further analysis outside ActyxOS is best done using the HTTP API and writing the exporter in a language that is well suited to connecting to your database and handling the required amount of streaming data. We know from project experience that this can be done with native code (like Rust or C++) for demanding applications that write many millions of data points per day, e.g. into PostgreSQL with the TimescaleDB extension.
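
To illustrate the streaming pattern (independent of the implementation language), here is a sketch in TypeScript for Node.js; the Event Service endpoint, the subscription body, and writeToTimeSeriesDb are assumptions for illustration and must be checked against the ActyxOS Event Service API reference and replaced with your actual database client:

```typescript
import * as http from 'http'

// Placeholder: replace with batched inserts using your actual database client
// (e.g. PostgreSQL with the TimescaleDB extension).
const writeToTimeSeriesDb = (event: unknown): void => {
  console.log('would insert', event)
}

// Assumption: the ActyxOS v1 Event Service exposes a streaming subscribe endpoint
// returning newline-delimited JSON; verify the path and body in the API reference.
const req = http.request(
  {
    host: 'localhost',
    port: 4454,
    path: '/api/v1/events/subscribe',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
  },
  (res) => {
    let buffer = ''
    res.on('data', (chunk: Buffer) => {
      buffer += chunk.toString('utf8')
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? '' // keep the trailing partial line for the next chunk
      lines
        .filter((line) => line.trim().length > 0)
        .forEach((line) => writeToTimeSeriesDb(JSON.parse(line)))
    })
  },
)

// Subscription filter; the body shape is an assumption, adjust it to the real API.
req.write(JSON.stringify({ subscriptions: [{ streamSemantics: 'machine' }] }))
req.end()
```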