Performance and limits
This conceptual guide explains the limits of Actyx in real-world factory solutions with regard to latency, the maximum number of nodes, the required disk space of devices, and performance.
Latency
Event communication latencies are extremely hard to predict. Hundreds of things play into this, from the hardware of your edge devices to a forklift passing in front of a wireless access point. The following statements hold under typical conditions most of the time:
- event delivery latency is below 200 ms unless you are pushing performance limits (see below)
- in general, Actyx has lower latency volatility than most centralized systems
Keep in mind that Actyx does not offer deterministic real-time services; you should use a PLC for such cases.
As a developer you can always build apps that will bring any system to its knees. If you follow best practices, you should not face any issues. If you do, please get in touch with us and the Actyx community: we love optimizing!
Max. number of nodes
The number of devices you can reliably run in a single Actyx swarm depends on a large number of factors, including the local-area network setup, the devices themselves, and the apps running on them. With typical rugged tablets (CPUs from around 2013) or other devices with relatively low computing power (e.g. Raspberry Pi 3), 100 devices should not by themselves pose a problem. If you start pushing performance limits (see below), this number can be significantly lower, depending on the event data rate and the computational complexity of your business logic.
In some cases it may be possible to split an overall system into several smaller swarms that do not directly interact via event streams. You can use this approach to serve a larger factory while still staying within the 100-device limit.
Please get in touch with us and the Actyx community: we would love to hear about your use case and figure out how to make it work with Actyx.
Required disk space of devices
Actyx is a completely decentralized system that depends on the disk space of every single edge device. At runtime, different types of data are generated and stored throughout the Actyx swarm. Current versions of Actyx store every event on all devices of the same swarm; in the future we will add the possibility to configure both the replication and the lifespan of events, i.e. when events are deleted and on which devices they are replicated.
These are the currently implemented behaviors:
| Type | Size | Lifespan | Replication |
|---|---|---|---|
| Event | max. 4 KB per event | Events have an infinite lifespan within the swarm. | Events are replicated on all devices. |
| Log | max. 4 KB per log | Logs are cleared automatically when disk space is needed. | Logs are not replicated. |
The listed maximum sizes are recommendations: neither events nor logs should be used to carry huge pieces of data (although larger events may be justified in some circumstances, e.g. defining a production process with many steps, where such definitions happen infrequently).
Until we make event retention and replication configurable, the point at which you run out of disk space depends on the size, number, and compressibility of your events. This always depends on how you define events in your app(s), but here are two examples based on apps running at our customers:
| | With machine integrations | Without machine integrations |
|---|---|---|
| Number of nodes | 8 | 10 |
| Operating time | 1 year | 1 year |
| Number of events | 10,000,000 | 580,000 |
| Uncompressed size of events | 4,500 MB | 1,200 MB |
| Compression factor | 0.03 | 0.03 |
| Disk space needed | 135 MB | 36 MB |
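The arithmetic behind the last row is simply the uncompressed size multiplied by the compression factor. A minimal TypeScript sketch (the helper name is ours, for illustration):

```typescript
// Disk space needed ≈ uncompressed size of events × compression factor
// (values taken from the table above)
const diskSpaceNeededMB = (uncompressedMB: number, compressionFactor: number): number =>
  uncompressedMB * compressionFactor

console.log(diskSpaceNeededMB(4500, 0.03)) // 135 MB (with machine integrations)
console.log(diskSpaceNeededMB(1200, 0.03)) // 36 MB (without machine integrations)
```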
Currently, if you run out of disk space because of events, the only solutions are to clear the events from your swarm (effectively creating a new swarm and starting over) or to increase the disk space of your edge devices. We are already working on the functionality to configure the replication and lifespan of events.
Please get in touch with us and the Actyx community: we would love to hear about your use case and figure out how to make it work with Actyx.
Performance
The limits given above are formulated under the assumption that the processing of events by business logic does not use significant resources. While there are vast differences between languages, runtimes, CPU architectures, and your choices of business logic algorithms and data structures, this section gives some guidance based on our experience.
Latency implications of a large number of peers
With current Actyx versions, the internal event publication latency is proportional to the number of connected peers. The effect of this depends very much on connectivity: for example, 20 connected peers on a cloud network can see an internal latency of more than a second (e.g. the time until the call to `publish` resolves).
We are working on removing this bottleneck in a future release.
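If you want to know where your deployment stands, you can measure this latency directly. A minimal sketch using the TypeScript SDK (`@actyx/sdk`); the app manifest and tag are illustrative:

```typescript
import { Actyx, Tag } from '@actyx/sdk'

const main = async () => {
  const actyx = await Actyx.of({
    appId: 'com.example.latency-probe', // illustrative manifest
    displayName: 'Latency probe',
    version: '1.0.0',
  })

  const start = Date.now()
  // the internal publication latency is the time until `publish` resolves
  await actyx.publish(Tag<{ sentAt: number }>('latency-probe').apply({ sentAt: start }))
  console.log(`publish resolved after ${Date.now() - start} ms`)

  actyx.dispose()
}

main()
```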
Pathological time travel performance when reconnecting a device that was offline for a long time
When reconnecting a device that was disconnected for a long period of time, this device needs to ingest all new events from all other devices in the swarm to become up to date. Actyx optimizes event delivery by doing it in chunks; nevertheless, in pathological cases the amount of time travel is not linear in the number of such chunks but quadratic.
Event subscriptions via `subscribeMonotonic` and machine-runner may then experience a lot more time travel than they would if they were only woken up after Actyx is up to date.
As an example, consider 10 devices to catch up with, each contributing 10 chunks of events for your application. At best, all 100 chunks are inserted into the event history and then a single time travel is performed. At worst, every chunk causes a time travel, with each subsequent time travel being more costly than the previous one (due to the added events). The application would then process some events potentially a hundred times, more likely ten times (since chunks are ordered per device they come from).
To avoid this issue, `Actyx.waitForSync()` can be used to delay app startup until Actyx is up to date with the rest of the swarm.
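A minimal sketch of this startup pattern, assuming `waitForSync` is available on the SDK handle as referenced above (the manifest is illustrative):

```typescript
import { Actyx } from '@actyx/sdk'

const main = async () => {
  const actyx = await Actyx.of({
    appId: 'com.example.my-app',
    displayName: 'My App',
    version: '1.0.0',
  })

  // block until this node has caught up with its peers, so the
  // subscriptions started below do not time-travel once per chunk
  await actyx.waitForSync()

  // ... now start subscribeMonotonic / machine-runner subscriptions
}

main()
```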
Startup Time in Machine-runner
machine-runner works by subscribing to events with certain tags and applying all events that are found. At startup, many matching events may already be stored, which makes the machine-runner take time proportional to the number of events until the first state is available.
Our recommendation is to structure your machine-runner protocols such that each instance requires fewer than 1,000 events. This means that machine-runners are best used to model short-lived process workflows (which rarely have more than 1,000 steps), as sketched below. Machine-runners are not ideal for accumulating statistical data over long time periods; a better solution for that use case is to export the events into a time-series database.
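As an illustration, the following sketch scopes each runner to a single workflow instance so that it only replays that instance's events. It assumes you have defined a swarm protocol (`protocol`) and an initial state factory (`Initial`) with `@actyx/machine-runner` in your own module; the module path and order id are illustrative:

```typescript
import { Actyx } from '@actyx/sdk'
import { createMachineRunner } from '@actyx/machine-runner'
// assumed: your own module defining the swarm protocol
// (SwarmProtocol.make) and the machine's initial state factory
import { protocol, Initial } from './transport-order-protocol'

const main = async () => {
  const actyx = await Actyx.of({
    appId: 'com.example.orders',
    displayName: 'Orders',
    version: '1.0.0',
  })

  // one short-lived workflow per entity id: the runner only replays
  // the (few) events of this order, not the whole tag history
  const runner = createMachineRunner(
    actyx,
    protocol.tagWithEntityId('order-42'),
    Initial,
    undefined,
  )

  for await (const state of runner) {
    console.log('state changed:', state)
  }
}

main()
```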
Illustrative Examples
Machine Connectors
Machine connectors that only make machine state and counters available in the Actyx swarm can be built without running into the above-mentioned issues.
The best way to do so is to emit all state and counter updates as events, typically emitting a complete snapshot every time.
You can then use `Actyx.observeLatest` to keep track of the latest known machine state.
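A minimal sketch, assuming the `observeLatest` helper from `@actyx/sdk` as referenced above; the tag name and snapshot shape are illustrative:

```typescript
import { Actyx, Tag } from '@actyx/sdk'

// illustrative shape of the complete snapshot emitted by the connector
type MachineSnapshot = { status: 'running' | 'stopped'; partsCounter: number }

const main = async () => {
  const actyx = await Actyx.of({
    appId: 'com.example.machine-monitor',
    displayName: 'Machine monitor',
    version: '1.0.0',
  })

  const machineTag = Tag<MachineSnapshot>('machine:press-01')

  // called with the latest known machine state, and again on every update
  actyx.observeLatest(
    { query: machineTag },
    (snapshot) => console.log('latest machine state:', snapshot),
  )
}

main()
```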
Machine connectors often produce data at a higher rate than human operators. Therefore you need to plan the available disk space according to update frequency times compressed event size for at least one year. Alternatively, the ephemeral event streams configuration can be used to place various types of limits on disk usage to alleviate this issue.
Data Exporters
Exporting machine or sensor data into a time-series database for further analysis outside Actyx is best done using the HTTP API and writing the exporter in a language that is well suited to connecting to your database and handling the required amount of streaming data. We know from project experience that this can be done with native code (like Rust or C++) for demanding applications that write many millions of data points per day, e.g. into PostgreSQL with the TimescaleDB extension.
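As a starting point, here is a sketch of such an exporter in TypeScript (Node 18+) against the Events HTTP API on its default port 4454. The AQL query and manifest are illustrative, and the database write is left as a stub:

```typescript
const API = 'http://localhost:4454/api/v2'

const main = async () => {
  // exchange the (illustrative) app manifest for an auth token
  const auth = await fetch(`${API}/auth`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      appId: 'com.example.exporter',
      displayName: 'Exporter',
      version: '1.0.0',
    }),
  })
  const { token } = (await auth.json()) as { token: string }

  // query events with AQL; the response is a stream of JSON lines
  const res = await fetch(`${API}/events/query`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: "FROM 'sensor'", order: 'asc' }),
  })

  for await (const chunk of res.body!) {
    // parse the JSON lines here and write them to your time-series database
    process.stdout.write(chunk)
  }
}

main()
```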