Our mission is to make automation accessible. We do this in many ways that enable customers to achieve their goals and make it to Go-Live, the moment your automation is deployed into production. That said, there is a special group that must deliver automation under heavy loads. This document is for any rule author who must keep an eye on performance at all times.
AVG Execution Time
The most important metric for performance is execution time. Paying attention to this single metric early in a project, and again before go-live, heads off several classes of problems. Why is this important? Execution time drives CPU utilization. A rule application that performs 25% better after optimization will consume much less CPU time across a production environment for an equivalent load. In practice, we cannot say that CPU utilization will drop by the same percentage. What we can say is that it will make a difference in your project.
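To make the savings concrete, here is a minimal back-of-envelope calculation in Python; the request volume and timings are hypothetical assumptions, not measurements:

```python
# Hypothetical load and timings for illustration only.
requests_per_day = 1_000_000          # decisions executed per day
before_ms = 400.0                     # average execution time before optimization
after_ms = before_ms * 0.75           # the same decision, 25% faster

cpu_seconds_saved = requests_per_day * (before_ms - after_ms) / 1000
print(f"CPU-seconds saved per day: {cpu_seconds_saved:,.0f}")               # 100,000
print(f"That is about {cpu_seconds_saved / 3600:.1f} CPU-hours per day.")   # ~27.8
```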
Traffic Load
Why is traffic load a secondary problem? Traffic load is important; however, a well-performing rule application can simply absorb more load, which makes execution time the North Star metric from a performance perspective. Reducing load, like adding CPU cores, can help a poorly performing rule application, but not always. Traffic can also spike quickly, and sluggish rule applications simply do not scale out well because they make other parts of the architecture work harder.
Root Causes
Poorly performing rule applications usually execute time-intensive activities – in many cases unnecessarily.
These are the most common culprits we observe:
Repetitive Lookup
Often there is a need to find a value in a collection or a reference to a specific definition within a rule application. It is always better to place the result of a lookup into a data structure that is globally accessible to the rule application, so that every subsequent reference to that information avoids the looping that produced the lookup. For example, imagine a lookup value is three levels deep in a nested collection, and a rule author has one thousand historical medical claims that must be compared to that value. Finding it could cost 1000 loops in the parent collection + 1000 loops in the child collection + 300 loops in the lower child collection, for a possible total of 2300 loops. That is a lot of work for one lookup value. If this information is looked up each time it is needed (rather than stored globally in the rule application), the resulting work is 2300 multiplied by the size of the collection using the lookup value. It is not uncommon to see thousands of unnecessary repetitive loops. Let’s put this into perspective with our example scenario:
Without Optimization
2300 (loops to get the lookup value)
x 1000 (the size of the target collection to be processed with the lookup value)
= 2,300,000 loops to process the target collection
Optimized
2300 (loops to get the lookup value and store it once)
+ 1000 (the size of the target collection to be processed with the lookup value)
= 3300 loops to process the target collection
2.3 million loops were reduced to 3300. Storing lookup values in a globally accessible area of the rule application eliminates a significant amount of the work the rule application must do. The performance report is a great place to see which rules are called the most; this might be your first clue as to whether the frequency of execution is expected.
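The same pattern in miniature, as plain Python rather than InRule syntax (the data shapes and names are hypothetical): scan the nested collections once, store the result where everything can reach it, then reuse it.

```python
# Hypothetical nested reference data: parent -> child -> grandchild collections.
reference = [
    {"name": "claims_schedule", "children": [
        {"name": "outpatient", "grandchildren": [
            {"code": "A100", "rate": 0.8},
        ]},
    ]},
]

def find_rate(code):
    """Three sequential scans, mirroring the 1000 + 1000 + 300 loop count above."""
    parent = next(p for p in reference if p["name"] == "claims_schedule")
    child = next(c for c in parent["children"] if c["name"] == "outpatient")
    match = next(g for g in child["grandchildren"] if g["code"] == code)
    return match["rate"]

claims = [{"code": "A100", "amount": 125.0} for _ in range(1000)]

# Without optimization: the full scan runs once per claim (2300 x 1000 loops at scale).
slow_totals = [claim["amount"] * find_rate(claim["code"]) for claim in claims]

# Optimized: scan once, store the value globally, then process the claims (2300 + 1000).
RATE_A100 = find_rate("A100")
fast_totals = [claim["amount"] * RATE_A100 for claim in claims]
assert slow_totals == fast_totals
```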
Collections that Grow
From time to time, we see reference data in a rule application grow at runtime. Because synthetic test data is usually small and targeted, the performance risk of a growing collection is easy to miss during testing. It may be desirable for a collection to grow; we simply suggest you keep an eye on these collections and benchmark with real-world data to ensure no surprises occur at go-live.
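A minimal sketch of why growth hurts, under the hypothetical assumption that each processed record appends to a collection that a later rule scans: the work grows with the square of the input rather than linearly.

```python
def total_scan_work(records_processed: int) -> int:
    """Count iterations when each record appends to a collection a later rule scans."""
    work = 0
    collection_size = 0
    for _ in range(records_processed):
        collection_size += 1        # the collection grows at runtime
        work += collection_size     # a rule iterates the whole collection each pass
    return work

print(total_scan_work(100))     # 5,050 iterations
print(total_scan_work(1000))    # 500,500 iterations; 10x the data, ~100x the work
```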
REST and Database Endpoints
REST endpoints solve many problems – especially when they provide a data state that simplifies a rule application. Optimize data payloads to deliver exactly what the rule application requires; the rule application pays the transmission cost for any unused data. Where possible, gather required data from APIs prior to calling a rule application. This practice has the following benefits:
- Rule applications are easier to share since they have no dependencies.
- Removing dependencies from a rule application isolates problems in production environments, making it easier to understand what InRule is doing in comparison to other services that might have problems (random DNS errors, timeouts, etc.).
Many of the same points apply to database endpoints as well. However, there are times when a rule application simply does not know what data it needs until it arrives at a particular point in its execution. We urge every rule author to be judicious in these choices – especially for rule applications under extreme loads.
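A sketch of the pre-fetch pattern in Python; the URLs, payload fields, and `invoke_rule_application` helper are all hypothetical stand-ins, not the InRule SDK:

```python
import json
import urllib.request

def fetch_json(url: str) -> dict:
    # Network failures surface here, in application code, where they are easy
    # to attribute; rule execution itself never waits on these calls.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def invoke_rule_application(entity_state: dict) -> dict:
    """Stand-in for however you actually execute the rule application."""
    raise NotImplementedError

def score_application(app_id: str) -> dict:
    # Gather required data BEFORE invoking the rules...
    applicant = fetch_json(f"https://api.example.com/applicants/{app_id}")
    credit = fetch_json(f"https://api.example.com/credit/{app_id}")

    # ...then pass only the fields the decision actually uses as entity state.
    entity_state = {
        "income": applicant["income"],
        "creditScore": credit["score"],
    }
    return invoke_rule_application(entity_state)
```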
Single Pass Sequential
As of 5.8, the default rule set type is ‘Single-Pass Sequential’, which avoids unnecessary refiring of rule sets in response to state changes in the entity (for reference, the industry calls this refiring behavior forward chaining). While this might not improve performance on its own, it will remove complexity when it is time to analyze a trace. Recursive firing of member rule sets has its place; however, we suggest you opt into that choice when you know it simplifies your problem.
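The difference is easier to see in miniature. The sketch below is plain Python, not InRule semantics: a single pass fires each rule once in order, while a forward-chaining loop refires rules until state stops changing, producing longer and noisier traces.

```python
state = {"subtotal": 100.0, "discount": 0.0, "total": 0.0}

def apply_discount(s):
    new = 10.0 if s["subtotal"] > 50 else 0.0
    changed = new != s["discount"]
    s["discount"] = new
    return changed  # True if this rule modified state

def compute_total(s):
    new = s["subtotal"] - s["discount"]
    changed = new != s["total"]
    s["total"] = new
    return changed

rules = [apply_discount, compute_total]

# Single pass: each rule fires exactly once, in order. Predictable and easy to trace.
for rule in rules:
    rule(state)

# Forward chaining: refire every rule until a full pass changes nothing. Useful when
# rules truly depend on each other's outputs, but each state change can trigger
# another round of firings.
while any([rule(state) for rule in rules]):  # list comp so every rule fires each pass
    pass
```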
Unused Entity Fields
Large entities, where only a few fields are used, will impact execution performance. For this reason, it is best not to import large cross-sections of a standard like MISMO, or simply import the totality of an industry schema used in-house. The temptation is acceleration; however, it is worth the effort to trim or even flatten data structures for a decision. The other common argument is “future-proofing”, so that breaking changes to a schema are not incurred. This is a real problem for teams that do not spend enough time on their entity structure. Our best practice is a balance—flatten where you can, and incur complexity where it is required. Above all, do not hurt performance for some future problem that has not happened yet (and may never happen).
Fields not included in the entity schema will not be returned to an application. This might affect code that depends on chaining the state of an object between API calls.
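A small sketch of trimming in Python (the field names are hypothetical): project a large industry record down to only what the decision consumes, remembering that anything excluded will not appear on the output entity.

```python
# Imagine a MISMO-style record with hundreds of fields; only three are shown.
loan_record = {
    "loanAmount": 250_000,
    "borrowerCreditScore": 712,
    "propertyState": "IL",
    # ... hundreds of additional fields the decision never reads
}

# Keep only what the rules consume, plus any field a caller must read back
# from the response (excluded fields are not returned to the application).
REQUIRED_FIELDS = {"loanAmount", "borrowerCreditScore", "propertyState"}
entity_state = {k: v for k, v in loan_record.items() if k in REQUIRED_FIELDS}
```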
Performance Statistics Checklist
Below is a quick checklist of things to review if you are experiencing performance degradation in your rule applications, while keeping your eye on the North Star metric of execution time:
- Function Compile Time: If Function Compile time is high, check for overuse of the Eval function.
- Long Durations: Start optimizations by focusing on the rules that take the most time to execute.
- Multiple Executions: Check for rules that are fired more times than you expect.
- Data and Method Operations: Check that Data and Method operations (towards the bottom of the report) are not consuming an unusually high percentage of the execution time; if they are, consider refactoring the architecture to remove some of the need for those interactions at rule execution time (i.e., pre-querying and passing the required data in as part of the entity state).
- Execution Trace Patterns: Check for patterns in the Execution Trace that are repeated many times (high frequency).
These quick checks above are your go-to practices for performance. That said, the following additional checks might prove useful:
- UDFs: Ensure that "State Refresh" is disabled for all UDFs (unless you are in the very edge case of an irSDK execution where the UDF is calling a bound assembly method that modifies entity state information).
- UDF Complexity: For complex UDFs, have a developer review the logic to ensure that it follows software development best practices. Advanced UDF tuning (leveraging things like ContextProperties) may benefit from an InRule services engagement.
- Entity Creation: Creating large numbers of entities (that will all reside in the rules engine dependency graph) may have a performance impact. Reuse Entities by using "by reference" where possible.
- Field Inclusion: Including large numbers of unnecessary fields may have a performance impact; remove unnecessary fields where possible. Note that removed fields will not be returned as part of the output entity structure, so some fields may need to be kept for that purpose alone.
- Multi-Pass Sequential RuleSets: RuleSets that are not single-pass or explicit may fire multiple times without the rule author intending it, even when there is no need for them to refire.
- Calculations: Calculations behave like sequential Rule Sets that re-fire when needed. Complex expressions in calculations may have a performance impact if they refire unnecessarily.
- Collections: Collection interactions (like collection lookups) are performed by iterating through each item in the collection; they do not use indexes. Frequent iteration over collections, or iteration over large collections without filters, may have a performance impact (see the indexing sketch after this list).
- External Requests: Interactions with external resources like databases and API endpoints are performed synchronously and will pause execution for the combined duration of network latency and processing time on the external endpoint. This can directly impact performance.
- External Request Payload: Where possible, reduce the volume of data returned by an external request. This reduces data on the wire, and a larger response generally takes more time to process.
- External REST Endpoint Result Mapping: The results of a REST request are frequently mapped into the entity structure using a ‘Map Data’ action, but the payload often includes more fields than the Rule Application needs. If the external API cannot be simplified to return less data, the rule author may choose to include only the required fields in the mapping target schema. The unused JSON data returned by the request will be ignored, and fewer Entities and Fields will be added to the engine’s dependency graph (see Entity Creation and Field Inclusion).
- Cold Start: Rule Applications that are frequently updated in production or that have long compilation times will experience longer executions during the cold start compilation. The engine will block requests until compilation is complete. Try enabling Background Compile to relieve problems due to cold start compilation.
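For the Collections point above, one mitigation when you control the surrounding code is to build a keyed index once instead of scanning per lookup. A plain-Python sketch with hypothetical data:

```python
rates = [{"code": f"C{i}", "rate": i * 0.01} for i in range(10_000)]
claims = [{"code": "C42"} for _ in range(5_000)]

# Linear scans: every lookup walks the collection (up to 10,000 x 5,000 iterations).
slow = [next(r["rate"] for r in rates if r["code"] == c["code"]) for c in claims]

# Indexed: one pass builds a dictionary, then each lookup is constant time.
rate_by_code = {r["code"]: r["rate"] for r in rates}
fast = [rate_by_code[c["code"]] for c in claims]

assert slow == fast
```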
Summary
It is natural for rule applications to grow in size and complexity—they do real automation in some of the most challenging scenarios. The first line of defense is testing your rule application with irVerify to understand how well it performs using best- and worst-case entity instances. Note the time it takes (without a cold start) to run the entry-point rule sets you care about. It would be convenient if we could simply state a practice like “Keep an eye on your rule application if it crosses the 1000 millisecond line.” It is not that easy. Any time a rule application grows by a significant amount, it is important to measure execution time. For example, if execution time grows from 250 milliseconds to 500 milliseconds, that is a 100% increase and will consume correspondingly more CPU. Look for significant jumps in execution time. In agile development, the usual advice is not to optimize early, because a working system is more valuable than no system at all. Above all, we believe the best practice for automation is to make good choices along the way and continually test performance to understand the impact, avoiding panic right before or during Go-Live.
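A minimal benchmarking sketch in Python; `run_decision` is a hypothetical stand-in for however you invoke your entry-point rule set:

```python
import statistics
import time

def run_decision(payload: dict) -> dict:
    """Stand-in for invoking the entry-point rule set under test."""
    raise NotImplementedError

def median_execution_ms(payload: dict, warmup: int = 3, runs: int = 20) -> float:
    for _ in range(warmup):                 # absorb cold-start compilation
        run_decision(payload)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_decision(payload)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)       # compare this number release over release

# A jump from ~250 ms to ~500 ms is a 100% increase: time to investigate.
```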