In high-volume applications that make numerous calls to the rule engine, using a multi-threaded approach can help achieve better overall performance for batch processing. Take the following scenario as an example:
- A data transformation application processes approximately 100,000 records with InRule every night as a batch
- The application will run on a machine that has more than two processing cores
The InRule rule engine and catalog are designed for free-threaded execution. In the free-threaded model, memory caches are reused between rule engine instances running in different threads, which can lower the overall memory footprint and execution times for an application.
When InRule is run on multiple threads, the owner process can execute rule engine requests concurrently. Each processor core shares the load in processing rules, which can produce an overall execution time that is significantly lower than if the entire batch was executed serially in a single thread on a single core.
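As an illustration of this pattern (not InRule-specific), a nightly batch like the one described above can be partitioned across one worker thread per core. The sketch below is in Python using the standard-library thread pool; `process_record` is a hypothetical stand-in for a per-record rule engine call:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_record(record):
    # Hypothetical stand-in for submitting one record to the rule engine.
    return record * 2

def run_batch(records):
    # One worker thread per logical processor, matching the
    # one-thread-per-core guideline discussed in this document.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order across worker threads.
        return list(pool.map(process_record, records))

if __name__ == "__main__":
    results = run_batch(range(100_000))
    print(len(results))  # prints 100000
```

The same structure applies in any free-threaded runtime: a fixed-size pool sized to the core count, with the batch divided among the workers rather than one thread per record.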
With any multi-threaded application, there is an inherent performance overhead incurred for each new thread that is introduced into the process. With a .NET application, this overhead can be exacerbated by the need to periodically run garbage collection across a heap of memory that is shared between threads. During garbage collection, the .NET runtime blocks executing threads so that the memory heap can be examined and cleaned in a thread-safe manner.
Since InRule is built from the ground up in .NET, it experiences the same scalability concerns inherent in any .NET application that must work with a large number of .NET objects. In many cases, adding more concurrent execution threads beyond one thread per core to these types of applications will not improve overall performance.
Given that each rule application is unique with respect to both the complexity and depth of its schemas and rulesets, it is difficult to make a blanket statement about the best threading model to use for all applications that are running the rule engine. Applications with a large number of rules may perform better when more threads are added, while large state models may quickly reach a point of diminishing returns as new threads are added.
Although optimal threading models vary greatly between applications, the following guidelines are presented as general best-practices:
- Use a maximum of one rule execution thread per processor core -- InRule is processor intensive, so adding more threads beyond one per core generally does not help performance. In addition, a large number of threads may increase contention for memory caches and garbage collection, thereby degrading performance.
- Run services under IIS instead of as a Windows Service -- IIS has optimized request handling for large numbers of requests across threads.
- Set the Garbage Collection mode for the .NET runtime to "gcServer" -- By default, the .NET runtime uses "gcWorkstation", which optimizes garbage collection for UI applications and UI background threading. The "gcServer" mode is recommended for machines that have more than two logical processors and will be running free-threaded applications. By default, an IIS installation should already have its processes set to use gcServer, but this setting may need to be manually adjusted for other .NET applications. For more information about adjusting this setting in a .NET configuration file, see .NET Framework Runtime Config Settings With InRule. More information about garbage collection modes is available from Microsoft online at https://msdn.microsoft.com/en-us/library/ms229357.aspx.
- Run performance load tests before deploying to production -- To ensure that production load requirements are met, run soak tests on similar hardware to better anticipate real production throughput. Adjust threading models to optimize for the highest throughput.
- Scaling across cores does not correlate to a linear improvement in performance -- The application overhead of running on more than one processor results in some loss of throughput as each new thread is included. For example, running an application on two threads may result in an overall improvement of 1.9 times instead of a linear factor of 2.0. This overhead increases as more cores are included, so that an application running on four cores may only show a factor of 3.5 times improvement over one thread on one core. Since each rule application can include a vastly different combination of rules and schema objects, these scalability factors can vary between implementations.
- Scale "out" is preferred over scale "up" -- To make the most efficient use of hardware, InRule suggests no more than eight physical cores per server. If rule processing demand exceeds the capacity of four to eight cores, then the option of configuring multiple four or eight core servers in a farm should be explored before using a server with more than eight cores. A four-core server is considered the ideal sizing for most applications.
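For reference, the gcServer mode recommended above is enabled in a .NET Framework application configuration file (e.g., app.config or web.config) with the following runtime element; consult .NET Framework Runtime Config Settings With InRule for the settings applicable to your deployment:

```xml
<configuration>
  <runtime>
    <!-- Use the server garbage collector on multi-core machines -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```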
Configuration of IIS for Multi-Threading:
Given the popularity and robust nature of IIS on Windows Servers, many InRule implementations are hosted as services or applications running in IIS Application Pools. Each IIS AppPool contains AppDomains with .NET memory heaps. If too many threads are running against the same shared memory heap, then the application may experience a drop in throughput due to contention. When running under IIS, the following guidelines may help optimize throughput:
- Scale out physical servers over eight cores -- If not using virtual machines, then load should be spread across physical servers that have a maximum of eight cores.
- Use multiple VMs with a smaller number of cores assigned to each VM -- If a hypervisor is available, run multiple VMs with two, four, or eight cores assigned to the VM. Spreading the load across multiple VMs generally results in less performance loss due to thread contention.
- If the number of servers is limited, enable "web gardens" in IIS -- Each Application Pool in IIS can be configured to run as one or more processes (the default is one). By setting the Maximum Worker Processes to greater than one, IIS will spread the request load across more than one process, which in turn spreads the load over more than one shared memory heap. This may help reduce thread contention and improve throughput. Note that when more than one worker process is used, there is a significant increase in fixed memory cost, since each process must maintain a separate set of shared InRule caches.
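As an illustration, the Maximum Worker Processes setting described above can be changed in IIS Manager (Application Pool > Advanced Settings) or directly in the applicationHost.config file; the excerpt below shows the relevant element, where the pool name is hypothetical:

```xml
<!-- applicationHost.config excerpt; "InRulePool" is a placeholder pool name -->
<applicationPools>
  <add name="InRulePool">
    <!-- maxProcesses > 1 enables a web garden for this pool -->
    <processModel maxProcesses="2" />
  </add>
</applicationPools>
```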