Collecting the Garbage for High-Availability Real-Time Java Systems
Kelvin Nilsen
Sep 26, 2001 (10:15 AM)
URL: http://www.commsdesign.com/showArticle.jhtml?articleID=16503463
Java has much to offer the high-availability communication industry at various service levels within the implementation of a typical communication system hierarchy. In such systems, the hard real-time device driver software at the lowest levels of the system hierarchy is often quite simple in design, whereas the highest levels of the hierarchy typically have considerable complexity and almost no real-time constraints. This article focuses on the intermediate levels of software in a typical high-availability communication system hierarchy. This software can be well suited to real-time Java technologies, if it has medium complexity and some real-time constraints.
One of the most important aspects of the Java programming environment that make it especially appropriate for development of complex high-availability system software is automatic garbage collection, which prevents dangling-pointer errors and reduces the likelihood of debilitating memory-leak programmer errors.
A software system is qualified for real-time garbage collection technologies if it requires soft or hard real-time responsiveness ranging from 1 to 1,000 ms. Application components with looser response-time constraints can usually be effectively served by modern interactive garbage collection techniques that do not have real-time capabilities.
Several aspects of a Java garbage collection system qualify it for support of reliable real-time software systems. Among these are:
To develop real-time communication software systems with Java technologies as the foundation, it is important to select a Java virtual machine (JVM) that offers support for real-time garbage collection as characterized by these five points. Garbage collection pacing makes it possible for a virtual machine to ensure reasonable upper bounds on the time that will be required by application software threads to allocate new objects.
Garbage collection abstractions
The idea behind automatic garbage collection is to reduce the burden of detail that application developers need to manage when developing software components. In an ideal world, the developer could pretend there is an infinite amount of memory and would never have to worry about the facts that memory is limited, CPU time is required to perform garbage collection, and fragmentation may hinder reliability even if the total amount of free memory in the memory allocation pool is far larger than the size of a particular allocation request.
Engineers developing a highly reliable real-time system using Java technologies need to pay special attention to certain issues related to the pragmatics of garbage collection:
Figure 1 shows a sample memory allocation log. This particular application allocates an average of 1 MB each second. figure 2 shows the cumulative effects of this memory allocation behavior in the absence of garbage collection. Clearly, pragmatic considerations require that this high rate of memory allocation be accompanied by frequent garbage collection activities.
Balancing the garbage collection workload
To configure a JVM for reliable support of a real-time workload, a designer must be able to configure the workload in terms of the following parameters. First, the designer needs to determine an upper bound on how much memory the workload maintains as active at any instant in time. Second, the maximum rate at which memory is allocated (and discarded) by the real-time workload must be determined. Given that an upper bound on the total live memory retained by the application is determined, the designer assumes that allocation of new memory is balanced by discarding previously allocated objects once steady-state conditions have been reached.
Third, a designer must know what percentage of CPU time the application workload is willing to relinquish to garbage collection. Note that the application workload needs to count on a certain amount of CPU time for reliable execution of its real-time threads. By relinquishing more CPU time to garbage collection, the application workload enables reliable execution of garbage collection with a smaller working allocation buffer.
In theory, the developer or system integrator can specify the values of these parameters. But it is also possible for the virtual machine itself to experimentally determine approximate values for the first two parameters, assuming that the system integrator specifies an upper bound on the percentage of CPU time available for executing the garbage collection thread. Let's consider how that could take place.
Principles of pacing garbage collection
Assume that the total amount of allocatable memory in a particular virtual machine configuration is M bytes. Further suppose that an upper bound on the CPU time required to perform the complete incremental garbage collection is S seconds.
Note that we assume it is possible to determine a constant upper bound on the CPU time required to perform garbage collection, independent of how many times the algorithm is preempted. Most incremental garbage collection approaches violate this assumption. However, some real-time garbage collection algorithms do honor this requirement.
Let R represent the upper bound on the percentage of CPU time that the virtual machine is allowed to run garbage collection. Note that the real time required to perform a complete incremental garbage collection is represented by S/R.
Suppose further that U represents the maximum live-memory usage of the real-time workload, measured in bytes, and that V represents the real-time workload's allocation throughput, measured in bytes allocated per second. When we speak of steady-state conditions, we refer to a condition in which the application workload has progressed to the point of having allocated and initialized all long-lived data structures, after which point every time the application allocates a new object, it releases a previously allocated object of similar size.
Consider the state of memory immediately following completion of garbage collection. In the worst steady-state case, the heap holds a total of U bytes of live memory plus V(S/R) bytes of dead memory to balance the allocations that occurred while the most recent garbage collection pass was active. If we start the next garbage collection pass immediately after this first pass completes, we need to allocate another V(S/R) bytes of memory while garbage collection is carried out. Thus, the total size of the allocation pool required to support this workload, M, must be greater than or equal to U + 2V(S/R).
If the total size of the allocation pool is greater than U + 2V(S/R), it is possible to idle garbage collection until the amount of available memory remaining in the pool is less than or equal to V(S/R). The key benefit of idling garbage collection is that this makes more CPU time available for performing the work required by the application threads.
These relationships are illustrated in figure 3. Note that the amount of memory available for allocation immediately following completion of a garbage collection pass must be at least V(S/R) to guarantee that enough of a memory buffer to satisfy the memory allocation requests that arrive while performing garbage collection. Since for this sample workload the memory available immediately following garbage collection is significantly greater than V(S/R), figure 3 shows that garbage collection is only active approximately 50% of the time.
In the above analysis, we are making two conservative assumptions. First, we assume that objects that are live at the beginning of garbage collection but become garbage at some point during garbage collection will not be reclaimed by the current garbage effort. Second, we assume that none of the memory reclaimed by garbage collection is available to serve future allocation requests until the entire garbage collection effort has completed.
Experimental determination of the pacing parameters
As a real-time workload runs, it is straightforward for the dynamic memory-management subsystem to monitor its behavior to determine pacing parameters. Assume that the system integrator specifies the initial value of R. V is measured by tallying the amount of memory allocated per unit of time and adjusting V upward whenever the most recent measurement exceeds the previously approximated value of V. S is measured by monitoring how much CPU time is required by the garbage collection thread to perform a complete garbage collection and adjusting S upward whenever the most recent garbage collection effort exceeds the previously computed estimate for S.
At the end of each garbage collection, approximate the value of U by subtracting 2V(S/R) from the amount of memory that is currently in use by the application workload. (The phrase "in use" refers to the memory that has been allocated to the application and has not yet been identified as garbage.) As with the approximations computed for other parameters, if the most recently estimated value for U is greater than the previous estimate, the designer needs to adjust the estimate to the newly approximated value.
Most real-world workloads are somewhat bursty in nature, as shown in figure 1, so the whole concept of real-time workloads operating in steady state is a slight oversimplification. Nevertheless, this notion serves as the basis for analysis and modeling.
Over the history of a long-running application, continual monitoring of the workload enables us to determine its worst-case behavior. The garbage collection system can then configure itself to reliably handle the worst observed circumstances. As long as the system continually refines its notion of worst-observed circumstances, this self-adjusting approach to pacing of garbage collection works well for soft real-time system workloads.
However, for hard real-time workloads, developers need to analyze the worst-case memory-allocation behavior and provide appropriate parameters rather than trusting the dynamic memory allocation subsystem to determine these parameters from observation of the system's behavior.
Special considerations: generational garbage collection
Empirical analysis of many object-oriented programs reveals several common characteristics. For example, most allocated objects have a short lifetime, and it is relatively rare for old objects to contain references to young objects. Also, for the rare object that survives beyond a particular threshold length of time, the expected remaining life expectancy is much longer than the expected remaining life expectancy of a newly allocated object.
Generational garbage collection is a garbage collection technique that divides memory into multiple regions, or generations, each representing objects of progressively older age. Garbage collection focuses its efforts on the youngest generations, with the intent that for the same total garbage collection effort, the percentage of reclaimed memory is generally much higher for the young-generation regions.
The performance benefits of generational garbage collection can only be realized if the garbage collector reclaims dead memory from young generations without examining all of the memory contained in the older generations. To do this, the generational garbage collection system maintains a log of pointers that reference younger-generation objects from older-generation objects.
When searching for garbage in one of the younger-generation regions, the collector examines the log of cross-generation pointers for the older-generation regions. It does this without reclaiming any of the dead memory in the older generations. This means that a dead object in the older generation that happens to contain a pointer to a younger-generation object will cause that younger-generation object to be viewed as alive.
A couple of cautions
In terms of reliability and predictability metrics, generational garbage collection is not especially good. Think of it as a heuristic that usually delivers improved performance, but in very rare circumstances results in undesirable performance anomalies. Here are a couple examples of what might go wrong with a generational garbage collector.
If a large majority of the young objects that become garbage are referenced by older objects that have already been promoted out of the younger-generation memory regions, then reclaiming the memory for these objects takes much more effort than a traditional garbage collection system would require. This is because the generational collector must treat the dead object as if it were alive until it performs a collection of the older-generation memory region that contains references to this dead memory.
Also consider that although the time required to perform a typical generational garbage collection is short (because the collector only examines the objects contained within the youngest generation's memory region), occasional garbage collections have to traverse all of memory to find the requested dead objects. This takes even longer with generational than traditional garbage collection because the generational collector must first fail in its attempts to collect sufficient garbage from all younger-generation regions before it will even attempt to collect garbage in an older generation.
Because generational garbage collection is only a heuristic, using generational garbage collection in a highly reliable real-time system is especially challenging. One approach to the problem would be to recognize the performance benefits of generational garbage collection and configure the virtual machine to use generational garbage collection as much as possible. All real-time pacing calculations are based on the assumption that only full (non-generational) garbage collection can guarantee to replenish the memory allocation pool under the required timing constraints.
As long as there is enough slack in the scheduling of the garbage collection thread to perform a generational collection followed by a full non-generational collection prior to exhausting the free pool, this approach optimistically schedules execution of the generational collection. In most cases, the generational collection will succeed in reclaiming sufficient dead memory to obviate the need for performing the full non-generational collection.
On completion of each generational garbage collection, it is necessary to query once again whether there is sufficient scheduling slack to complete another generational collection followed by a full non-generational collection prior to exhausting the free pool. If it is found that, following completion of the generational collection, the free pool is at risk of depletion, perform a full non-generational collection. Likewise, if there is not enough remaining scheduling slack to perform both generational collection and non-generational collection before exhausting the free pool, conservatively choose to perform full non-generational collection rather than performing another generational collection.
The potential of automatic garbage collection
Automatic garbage collection greatly simplifies software development and expands the breadth of possibilities for software engineers who are designing new features, frameworks, and architectures. With appropriate attention to detail, even developers of systems that have high-availability and/or real-time requirements can enjoy the full benefits of automatic garbage collection.
Kelvin Nilsen is the founder, vice president, and chief technical officer of NewMonics, Inc., He earned a BS in physics from Brigham Young University in 1981 and MS and Ph.D. degrees in computer science from the University of Arizona. He also presides as Technical Chair for the J-Consortium, an open forum dedicated to the advancement of Java technologies for real-time and embedded applications. He can be contacted at kelvin@newmonics.com.