OpenJDK follows a strict six-month release cadence. Most of these releases are feature (short-term) releases, which receive updates only for the six months until the next release. Every two years (every three years before Java 17) there is a Long-Term Support (LTS) release. LTS releases are the preferred choice for production deployments because they are updated and patched for several years.

Each new version comes with its own set of changes. Some are completely new, while others are features refined gradually over several consecutive releases. There are language changes such as switch expressions, text blocks and records, to name a few. Other changes affect the inner workings of the JVM itself, such as the introduction or improvement of GC algorithms.

Each time a new release comes out, you may ask: should we migrate? Rewriting existing code just to use a new language feature immediately raises concerns. Such an operation has a high potential to introduce new errors, and the costs might outweigh the benefits. You could end up throwing money into the fire.

Background story

In one of our test environments, we set up a cluster of application servers that were initially certified to run on Java 11. At first, the hardware setup was more than sufficient and efficiently handled the expected workload.

Over time, the system landscape evolved. Additional features were added, more users started relying on the platform, and resource demand naturally increased. While the architecture itself was stable, the higher load began to highlight areas that needed optimization.

The environment that once performed flawlessly now required careful tuning to keep up with the new conditions. This triggered an evaluation of whether the underlying platform – in particular, the JVM version – could be adjusted to better support the changing workload.

Approach

At first glance, the hardware resources appeared to be sufficient to accommodate the required traffic. Yet, from time to time, we observed issues with the garbage collection process exceeding acceptable limits. Rewriting the codebase was not an option, as it depended heavily on a third-party framework. The only viable approach was to fine-tune the JVM.

Initially, the changes focused on memory allocation. Adjusting parameters such as -Xmx and -Xms (maximum and initial heap size) was enough to pass stress tests and handle peak traffic. Eventually, however, this was no longer sufficient. We decided not only to move beyond Java 11 but also to test newer GC algorithms. This led us through several iterations before we settled on G1 GC with newer OpenJDK releases.
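When iterating over heap sizes and collectors like this, it helps to confirm what the running JVM actually picked up. The following is a minimal sketch using the standard java.lang.management API; the class name JvmSettingsCheck and the 4 GB heap in the launch comment are illustrative assumptions, not our production values.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class JvmSettingsCheck {

    /** Names of the collectors registered by the running JVM, e.g. "G1 Young Generation". */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Maximum heap the JVM will attempt to use, in MiB (driven by -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Example launch (heap size is an illustrative assumption):
        //   java -Xms4g -Xmx4g -XX:+UseG1GC JvmSettingsCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments:  "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("Collectors:     " + collectorNames());
    }
}
```

Running the same check on each node after a configuration change is a cheap way to catch a server that was started with stale options.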

Running on version 17 provided a period of stability. More recently, however, we took another step forward – migrating from OpenJDK 17 to 21. After promising results in the lab, we gradually rolled out the upgrade to production and monitored the outcomes.

Results

We’ve read blog posts promising reduced memory fragmentation and consumption, various improvements to existing algorithms, and the removal of obsolete ones. But how does all that translate into changes in behaviour in production? Let’s find out!

Traffic naturally varies with the day of the week and with holidays. To compare apples to apples, we selected two time windows that represent fairly uneventful weeks (no public holidays etc.), one before and one after the upgrade to OpenJDK 21.

We will start with throughput – the percentage of time the JVM spends executing application code rather than performing garbage collection.
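To make the definition concrete, here is a minimal sketch of how such a figure could be approximated from inside a JVM. It is not the tooling behind our charts. The class and method names are hypothetical, and the collection time reported by GarbageCollectorMXBean is only an approximation of stop-the-world pause time, since some collectors also report concurrent work there.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcThroughput {

    /** Throughput in percent: the share of the window NOT spent in GC. */
    static double throughputPercent(long gcMillis, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return 100.0 * (windowMillis - gcMillis) / windowMillis;
    }

    /** Accumulated collection time over all collectors since JVM start, in ms. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 when the value is undefined
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Guard against a zero uptime reading right after start-up.
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("GC throughput since start: %.3f%%%n",
                throughputPercent(totalGcMillis(), uptime));
    }
}
```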

Figure 2. Throughput

As you can see above, throughput rose by around 0.2 percentage points after the change, and the curve now looks like what one could call a less noisy sinusoid. This means the application spends a bit more time processing traffic and a bit less in GC. Where does this change come from? Does it mean there is a significant reduction in the number of GC events? Yes! In low-traffic hours there is a 20-25% decrease in GC events, and in the busy hour a 5-12% decrease. Moreover, the type of events has changed: mixed collections are much more frequent than before.

The next charts show the exact time the JVM spends in GC while the application is stopped. There is a 25-second drop in total pause duration. With each data point covering 3 hours, this accounts for the 0.2% throughput gain mentioned earlier. While 0.2% may still seem a relatively small number, note that it corresponds to a reduction of more than 25% in the overall time spent in GC in the busy hour (115 s ↘ 85 s), which is a significant change.
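The arithmetic behind these two figures can be checked with a few lines of code. The sketch below uses only the numbers quoted above (25 s, 3 h, 115 s, 85 s); the class and method names are hypothetical.

```java
import java.util.Locale;

public class PauseArithmetic {

    /** What share of a window (in percent) a given number of seconds represents. */
    static double shareOfWindowPercent(double seconds, double windowSeconds) {
        return 100.0 * seconds / windowSeconds;
    }

    /** Relative reduction in percent when going from 'before' to 'after'. */
    static double relativeReductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        double windowSeconds = 3 * 3600; // one data point covers 3 hours

        // 25 s less pause time per 3 h window, expressed as a throughput gain
        System.out.println(String.format(Locale.ROOT, "Throughput gain: %.2f %%",
                shareOfWindowPercent(25, windowSeconds)));   // prints "Throughput gain: 0.23 %"

        // Busy hour: total pause time drops from 115 s to 85 s
        System.out.println(String.format(Locale.ROOT, "GC time reduction: %.1f %%",
                relativeReductionPercent(115, 85)));         // prints "GC time reduction: 26.1 %"
    }
}
```

A small absolute gain in throughput and a large relative drop in GC time describe the same change. The first is measured against the whole window, the second against the GC time alone.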

We now know that there are fewer GC events and that together they take less time, but what about each individual GC pause? Has its duration changed? Here, too, there is a significant improvement: both the average and the maximum pause duration dropped. This means the application suffers much less from GC pauses, which has a positive impact on KPIs such as transactions per second and response time.
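For readers who want to extract similar per-pause metrics themselves, the sketch below summarizes G1 young-collection pauses from a GC log written with -Xlog:gc. It is not the tool used for the charts in this article. The class name is hypothetical, the regular expression assumes the default unified-logging line layout, and the sample lines in main are made-up illustrations, not data from our environment.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class G1PauseStats {

    // Assumed line layout (default -Xlog:gc output), for example:
    // [12.345s][info][gc] GC(7) Pause Young (Mixed) (G1 Evacuation Pause) 512M->200M(1024M) 8.500ms
    private static final Pattern PAUSE = Pattern.compile(
            "Pause Young \\((?<kind>[^)]+)\\).*\\s(?<ms>\\d+\\.\\d+)ms\\s*$");

    static final class Stats {
        final Map<String, Integer> countByKind = new TreeMap<>();
        double totalMs;
        double maxMs;
        int pauses;

        double averageMs() {
            return pauses == 0 ? 0.0 : totalMs / pauses;
        }
    }

    /** Counts young pauses by kind (Normal, Mixed, ...) and tracks total and maximum duration. */
    static Stats summarize(List<String> lines) {
        Stats s = new Stats();
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (!m.find()) {
                continue; // not a young-collection pause line
            }
            double ms = Double.parseDouble(m.group("ms"));
            s.countByKind.merge(m.group("kind"), 1, Integer::sum);
            s.totalMs += ms;
            s.maxMs = Math.max(s.maxMs, ms);
            s.pauses++;
        }
        return s;
    }

    public static void main(String[] args) {
        // Illustrative sample lines only; real input would be read from the GC log file.
        List<String> sample = List.of(
                "[1.000s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 100M->40M(512M) 10.000ms",
                "[2.000s][info][gc] GC(1) Pause Young (Mixed) (G1 Evacuation Pause) 120M->30M(512M) 20.000ms",
                "[3.000s][info][gc] GC(2) Pause Young (Normal) (G1 Evacuation Pause) 110M->45M(512M) 30.000ms");
        Stats s = summarize(sample);
        System.out.println("Pauses by kind: " + s.countByKind);
        System.out.println("Average (ms):   " + s.averageMs());
        System.out.println("Maximum (ms):   " + s.maxMs);
    }
}
```

Comparing the output for a log taken before and after an upgrade gives the event counts, the share of mixed collections, and the average and maximum pause in one pass.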

Conclusions

In our migration from OpenJDK 17 to 21, we observed clear improvements in G1 GC behaviour. The newer JVM ran mixed collections more frequently, alongside other improvements. All this helped maintain a healthier heap and resulted in noticeably shorter GC pauses. These changes had a positive impact on key application KPIs, improving both performance stability and responsiveness. At the same time, our tests confirmed that it is essential to carefully evaluate JVM behaviour in a staging environment before a production migration, as the impact of GC tuning can vary between workloads.