Profiling Memory/Thread Leaks in Your Java Application

October 28, 2023

On a recent project, I integrated a library maintained by another team that establishes a connection between the server and the client over a customized protocol. However, shortly after the implementation was completed and deployed, the system developed a significant memory leak.

Though I have experience with performance and memory-leak profiling, the world of Java is entirely new to me. So here are some quick insights into Java application profiling.

Is There a Leak?

I firmly believe it's good practice to monitor CPU and memory usage after every deployment. With proper observability in place, you can usually catch a memory leak in a timely manner.

Apart from watching overall memory usage, you can use the IntelliJ Profiler or the JDK command-line tools for more detailed profiling. Take the IntelliJ Profiler as an example: if you follow the official tutorial, CPU and memory live charts | IntelliJ IDEA Documentation, you'll come across the Heap Memory chart, which can indicate whether there's a leak in the heap.

IntelliJ Profiler - Heap Memory

If you observe a saw-tooth pattern in the heap memory chart, as above, the Garbage Collector (GC) is reclaiming memory as expected and there is no heap leak. Conversely, a consistently increasing trend suggests a heap memory leak.
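To make the two shapes concrete, here is a toy sketch (class and method names are mine, not from the project) that you can run while watching the chart: the first loop only creates short-lived garbage and draws the saw-tooth, while the second keeps every allocation reachable in a static list and draws the steady climb.

import java.util.ArrayList;
import java.util.List;

// Toy allocation patterns for comparing heap-chart shapes (hypothetical demo class).
public class HeapPatternDemo {

    // Short-lived allocations: each chunk becomes garbage immediately,
    // so the heap chart shows the saw-tooth of regular GC cycles.
    static void sawTooth() throws InterruptedException {
        while (true) {
            byte[] chunk = new byte[1024 * 1024]; // 1 MB, unreachable after this iteration
            Thread.sleep(10);
        }
    }

    // Leaky allocations: every chunk stays reachable through the static list,
    // so the chart climbs until java.lang.OutOfMemoryError: Java heap space.
    static final List<byte[]> retained = new ArrayList<>();

    static void steadyClimb() throws InterruptedException {
        while (true) {
            retained.add(new byte[1024 * 1024]);
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        sawTooth(); // switch to steadyClimb() to see the leaking pattern
    }
}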

Get Closer: Check the Log

Just like any other kind of problem, we identify the symptom and gradually narrow down the potential culprits. Java is good at surfacing errors and provides a call stack for debugging. There are several types of OutOfMemoryError (OOM). The most common one is java.lang.OutOfMemoryError: Java heap space, which means the heap is exhausted, either because the allocated heap is too small or because the implementation retains objects it shouldn't.

In this case, I encountered the error java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached. This means the JVM failed to create a new thread, for one of two possible reasons: the process ran out of native memory, or it hit an operating-system limit on the number of threads/processes. Since I couldn't reproduce the issue on my local machine, I had to profile the leak in the live environment.
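As a rough illustration of the symptom (this is not the library's code, just a made-up loop), starting threads that never terminate will eventually hit exactly this error once native memory or the OS thread limit is exhausted:

// Hypothetical reproduction: each iteration starts a thread that never exits,
// so native thread resources are never released.
public class NativeThreadLeak {
    public static void main(String[] args) {
        for (int i = 0; ; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // blocks forever instead of finishing
                } catch (InterruptedException ignored) {
                }
            }, "client-" + i);
            t.start();
            System.out.println("started " + t.getName());
        }
    }
}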

JDK Tools: Basic Info

First, jps lists all running JVM processes; pinger-service.jar is the service suspected of leaking:

$ jps
7 pinger-service.jar
250 Jps

With that PID, you can inspect the VM flags using jinfo:

$ jinfo -flags 7
VM Flags:
-XX:AdaptiveSizePolicyWeight=90 -XX:CICompilerCount=2 -XX:GCTimeRatio=4 -XX:InitialHeapSize=8388608 -XX:MaxHeapFreeRatio=10 -XX:MaxHeapSize=132120576 -XX:MaxNewSize=44040192 -XX:MinHeapDeltaBytes=524288 -XX:MinHeapFreeRatio=5 -XX:MinHeapSize=8388608 -XX:NewSize=2621440 -XX:NonNMethodCodeHeapSize=5826188 -XX:NonProfiledCodeHeapSize=122916026 -XX:OldSize=5767168 -XX:ProfiledCodeHeapSize=122916026 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftMaxHeapSize=132120576 -XX:+UnlockExperimentalVMOptions -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseContainerSupport -XX:+UseParallelGC 

If you're familiar with how the GC works, jstat is for you. With -gcutil, it reports the utilization of the survivor spaces (S0/S1), eden (E), old generation (O), and metaspace (M), along with the young and full GC counts and times (YGC/YGCT, FGC/FGCT):

$ jstat -gcutil 7
  S0     S1     E      O      M     CCS    YGC     YGCT     FGC    FGCT     CGC    CGCT       GCT   
  0.00   0.00  57.86  89.89  99.37  97.00    288     0.974   121    10.745     -         -    11.718
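If you'd rather watch the same numbers from inside the application, for example to feed them into your existing observability stack, the standard java.lang.management API exposes them as well. A minimal sketch, not something the original service necessarily does:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: periodically log heap usage and GC activity from inside the JVM.
public class GcStatsLogger {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%d MB committed=%d MB%n",
                    heap.getUsed() / (1024 * 1024), heap.getCommitted() / (1024 * 1024));
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("  %s: collections=%d time=%d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(5_000);
        }
    }
}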

The tools above are useful for reviewing the configuration and the GC behavior, but you can dig deeper with others.

JDK Tools: Heap and Thread

Finally, we come to the main course. Up to this point, the leak may have seemed like a black box; in this phase, you can expect to identify some suspects. Let's start with jmap.

jmap can print a class histogram of objects on the heap (names such as [B, [I, and [C are the JVM's notation for byte[], int[], and char[] arrays):

$ jmap -histo 7
num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:        163293       20076592  [B (java.base@17.0.2)
   2:         34197        9135920  [Ljava.lang.Object; (java.base@17.0.2)
   3:         28496        3313352  [I (java.base@17.0.2)
   4:        104208        2500992  java.lang.String (java.base@17.0.2)
   5:           500        1925184  [Ljava.nio.channels.SelectionKey; (java.base@17.0.2)
   6:          2800        1881600  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
   7:         50171        1605472  java.util.concurrent.ConcurrentHashMap$Node (java.base@17.0.2)
   8:         13329        1579848  java.lang.Class (java.base@17.0.2)
   9:          2422        1273608  [C (java.base@17.0.2)
  10:         12858         984904  [Ljava.util.HashMap$Node; (java.base@17.0.2)
  11:          9187         808456  java.lang.reflect.Method (java.base@17.0.2)
  12:         22245         711840  java.util.HashMap$Node (java.base@17.0.2)
  13:          4752         657552  [J (java.base@17.0.2)
  14:         14631         585240  java.util.LinkedHashMap$Entry (java.base@17.0.2)
  15:           944         543744  io.netty.util.internal.shaded.org.jctools.queues.MpscUnboundedArrayQueue
  16:         15431         541640  [Ljava.lang.String; (java.base@17.0.2)
  17:          8897         498232  java.util.LinkedHashMap (java.base@17.0.2)
  18:          1081         478352  [Ljava.util.concurrent.ConcurrentHashMap$Node; (java.base@17.0.2)
...

jmap can also dump the current heap to a file (a memory snapshot); the dump then needs to be analyzed with another tool, such as the Eclipse Memory Analyzer Tool (MAT) or VisualVM (the older jhat tool was removed in JDK 9):

$ jmap -dump:file=/tmp/DumpFile.txt 7
Dumping heap to /tmp/DumpFile.txt ...
Heap dump file created [94775465 bytes in 0.337 secs]
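A heap dump can also be triggered from inside the application through the com.sun.management.HotSpotDiagnosticMXBean, which is handy when the JDK tools aren't installed in the container. A minimal sketch, with an arbitrary output path; in production it's also common to set -XX:+HeapDumpOnOutOfMemoryError so a dump is captured automatically at the moment of the OOM.

import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

// Minimal sketch: trigger a heap dump programmatically, equivalent to jmap -dump.
public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diagnostic =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (runs a full GC first), like jmap -dump:live
        diagnostic.dumpHeap("/tmp/DumpFile.hprof", true);
        System.out.println("heap dump written to /tmp/DumpFile.hprof");
    }
}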

Since the error points to a thread problem, we also need to check the thread stack traces using jstack:

$ jstack -l 7
2023-10-29 01:04:26
Full thread dump OpenJDK 64-Bit Server VM (17.0.2+8-86 mixed mode, sharing):

Threads class SMR info:
_java_thread_list=0x00007f7070002140, length=660, elements={
0x00007f709c0a1580, 0x00007f709c0a2990, 0x00007f709c0aa570, 0x00007f709c0ab8f0,
0x00007f709c0accd0, 0x00007f709c0ae650, 0x00007f709c0afb50, 0x00007f709c0b0f90,
0x00007f709c0fd390, 0x00007f709c100220, 0x00007f709c844d50, 0x00007f7048088cf0,
...
"client-5075-1" #6519 prio=5 os_prio=0 cpu=0.63ms elapsed=4.62s tid=0x00007f6ff4143d10 nid=0x1ae3 runnable  [0x00007f6e5fc14000]
   java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.EPoll.wait(java.base@17.0.2/Native Method)
 at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@17.0.2/EPollSelectorImpl.java:118)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@17.0.2/SelectorImpl.java:129)
 - locked <0x00000000fe2c2c30> (a io.netty.channel.nio.SelectedSelectionKeySet)
 - locked <0x00000000fe2c29b0> (a sun.nio.ch.EPollSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(java.base@17.0.2/SelectorImpl.java:146)
 at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
 at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:879)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:526)
 at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
 at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)

   Locked ownable synchronizers:
 - None

"VM Thread" os_prio=0 cpu=8927.59ms elapsed=4637.75s tid=0x00007f709c09cd90 nid=0xa runnable  

"GC Thread#0" os_prio=0 cpu=28683.38ms elapsed=4637.76s tid=0x00007f709c03afe0 nid=0x9 runnable  

"VM Periodic Task Thread" os_prio=0 cpu=2280.39ms elapsed=4637.72s tid=0x00007f709c0fecb0 nid=0x14 waiting on condition  

JNI global refs: 26, weak refs: 4

Looking at this output, I noticed it was far longer than I expected: there appear to be numerous active threads that were never released:

$ jstack -l 7 | grep client
"client-4-1" #40 prio=5 os_prio=0 cpu=8.82ms elapsed=5109.72s tid=0x00007f7010028a10 nid=0x2f runnable  [0x00007f6ff3dfc000]
"client-15-1" #52 prio=5 os_prio=0 cpu=2.80ms elapsed=5104.13s tid=0x00007f7004009080 nid=0x3b runnable  [0x00007f6ff2af5000]
"client-22-1" #61 prio=5 os_prio=0 cpu=2.61ms elapsed=5098.99s tid=0x00007f6ff4002740 nid=0x44 runnable  [0x00007f6ff30fb000]
"client-42-1" #85 prio=5 os_prio=0 cpu=2.51ms elapsed=5093.44s tid=0x00007f6ff40047c0 nid=0x5c runnable  [0x00007f6ff1ee9000]
"client-51-1" #96 prio=5 os_prio=0 cpu=2.34ms elapsed=5088.16s tid=0x00007f700c0053c0 nid=0x67 runnable  [0x00007f6ff26f1000]
...
"client-5629-1" #7231 prio=5 os_prio=0 cpu=0.86ms elapsed=5.25s tid=0x00007f6ff4050290 nid=0x1dfa runnable  [0x00007f6e5afc8000]
"client-5630-1" #7234 prio=5 os_prio=0 cpu=4.83ms elapsed=0.24s tid=0x00007f6ff405b0f0 nid=0x1dfd waiting on condition  [0x00007f6e5aec7000]
"client-5631-1" #7235 prio=5 os_prio=0 cpu=4.24ms elapsed=0.24s tid=0x00007f7004137830 nid=0x1dfe waiting on condition  [0x00007f6e5f40c000]
"client-5632-1" #7236 prio=5 os_prio=0 cpu=4.24ms elapsed=0.23s tid=0x00007f7018177c10 nid=0x1dff waiting on condition  [0x00007f6e5cae3000]
"client-5633-1" #7237 prio=5 os_prio=0 cpu=0.23ms elapsed=0.18s tid=0x00007f701017e7e0 nid=0x1e0d runnable  [0x00007f6e5b1ca000]

$ jstack -l 7 | grep client | wc -l
685
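Counting threads with jstack and grep works, but you can also watch the live thread count from inside the JVM through the standard ThreadMXBean and alert on unexpected growth. A minimal sketch; the threshold is arbitrary:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: periodically log the live thread count and, above an arbitrary
// threshold, dump the thread names so leaked "client-*" threads stand out.
public class ThreadCountWatcher {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            int live = threads.getThreadCount();
            System.out.println("live threads: " + live);
            if (live > 500) { // arbitrary threshold for this illustration
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    System.out.println("  " + info.getThreadName());
                }
            }
            Thread.sleep(10_000);
        }
    }
}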

Either way, it's now clear that the leak is caused by too many threads that are never released. To fix it, I traced the call stacks behind the thread names in the log and found some unhandled errors; those errors are why the threads were never shut down.
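I can't share the library's code, but the shape of the fix is generic. Judging by the thread names, each connection appears to get its own Netty event-loop thread, and when connecting failed with an unhandled exception that thread was never shut down. A hedged sketch of the corrected pattern, using plain Netty APIs and made-up names (PingerClient, host, port) rather than the library's actual classes:

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.util.concurrent.DefaultThreadFactory;

// Hypothetical client: the key point is that the event-loop group is shut down on
// every exit path, including failed connects, so its thread is always released.
public class PingerClient {
    public void connect(String host, int port) {
        EventLoopGroup group = new NioEventLoopGroup(1, new DefaultThreadFactory("client"));
        try {
            Bootstrap bootstrap = new Bootstrap()
                    .group(group)
                    .channel(NioSocketChannel.class)
                    .handler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            // the protocol's handlers would be added here
                        }
                    });
            bootstrap.connect(host, port).sync().channel().closeFuture().sync();
        } catch (Exception e) {
            // Previously, an exception like this went unhandled and skipped the cleanup below.
            System.err.println("connection failed: " + e);
        } finally {
            group.shutdownGracefully(); // without this, one "client-*" thread leaks per call
        }
    }
}

In practice, sharing a single event-loop group across all connections is usually preferable to creating one per client, which sidesteps the problem entirely.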

Conclusion

Profiling and debugging memory leaks in a Java application can seem challenging, especially if you're new to the Java ecosystem. With the right tools and techniques, however, you can pinpoint the issue quickly and keep the application performing well. Regular monitoring, understanding the error messages, and leveraging the built-in JDK tools will illuminate potential pitfalls and lead to effective solutions.

Written by J.W. Liang, who lives in Munich and builds interesting things.
Follow him on LinkedIn.