概述

某Java服务(假设PID=10765)出现了OOM,最常见的原因为: - 有可能是内存分配确实过小,而正常业务使用了大量内存 - 某一个对象被频繁申请,却没有释放,内存不断泄漏,导致内存耗尽 - 某一个资源被频繁申请,系统资源耗尽,例如:不断创建线程,不断发起网络连接 画外音:无非“本身资源不够”“申请资源太多”“资源耗尽”几个原因

kubectl exec -it pod-695b96f88f-gngzr sh

/ # ps
PID   USER     TIME   COMMAND
    1 root       8:54 java -Djava.security.egd=file:/dev/./urandom -Dspring.cloud.config.uri=http://epic-config:3901 -Dspring.cloud.config.label=dev -jar /epic-blazer-DEV-
  877 root       0:00 sh
  884 root       0:00 ps

一、确认是不是内存本身就分配过小

jmap -heap 1

输出:

/ # jmap -heap 1
Attaching to process ID 1, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.112-b15

using thread-local object allocation.
Parallel GC with 33 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 32210157568 (30718.0MB)
   NewSize                  = 702545920 (670.0MB)
   MaxNewSize               = 10736369664 (10239.0MB)
   OldSize                  = 1405091840 (1340.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 6605504512 (6299.5MB)
   used     = 384222008 (366.4226608276367MB)
   free     = 6221282504 (5933.077339172363MB)
   5.816694353958834% used
From Space:
   capacity = 68157440 (65.0MB)
   used     = 0 (0.0MB)
   free     = 68157440 (65.0MB)
   0.0% used
To Space:
   capacity = 68157440 (65.0MB)
   used     = 0 (0.0MB)
   free     = 68157440 (65.0MB)
   0.0% used
PS Old Generation
   capacity = 4105699328 (3915.5MB)
   used     = 76364000 (72.82638549804688MB)
   free     = 4029335328 (3842.673614501953MB)
   1.8599511045344623% used

36468 interned Strings occupying 4387784 bytes.

如上图,可以查看新生代,老生代堆内存的分配大小以及使用情况,看是否本身分配过小

二、找到最耗内存的对象

命令

jmap -histo:live 1 | more

输出

/ # jmap -histo:live 1 |more

 num     #instances         #bytes  class name
----------------------------------------------
   1:        185342       19036168  [C
   2:         72312        6363456  java.lang.reflect.Method
   3:        183636        4407264  java.lang.String
   4:         94630        3785200  java.util.LinkedHashMap$Entry
   5:          7657        3397904  [I
   6:         95453        3054496  java.util.concurrent.ConcurrentHashMap$Node
   7:          9991        2886032  [B
   8:         41504        2707192  [Ljava.lang.Object;
   9:         32210        2449584  [Ljava.util.HashMap$Node;
  10:         17683        1951240  java.lang.Class
  11:         31631        1771336  java.util.LinkedHashMap
  12:         48706        1558592  java.util.HashMap$Node
  13:          1064        1084592  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  14:         44244         932072  [Ljava.lang.Class;
  15:         37739         905736  java.util.ArrayList
  16:         23726         759232  java.lang.ref.WeakReference

如上图,输入命令后,会以表格的形式显示存活对象的信息,并按照所占内存大小排序: - 实例数 - 所占内存大小 - 类名

三、确认是否是资源耗尽

工具: - pstree - netstat 查看进程创建的线程数,以及网络连接数,如果资源耗尽,也可能出现OOM

这里介绍另一种方法,通过

ll /proc/${PID}/fd
ll /proc/${PID}/task

ll /proc/${PID}/fd | wc -l
ll /proc/${PID}/task | wc -l (效果等同pstree -p | wc -l)

可以分别查看句柄详情和线程数。