Notes from tracking down a memory leak in a Python program.
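Tool | Description |
---|---|
gc | Built-in module of the Python standard library |
tracemalloc (recommended) | Part of the standard library since Python 3.4 |
mem_top (recommended) | A wrapper around gc that prints the sorted Top N entries; runs fast |
guppy | Gathers statistics on the objects in the heap; quite practical, but slow to compute |
objgraph | Draws object reference graphs; suited to programs with few object types and a simple structure |
pympler | Reports memory usage per type and the size of individual objects |
pyrasite | A very powerful third-party library that can attach to a running Python process and modify its data and code on the fly |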
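The official documentation of each tool explains it in detail and includes basic examples; this article only covers the common usage of each tool.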
As a built-in module, gc is available in both Python 2 and Python 3 and is very convenient to use.
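Commonly used functions:
- gc.collect(generation=2): when called without arguments, runs a full collection. When hunting a memory leak, call it manually before gathering statistics so that garbage that has not been collected yet does not skew the results.
- gc.get_objects(): returns a list of all objects tracked by the collector.
- gc.get_referrers(*objs): returns the list of objects that directly refer to any of objs. It only locates containers that support garbage collection; extension types that refer to other objects but do not support garbage collection will not be found.
- gc.get_referents(*objs): returns a list of the objects directly referred to by any of the arguments. When investigating a leak you usually need to examine these referenced objects.
- sys.getsizeof(obj): returns the size of an object in bytes. Only the memory directly attributed to the object is counted, not the memory of the objects it refers to.
Example usage: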
import gc
import sys


def top_memory(limit=3):
    # Run a full collection first so uncollected garbage does not skew the numbers
    gc.collect()
    objs_by_size = []
    for obj in gc.get_objects():
        # Shallow size only: objects referenced by obj are not included
        size = sys.getsizeof(obj)
        objs_by_size.append((obj, size))
    # Sort by allocated size, largest first
    sorted_objs = sorted(objs_by_size, key=lambda x: x[1], reverse=True)
    for obj, size in sorted_objs[:limit]:
        print(f"size: {size/1024/1024:.2f}MB, type: {type(obj)}, obj: {id(obj)} ")
        # Print the objects that obj refers to
        for item in gc.get_referents(obj):
            print(f"{item}\n")
tracemalloc: a built-in library in Python 3.4 and later.
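The tracemalloc module is a debugging tool that traces the memory blocks allocated by Python. It can provide the following information:
- the traceback where an object was allocated;
- statistics on allocated memory blocks per file name and per line number: total size, number, and average size of the blocks;
- the differences between two snapshots, which helps detect memory leaks.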
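Common functions:
- tracemalloc.start(): can be called at runtime to start tracing Python memory allocations.
- tracemalloc.take_snapshot(): takes a snapshot of the traces of memory blocks allocated by Python and returns a new Snapshot instance.
- Snapshot.compare_to(old_snapshot, key_type): computes the differences from an older snapshot.
Code example: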
import tracemalloc
tracemalloc.start()
# ... start your application ...
snapshot1 = tracemalloc.take_snapshot()
# ... call the function leaking memory ...
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)
The official tracemalloc documentation has very detailed explanations and usage examples.
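mem_top is essentially a wrapper around functions of the gc module. Calling mem_top.mem_top() prints the top N entries sorted three ways: by number of references, by memory size, and by object count per type.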
Install: pip install mem-top
Signature:
mem_top(
limit=10, # limit of top lines per section
width=100, # width of each line in chars
sep='\n', # char to separate lines with
refs_format='{num}\t{type} {obj}', # format of line in "refs" section
bytes_format='{num}\t {obj}', # format of line in "bytes" section
types_format='{num}\t {obj}', # format of line in "types" section
verbose_types=None, # list of types to sort values by `repr` length
verbose_file_name='/tmp/mem_top', # name of file to store verbose values in
)
Example (mem_top() returns a string, so print or log it): print(mem_top.mem_top(limit=3, width=200))
Output:
refs:
1638 <type 'dict'> {'IPython.core.error': <module 'IPython.core.error' from '/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/IPython/core/error.pyc'>, 'ipython_genutils.py3compat': <module 'ipython_ge
765 <type 'list'> [u'd = {\n "@babel/core": "^7.24.4",\n "@babel/plugin-proposal-class-properties": "^7.18.6",\n "@babel/preset-env": "^7.9.5",\n "@jest/globals": "^29.7.0",\n "babel-eslint": "^10.1.0",\
765 <type 'list'> [u'd = {\n "@babel/core": "^7.24.4",\n "@babel/plugin-proposal-class-properties": "^7.18.6",\n "@babel/preset-env": "^7.9.5",\n "@jest/globals": "^29.7.0",\n "babel-eslint": "^10.1.0",\
bytes:
49432 {'IPython.core.error': <module 'IPython.core.error' from '/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/IPython/core/error.pyc'>, 'ipython_genutils.py3compat': <module 'ipython_ge
33000 set(['disp', 'union1d', 'all', 'issubsctype', 'atleast_2d', 'setmember1d', 'restoredot', 'ptp', 'blackman', 'pkgload', 'tostring', 'tri', 'arrayrange', 'array_equal', 'item', 'indices', 'loads', 'roun
12584 {u'': 0, u'pmem_top.mem_top(limit=3, width=200) ': 37, u'primem_top.mem_top(limit=3, width=200) ': 39, u'printmem_top.mem_top() ': 23, u'print mem_top.mem_top(limit) ': 29, u'print mem_top.mem_top(lim
types:
8581 <type 'function'>
7527 <type 'tuple'>
6102 <type 'dict'>
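If installing a third-party package on the affected machine is not an option, the refs and types sections can be roughly approximated with gc alone. This is a sketch under stated assumptions, not mem_top's actual implementation: it ranks gc-tracked objects by len(gc.get_referents(obj)), abbreviates with reprlib, and leaves out the bytes section. The function top_refs_report() and the suspicious list are hypothetical:

```python
import gc
import reprlib
from collections import Counter


def top_refs_report(limit=3, width=100):
    """Approximate mem_top's "refs" and "types" sections with the gc module only."""
    gc.collect()
    objs = gc.get_objects()
    # Rank every gc-tracked object by how many objects it directly refers to
    ranked = sorted(((len(gc.get_referents(o)), o) for o in objs),
                    key=lambda pair: pair[0], reverse=True)
    lines = ["refs:"]
    for num, obj in ranked[:limit]:
        # reprlib.repr abbreviates huge containers instead of rendering them fully
        lines.append(f"{num}\t{type(obj)} {reprlib.repr(obj)[:width]}")
    lines.append("types:")
    for type_name, count in Counter(type(o).__name__ for o in objs).most_common(limit):
        lines.append(f"{count}\t{type_name}")
    return "\n".join(lines)


# Hypothetical usage: a list that repeats one function a million times,
# similar in shape to the leak found later in this article
def search_function(name):
    return None


suspicious = [search_function] * 1_000_000
report = top_refs_report(limit=3, width=80)
print(report)
```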
guppy is a very powerful tool, but its drawback is just as obvious: it is slow to run, so it is not suitable for debugging in production.
Install: pip install guppy (for Python 3, the package is published as guppy3)
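Note: the library looks up dir-related attributes of the objects it inspects, so a buggy custom __dir__ implementation will make the library fail during initialization.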
Common usage example:
import datetime
import guppy

# Initialize the session context, which gives access to heap information
analyzer = guppy.hpy()


def do_something():
    # run your app ...
    print("==={} heap total===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    # Heap memory details, partitioned by kind
    heap = analyzer.heap()
    print(heap)
    # byvia groups objects by how they are referenced; heap[0] is the kind using the most memory
    print("==={} references===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    references = heap[0].byvia
    print(references)
    print("==={} references detail===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    print(references[0].kind)     # kind
    print(references[0].shpaths)  # shortest paths from the root
    print(references[0].rp)       # reference pattern
Output:
===2024-07-21 16:27:12 heap total===
Partition of a set of 785315 objects. Total size = 104732120 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 396372  50 35974232  34  35974232  34 unicode
     1  23029   3 23814136  23  59788368  57 dict (no owner)
     2 143799  18 13556704  13  73345072  70 str
     3  75473  10  7372992   7  80718064  77 tuple
     4   1085   0  2634680   3  83352744  80 dict of module
     5   2764   0  2500384   2  85853128  82 type
     6  19206   2  2458368   2  88311496  84 types.CodeType
     7  15857   2  2409224   2  90720720  87 list
     8  19402   2  2328240   2  93048960  89 function
     9   2764   0  2215840   2  95264800  91 dict of type
<931 more rows. Type e.g. '_.more' to view.>
===2024-07-21 16:27:14 references===
Partition of a set of 396372 objects. Total size = 35974232 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0  18748   5  1371888   4   1371888   4 '.keys()[0]'
     1  13046   3   974352   3   2346240   7 '.keys()[1]'
     2   9958   3   724328   2   3070568   9 '.keys()[2]'
     3   9027   2   658576   2   3729144  10 '.keys()[3]'
     4   8636   2   632264   2   4361408  12 '.keys()[4]'
     5   8175   2   607032   2   4968440  14 '.keys()[5]'
     6    715   0   515688   1   5484128  15 '.func_doc', '[0]'
     7   6557   2   502880   1   5987008  17 '.keys()[6]'
     8   5785   1   428904   1   6415912  18 '.keys()[7]'
     9   5168   1   392432   1   6808344  19 '.keys()[8]'
<3213 more rows. Type e.g. '_.more' to view.>
===2024-07-21 16:27:16 references detail===
<via '.keys()[0]'>
 0: hpy().Root.i0_modules['kombu'].__dict__.keys()[0]
Reference Pattern by <[dict of] class>.
 0: _ --- [-] 18748 <via '.keys()[0]'>: 0x7ff3f82dec30, 0x7ff3f82decc0...
 1: a      [-] 18753 dict (no owner): 0x7ff3f82f7050*24, 0x7ff3f82f73b0*3...
 2: aa ---- [-] 317 dict (no owner): 0x7ff3f88e43b0*1, 0x7ff3f88e44d0*1...
 3: a3       [-] 77 dict of aliyunsdkcore.endpoint.endpoint_resolver_rules.En...
 4: a4 ------ [-] 77 aliyunsdkcore.endpoint.endpoint_resolver_rules.EndpointR...
 5: a5         [-] 77 list: 0x7ff3f88f65f0*6, 0x7ff3f897e7d0*6...
 6: a6 -------- [-] 77 dict of aliyunsdkcore.endpoint.chained_endpoint_resolv...
 7: a7           [+] 77 aliyunsdkcore.endpoint.chained_endpoint_resolver.Chai...
 8: aab ---- [-] 80 dict (no owner): 0x7ff3f88e44d0*1, 0x7ff3f88e8b90*1...
 9: aaba      [-] 78 dict of aliyunsdkcore.retry.retry_condition.DefaultConfi...
<Type e.g. '_.more' for more.>
Besides the official documentation, the built-in help is available through the doc attribute:
import guppy

analyzer = guppy.hpy()
heap = analyzer.heap()
print("============== Heap Documents ====================")
print(analyzer.doc)
print("============= Heap Status Documents ================")
print(heap.doc)
Output:
============== Heap Documents ====================
Top level interface to Heapy.
Available attributes:
Anything Nothing Via iso Class Rcs doc load Clodo Root findex monitor Id Size heap pb Idset Type heapu setref Module Unity idset test
Use eg: hpy().doc.<attribute> for info on <attribute>.
============= Heap Status Documents ================
biper byvia get_examples parts brief count get_render pathsin by dictof get_rp pathsout byclass diff get_shpaths referents byclodo disjoint imdom referrers byid doc indisize rp byidset dominos kind shpaths bymodule domisize maprox size byrcs dump more sp bysize er nodes stat bytype fam owners test_contains byunity get_ckc partition theone
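The Heap Status documentation shows that, besides byvia, there are other ways to group the statistics. A few of them:
- byvia: groups the heap status entries by how the objects are referred to (the attribute, index, or key through which they are reached);
- bysize: groups the entries by the individual size of each object;
- bytype: groups the entries by object type; all dict entries are merged into one;
- byrcs: groups the entries by the kinds of their referrers;
- bymodule: groups the entries by module;
- byunity: groups all entries together, giving only the total size;
- byidset: groups the entries by idset;
- byid: groups the entries by memory address.
In most cases, byvia and bysize are enough to solve many problems.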
More usage examples can be found in "guppy/heapy - Profile Memory Usage in Python".
Install: pip install objgraph
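For a quick overview of the objects in memory, use show_most_common_types().
objgraph can also snapshot all live objects: calling show_growth() shows how the object counts changed since the previous call (the first call reports every type as growth, as in the output below).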
Common usage example:
import objgraph
import datetime
def do_something():
    # run your app ...
    print("==={} show_most_common_types===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    objgraph.show_most_common_types(limit=5)
    print("==={} show_growth===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    objgraph.show_growth(limit=5)
Output:
===2024-07-21 16:41:14 show_most_common_types===
function 18495
list 16072
dict 10912
tuple 6515
weakref 3773
===2024-07-21 16:41:14 show_growth===
function 18495 +18495
list 16072 +16072
dict 10903 +10903
tuple 6503 +6503
weakref 3773 +3773
objgraph can also draw the reference graph of objects visually; viewing the graphs requires xdot.
Install: pip install pympler
Common usage example:
import datetime
from pympler import tracker, muppy, summary
tr = tracker.SummaryTracker()
def do_something():
    # run your app ...
    print("==={} mem total===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    all_objects = muppy.get_objects()
    sum1 = summary.summarize(all_objects)
    summary.print_(sum1)
    print("==={} mem diff===".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
    tr.print_diff()
Output:
===2024-07-21 16:17:47 mem total===
                    types |   # objects |   total size
========================= | =========== | ============
                     dict |       35489 |     32.33 MB
                      str |       57287 |      5.50 MB
                  unicode |       41150 |      3.55 MB
                     type |        2748 |      2.37 MB
                     code |       17055 |      2.08 MB
                     list |       16024 |      1.80 MB
                    tuple |       12969 |      1.74 MB
                      set |        1704 |    539.06 KB
                  weakref |        3741 |    321.49 KB
      function (__init__) |        1426 |    167.11 KB
        getset_descriptor |        2294 |    161.30 KB
         _sre.SRE_Pattern |         241 |    116.76 KB
              abc.ABCMeta |         124 |    109.70 KB
       wrapper_descriptor |        1371 |    107.11 KB
  collections.OrderedDict |         341 |    103.82 KB
===2024-07-21 16:17:47 mem diff===
                types |   # objects |   total size
===================== | =========== | ============
                 list |       19695 |      3.77 MB
                  str |       23061 |      1.44 MB
                 dict |         505 |    344.71 KB
              unicode |         285 |     97.78 KB
                 type |          91 |     80.27 KB
                 code |         560 |     70.00 KB
                  int |        2421 |     56.74 KB
          _io.BytesIO |           1 |     24.25 KB
                tuple |         296 |     20.49 KB
     _sre.SRE_Pattern |          25 |      9.86 KB
              weakref |          97 |      8.34 KB
    collections.deque |           7 |      4.77 KB
    getset_descriptor |          54 |      3.80 KB
  function (__repr__) |          32 |      3.75 KB
  function (__init__) |          31 |      3.63 KB
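Drawback: gathering the statistics is slow, and when embedded in the program it can easily block the process, so it is not suitable for debugging in production.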
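Where pympler cannot be installed, a table in a similar layout can be approximated with gc.get_objects() and sys.getsizeof(). This is a sketch, not pympler's implementation: sizes are shallow, and only objects tracked by gc are visible (gc does not track atomic objects such as str or int, which pympler does report). summarize_by_type() and print_summary() are hypothetical helpers:

```python
import gc
import sys
from collections import defaultdict


def summarize_by_type(limit=10):
    """Return (type name, object count, total shallow size in bytes) rows, largest first."""
    gc.collect()
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        # Shallow size only; the default 0 covers objects that cannot report a size
        sizes[name] += sys.getsizeof(obj, 0)
    rows = sorted(((name, counts[name], sizes[name]) for name in counts),
                  key=lambda row: row[2], reverse=True)
    return rows[:limit]


def print_summary(rows):
    """Print the rows in a layout similar to pympler's summary table shown above."""
    print(f"{'types':>25} | {'# objects':>11} | {'total size':>12}")
    print(f"{'=' * 25} | {'=' * 11} | {'=' * 12}")
    for name, count, size in rows:
        print(f"{name:>25} | {count:>11} | {size / 1024:>9.2f} KB")


rows = summarize_by_type()
print_summary(rows)
```

It walks every tracked object just like muppy does, so the same caveat applies: on a large heap it blocks the process while it runs.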
Install: pip install pyrasite pyrasite-gui urwid meliae
It also depends on the system gdb (version 7.3+).
The tool is very powerful: given the ID of a Python process, it can inspect that process's runtime state, which makes live inspection very convenient.
Unfortunately, I could not get it to work on either macOS or CentOS.
The original task was to debug a Python 2 program, so I also tried it in a Python 2.7 environment:
Error 1:
Complete output from command python setup.py egg_info:
'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x25c5150>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /simple/cython/
Could not find a version that satisfies the requirement Cython (from versions: )
No matching distribution found for Cython
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-RqQ7F6/meliae/setup.py", line 96, in <module>
    config()
  File "/tmp/pip-build-RqQ7F6/meliae/setup.py", line 93, in config
    setup(**kwargs)
  File "/usr/lib/python2.7/site-packages/setuptools/__init__.py", line 161, in setup
    _install_setup_requires(attrs)
  File "/usr/lib/python2.7/site-packages/setuptools/__init__.py", line 156, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 721, in fetch_build_eggs
    replace_conflicting=True,
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 782, in resolve
    replace_conflicting=replace_conflicting
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 777, in fetch_build_egg
    return fetch_build_egg(self, req)
  File "/usr/lib/python2.7/site-packages/setuptools/installer.py", line 130, in fetch_build_egg
    raise DistutilsError(str(e))
distutils.errors.DistutilsError: Command '['/usr/bin/python2', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpryTZj0', '--quiet', 'Cython']' returned non-zero exit status 1
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-RqQ7F6/meliae/
Installing the dependencies failed; upgrading pip with pip install -U pip fixed it.
After the installation succeeded, I found the ID of the Python process (75055) and ran pyrasite-memory-viewer 75055.
Error 2:
Traceback (most recent call last):
  File "/Users/skyler/Documents/py-env/venv2.7/bin/pyrasite-memory-viewer", line 8, in <module>
    sys.exit(main())
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/pyrasite/tools/memory_viewer.py", line 150, in main
    objects = loader.load(filename)
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/meliae/loader.py", line 541, in load
    source, cleanup = files.open_file(source)
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/meliae/files.py", line 32, in open_file
    source = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/pyrasite-75055-objects.json'
As a quick workaround I created the file with touch /tmp/pyrasite-75055-objects.json and ran it again:
Traceback (most recent call last):
  File "/Users/skyler/Documents/py-env/venv2.7/bin/pyrasite-memory-viewer", line 8, in <module>
    sys.exit(main())
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/pyrasite/tools/memory_viewer.py", line 150, in main
    objects = loader.load(filename)
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/meliae/loader.py", line 556, in load
    max_parents=max_parents)
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/meliae/loader.py", line 635, in _load
    factory=objs.add):
  File "/Users/skyler/Documents/py-env/venv2.7/lib/python2.7/site-packages/meliae/loader.py", line 629, in iter_objs
    % (line_num, len(objs), mb_read, input_mb, tdelta))
UnboundLocalError: local variable 'line_num' referenced before assignment
In the end, I did not get pyrasite working on either macOS or CentOS.
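Setup:
- The tool used here is mem_top, which runs fast and does not interfere with the business process serving requests;
- A global counter count is defined, and every 100 calls the process's current memory usage is logged.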
import logging

import mem_top

logger = logging.getLogger("mem-debug")  # configure this logger's handlers and level yourself
count = 0  # global counter


def do_something():
    # run your app ...
    global count
    if count % 100 == 0:
        msg = mem_top.mem_top(limit=3, width=400)
        logger.info("{} {}".format(count, msg))
    else:
        logger.debug(count)
    count += 1
An excerpt of the mem_top output:
refs:
157613189 <type 'list'> [<function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x
5742 <type 'list'> ['# module pyparsing.py\n', '#\n', '# Copyright (c) 2003-2018 Paul T. McGuire\n', '#\n', '# Permission is hereby granted, free of charge, to any person obtaining\n', '# a copy of this software and associated documentation files (the\n', '# "Software"), to deal in the Software without restriction, including\n', '# without limitation the rights to use, copy, modify, merge, publish,\n', '# distribut
4240 <type 'dict'> {'oss2.task_queue': <module 'oss2.task_queue' from '/usr/lib/python2.7/site-packages/oss2/task_queue.pyc'>, 'requests.Cookie': None, 'aliyunsdkcdn.request.v20180510': <module 'aliyunsdkcdn.request.v20180510' from '/usr/lib/python2.7/site-packages/aliyunsdkcdn/request/v20180510/__init__.pyc'>, 'elasticsearch.client.cat': <module 'elasticsearch.client.cat' from '/usr/lib/python2.7/site-packages/elas
bytes:
1377112608 [<function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x7f777e945398>, <function search_function at 0x
196888 {'oss2.task_queue': <module 'oss2.task_queue' from '/usr/lib/python2.7/site-packages/oss2/task_queue.pyc'>, 'requests.Cookie': None, 'aliyunsdkcdn.request.v20180510': <module 'aliyunsdkcdn.request.v20180510' from '/usr/lib/python2.7/site-packages/aliyunsdkcdn/request/v20180510/__init__.pyc'>, 'elasticsearch.client.cat': <module 'elasticsearch.client.cat' from '/usr/lib/python2.7/site-packages/elas
49432 {'FOLLOWLOCATION': 52, 'NETRC_IGNORED': 0, 'E_WRITE_ERROR': 23, 'CONTENT_LENGTH_UPLOAD': 3145744, 'SSLVERSION_TLSv1_0': 4, 'SSLVERSION_TLSv1_1': 5, 'SSLVERSION_TLSv1_2': 6, 'E_COULDNT_CONNECT': 7, 'NETRC_OPTIONAL': 1, 'IOCTLFUNCTION': 20130, 'MAX_SEND_SPEED_LARGE': 30145, 'QUOTE': 10028, 'E_ABORTED_BY_CALLBACK': 42, 'INFOTYPE_TEXT': 0, 'READDATA': 10009, 'POLL_NONE': 0, 'E_CONV_REQD': 76, 'MAXCONN
types:
19638 <type 'function'>
11322 <type 'dict'>
7124 <type 'tuple'>
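The log shows that the leak comes from <function search_function at 0x7f777e945398>: a single list holds 157613189 references to this same function and takes 1377112608 bytes (about 1.28 GiB) by itself.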
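A global search for search_function in our own code found no usage. At this point one of the other tools could be used to follow the reference path to where it is used; I simply brute-forced it by searching the directory where the dependencies are installed: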
> cd ./py-env/venv2.7/lib/python2.7/site-packages/
> find . -type f -name "*.py" | xargs grep search_function
./gnupg/_util.py: codecs.register(encodings.search_function)
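The same brute-force search can also be scripted in Python, which is handy on machines where find or xargs behave differently. A minimal sketch; grep_tree() is a hypothetical helper, the demo directory and its file contents are made up to mimic the gnupg layout, and in practice root would be the environment's site-packages directory (for example sysconfig.get_paths()["purelib"]):

```python
import tempfile
from pathlib import Path


def grep_tree(root, needle, suffix=".py"):
    """Yield (path, line number, stripped line) for every line under root containing needle."""
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()


# Demo on a throwaway directory with a made-up file mimicking gnupg/_util.py
with tempfile.TemporaryDirectory() as root:
    pkg = Path(root, "gnupg")
    pkg.mkdir()
    (pkg / "_util.py").write_text("import codecs, encodings\ncodecs.register(encodings.search_function)\n")
    hits = [(p.relative_to(root).as_posix(), n, l) for p, n, l in grep_tree(root, "search_function")]
print(hits)
```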
The grep result shows that the problem lies in the third-party library gnupg, in gnupg/_util.py.
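gnupg is an encryption/decryption module. To cope with non-UTF-8 encodings, the library registers a codec search function with codecs.register() while handling encodings, but never unregisters it (Python 2.7 has no unregister function at all; codecs.unregister() was only added in Python 3.10). Each codecs.register() call appends the function to the interpreter's codec search list without checking for duplicates, so every time this code path runs the list grows by one entry, which is the huge list of search_function references seen in the mem_top output.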
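The failure mode can be reproduced with the standard library alone. A minimal sketch on CPython 3, with a hypothetical search function my_search(); registry_count() locates the codec search list through gc.get_referrers(), which relies on that list being an ordinary gc-tracked list object (true in CPython, but an implementation detail):

```python
import codecs
import gc


def my_search(name):
    """Hypothetical codec search function; returning None means "no codec here"."""
    return None


def registry_count(fn):
    """Count how often fn occurs in the lists that refer to it (here: the codec search list)."""
    gc.collect()
    return max((sum(1 for item in ref if item is fn)
                for ref in gc.get_referrers(fn) if isinstance(ref, list)),
               default=0)


N = 1000
for _ in range(N):
    # Same pattern as in gnupg/_util.py: register on every call, never unregister
    codecs.register(my_search)
leaked = registry_count(my_search)
print(leaked)

# Python 3.10+ only: each codecs.unregister() call removes a single registration
if hasattr(codecs, "unregister"):
    for _ in range(N):
        codecs.unregister(my_search)
remaining = registry_count(my_search)
print(remaining)
```

On CPython this prints 1000 and then 0 (or 1000 twice before Python 3.10), which shows why registering once at import time, or checking a module-level flag before registering, avoids the leak.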
Since our service code is entirely UTF-8 and does not need that logic, I commented out the register line; testing showed that memory no longer leaked.
Given the time pressure, and since the library's author had not maintained it for a long time, I replaced gnupg==2.2.0 with python-gnupg==0.4.6, which solved the problem.