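Notes from reading /bionic/libc/malloc_debug/README.md
Environment: Android 10
Purpose: debug memory leaks, memory allocation, and memory corruption problems in native processes.
Malloc Debug: when malloc debug is enabled, the following functions are replaced by their malloc_debug versions: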
- malloc
- free
- calloc
- realloc
- posix_memalign
- memalign
- aligned_alloc
- malloc_usable_size
Controlling Malloc Debug Behavior
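front_guard[=SIZE_BYTES]: every allocation gets SIZE_BYTES of guard bytes placed immediately before the returned region, filled with 0xaa.
Example: setprop libc.debug.malloc.options front_guard=16
- ptr = (char*)malloc(1 * 1024 * 1024);
- memset(ptr, 0, 1 * 1024 * 1024);
- printf("*(ptr - 16) = %d\n", *(ptr - 16)); // prints 170 (0xaa) where char is unsigned, as on ARM
- *(ptr - 16) = 0xee; // this write corrupts the front guard
- free(ptr); // malloc debug checks the guard bytes on free and reports the corruption
Because the program overwrote a guard byte, malloc debug prints a "CORRUPTED FRONT GUARD" error to logcat; the full output appears in the logcat listing further down.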
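rear_guard[=SIZE_BYTES]: every allocation gets SIZE_BYTES of guard bytes placed immediately after the end of the returned region, filled with 0xbb.
Example: setprop libc.debug.malloc.options rear_guard=16
- ptr = (char*)malloc(1 * 1024 * 1024);
- memset(ptr, 0, 1 * 1024 * 1024);
- printf("*(ptr + 1*1024*1024) = %d\n", *(ptr + 1*1024*1024)); // prints 187 (0xbb) where char is unsigned, as on ARM
- *(ptr + 1*1024*1024) = 0xee; // this write corrupts the rear guard
- free(ptr); // malloc debug checks the guard bytes on free and reports the corruption
guard[=SIZE_BYTES]: this option combines front_guard and rear_guard. The SIZE_BYTES bytes before and after the region are filled with 0xaa and 0xbb respectively.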
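The guard mechanism described above can be illustrated with a small stand-alone sketch. This is not the bionic implementation: `guarded_malloc`, `guarded_free`, `Header`, and the constants are hypothetical names, and the real library also handles alignment and logging, which this sketch leaves out. It only shows the idea of filling bytes around the user region and checking them at free time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants mirroring the option values used in the examples.
constexpr size_t kGuardBytes = 16;
constexpr unsigned char kFrontFill = 0xaa;
constexpr unsigned char kRearFill = 0xbb;

// Hypothetical bookkeeping stored in front of the front guard.
struct Header {
  size_t size;
};

// Layout: [Header][front guard][user region][rear guard]
void* guarded_malloc(size_t size) {
  unsigned char* raw = static_cast<unsigned char*>(
      std::malloc(sizeof(Header) + kGuardBytes + size + kGuardBytes));
  if (raw == nullptr) return nullptr;
  Header header{size};
  std::memcpy(raw, &header, sizeof(header));
  std::memset(raw + sizeof(Header), kFrontFill, kGuardBytes);
  unsigned char* user = raw + sizeof(Header) + kGuardBytes;
  std::memset(user + size, kRearFill, kGuardBytes);
  return user;
}

// Checks both guards, frees the block, and returns how many guard bytes
// no longer hold their fill value (0 means no corruption was detected).
int guarded_free(void* ptr) {
  assert(ptr != nullptr);
  unsigned char* user = static_cast<unsigned char*>(ptr);
  unsigned char* raw = user - kGuardBytes - sizeof(Header);
  Header header;
  std::memcpy(&header, raw, sizeof(header));
  int corrupted = 0;
  for (size_t i = 0; i < kGuardBytes; ++i) {
    if (raw[sizeof(Header) + i] != kFrontFill) ++corrupted;
    if (user[header.size + i] != kRearFill) ++corrupted;
  }
  std::free(raw);
  return corrupted;
}
```

As with the real option, corruption is only noticed when the block is freed, so a write into a guard on a block that is never freed goes undetected in this sketch.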
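backtrace[=MAX_FRAMES]: MAX_FRAMES has a maximum of 256 and defaults to 16.
On every malloc call, malloc debug records the call stack of the allocation. The stack depth is capped at MAX_FRAMES, and the larger MAX_FRAMES is, the more malloc performance suffers.
When the process receives the signal SIGRTMAX - 17 (usually 47 on Android), malloc debug dumps the heap backtraces. The default dump path is /data/local/tmp/backtrace_heap.PID.txt.
Send the signal with kill -s 47 PID. The process does not dump the backtraces immediately on receipt; the dump happens on the next malloc or free call. If no trace file appears after sending the signal, keep exercising the process under test until it allocates or frees memory.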
Example:
setprop libc.debug.malloc.options backtrace=5
The resulting dump file name ends in $pid.txt.
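The deferred dump described above (the signal only requests a dump, and the next malloc or free performs it) follows a common pattern: the signal handler sets a flag and the allocation wrappers act on it. The sketch below is an assumption about the general shape, not bionic's code; `traced_malloc`, `traced_free`, `install_dump_signal`, and the counters are hypothetical names, and the "dump" is reduced to incrementing a counter.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdlib>

// Set by the signal handler, consumed by the next allocation call.
std::atomic<bool> g_dump_requested{false};
// Stand-in for "a dump file was written"; a real tool would write a file here.
int g_dumps_written = 0;

// The handler does nothing except record the request.
extern "C" void on_dump_signal(int) { g_dump_requested.store(true); }

void install_dump_signal(int signo) { std::signal(signo, on_dump_signal); }

// Performs the pending dump, if any, outside of signal context.
void maybe_dump() {
  if (g_dump_requested.exchange(false)) ++g_dumps_written;
}

void* traced_malloc(size_t size) {
  maybe_dump();
  return std::malloc(size);
}

void traced_free(void* ptr) {
  maybe_dump();
  std::free(ptr);
}
```

This is why a process that is idle after `kill -s 47` produces no file: in this pattern nothing happens until one of the wrappers runs again.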
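backtrace_enable_on_signal: with this option, sending signal 45 to the process turns backtrace recording on and off at runtime.
backtrace_dump_on_exit: the trace file is dumped automatically when the process exits; the resulting file name ends in $pid.exit.txt.
backtrace_dump_prefix: sets the path prefix of the trace dump. To place dumps elsewhere, for example with the prefix /sdcard/heap, the dump file becomes /sdcard/heap.$PID.txt.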
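The naming rule above can be written out as a small helper. `dump_file_name` is a hypothetical function used only to make the rule concrete: the prefix is followed by the PID and then either `.txt` or `.exit.txt`.

```cpp
#include <cassert>
#include <string>

// Builds the dump file path from a prefix, a PID, and whether the dump was
// produced at process exit (backtrace_dump_on_exit) or on request.
std::string dump_file_name(const std::string& prefix, int pid, bool on_exit) {
  assert(pid > 0);
  return prefix + "." + std::to_string(pid) + (on_exit ? ".exit.txt" : ".txt");
}
```

For example, the default prefix /data/local/tmp/backtrace_heap with PID 7971 and an exit dump gives /data/local/tmp/backtrace_heap.7971.exit.txt, which matches the last line of the logcat output shown later.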
leak_track: when the program ends, any allocations that were never freed are printed to logcat.
- 12-19 14:54:32.583 7971 7971 E malloc_debug: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- 12-19 14:55:02.585 7971 7971 E malloc_debug: +++ androidtest leaked block of size 3072 at 0x736e78f030 (leak 1 of 7) // leak log: memory still not freed when the program exited
- 12-19 14:55:02.585 7971 7971 E malloc_debug: Backtrace at time of allocation:
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #00 pc 00000000000152b0 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_calloc+432)
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #01 pc 000000000000114c /system/bin/androidtest // addr2line -e symbols/system/bin/androidtest 000000000000114c shows which allocation was never freed
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #02 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #03 pc 000000000000104c /system/bin/androidtest
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #04 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:55:02.585 7971 7971 E malloc_debug: +++ androidtest leaked block of size 88 at 0x736e631030 (leak 2 of 7)
record_allocs: records every malloc, calloc, realloc, and free call made by the process, in the following format:
Threadid: action pointer size
471: malloc 0x72e00330c0 6
471: realloc 0x72e0005220 0x72e00330c0 12
471: free 0x72e012fcc0
471: free 0x72e0005220
471: malloc 0x72e01ade40 56
471: malloc 0x72e00330c0 6
Note: by default, recording stops after 8,000,000 entries.
verbose: enables additional malloc debug log messages, such as:
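A record file in this format can be replayed to see which pointers are still live at the end. The sketch below handles only the three actions shown in the sample above (malloc, realloc, free) and skips every other line; `outstanding_allocations` is a hypothetical helper, not part of any Android tool.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Replays "tid: action pointer ..." records and returns the pointers that
// were allocated but not freed, mapped to their requested size.
std::map<std::string, size_t> outstanding_allocations(const std::string& records) {
  std::map<std::string, size_t> live;
  std::istringstream input(records);
  std::string line;
  while (std::getline(input, line)) {
    std::istringstream fields(line);
    std::string tid, action, ptr, old_ptr;
    size_t size = 0;
    if (!(fields >> tid >> action >> ptr)) continue;  // blank or malformed line
    if (action == "malloc" && (fields >> size)) {
      live[ptr] = size;
    } else if (action == "realloc" && (fields >> old_ptr >> size)) {
      // The sample shows the new pointer first, then the old pointer.
      live.erase(old_ptr);
      live[ptr] = size;
    } else if (action == "free") {
      live.erase(ptr);
    }
  }
  return live;
}
```

Run on the six sample lines above, two pointers remain: 0x72e01ade40 (56 bytes) and 0x72e00330c0 (6 bytes). A pointer that is still live at the end of a recording is not necessarily a leak, only a candidate worth checking.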
- 08-16 15:54:16.060 26947 26947 I libc : /system/bin/app_process64: malloc debug enabled
-
- 09-10 01:03:50.070 557 557 I malloc_debug: /system/bin/audioserver: Run: 'kill -47 557' to dump the backtrace.
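native_heapdump_viewer.py: this tool uses the symbol files to map the allocations in a dump trace file back to source lines for memory that had not been freed at dump time. Its accuracy depends on the maximum stack depth saved in the trace. It is best to take two dumps at different memory usage levels and compare them to see which allocation site is the most suspicious, then narrow the search from there.
Example: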
native_heapdump_viewer.py backtrace_heap.7971.exit.txt --symbols ~/Downloads/symbols --reverse > backtrace_heap.7971.exit.txt.heapout
- #include <stdio.h>
- #include <string.h>
- #include <cutils/log.h>
- #include <stdlib.h>
- #include <sys/time.h>
- #include <unistd.h>
- #include <time.h>
- #include <vector>
-
- class Foo {
- public:
- Foo() {
- arry = new char[1 * 1024 * 1024];
- };
-
- ~Foo() {
- delete[] arry;
- };
- char* arry;
-
- };
- using namespace std;
- int main(int argc, char* argv[])
- {
- char *ptr = NULL, *ptr1 = NULL, *ptr2 = NULL, *ptr3 = NULL;
- Foo foo;
-
- printf("my pid is: %d\n", getpid());
- ptr = (char*)malloc(1 * 1024 * 1024);
- if (!ptr) {
- return -1;
-
- }
- printf("ptr = (char*)malloc(1 * 1024 * 1024): %p\n", ptr);
- memset(ptr, 0, 1 * 1024 * 1024);
-
- // front_guard test
- // setprop libc.debug.malloc.options front_guard=16
- // the 16 bytes just before the memory ptr points to are filled with 0xaa; overwrite one of them
- printf("*(ptr - 16) = %d\n", *(ptr - 16));
- *(ptr - 16) = 0xee;
-
- // rear_guard test
- // setprop libc.debug.malloc.options rear_guard=16
- // the 16 bytes just after the memory ptr points to are filled with 0xbb; overwrite one of them
- printf("*(ptr + 1 * 1024 * 1024) = %d\n", *(ptr + 1 * 1024 * 1024));
- *(ptr + 1 * 1024 * 1024) = 0xee;
-
- printf("arry = new char[1 * 1024 * 1024]: %p\n", foo.arry);
-
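- // the free call triggers the memory corruption log
- free(ptr);
- printf("memory corruption occurring, please check log \n");
-
- // Next, to exercise the backtrace option: after the signal is sent, a malloc/free call is needed to trigger the dump, so sleep 30s and then call malloc again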
- printf("sleep 30s, please send signal 47 (kill -s 47 mypid) to me, in order to trigger dump trace\n");
- sleep(30);
- ptr1 = (char*)malloc(1 * 1024);
- if (ptr1) {
- printf("ptr1 = (char*)malloc(1 * 1024): %p\n", ptr1);
- }
- printf("dump trace success, please check\n\n");
-
- ptr3 = (char*)calloc(3, 1 * 1024); // never freed before exit; its allocation site shows up in the parsed dump trace
- if (ptr3) {
- printf("ptr3 = (char*)calloc(3, 1 * 1024): %p\n", ptr3);
- }
- printf("the program will exit after 10s, and auto dump tracing if you enable backtrace_dump_on_exit option\n");
-
- free(ptr1);
- free(ptr2);
-
- vector<char*> vec(3);
-
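- vec[0] = new char[5]; // never freed before exit; its allocation site shows up in the parsed dump trace
- vec[1] = new char[6]; // never freed before exit; its allocation site shows up in the parsed dump trace
- vec[2] = new char[7]; // never freed before exit; its allocation site shows up in the parsed dump trace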
-
- memcpy(vec[0], "111", 3);
- memcpy(vec[1], "222", 3);
- memcpy(vec[2], "333", 3);
-
- printf("front: %p, begin: %p, size %lu\n", vec.front(), *(vec.begin()), vec.size());
- // clear() alone does not free the pointed-to memory, so the leaks are still reported
- vec.clear();
-
- // To free the memory, run the following loop before clear(); otherwise the arrays are never freed
- //for (vector<char *>::iterator it = vec.begin(); it != vec.end(); it++) {
- // if (NULL != *it)
- // {
- // delete[] *it;
- // *it = NULL;
- // }
- //}
-
- // Neither ptr3 nor the vec arrays are freed. If leak_track is enabled, memory leak logs are printed
- printf("the program exits, ptr3 is not freed, logcat will print memory leak logs\n");
-
- return 0;
- }
- adb shell logcat -s malloc_debug
- --------- beginning of system
- 12-19 14:54:32.582 7971 7971 E malloc_debug: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- 12-19 14:54:32.582 7971 7971 E malloc_debug: +++ ALLOCATION 0x736ddf4630 SIZE 1048576 HAS A CORRUPTED FRONT GUARD // corruption log printed when front_guard or guard is enabled
- 12-19 14:54:32.582 7971 7971 E malloc_debug: allocation[-16] = 0xee (expected 0xaa)
- 12-19 14:54:32.582 7971 7971 E malloc_debug: Backtrace at time of failure:
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #00 pc 0000000000014940 /apex/com.android.runtime/lib64/libc_malloc_debug.so
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #01 pc 0000000000014800 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_free+144)
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #02 pc 00000000000010f4 /system/bin/androidtest
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #03 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #04 pc 000000000000104c /system/bin/androidtest
- 12-19 14:54:32.582 7971 7971 E malloc_debug: #05 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:54:32.583 7971 7971 E malloc_debug: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- 12-19 14:54:32.583 7971 7971 E malloc_debug: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- 12-19 14:54:32.583 7971 7971 E malloc_debug: +++ ALLOCATION 0x736ddf4630 SIZE 1048576 HAS A CORRUPTED REAR GUARD // corruption log printed when rear_guard or guard is enabled
- 12-19 14:54:32.583 7971 7971 E malloc_debug: allocation[1048576] = 0xee (expected 0xbb)
- 12-19 14:54:32.583 7971 7971 E malloc_debug: Backtrace at time of failure:
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #00 pc 000000000001496c /apex/com.android.runtime/lib64/libc_malloc_debug.so
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #01 pc 0000000000014800 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_free+144)
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #02 pc 00000000000010f4 /system/bin/androidtest
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #03 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #04 pc 000000000000104c /system/bin/androidtest
- 12-19 14:54:32.583 7971 7971 E malloc_debug: #05 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:54:32.583 7971 7971 E malloc_debug: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
- 12-19 14:55:02.585 7971 7971 E malloc_debug: +++ androidtest leaked block of size 3072 at 0x736e78f030 (leak 1 of 7) // leak log: memory still not freed when the program exited
- 12-19 14:55:02.585 7971 7971 E malloc_debug: Backtrace at time of allocation:
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #00 pc 00000000000152b0 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_calloc+432)
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #01 pc 000000000000114c /system/bin/androidtest // addr2line -e symbols/system/bin/androidtest 000000000000114c shows which allocation was never freed
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #02 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #03 pc 000000000000104c /system/bin/androidtest
- 12-19 14:55:02.585 7971 7971 E malloc_debug: #04 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:55:02.588 7971 7971 E malloc_debug: +++ androidtest leaked block of size 7 at 0x736e626a60 (leak 5 of 7)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: Backtrace at time of allocation:
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #00 pc 000000000001470c /apex/com.android.runtime/lib64/libc_malloc_debug.so
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #01 pc 0000000000014534 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_malloc+108)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #02 pc 00000000000675c0 /system/lib64/libc++.so (operator new(unsigned long)+32)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #03 pc 00000000000011a8 /system/bin/androidtest // use addr2line to find the allocation that was never freed
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #04 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #05 pc 000000000000104c /system/bin/androidtest
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #06 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:55:02.588 7971 7971 E malloc_debug: +++ androidtest leaked block of size 6 at 0x736e626a10 (leak 6 of 7)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: Backtrace at time of allocation:
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #00 pc 000000000001470c /apex/com.android.runtime/lib64/libc_malloc_debug.so
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #01 pc 0000000000014534 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_malloc+108)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #02 pc 00000000000675c0 /system/lib64/libc++.so (operator new(unsigned long)+32)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #03 pc 0000000000001198 /system/bin/androidtest // use addr2line to find the allocation that was never freed
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #04 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #05 pc 000000000000104c /system/bin/androidtest
- 12-19 14:55:02.588 7971 7971 E malloc_debug: #06 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:55:02.589 7971 7971 E malloc_debug: +++ androidtest leaked block of size 5 at 0x736e6269c0 (leak 7 of 7)
- 12-19 14:55:02.589 7971 7971 E malloc_debug: Backtrace at time of allocation:
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #00 pc 000000000001470c /apex/com.android.runtime/lib64/libc_malloc_debug.so
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #01 pc 0000000000014534 /apex/com.android.runtime/lib64/libc_malloc_debug.so (debug_malloc+108)
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #02 pc 00000000000675c0 /system/lib64/libc++.so (operator new(unsigned long)+32)
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #03 pc 0000000000001188 /system/bin/androidtest // use addr2line to find the allocation that was never freed
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #04 pc 000000000007d86c /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108)
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #05 pc 000000000000104c /system/bin/androidtest
- 12-19 14:55:02.589 7971 7971 E malloc_debug: #06 pc 00000000000533f4 /apex/com.android.runtime/bin/linker64
- 12-19 14:55:02.590 7971 7971 E malloc_debug: Dumping to file: /data/local/tmp/backtrace_heap.7971.exit.txt
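After a week of stability testing, the following problem appeared: the android.hardware.graphics.composer@2.1-service process was using a large amount of memory.
Analysis:
Debugging:
1. The process starts automatically at boot, so the malloc debug property must also be set automatically at boot.
Add the following to any .rc file: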
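on post-fs-data
# The group ownership of the path must match the debugged process; otherwise the dump fails with a permission error (even with SELinux disabled)
mkdir /data/aa
# In this case the process group is system
chown system system /data/aa
# setprop values are length-limited, so keep the path prefix short
setprop libc.debug.malloc.options "backtrace=8 leak_track backtrace_dump_on_exit backtrace_dump_prefix=/data/aa/tra"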
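2. Capture the dump trace files
2.1 After boot, do nothing, and with the screen off send signal 47 to get a dump file.
heapdumpfile is the file used for the final comparison.
2.2 After boot, reproduce the problem, clear all background processes, and with the screen off and the device idle send signal 47 again to get a second dump file.
Steps 2.1 and 2.2 show that after reproducing the problem, under the same screen-off, background-cleared conditions, the debugged process uses 2 MB more memory and the usage never drops. Compare the two heapdumpfile outputs. Once a suspect is found, ask the module owner to review the code carefully and find the root cause.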
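The comparison step can be sketched as a diff over per-call-site byte totals. The sketch assumes the totals have already been extracted from the two heapdumpfile outputs into maps; how to extract them depends on the viewer's output format, which is not covered here. `SiteBytes` and `growth_between` are hypothetical names, and the call-site strings in the usage note are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Bytes still allocated per call site in one snapshot.
using SiteBytes = std::map<std::string, long long>;

// Returns the call sites whose outstanding bytes grew between two snapshots,
// largest growth first (ties broken by name so the order is stable).
std::vector<std::pair<std::string, long long>> growth_between(
    const SiteBytes& before, const SiteBytes& after) {
  std::vector<std::pair<std::string, long long>> grown;
  for (const auto& entry : after) {
    auto it = before.find(entry.first);
    long long previous = (it == before.end()) ? 0 : it->second;
    long long delta = entry.second - previous;
    if (delta > 0) grown.emplace_back(entry.first, delta);
  }
  std::sort(grown.begin(), grown.end(), [](const auto& a, const auto& b) {
    if (a.second != b.second) return a.second > b.second;
    return a.first < b.first;
  });
  return grown;
}
```

With a baseline of {"site_a": 1048576, "site_b": 1024} and a later snapshot of {"site_a": 1048576, "site_b": 3072, "site_c": 2097152}, the result lists site_c (+2097152) and then site_b (+2048). Sites that grow between the idle baseline and the post-reproduction snapshot are the first ones to hand to the module owner.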
- if [[ $1 == "--help" || $1 == "help" ]]; then
- echo "----------usage-------"
- echo "enable dump records_malloc, the malloc debug will use more memory"
- echo "---------command: malloc_debug.sh logpath [records_malloc]"
- echo "disable dump records_malloc"
- echo "---------command: malloc_debug.sh logpath"
- exit
- elif [[ $1 == "" ]]; then
- echo "malloc_debug.sh --help"
- exit
- fi
-
- logpath=$1
- records_malloc_enable=$2
-
- adb root
- adb wait-for-device
- #need to disable selinux, otherwise the dump fails
- adb shell setenforce 0
-
- psstr=`adb shell ps -A | grep "android.hardware.graphics.composer@2.1-service"`
- pid=`echo $psstr | awk -F ' ' '{print $2}'`
- echo "dump pid = "${pid}
-
- time=$logpath"/"$(date +%m%d%H%M%S)
- echo "log path: "$time
- mkdir -p ${time}
-
- #get maps and smaps info
- adb shell cat /proc/$pid/maps > $time/$pid"_maps"
- adb shell cat /proc/$pid/smaps > $time/$pid"_smaps"
-
- #get all process info before dump
- adb shell ps -AT > ./$time/processinfo.txt
- #get current meminfo before dump
- adb shell dumpsys meminfo | grep composer > ./$time/meminfo
-
- #dump heap trace
- adb shell kill -s 47 $pid
-
- if [[ $records_malloc_enable == "records_malloc" ]]; then
- #dump records
- adb shell kill -s 46 $pid
- fi
-
- ################### adjust to match your paths #####
- dump_path="/data/aa"
- records_malloc_file_name="m"
- ################### adjust to match your paths #####
- #check dump file
- while true
- do
- file_cnt=`adb shell ls $dump_path | wc -l`
- dumpcnt=1
- if [[ $records_malloc_enable == "records_malloc" ]]; then
- dumpcnt=2
- fi
-
- if [[ $file_cnt -eq $dumpcnt ]]; then
- echo `adb shell ls $dump_path`
- break
- fi
- sleep 1
- done
-
- #find the process name by pid and records pid
- #get current meminfo after dump
- echo `adb shell dumpsys meminfo | grep composer`
-
- adb pull $dump_path ./$time
-
- dirname="${dump_path##*/}"
-
- mv $time/$dirname/* $time
- rm -r ./$time/$dirname
- #remove the before dump file
- adb shell rm $dump_path/*
-
- #need to re-enable selinux, otherwise some apps will crash
- adb shell setenforce 1
-
- cd $time
-
- #--reverse reverses the backtraces so the tree starts from the leaf frames
- native_heapdump_viewer.py *$pid*".txt" --symbols ~/Downloads/symbols --reverse > heapdumpfile
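If you hit a dump permission problem, it is caused by the dump path's group not matching the process's group.
The dump trace file records every allocation that had not been freed when the dump was triggered, so not every entry is a leak; compare the files carefully.
The dump trace file records where each allocation was made. Read the code to find where the pointer is used and where it might fail to be freed.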