赞
踩
利用valgrind callgrind分析函数时间消耗(尤其在有网络、磁盘等有IO操作下,分析调用时间。在被IO阻塞情况下,不会明显占用CPU资源)
编写测试程序
- #include <stdio.h>
- #include <unistd.h>
-
- void perf1() {
- unsigned int i = 0;
- while (i < 200) {
- int b = i++;
- }
- usleep(10);
- }
-
- void perf() {
- int i = 0;
- while (i++ < 100000) {
- int b = i;
- }
- i = 0;
- while (i++ < 10) {
- perf1();
- }
- }
-
- int main() {
- unsigned int i = 0;
- while (i++ < 10000) {
- perf();
- }
- return 0;
- }
编译程序,注意加上-g选项,便于valgrind打印出源码对应行数
[yeqiang@localhost perfTest]$ gcc -g -O0 main.c
[yeqiang@localhost perfTest]$
[yeqiang@localhost perfTest]$ valgrind --tool=callgrind ./a.out
==15597== Callgrind, a call-graph generating cache profiler
==15597== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
==15597== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15597== Command: ./a.out
==15597==
==15597== For interactive control, run 'callgrind_control -h'.
==15597==
==15597== Events : Ir
==15597== Collected : 7124669060
==15597==
==15597== I refs: 7,124,669,060
此时当前目录下产生文件:callgrind.out.15597(其中15597是进程id号)
利用valgrind callgrind_annotate获取报表,采用--inclusive=no参数
[yeqiang@localhost perfTest]$ valgrind callgrind_annotate --inclusive=no callgrind.out.15597
==15670== Memcheck, a memory error detector
==15670== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==15670== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15670== Command: /usr/bin/callgrind_annotate --inclusive=no callgrind.out.15597
==15670==
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.15597' (creator: callgrind-3.10.1)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 1021390764
Trigger: Program termination
Profiled target: ./a.out (PID 15597, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
7,124,669,060 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
7,000,900,000 main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
121,300,004 main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
==15670==
==15670== HEAP SUMMARY:
==15670== in use at exit: 1,572,619 bytes in 6,334 blocks
==15670== total heap usage: 36,828 allocs, 30,494 frees, 14,135,079 bytes allocated
==15670==
==15670== LEAK SUMMARY:
==15670== definitely lost: 27,520 bytes in 16 blocks
==15670== indirectly lost: 1,543,568 bytes in 6,312 blocks
==15670== possibly lost: 67 bytes in 1 blocks
==15670== still reachable: 1,464 bytes in 5 blocks
==15670== suppressed: 0 bytes in 0 blocks
==15670== Rerun with --leak-check=full to see details of leaked memory
==15670==
==15670== For counts of detected and suppressed errors, rerun with: -v
==15670== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[yeqiang@localhost perfTest]$ valgrind callgrind_annotate --inclusive=yes callgrind.out.15597
==15734== Memcheck, a memory error detector
==15734== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==15734== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15734== Command: /usr/bin/callgrind_annotate --inclusive=yes callgrind.out.15597
==15734==
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.15597' (creator: callgrind-3.10.1)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 1021390764
Trigger: Program termination
Profiled target: ./a.out (PID 15597, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
7,124,669,060 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
7,124,669,060 ???:0x0000000000000c80 [/usr/lib64/ld-2.21.so]
7,124,573,517 ???:_start [/home/yeqiang/CLionProjects/perfTest/a.out]
7,124,572,545 /usr/src/debug/glibc-2.21/csu/libc-start.c:(below main) [/usr/lib64/libc-2.21.so]
7,124,570,832 /home/yeqiang/CLionProjects/perfTest/main.c:main
7,124,570,832 main.c:main [/home/yeqiang/CLionProjects/perfTest/a.out]
7,124,500,819 main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
123,600,819 main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
2,300,000 /usr/src/debug/glibc-2.21/misc/../sysdeps/unix/sysv/linux/usleep.c:usleep [/usr/lib64/libc-2.21.so]
==15734==
==15734== HEAP SUMMARY:
==15734== in use at exit: 1,564,876 bytes in 6,213 blocks
==15734== total heap usage: 37,158 allocs, 30,945 frees, 14,138,873 bytes allocated
==15734==
==15734== LEAK SUMMARY:
==15734== definitely lost: 27,520 bytes in 16 blocks
==15734== indirectly lost: 1,535,892 bytes in 6,192 blocks
==15734== possibly lost: 0 bytes in 0 blocks
==15734== still reachable: 1,464 bytes in 5 blocks
==15734== suppressed: 0 bytes in 0 blocks
==15734== Rerun with --leak-check=full to see details of leaked memory
==15734==
==15734== For counts of detected and suppressed errors, rerun with: -v
==15734== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[yeqiang@localhost perfTest]$
说明:
7,124,570,832 main.c:main [/home/yeqiang/CLionProjects/perfTest/a.out]
表示 main.c 中的 main 函数总共消耗时间为7,124,570,832纳秒,约7.1秒。
7,124,500,819 main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
表示 main.c 中的perf函数总共消耗时间为7,124,500,819纳秒。是所有perf被调用消耗的时间总和。
--inclusive=yes表示报表统计的函数的时间消耗包含其内部子函数的时间消耗,如果=no表示不包含内部其他函数的时间消耗。如在yes参数下
7,124,500,819 main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
123,600,819 main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
在no参数下
7,000,900,000 main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
121,300,004 main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
可以看出no参数对比yes参数,perf的时间消耗减去了perf1的时间消耗。
----------------------------------------------------------------------------------------------------------------------------------------------------
接下来利用perf分析CPU消耗
开始采样:
[yeqiang@localhost perfTest]$ perf record -g ./a.out
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.406 MB perf.data (16143 samples) ]
[yeqiang@localhost perfTest]$
在当前目录下产生perf.data文件
利用perf分析CPU消耗:
[yeqiang@localhost perfTest]$ perf report -g
Samples: 16K of event 'cycles:pp', Event count (approx.): 6638999212
Children Self Command Shared Object Symbol
+ 99.99% 0.01% a.out a.out [.] main
+ 99.99% 0.00% a.out libc-2.21.so [.] __libc_start_main
+ 99.99% 0.00% a.out [unknown] [.] 0x080e258d4c544155
+ 97.07% 94.47% a.out a.out [.] perf
+ 2.92% 0.04% a.out libc-2.21.so [.] __nanosleep
+ 2.82% 0.11% a.out [kernel.kallsyms] [k] entry_SYSCALL_64_fastpath
+ 2.68% 0.08% a.out [kernel.kallsyms] [k] sys_nanosleep
+ 2.58% 0.12% a.out [kernel.kallsyms] [k] hrtimer_nanosleep
+ 2.40% 0.16% a.out [kernel.kallsyms] [k] do_nanosleep
+ 1.94% 1.92% a.out a.out [.] perf1
+ 1.38% 0.01% a.out [kernel.kallsyms] [k] schedule
+ 1.33% 0.12% a.out [kernel.kallsyms] [k] __schedule
+ 0.81% 0.10% a.out [kernel.kallsyms] [k] hrtimer_start_range_ns
+ 0.62% 0.62% a.out [kernel.kallsyms] [k] native_write_msr_safe
+ 0.55% 0.01% a.out [kernel.kallsyms] [k] deactivate_task
+ 0.48% 0.02% a.out [kernel.kallsyms] [k] dequeue_task_fair
+ 0.47% 0.01% a.out [kernel.kallsyms] [k] apic_timer_interrupt
+ 0.46% 0.00% a.out [kernel.kallsyms] [k] smp_apic_timer_interrupt
+ 0.45% 0.14% a.out [kernel.kallsyms] [k] dequeue_entity
+ 0.41% 0.01% a.out [kernel.kallsyms] [k] tick_program_event
+ 0.40% 0.02% a.out [kernel.kallsyms] [k] clockevents_program_event
+ 0.38% 0.01% a.out [kernel.kallsyms] [k] irq_exit
+ 0.35% 0.01% a.out [kernel.kallsyms] [k] __do_softirq
+ 0.32% 0.00% a.out [kernel.kallsyms] [k] lapic_next_deadline
+ 0.28% 0.07% a.out [kernel.kallsyms] [k] finish_task_switch
+ 0.23% 0.23% a.out [kernel.kallsyms] [k] update_blocked_averages
+ 0.21% 0.03% a.out [kernel.kallsyms] [k] __perf_event_task_sched_in
+ 0.21% 0.04% a.out [kernel.kallsyms] [k] __perf_event_task_sched_out
+ 0.21% 0.01% a.out [kernel.kallsyms] [k] run_rebalance_domains
+ 0.21% 0.00% a.out [kernel.kallsyms] [k] rebalance_domains
+ 0.19% 0.00% a.out [kernel.kallsyms] [k] perf_pmu_enable.part.85
+ 0.19% 0.01% a.out [kernel.kallsyms] [k] intel_pmu_enable_all
+ 0.19% 0.01% a.out [kernel.kallsyms] [k] enqueue_hrtimer
+ 0.19% 0.00% a.out [kernel.kallsyms] [k] x86_pmu_enable
+ 0.18% 0.00% a.out [kernel.kallsyms] [k] perf_event_context_sched_in
+ 0.18% 0.02% a.out [kernel.kallsyms] [k] ctx_sched_out
+ 0.17% 0.14% a.out [kernel.kallsyms] [k] timerqueue_add
+ 0.16% 0.08% a.out [kernel.kallsyms] [k] update_cfs_shares
+ 0.16% 0.02% a.out [kernel.kallsyms] [k] __intel_pmu_enable_all
+ 0.13% 0.01% a.out [kernel.kallsyms] [k] x86_pmu_disable
+ 0.13% 0.00% a.out [kernel.kallsyms] [k] perf_pmu_disable.part.83
+ 0.13% 0.00% a.out [kernel.kallsyms] [k] local_apic_timer_interrupt
可以看到,由于perf内有大量消耗CPU的循环操作,CPU消耗明显高于perf1
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。