当前位置:   article > 正文

Linux C 程序性能测试 valgrind callgrind分析函数耗时、perf分析函数CPU消耗_valgrind cpu使用率分析

valgrind cpu使用率分析

利用valgrind callgrind分析函数时间消耗(尤其在有网络、磁盘等有IO操作下,分析调用时间。在被IO阻塞情况下,不会明显占用CPU资源)

编写测试程序

  1. #include <stdio.h>
  2. #include <unistd.h>
  3. void perf1() {
  4. unsigned int i = 0;
  5. while (i < 200) {
  6. int b = i++;
  7. }
  8. usleep(10);
  9. }
  10. void perf() {
  11. int i = 0;
  12. while (i++ < 100000) {
  13. int b = i;
  14. }
  15. i = 0;
  16. while (i++ < 10) {
  17. perf1();
  18. }
  19. }
  20. int main() {
  21. unsigned int i = 0;
  22. while (i++ < 10000) {
  23. perf();
  24. }
  25. return 0;
  26. }
编译程序,注意加上-g选项,便于valgrind打印出源码对应行数

[yeqiang@localhost perfTest]$ gcc -g -O0 main.c 
[yeqiang@localhost perfTest]$ 

利用valgrind的 callgrind 工具 启动程序,开始采样

[yeqiang@localhost perfTest]$ valgrind --tool=callgrind ./a.out 
==15597== Callgrind, a call-graph generating cache profiler
==15597== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
==15597== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15597== Command: ./a.out
==15597== 
==15597== For interactive control, run 'callgrind_control -h'.
==15597== 
==15597== Events    : Ir
==15597== Collected : 7124669060
==15597== 
==15597== I   refs:      7,124,669,060
此时当前目录下产生文件:callgrind.out.15597(其中15597是进程id号)

利用valgrind callgrind_annotate获取报表,采用--inclusive=no参数

[yeqiang@localhost perfTest]$ valgrind callgrind_annotate --inclusive=no callgrind.out.15597 
==15670== Memcheck, a memory error detector
==15670== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==15670== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15670== Command: /usr/bin/callgrind_annotate --inclusive=no callgrind.out.15597
==15670== 
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.15597' (creator: callgrind-3.10.1)
--------------------------------------------------------------------------------
I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 1021390764
Trigger: Program termination
Profiled target:  ./a.out (PID 15597, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:     
User annotated:   
Auto-annotation:  off

--------------------------------------------------------------------------------
           Ir 
--------------------------------------------------------------------------------
7,124,669,060  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
7,000,900,000  main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
  121,300,004  main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]

==15670== 
==15670== HEAP SUMMARY:
==15670==     in use at exit: 1,572,619 bytes in 6,334 blocks
==15670==   total heap usage: 36,828 allocs, 30,494 frees, 14,135,079 bytes allocated
==15670== 
==15670== LEAK SUMMARY:
==15670==    definitely lost: 27,520 bytes in 16 blocks
==15670==    indirectly lost: 1,543,568 bytes in 6,312 blocks
==15670==      possibly lost: 67 bytes in 1 blocks
==15670==    still reachable: 1,464 bytes in 5 blocks
==15670==         suppressed: 0 bytes in 0 blocks
==15670== Rerun with --leak-check=full to see details of leaked memory
==15670== 
==15670== For counts of detected and suppressed errors, rerun with: -v
==15670== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

利用valgrind callgrind_annotate获取报表,采用--inclusive=yes参数

[yeqiang@localhost perfTest]$ valgrind callgrind_annotate --inclusive=yes callgrind.out.15597 
==15734== Memcheck, a memory error detector
==15734== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==15734== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==15734== Command: /usr/bin/callgrind_annotate --inclusive=yes callgrind.out.15597
==15734== 
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.15597' (creator: callgrind-3.10.1)
--------------------------------------------------------------------------------
I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 1021390764
Trigger: Program termination
Profiled target:  ./a.out (PID 15597, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:     
User annotated:   
Auto-annotation:  off

--------------------------------------------------------------------------------
           Ir 
--------------------------------------------------------------------------------
7,124,669,060  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
7,124,669,060  ???:0x0000000000000c80 [/usr/lib64/ld-2.21.so]
7,124,573,517  ???:_start [/home/yeqiang/CLionProjects/perfTest/a.out]
7,124,572,545  /usr/src/debug/glibc-2.21/csu/libc-start.c:(below main) [/usr/lib64/libc-2.21.so]
7,124,570,832  /home/yeqiang/CLionProjects/perfTest/main.c:main
7,124,570,832  main.c:main [/home/yeqiang/CLionProjects/perfTest/a.out]
7,124,500,819  main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
  123,600,819  main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
    2,300,000  /usr/src/debug/glibc-2.21/misc/../sysdeps/unix/sysv/linux/usleep.c:usleep [/usr/lib64/libc-2.21.so]

==15734== 
==15734== HEAP SUMMARY:
==15734==     in use at exit: 1,564,876 bytes in 6,213 blocks
==15734==   total heap usage: 37,158 allocs, 30,945 frees, 14,138,873 bytes allocated
==15734== 
==15734== LEAK SUMMARY:
==15734==    definitely lost: 27,520 bytes in 16 blocks
==15734==    indirectly lost: 1,535,892 bytes in 6,192 blocks
==15734==      possibly lost: 0 bytes in 0 blocks
==15734==    still reachable: 1,464 bytes in 5 blocks
==15734==         suppressed: 0 bytes in 0 blocks
==15734== Rerun with --leak-check=full to see details of leaked memory
==15734== 
==15734== For counts of detected and suppressed errors, rerun with: -v
==15734== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[yeqiang@localhost perfTest]$ 
说明:

7,124,570,832  main.c:main [/home/yeqiang/CLionProjects/perfTest/a.out]
表示 main.c 中的 main 函数总共消耗时间为7,124,570,832纳秒,约7.1秒。

7,124,500,819  main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
表示 main.c 中的perf函数总共消耗时间为7,124,500,819纳秒。是所有perf被调用消耗的时间总和。

--inclusive=yes表示报表统计的函数的时间消耗包含其内部子函数的时间消耗,如果=no表示不包含内部其他函数的时间消耗。如在yes参数下

7,124,500,819  main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
  123,600,819  main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]

在no参数下

7,000,900,000  main.c:perf [/home/yeqiang/CLionProjects/perfTest/a.out]
  121,300,004  main.c:perf1 [/home/yeqiang/CLionProjects/perfTest/a.out]
可以看出no参数对比yes参数,perf的时间消耗减去了perf1的时间消耗。

----------------------------------------------------------------------------------------------------------------------------------------------------

接下来利用perf分析CPU消耗

开始采样:

[yeqiang@localhost perfTest]$ perf record -g ./a.out
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.406 MB perf.data (16143 samples) ]
[yeqiang@localhost perfTest]$ 
在当前目录下产生perf.data文件

利用perf分析CPU消耗:

[yeqiang@localhost perfTest]$ perf report -g

Samples: 16K of event 'cycles:pp', Event count (approx.): 6638999212
  Children      Self  Command  Shared Object      Symbol
+   99.99%     0.01%  a.out    a.out              [.] main
+   99.99%     0.00%  a.out    libc-2.21.so       [.] __libc_start_main
+   99.99%     0.00%  a.out    [unknown]          [.] 0x080e258d4c544155
+   97.07%    94.47%  a.out    a.out              [.] perf
+    2.92%     0.04%  a.out    libc-2.21.so       [.] __nanosleep
+    2.82%     0.11%  a.out    [kernel.kallsyms]  [k] entry_SYSCALL_64_fastpath
+    2.68%     0.08%  a.out    [kernel.kallsyms]  [k] sys_nanosleep
+    2.58%     0.12%  a.out    [kernel.kallsyms]  [k] hrtimer_nanosleep
+    2.40%     0.16%  a.out    [kernel.kallsyms]  [k] do_nanosleep
+    1.94%     1.92%  a.out    a.out              [.] perf1
+    1.38%     0.01%  a.out    [kernel.kallsyms]  [k] schedule
+    1.33%     0.12%  a.out    [kernel.kallsyms]  [k] __schedule
+    0.81%     0.10%  a.out    [kernel.kallsyms]  [k] hrtimer_start_range_ns
+    0.62%     0.62%  a.out    [kernel.kallsyms]  [k] native_write_msr_safe
+    0.55%     0.01%  a.out    [kernel.kallsyms]  [k] deactivate_task
+    0.48%     0.02%  a.out    [kernel.kallsyms]  [k] dequeue_task_fair
+    0.47%     0.01%  a.out    [kernel.kallsyms]  [k] apic_timer_interrupt
+    0.46%     0.00%  a.out    [kernel.kallsyms]  [k] smp_apic_timer_interrupt
+    0.45%     0.14%  a.out    [kernel.kallsyms]  [k] dequeue_entity
+    0.41%     0.01%  a.out    [kernel.kallsyms]  [k] tick_program_event
+    0.40%     0.02%  a.out    [kernel.kallsyms]  [k] clockevents_program_event
+    0.38%     0.01%  a.out    [kernel.kallsyms]  [k] irq_exit
+    0.35%     0.01%  a.out    [kernel.kallsyms]  [k] __do_softirq
+    0.32%     0.00%  a.out    [kernel.kallsyms]  [k] lapic_next_deadline
+    0.28%     0.07%  a.out    [kernel.kallsyms]  [k] finish_task_switch
+    0.23%     0.23%  a.out    [kernel.kallsyms]  [k] update_blocked_averages
+    0.21%     0.03%  a.out    [kernel.kallsyms]  [k] __perf_event_task_sched_in
+    0.21%     0.04%  a.out    [kernel.kallsyms]  [k] __perf_event_task_sched_out
+    0.21%     0.01%  a.out    [kernel.kallsyms]  [k] run_rebalance_domains
+    0.21%     0.00%  a.out    [kernel.kallsyms]  [k] rebalance_domains
+    0.19%     0.00%  a.out    [kernel.kallsyms]  [k] perf_pmu_enable.part.85
+    0.19%     0.01%  a.out    [kernel.kallsyms]  [k] intel_pmu_enable_all
+    0.19%     0.01%  a.out    [kernel.kallsyms]  [k] enqueue_hrtimer
+    0.19%     0.00%  a.out    [kernel.kallsyms]  [k] x86_pmu_enable
+    0.18%     0.00%  a.out    [kernel.kallsyms]  [k] perf_event_context_sched_in
+    0.18%     0.02%  a.out    [kernel.kallsyms]  [k] ctx_sched_out
+    0.17%     0.14%  a.out    [kernel.kallsyms]  [k] timerqueue_add
+    0.16%     0.08%  a.out    [kernel.kallsyms]  [k] update_cfs_shares
+    0.16%     0.02%  a.out    [kernel.kallsyms]  [k] __intel_pmu_enable_all
+    0.13%     0.01%  a.out    [kernel.kallsyms]  [k] x86_pmu_disable
+    0.13%     0.00%  a.out    [kernel.kallsyms]  [k] perf_pmu_disable.part.83
+    0.13%     0.00%  a.out    [kernel.kallsyms]  [k] local_apic_timer_interrupt
可以看到,由于perf内有大量消耗CPU的循环操作,CPU消耗明显高于perf1







声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/Cpp五条/article/detail/180981
推荐阅读
相关标签
  

闽ICP备14008679号