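I recently added a feature to a driver: looking up a process's PID from a given process name. The driver crashed, so let's track down the problem together!
First, here is the crash information from dmesg: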
[ 4534.975026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000430
[ 4534.976059] IP: [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.977065] PGD 2195a2067 PUD 219c6f067 PMD 0
[ 4534.978066] Oops: 0000 [#3] SMP
[ 4534.979027] Modules linked in: bts(OE) chr(OE) hid_generic usbhid hid rfcomm bnep bluetooth intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm arc4 ath9k amdkfd ath9k_common ath9k_hw amd_iommu_v2 ath radeon snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec mac80211 snd_hda_core aesni_intel aes_x86_64 joydev snd_hwdep hp_wmi snd_pcm sparse_keymap input_leds lrw serio_raw gf128mul glue_helper ppdev ablk_helper lp parport_pc snd_seq_midi cfg80211 snd_seq_midi_event snd_rawmidi snd_seq ttm cryptd snd_seq_device snd_timer mei_me drm_kms_helper mei drm snd i2c_algo_bit soundcore hp_accel lpc_ich lis3lv02d tpm_infineon input_polldev parport video 8250_fintek hp_wireless mac_hid wmi psmouse ahci libahci firewire_ohci sdhci_pci firewire_core e1000e sdhci crc_itu_t ptp pps_core [last unloaded: bts]
[ 4534.985521] CPU: 0 PID: 3462 Comm: ops_main Tainted: G D W OE 4.2.0-42-generic #49~14.04.1-Ubuntu
[ 4534.986561] Hardware name: Hewlett-Packard HP ProBook 6470b/179C, BIOS 68ICE Ver. F.45 10/07/2013
[ 4534.987607] task: ffff8802203a5280 ti: ffff880220298000 task.ti: ffff880220298000
[ 4534.988636] RIP: 0010:[<ffffffffc0747e78>] [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4534.989674] RSP: 0018:ffff88022029bd38 EFLAGS: 00010246
[ 4534.990663] RAX: ffffffff81c15840 RBX: 0000000000000006 RCX: 0000000000000002
[ 4534.991635] RDX: 0000000000000002 RSI: ffff88022029bd51 RDI: ffff8802203a5859
[ 4534.992587] RBP: ffff88022029be98 R08: ffffffffc074b060 R09: 315f6e65706f5f34
[ 4534.993573] R10: 00007fd6ff1ba6a0 R11: 0000000000000246 R12: 0000000000000000
[ 4534.994497] R13: ffffffff81c15840 R14: ffff8802203a5858 R15: ffff8800b8e7b000
[ 4534.995411] FS: 00007fd6ff3cb740(0000) GS:ffff88023ec00000(0000) knlGS:0000000000000000
[ 4534.996324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4534.997232] CR2: 0000000000000430 CR3: 000000022b479000 CR4: 00000000001406f0
[ 4534.998334] Stack:
[ 4534.999528] ffff88022029bd68 ffffffff811f833e ffff88022029bd68 ffffffffc074a201
[ 4535.000466] ffffffff81c15840 7700007472617473 6174732065746972 6563617274207472
[ 4535.001395] 646e616d6d6f6320 253a726f72726520 7320737462000a73 6f72726520706f74
[ 4535.002401] Call Trace:
[ 4535.003364] [<ffffffff811f833e>] ? terminate_walk+0x6e/0xe0
[ 4535.004328] [<ffffffff811ede38>] __vfs_write+0x18/0x40
[ 4535.005283] [<ffffffff811ee479>] vfs_write+0xa9/0x190
[ 4535.006244] [<ffffffff810dbefd>] ? call_rcu_sched+0x1d/0x20
[ 4535.007182] [<ffffffff811ef1e6>] SyS_write+0x46/0xa0
[ 4535.008111] [<ffffffff817c36f2>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 4535.009038] Code: 00 00 49 8b 84 24 40 03 00 00 48 89 85 c0 fe ff ff 4c 8b ad c0 fe ff ff 4d 8d a5 c0 fc ff ff 49 81 fc 00 55 c1 81 75 bc 45 31 e4 <45> 8b a4 24 30 04 00 00 48 c7 c7 1d a2 74 c0 31 c0 44 89 e6 e8
[ 4535.011028] RIP [<ffffffffc0747e78>] bts_write+0x1b8/0x830 [bts]
[ 4535.011968] RSP <ffff88022029bd38>
[ 4535.012902] CR2: 0000000000000430
[ 4535.013850] ---[ end trace bd7d268405d6447e ]---
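The dmesg log shows "BUG: unable to handle kernel NULL pointer dereference at 0000000000000430", which says plainly that we hit a NULL pointer dereference. The second piece of information is just as important: "bts_write+0x1b8/0x830 [bts]". It gives the faulting function and the offset into it: the fault is in bts_write, at offset 0x1b8 (0x830 is the total size of the function, and [bts] is the module it belongs to).
With this information, the first thing to do is disassemble the driver's intermediate object file xxx.o. The objdump that ships with Linux is enough for this: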
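To make the layout of that oops field concrete, here is a minimal userspace sketch that splits a string of the form func+offset/size [module] into its parts. The struct oops_loc type and the parse_oops_loc function are hypothetical names made up for this illustration; they are not part of the kernel or of the driver.

```c
#include <stdio.h>

/* Pieces of an oops location such as "bts_write+0x1b8/0x830 [bts]".
 * Hypothetical helper type, for illustration only. */
struct oops_loc {
    char func[64];        /* function containing the faulting instruction */
    unsigned long offset; /* byte offset of that instruction inside func */
    unsigned long size;   /* total size of func in bytes */
    char module[64];      /* module name, empty string if none was given */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_oops_loc(const char *text, struct oops_loc *out)
{
    int n;

    out->module[0] = '\0';
    /* %lx accepts the leading "0x"; the [module] suffix is optional. */
    n = sscanf(text, "%63[^+]+%lx/%lx [%63[^]]]",
               out->func, &out->offset, &out->size, out->module);
    return n >= 3 ? 0 : -1;
}
```

Feeding it the string from the log above yields the function name bts_write, offset 0x1b8, size 0x830 and module bts, which are exactly the values used in the rest of this walkthrough.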
// If you don't know the exact options, objdump --help lists them (note that -h prints section headers, not help)
curtis@curtis-virtual-machine:/mnt/hgfs/share/write_code/runqueue$ objdump --help
Usage: objdump <option(s)> <file(s)>
 Display information from object <file(s)>.
 At least one of the following switches must be given:
  -a, --archive-headers    Display archive header information
  -f, --file-headers       Display the contents of the overall file header
  -p, --private-headers    Display object format specific file header contents
  -P, --private=OPT,OPT... Display object format specific contents
  -h, --[section-]headers  Display the contents of the section headers
  -x, --all-headers        Display the contents of all headers
  -d, --disassemble        Display assembler contents of executable sections
  -D, --disassemble-all    Display assembler contents of all sections
  -S, --source             Intermix source code with disassembly
  -s, --full-contents      Display the full contents of all sections requested
  -g, --debugging          Display debug information in object file
  -e, --debugging-tags     Display debug information using ctags style
  -G, --stabs              Display (in raw form) any STABS info in the file
  -W[lLiaprmfFsoRt] or
  --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
          =frames-interp,=str,=loc,=Ranges,=pubtypes,
          =gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
          =addr,=cu_index]
                           Display DWARF info in the file
  -t, --syms               Display the contents of the symbol table(s)
  -T, --dynamic-syms       Display the contents of the dynamic symbol table
  -r, --reloc              Display the relocation entries in the file
  -R, --dynamic-reloc      Display the dynamic relocation entries in the file
  @<file>                  Read options from <file>
  -v, --version            Display this program's version number
  -i, --info               List object formats and architectures supported
  -H, --help               Display this information

 The following switches are optional:
  -b, --target=BFDNAME           Specify the target object format as BFDNAME
  -m, --architecture=MACHINE     Specify the target architecture as MACHINE
  -j, --section=NAME             Only display information for section NAME
  -M, --disassembler-options=OPT Pass text OPT on to the disassembler
  -EB --endian=big               Assume big endian format when disassembling
  -EL --endian=little            Assume little endian format when disassembling
      --file-start-context       Include context from start of file (with -S)
  -I, --include=DIR              Add DIR to search list for source files
  -l, --line-numbers             Include line numbers and filenames in output
  -F, --file-offsets             Include file offsets when displaying information
  -C, --demangle[=STYLE]         Decode mangled/processed symbol names
                                  The STYLE, if specified, can be `auto', `gnu',
                                  `lucid', `arm', `hp', `edg', `gnu-v3', `java'
                                  or `gnat'
  -w, --wide                     Format output for more than 80 columns
  -z, --disassemble-zeroes       Do not skip blocks of zeroes when disassembling
      --start-address=ADDR       Only process data whose address is >= ADDR
      --stop-address=ADDR        Only process data whose address is <= ADDR
      --prefix-addresses         Print complete address alongside disassembly
      --[no-]show-raw-insn       Display hex alongside symbolic disassembly
      --insn-width=WIDTH         Display WIDTH bytes on a single line for -d
      --adjust-vma=OFFSET        Add OFFSET to all displayed section addresses
      --special-syms             Include special symbols in symbol dumps
      --prefix=PREFIX            Add PREFIX to absolute paths for -S
      --prefix-strip=LEVEL       Strip initial directory names for -S
      --dwarf-depth=N            Do not display DIEs at depth N or greater
      --dwarf-start=N            Display DIEs starting with N, at the same depth
                                 or deeper
      --dwarf-check              Make additional dwarf internal consistency checks.

objdump: supported targets: elf64-x86-64 elf32-i386 elf32-x86-64 a.out-i386-linux pei-i386 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-x86-64 pe-i386 plugin srec symbolsrec verilog tekhex binary ihex
objdump: supported architectures: i386 i386:x86-64 i386:x64-32 i8086 i386:intel i386:x86-64:intel i386:x64-32:intel i386:nacl i386:x86-64:nacl i386:x64-32:nacl l1om l1om:intel k1om k1om:intel plugin

The following i386/x86-64 specific disassembler options are supported for use
with the -M switch (multiple options should be separated by commas):
  x86-64      Disassemble in 64bit mode
  i386        Disassemble in 32bit mode
  i8086       Disassemble in 16bit mode
  att         Display instruction in AT&T syntax
  intel       Display instruction in Intel syntax
  att-mnemonic
              Display instruction in AT&T mnemonic
  intel-mnemonic
              Display instruction in Intel mnemonic
  addr64      Assume 64bit address size
  addr32      Assume 32bit address size
  addr16      Assume 16bit address size
  data32      Assume 32bit data size
  data16      Assume 16bit data size
  suffix      Always display instruction suffix in AT&T syntax
Report bugs to <http://www.sourceware.org/bugzilla/>.

// Use -D to disassemble all sections, and redirect the output to a file for later viewing
curtis@curtis-HP-ProBook-6470b:~/Desktop/per_bts/drv$ objdump bts.o -D > err.txt
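By default objdump prints AT&T assembly syntax; if you are not used to it, add -M intel to switch to Intel syntax. The next step is to find the base address of the faulting function. Open err.txt in vim and search for bts_write: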
0000000000000cc0 <bts_write>:
     cc0:  e8 00 00 00 00          callq  cc5 <bts_write+0x5>
     cc5:  55                      push   %rbp
     cc6:  b9 20 00 00 00          mov    $0x20,%ecx
     ccb:  48 89 e5                mov    %rsp,%rbp
     cce:  41 57                   push   %r15
     cd0:  41 56                   push   %r14
     cd2:  45 31 f6                xor    %r14d,%r14d
     cd5:  41 55                   push   %r13
     cd7:  4c 8d ad c8 fe ff ff    lea    -0x138(%rbp),%r13
     cde:  41 54                   push   %r12
     ce0:  53                      push   %rbx
     ce1:  48 89 d3                mov    %rdx,%rbx
     ce4:  ba fe 00 00 00          mov    $0xfe,%edx
     ce9:  48 81 ec 38 01 00 00    sub    $0x138,%rsp
     cf0:  4c 8b bf d0 00 00 00    mov    0xd0(%rdi),%r15
     cf7:  48 8d bd c8 fe ff ff    lea    -0x138(%rbp),%rdi
     cfe:  65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
     d05:  00 00
     d07:  48 89 45 c8             mov    %rax,-0x38(%rbp)
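From the output above, the base address of the function is 0xcc0. To find the exact faulting line we add the offset 0x1b8: 0xcc0 + 0x1b8 = 0xe78.
The next step is to map that address to a line of source code, which calls for another tool: addr2line.
Note: if you built the driver without adding "KBUILD_CFLAGS+= -g" to the Makefile, the object file has no debug information and addr2line cannot find the line number that corresponds to the oops!
An example Makefile: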
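The arithmetic above can be written as a small helper. This is only a sketch: the name fault_addr is hypothetical, and it assumes the function base comes from the objdump listing of the same object file that produced the oops, so that the offset stays within the function size reported in the log.

```c
/* Compute the object-file address to look up: the function's start
 * address in the disassembly plus the offset reported by the oops.
 * Returns 0 and stores the result in *addr, or -1 when the offset does
 * not fall inside the function (offset >= size), which would suggest the
 * object file does not match the running module. */
int fault_addr(unsigned long base, unsigned long offset,
               unsigned long size, unsigned long *addr)
{
    if (offset >= size)
        return -1;
    *addr = base + offset;
    return 0;
}
```

With base 0xcc0, offset 0x1b8 and size 0x830 it produces 0xe78, the address used in the following step.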
obj-m += good.o
KBUILD_CFLAGS+= -g
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
curtis@curtis-HP-ProBook-6470b:~/Desktop/per_bts/drv$ addr2line -h
Usage: addr2line [option(s)] [addr(s)]
 Convert addresses into line number/file name pairs.
 If no addresses are specified on the command line, they will be read from stdin
 The options are:
  @<file>                Read options from <file>
  -a --addresses         Show addresses
  -b --target=<bfdname>  Set the binary file format
  -e --exe=<executable>  Set the input file name (default is a.out)
  -i --inlines           Unwind inlined functions
  -j --section=<name>    Read section-relative offsets instead of addresses
  -p --pretty-print      Make the output easier to read for humans
  -s --basenames         Strip directory names
  -f --functions         Show function names
  -C --demangle[=style]  Demangle function names
  -h --help              Display this information
  -v --version           Display the program's version

curtis@curtis-HP-ProBook-6470b:~/Desktop/per_bts/drv$ addr2line -C -f -e bts.o e78
find_pid
/home/curtis/Desktop/per_bts/drv/bts_driver.c:108
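This successfully identifies the faulting function and line number: the function is find_pid and the line is 108. (The oops named bts_write rather than find_pid, presumably because the compiler inlined find_pid into bts_write.) Here is that function in the source: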
static int find_pid(char *string_name)
{
    unsigned int pid;
    /* Bug was: char *find_name = &string_name; (address of the parameter, not the string) */
    char *find_name = string_name;
    struct task_struct *task;

    task = find_task(find_name);
    pid = task->pid;    /* line 108: oops here when find_task() returns NULL */
    printk("Have find pid is %d\n", pid);
    return pid;
}
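Closer analysis shows that find_task did not return the task_struct of the process, so task was NULL and task->pid dereferenced a NULL pointer. This fits the oops: the faulting address 0x430 is small and R12 is 0, which is what reading a field at offset 0x430 from a NULL base looks like. The root cause is that, after fairly large code changes, the initialization of find_name was wrong. The parameter string_name is already a pointer to the string, so &string_name handed find_task the address of the parameter instead of the process name, and no task matched. After changing it to char *find_name = string_name; the problem was solved.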
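Independently of the find_name fix, the crash would not have happened if the return value of find_task had been checked before use. The following userspace sketch models that pattern. Everything in it is a stand-in made up for illustration: struct task replaces the kernel's struct task_struct, task_table is sample data, and this find_task is not the driver's real implementation, which the article does not show.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
    const char *comm; /* process name */
    int pid;
};

/* Sample data for the sketch only. */
static struct task task_table[] = {
    { "init", 1 },
    { "ops_main", 3462 },
};

/* Stand-in for the driver's find_task(): returns NULL when no name matches. */
struct task *find_task(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(task_table) / sizeof(task_table[0]); i++)
        if (strcmp(task_table[i].comm, name) == 0)
            return &task_table[i];
    return NULL;
}

/* Returns the pid, or -ESRCH when no such process exists, instead of
 * dereferencing a NULL pointer. */
int find_pid(const char *name)
{
    struct task *task = find_task(name);

    if (!task)
        return -ESRCH;
    return task->pid;
}
```

In a real driver the caller would then turn the negative value into an error return from the write handler rather than letting the kernel oops.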
mod -S /path/to/driver.o