赞
踩
物理坏块:通常是由于硬件损坏如磁盘异常导致、内存有问题、存储链路有问题、 IO 有问题、文件系统有问题、 Oracle 本身的问题等
逻辑坏块:可能都是软件问题导致通常是由于oracle bug 导致,比如data block 和 index block 数据不 一致。
物理坏块的场景:
This kind of block corruptions are normally reported by Oracle with error ORA-1578 and the detailed corruption description is printed in the alert log.
Corruption Examples are:
Bad header - the beginning of the block (cache header) is corrupt with invalid values
The block is Fractured/Incomplete - header and footer of the block do not match
The block checksum is invalid
The block is misplaced
Zeroed out blocks Note 1545366.1
逻辑坏块的场景:
This is when block contains a valid checksum and the structure below the beginning of the block is corrupt (Block content is corrupt). It may cause different ORA-600 errors.
The detailed corruption description for Logical Corruptions are not normally printed in the alert.log. DBVerify will report what is logically corrupted in the block.
Corruption Examples are:
row locked by non-existent transaction - ORA-600 [4512], etc
the amount of space used is not equal to block size
avsp bad
etc.
oracle根据自己的机制,把数据落盘后,就自己认为这个块是正常的,底层的东西他就不管了。落盘完成后,由于某种原因(存储、网络)等原因,导致磁盘上的块和之前写入磁盘的块不一致了(这里的不一致指的是各种结构和校验值,其实checksum校验值也是对快内容进行的校验,下面实验有验证),有所改动,oracle就会认为这是一个物理坏块。
而逻辑坏块通常就是索引和表不一致,报错ora-600.
基本环境准备
SQL> create table t2(id int,name varchar2(10)); Table created. SQL> insert into t2 values (1,'aaa'); 1 row created. SQL> commit; Commit complete. SQL> select rowid,id from zhuo.t2; ROWID ID ------------------ ---------- AAATtTAAFAAAAWUAAA 1 SQL> select file_id,extent_id,block_id,blocks from dba_extents where segment_name='T2' and owner = 'ZHUO'; FILE_ID EXTENT_ID BLOCK_ID BLOCKS ---------- ---------- ---------- ---------- 5 0 1424 8 SQL> select SEGMENT_NAME,HEADER_FILE,HEADER_BLOCK from dba_segments where SEGMENT_NAME='T2'; SEGMENT_NA HEADER_FILE HEADER_BLOCK ---------- ----------- ------------ T2 5 1426 select dbms_rowid.rowid_relative_fno('AAATtTAAFAAAAWUAAA') file_id, dbms_rowid.rowid_block_number('AAATtTAAFAAAAWUAAA') block_id from dual; FILE_ID BLOCK_ID ---------- ---------- 5 1428 --真正的数据块 SQL> alter system flush buffer_cache; System altered.
1.Bad header the beginning of the block (cache header) is corrupt with invalid values
在block结构中,kcbh损坏的都算。
测试:
BBED> p kcbh
struct kcbh, 20 bytes @0
ub1 type_kcbh @0 0x06
ub1 frmt_kcbh @1 0xa2
ub1 spare1_kcbh @2 0x00
ub1 spare2_kcbh @3 0x00
ub4 rdba_kcbh @4 0x01400594
ub4 bas_kcbh @8 0x000e0333
ub2 wrp_kcbh @12 0x0000
ub1 seq_kcbh @14 0x01
ub1 flg_kcbh @15 0x06 (KCBHFDLC, KCBHFCKV)
ub2 chkval_kcbh @16 0x4af9
ub2 spare3_kcbh @18 0x0000
修改为坏块
BBED> m /x ff offset 14 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5) Block: 1428 Offsets: 14 to 25 Dba:0x01400594 ------------------------------------------------------------------------ ff06f94a 00000100 0000533b <32 bytes per line> BBED> sum apply Check value for File 5, Block 1428: current = 0x4a07, required = 0x4a07 BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf BLOCK = 1428 Block 1428 is corrupt Corrupt block relative dba: 0x01400594 (file 0, block 1428) Fractured block found during verification Data in bad block: type: 6 format: 2 rdba: 0x01400594 last change scn: 0x0000.000e0333 seq: 0xff flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x03330601 check value in block header: 0x4a07 computed block checksum: 0x0 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 0 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 1 Total Blocks Influx : 2 Message 531 not found; product=RDBMS; facility=BBED
查询报错ORA-1578:
SQL> select * from zhuo.t2;
select * from zhuo.t2
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 1428)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'
alert日志报错:
Corrupt block relative dba: 0x01400594 (file 5, block 1428)
Fractured block found during multiblock buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x01400594
last change scn: 0x0000.000e0333 seq: 0xff flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x03330601
check value in block header: 0x4a07
computed block checksum: 0x0
2.The block is Fractured/Incomplete header and footer of the block do not match
block不完整,从block header中获取的信息和block tail中的不一致。
tail值=scn后4位+type+seq
测试:
BBED> p tailchk ub4 tailchk @8188 0x03330601 BBED> d /v count 12 File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5) Block: 1428 Offsets: 8188 to 8191 Dba:0x01400594 ------------------------------------------------------- 01063303 l ..3. <16 bytes per line> BBED> p bas_kcbh ub4 bas_kcbh @8 0x000e0333 BBED> m /x 01063304 offset 8188 File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5) Block: 1428 Offsets: 8188 to 8191 Dba:0x01400594 ------------------------------------------------------------------------ 01063304 <32 bytes per line> BBED> sum apply Check value for File 5, Block 1428: current = 0x4df9, required = 0x4df9 BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf BLOCK = 1428 Block 1428 is corrupt Corrupt block relative dba: 0x01400594 (file 0, block 1428) Fractured block found during verification Data in bad block: type: 6 format: 2 rdba: 0x01400594 last change scn: 0x0000.000e0333 seq: 0x1 flg: 0x06 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x04330601 check value in block header: 0x4df9 computed block checksum: 0x0 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 0 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 1 Total Blocks Influx : 2 Message 531 not found; product=RDBMS; facility=BBED
查询报错ORA-1578:
SQL> select * from zhuo.t2;
select * from zhuo.t2
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 1428)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'
---如果查询不报错,执行刷盘操作:alter system flush buffer_cache;
alert日志报错:
Corrupt block relative dba: 0x01400594 (file 5, block 1428)
Fractured block found during multiblock buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x01400594
last change scn: 0x0000.000e0333 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x04330601
check value in block header: 0x4df9
computed block checksum: 0x0
解决:
修改tail值
Set dba 5,131
M /x 23232 offset 8188
Sum apply
3.The block checksum is invalid
测试:
块修改之前的checksum
用 dd 把 5 号 file,1428 号 block 给拷 出来
[oracle@oracle11g datafile]$ dd if=o1_mf_zhuo_gxdcfr5s_.dbf of=/tmp/1428 bs=8192 count=1 skip=1428
可以看到这个block 现在在块头记录的 checksum 值是 F8 4A
现
现在我们将将ID 为 1 的那条记录的 ID 值由 1 改为 2 ,即将 C1 02 改成 C1 03
SQL> select dump(1,16) from dual; DUMP(1,16) ----------------- Typ=2 Len=2: c1,2 SQL> select dump('aaa',16) from dual; DUMP('AAA',16) ---------------------- Typ=96 Len=3: 61,61,61 SQL> select dump(2,16) from dual; DUMP(2,16) ----------------- Typ=2 Len=2: c1,3
改完后ftp 到 LINUX 上 然后再用 dd 把上述修改过的 block 给拷回去 再 startup force 上述数据库:
[oracle@oracle11g datafile]$ dd of=o1_mf_zhuo_gxdcfr5s_.dbf if=/tmp/1428 seek=1428 count=1 bs=8192 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000158362 s, 51.7 MB/s
从上述alert log 里我们可以看出以下两点:
1、块头记录的 checksum 值是 4a f9 oracle 这里做异或操作后的 checksum 值是100
2、 oracle 当发现 checksum 值不对的时候会尝试再次读一下该 block
上述的操作也可以直接用bbed修复offset16,不sum apply来完成。上述演示知识为了根据实际情况,由于硬件,值在存储上面修改了,绕过了oracle自动计算checksum的值,导致错误。
怎样计算出正确的 checksum 值?
1、手工计算
我们现在根据
0x4af9 和 0x100 来计算出上述 block 在修改后正确的 checksum 值
0x4af9 = 100101011111001
0x100 = 000000100000000
根据
异或算法原理 ,这里很容易可以看出 oracle 计算出来的正确的 checksum 值应该是:
100101111111001 ,即 4B F9
2、直接sum(首选)
BBED> sum
Check value for File 5, Block 1428:
current = 0x4af9, required = 0x4bf9 --所需要的值
3、1)需要把 block的16、17位清零,那么从alert log 就可以看到 oracle 计算出来的 checksum 值了,
而且这个时候 alert log 里的 checksum 值就是这个 block 被 oracle 计算出来的正确的 checksum值 。
2)因为oracle在alert log里记录的computed block checksum 值实际上是并不是这个block被oracle计算出来的正确的checksum值,而是这个block被oracle计算出来的正确的checksum值与这个block的第16 、17 位记录的checksum值做异或操作后的值。
3)当你把一个 block 的 16 、 17 位清零后,因为 0 与一个值做异或操作后还是等于那个值,所以在将
16 、17 位清零后,这个时候 alert log 里的 checksum 值就是这个 block 被 oracle 计算出来的正确的 checksum 值了
(computed block checksum)=(check value in block header) XOR (oracle计算出来的正确checksum)
check value in block header=bbed 里面p chkval_kcbh打印出来的值。块头的checksum值。
修复:
BBED> sum apply Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y Check value for File 5, Block 1428: current = 0x4bf9, required = 0x4bf9 BBED> p chkval_kcbh ub2 chkval_kcbh @16 0x4bf9 SQL> alter system flush buffer_cache; System altered. SQL> select * from zhuo.t2; ID NAME ---------- ---------- 2 aaa --1已经修改为2
总结下:如果block的checksum有问题,bbed直接sum 可以看出正确值,sum apply直接就可以修复
可以看出,当数据落盘后,由于人为修改了块里具体的值,导致oracle读取的时候,重新计算的checksum值和块头存储的checksum值不一致,导致物理坏块。所以物理坏块也会校验块内容。
4.The block is misplaced
oracle检测到读取的数据库包含的内容属于另一个block,同时checksum正常
测试:
BBED> copy file 5 block 132 to file 5 block 131 File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5) Block: 131 Offsets: 0 to 511 Dba:0x01400083 ------------------------------------------------------------------------ 06a20000 84004001 3ca20b00 00000204 12e40000 01000000 e23a0100 36a20b00 00000000 03003200 80004001 ffff0000 00000000 00000000 00000000 00800000 36a20b00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00015300 ffffb800 0c045403 54030000 5300311f e21e931e 421ef11d a11d4f1d f71ca91c 591c091c b51b5a1b 081bb31a 621a0b1a b4196119 0f19ad18 4c18f817 a4175317 fb16ad16 5d160b16 b9156715 1315bf14 70140e14 ad135b13 0e13bf12 70122012 ce117711 1e11cb10 7c102510 d70f860f 360fe80e 990e440e eb0d930d 380de30c 910c3e0c e80b870b 360be30a 8f0a3b0a e4098209 2109c708 6d081b08 c6077207 1e07c706 70061906 c2056b05 1405bc04 64040c04 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <32 bytes per line> BBED> set file 5 block 131 FILE# 5 BLOCK# 131 BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf BLOCK = 131 Block 131 is corrupt Corrupt block relative dba: 0x01400083 (file 0, block 131) Bad header found during verification Data in bad block: type: 6 format: 2 rdba: 0x01400084 last change scn: 0x0000.000ba23c seq: 0x2 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xa23c0602 check value in block header: 0xe412 computed block checksum: 0x0 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 0 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 1 Total Blocks Influx : 0 Message 531 not found; product=RDBMS; facility=BBED BBED> BBED> set file 5 block 132 FILE# 5 BLOCK# 132 BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf BLOCK = 132 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0 Message 531 not found; product=RDBMS; facility=BBED
5.Zeroed out blocks/ORA 8103 模拟
1、oracle bug 也可能导致逻辑坏块的产生 . 特别是 parallel dml . 例如
Bug 5621677 Logical corruption with PARALLEL update
Bug 6994194 Logical corruption from UPDATE DML
Bug 15980234 ORA1400 / logical corruption from direct path INSERT ALL(fail withORA 1400, in direct path / PDML) 。
2、多数情况下逻辑坏块可能都是软件问题导致,当然数据库异常也可能导致。比如掉电的情况下,
就可能导致块内数据不一致。
逻辑坏块的分类
参考:
Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1)
Master Note for Handling Oracle Database Corruption Issues (Doc ID 1088018.1)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。