Notes on Repairing Corrupt Blocks in Oracle


Yesterday a colleague on the backup team called to report that an RMAN cold backup had failed with the error below; one tablespace could not be backed up.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 04/25/2015 22:02:30
ORA-19566: exceeded limit of 0 corrupt blocks for file +DATA/dbrac/datafile/tbs_11.11.435678937

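Analysis showed that the failure was caused by corrupt blocks, and that these blocks did not belong to any object (they were free blocks). The repair steps were as follows:
1. Locate the corrupt blocks
1) Use RMAN to locate the corrupt blocks
Validate the whole database:
RMAN> backup validate check logical database;
Note: if the database version is lower than 11g and the database is in NOARCHIVELOG mode, the command above must be run with the database in the MOUNT state.
Validate a single datafile:
RMAN> backup validate check logical datafile 11;
Then run the following SQL to list the corrupt blocks:
SQL> select * from v$database_block_corruption;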

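For example:
validate.sh
#!/bin/bash
# Load the oracle user's environment (ORACLE_HOME, ORACLE_SID, PATH)
source /home/oracle/.bash_profile
# Validate datafile 11 and write the RMAN output to a log file
$ORACLE_HOME/bin/rman log=/home/oracle/users/validate.log <<EOF
connect target /
backup validate check logical datafile 11;
exit;
EOF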
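Once the script has run, the log can be scanned for RMAN or ORA error lines. The following is a minimal sketch, not part of the original procedure; the function name count_rman_errors is made up for illustration, and the pattern only matches codes of the form RMAN-nnnnn or ORA-nnnnn. Corrupt blocks found by a validate run are still listed in v$database_block_corruption, so this is only a quick first check.

```shell
#!/bin/bash
# Hypothetical helper: count the lines in an RMAN log that carry an
# RMAN-nnnnn or ORA-nnnnn error code.
count_rman_errors() {
  local logfile=$1
  # grep -c prints the number of matching lines (0 when none match)
  grep -cE '(RMAN|ORA)-[0-9]{5}' "$logfile"
}

# Usage (log path taken from validate.sh):
# count_rman_errors /home/oracle/users/validate.log
```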

2) Use DBV to locate the corrupt blocks:
dbv userid=system/system file='+DATA/dbrac/datafile/tbs_11.11.435678937' blocksize=32768

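2. Confirm that the corrupt block does not belong to any object
select segment_name, segment_type, owner
  from dba_extents
 where file_id = <Absolute file number>
   and <corrupted block number> between block_id
       and block_id + blocks - 1;
For example:
-- Optional: run the dictionary query in parallel to speed it up
alter session force parallel query parallel 10;
select segment_name, segment_type, owner
  from dba_extents
 where file_id = 11
   and 184959440 between block_id
       and block_id + blocks - 1;
If the query returns no rows, the block is not part of any segment.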
           
3. Confirm that the block appears in dba_free_space
select * from dba_free_space
 where file_id = <Absolute file number>
   and <corrupted block number> between block_id and block_id + blocks - 1;
For example:
select * from dba_free_space
 where file_id = 11
   and 184959440 between block_id and block_id + blocks - 1;
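The queries in steps 2 and 3 use the same range test: a block belongs to an extent when it lies between block_id and block_id + blocks - 1. A minimal shell sketch of that test follows. The function name block_in_extent, the sample block_id of 184959000, and the extent length of 896 blocks are made up for illustration and are not values read from this database.

```shell
#!/bin/bash
# Hypothetical helper mirroring the SQL predicate
#   <block> between block_id and block_id + blocks - 1
# Returns 0 (true) when the block lies inside the extent.
block_in_extent() {
  local block=$1 block_id=$2 blocks=$3
  (( block >= block_id && block <= block_id + blocks - 1 ))
}

# Illustrative values only: an extent starting at block 184959000, 896 blocks long
if block_in_extent 184959440 184959000 896; then
  echo "block is inside the extent"
else
  echo "block is outside the extent"
fi
```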
     
4. Create a table
create table s (
  n number,
  c varchar2(4000)
) nologging tablespace <tablespace name having the corrupt block> pctfree 99;
For example:
create table users.s (
  n number,
  c varchar2(4000)
) nologging tablespace TBS_11 pctfree 99;

Check that the table was created in the intended tablespace:
select segment_name, tablespace_name from dba_segments
 where segment_name = 'S';

select table_name, tablespace_name from dba_tables where table_name = 'S';
     
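5. Create a trigger
The trigger raises an error as soon as a row is inserted into the corrupt block, which signals that the block has been reformatted. &blocknumber and &filenumber are SQL*Plus substitution variables; supply the corrupt block number and the relative file number when prompted.
CREATE OR REPLACE TRIGGER corrupt_trigger
  AFTER INSERT ON users.s
  REFERENCING OLD AS p_old NEW AS new_p
  FOR EACH ROW
DECLARE
  corrupt EXCEPTION;
BEGIN
  IF (dbms_rowid.rowid_block_number(:new_p.rowid) = &blocknumber)
     AND (dbms_rowid.rowid_relative_fno(:new_p.rowid) = &filenumber) THEN
    RAISE corrupt;
  END IF;
EXCEPTION
  WHEN corrupt THEN
    RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted');
END;
/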

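6. Allocate space to the table from the datafile that contains the corrupt block
Notes:
i) Because ASSM determines the size of the next extent automatically, on an ASSM tablespace you need to create several tables and keep allocating space to them until the corrupt block is allocated to one of these objects.
ii) Set AUTOEXTEND to OFF for the datafile.

1) Find the size of the free extent that contains the corrupt block
select bytes from dba_free_space where file_id = <file no> and <corrupt block no> between
block_id and block_id + blocks - 1;

For example:
alter database datafile '+DATA/dbrac/datafile/tbs_11.11.435678937' autoextend off;

SQL> select bytes from dba_free_space where file_id = 11 and 184959440 between
  2  block_id and block_id + blocks - 1;

     BYTES
----------
  29360128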
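The BYTES value is easier to compare with the 64K threshold used in sub-step 2) after converting it. A small sketch of the arithmetic, assuming the 32768-byte block size that was passed to dbv earlier; the function names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for converting a dba_free_space BYTES value.
extent_blocks() {   # arguments: bytes, block size; prints the number of blocks
  echo $(( $1 / $2 ))
}
extent_kib() {      # argument: bytes; prints the size in KiB
  echo $(( $1 / 1024 ))
}

# 29360128 bytes with an assumed block size of 32768 bytes
extent_blocks 29360128 32768   # prints 896
extent_kib 29360128            # prints 28672
```

Here the free extent is 28672 KiB (28 MiB), well above 64K, so the loop variant of the allocation applies.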

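2) Keep allocating extents until the corrupt block becomes part of table S
If the query in sub-step 1) returns 64K, run the following SQL:
alter table users.s
  allocate extent (DATAFILE '+DATA/dbrac/datafile/tbs_11.11.435678937' SIZE 64K);

If it returns more than 64K, use the following:
BEGIN
  -- Allocate 64K extents repeatedly; with AUTOEXTEND off, the loop is expected
  -- to stop with an error once the datafile has no free space left
  for i in 1..1000000 loop
    EXECUTE IMMEDIATE 'alter table users.s allocate extent (DATAFILE ''+DATA/dbrac/datafile/tbs_11.11.435678937'' SIZE 64K)';
  end loop;
end;
/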


Use the following SQL to check whether the corrupt block now belongs to an object:
select segment_name, segment_type, owner
  from dba_extents
 where file_id = <Absolute file number>
   and <corrupt block number> between block_id
       and block_id + blocks - 1;

For example:
select segment_name, segment_type, owner
  from dba_extents
 where file_id = 11
   and 184959440 between block_id
       and block_id + blocks - 1;
           
3) Insert data to format the corrupt block
When a row lands in the corrupt block, the trigger raises ORA-20000 and the loop stops.
BEGIN
  FOR i IN 1..1000000000 LOOP
    FOR j IN 1..100000 LOOP
      INSERT INTO users.s VALUES (i, 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
    END LOOP;
    COMMIT;
  END LOOP;
END;
/

7. Verify whether any corrupt blocks remain
RMAN> backup validate check logical datafile <fileno>;
or
RMAN> validate datafile <fileno> block <blockno reported corrupt>, <blockno reported corrupt>;
SQL> select * from v$database_block_corruption;
For example:
RMAN> validate datafile 11 block 184959440, 184961480, 184961481;
SQL> select * from v$database_block_corruption;

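8. Once step 7 confirms that the corrupt blocks are gone, drop the test table
drop table users.s purge;

9. Switch the logfile several times and take a checkpoint
alter system switch logfile;
alter system checkpoint;

10. Drop the triggers
corrupt_trigger1 and corrupt_trigger2 are presumably the triggers on the additional tables created under note i) of step 6; drop whichever triggers you created.
drop trigger corrupt_trigger;
drop trigger corrupt_trigger1;
drop trigger corrupt_trigger2;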
