Undo corruption case study


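Our company has a test-environment Oracle 11.2.0.3 database on Linux, running without archive logging and without any backups. On the evening of the 21st, usage of the / filesystem reached 100%; the Oracle home directory lives under /.

With the disk completely full, nobody could connect and the Oracle database hung.

Someone then checked and saw that undotbs1 was taking the most space, and used mv to move it to another directory. The system was rebooted at the same time, which left the undotbs1 data file corrupted and unusable. Finally they ran rm on it and restarted the database, and that is when the failure appeared.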

Error 1:

Wed Jan 22 09:42:50 2014
ALTER DATABASE OPEN
Errors in file /u01/app/oracle/diag/rdbms/gtadata13/gtadata13/trace/gtadata13_dbw0_4245.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
ORA-27047: unable to read the header block of file
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 1
Wed Jan 22 09:42:52 2014
Checker run found 1 new persistent data failures
Errors in file /u01/app/oracle/diag/rdbms/gtadata13/gtadata13/trace/gtadata13_ora_4361.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
ORA-1157 signalled during: ALTER DATABASE OPEN...

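--- In other words, after mounting, Oracle could not move on to the OPEN state.

2 Next step: the data file of the undo tablespace undotbs1 was gone, so the plan was to create a new undo tablespace, undotbs2, and bring the database up to the OPEN state.

Commands: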

SQL> show parameter undo
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS1

SQL> CREATE UNDO TABLESPACE UNDOTBS2 DATAFILE '/XXXX.DBF' SIZE 32M AUTOEXTEND ON NEXT 32M MAXSIZE 10G; -- create the new undo tablespace
SQL> SELECT * FROM V$TABLESPACE;
SQL> SELECT NAME,STATUS FROM V$DATAFILE; -- check the status values
SQL> ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS2 SCOPE=BOTH; -- then confirm with show parameter undo that it is in use
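
To confirm that the switch took effect, it helps to look at both the parameter and the undo segments of the new tablespace. The following is only a minimal sketch: it assumes the tablespace name UNDOTBS2 from the step above and that the database is open, and the segment names it returns will differ on every system.

```sql
-- Which undo tablespace is the instance using now?
show parameter undo_tablespace

-- The undo segments of the new tablespace should be listed as ONLINE
select segment_name, status
  from dba_rollback_segs
 where tablespace_name = 'UNDOTBS2';
```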

3 At this point the database could be opened, but connecting from a client or as another user failed:

Error 2:
SQL> conn input/INPUT
ERROR:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
ORA-02002: error while writing to audit trail
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'

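4 The error stack shows that the undotbs1 data file is not the only problem: auditing is enabled as well, and the connection fails while writing to the audit trail (ORA-02002). So:

First turn auditing off:
SQL> SHOW PARAMETER AUDIT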
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
audit_file_dest string /u01/app/oracle/admin/gtadata1
      3/adump
audit_sys_operations boolean FALSE
audit_syslog_level string
audit_trail string DB

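SQL> alter system set audit_trail=none scope=spfile;      -- the database must be restarted for this to take effect; see the auditing notes for details

5 Next, take the undotbs1 data file offline (to see whether that works):

SQL> alter database datafile 3 offline drop;

6 Check the state of the tablespaces and data files through v$logfile, dba_tablespaces, and dba_data_files: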

SQL> select tablespace_name,file_id,file_name from dba_data_files;
TABLESPACE_NAME FILE_ID FILE_NAME
------- ---------- -------------------------------------------------------------
USERS 4 /u01/app/oracle/oradata/gtadata13/users01.dbf
UNDOTBS1 3 /u01/app/oracle/oradata/gtadata13/undotbs01.dbf
SQL> select status,tablespace_name from dba_tablespaces;
                                                     
STATUS TABLESPACE_NAME
--------- ------------------------------
ONLINE SYSTEM
ONLINE SYSAUX
ONLINE UNDOTBS1 

7 This shows that the undotbs1 data file is still registered and that the undotbs1 tablespace is still ONLINE.

So I tried:

Error 3:
SQL> alter tablespace UNDOTBS1 offline;
alter tablespace UNDOTBS1 offline
*
ERROR at line 1:
ORA-01191: file 3 is already offline - cannot do a normal offline
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
--- A normal offline is refused, so the next thing to try is OFFLINE TEMPORARY.
 
Query the data file headers: select FILE#,checkpoint_change#,recover,fuzzy from v$datafile_header;

Finally, force a checkpoint and try again:
SQL> alter system checkpoint;
System altered.
SQL> alter tablespace undotbs1 offline temporary;
Tablespace altered.

Checking dba_tablespaces again shows that undotbs1 is now OFFLINE.
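
The check described here can be sketched as two queries, one at the tablespace level and one at the file level. This assumes the names used in this case (tablespace UNDOTBS1, data file 3); adjust both for another system.

```sql
-- Tablespace-level status as recorded in the data dictionary
select tablespace_name, status
  from dba_tablespaces
 where tablespace_name = 'UNDOTBS1';

-- File-level status as recorded in the control file
select file#, status, name
  from v$datafile
 where file# = 3;
```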

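8 Test again whether other users or clients can connect:
  -- This works: other users can connect now, but some programs still raise errors:

Error 4:

Stored procedure execution failed: ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
ORA-06512: at "GTA_DATA.SP_QA_TIMELINESS", line 54
ORA-06512: at line 1

On reflection this makes sense: undotbs1 was deleted physically, so to keep the data consistent Oracle still treats its undo segments as needing recovery.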

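9 Now that the tablespace is offline, can it simply be dropped? (Probably not easily: the undo itself has been destroyed, so how would anything be rolled back?)

dba_rollback_segs shows that many undotbs1 segments are still marked NEEDS RECOVERY; they would have to be rolled back to restore data consistency.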

SQL> select segment_name,tablespace_name,status from dba_rollback_segs;
SEGMENT_NAME TABLESPACE_NAME STATUS
------------------------------ ------------------------------ ----------------
SYSTEM SYSTEM ONLINE
_SYSSMU122_928896348$ UNDOTBS1 OFFLINE
_SYSSMU121_4101333926$ UNDOTBS1 OFFLINE
_SYSSMU120_471964226$ UNDOTBS1 OFFLINE
_SYSSMU119_3645569891$ UNDOTBS1 OFFLINE
_SYSSMU118_1816999230$ UNDOTBS1 OFFLINE
_SYSSMU117_3513527861$ UNDOTBS1 OFFLINE
_SYSSMU116_2167311593$ UNDOTBS1 OFFLINE
_SYSSMU90_1969094056$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU89_2804401042$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU88_3446396459$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU87_268667266$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU86_1912503840$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU85_2732352333$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU84_1805825668$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU83_1984855352$ UNDOTBS1 NEEDS RECOVERY
_SYSSMU212_1777710046$ UNDOTBS2 ONLINE
_SYSSMU211_3260590093$ UNDOTBS2 ONLINE
_SYSSMU210_1915944113$ UNDOTBS2 ONLINE
_SYSSMU209_2868303011$ UNDOTBS2 ONLINE
_SYSSMU208_3687438092$ UNDOTBS2 ONLINE
_SYSSMU207_752508113$ UNDOTBS2 ONLINE

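At this point I searched Baidu and asked some experts, who said it would be best to take a backup first. So I tried a Data Pump export/import (expdp/impdp):

Error 5: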

[oracle@gtadata13 dump_dir]$ impdp dcsys/DCSYS directory=dump_dir dumpfile=TBL_CHN_FN_ForecFin.dmp
Import: Release 11.2.0.3.0 - Production on Wed Jan 22 14:40:30 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORA-31626: job does not exist
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.KUPV$FT", line 1042
ORA-31637: cannot create job SYS_IMPORT_FULL_01 for user DCSYS
ORA-31632: master table "DCSYS.SYS_IMPORT_FULL_01" not found, invalid, or inaccessible
ORA-31635: unable to establish job resource synchronization
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.KUPV$FT_INT", line 2401
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/gtadata13/undotbs01.dbf'
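  -- That does not work either; it looks like this has to be done the hard way.

10 Try dropping the offline undotbs1 tablespace to see whether the problem can be bypassed:

Error 6: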

SQL> drop tablespace undotbs1;
drop tablespace undotbs1
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU1_1240252155$' found, terminate dropping tablespace
SQL> DROP ROLLBACK SEGMENT "_SYSSMU1_1240252155$";
DROP ROLLBACK SEGMENT "_SYSSMU1_1240252155$"
*
ERROR at line 1:
ORA-30025: DROP segment '_SYSSMU1_1240252155$' (in undo tablespace) not allowed

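More searching on Baidu and more questions to the experts turned up the next step: add the hidden parameters _offline_rollback_segments ('xx') and _corrupted_rollback_segments ('xx') to the pfile, then retry the drop and see whether the segments can be bypassed.

Add these parameters to the pfile:
_offline_rollback_segments=('')
_corrupted_rollback_segments=('')    --- the values in parentheses are the undotbs1 segment names from dba_rollback_segs whose status is NEEDS RECOVERY, in the form "_SYSSMU122_928896348$"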

10 : So add the hidden parameters through the pfile, or use
    alter system set "_offline_rollback_segments" = 'value' scope=spfile;
    alter system set "_corrupted_rollback_segments" = 'value' scope=spfile;
    (hidden parameter names must be enclosed in double quotes).


At the time I rebuilt the pfile with these entries:
    *._offline_rollback_segments=('_SYSSMU90_1969094056$',...)
    *._corrupted_rollback_segments=('_SYSSMU90_1969094056$',...)
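
Typing the segment names by hand is error-prone, so the value list could be generated from dba_rollback_segs instead. This is a sketch, not the method used at the time: it assumes the database is open, that only the NEEDS RECOVERY segments of UNDOTBS1 are wanted, and that the resulting string fits within the 4000-byte limit of LISTAGG. The alias segment_list is just an illustrative name.

```sql
-- Build a quoted, comma-separated list of the undotbs1 segments
-- that are still marked NEEDS RECOVERY, ready to paste between
-- the parentheses of the two hidden parameters
select listagg('''' || segment_name || '''', ',')
         within group (order by segment_name) as segment_list
  from dba_rollback_segs
 where tablespace_name = 'UNDOTBS1'
   and status = 'NEEDS RECOVERY';
```

Remember that these underscore parameters are unsupported outside of Oracle Support guidance; they are used here on a test database with no backup and no other way forward.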

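Then drop every undotbs1 segment listed in dba_rollback_segs, and after that drop the undotbs1 tablespace:

SQL> drop rollback segment "_SYSSMU1_1240252155$";  --- note: no spaces inside the double quotes
Rollback segment dropped.    --- repeat for each segment name, one at a time.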
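
Since the segments have to be dropped one by one, the statements can be generated rather than typed. A minimal sketch, assuming all remaining UNDOTBS1 segments are to be removed; review the generated lines before running any of them.

```sql
-- Print plain statements without headings or row counts
set pagesize 0
set linesize 200
set feedback off

-- One DROP statement per remaining undotbs1 segment;
-- the segment name is wrapped in double quotes with no spaces
select 'drop rollback segment "' || segment_name || '";'
  from dba_rollback_segs
 where tablespace_name = 'UNDOTBS1';
```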

11 : Finally, drop the undotbs1 tablespace

  --- OK, the drop succeeds now, and dba_rollback_segs no longer shows any segments for the undotbs1 tablespace.
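
The exact drop command is not recorded in these notes. One plausible form, given as an assumption rather than as the statement actually run, is shown below together with a check that the tablespace is gone.

```sql
-- Assumption: every undotbs1 segment has already been dropped.
-- INCLUDING CONTENTS AND DATAFILES also removes the file entry;
-- the file itself was already deleted at the OS level in this case.
drop tablespace undotbs1 including contents and datafiles;

-- After a successful drop this returns no rows
select tablespace_name, status
  from dba_tablespaces
 where tablespace_name = 'UNDOTBS1';
```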


SQL> select segment_name,tablespace_name,status from dba_rollback_segs;
SEGMENT_NAME TABLESPACE_NAME STATUS
------------------------------ ------------------------------ ----------------
SYSTEM SYSTEM ONLINE
_SYSSMU212_1777710046$ UNDOTBS2 ONLINE
_SYSSMU211_3260590093$ UNDOTBS2 ONLINE
_SYSSMU210_1915944113$ UNDOTBS2 ONLINE
_SYSSMU209_2868303011$ UNDOTBS2 ONLINE
_SYSSMU208_3687438092$ UNDOTBS2 ONLINE
_SYSSMU207_752508113$ UNDOTBS2 ONLINE
_SYSSMU206_883733676$ UNDOTBS2 ONLINE
_SYSSMU205_725465268$ UNDOTBS2 ONLINE
_SYSSMU204_1401227473$ UNDOTBS2 ONLINE
_SYSSMU203_3100642042$ UNDOTBS2 ONLINE

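12 :  Wrap-up:  a: restore the original auditing settings
              b: switch logs a few times and check the business data
              c: this procedure works, but some business data is lost
              d: take proper backups
              e: as the masters say, when trouble hits, don't panic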
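
The first two wrap-up items could look roughly like this. It is a sketch under assumptions: audit_trail is restored to DB because that was the value shown earlier, "switching" is taken to mean redo log switches, and the RESET statements apply only if the hidden parameters were set in the spfile (if they were added to a pfile, delete those lines from the pfile instead).

```sql
-- a: re-enable database auditing; takes effect after the next restart
alter system set audit_trail=db scope=spfile;

-- Remove the hidden parameters again (spfile case only; these
-- statements fail if the parameter is not present in the spfile)
alter system reset "_offline_rollback_segments" scope=spfile sid='*';
alter system reset "_corrupted_rollback_segments" scope=spfile sid='*';

-- b: force a log switch; repeat a few times and watch the alert log
alter system switch logfile;
```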
