Oracle 11g RAC: Recovery After Losing All ASM Disks
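1. Environment
(1) Oracle 11.2.0.3 RAC on Oracle Linux 6 x86_64, with a single ASM disk group, DATA, using external redundancy.
(2) The OCR, voting disk, datafiles, control files, and SPFILE all reside in this disk group.
2. Failure description
(1) A storage failure caused the loss of all ASM disks.
(2) With the OCR and voting disk gone, all Clusterware services have stopped; only OHAS is still online.
3. Available backups
(1) RMAN backups: control file, database, SPFILE, and archived logs.
(2) OCR backups: no manual backup was ever taken, but the automatic CRS backups are available under $CRS_HOME/cdata.
4. Procedure
Note: The plan is to restore the OCR from the automatic CRS backup and to restore the database from the RMAN backup. While preparing the recovery, the ASM disk groups are also reorganized so that the OCR and voting disk are stored separately from the database files.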
4.1 Restoring the OCR and voting disk
(1) Stop CRS on all RAC nodes.
- [root@rac1 ~]# crsctl stop has -f
- CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
- CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
- CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
- CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
- CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
- CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
- CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
- CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
- CRS-4133: Oracle High Availability Services has been stopped.
- [root@rac2 ~]# crsctl stop has -f
- CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
- CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
- CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
- CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
- CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
- CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
- CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
- CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
- CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
- CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
- CRS-4133: Oracle High Availability Services has been stopped.
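When the stop has to be repeated on several nodes, it can help to check the output mechanically instead of reading it. The following is a minimal sketch, assuming the final confirmation message is always CRS-4133 as in the transcript above; the function name `has_stopped` is hypothetical and not part of any Oracle tool.

```shell
#!/bin/sh
# Reads the output of `crsctl stop has -f` on stdin and succeeds only if
# the confirmation message CRS-4133 ("... has been stopped.") is present.
# has_stopped is an illustrative helper, not an Oracle command.
has_stopped() {
    grep '^CRS-4133:' >/dev/null
}

# Possible usage on a real node (as root, not executed here):
#   crsctl stop has -f 2>&1 | has_stopped && echo "HAS stopped on $(hostname)"
```

The check looks only at the message code, so it does not depend on the exact wording of the message text.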
(2) Start CRS on one node in exclusive mode without CRSD (-excl -nocrs). This also starts the ASM instance.
- [root@rac1 ~]# crsctl start crs -excl -nocrs
- CRS-4123: Oracle High Availability Services has been started.
- CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
- CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
- CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
- CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
- CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
- CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
- CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
- CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
- CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
- CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
- CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
- CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
- CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
- CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
- CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
- CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
- CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
- CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
- CRS-2672: Attempting to start 'ora.asm' on 'rac1'
- CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
(3) Three new disks have been added and bound with udev. Check the disk status.
- [root@rac1 ~]# su - grid
- [grid@rac1 ~]$ sqlplus / as sysasm
- SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 5 17:41:49 2013
- Copyright (c) 1982, 2011, Oracle. All rights reserved.
- Connected to:
- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
- With the Real Application Clusters and Automatic Storage Management options
- SQL> select group_number group#, disk_number disk#, OS_MB, state, path, header_status from v$asm_disk order by 1,2;
-     GROUP#      DISK#      OS_MB STATE      PATH                 HEADER_STATUS
- ---------- ---------- ---------- ---------- -------------------- ----------------------
-          0          0       1024 NORMAL     /dev/asm-diskc       CANDIDATE
-          0          1       5120 NORMAL     /dev/asm-diskd       CANDIDATE
-          0          2      20480 NORMAL     /dev/asm-diskb       CANDIDATE
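Only disks whose HEADER_STATUS is CANDIDATE are free to go into a new disk group. As a small sketch, assuming the query output has been saved with the same six columns in the same order as above, the free disks can be listed like this; `candidate_disks` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads v$asm_disk query output (GROUP# DISK# OS_MB STATE PATH HEADER_STATUS)
# on stdin and prints "path size_mb" for every CANDIDATE disk.
# Header and separator lines are skipped because their first field is not numeric.
candidate_disks() {
    awk '$1 ~ /^[0-9]+$/ && $6 == "CANDIDATE" { print $5, $3 }'
}

# Possible usage with a saved spool file (file name is an assumption):
#   candidate_disks < asm_disks.lst
```

Printing the size next to the path makes it easier to decide which disk suits which disk group.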
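(4) Create three disk groups. SYSTEMDG is for CRS and holds the OCR, the voting disk, and the ASM instance SPFILE. The other two are for the database: DATADG holds the datafiles, control files, redo logs, and SPFILE; ARCLOGDG holds the archived logs.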
- SQL> create diskgroup SYSTEMDG external redundancy
- 2 disk '/dev/asm-diskc'
- 3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
- Diskgroup created.
- SQL> create diskgroup DATADG external redundancy
- 2 disk '/dev/asm-diskb'
- 3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
- Diskgroup created.
- SQL> create diskgroup ARCLOGDG external redundancy
- 2 disk '/dev/asm-diskd'
- 3 ATTRIBUTE 'compatible.rdbms' = '11.2','compatible.asm' = '11.2';
- Diskgroup created.
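The three statements above differ only in the disk group name and the disk path. A minimal sketch of generating them from a template is shown below, assuming external redundancy and the 11.2 compatibility attributes used in this article; `diskgroup_sql` is a hypothetical helper, and its output still has to be reviewed and run in SQL*Plus as SYSASM.

```shell
#!/bin/sh
# Prints a CREATE DISKGROUP statement for one disk with external redundancy.
# $1 = disk group name, $2 = disk path. Illustrative helper only.
diskgroup_sql() {
    cat <<EOF
create diskgroup $1 external redundancy
  disk '$2'
  attribute 'compatible.rdbms' = '11.2', 'compatible.asm' = '11.2';
EOF
}

# Example: print the statement for the CRS disk group from this article.
diskgroup_sql SYSTEMDG /dev/asm-diskc
```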
(5) Prepare to restore the OCR and voting disk. /etc/oracle/ocr.loc records the OCR location; change the value of ocrconfig_loc so that the OCR is restored into the new disk group.
- [root@rac1 ~]# more /etc/oracle/ocr.loc
- ocrconfig_loc=+DATA
- local_only=FALSE
- [root@rac1 ~]# vi /etc/oracle/ocr.loc
- ocrconfig_loc=+SYSTEMDG
- local_only=FALSE
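The same edit can be made without an interactive editor, which also leaves a copy of the old file behind. This is a sketch under the assumption that the file contains a single ocrconfig_loc= line as shown above; `set_ocr_location` and the .bak suffix are illustrative choices, not an Oracle convention.

```shell
#!/bin/sh
# Rewrites the ocrconfig_loc entry of an ocr.loc-style file.
# $1 = file to edit, $2 = new disk group name without the leading "+".
# The original content is kept in "$1.bak"; all other lines are unchanged.
set_ocr_location() {
    file=$1
    diskgroup=$2
    cp -p "$file" "$file.bak" || return 1
    sed "s|^ocrconfig_loc=.*|ocrconfig_loc=+$diskgroup|" "$file.bak" > "$file"
}

# Possible usage on a real node (as root, not executed here):
#   set_ocr_location /etc/oracle/ocr.loc SYSTEMDG
```

Writing back into the existing file rather than replacing it keeps its ownership and permissions.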
(6) Restore the OCR.
- [root@rac1 ~]# ocrconfig -showbackup
- PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
- rac1 2013/07/05 12:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
- rac1 2013/07/05 08:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
- rac1 2013/07/05 04:30:00 /u01/app/11.2.0/grid/cdata/rac-cluster/backup02.ocr
- rac1 2013/07/05 00:29:59 /u01/app/11.2.0/grid/cdata/rac-cluster/day.ocr
- rac1 2013/07/05 00:29:59 /u01/app/11.2.0/grid/cdata/rac-cluster/week.ocr
- PROT-25: Manual backups for the Oracle Cluster Registry are not available
- [root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
- [root@rac1 ~]#
- [root@rac1 ~]# ocrcheck
- Status of Oracle Cluster Registry is as follows :
- Version : 3
- Total space (kbytes) : 262120
- Used space (kbytes) : 2840
- Available space (kbytes) : 259280
- ID : 59415097
- Device/File Name : +SYSTEMDG
- Device/File integrity check succeeded
- Device/File not configured
- Device/File not configured
- Device/File not configured
- Device/File not configured
- Cluster registry integrity check succeeded
- Logical corruption check succeeded
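The restore above uses backup00.ocr because it is the most recent automatic backup. A small sketch of picking the newest entry from the `ocrconfig -showbackup` listing follows, assuming each backup line has the four fields shown above (node, date as YYYY/MM/DD, time, path); `latest_ocr_backup` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `ocrconfig -showbackup` output on stdin and prints the path of the
# most recent backup. PROT-xx message lines are ignored because they do not
# have exactly four fields with a date in the second one.
latest_ocr_backup() {
    awk 'NF == 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2, $3, $4 }' |
        sort | tail -n 1 | awk '{ print $3 }'
}

# Possible usage on a real node (as root, not executed here):
#   ocrconfig -restore "$(ocrconfig -showbackup 2>&1 | latest_ocr_backup)"
```

Sorting works as plain text because the date and time fields are zero-padded and ordered from year down to second.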
(7) Create the voting disk.
- [root@rac1 ~]# crsctl replace votedisk +SYSTEMDG
- CRS-4602: Failed 27 to add voting file afb0ca0f35684f1abfd43d5ec2dc1123.
- Failed to replace voting disk group with +SYSTEMDG.
- CRS-4000: Command Replace failed, or completed with errors.
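The error above occurs because the ASM disks are bound with udev, so the default disk discovery path must be changed to /dev/asm*. Change the ASM disk discovery string: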
- [root@rac1 ~]# su - grid
- [grid@rac1 ~]$ sqlplus / as sysasm
- SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 5 19:03:25 2013
- Copyright (c) 1982, 2011, Oracle. All rights reserved.
- Connected to:
- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
- With the Real Application Clusters and Automatic Storage Management options
- SQL> show parameter asm_diskstring
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- asm_diskstring string
- SQL>
- SQL>
- SQL> alter system set asm_diskstring = '/dev/asm*';
- System altered.
- SQL> create spfile from memory;
- create spfile from memory
- *
- ERROR at line 1:
- ORA-00349: failure obtaining block size for
- '+DATA/rac-cluster/asmparameterfile/registry.253.819922365'
- ORA-15001: diskgroup "DATA" does not exist or is not mounted
- SQL> create spfile='+SYSTEMDG' from memory;
- File created.
- SQL> startup force mount;
- ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
- ASM instance started
- Total System Global Area 283930624 bytes
- Fixed Size 2227664 bytes
- Variable Size 256537136 bytes
- ASM Cache 25165824 bytes
- ASM diskgroups mounted
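Before setting asm_diskstring it is worth confirming that the chosen pattern actually covers the udev device names. The sketch below uses shell pattern matching as an approximation of ASM's discovery patterns, which is an assumption: the two are similar for a simple trailing wildcard such as /dev/asm*, but they are not guaranteed to behave identically in every case. `matches_diskstring` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if the device path ($2) matches the discovery pattern ($1).
# The pattern is deliberately left unquoted inside `case` so that the
# shell treats it as a wildcard pattern rather than a literal string.
matches_diskstring() {
    pattern=$1
    path=$2
    case $path in
        $pattern) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the three udev device names from this article against the pattern.
for disk in /dev/asm-diskb /dev/asm-diskc /dev/asm-diskd; do
    if matches_diskstring '/dev/asm*' "$disk"; then
        echo "$disk is covered by /dev/asm*"
    fi
done
```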
Create the voting disk again; this time it succeeds.
- [root@rac1 init]# crsctl replace votedisk +SYSTEMDG
- Successful addition of voting disk 8ebb7a63accb4fa8bfa7ab65df7a8c8a.
- Successfully replaced voting disk group with +SYSTEMDG.
- CRS-4266: Voting file(s) successfully replaced
(8) With the OCR and voting disk both restored, restart CRS in normal mode.
- [root@rac1 ~]# crsctl stop has -f
- CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
- CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
- CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
- CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
- CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
- CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
- CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
- CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
- CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
- CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
- CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
- CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
- CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
- CRS-4133: Oracle High Availability Services has been stopped.
- [root@rac1 ~]# crsctl start crs
- CRS-4123: Oracle High Availability Services has been started.
- [root@rac1 ~]# crsctl check crs
- CRS-4638: Oracle High Availability Services is online
- CRS-4537: Cluster Ready Services is online
- CRS-4529: Cluster Synchronization Services is online
- CRS-4533: Event Manager is online
- [root@rac1 ~]#
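After `crsctl start crs` returns, the stack keeps starting in the background, so `crsctl check crs` may need to be repeated until all four services report online. A minimal sketch of evaluating that output is given below, assuming the four message codes shown above (CRS-4638, CRS-4537, CRS-4529, CRS-4533) are the ones that indicate a healthy stack; `crs_all_online` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `crsctl check crs` output on stdin and succeeds only if all four
# services (OHAS, CRS, CSS, EVM) are reported as online.
crs_all_online() {
    output=$(cat)
    for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
        printf '%s\n' "$output" | grep "^$code:.*is online" >/dev/null || return 1
    done
}

# Possible usage on a real node (as root, not executed here):
#   until crsctl check crs 2>&1 | crs_all_online; do sleep 10; done
```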