A Case of ORA-15183 When Mounting ASM Disk Groups


Starting with Oracle 11gR2, ASM (Automatic Storage Management) was split out of the Database component and became a standalone component managed under Grid Infrastructure.

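This post describes a problem I ran into while a database was starting up and mounting. What I did differs somewhat from the approach recommended on My Oracle Support (MOS), so I am recording it here for anyone who may need it later.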

1. Problem description

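My environment is a single-instance Oracle database with Grid Infrastructure, version 11.2.0.4. For security reasons I downloaded the latest security and upgrade patches from MOS. After patching, the version was 11.2.0.4.6.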

However, a problem appeared at the last step of the upgrade, when running the SQL scripts.

SQL*Plus: Release 11.2.0.4.0 Production on Mon May 25 16:08:57 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn / as sysdba

Connected to an idle instance.

SQL> startup

ORACLE instance started.

Total System Global Area 2087780352 bytes

Fixed Size                  2254824 bytes

Variable Size            553650200 bytes

Database Buffers        1526726656 bytes

Redo Buffers                5148672 bytes

ORA-00205: error in identifying control file, check alert log for more info

Judging from the message, the instance got through the nomount stage and then failed while locating the control files.

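Frankly, even though this was a test environment, I was somewhat alarmed. I then tried starting the database through the clusterware with srvctl.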

[grid@NCR-Standby-Asm ~]$ srvctl start database -d sicsstb

PRCC-1014 : sicsstb was already running

PRCR-1004 : Resource ora.sicsstb.db is already running

PRCR-1079 : Failed to start resource ora.sicsstb.db

CRS-5702: Resource 'ora.sicsstb.db' is already running on 'ncr-standby-asm'

2. Problem analysis

First, confirm whether the database can be started with srvctl, and check the state of the resources registered in GI.

[grid@NCR-Standby-Asm ~]$ srvctl stop database -d sicsstb

[grid@NCR-Standby-Asm ~]$ srvctl status asm

ASM is running on ncr-standby-asm

[grid@NCR-Standby-Asm ~]$ crsctl stat res -t -init

--------------------------------------------------------------------------------

NAME          TARGET  STATE        SERVER                  STATE_DETAILS       

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATA.dg

              ONLINE  ONLINE      ncr-standby-asm                             

ora.LISTENER.lsnr

              ONLINE  ONLINE      ncr-standby-asm                             

ora.RECO.dg

              ONLINE  ONLINE      ncr-standby-asm                             

ora.asm

              ONLINE  ONLINE      ncr-standby-asm          Started             

ora.ons

              OFFLINE OFFLINE      ncr-standby-asm                             

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.cssd

      1        ONLINE  ONLINE      ncr-standby-asm                             

ora.diskmon

      1        OFFLINE OFFLINE                                                   

ora.evmd

      1        ONLINE  ONLINE      ncr-standby-asm                             

ora.sicsstb.db

      1        OFFLINE OFFLINE                              Instance Shutdown   

[grid@NCR-Standby-Asm ~]$ srvctl start database -d sicsstb

[grid@NCR-Standby-Asm ~]$ 

[oracle@NCR-Standby-Asm ~]$ cd $ORACLE_HOME/rdbms/admin

[oracle@NCR-Standby-Asm admin]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Mon May 25 16:14:00 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn / as sysdba

Connected.

SQL> select open_mode from v$database;

OPEN_MODE

--------------------

READ WRITE
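When the resource listing is long, the OFFLINE entries can be pulled out mechanically. The following is a minimal sketch and was not part of the original session. offline_resources is a hypothetical helper name, and it assumes the two-line layout shown above, where a line starting with "ora." names the resource and a later line carries its TARGET and STATE.

```shell
#!/bin/sh
# offline_resources: read a "crsctl stat res -t" style listing on stdin and
# print the name of every resource whose status line contains OFFLINE.
# Assumption: a resource name line starts with "ora." and its status line
# follows it (blank lines in between are ignored).
offline_resources() {
  awk '
    /^ora\./                 { name = $1; next }         # remember the resource name
    name != "" && /OFFLINE/  { print name; name = "" }   # target or state is OFFLINE
    name != "" && /ONLINE/   { name = "" }               # fully ONLINE, forget it
  '
}
```

A typical use would be to pipe the crsctl listing into the function, for example: crsctl stat res -t | offline_resources. On the listing above this would flag ora.ons, ora.diskmon and ora.sicsstb.db.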

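My guess was that the failure was related to ASM. Working through it one layer at a time, I started with the database alert log and located the entries for the failed startup.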

Mon May 25 16:09:28 2015

MMON started with pid=17, OS id=4151 

Mon May 25 16:09:28 2015

MMNL started with pid=18, OS id=4153 

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

starting up 1 shared server(s) ...

NOTE: initiating MARK startup 

Starting background process MARK

ORACLE_BASE from environment = /u02/app/oracle

Mon May 25 16:09:28 2015

MARK started with pid=21, OS id=4161 

NOTE: MARK has subscribed 

Mon May 25 16:09:28 2015

ALTER DATABASE  MOUNT

Mon May 25 16:09:28 2015

ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521))' SCOPE=MEMORY SID='sicsstb';

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

WARNING: FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so 

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

SUCCESS: diskgroup DATA was dismounted

ERROR: diskgroup DATA was not mounted

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

WARNING: FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so 

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

SUCCESS: diskgroup RECO was dismounted

ERROR: diskgroup RECO was not mounted

ORA-00210: cannot open the specified control file

ORA-00202: control file: '+RECO/sicsstb/controlfile/current.256.878897845'

ORA-17503: ksfdopn:2 Failed to open file +RECO/sicsstb/controlfile/current.256.878897845

ORA-15001: diskgroup "RECO" does not exist or is not mounted

ORA-15040: diskgroup is incomplete

ORA-15040: diskgroup is incomplete

ORA-00210: cannot open the specified control file

ORA-00202: control file: '+DATA/sicsstb/controlfile/current.260.878897845'

ORA-17503: ksfdopn:2 Failed to open file +DATA/sicsstb/controlfile/current.260.878897845

ORA-15001: diskgroup "DATA" does not exist or is not mounted

ORA-15040: diskgroup is incomplete

ORA-15040: diskgroup is incomplete

ORA-15040: diskgroup is incomplete

ORA-205 signalled during: ALTER DATABASE  MOUNT...

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

WARNING: FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so 

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

Mon May 25 16:09:31 2015

SUCCESS: diskgroup DATA was dismounted

ERROR: diskgroup DATA was not mounted

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

WARNING: FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so 

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

SUCCESS: diskgroup RECO was dismounted

ERROR: diskgroup RECO was not mounted

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-15183: ASMLIB initialization error [driver/agent not installed]

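The messages show that during the mount stage Oracle used the control file locations specified in the spfile to access the +DATA and +RECO disk groups, but neither disk group was mounted, which is where the errors begin.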
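The alert log repeats the same few lines many times, so it can help to reduce it to the list of disk groups that failed to mount. This is a minimal sketch under the assumption that the log uses the exact wording seen above ("ERROR: diskgroup NAME was not mounted"). failed_diskgroups is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# failed_diskgroups: read alert-log text on stdin and print each disk group
# reported as "was not mounted", one per line, without duplicates.
failed_diskgroups() {
  sed -n 's/^ERROR: diskgroup \([A-Za-z0-9_$#]*\) was not mounted.*$/\1/p' | sort -u
}
```

Fed the alert log of the failed startup on stdin, it would be expected to print DATA and RECO, matching the two control file locations.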

According to the parameters, the control file is multiplexed across the two ASM disk groups.

SQL> show parameter spfile

NAME                                TYPE        VALUE

------------------------------------ ----------- ------------------------------

spfile  string      +DATA/sicsstb/spfilesicsstb.ora

SQL> show parameter control

NAME                                TYPE        VALUE

------------------------------------ ----------- ------------------------------

control_file_record_keep_time        integer    7

control_files                        string      +DATA/sicsstb/controlfile/curr

                                                ent.260.878897845, +RECO/sicsstb/controlfile/current.256.878

                                                897845

control_management_pack_access      string      DIAGNOSTIC+TUNING

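Note: the ASM failure here is not a matter of my having forgotten to start ASM. If ASM simply were not running when the database was started, the errors would look like this: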

[oracle@NCR-Standby-Asm ~]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jun 1 08:39:11 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn / as sysdba

Connected to an idle instance.

SQL> startup

ORA-01078: failure in processing system parameters

ORA-01565: error in identifying file '+DATA/sicsstb/spfilesicsstb.ora'

ORA-17503: ksfdopn:10 Failed to open file +DATA/sicsstb/spfilesicsstb.ora

ORA-15077: could not locate ASM instance serving a required diskgroup

The nomount stage has to read the spfile, and our spfile lives in +DATA. If ASM were really unavailable, the instance could not even reach nomount.
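The two failure modes can be told apart by their error codes alone. The sketch below only encodes the distinction made in this post: ORA-15077 when no ASM instance can be located, ORA-15183 when ASMLib fails to initialize. classify_asm_failure and its output labels are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# classify_asm_failure: read startup or alert-log text on stdin and print a
# coarse label for the ASM-related failure it contains.
classify_asm_failure() {
  text=$(cat)
  case $text in
    *ORA-15077*) echo "asm-instance-unreachable" ;;  # ASM not started
    *ORA-15183*) echo "asmlib-init-error" ;;         # ASMLib failed to initialize
    *)           echo "unknown" ;;
  esac
}
```

ORA-15077 is tested first because, as described above, an unreachable ASM instance stops the startup before nomount, so the ASMLib errors seen at mount time would not normally appear alongside it.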

The messages suggest a problem with my ASM driver. The operating system is Red Hat Enterprise Linux 6.5, with kmod-oracleasm as the ASMLib kernel driver.

[root@NCR-Standby-Asm ~]# rpm -qa | grep asm

libatasmart-0.17-4.el6_2.x86_64

oracleasmlib-2.0.4-1.el6.x86_64

oracleasm-support-2.1.8-1.el6.x86_64

kmod-oracleasm-2.0.6.rh1-3.el6_5.x86_64
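Having the packages installed does not by itself show that the driver is loaded or that the library named in the alert log is readable by the account running the database. A minimal sketch of both checks follows. module_loaded is a hypothetical helper, and it assumes the usual /proc/modules layout in which the first column is the module name.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if kernel module NAME is listed in FILE.
# FILE defaults to /proc/modules, whose first column is the module name.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage on the database host, run as the OS user that owns the instance:
#   module_loaded oracleasm && echo "oracleasm kernel module is loaded"
#   [ -r /opt/oracle/extapi/64/asm/orcl/1/libasm.so ] && echo "libasm.so is readable"
```

The library path in the usage comment is the one reported in the "FAILED to load library" warnings in the alert log.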

The corresponding trace file gives a more detailed description of the problem.

[root@NCR-Standby-Asm trace]# tail -n 200 sicsstb_rbal_4147.trc

Trace file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Automatic Storage Management, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /u02/app/oracle/product/11.2.0/dbhome_1

System name:    Linux

Node name:      NCR-Standby-Asm

Release:        2.6.32-431.el6.x86_64

Version:        #1 SMP Sun Nov 10 22:19:54 EST 2013

Machine:        x86_64

VM name:        VMWare Version: 6

Instance name: sicsstb

Redo thread mounted by this instance: 0 <none>

Oracle process number: 15

Unix process pid: 4147, image: oracle@NCR-Standby-Asm (RBAL)

*** 2015-05-25 16:09:31.634

*** SESSION ID:(190.1) 2015-05-25 16:09:31.634

*** CLIENT ID:() 2015-05-25 16:09:31.634

*** SERVICE NAME:() 2015-05-25 16:09:31.634

*** MODULE NAME:() 2015-05-25 16:09:31.634

*** ACTION NAME:() 2015-05-25 16:09:31.634

ERROR: asm_version error. err: driver/agent not installed rc:2

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ERROR: asm_version error. err: driver/agent not installed rc:2

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ERROR: asm_version error. err: driver/agent not installed rc:2

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ERROR: asm_version error. err: driver/agent not installed rc:2

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ERROR: asm_version error. err: driver/agent not installed rc:2

ORA-15183: ASMLIB initialization error [driver/agent not installed]

ORA-15183: ASMLIB initialization error [driver/agent not installed]

Incident 9721 created, dump file: /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/incident/incdir_9721/sicsstb_rbal_4147_i9721.trc

ORA-00600: internal error code, arguments: [kfdskAlloc0], [], [], [], [], [], [], [], [], [], [], []

error 488 detected in background process

ORA-00600: internal error code, arguments: [kfdskAlloc0], [], [], [], [], [], [], [], [], [], [], []

kjzduptcctx: Notifying DIAG for crash event

----- Abridged Call Stack Trace -----

ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5594<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 

----- End of Abridged Call Stack Trace -----

*** 2015-05-25 16:09:32.865

RBAL (ospid: 4147): terminating the instance due to error 488

ksuitm: waiting up to [5] seconds before killing DIAG(4129)

RBAL then terminated the instance. The alert log has further details.

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc  (incident=9721):

ORA-00600: internal error code, arguments: [kfdskAlloc0], [], [], [], [], [], [], [], [], [], [], []

Incident details in: /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/incident/incdir_9721/sicsstb_rbal_4147_i9721.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_rbal_4147.trc:

ORA-00600: internal error code, arguments: [kfdskAlloc0], [], [], [], [], [], [], [], [], [], [], []

RBAL (ospid: 4147): terminating the instance due to error 488

System state dump requested by (instance=1, osid=4147 (RBAL)), summary=[abnormal instance termination].

System State dumped to trace file /u02/app/oracle/diag/rdbms/sicsstb/sicsstb/trace/sicsstb_diag_4129_20150525160933.trc

Dumping diagnostic data in directory=[cdmp_20150525160933], requested by (instance=1, osid=4147 (RBAL)), summary=[abnormal instance termination].

Instance terminated by RBAL, pid = 4147

Mon May 25 16:09:40 2015

Adjusting the default value of parameter parallel_max_servers

from 160 to 120 due to the value of parameter processes (150)

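The messages include an ORA-600 error, and at the very end there is a note about parallel_max_servers. Strictly speaking the note is informational: it reports that Oracle lowered the default value of parallel_max_servers from 160 to 120 because processes is set to 150. I nevertheless read it as a hint. GI and ASM bring a number of parallel worker processes with them, so one possibility I considered was that there were too many processes for parameter values that were too small.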

I tried adjusting the parameter.

SQL> show parameter parallel_max_servers

NAME                                TYPE        VALUE

------------------------------------ ----------- ------------------------------

parallel_max_servers                integer    120

SQL> alter system set parallel_max_servers=150 scope=both;

System altered.

SQL> show parameter parallel_max

NAME                                TYPE        VALUE

------------------------------------ ----------- ------------------------------

parallel_max_servers                integer    150

I also searched MOS. Some Oracle notes attribute this error to a permissions problem, but the permissions here did not look seriously wrong.

[oracle@NCR-Standby-Asm ~]$ cd $ORACLE_HOME/bin

[oracle@NCR-Standby-Asm bin]$ ls -l grep oracle

ls: cannot access grep: No such file or directory

-rwsr-s--x 1 oracle asmadmin 239882127 May 25 17:06 oracle
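The listing above was produced by a command that lost its pipe (ls -l grep oracle), so only the final line is meaningful. The same check can be made explicit with GNU stat. The sketch below assumes that the healthy state is exactly what the listing shows: mode -rwsr-s--x, which is 6751 in octal, and group asmadmin. perms_ok and check_oracle_binary are hypothetical helper names.

```shell
#!/bin/sh
# perms_ok MODE GROUP [WANT_GROUP]: succeed when MODE is 6751 (-rwsr-s--x)
# and GROUP equals WANT_GROUP (default: asmadmin, as in the listing above).
perms_ok() {
  [ "$1" = "6751" ] && [ "$2" = "${3:-asmadmin}" ]
}

# check_oracle_binary FILE [WANT_GROUP]: read octal mode and group name with
# GNU stat and compare them against the expected values.
check_oracle_binary() {
  mode=$(stat -c '%a' "$1") || return 2
  group=$(stat -c '%G' "$1") || return 2
  perms_ok "$mode" "$group" "$2"
}

# Usage:
#   check_oracle_binary "$ORACLE_HOME/bin/oracle" || echo "unexpected mode or group"
```

It may also be worth comparing the binary's modification time (May 25 17:06 in the listing) with the time of the failed startup, since it indicates when the file was last changed.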

After that I restarted the database, and it came up normally.

SQL> startup

ORACLE instance started.

Total System Global Area 2087780352 bytes

Fixed Size                  2254824 bytes

Variable Size            553650200 bytes

Database Buffers        1526726656 bytes

Redo Buffers                5148672 bytes

Database mounted.

Database opened.

Problem solved.

3. Conclusion

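Frankly, there are still parts of this fix that I do not understand. On the face of it, after patching, the Oracle instance needs more processes than before, so some parameters have to be adjusted.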
