Handling CRS-4124 and CRS-4000 errors on a single-instance ASM setup


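My laptop has limited resources, so I installed Oracle 11g Grid Infrastructure (GI) on it in order to study ASM. Everything worked normally after the installation.
After booting today, however, starting HAS failed with the following errors: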
[root@myrac1 ~]# su - grid
[grid@myrac1 ~]$ crsctl start has

CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

Check the ohasd.log log:
2014-02-19 18:02:42.143: [UiServer][2762939248] processMessage called
2014-02-19 18:02:42.144: [UiServer][2762939248] Sending message to PE. ctx= 0xa8be6268
2014-02-19 18:02:42.144: [UiServer][2762939248] Sending command to PE: 51
2014-02-19 18:02:42.144: [  CRSPE][2767141744] Processing PE command id=158. Description: [Stat Resource : 0xb1df7aa0]
2014-02-19 18:02:42.392: [  CRSPE][2767141744] PE Command [ Stat Resource : 0xb1df7aa0 ] has completed
2014-02-19 18:02:42.393: [  CRSPE][2767141744] UI Command [Stat Resource : 0xb1df7aa0] is replying to sender.
2014-02-19 18:02:42.395: [UiServer][2762939248] Done for ctx=0xa8be6268
2014-02-19 18:02:42.417: [UiServer][2756705136] Closed: remote end failed/disc.
2014-02-19 18:02:45.055: [UiServer][2756705136] S(0xa7fd3958): set Properties ( grid,0xb49c820)
2014-02-19 18:02:45.055: [UiServer][2756705136] S(0xa8bae2d8): Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=36)(KEY=CRSD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2014-02-19 18:02:45.066: [UiServer][2762939248] processMessage called
2014-02-19 18:02:45.066: [UiServer][2762939248] Sending message to PE. ctx= 0xa8be8d90
2014-02-19 18:02:45.067: [UiServer][2762939248] Sending command to PE: 52
2014-02-19 18:02:45.067: [  CRSPE][2767141744] Processing PE command id=159. Description: [Stat Resource : 0xa45323b0]
2014-02-19 18:02:45.092: [  CRSPE][2767141744] PE Command [ Stat Resource : 0xa45323b0 ] has completed
2014-02-19 18:02:45.093: [  CRSPE][2767141744] UI Command [Stat Resource : 0xa45323b0] is replying to sender.
2014-02-19 18:02:45.107: [UiServer][2762939248] Done for ctx=0xa8be8d90
2014-02-19 18:02:45.115: [UiServer][2756705136] Closed: remote end failed/disc.
2014-02-19 18:02:46.416: [UiServer][2756705136] S(0xa7fd3958): set Properties ( grid,0xb49c788)
2014-02-19 18:02:46.416: [UiServer][2756705136] S(0xa8bae2d8): Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=36)(KEY=CRSD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2014-02-19 18:02:46.427: [UiServer][2762939248] processMessage called
2014-02-19 18:02:46.428: [UiServer][2762939248] Sending message to PE. ctx= 0xa8bb23b0
2014-02-19 18:02:46.428: [UiServer][2762939248] Sending command to PE: 53
2014-02-19 18:02:46.428: [  CRSPE][2767141744] Processing PE command id=160. Description: [Stat Resource : 0xa453f3b8]
2014-02-19 18:02:46.436: [  CRSPE][2767141744] PE Command [ Stat Resource : 0xa453f3b8 ] has completed
2014-02-19 18:02:46.437: [  CRSPE][2767141744] UI Command [Stat Resource : 0xa453f3b8] is replying to sender.
2014-02-19 18:02:46.438: [UiServer][2762939248] Done for ctx=0xa8bb23b0
2014-02-19 18:02:46.460: [UiServer][2756705136] Closed: remote end failed/disc.
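The excerpt above mixes UiServer and CRSPE messages, and a real ohasd.log is far longer. As a minimal sketch for narrowing it down, the helper below keeps only the lines from one component. The function name filter_component is my own, and the pattern assumes the timestamp and bracketed-tag layout seen in the excerpt.

```shell
# Print only the lines of an ohasd.log-style file that belong to one component.
# Usage: filter_component COMPONENT LOGFILE
# (filter_component is a hypothetical helper, not part of Oracle's tooling.)
filter_component() {
    component=$1
    logfile=$2
    # Lines look like "2014-02-19 18:02:42.144: [  CRSPE][2767141744] ...".
    # The component tag may be padded with spaces inside the brackets.
    grep -E "^[0-9-]+ [0-9:.]+: \[ *${component}\]" "$logfile"
}
```

For example, `filter_component CRSPE ohasd.log` prints only the CRSPE lines.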
Check whether the related services are running:
[grid@myrac1 ohasd]$ ps -ef|grep cssd
grid      5402  4816  0 18:15 pts/3    00:00:00 grep cssd
[grid@myrac1 ohasd]$ ps -ef|grep has
grid      2857    1  1 17:29 ?        00:00:34 /g01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid      5432  4816  0 18:16 pts/3    00:00:00 grep has
[grid@myrac1 ohasd]$ ps -ef|grep d.bin
grid      2857    1  1 17:29 ?        00:00:34 /g01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid      3028    1  0 17:31 ?        00:00:01 /g01/app/grid/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid      3140    1  0 17:32 ?        00:00:20 /g01/app/grid/product/11.2.0/grid/bin/oraagent.bin
grid      3206    1  0 17:32 ?        00:00:04 /g01/app/grid/product/11.2.0/grid/bin/cssdagent
grid      3240    1  0 17:32 ?        00:00:03 /g01/app/grid/product/11.2.0/grid/bin/orarootagent.bin
grid      3253    1  0 17:32 ?        00:00:14 /g01/app/grid/product/11.2.0/grid/bin/diskmon.bin -d -f
grid      5461  4816  1 18:17 pts/3    00:00:00 grep d.bin

It turns out the HAS service has not started. It is supposed to start automatically at boot, with init running the init.ohasd run command:
[grid@myrac1 ohasd]$ cat /etc/inittab |grep ohasd
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1
An explanation from Oracle's official documentation:
With Oracle Clusterware 11g Release 2 (11.2), cluster commands have been introduced that allow stopping the cluster stack on a remote node (as opposed to stopping it locally only with the commands listed above). In order to stop the cluster stack with the exception of the Oracle High Availability Services (OHAS, daemon OHASD), use:
crsctl stop cluster -all      #stops the cluster layer on all servers in the cluster
crsctl stop cluster -n   #stops the cluster layer on the named server
In 11g, ohasd brings up crsd, ocssd, and evmd. The 11g cluster stack has two layers: a lower stack and an upper stack.
ohasd starts the cluster resources in the lower stack, and crsd starts the cluster resources in the upper stack.
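The automatic start depends on two things: the respawn entry in inittab and a running init.ohasd process. Here is a small sketch that checks both. The function names are hypothetical, and the default path /etc/inittab assumes a SysV-init system like the one in this post.

```shell
# Succeeds if the given inittab-style file has an active respawn entry
# for init.ohasd. Defaults to /etc/inittab when no file is given.
has_ohasd_respawn() {
    inittab=${1:-/etc/inittab}
    # Fields are id:runlevels:action:process; commented-out lines are ignored.
    grep -Eq '^[^#:]*:[^:]*:respawn:.*init\.ohasd run' "$inittab"
}

# Succeeds if an init.ohasd wrapper process is currently running.
init_ohasd_running() {
    # The [i] bracket keeps grep from matching its own command line.
    ps -ef | grep -q '[i]nit\.ohasd'
}
```

If has_ohasd_respawn succeeds but init_ohasd_running fails, the entry exists and init simply has not launched (or has stopped respawning) the wrapper.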

Since the ohasd service has not started, start it manually:
[root@myrac1 ~]# /etc/init.d/init.ohasd run
mkfifo: cannot create fifo `/var/tmp/.oracle/npohasd': File exists
The command keeps running in the foreground and never returns, much like the way Tomcat runs. You can run it in the background by appending &.
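A minimal sketch of that background launch, written as a reusable helper. The name start_detached is my own. nohup is added on the assumption that the process should survive the terminal closing, which a bare & does not guarantee.

```shell
# Launch a long-running command in the background, detached from the terminal.
# Usage: start_detached COMMAND [ARGS...]
# Prints the PID of the background process so it can be checked later.
# (start_detached is a hypothetical helper name.)
start_detached() {
    nohup "$@" >/dev/null 2>&1 &
    echo $!
}

# Example (as root), using the path from the inittab entry in this post:
# start_detached /etc/init.d/init.ohasd run
```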
After a while, check the resource status:
[grid@myrac1 ohasd]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME          TARGET  STATE        SERVER                  STATE_DETAILS     
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA_DG.dg   ONLINE  ONLINE      myrac1                                     
ora.DG_FRA.dg    ONLINE  ONLINE      myrac1                                     
ora.LISTENER.lsnr ONLINE  ONLINE      myrac1                                     
ora.SYS_DG.dg    ONLINE  ONLINE      myrac1                                     
ora.asm          ONLINE  ONLINE      myrac1                  Started           
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd        ONLINE  ONLINE      myrac1                                     
ora.diskmon      ONLINE  ONLINE      myrac1                                     
ora.hjj.db       OFFLINE OFFLINE                              Instance Shutdown 
[grid@myrac1 ohasd]$ crs_stat -t
Name          Type          Target    State    Host       
------------------------------------------------------------
ora.DATA_DG.dg ora....up.type ONLINE    ONLINE    myrac1     
ora.DG_FRA.dg  ora....up.type ONLINE    ONLINE    myrac1     
ora....ER.lsnr ora....er.type ONLINE    ONLINE    myrac1     
ora.SYS_DG.dg  ora....up.type ONLINE    ONLINE    myrac1     
ora.asm        ora.asm.type  ONLINE    ONLINE    myrac1     
ora.cssd      ora.cssd.type  ONLINE    ONLINE    myrac1     
ora.diskmon    ora....on.type ONLINE    ONLINE    myrac1     
ora.hjj.db    ora....se.type OFFLINE  OFFLINE       
Only ora.hjj.db is OFFLINE, because the database has not been started yet. It will become ONLINE once the database is started.
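Spotting the one OFFLINE row by eye works for a short list. As a sketch for longer ones, the helper below reads crs_stat -t output and prints the resources whose State column is not ONLINE. The name offline_resources is hypothetical, and the column positions are assumed from the output shown above.

```shell
# Read `crs_stat -t` output on stdin and print the names of resources
# whose State column (the fourth field) is not ONLINE.
# (offline_resources is a hypothetical helper name.)
offline_resources() {
    # NR > 2 skips the header line and the dashed separator.
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1 }'
}

# Usage:
# crs_stat -t | offline_resources
```

Note that crs_stat -t abbreviates long names (for example ora....ER.lsnr), so the printed name may be abbreviated too.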
Start the ASM instance:
[grid@myrac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Wed Feb 19 17:34:03 2014

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Automatic Storage Management option

SQL> startup
ORA-01081: cannot start already-running ORACLE - shut it down first
SQL> shutdown immediater
SP2-0717: illegal SHUTDOWN option
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started

Total System Global Area  284565504 bytes
Fixed Size                  1336036 bytes
Variable Size            258063644 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted


Start the database:
[oracle@myrac1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Wed Feb 19 18:40:55 2014

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area  313860096 bytes
Fixed Size                  1336232 bytes
Variable Size            130026584 bytes
Database Buffers          176160768 bytes
Redo Buffers                6336512 bytes
Database mounted.
Database opened.
Check the resource status again:
[grid@myrac1 ~]$ crs_stat -t
Name          Type          Target    State    Host       
------------------------------------------------------------
ora.DATA_DG.dg ora....up.type ONLINE    ONLINE    myrac1     
ora.DG_FRA.dg  ora....up.type ONLINE    ONLINE    myrac1     
ora....ER.lsnr ora....er.type ONLINE    ONLINE    myrac1     
ora.SYS_DG.dg  ora....up.type ONLINE    ONLINE    myrac1     
ora.asm        ora.asm.type  ONLINE    ONLINE    myrac1     
ora.cssd      ora.cssd.type  ONLINE    ONLINE    myrac1     
ora.diskmon    ora....on.type ONLINE    ONLINE    myrac1     
ora.hjj.db    ora....se.type ONLINE    ONLINE    myrac1 

If a resource has not started, you can start it with crsctl start res resource_name.
With that, the problem is solved.

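Summary: when starting services, keep a close eye on the background logs and the actions they record. That way you know exactly which step failed, and you can locate and fix the problem quickly.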

Appendix: common commands
Check the status of HAS
crsctl check has
Check the status of CSS
crsctl check css
Check the status of resources
crs_stat -t -v
crsctl status res -t
Start a resource
crsctl start res resource_name
OCR information
ocrcheck
View the configuration of database hjj
srvctl config database -d hjj
View the parameters of a resource
crs_stat -p ora.hjj.db
Usage of the crsctl command
[grid@myrac1 ~]$ crsctl -h
Usage: crsctl add      - add a resource, type or other entity
      crsctl check    - check a service, resource or other entity
      crsctl config    - output autostart configuration
      crsctl debug    - obtain or modify debug state
      crsctl delete    - delete a resource, type or other entity
      crsctl disable  - disable autostart
      crsctl enable    - enable autostart
      crsctl get      - get an entity value
      crsctl getperm  - get entity permissions
      crsctl lsmodules - list debug modules
      crsctl modify    - modify a resource, type or other entity
      crsctl query    - query service state
      crsctl pin      - Pin the nodes in the nodelist
      crsctl relocate  - relocate a resource, server or other entity
      crsctl replace  - replaces the location of voting files
      crsctl setperm  - set entity permissions
      crsctl set      - set an entity value
      crsctl start    - start a resource, server or other entity
      crsctl status    - get status of a resource or other entity
      crsctl stop      - stop a resource, server or other entity
      crsctl unpin    - unpin the nodes in the nodelist
      crsctl unset    - unset a entity value, restoring its default

Release version of Oracle High Availability Services
crsctl query has releaseversion
Software version of Oracle High Availability Services
crsctl query has softwareversion
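
The status checks in this appendix can be chained into one pass. This is only a sketch: the function name gi_health_check is hypothetical, and it assumes each command reports failure through a non-zero exit status, which is worth confirming on your own installation.

```shell
# Run the status checks from this post in order and report which ones fail.
# Usage: gi_health_check   (as the grid user, with the GI bin directory on PATH)
# Returns the number of failed checks as its exit status.
# (gi_health_check is a hypothetical helper name.)
gi_health_check() {
    failed=0
    for check in "crsctl check has" "crsctl check css" "crsctl status res -t" "ocrcheck"; do
        echo "== $check"
        # Word splitting of $check is intentional: each entry is a command plus arguments.
        if ! $check; then
            echo "FAILED: $check" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```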
