VIP fails to start with error CRS-1006


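Description: our environment is a two-node RAC. Node 1 suffered a hardware failure and went down.
I wanted to start node 1's VIP on node 2 so that running on a single node would be transparent to the client applications.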

[oracle@UNID02 ~]$ crs_start ora.unid01.vip
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` failed.
CRS-1006: No more members to consider

CRS-0215: Could not start resource 'ora.unid01.vip'.

[oracle@UNID02 ~]$
However, the start failed with CRS-1006: No more members to consider.
The VIP log (under $CRS_HOME/log/<NODENAME>/racg) shows an error related to the network interface:
2013-12-10 09:50:26.877: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: checkIf: interface eth0 is down
Invalid parameters, or failed to bring up VIP (host=UNID02) ==============================>

2013-12-10 09:50:26.877: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start unid01

2013-12-10 09:50:26.877: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s

2013-12-10 09:50:30.010: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check unid01

2013-12-10 09:50:30.010: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s

2013-12-10 09:50:30.010: [    RACG][2345793472] [16627][2345793472][ora.unid01.vip]: end for resource = ora.unid01.vip, action = start, status = 1, time = 6.280s


2013-12-10 01:17:41.966: [ COMMCRS][1472985408]clsc_receive: (0x2aaaac1428c0) error 2

2013-12-10 09:50:23.702: [  CRSRES][1538058560] startRunnable: setting CLI values
2013-12-10 09:50:23.705: [  CRSRES][1538058560] Attempting to start `ora.unid01.vip` on member `UNID02`
2013-12-10 09:50:30.012: [  CRSAPP][1538058560] StartResource error for ora.unid01.vip error code = 1
2013-12-10 09:50:33.198: [  CRSRES][1538058560] Start of `ora.unid01.vip` on member `UNID02` failed.
2013-12-10 09:50:33.204: [  CRSRES][1538058560] CRS-1006: No more members to consider
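
The key line in the VIP log is the `checkIf: interface eth0 is down` message. As a minimal sketch, assuming the message format shown above, a small helper can pull the affected interface names out of a racg log. The function name `down_interfaces` and the log file name in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# Print each interface that racgvip reported as down, one name per line.
# Reads log text on stdin; the pattern is taken from the log excerpt above.
down_interfaces() {
    sed -n 's/.*checkIf: interface \([^ ]*\) is down.*/\1/p' | sort -u
}

# Hypothetical usage (the log file name is an assumption):
#   down_interfaces < "$CRS_HOME/log/<NODENAME>/racg/ora.unid01.vip.log"
```

On the excerpt above this would print `eth0`, which is the interface to compare against the VIP configuration.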

srvctl shows that UNID02-vip is bound to interface eth2, whereas unid01-vip is bound to eth0.
[oracle@UNID02 ~]$ srvctl config nodeapps -n UNID02 -a -g -s -l
VIP exists.: /UNID02-vip/10.0.15.176/255.255.255.0/eth2
GSD exists.
ONS daemon exists.
Listener exists.
[oracle@UNID02 ~]$ srvctl config nodeapps -n unid01 -a -g -s -l
VIP exists.: /unid01-vip/10.0.15.175/255.255.255.0/eth0
GSD exists.
ONS daemon exists.
Listener exists.
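
To compare the two nodes quickly, the interface can be extracted from the `VIP exists.` line. This is only a sketch that assumes the output format shown above (`/<vip-name>/<ip>/<netmask>/<interface>`); the function name `vip_interface` is made up for illustration.

```shell
#!/bin/sh
# Print the interface field of a "VIP exists." line read from stdin.
# Assumed format (from the output above): /<vip-name>/<ip>/<netmask>/<interface>
vip_interface() {
    awk -F/ '/^VIP exists/ { print $NF }'
}

# Usage, with the srvctl command from this post:
#   srvctl config nodeapps -n unid01 -a | vip_interface
```

If the two nodes print different interface names, as here (eth2 and eth0), check that the interface configured for the failed node's VIP actually exists and is up on the surviving node.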

ifconfig shows that eth0 is not up on UNID02:

[root@UNID02 bin]# ifconfig
eth1      Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C2 
          inet addr:192.168.127.102  Bcast:192.168.127.255  Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:53 errors:0 dropped:0 overruns:0 frame:0
          TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8246 (8.0 KiB)  TX bytes:6848 (6.6 KiB)
          Interrupt:122 Memory:d8000000-d8012800

eth2      Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C4 
          inet addr:10.0.15.172  Bcast:10.0.15.255  Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1493987596 (1.3 GiB)  TX bytes:1004608379 (958.0 MiB)
          Interrupt:130 Memory:da000000-da012800

eth2:1    Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C4 
          inet addr:10.0.15.176  Bcast:10.0.15.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:130 Memory:da000000-da012800

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:417953992 (398.5 MiB)  TX bytes:417953992 (398.5 MiB)

[root@UNID02 bin]#
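
Plain `ifconfig` (without `-a`) lists only active interfaces, so the absence of eth0 from the output is the evidence here. A minimal sketch of that check, assuming the older net-tools output format shown above, where each block starts with the interface name in the first column; `iface_listed` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed (exit 0) if interface $1 appears as a block header in the
# ifconfig output read from stdin, fail (exit 1) otherwise.
# Assumes the format above: the name starts in column one, no trailing colon.
iface_listed() {
    awk -v want="$1" '
        $0 ~ /^[^ \t]/ && $1 == want { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   ifconfig | iface_listed eth0 || echo "eth0 is not active on this node"
```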

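I asked the system engineer, who explained that the public IP on this machine used to be on eth0; after eth0 failed, it was moved to eth2. That explains the error.
There are two ways to fix this:
1. Change the interface of unid01-vip to eth2
[root@UNID02 ~]# srvctl modify nodeapps -n unid01 -A 10.0.15.175/255.255.255.0/eth2
Start the VIP again; this time it succeeds.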
[oracle@UNID02 ~]$ crs_start ora.unid01.vip
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` succeeded.

2. Because crs_start calls the racgvip script to start the VIP, you can set the environment variables yourself and run the script directly: sh racgvip start ora.unid01.vip
[root@UNID02 ~]# export _USR_ORA_VIP=10.0.15.175
[root@UNID02 ~]# export _USR_ORA_NETMASK=255.255.255.0
[root@UNID02 ~]# export _USR_ORA_IF=eth2
[root@UNID02 ~]# export _CAA_NAME=ora.unid01.vip
[root@UNID02 bin]# sh -x racgvip start ora.unid01.vip
+ IFCONFIG=/sbin/ifconfig
+ GREP=/bin/grep
+ SED=/bin/sed
+ RM=/bin/rm
+ MV=/bin/mv
+ UNIQ=/usr/bin/uniq
+ PING=/bin/ping
+ WC=/usr/bin/wc
+ NETSTAT=/bin/netstat
+ AWK=/bin/awk
+ WHOAMI=/usr/bin/whoami
+ CAT=/bin/cat
+ UNAME=/bin/uname
+ SLEEP=/bin/sleep
+ SORT=/bin/sort
+ EXPR=/usr/bin/expr
+ DATE=/bin/date
+ RENICE=/usr/bin/renice
+ MIITOOL=/sbin/mii-tool
+ ARPING=/sbin/arping
+ IPCMD='/sbin/ip -f inet'
+ LANG=C
+ LC_ALL=C
+ export LANG LC_ALL
+ FAIL_WHEN_ALL_LINK_DOWN=1
+ FAIL_WHEN_DEFAULTGW_NOT_FOUND=1
+ DEFAULTGW=
+ /usr/bin/renice -20 -p 15145
++ /bin/hostname
+ HOSTNAME=UNID02
+ PING_TIMEOUT='-w 3 -c 1'
+ PING_COUNT=10
+ LOCKED=0
+ CRS_STAT=/bin/crs_stat
+ CHECK_TIMES=2
+ SUCCESS=0
+ ERROR=1
+ DEFAULT_TIMEOUT=60
+ IP=10.0.15.175
+ MASK=255.255.255.0
+ IF=eth2
+ OP=start
++ /usr/bin/whoami
+ USER=root
++ uname
+ [[ Linux != Linux ]]
+ listif_result=
+ '[' root '!=' root -a start '!=' list ']'
+ '[' -n 10.0.15.175 -a -n 255.255.255.0 ']'
++ IFS=.
++ set 10 0 15 175 255 255 255 0
++ echo 10.0.15.255
+ BROADCAST=10.0.15.255
+ logx 'Broadcast = 10.0.15.255'
+ '[' -n '' ']'
+ '[' start = list ']'
++ echo ora.unid01.vip
++ /bin/sed '-es/^ora\.//;s/\.vip$//'
+ VIP_NAME=unid01
+ NAME=ora.unid01.vip
+ '[' -z ora.unid01.vip ']'
+ IF_USING=
+ '[' -n 10.0.15.175 ']'
+ logx Checking interface existance
+ '[' -n '' ']'
+ logx 'Calling getifbyip'
+ '[' -n '' ']'
++ getifbyip 10.0.15.175
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx 'getifbyip:  started for 10.0.15.175'
++ '[' -n '' ']'
+++ /sbin/ip -f inet -o addr
+++ /bin/grep 'inet 10.0.15.175/'
+++ /bin/awk '{ print $NF }'
++ gf_retif=
++ logx 'getifbyip:  returning IP '
++ '[' -n '' ']'
++ '[' -z '' ']'
+ LI=
+ logx Completed getifbyip
+ '[' -n '' ']'
+ logx 'Calling getifbyip -a'
+ '[' -n '' ']'
++ getifbyip 10.0.15.175 -a
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx 'getifbyip:  started for 10.0.15.175'
++ '[' -n '' ']'
+++ /sbin/ip -f inet -o addr
+++ /bin/grep 'inet 10.0.15.175/'
+++ /bin/awk '{ print $NF }'
++ gf_retif=
++ logx 'getifbyip:  returning IP '
++ '[' -n '' ']'
++ '[' -z -a ']'
++ '[' -n '' ']'
+ LI_A=
+ logx Completed getifbyip
+ '[' -n '' ']'
+ '[' '' '!=' '' ']'
+ echo ''
+ /bin/grep -q :
+ '[' 1 -ne 0 ']'
+ '[' start = stop ']'
+ ping_vip 10.0.15.175
+ logx 'ping_vip 10.0.15.175 started'
+ '[' -n '' ']'
+ '[' -n 10.0.15.175 ']'
+ _count=1
+ '[' 1 -le 10 ']'
+ /bin/ping 10.0.15.175 -w 3 -c 1
+ '[' 1 -ne 0 ']'
+ logx 'ping_vip: 10.0.15.175 is not pingable, _count = 1'
+ '[' -n '' ']'
+ return 1
+ '[' 1 -eq 0 ']'
+ logx 'Completed with initial interface test'
+ '[' -n '' ']'
+ case $OP in
+ '[' start = check ']'
+ '[' start = check ']'
+ '[' -n 10.0.15.175 -a -n 255.255.255.0 -a -n eth2 ']'
+ '[' -n '' ']'
+ logx 'Interface tests'
+ '[' -n '' ']'
++ echo eth2
++ /bin/sed '-es/|/ /g'
+ IF=eth2
+ for I in '$IF'
+ '[' eth2 = '' ']'
+ checkIf eth2
+ _IF=eth2
+ _RET=0
+ _LINK_STAT=
+ logx 'checkIf: start for if=eth2'
+ '[' -n '' ']'
+ '[' -z eth2 ']'
+ /sbin/ifconfig eth2
+ /bin/grep -q -w UP
+ '[' 0 -ne 0 ']'
+ '[' -x /sbin/mii-tool ']'
++ /sbin/mii-tool eth2
+ _LINK_STAT='eth2: negotiated 100baseTx-FD flow-control, link ok'
+ '[' 0 -eq 0 ']'
+ echo 'eth2: negotiated 100baseTx-FD flow-control, link ok'
+ /bin/grep -q 'link ok'
+ '[' 0 -eq 0 ']'
+ logx 'checkIf: mii-tool checked if=eth2 ok'
+ '[' -n '' ']'
+ _RET=0
+ '[' -z 'eth2: negotiated 100baseTx-FD flow-control, link ok' ']'
+ '[' 0 -eq 1 ']'
+ logx 'checkIf: end for if=eth2'
+ '[' -n '' ']'
+ return 0
+ '[' 0 -eq 0 ']'
+ getnextli eth2
+ _LOCAL_IF=eth2
+ nextli=
+ _LIN=
+ logx 'getnextli:  started for if=eth2'
+ '[' -n '' ']'
++ listif
++ logx 'listif: starting'
++ '[' -n '' ']'
++ '[' -z '' ']'
+++ /sbin/ip -f inet -o addr
++ /bin/grep eth2:
++ /bin/sed '-es/^.*://'
+++ /bin/awk '{ print $NF }'
++ /bin/sort -n
+++ /bin/grep -vw lo
++ listif_result='eth1
eth2
eth2:1'
++ logx 'listif:  completed with eth1
eth2
eth2:1'
++ '[' -n '' ']'
++ echo 'eth1
eth2
eth2:1'
+ _LIN=1
+ i=1
+ '[' 1 -le 256 ']'
+ _found=0
+ for j in '${_LIN}'
+ '[' 1 -eq 0 ']'
+ '[' 1 -eq 1 ']'
+ _found=1
+ break
+ '[' 1 -eq 0 ']'
+ i=2
+ '[' 2 -le 256 ']'
+ _found=0
+ for j in '${_LIN}'
+ '[' 1 -eq 0 ']'
+ '[' 2 -eq 1 ']'
+ '[' 0 -eq 0 ']'
+ get_lock eth2_2
+ TOUCH=/bin/touch
+ LS=/bin/ls
+ KILL=/bin/kill
+ LOCK=/var/tmp/vip_eth2_2_UNID02.lock
+ /bin/touch /var/tmp/vip_eth2_2_UNID02.lock.15145
+ '[' 0 -ne 0 ']'
++ /bin/ls /var/tmp/vip_eth2_2_UNID02.lock.15145
++ /usr/bin/wc -l
+ '[' 1 -eq 1 ']'
+ logx 'get_lock: lock file /var/tmp/vip_eth2_2_UNID02.lock.15145 is created'
+ '[' -n '' ']'
+ LOCKED=1
+ return 0
+ '[' 0 -eq 0 ']'
+ listif_result=
+ listif
+ logx 'listif: starting'
+ '[' -n '' ']'
+ '[' -z '' ']'
++ /sbin/ip -f inet -o addr
+ /bin/grep -w eth2:2
++ /bin/awk '{ print $NF }'
++ /bin/grep -vw lo
+ listif_result='eth1
eth2
eth2:1'
+ logx 'listif:  completed with eth1
eth2
eth2:1'
+ '[' -n '' ']'
+ echo 'eth1
eth2
eth2:1'
+ '[' 1 -ne 0 ']'
+ break
+ '[' 2 -eq 256 ']'
+ nextli=eth2:2
+ logx 'getnextli:  completed with nextli=eth2:2'
+ '[' -n '' ']'
+ return 2
+ LI=eth2:2
+ /sbin/ifconfig eth2:2 10.0.15.175 netmask 255.255.255.0 broadcast 10.0.15.255 up
+ '[' 0 -ne 0 ']'
+ logx 'Success exit 1'
+ '[' -n '' ']'
+ '[' -n '' ']'
+ /sbin/arping -q -U -c 3 -I eth2 10.0.15.175
+ release_lock
+ '[' 1 = 1 ']'
+ /bin/rm -f /var/tmp/vip_eth2_2_UNID02.lock.15145
+ logx 'release_lock: remove lock file /var/tmp/vip_eth2_2_UNID02.lock.15145'
+ '[' -n '' ']'
+ LOCKED=0
+ exit 0  -- return value 0: the start succeeded

[root@UNID02 bin]# ifconfig
eth1      Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C2 
          inet addr:192.168.127.102  Bcast:192.168.127.255  Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:53 errors:0 dropped:0 overruns:0 frame:0
          TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8246 (8.0 KiB)  TX bytes:6848 (6.6 KiB)
          Interrupt:122 Memory:d8000000-d8012800

eth2      Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C4 
          inet addr:10.0.15.172  Bcast:10.0.15.255  Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1493987596 (1.3 GiB)  TX bytes:1004608379 (958.0 MiB)
          Interrupt:130 Memory:da000000-da012800

eth2:1    Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C4 
          inet addr:10.0.15.176  Bcast:10.0.15.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:130 Memory:da000000-da012800

eth2:2    Link encap:Ethernet  HWaddr A4:BA:DB:13:EA:C4 
          inet addr:10.0.15.175  Bcast:10.0.15.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:130 Memory:da000000-da012800

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:417953992 (398.5 MiB)  TX bytes:417953992 (398.5 MiB)

[root@UNID02 bin]#

ifconfig confirms that the VIP is now up (10.0.15.175 on eth2:2).
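
The trace shows why the VIP landed on eth2:2: alias eth2:1 already carried UNID02's own VIP, so the script took the next free index. The following is a simplified sketch of that selection, not the script's actual code; `next_alias` is a hypothetical name, and the lock-file handling seen in the trace is omitted.

```shell
#!/bin/sh
# Print the first free alias "<if>:<n>" (n = 1..256) for base interface $1,
# given the existing interface names on stdin, one per line.
next_alias() {
    _base=$1
    # Collect the alias numbers already in use on this base interface.
    _used=$(sed -n "s/^$_base:\([0-9][0-9]*\)\$/\1/p")
    _i=1
    while [ "$_i" -le 256 ]; do
        case " $(echo $_used) " in
            *" $_i "*) _i=$((_i + 1)) ;;          # index taken, try the next one
            *) echo "$_base:$_i"; return 0 ;;     # first free index
        esac
    done
    return 1                                      # no free alias found
}

# Usage with the interface list from the trace:
#   printf 'eth1\neth2\neth2:1\n' | next_alias eth2
```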

If it still does not work, send back the output. Then refer to the script /u01/app/11.2.0/grid/bin/racgvip.
If there is more than one interface, remove the cable from the interface on which the VIP is set and run the check action; the VIP should move to another interface.

 

# 1. becomes root user
# 2. set environment variables
#    - _USR_ORA_VIP for VIP address
#    - _USR_ORA_NETMASK for netmask address
#    - _USR_ORA_IF for interface names, they are separated by '|' character
#    - _CAA_NAME for the VIP resource name, ora.<nodename>.vip
# 3. Test list command
#    # sh racgvip list
# 4. Test start command
#    # sh racgvip start
#    # echo $?
#    # ifconfig (to check if the VIP is set)
# 5. Test check command
#    # sh racgvip check
#    # echo $?
# 6. Test stop command
#    # sh racgvip stop
#    # echo $?
#    # ifconfig (to check if the VIP is unset)
# 7. If there is more than one interfaces, remove the cable on the interface
#    which VIP is set and run check action, the VIP should be set to another
#    interface.
#    Note: if cables are pulled from all interfaces or there is only one
#    interface, VIP will stay on the original interface and
#    the script returns success. This behavior is to keep VIP resource
#    from failover if there is a network brown out.
#
#    # sh racgvip check
#    # echo $?
#    # ifconfig (to check if the VIP is set to another interface)
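
Steps 2 to 6 above can be wrapped in a small helper so that the four variables are set for a single invocation only. This is a sketch under stated assumptions: `run_racgvip` and the `RACGVIP` override variable are hypothetical, the default path is the one quoted in this post, and it must be run as root (step 1).

```shell
#!/bin/sh
# Run one racgvip action with the environment described in step 2.
# usage: run_racgvip <action> <vip> <netmask> <interfaces> <resource-name>
#   <interfaces> may list several names separated by '|'.
# RACGVIP (hypothetical) overrides the script location.
run_racgvip() {
    _USR_ORA_VIP=$2 _USR_ORA_NETMASK=$3 _USR_ORA_IF=$4 _CAA_NAME=$5 \
        sh "${RACGVIP:-/u01/app/11.2.0/grid/bin/racgvip}" "$1"
}

# Usage with the values from this post:
#   run_racgvip start 10.0.15.175 255.255.255.0 eth2 ora.unid01.vip; echo $?
#   run_racgvip check 10.0.15.175 255.255.255.0 eth2 ora.unid01.vip; echo $?
#   run_racgvip stop  10.0.15.175 255.255.255.0 eth2 ora.unid01.vip; echo $?
```

Passing the variables as a command prefix keeps them out of the calling shell, so a later crs_start in the same session is not affected by leftover exports.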


The VIP is brought up by /u01/app/11.2.0/grid/bin/racgvip. The script first checks the status of the interface; if the interface is down, the VIP cannot be brought up.

The relevant check in /u01/app/11.2.0/grid/bin/racgvip:
if [ -z "$_IF" ]
then
    echo "checkIf: interface name is NULL"
    return 1
fi

# check if the interface is up
$IFCONFIG $_IF | $GREP -q -w UP
if [ $? -ne 0 ]
then
    echo "checkIf: interface $_IF is down"
    return 1
fi
