11gR2 RAC vip和network资源的依赖性与频繁failover

在11gR2中出现了Oracle Network resource类型的CRS resource用以监控Public Network:

 

[root@vrh1 ~]# crsctl status resource ora.net1.network -f
NAME=ora.net1.network
TYPE=ora.network.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=restore
CHECK_INTERVAL=1
CREATION_SEED=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=Oracle Network resource
ENABLED=1
ID=ora.net1.network
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=60
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=0
TYPE_VERSION=2.2
UPTIME_THRESHOLD=1d
USR_ORA_AUTO=static
USR_ORA_ENV=
USR_ORA_IF=eth0
USR_ORA_NETMASK=255.255.255.0
USR_ORA_SUBNET=192.168.1.0
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status resource ora.vrh1.vip -f        
NAME=ora.vrh1.vip
TYPE=ora.cluster_vip_net1.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=1
CHECK_TIMEOUT=30
CREATION_SEED=34700
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
DEGREE=1
DESCRIPTION=Oracle VIP resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
GEN_USR_ORA_STATIC_VIP=
GEN_USR_ORA_VIP=
HOSTING_MEMBERS=vrh1
ID=ora.vrh1.vip
LOAD=1
LOGGING_LEVEL=2
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=favored
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
TYPE_VERSION=2.1
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
USR_ORA_VIP=vrh1-vip
VERSION=11.2.0.3.0

[root@vrh1 ~]# crsctl status res ora.LISTENER.lsnr -f
NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=%CRS_HOME%/bin/racgwrap%CRS_SCRIPT_SUFFIX%
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME_LOWER_CASE%.LISTENER_%CRS_CSS_NODENAME_UPPER_CASE%.lsnr
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
CREATION_SEED=35089
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=Oracle Listener resource
ENABLED=1
ENDPOINTS=TCP:1521
ID=ora.LISTENER.lsnr
LOAD=1
LOGGING_LEVEL=5
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1521
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:type:ora.cluster_vip_net1.type)
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=ORACLE_BASE=/g01/orabase
USR_ORA_OPI=false
VERSION=11.2.0.3.0

 

 

可以看到在11.2 RAC中Listener资源依赖于VIP, 而VIP资源依赖于ora.net1.network;这就造成了当public network短时不可用(或曰network hiccup)时会造成ora.net1.network资源OFFLINE,这就将造成该节点上VIP资源的FAILOVER和LISTENER的OFFLINE。

且由于在11.2上ora.net1.network资源的默认CHECK_INTERVAL=1,即每秒都会对该NETWORK资源进行监控,所以NETWORK资源变得十分敏感,不管是由于硬件网络亦或者较高的主机负载造成短时的Public Network不可用,都可能导致VIP和LISTENER由于NETWORK依赖资源OFFLINE而受到影响。

 

我们来实际体会一下:

 

 

 

[root@vrh1 orarootagent_root]# pwd
/g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root

2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:13.600: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.596: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:14.597: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.602: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:15.603: [ora.vrh1.vip][1266022720][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:16.598: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:04:17.594: [ora.vrh1.vip][2281658688][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

[root@vrh1 orarootagent_root]# lsof |grep orarootagent_root.log
orarootag  3835      root    4w      REG              253,0   9109896              4717997 /g01/11.2.0/grid/log/vrh1/agent/ohasd/orarootagent_root/orarootagent_root.log
orarootag 23559      root    4w      REG              253,0   3166410              4723518 /g01/11.2.0/grid/log/vrh1/agent/crsd/orarootagent_root/orarootagent_root.log

[root@vrh1 orarootagent_root]# ps -ef|egrep "3835|23559"|grep -v grep
root      3835     1  0 Oct23 ?        06:57:18 /g01/11.2.0/grid/bin/orarootagent.bin
root     23559     1  0 Nov20 ?        04:06:24 /g01/11.2.0/grid/bin/orarootagent.bin

 

 

 

 

可以看到如上orarootagent.bin进程每秒都会对VIP的依赖资源ora.net1.network做心跳检测,我们手动在3s内关闭和重启NETWORK资源,并观测结果:

 

 

 

 

2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} ora.net1.network vrh1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Switching online monitor to offline one
2012-12-21 02:09:05.781: [    AGFW][1400256832] {1:62611:2} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:05.781: [    AGFW][1400256832] {0:1:53734} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047944
2012-12-21 02:09:05.781: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776] {1:62611:2} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.782: [ora.vrh1.vip][1333139776][ENTER] {1:62611:2} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:05.790: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:05.791: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.vrh1.vip 1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} Preparing START command for: ora.net1.network vrh1 1
2012-12-21 02:09:05.876: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Agent received the message: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} Preparing CHECK command for: ora.scan1.vip 1 1
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:09:05.878: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init enter {
2012-12-21 02:09:05.878: [ora.scan1.vip][2281658688] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53734} ora.scan1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047956
2012-12-21 02:09:05.878: [    AGFW][1400256832] {0:1:53735} Agent sending last reply for: RESOURCE_PROBE[ora.scan1.vip 1 1] ID 4097:417241
2012-12-21 02:09:06.056: [ora.net1.network][1266022720] {0:1:53734} [start] Checking if eth0 Interface is fine
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic {
2012-12-21 02:09:06.176: [ora.net1.network][1266022720] {0:1:53734} [start] synchStatic }
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic {
2012-12-21 02:09:06.177: [ora.net1.network][1266022720] {0:1:53734} [start] unsynchStatic }
2012-12-21 02:09:06.347: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start {
2012-12-21 02:09:06.348: [ USRTHRD][1333139776] {1:62611:2} Thread:[SendFail2SrvThread:]start }
2012-12-21 02:09:06.348: [    AGFW][1400256832] {1:62611:2} ora.vrh1.vip 1 1 state changed from: ONLINE to: OFFLINE
2012-12-21 02:09:06.348: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Generating new Tint for unplanned state change. Original Tint: {1:62611:2}
2012-12-21 02:09:06.348: [    AGFW][1400256832] {0:1:53736} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047960
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776] {0:1:53734} [check] VipAgent::checkIp returned false
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.349: [ora.vrh1.vip][1333139776][ENTER] {0:1:53734} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:09:06.350: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_PROBE[ora.vrh1.vip 1 1] ID 4097:417234
2012-12-21 02:09:06.395: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.445: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv {
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} VipAgent::sendFail2Srv }
2012-12-21 02:09:06.535: [ USRTHRD][1467373888] {1:62611:2} Thread:[SendFail2SrvThread:]isRunning is reset to false here
2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.687: [ora.net1.network][1266022720] {0:1:53734} [start] NetworkAgent::init exit }
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop {
2012-12-21 02:09:06.687: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]stop }
2012-12-21 02:09:06.725: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.726: [ora.net1.network][1266022720] {0:1:53734} [start] Ocr Context init default level 45001104
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:09:06.836: [    AGFW][1266022720] {0:1:53734} Command: start for resource: ora.net1.network vrh1 1 completed with status: SUCCESS
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:09:06.836: [    AGFW][1400256832] {0:1:53734} Agent sending reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237
2012-12-21 02:09:06.836: [ora.net1.network][1266022720] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:09:06.837: [   AGENT][1266022720] {0:1:53734} UserErrorException: Locale is 
2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] CRS-5008: Invalid attribute value: eth0 for the network interface

2012-12-21 02:09:06.837: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetInterface::scheckNetInterface returned 0 
2012-12-21 02:09:07.184: [ora.net1.network][1266022720] {0:1:53734} [check] NetworkAgent::checkInterface returned false
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: STARTING to: OFFLINE
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Switching online monitor to offline one
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=60000 delay=60000
2012-12-21 02:09:07.185: [    AGFW][1400256832] {0:1:53734} Agent sending last reply for: RESOURCE_START[ora.net1.network vrh1 1] ID 4098:417237

 

 

可以看到如上在Public Network DOWN之后1s内orarootagent进程即检测到并试图重启ora.net1.network,重启失败后将VIP置为OFFLINE状态,随后LISTENER也被OFFLINE,之后当Public Network恢复后,network、vip、listener相继自动恢复:

 

 

2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init enter {
2012-12-21 02:10:07.196: [ora.net1.network][1333139776] {0:1:53734} [check] Checking if eth0 Interface is fine
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] ifname=eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetmask=255.255.255.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] subnetnumber=192.168.1.0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Interface Name = eth0
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] Primary IP = 192.168.1.161
2012-12-21 02:10:07.217: [ora.net1.network][1333139776] {0:1:53734} [check] NetworkAgent::init exit }
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start {
2012-12-21 02:10:07.218: [ USRTHRD][1333139776] {0:1:53734} Thread:[VipRelocate:]start }
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} ora.net1.network vrh1 1 state changed from: OFFLINE to: ONLINE
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Switching offline monitor to online one
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53734} Started implicit monitor for [ora.net1.network vrh1 1] interval=1000 delay=1000
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Generating new Tint for unplanned state change. Original Tint: {0:1:53734}
2012-12-21 02:10:07.220: [    AGFW][1400256832] {0:1:53739} Agent sending message to PE: RESOURCE_STATUS[Proxy] ID 20481:24047975
2012-12-21 02:10:07.244: [ USRTHRD][1266022720] {0:1:53734} Relocating Node Vip Resource
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic {
2012-12-21 02:10:07.245: [ USRTHRD][1266022720] {0:1:53734} synchStatic }
2012-12-21 02:10:07.270: [    GIPC][1266022720] gipcCheckInitialization: possible incompatible non-threaded init from [clsss.c : 5326], original from [clsnOraRootAgent.
cpp : 208]
2012-12-21 02:10:07.278: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic {
2012-12-21 02:10:07.279: [ USRTHRD][1266022720] {0:1:53734} unsynchStatic }
2012-12-21 02:10:07.310: [ USRTHRD][1266022720] {0:1:53734} Thread:[VipRelocate:]isRunning is reset to false here
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Agent received the message: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} Preparing START command for: ora.vrh1.vip 1 1
2012-12-21 02:10:07.343: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: OFFLINE to: STARTING
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[SendFail2SrvThread:]stop }
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop {
2012-12-21 02:10:07.344: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]stop }
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkMask = 255.255.255.0
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] NetworkAuto = static
2012-12-21 02:10:07.344: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Checking if eth0 Interface is fine
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] ifname=eth0

2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetmask=255.255.255.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] subnetnumber=192.168.1.0
2012-12-21 02:10:07.349: [ora.vrh1.vip][1333139776] {1:62611:962} [start] InterfaceName = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] HostName vrh1-vip translated to 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [start] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Interface Name = eth0
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ip Address = 192.168.1.162
2012-12-21 02:10:07.350: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Utils::getCrsHome crsHome /g01/11.2.0/grid
2012-12-21 02:10:07.351: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Ocr Context init default level -1677304432
2012-12-21 02:10:07.526: [ default][1333139776]ICMP Ping from 192.168.1.161 to 192.168.1.162
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Trying to plumb  192.168.1.162 on eth0
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp {
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Adding 192.168.1.162 on eth0:1
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] VipActions::startIp }
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Flushing neighbours ARP Cache
2012-12-21 02:10:08.529: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sCreateSocket }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache {
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Arp::sSend:  sending type 1
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start]  Arp::sFlushArpCache }
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.532: [ora.vrh1.vip][1333139776] {1:62611:962} [start] Succeed to bind udp://192.168.1.162:0
2012-12-21 02:10:08.533: [ora.vrh1.vip][1333139776] {1:62611:962} [start] (:CLSN00107:) clsn_agent::start }
2012-12-21 02:10:08.533: [    AGFW][1333139776] {1:62611:962} Command: start for resource: ora.vrh1.vip 1 1 completed with status: SUCCESS
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [    AGFW][1400256832] {1:62611:962} Agent sending reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.534: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} ora.vrh1.vip 1 1 state changed from: STARTING to: ONLINE
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Started implicit monitor for [ora.vrh1.vip 1 1] interval=1000 delay=1000
2012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
012-12-21 02:10:08.535: [    AGFW][1400256832] {1:62611:962} Agent sending last reply for: RESOURCE_START[ora.vrh1.vip 1 1] ID 4098:417329
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start {
2012-12-21 02:10:09.529: [ USRTHRD][1333139776] {1:62611:962} Thread:[StartorphanSrvThread:]start }
2012-12-21 02:10:09.530: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv {
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} VipAgent::startOrphanSrv }
2012-12-21 02:10:09.555: [ USRTHRD][1500932416] {1:62611:962} Thread:[StartorphanSrvThread:]isRunning is reset to false here
2012-12-21 02:10:10.532: [ora.vrh1.vip][1333139776][ENTER] {1:62611:962} [check] START_DEPENDENCIES of type ora.network.type: ora.net1.network

 

 

解决由于Network资源过于敏感导致的不必要的vip和listener的方法:

打补丁12378938 该补丁被包含在” 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, 11.2.0.4 and above”;并修改vip资源的依赖属性,例如:

 

 

# $GRID_HOME/bin/crsctl modify res ora.scan.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net.network)"
For example:

# /ocw/grid/bin/crsctl modify res ora.scan1.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan2.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/crsctl modify res ora.scan3.vip -attr "STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)"
# /ocw/grid/bin/srvctl stop scan -f
$ /ocw/grid/bin/srvctl start scan_listener

 

 

 

MORE INFO:

 

 

After check timed out, 11gR2 Grid Infrastructure network resource (usually ora.net1.network) goes to INTERMEDIATE state, then goes back to ONLINE very shortly. This note will not discuss cause of check time out, but most common cause is public network hiccup.

Once network resource goes into INTERMEDIATE state, it may trigger VIP/service to be failed over and listener to go offline due to resource dependence, which could result in unnecessary connectivity issue for that period of time. After network resource is back online, affected resources (listener, service or instance) may not come back online.

$GRID_HOME/log//crsd/crsd.log
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state change for ora.net1.network racnode1 1 [old state = ONLINE, new state = UNKNOWN]
2011-06-12 07:12:31.261: [ AGFW][10796] {0:1:2881} Received state LABEL change for ora.net1.network racnode1 1 [old label = , new label = CHECK TIMED OUT]
..
2011-06-12 07:12:31.297: [ CRSPE][12081] {0:1:2881} RI [ora.net1.network racnode1 1] new external state [INTERMEDIATE] old value: [ONLINE] on racnode1 label = [CHECK TIMED OUT]
..
2011-06-12 07:12:31.981: [ AGFW][10796] {0:1:2882} Received state change for ora.net1.network racnode1 1 [old state = UNKNOWN, new state = ONLINE]
..
2011-06-12 07:12:32.307: [ CRSPE][12081] {0:1:2881} RI [ora.LISTENER.lsnr racnode1 1] new internal state: [STOPPING] old value: [STABLE]
2011-06-12 07:12:32.308: [ CRSPE][12081] {0:1:2881} CRS-2673: Attempting to stop ‘ora.LISTENER.lsnr’ on ‘racnode1’
$GRID_HOME/log//agent/crsd/orarootagent_root/orarootagent_root.log
2011-06-12 07:12:08.965: [ AGFW][2070] {1:27767:2} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.net1.network racnode1 1
2011-06-12 07:12:08.966: [ora.net1.network][2070] {1:27767:2} [check] clsn_agent::abort {
..
2011-06-12 07:12:31.257: [ AGFW][2070] {1:27767:2} Command: check for resource: ora.net1.network racnode1 1 completed with status: TIMEDOUT
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: ONLINE to: UNKNOWN
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 would be continued to monitored!
2011-06-12 07:12:31.258: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state details has changed from: to: CHECK TIMED OUT
..
2011-06-12 07:12:31.923: [ora.net1.network][2314][F-ALGO] {1:27767:2} CHECK initiated by timer for: ora.net1.network racnode1 1
..
2011-06-12 07:12:31.973: [ora.net1.network][8502][F-ALGO] {1:27767:2} [check] Command check for resource: ora.net1.network racnode1 1 completed with status ONLINE
2011-06-12 07:12:31.978: [ AGFW][2314] {1:27767:2} ora.net1.network racnode1 1 state changed from: UNKNOWN to: ONLINE
$GRID_HOME/log//agent/crsd/oraagent_/oraagent_.log
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Agent received the message: RESOURCE_STOP[ora.LISTENER.lsnr racnode1 1] ID 4099:14792
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} Preparing STOP command for: ora.LISTENER.lsnr racnode1 1
2011-06-12 07:12:32.335: [ AGFW][2314] {0:1:2881} ora.LISTENER.lsnr racnode1 1 state changed from: ONLINE to: STOPPING

Cause

This is caused by bug 12680491

@ Related issues bug 12378938 bug 11857122 bug 12428737

Solution

The fix of bug 12680491 will add intermediate modifyer to stop dependency between network resource and VIP to avoid unnecessary resource state change, it’s included in 11.2.0.2 GI PSU4, 11.2.0.3 GI PSU3, 11.2.0.3 Windows Patch 7, 11.2.0.4 and above. This fix is recommended instead of fix for bug 12378938 to avoid the issue in first place.

Once patch for this bug is applied, the following needs to be executed to change the dependence for all VIPs:

# $GRID_HOME/bin/crsctl modify res ora..vip -attr “STOP_DEPENDENCIES=hard(intermediate:ora..network)”

For example:

# /ocw/grid/bin/crsctl modify res ora.racnode1.vip -attr “STOP_DEPENDENCIES=hard(intermediate:ora.net1.network)”
Once the attribute is changed, a restart of nodeapps/VIP is needed to be in effect

 


Posted

in

by

Tags:

Comments

One response to “11gR2 RAC vip和network资源的依赖性与频繁failover”

  1. ludatou Avatar
    ludatou

    在10205开始是reply ping是1s

Leave a Reply

Your email address will not be published. Required fields are marked *