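In the transition to Oracle Database 11g Release 2 (11.2), Oracle Clusterware changed substantially: CRSD was completely redesigned, a "local CRS" (OHASD) was introduced, and a tightly integrated agent layer replaced the RACG layer. New features were added, such as Grid Naming Service, Grid Plug and Play, Cluster Time Synchronization Service, and Grid IPC. Cluster Synchronization Services (CSS) is probably the least affected component, but it provides the functionality that supports the new features and also gained new capabilities of its own, such as IPMI support.
With this technical paper we want to take the opportunity to pass on the 11.2 development knowledge we have accumulated over the years to those who are just starting to learn Oracle 11.2 Clusterware. The paper provides a general overview as well as detailed diagnostic and debugging information.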
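1. Oracle Clusterware Architecture
1.1 Daemons and Agents Overview
Oracle High Availability Services Daemon (OHASD)
Oracle Clusterware consists of two separate stacks: the upper Cluster Ready Services daemon (CRSD) stack and the lower Oracle High Availability Services daemon (OHASD) stack. Each stack has several processes that facilitate cluster operations. The following sections describe them in detail.
The entry point for OHASD is the /etc/inittab file, which executes /etc/init.d/ohasd and /etc/init.d/init.ohasd. The /etc/init.d/ohasd script is an RC script that contains the start and stop actions. The /etc/init.d/init.ohasd script is the OHASD framework control script, which spawns the Grid_home/bin/ohasd.bin executable.
The cluster control files are located in /etc/oracle/scls_scr/<hostname>/root (this is the location on Linux) and are maintained by crsctl; in other words, a "crsctl enable / disable crs" command updates the files in this directory.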
[root@rac1 root]# ls /etc/oracle/scls_scr/rac1/root
crsstart  ohasdrun  ohasdstr
# crsctl enable -h
Usage:
  crsctl enable crs
    Enable OHAS autostart on this server
# crsctl disable -h
Usage:
  crsctl disable crs
    Disable OHAS autostart on this server
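The content of the scls_scr/<hostname>/root/ohasdstr file controls autostart of the CRS stack; the two possible values are "enable" (autostart enabled) and "disable" (autostart disabled).
The scls_scr/<hostname>/root/ohasdrun file controls the init.ohasd script. The three possible values are "reboot" (synchronize with OHASD), "restart" (restart a crashed OHASD), and "stop" (planned OHASD shutdown).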
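As a rough illustration of the two control files described above, the following Python sketch maps each documented value to its meaning. The function name `describe_control_value` and the lookup tables are hypothetical helpers written for this example; they are not part of any Oracle tool. Only the file names and the values themselves come from the text.

```python
# Meanings are taken from the description of ohasdstr and ohasdrun in this
# document. This is an illustrative lookup, not an Oracle API.
OHASDSTR_VALUES = {
    "enable": "autostart of the CRS stack is enabled",
    "disable": "autostart of the CRS stack is disabled",
}

OHASDRUN_VALUES = {
    "reboot": "synchronize with OHASD",
    "restart": "restart a crashed OHASD",
    "stop": "planned OHASD shutdown",
}


def describe_control_value(file_name: str, value: str) -> str:
    """Explain a value read from one of the scls_scr control files."""
    tables = {"ohasdstr": OHASDSTR_VALUES, "ohasdrun": OHASDRUN_VALUES}
    if file_name not in tables:
        raise ValueError(f"unknown control file: {file_name}")
    cleaned = value.strip()  # file content usually ends with a newline
    meaning = tables[file_name].get(cleaned)
    if meaning is None:
        return f"{file_name}: unrecognized value {cleaned!r}"
    return f"{file_name}={cleaned}: {meaning}"
```

On a real node the value would be read from the file itself (as root), for example with `open("/etc/oracle/scls_scr/rac1/root/ohasdstr").read()`, and then passed to the helper.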
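The biggest benefit of having OHASD in Oracle 11.2 Clusterware is the ability to run certain crsctl commands in a clusterized manner. These commands are completely operating-system independent because they rely only on ohasd. If ohasd is running, remote operations such as starting, stopping, and checking the stack status of a remote node can be performed.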
- crsctl check cluster
- crsctl start cluster
- crsctl stop cluster
[root@rac2 bin]# ./crsctl stop cluster
CRS-2673: Attempting to stop 'ora.crsd' on 'rac2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.OCR_VOTEDISK.dg' on 'rac2'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'rac2'
...
[root@rac2 bin]# ./crsctl start cluster
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
...
[root@rac2 bin]# ./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
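OHASD Resource Dependency
The Oracle 11.2 Clusterware stack is started by the OHASD daemon, which is itself spawned by the /etc/init.d/init.ohasd script when a node boots. Alternatively, after a 'crsctl stop crs', running 'crsctl start crs' starts ohasd on the node. The OHASD daemon then starts the other daemons and agents. Each Clusterware daemon is represented by an OHASD resource stored in the OLR. The table below shows the OHASD resources / Clusterware daemons with their respective agent processes and owners.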
Resource Name | Agent Name | Owner |
ora.gipcd | oraagent | crs user |
ora.gpnpd | oraagent | crs user |
ora.mdnsd | oraagent | crs user |
ora.cssd | cssdagent | Root |
ora.cssdmonitor | cssdmonitor | Root |
ora.diskmon | orarootagent | Root |
ora.ctssd | orarootagent | Root |
ora.evmd | oraagent | crs user |
ora.crsd | orarootagent | Root |
ora.asm | oraagent | crs user |
ora.driver.acfs | orarootagent | Root |
ora.crf (new in | orarootagent | root |
Daemon Resources
[grid@rac1 admin]$ crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  ONLINE       rac1
...
[grid@rac1 admin]$ crsctl stat type -init
TYPE_NAME=application BASE_TYPE=cluster_resource
TYPE_NAME=cluster_resource BASE_TYPE=resource
TYPE_NAME=generic_application BASE_TYPE=cluster_resource
TYPE_NAME=local_resource BASE_TYPE=resource
TYPE_NAME=ora.asm.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.crf.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.crs.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.cssd.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.cssdmonitor.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.ctss.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.daemon.type BASE_TYPE=cluster_resource
TYPE_NAME=ora.diskmon.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.drivers.acfs.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.evm.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.gipc.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.gpnp.type BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.haip.type BASE_TYPE=cluster_resource
TYPE_NAME=ora.mdns.type BASE_TYPE=ora.daemon.type
Using the ora.cssd resource as an example, all of its attributes can be displayed with crsctl stat res ora.cssd -init -f (only some of the more important ones are listed here).
[grid@rac1 admin]$ crsctl stat res ora.cssd -init -f
NAME=ora.cssd
TYPE=ora.cssd.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CREATION_SEED=6
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
ID=ora.cssd
LOGGING_LEVEL=1
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
[root@rac2 bin]# ./crsctl set log res ora.cssd:3 -init
Set Resource ora.cssd Log Level: 3
[root@rac2 bin]# ./crsctl get log res ora.cssd -init
Get Resource ora.cssd Log Level: 3
[root@rac2 bin]# ./crsctl stat res ora.cssd -init -f | grep LOGGING_LEVEL
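Oracle 11.2 Clusterware introduces a new concept, agents, which makes Oracle Clusterware more robust and performant. These agents are multi-threaded daemons that implement entry points for multiple resource types and spawn new processes for different users. The agents are highly available. Besides oraagent, orarootagent, and cssdagent/cssdmonitor, there can also be an application agent and a script agent.
The two main agents are oraagent and orarootagent. ohasd and crsd each use one oraagent and one orarootagent. If the CRS user is different from the Oracle user, crsd uses two oraagents and one orarootagent.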
ohasd's oraagent:
- Performs start/stop/check/clean actions for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, and ora.mdnsd.
crsd's oraagent:
- Performs start/stop/check/clean actions for ora.asm, ora.eons, ora.LISTENER.lsnr, SCAN listeners, and ora.ons.
- Performs start/stop/check/clean actions for services, databases, and disk groups.
- Receives eONS events, and translates and forwards them to interested clients (eONS will be removed and its functionality included in EVM in 11.2.0.2).
- Receives CRS state change events, dequeues RLB events, and enqueues HA events for OCI and ODP.NET clients.
ohasd's orarootagent:
- Performs start/stop/check/clean actions for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, and ora.crf.
crsd's orarootagent:
- Performs start/stop/check/clean actions for GNS, VIP, SCAN VIP, and network resources.
cssdagent / cssdmonitor
See the section "cssdagent and cssdmonitor".
Application agent / scriptagent
See the section "application and scriptagent".
Agent Log Files
The logs of the ohasd/crsd agents are located in Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>.log. For example, ora.crsd is managed by ohasd and owned by root, so the agent log is found in:
[grid@rac2 orarootagent_root]$ ls /u01/app/11.2.0/grid/log/rac2/agent/ohasd/orarootagent_root
orarootagent_root.log  orarootagent_rootOUT.log  orarootagent_root.pid
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>OUT.log
<timestamp>:[<component>][<thread id>]…
<timestamp>:[<component>][<thread id>][<entry point>]…
2016-04-01 13:39:23.070: [ora.drivers.acfs][3027843984]{0:0:2} [check] execCmd ret = 0
[ clsdmc][3015236496]CLSDMC.C returnbuflen=8, extraDataBuf=A6, returnbuf=8D33FD8
2016-04-01 13:39:24.201: [ora.ctssd][3015236496]{0:0:213} [check] clsdmc_respget return: status=0, ecode=0, returnbuf=[0x8d33fd8], buflen=8
2016-04-01 13:39:24.201: [ora.ctssd][3015236496]{0:0:213} [check] translateReturnCodes, return = 0, state detail = OBSERVERCheckcb data [0x8d33fd8]: mode[0xa6] offset[343 ms].
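The agent log line layout shown above (timestamp, component, thread id, and an optional entry point) lends itself to a small parser. The following Python sketch is one possible way to split such lines into fields. The regular expression and the names `AGENT_LINE` and `parse_agent_line` are assumptions made for this example, derived only from the sample lines in this document; real logs may contain continuation lines or other layouts that it does not cover.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern built from the sample lines in this document:
# <timestamp>: [<component>][<thread id>]{<tint>} [<entry point>] <message>
# The {tint} and [entry point] parts are optional.
AGENT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+):\s*"
    r"\[\s*(?P<component>[^\]]+?)\s*\]"
    r"\[(?P<thread_id>\d+)\]"
    r"(?:\{(?P<tint>\d+:\d+:\d+)\})?\s*"
    r"(?:\[(?P<entry_point>\w+)\]\s*)?"
    r"(?P<message>.*)$"
)


def parse_agent_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one agent log line into named fields, or return None."""
    match = AGENT_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Lines that do not start with a timestamp (such as the `[ clsdmc]` continuation line above) return `None`, so a caller can attach them to the previous entry or skip them.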
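Keep in mind that a single agent log file contains the start/stop/check actions of multiple resources. Take the crsd orarootagent resource "ora.rac2.vip" as an example.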
[root@rac2 orarootagent_root]# grep ora.rac2.vip orarootagent_root.log
...
2016-04-01 12:30:33.606: [ora.rac2.vip][3013606288]{2:57434:199} [check] Failed to check on eth0
2016-04-01 12:30:33.607: [ora.rac2.vip][3013606288]{2:57434:199} [check] (null) category: 0, operation: , loc: , OS error: 0, other:
2016-04-01 12:30:33.607: [ora.rac2.vip][3013606288]{2:57434:199} [check] VipAgent::checkIp returned false
...
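- Cluster Listener Thread (CLT): attempts to connect to all remote nodes at startup, receives and processes all incoming messages, and responds to connection requests from other nodes. Whenever a packet is received from a node, the listener resets the missed-heartbeat count for that node.
- Sending Thread (ST): dedicated to sending a network heartbeat (NHB) to all nodes once per second, and a local heartbeat (LHB) to cssdagent and cssdmonitor once per second using Grid IPC (GIPC).
- Polling Thread (PT): monitors the NHBs of remote nodes. If the communication channel between the CSS daemons fails, heartbeats are missed. If too many heartbeats from a node are missed, the node is suspected to be down or disconnected. The reconfiguration thread is then woken up, a reconfiguration takes place, and eventually a node is evicted.
- Reconfiguration Manager Thread
- Discovery Thread: discovers the voting files.
- Fencing Thread: used to communicate with the diskmon process for I/O fencing, if EXADATA is used.
- Disk Ping Thread (one per voting file)
- Kill Block Thread (one per voting file): monitors voting file availability to make sure that enough voting files are accessible. If Oracle redundancy is used, a majority of the configured voting disks must be online.
- Worker Thread (new, one per voting file): performs miscellaneous I/O on the voting file.
- Disk Ping Monitor Thread: monitors the I/O status of the voting files.
This monitor thread makes sure that the disk ping threads are correctly reading the kill blocks on a majority of the configured voting files. If I/O to a voting file is not possible, because of an I/O hang, an I/O failure, or some other reason, the voting file is set offline. The thread monitors the disk ping threads. If CSS is unable to read a majority of the voting files, it may no longer share at least one disk with every other node. The node could then miss an eviction notice; in other words, CSS is not able to cooperate and must be terminated.
Other threads (occasional)
- Node Kill Thread (transient): used to kill nodes via IPMI.
- Member Kill Thread (transient): used during member kills.
- Local Kill Thread: created by the local CSS when a CSS client initiates a member kill.
- SKGXN Monitor (skgxnmon, present only with vendor clusterware)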
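The corresponding commands to change node pinning behavior (pinning or unpinning a specific node) are crsctl pin css and crsctl unpin css. Pinning a node means that the association between the node name and the node number is fixed. If a node is not pinned, its node number may change when its lease expires. The lease of a pinned node never expires. Deleting a node with the crsctl delete node command implicitly unpins it.
- During an Oracle Clusterware upgrade, all servers are pinned, whereas after a fresh installation of Oracle Clusterware 11g Release 2 (11.2), all servers you add to the cluster are unpinned.
- A server that has Oracle Clusterware 11.2 installed and runs instances of a release earlier than 11.2 cannot be unpinned.
The CSS layer uses the new communication layer Grid IPC (GIPC), while still supporting the CLSC communication layer used before 11.2. In 11.2.0.2, GIPC will support the use of multiple NICs for a single communication link, for example for CSS/NM inter-node communication.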
2009-11-24 03:46:21.110
[crsd(27731)]CRS-2757:Command ‘Start’ timed out waiting for response from the resource ‘ora.stnsp006.vip’. Details at (:CRSPE00111:) in
2009-11-24 03:58:07.375
[cssd(27413)]CRS-1605:CSSD voting file is online: /dev/sdj2; details in
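Cluster exclusive mode is a new concept in Oracle Clusterware 11g Release 2 (11.2). This mode allows you to start the stack on one node without requiring any other stack to be started. Voting files are not required, and no network connectivity is needed. The mode is used for maintenance or troubleshooting. Because the command is invoked by the user, it is the user's responsibility to ensure that only one node is started at a time. To start the stack in exclusive mode, the root user runs crsctl start crs -excl on one node.
The way voting files are identified has changed in 11.2. In 11.1 and earlier, the voting files were configured in the OCR; in 11.2, voting files are located via the CSS voting file discovery string in the GPnP profile. For example: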
CSS voting file discovery string referring to ASM
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile=""/>
CSS voting file discovery string referring to list of LUN's/disks
In the following example, the CSS voting file discovery string refers to a list of disks/LUNs. This can be the case when voting files are configured on block devices, or on devices in a non-default location. In this case, the CSS voting file discovery string and the ASM discovery string have the same value.
<orcl:CSS-Profile id="css" DiscoveryString="/dev/shared/sdsk-a[123]-*-part8" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/shared/sdsk-a[123]-*-part8" SPFile=""/>
As follows: lease expiry time = last DHB time + lease duration.
- Pinned lease
- Unpinned lease
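The lease expiry rule above is simple arithmetic, and the sketch below writes it out in Python. The function names are hypothetical, and the text does not state the time unit of the lease duration, so the example only assumes that both inputs use the same unit. The treatment of pinned nodes follows the statement elsewhere in this document that the lease of a pinned node never expires.

```python
def lease_expiry_time(last_dhb_time: float, lease_duration: float) -> float:
    """lease expiry time = last DHB time + lease duration (same unit for both)."""
    return last_dhb_time + lease_duration


def lease_expired(now: float, last_dhb_time: float,
                  lease_duration: float, pinned: bool = False) -> bool:
    """Return True if an unpinned node's lease has run out at time `now`.

    Assumption for this sketch: a pinned node is modeled as never expiring,
    as stated in the text.
    """
    if pinned:
        return False
    return now >= lease_expiry_time(last_dhb_time, lease_duration)
```

For instance, with a last DHB at time 1000 and a lease duration of 400, the lease runs out at 1400; the same inputs with `pinned=True` never report expiry.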
[cssd(8433)]CRS-1707:Lease acquisition for node staiv10 number 5 completed
Network Heartbeat (NHB)
# crsctl set log css ocssd:3
# tail -f ocssd.log | grep -i misstime
2009-10-22 06:06:07.275: [ ocssd][2840566672]clssnmPollingThread: node 2, stnsp006, ninfmisstime 270, misstime 270, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:08.220: [ ocssd][2830076816]clssnmHBInfo: css timestmp 1256205968 220 slgtime 246596654 DTO 28030 (index=1) biggest misstime 220 NTO 28280
2009-10-22 06:06:08.277: [ ocssd][2840566672]clssnmPollingThread: node 2, stnsp006, ninfmisstime 280, misstime 280, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:09.223: [ ocssd][2830076816]clssnmHBInfo: css timestmp 1256205969 223 slgtime 246597654 DTO 28030 (index=1) biggest misstime 1230 NTO 28290
2009-10-22 06:06:09.279: [ ocssd][2840566672]clssnmPollingThread: node 2, stnsp006, ninfmisstime 270, misstime 270, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:10.226: [ ocssd][2830076816]clssnmHBInfo: css timestmp 1256205970 226 slgtime 246598654 DTO 28030 (index=1) biggest misstime 2785 NTO 28290
To display the current misscount setting, use the command crsctl get css misscount. Setting misscount to a non-default value is not supported.
[grid@rac2 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
Disk Heartbeat (DHB)
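What counts as "too long" depends on the following DHB-related settings. First, the long disk I/O timeout (LIOT), which defaults to 200 seconds: if an I/O to a voting file cannot be completed within this time, the voting file is taken offline. Second, the short disk I/O timeout (SIOT), which CSS uses during cluster reconfiguration. SIOT is derived from misscount: misscount (30) - reboottime (3) = 27 seconds. The default reboot time is 3 seconds. To display the value of the CSS disktimeout parameter, use the command crsctl get css disktimeout.
[grid@rac2 ~]$ crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.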
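A node is considered still active when the difference between the timestamp of its most recent DHB and the timestamp of its last NHB is greater than SIOT (misscount - reboottime).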
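The SIOT arithmetic and the "still active" condition can be written down as a short Python sketch. The function names are hypothetical, the defaults are the values quoted in this document (misscount 30 seconds, reboot time 3 seconds), and the comparison simply restates the rule as worded in the text; it is not taken from the ocssd implementation.

```python
def short_io_timeout(misscount_s: int = 30, reboottime_s: int = 3) -> int:
    """SIOT = misscount - reboottime, in seconds."""
    return misscount_s - reboottime_s


def node_considered_active(last_dhb_ts: float, last_nhb_ts: float,
                           misscount_s: int = 30, reboottime_s: int = 3) -> bool:
    """Restates the rule in the text: the node counts as still active when
    its most recent DHB is newer than its last NHB by more than SIOT."""
    return (last_dhb_ts - last_nhb_ts) > short_io_timeout(misscount_s, reboottime_s)
```

With the defaults, `short_io_timeout()` evaluates to 27, matching the figure quoted for SIOT.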
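- Sending an eviction message over the network. In most cases this fails because of the existing network failure.
- Through the voting files
- Through IPMI, if supported and configured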
Nodes A and B receive each other’s heartbeats
Nodes C and D receive each other’s heartbeats
Nodes A and B cannot see heartbeats of C or D
Nodes C and D cannot see heartbeats of A or B
Nodes A and B are one cohort, C and D are another cohort
Split begins when 2 cohorts stop receiving NHB’s from each other
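In this case, CSS uses the voting files and the DHB to resolve the split brain. The kill block, which is part of the voting file structure, is updated and used to notify nodes that they have been evicted. Each node reads its kill block once per second and commits suicide after another node has updated it.
Member Kill Architecture
In 11.2.0.1 the kill daemon is an unprivileged process that kills members of CSS groups. It is spawned by the OCSSD library code when an I/O-capable client joins a group, and it is respawned when required. There is one kill daemon (oclskd) per user (for example crsowner, oracle).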
client_listener – receives group join and kill requests
peer_listener – receives kill requests from remote nodes
death_check – provides confirmation of termination
member_kill – spawned to manage a member kill request
local_kill – spawned to carry out member kills on local node
node termination – spawned to carry out escalation
Member kills are issued by clients who want to eliminate group members doing IO, for example:
LMON of the ASM instance
LMON of a database instance
crsd on Policy Engine (PE) master node (new in 11.2)
With the kill daemon running as real-time threads in cssdagent/cssdmonitor, there is a higher chance that a kill request succeeds despite high system load.
2009-10-21 12:22:03.613810 : kjxgrKillEM: schedule kill of inst 2 inc 20 in 20 sec
2009-10-21 12:22:03.613854 : kjxgrKillEM: total 1 kill(s) scheduled
kgxgnmkill: Memberkill called - group: DBPOMMI, bitmap:1
2009-10-21 12:22:22.151: [ CSSCLNT]clssgsmbrkill: Member kill request: Members map 0x00000002
2009-10-21 12:22:22.152: [ CSSCLNT]clssgsmbrkill: Success from kill call rc 0
2009-10-21 12:22:22.151: [ ocssd][2996095904]clssgmExecuteClientRequest: Member kill request from client (0x8b054a8)
2009-10-21 12:22:22.151: [ ocssd][2996095904]clssgmReqMemberKill: Kill requested map 0x00000002 flags 0x2 escalate 0xffffffff
2009-10-21 12:22:22.152: [ ocssd][2712714144]clssgmMbrKillThread: Kill requested map 0x00000002 id 1 Group name DBPOMMI flags 0x00000001 start time 0x91794756 end time 0x91797442 time out 11500 req node 2
DBPOMMI is the database group where LMON registers as primary member time out = misscount (in milliseconds) + 500ms
map = 0x2 = 0010 = second member = member 1 (other example: map = 0x7 = 0111 = members 0,1,2)
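The bitmap decoding and the timeout rule in the two notes above can be checked with a few lines of Python. The helper names `members_from_map` and `kill_timeout_ms` are made up for this sketch; the logic follows only what the notes state (bit n set means member n is targeted, and the timeout is misscount in milliseconds plus 500 ms).

```python
from typing import List


def members_from_map(member_map: int) -> List[int]:
    """Return the member numbers whose bits are set in a member-kill map."""
    return [bit for bit in range(member_map.bit_length())
            if (member_map >> bit) & 1]


def kill_timeout_ms(misscount_ms: int) -> int:
    """Timeout of a member kill request: misscount (in ms) + 500 ms."""
    return misscount_ms + 500
```

With the values from the log, `members_from_map(0x00000002)` gives `[1]` and `members_from_map(0x7)` gives `[0, 1, 2]`; a timeout of 11500 corresponds to `kill_timeout_ms(11000)`.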
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmmkLocalKillThread: Local kill requested: id 1 mbr map 0x00000002 Group name DBPOMMI flags 0x00000000 st time 1088320132 end time 1088331632 time out 11500 req node 2
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmmkLocalKillThread: Kill requested for member 1 group (0xe88ceda0/DBPOMMI)
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmUnreferenceMember: global grock DBPOMMI member 1 refcount is 7
2009-10-21 12:22:22.201: [ ocssd][3799477152]GM Diagnostics started for mbrnum/grockname: 1/DBPOMMI
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client 0xe330d5b0, pid 23929)
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client 0xe331fd68, pid 23973) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0 (client 0x89f7858, pid 23957) sharing group DBPOMMI, member 1, share type xmbr
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client 0x8a1e648, pid 23949) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client 0x89e7ef0, pid 23951) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DBPOMMI, member 1 (client 0xe8aabbb8, pid 23947) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0 (client 0x8a23df0, pid 23949) sharing group DG_LOCAL_POMMIDG, member 0, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0 (client 0x8a25268, pid 23929) sharing group DG_LOCAL_POMMIDG, member 0, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0 (client 0x89e9f78, pid 23951) sharing group DG_LOCAL_POMMIDG, member 0, share type normal
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsnkillagent_main:killreq received:
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23929
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23973
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23957
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23949
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23951
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23947
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23949
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23929
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23951
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0 pid 23947
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillThread: Time up: Start time -1854322858 End time -1854311358 Current time -1854311358 timeout 11500
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillThread: Member kill request complete.
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillSendEvent: Missing answers or immediate escalation: Req member 2 Req node 2 Number of answers expected 0 Number of answers outstanding 1
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmQueueGrockEvent: groupName(DBPOMMI) count(4) master(0) event(11), incarn 0, mbrc 0, to member 2, events 0x68, state 0x0
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmMbrKillEsc: Escalating node 1 Member request 0x00000002 Member success 0x00000000 Member failure 0x00000000 Number left to kill 1
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssnmKillNode: node 1 (staiu02) kill initiated
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmMbrKillThread: Exiting
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillThread: Time up. Timeout 11500 Start time 1088320132 End time 1088331632 Current time 1088331632
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillResults: Replying to kill request from remote node 2 kill id 1 Success map 0x00000000 Fail map 0x00000000
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillThread: Exiting
2009-10-21 12:22:34.679: [ ocssd][3948735392](:CSSNM00005:)clssnmvDiskKillCheck: Aborting, evicted by node 2, sync 151438398, stamp 2440656688
2009-10-21 12:22:34.679: [ ocssd][3948735392]###################################
2009-10-21 12:22:34.679: [ ocssd][3948735392]clssscExit: ocssd aborting from thread clssnmvKillBlockThread
2009-10-21 12:22:34.679: [ ocssd][3948735392]###################################
2009-10-21 12:22:22.151: [ocssd][2996095904]clssgmExecuteClientRequest: Member kill request from client (0x8b054a8)
<search backwards to when client registered>
2009-10-21 12:13:24.913: [ocssd][2996095904]clssgmRegisterClient: proc(22/0x8a5d5e0), client(1/0x8b054a8)
<search backwards to when process connected to ocssd>
2009-10-21 12:13:24.897: [ocssd][2996095904]clssgmClientConnectMsg: Connect from con(0x677b23) proc(0x8a5d5e0) pid(20485/20485) version 11:2:1:4, properties: 1,2,3,4,5
$ ps -ef|grep ora_lmon
spommere 20485 1 0 01:46 ? 00:01:15 ora_lmon_pommi_3
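Intelligent Platform Management Interface (IPMI)
The Intelligent Platform Management Interface (IPMI) is an industry-standard management protocol that is included with many servers today. IPMI operates independently of the operating system and works even if the system is not powered on. Servers with IPMI contain a baseboard management controller (BMC), which is used to communicate with the server.
To support escalating a member kill to node termination, you must configure and use an external mechanism capable of restarting the problem node without cooperation from either Oracle Clusterware or the operating system running on that node. IPMI is such a mechanism, supported starting with 11.2. Normally, IPMI is configured during installation. If IPMI was not configured during installation, it can be configured with crsctl after the CRS installation has completed.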
About Node-termination Escalation with IPMI
To use IPMI for node termination, each cluster member node must be equipped with a Baseboard Management Controller (BMC) running firmware compatible with IPMI version 1.5, which supports IPMI over a local area network (LAN). During database operation, member-kill escalation is accomplished by communication from the evicting ocssd daemon to the victim node’s BMC over LAN. The IPMI over LAN protocol is carried over an authenticated session protected by a user name and password, which are obtained from the administrator during installation. If the BMC IP addresses are DHCP assigned, ocssd requires direct communication with the local BMC during CSS startup. This is accomplished using a BMC probe command (OSD), which communicates with the BMC through an IPMI driver, which must be installed and loaded on each cluster system.
OLR Configuration for IPMI
There are two ways to configure IPMI, either during the Oracle Clusterware installation via the Oracle Universal Installer or afterwards via crsctl.
OUI – asks about node-fencing via IPMI
tests for driver to enable full support (DHCP addresses)
obtains IPMI username and password and configures OLR on all cluster nodes
Manual configuration – after install or when using static IP addresses for BMCs
crsctl query css ipmidevice
crsctl set css ipmiadmin <ipmi-admin>
crsctl set css ipmiaddr
See also: Oracle Clusterware Administration and Deployment Guide, "Configuration and Installation for Node Fencing" for more information, and Oracle Grid Infrastructure Installation Guide, "Enabling Intelligent Platform Management Interface (IPMI)".
# crsctl set log css CSSD:N (where N is the logging level)
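Logging level 2 = default
Logging level 3 = verbose; shows individual heartbeat information including misstime, which helps when debugging NHB-related problems.
Logging level 4 = super verbose
Most problems can be solved at level 2, some require level 3, and very few require level 4. At level 3 or 4, trace information may be retained for only a few hours (or even minutes), because the trace files fill up and information gets overwritten. Note that a high logging level has a performance impact on ocssd because of the volume of tracing. If you need to keep data for a longer period, create a cron job to back up and compress the CSS logs.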
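One way to implement the backup job suggested above is a small Python script invoked from cron. The sketch below copies a log file into a backup directory as a gzip archive with a timestamp in its name. The function name `backup_css_log`, the naming scheme, and any paths passed to it are assumptions for illustration only; the text does not prescribe a particular tool or layout.

```python
import gzip
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_css_log(log_path: Path, backup_dir: Path,
                   stamp: Optional[str] = None) -> Path:
    """Write a gzip-compressed copy of log_path into backup_dir.

    The copy is named <logname>.<stamp>.gz; the stamp defaults to the
    current local time. The original file is left untouched.
    """
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{log_path.name}.{stamp}.gz"
    with open(log_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the file, no full read into memory
    return target
```

A cron entry could then call a script that runs `backup_css_log` on the ocssd log of the local node; the schedule and the retention of old archives are left to the administrator.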
# crsctl set log res ora.cssd=2 -init
# crsctl set log res ora.cssdmonitor=2 -init
# crsctl modify resource ora.cssd -attr "ENV_OPTS=DEV_ENV" -init
# crsctl modify resource ora.cssdmonitor -attr "ENV_OPTS=DEV_ENV" -init
[root@rac2 bin]# ./crsctl lsmodules css
List CSSD Debug Module: CLSF
List CSSD Debug Module: CSSD
List CSSD Debug Module: GIPCCM
List CSSD Debug Module: GIPCGM
List CSSD Debug Module: GIPCNM
List CSSD Debug Module: GPNP
List CSSD Debug Module: OLR
List CSSD Debug Module: SKGFD
CLSF and SKGFD – voting file I/O
CSSD – same old one
GIPCCM – gipc communication between applications and CSS
GIPCGM – communication between peers in the GM layer
GIPCNM – communication between nodes in the NM layer
GPNP – trace for gpnp calls within CSS
OLR – trace for olr calls within CSS
# crsctl set log css GIPCCM=1,GIPCGM=2,GIPCNM=3
# crsctl set log css CSSD=4
# crsctl get log ALL
# crsctl get log css GIPCCM
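CSSDAGENT and CSSDMONITOR provide almost the same functionality. The cssdagent starts, stops, and checks the status of the ocssd daemon. The cssdmonitor monitors the cssdagent. There is no ora.cssdagent resource, and there is no resource for the ocssd daemon either.
In addition, cssdagent and cssdmonitor provide the following services to ensure data integrity:
- Monitoring ocssd: if ocssd fails, cssd* reboots the node.
To enable debugging for the ocssd agent, use the command crsctl set log res ora.cssd:3 -init. The operation is logged in Grid_home/log/<hostname>/agent/ohasd/oracssdagent_root/oracssdagent_root.log, and the additional trace information is written to the same oracssdagent_root.log.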
2009-11-25 10:00:52.386: [ AGFW][2945420176] Agent received the message: RESOURCE_MODIFY_ATTR[ora.cssd 1 1] ID 4355:106099
2009-11-25 10:00:52.387: [ AGFW][2966399888] Executing command: res_attr_modified for resource: ora.cssd 1 1
2009-11-25 10:00:52.387: [ USRTHRD][2966399888] clsncssd_upd_attr: setting trace to level 3
2009-11-25 10:00:52.388: [ CSSCLNT][2966399888]clssstrace: trace level set to 2
2009-11-25 10:00:52.388: [ AGFW][2966399888] Command: res_attr_modified for resource: ora.cssd 1 1 completed with status: SUCCESS
2009-11-25 10:00:52.388: [ AGFW][2945420176] Attribute: LOGGING_LEVEL for resource ora.cssd modified to: 3
2009-11-25 10:00:52.388: [ AGFW][2945420176] config version updated to : 7 for ora.cssd 1 1
2009-11-25 10:00:52.388: [ AGFW][2945420176] Agent sending last reply for: RESOURCE_MODIFY_ATTR[ora.cssd 1 1] ID 4355:106099
2009-11-25 10:00:52.484: [ CSSCLNT][3031063440]clssgsgrpstat: rc 0, gev 0, incarn 2, mc 2, mast 1, map 0x00000003, not posted
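1.4.11 Concepts
Disk HeartBeat (DHB): written to the voting files periodically, once per second
Network HeartBeat (NHB): sent to the other nodes once per second
Local HeartBeat (LHB): sent to the agent or monitor once per second
ocssd threads
Sending Thread (ST): sends the network heartbeat and the local heartbeat at the same time
Disk Ping thread: writes the disk heartbeat to the voting files once per second
Cluster Listener (CLT): receives messages sent by other nodes, mostly network heartbeats
HeartBeat thread (HBT): receives the local heartbeat from ocssd and detects connection failures
OMON thread (OMT): monitors for connection failures
OPROCD thread (OPT): monitors the scheduling of the agent/monitor process
VMON thread (VMT): replaces the clssvmon executable; registers in the skgxn group with vendor clusterware
Misscount (MC): the time without a network heartbeat before a node is removed
Network Time Out (NTO): the maximum time remaining without a network heartbeat before a node is removed
Disk Time Out (DTO): the maximum time left before a majority of voting files is considered inaccessible
ReBoot Time (RBT): the time allowed for a restart to complete; the default is 3 seconds.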
Misscount, SIOT, RBT
Disk I/O Timeout amount of time for a voting file to be offline before it is unusable
SIOT – Short I/O Timeout, in effect during reconfig
LIOT – Long I/O Timeout, in effect otherwise
Long I/O Timeout (LIOT) is configured via crsctl set css disktimeout; the default is 200 seconds.
Short I/O Timeout (SIOT) is (misscount – reboot time)
In effect when NHB’s missed for misscount/2
ocssd terminates if no DHB for SIOT
Allows RBT seconds after termination for reboot to complete
Disk Heartbeat Perceptions
Other node perception of local state in reconfig
No NHB for misscount, node not visible on network
No DHB for SIOT, node not alive
If node alive, wait full misscount for DHB activity to be missing, i.e. node not alive
As long as DHB’s are written, other nodes must wait
Perception of local state by other nodes must be valid to avoid data corruption
Disk Heartbeat Relevance
DHB only read starting shortly before a reconfig to remove the node is started
When no reconfig is impending, the I/O timeout not important, so need not be monitored
If the disk timeout expires, but the NHB’s have been sent to and received from other nodes, it will still be misscount seconds before other nodes will start a reconfig
The proximity to a reconfig is important state information for OPT
Time Of Day Clock (TODC) the clock that indicates the hour/minute/second of the day (may change as a result of commands)
aTODC is the agent TODC
cTODC is the ocssd TODC
Invariant Time Clock (ITC) a monotonically increasing clock that is invariant (i.e. does not change as a result of commands). The invariant clock is unaffected when the time of day is set backwards or forwards; it always advances at a constant rate.
aITC is the agent ITC
cITC is the ocssd ITC
ocssd state information contains the current clock information, the network time out (NTO) based on the node with the longest time since the last NHB and a disk I/O timeout based on the amount of time since the majority of voting files was last online. The sending thread gathers this current state information and sends both a NHB and local heartbeat to ensure that the agent perception of the aliveness of ocssd is the same as that of other nodes.
The cluster listener thread monitors the sending thread. It ensures the sending thread has been scheduled recently and wakes up if necessary. There are enhancements here to ensure that even after clock shifts backwards and forwards, the sending thread is scheduled accurately.
There are several agent threads, one is the oprocd thread which just sleeps and wakes up periodically. Upon wakeup, it checks if it should initiate a reboot, based on the last known ocssd state information and the local invariant time clock (ITC). The wakeup is timer driven. The heartbeat thread is just waiting for a local heartbeat from the ocssd. The heartbeat thread will calculate the value that the oprocd thread looks at, to determine whether to reboot. It checks if the oprocd thread has been awake recently and if not, pings it awake. The heartbeat thread is event driven and not timer driven.
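When ocssd fails, a file system sync is started. There is a fair amount of time to do this, so we can wait a few seconds for the sync. The last local heartbeat indicates how long we can wait; the wait time is based on misscount. When the wait time expires, oprocd reboots the node. In most cases the diagnostic data will have been written to disk. Only in rare cases, for example when the sync was not performed because CSS was hung, will it not have been written.
Cluster Ready Services (CRS)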
Policy Engine
grep “PE MASTER” Grid_home/log/hostname/crsd/crsd.*
crsd.log:2010-01-07 07:59:36.529: [ CRSPE][2614045584] PE MASTER NAME: staiv13
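- The Policy Engine (a.k.a. PE/CRSPE in logs) is responsible for all policy decisions.
- The Agent Proxy Server (a.k.a. Proxy/AGFW in logs) is responsible for managing the agents and for proxying commands/events between the Policy Engine and the agents.
- The UI Server (a.k.a. UI/UiServer in logs) is responsible for managing client connections and acts as a proxy between the PE and client programs.
- The OCR/OLR module (OCR in logs) is the front end for all OCR/OLR interactions.
- The Reporter module (CRSRPT in logs) is responsible for publishing all events out of CRSD.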
[Figure: command flow: CRSCTL -> UI Server -> PE -> OCR Module -> PE -> Reporter (event publishing) and Proxy (to notify the agent)]
Resource Instances & IDs
In 11.2, CRS modeling supports two concepts of resource multiplicity: cardinality and degree. The former controls the number of nodes where the resource can run concurrently, while the latter controls the number of instances of the resource that can run on each node. To support these concepts, the PE now distinguishes between resources and resource instances. The former can be seen as a configuration profile for the entire resource, while the latter represents the state data for each instance of the resource. For example, a resource with CARDINALITY=2, DEGREE=3 will have 6 resource instances. Operations that affect resource state (start, stop, etc.) are performed using resource instances. Internally, resource instances are referred to by IDs of the following format: "<A> <B> <C>" (note the space separation), where <A> is the resource name, <C> is the degree of the instance (mostly 1), and <B> is the cardinality of the instance for cluster_resource resources, or the name of the node to which the instance is assigned for local_resource resources. That is why resource names have "funny" decorations in logs:
[ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new target state: [ONLINE] old value: [OFFLINE]
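To make the ID scheme above concrete, the following Python sketch enumerates the resource instance IDs implied by a given cardinality and degree. The function names are hypothetical, and the sketch assumes that cardinality and degree indices start at 1, which is suggested by the `r1 1 1` ID in the log line but not stated explicitly in the text.

```python
from typing import List


def cluster_resource_instance_ids(name: str, cardinality: int,
                                  degree: int) -> List[str]:
    """IDs "<name> <cardinality index> <degree index>" for a cluster_resource.

    Assumption: both indices are numbered from 1.
    """
    return [f"{name} {c} {d}"
            for c in range(1, cardinality + 1)
            for d in range(1, degree + 1)]


def local_resource_instance_ids(name: str, nodes: List[str],
                                degree: int = 1) -> List[str]:
    """IDs "<name> <node name> <degree index>" for a local_resource."""
    return [f"{name} {node} {d}"
            for node in nodes
            for d in range(1, degree + 1)]
```

For CARDINALITY=2 and DEGREE=3 the first helper yields six IDs, from `r1 1 1` through `r1 2 3`, which matches the count given in the text.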
Log Correlation
CRSD is event-driven in nature. Everything of interest is an event/command to process. Two kinds of commands are distinguished: planned and unplanned. The former are usually administrator-initiated (add/start/stop/update a resource, etc.) or system-initiated (resource auto start at node reboot, for instance) actions, while the latter are normally unsolicited state changes (a resource failure, for example). In either case, processing such events/commands is what CRSD does, and that is when module interaction takes place. One can easily follow the processing of each event in the logs, from the point of origination (say, the UI module) through to the PE, then all the way to the agent and back, using a concept referred to as a "tint". A tint is basically a cluster-unique event ID of the following format: {X:Y:Z}, where X is the node number, Y is a node-unique number of the process where the event first entered the system, and Z is a monotonically increasing sequence number, per process. For instance, {1:25747:254} is the tint for the 254th event that originated in some process internally referred to as 25747 on node number 1. Tints are new and can be seen in CRSD/OHASD/agent logs. Each event in the system is assigned a unique tint at the point of entering the system, and modules prefix each log message written while working on the event with that tint.
For example, in a 3-node cluster where node0 is the PE master, running "crsctl start resource r1 -n node2" on node1, exactly as in the figure above, produces the following in the logs:
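The {X:Y:Z} layout of a tint is easy to pick apart programmatically. The short Python sketch below extracts the first tint from a log line and returns its three parts. The names `Tint` and `parse_tint` are invented for this example; the field meanings are those given in the paragraph above.

```python
import re
from typing import NamedTuple, Optional


class Tint(NamedTuple):
    node_number: int   # X: node where the event entered the system
    process_id: int    # Y: node-unique number of the originating process
    sequence: int      # Z: per-process, monotonically increasing sequence


TINT_RE = re.compile(r"\{(\d+):(\d+):(\d+)\}")


def parse_tint(text: str) -> Optional[Tint]:
    """Return the first tint found in text, or None if there is none."""
    match = TINT_RE.search(text)
    if match is None:
        return None
    return Tint(*(int(part) for part in match.groups()))
```

Applied to a line containing `{1:25747:254}`, the helper returns node number 1, process 25747, and sequence 254.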
2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Container [ Name: UI_START
2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Sending message to PE. ctx= 0xa3819430
CRSD log on node 0 (with PE master)
2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Cmd : 0xa7258ba8 :
2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Processing PE command id=347. Description: [Start Resource : 0xa7258ba8]
2009-12-29 17:07:24.748: [ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new target state: [ONLINE] old value: [OFFLINE]
2009-12-29 17:07:24.748: [ CRSOCR][2664782752] {1:25747:256} Multi Write Batch
2009-12-29 17:07:24.753: [ CRSPE][2660580256] {1:25747:256} Sending message to agfw: id = 2198
2009-12-29 17:07:24.763: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server received the message: RESOURCE_START[r1 1 1] ID 4098:2198
2009-12-29 17:07:24.767: [ AGFW][2703780768] {1:25747:256} Starting the agent: /ade/agusev_bug/oracle/bin/scriptagent with user id: agusev and incarnation:1
Agent log on node 2 (the agent executes the start command)
2009-12-29 17:07:25.120: [ AGFW][2966404000] {1:25747:256} Agent received the
message: RESOURCE_START[r1 1 1] ID 4098:1459
2009-12-29 17:07:25.122: [ AGFW][2987383712] {1:25747:256} Executing command:
start for resource: r1 1 1
2009-12-29 17:07:26.990: [ AGFW][2987383712] {1:25747:256} Command: start for
resource: r1 1 1 completed with status: SUCCESS
2009-12-29 17:07:26.991: [ AGFW][2966404000] {1:25747:256} Agent sending reply
for: RESOURCE_START[r1 1 1] ID 4098:1459
2009-12-29 17:07:27.514: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server
received the message: CMD_COMPLETED[Proxy] ID 20482:2212
2009-12-29 17:07:27.514: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server
replying to the message: CMD_COMPLETED[Proxy] ID 20482:2212
CRSD log on node 0 (receives the reply, notifies the reporter and replies to the UI server; the reporter publishes the event to EVM)
2009-12-29 17:07:27.012: [ CRSPE][2660580256] {1:25747:256} Received reply to
action [Start] message ID: 2198
2009-12-29 17:07:27.504: [ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new
external state [ONLINE] old value: [OFFLINE] on agusev_bug_2 label = []
2009-12-29 17:07:27.504: [ CRSRPT][2658479008] {1:25747:256} Sending UseEvm mesg
2009-12-29 17:07:27.513: [ CRSPE][2660580256] {1:25747:256} UI Command [Start
Resource : 0xa7258ba8] is replying to sender.
2009-12-29 17:07:27.525: [UiServer][2689649568] {1:25747:256} Container [ Name:
r1: TextMessage[0]
2009-12-29 17:07:27.526: [UiServer][2689649568] {1:25747:256} Done for
The above demonstrates the ease of following distributed processing of a single request across 4 processes on 3 nodes by using tints as a way to filter, extract, group and correlate information pertaining to a single event across a plurality of diagnostic logs.
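The tint-based correlation described above can be sketched in a few lines of Python. This is only an illustration, assuming the log lines have already been collected from the relevant nodes; the names `Tint`, `parse_tint` and `filter_by_tint` are hypothetical helpers, not part of any Oracle tool, and the third sample line is invented to show a non-matching event.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches a tint such as {1:25747:256}: node number, process number, sequence number.
TINT_RE = re.compile(r"\{(\d+):(\d+):(\d+)\}")


class Tint(NamedTuple):
    node: int      # X: node number where the event entered the system
    process: int   # Y: node-unique number of the originating process
    sequence: int  # Z: per-process, monotonically increasing sequence number


def parse_tint(line: str) -> Optional[Tint]:
    """Return the first tint found in a log line, or None if there is none."""
    match = TINT_RE.search(line)
    return Tint(*map(int, match.groups())) if match else None


def filter_by_tint(lines: Iterable[str], tint: Tint) -> List[str]:
    """Keep only the log lines that carry the given tint."""
    return [line for line in lines if parse_tint(line) == tint]


sample = [
    "2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Sending message to PE. ctx= 0xa3819430",
    "2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Cmd : 0xa7258ba8 :",
    # Hypothetical line belonging to a different event, to show it is filtered out.
    "2009-12-29 17:07:24.750: [ CRSPE][2660580256] {1:25747:257} unrelated event",
]
wanted = Tint(node=1, process=25747, sequence=256)
for line in filter_by_tint(sample, wanted):
    print(line)
```

Wrapped continuation lines (such as the second half of a long message) do not repeat the tint, so a real filter would also need to keep the lines that immediately follow a match.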
1.6 Grid Plug and Play (GPnP)
1.6.1 GPnP Configuration
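The GPnP wallet is just a binary blob containing the public/private RSA keys used to sign and verify the GPnP profile. The wallet is the same on all GPnP nodes; it is created when the software is installed, does not change, and lives forever.
A typical profile contains the information shown below. Never change the XML file directly; instead, use supported tools such as ASMCA, asmcmd, oifcfg and so on to modify the GPnP configuration.
Using gpnptool itself to modify the GPnP profile is not recommended, because a change requires many steps. If invalid information is added, the profile is corrupted and problems will follow.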
# gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
/scratch/grid_home_11.2/bin/gpnptool.bin get -o-
<?xml version=”1.0″ encoding=”UTF-8″?><gpnp:GPnP-Profile Version=”1.0″
xmlns:gpnp=”http://www.grid-pnp.org/2005/11/gpnp-profile”
xmlns:orcl=”http://www.oracle.com/gpnp/2005/11/gpnp-profile”
xsi:schemaLocation=”http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd”
ProfileSequence=”4″ ClusterUId=”0cd26848cf4fdfdebfac2138791d6cf1″
ClusterName=”stnsp0506″ PALocation=””><gpnp:Network-Profile><gpnp:HostNetwork
id=”gen” HostName=”*”><gpnp:Network id=”net1″ IP=”″ Adapter=”eth0″
Use=”public”/><gpnp:Network id=”net2″ IP=”″ Adapter=”eth2″
Profile id=”css” DiscoveryString=”+asm”
LeaseDuration=”400″/><orcl:ASM-Profile id=”asm”
<ds:Signature xmlns:ds=”http://www.w3.org/2000/09/xmldsig#”>
<ds:SignedInfo><ds:CanonicalizationMethod
Algorithm=”http://www.w3.org/2000/09/xmldsig#rsa-sha1″/><ds:Reference URI=””>
<ds:Transforms><ds:Transform Algorithm=”http://www.w3.org/2000/09/xmldsig#enveloped-signature”/>
<ds:Transform Algorithm=”http://www.w3.org/2001/10/xml-exc-c14n#”>
<InclusiveNamespaces xmlns=”http://www.w3.org/2001/10/xml-exc-c14n#”
PrefixList=”gpnp orcl xsi”/></ds:Transform></ds:Transforms>
<ds:DigestMethod Algorithm=”http://www.w3.org/2000/09/xmldsig#sha1″/><ds:DigestValue>ORAmrPMJ/plFtG Tg/mZP0fU8ypM=</ds:DigestValue>
K u7QBc1/fZ/RPT6BcHRaQ+sOwQswRfECwtA5SlQ2psCopVrO6XJV+BMJ1UG6sS3vuP7CrS8LXrOTyoIxSkU 7xWAIB2Okzo/Zh/sej5O03GAgOvt+2OsFWX0iZ1+2e6QkAABHEsqCZwRdI4za3KJeTkIOPliGPPEmLuImu
1.6.2 GPnP 守护进程
- detects running gpnpd, connects back to oraagent
- opens wallet/profile
- opens local/remote endpoints
- advertises remote endpoint with mdnsd
- starts OCR availability check
- discovers remote gpnpds
- equalizes profile
- starts to service clients
1.6.3 GPnP CLI Tools
- crsctl replace discoverystring
- oifcfg getif / setif
- ASM – srvctl or sqlplus changing the spfile location or the ASM disk discoverystring
Note that profile changes are serialized cluster-wide with a CSS lock (bug 7327595).
Oracle GPnP Tool Usage:
“gpnptool <verb> <switches>”, where verbs are:
create Create a new GPnP Profile
edit Edit existing GPnP Profile
getpval Get value(s) from GPnP Profile
get Get profile in effect on local node
rget Get profile in effect on remote GPnP node
put Put profile as a current best
find Find all RD-discoverable resources of given type
lfind Find local gpnpd server
check Perform basic profile sanity checks
c14n Canonicalize, format profile text (XML C14N)
sign Sign/re-sign profile with wallet’s private key
unsign Remove profile signature, if any
verify Verify profile signature against wallet certificate
help Print detailed tool help
ver Show tool version
1.6.4 Debugging and Troubleshooting
To get more log and trace output, set the environment variable GPNP_TRACELEVEL to a value in the range 0-6. The GPnP traces are located in:
Grid_home/log/<hostname>/alert*,
Grid_home/log/<hostname>/client/gpnptool*, other client logs
Grid_home/log/<hostname>/gpnpd|mdnsd/*
Grid_home/log/<hostname>/agent/ohasd/oraagent_<username>/*
Grid_home/gpnp/<hostname>/* [profile+wallet]
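If GPnP setup fails, check for the following failure scenarios:
- Failed to create the profile or wallet? Failed to access the profile or wallet? [gpnpd is dead, stack is dead] (bug:8609709,bug:8445816)
- Wrong or missing configuration in the profile (e.g. no discovery string, no interconnect, too many interconnects)? [gpnpd is up, stack is dead, e.g. no voting files, no interconnects]
- Failed to propagate the profile cluster-wide? [gpnpd daemons are not communicating, no put]
- Is mdnsd running? Did gpnpd fail to register with mdnsd? Did discovery fail? [no put, rget]
- Is gpnpd dead or not running? [no get, immediately fails]
- Is gpnpd not fully up? [no get, no put, client spins in retries, times out]
- Has a bogus node been discovered as a cluster member? [no put, can block gpnpd dispatch]
- Is ocssd not up? [no put]
- OCR was up, but has failed [gpnpd dispatch can block, client waits in receive until OCR recovers]
For all of the above, the first troubleshooting step is to review the log files of the daemons involved and to check resource status with crsctl stat res -init -t.
- Check the validity of the GPnP profile and look for error messages in the GPnP logs. gpnptool check or gpnptool verify performs an explicit check: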
# gpnptool check -\
Profile cluster=”stnsp0506″, version=4
GPnP profile signed by peer, signature valid.
Got GPnP Service current profile to check against.
Current GPnP Service Profile cluster=”stnsp0506″, version=4
Error: profile version 4 is older than- or duplicate of- GPnP Service current profile version 4.
Profile appears valid, but push will not succeed.
# gpnptool verify
Oracle GPnP Tool
verify Verify profile signature against wallet certificate
Usage:
“gpnptool verify <switches>”, where switches are:
-p[=profile.xml] GPnP profile name
-w[=file:./] WRL-locator of OracleWallet with crypto keys
-wp=<val> OracleWallet password, optional
-wu[=owner] Wallet certificate user (enum: owner,peer,pa)
-t[=3] Trace level (min..max=0..7), optional
-f=<val> Command file name, optional
-? Print verb help and exit
- To check whether the GPnPD is serving locally, use gpnptool lfind:
# gpnptool lfind
Success. Local gpnpd found.
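‘gpnptool get’ returns the profile in effect on the local node. If gpnptool lfind|get hangs, the hang information from the client and the GPnPD logs in Grid_home/log/<hostname>/gpnpd will be very helpful for further troubleshooting.
- To check that a remote GPnPD is responding, the ‘find’ option is helpful: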
# gpnptool find -h=stnsp006
Found 1 instances of service ‘gpnp’.
mdns:service:gpnp._tcp.local.://stnsp006:17452/agent=gpnpd,cname=stnsp0506,host=stnsp006,pid=13133/gpnpd h:stnsp006 c:stnsp0506
If this does not work, check the Grid_home/log/<hostname>/mdnsd/*.log files and the gpnpd logs.
- To check that all nodes are responding, run gpnptool find -c=<clustername>:
# gpnptool find -c=stnsp0506
Found 2 instances of service ‘gpnp’.
mdns:service:gpnp._tcp.local.://stnsp005:23810/agent=gpnpd,cname=stnsp0506,host=stnsp005,pid=12408/gpnpd h:stnsp005 c:stnsp0506
mdns:service:gpnp._tcp.local.://stnsp006:17452/agent=gpnpd,cname=stnsp0506,host=stnsp006,pid=13133/gpnpd h:stnsp006 c:stnsp0506
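When checking many nodes, the output of gpnptool find can be turned into structured data and compared against the list of expected cluster members. The following is a minimal sketch that assumes the output format shown above; `parse_gpnp_find` and `missing_nodes` are hypothetical helper names, and the node stnsp007 is invented for the example.

```python
import re
from typing import Dict, List

# One advertised gpnpd endpoint: host, port, then comma-separated key=value attributes.
SERVICE_RE = re.compile(
    r"mdns:service:gpnp\._tcp\.local\.://([^:/\s]+):(\d+)/(\S+?)/gpnpd"
)


def parse_gpnp_find(output: str) -> List[Dict[str, str]]:
    """Extract one dict per gpnpd instance reported by 'gpnptool find'."""
    # Terminal wrapping can split an entry across lines, so join the lines first.
    joined = re.sub(r"\s*\n\s*", "", output)
    instances = []
    for match in SERVICE_RE.finditer(joined):
        entry = {"endpoint_host": match.group(1), "port": match.group(2)}
        for pair in match.group(3).split(","):
            key, _, value = pair.partition("=")
            entry[key] = value
        instances.append(entry)
    return instances


def missing_nodes(expected: List[str], instances: List[Dict[str, str]]) -> List[str]:
    """Return the expected nodes for which no gpnpd instance was found."""
    seen = {inst.get("host") for inst in instances}
    return [node for node in expected if node not in seen]


sample = (
    "Found 2 instances of service ‘gpnp’. "
    "mdns:service:gpnp._tcp.local.://stnsp005:23810/agent=gpnpd,cname=stnsp0506\n"
    ",host=stnsp005,pid=12408/gpnpd h:stnsp005 c:stnsp0506 "
    "mdns:service:gpnp._tcp.local.://stnsp006:17452/agent=gpnpd,cname=stnsp0506,"
    "host=stnsp006,pid=13133/gpnpd h:stnsp006 c:stnsp0506"
)
instances = parse_gpnp_find(sample)
for inst in instances:
    print(inst["host"], inst["port"], inst["pid"])
# stnsp007 is a made-up node name used to show a missing responder.
print(missing_nodes(["stnsp005", "stnsp006", "stnsp007"], instances))
```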
1.7 Oracle网格名称服务(GNS)
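DHCP provides dynamic configuration of host IP addresses, but it does not provide a good way to produce names that are useful to external clients, so it has rarely been used in server environments. Oracle Clusterware 11.2 solves this problem by providing its own service for resolving names in the cluster and connecting it to the DNS that is visible to clients.
- A static address on the public network of the cluster is required for the GNS VIP.
- The higher-level DNS is configured to delegate a subdomain to the cluster.
- A DHCP server on the public network provides dynamic addresses.
- A running cluster has GNS configured correctly.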
# Delegate to gns on strdv0108
strdv0108-gns.mycorp.com NS strdv0108.mycorp.com
#Let the world know to go to the GNS vip strdv0108.mycorp.com
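Here, the subdomain is strdv0108.mycorp.com, the GNS VIP has been assigned the name strdv0108-gns.us.mycorp.com (which corresponds to a static IP address), and the GNS daemon listens on the default port 53.
Note: this does not establish an address for the name strdv0108.mycorp.com; it creates a way of resolving names within the subdomain, such as clusterNode1-VIP.strdv0108.mycorp.com.
- One IP address per host (node VIP)
- A single IP per cluster as the SCAN
The GNS VIP cannot be obtained from DHCP: it must be known in advance, so it must be assigned statically.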
- the interface on the subnet is (netmask 255.252.0)
- the addresses allowed to be served are through 228.215.254
- the gateway is 228.212.1
- the domain the machines will reside in for DNS purposes is mycorp.com
/etc/dhcp.conf would contain information similar to the following:
subnet netmask
default-lease-time 43200;
max-lease-time 86400;
option subnet-mask;
option broadcast-address;
option routers;
option domain-name-servers M.N.P.Q, W.X.Y.Z;
option domain-name “strdv0108.mycorp.com”;
pool
options attempts: 2
options timeout: 1
search us.mycorp.com mycorp.com
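/etc/nsswitch.conf controls the name service lookup order. In some system configurations, the Network Information System (NIS) can cause problems when resolving the Oracle SCAN. It is recommended to place the NIS entry at the end of the search list.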
hosts: files dns nis
See also: Oracle Grid Infrastructure Installation Guide,
“DNS Configuration for Domain Delegation to Grid Naming Service” for more information.
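In 11.2, GNS is managed by the clusterware agent orarootagent. This agent starts, stops and checks GNS. GNS is added to the cluster, and its information to the OCR, with the command srvctl add gns -d <mycluster.company.com>.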
- The GNS Server
<Time stamp>: [GNS][Thread ID]<Thread name>::<function>:<message>
2009-09-21 10:33:14.344: [GNS][3045873888] Resolve::clsgnmxInitialize: initializing mutex 0x86a7770 (SLTS 0x86a777c).
- The GNS Agent
#grep -i ‘updat.*gns’
orarootagent_root.log:2009-10-07 10:17:23.513: [ora.gns.vip] [check] Updating GNS with stnsp0506-gns-vip
orarootagent_root.log:2009-10-07 10:17:23.540: [ora.scan1.vip] [check] Updating GNS with stnsp0506-scan1-vip
orarootagent_root.log:2009-10-07 10:17:23.562: [ora.scan2.vip] [check] Updating GNS with stnsp0506-scan2-vip
orarootagent_root.log:2009-10-07 10:17:23.580: [ora.scan3.vip] [check] Updating GNS with stnsp0506-scan3-vip
orarootagent_root.log:2009-10-07 10:17:23.597: [ora.stnsp005.vip] [check] Updating GNS with stnsp005-vip
orarootagent_root.log:2009-10-07 10:17:23.615: [ora.stnsp006.vip] [check] Updating GNS with stnsp006-vip
- Command Line Interface
# srvctl {start|stop|modify|etc.} gns …
Start GNS
# srvctl start gns [-l <log_level>] – where –l is the level of logging that GNS should run with.
# srvctl stop gns
# srvctl modify gns -N <name> -A <address>
- Debugging GNS
The default trace level of the GNS server is 0, which can be seen simply by running ps -ef | grep gnsd.bin:
/scratch/grid_home_11.2/bin/gnsd.bin -trace-level 0 -ip-address – startup-endpoint ipc://GNS_stnsp005_31802_429f8c0476f4e1
Debugging the GNS server may require a higher trace level. First stop the GNS server with srvctl stop gns, then restart it with srvctl start gns -v -l 5. Only the root user can stop and start GNS.
Usage: srvctl start gns [-v] [-l <log_level>] [-n <node_name>]
-v Verbose output
-l <log_level> Specify the level of logging that GNS should run with.
-n <node_name> Node name
-h Print usage
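In 11.2.0.1, due to bug 8705125, the default trace level of the GNS server after the initial installation is 6. Use ‘srvctl stop / start’ to stop GNS and restart it with the trace level set to 0. This only stops and starts gnsd.bin and has no other effect on the running cluster.
- srvctl stop gns
- srvctl start gns -l 0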
The current GNS configuration can be viewed with srvctl:
srvctl config gns -a
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: stnsp0506.oraclecorp.com
GNS version:
GNS VIP network: ora.net1.network
Starting with 11.2.0.2, the -l option is helpful for debugging GNS.
1.8 Grid Interprocess Communication(GIPC)
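Grid Interprocess Communication (GIPC) is a common communication infrastructure that replaces CLSC/NS. It provides full control of the communication stack from the operating system up to whichever client uses it. The dependency on NS that existed before 11.2 has been removed, but CLSC clients (mainly from 11.1) still exist for backward compatibility.
The requirement for the same interfaces to exist with the same name on all nodes is more relaxed, as long as communication can be established. The private and public network configuration in the GPnP profile looks like this: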
<gpnp:Network id=”net1″ IP=”″ Adapter=”eth0″ Use=”public”/>
<gpnp:Network id=”net2″ IP=”″ Adapter=”eth2″ Use=”cluster_interconnect”/>
Use crsctl to set the GIPC trace level for the different components.
# crsctl set log css COMMCRS:abcd
- a denotes the trace level for NM
- b denotes the trace level for GM
- c denotes the trace level for GIPC
- d denotes the trace level for PROC
# crsctl set log css COMMCRS:2242
To turn on GIPC tracing for all components (NM, GM, etc.), set:
# crsctl set log css COMMCRS:3 or
# crsctl set log css COMMCRS:4
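To make the COMMCRS:abcd encoding concrete, the sketch below splits a four-digit value into the per-component trace levels listed above. It is an illustration only: `decode_commcrs` is a hypothetical helper, and treating a single digit as "the same level for every component" is an assumption drawn from the COMMCRS:3 / COMMCRS:4 examples, not something the tool documents here.

```python
from typing import Dict

# Positions a, b, c, d of COMMCRS:abcd, in order.
COMPONENTS = ("NM", "GM", "GIPC", "PROC")


def decode_commcrs(value: str) -> Dict[str, int]:
    """Map a COMMCRS value such as '2242' to a trace level per component."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"COMMCRS value must be numeric: {value!r}")
    if len(value) == 1:
        # Assumption: a single digit applies to every component.
        return {name: int(value) for name in COMPONENTS}
    if len(value) != len(COMPONENTS):
        raise ValueError(f"expected 1 or 4 digits, got {value!r}")
    return {name: int(digit) for name, digit in zip(COMPONENTS, value)}


print(decode_commcrs("2242"))
print(decode_commcrs("3"))
```

With the value 2242 from the example above, only the GIPC position is raised to 4 while NM, GM and PROC stay at 2.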
Another option is to set a pair of environment variables for the component using GIPC as communication e.g. ocssd. In order to achieve this, a wrapper script is required. Taking ocssd as an example, the wrapper script is Grid_home/bin/ocssd that invokes ‘ocssd.bin’. Adding the variables below to the wrapper script (under the LD_LIBRARY_PATH) and restarting ocssd will enable GIPC tracing. To restart ocssd.bin, perform a crsctl stop/start cluster.
case `/bin/uname` in
Linux)
LD_LIBRARY_PATH=/scratch/grid_home_11.2/lib
export LD_LIBRARY_PATH
export GIPC_FIELD_LEVEL=0x80
# forcibly eliminate LD_ASSUME_KERNEL to ensure NPTL where available
export LD_ASSUME_KERNEL
LOGGER="/usr/bin/logger"
if [ ! -f "$LOGGER" ];then
LOGMSG="$LOGGER -puser.err"
GIPC_TRACE_LEVEL=3 (valid range [0-6])
GIPC_FIELD_LEVEL=0x80 (only 0x80 is supported)
2009-10-23 05:47:40.952: [GIPCMUX][2993683344]gipcmodMuxCompleteSend: [mux] Completed send req 0xa481c0e0 [00000000000093a6] { gipcSendRequest : addr ”, data 0xa481c830, len 104, olen 104, parentEndp 0x8f99118, ret gipcretSuccess (0), objFlags 0x0, reqFlags 0x2 }
2009-10-23 05:47:40.952: [GIPCWAIT][2993683344]gipcRequestSaveInfo: [req]
Completed req 0xa481c0e0 [00000000000093a6] { gipcSendRequest : addr ”, data 0xa481c830, len 104, olen 104, parentEndp 0x8f99118, ret gipcretSuccess (0), objFlags 0x0, reqFlags 0x4 }
Other components such as CRS/EVM/OCR/CTSS use GIPC starting with 11.2.0.2. Setting the GIPC trace level is important when debugging connection problems.
1.9 Cluster time synchronization service daemon (CTSS):
The CTSS is a new feature in Oracle Clusterware 11g release 2 (11.2), which takes care of time synchronization in a cluster, in case the network time protocol daemon is not running or is not configured properly.
The CTSS synchronizes the time on all of the nodes in a cluster to match the time setting on the CTSS master node. When Oracle Clusterware is installed, the Cluster Time Synchronization Service (CTSS) is installed as part of the software package. During installation, the Cluster Verification Utility (CVU) determines if the network time protocol (NTP) is in use on any nodes in the cluster. On Windows systems, CVU checks for NTP and Windows Time Service.
If Oracle Clusterware finds that NTP is running or that NTP has been configured, then NTP is not affected by the CTSS installation. Instead, CTSS starts in observer mode (this condition is logged in the alert log for Oracle Clusterware). CTSS then monitors the cluster time and logs alert messages, if necessary, but CTSS does not modify the system time. If Oracle Clusterware detects that NTP is not running and is not configured, then CTSS designates one node as a clock reference, and synchronizes all of the other cluster member time and date settings to those of the clock reference.
Oracle Clusterware considers an NTP installation to be misconfigured if one of the following is true:
- NTP is not installed on all nodes of the cluster; CVU detects an NTP installation by a configuration file, such as conf
- The primary and alternate clock references are different for all of the nodes of the cluster
- The NTP processes are not running on all of the nodes of the cluster; only one type of time synchronization service can be active on the
To check whether CTSS is running in active or observer mode run crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset from the reference node (in msec): 100
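A monitoring script could classify the output of crsctl check ctss by message code. The sketch below relies only on the CRS-4700, CRS-4701 and CRS-4702 messages shown above; `CtssStatus` and `parse_ctss_check` are hypothetical names chosen for the example.

```python
import re
from typing import NamedTuple, Optional


class CtssStatus(NamedTuple):
    mode: str                    # "observer" or "active"
    offset_msec: Optional[int]   # only reported in active mode


def parse_ctss_check(output: str) -> CtssStatus:
    """Classify 'crsctl check ctss' output by its CRS message codes."""
    if "CRS-4700" in output:
        mode = "observer"
    elif "CRS-4701" in output:
        mode = "active"
    else:
        raise ValueError("no CTSS mode message found in output")
    match = re.search(r"CRS-4702:.*?\(in msec\):\s*(-?\d+)", output)
    return CtssStatus(mode, int(match.group(1)) if match else None)


observer = parse_ctss_check(
    "CRS-4700: The Cluster Time Synchronization Service is in Observer mode."
)
active = parse_ctss_check(
    "CRS-4701: The Cluster Time Synchronization Service is in Active mode.\n"
    "CRS-4702: Offset from the reference node (in msec): 100"
)
print(observer)
print(active)
```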
The tracing for the ctssd daemon is written to the octssd.log. The alert log (alert<hostname>.log) also contains information about the mode in which CTSS is running.
[ctssd(13936)]CRS-2403:The Cluster Time Synchronization Service on host node1 is in observer mode.
[ctssd(13936)]CRS-2407:The new Cluster Time Synchronization Service reference node
is host node1.
[ctssd(13936)]CRS-2401:The Cluster Time Synchronization Service started on host node1.
- CVU checks
There are pre-install CVU checks performed automatically during installation, like: cluvfy stage -pre crsinst <>
This step will check and make sure that the operating system time synchronization software (e.g. NTP) is either properly configured and running on all cluster nodes, or on none of the nodes.
During the post-install check, CVU runs cluvfy comp clocksync -n all. If CTSS is in observer mode, it performs a configuration check as above. If CTSS is in active mode, it verifies that the time difference is within the limit.
- CTSS resource
When CTSS comes up as part of the clusterware startup, it performs step time sync, and if everything goes well, it publishes its state as ONLINE. There is a start dependency on ora.cssd but note that it has no stop dependency, so if for some reasons (maybe faulted CTSSD), CTSSD dumps core or exits, nothing else should be affected.
The chart below shows the start dependency build on ora.ctssd for other resources.
crsctl stat res ora.ctssd -init -t
1.10 mdnsd
Debugging mdnsd
In order to capture mdnsd network traffic, use the mDNS Network Monitor located in
# mkdir Grid_home/log/$HOSTNAME/netmon
# Grid_home/bin/oranetmonitor &
The output from oranetmonitor will be captured in netmonOUT.log in the above directory.
2. Voting Files and Oracle Cluster Repository Architecture
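Storing the OCR and voting files in ASM eliminates the need for a third-party volume manager and removes the complexity of partitioning disks for the OCR and voting files when installing Oracle Clusterware.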
2.1 Voting File in ASM
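The number of voting files that can be stored in a disk group depends on the redundancy of the ASM disk group.
- External redundancy: the disk group can hold only one voting file
- Normal redundancy: the disk group can hold three voting files
- High redundancy: the disk group can hold five voting files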
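The redundancy rule above can be written as a small lookup. In this sketch the table values come from the list above; the `tolerated_failures` function adds the general majority rule for voting files (a node must see more than half of them), which is not stated in this section and is included here as an assumption. Both function names are hypothetical.

```python
# Voting files per ASM disk group, keyed by redundancy level (from the list above).
VOTING_FILES_BY_REDUNDANCY = {"external": 1, "normal": 3, "high": 5}


def voting_file_count(redundancy: str) -> int:
    """Return how many voting files a disk group of this redundancy holds."""
    key = redundancy.strip().lower()
    if key not in VOTING_FILES_BY_REDUNDANCY:
        raise ValueError(f"unknown redundancy level: {redundancy!r}")
    return VOTING_FILES_BY_REDUNDANCY[key]


def tolerated_failures(voting_files: int) -> int:
    """Assumed majority rule: more than half of the voting files must survive."""
    return (voting_files - 1) // 2


for level in ("External", "Normal", "High"):
    count = voting_file_count(level)
    print(level, count, tolerated_failures(count))
```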
By default, Oracle ASM puts each voting file in its own failure group within the disk group. A failure group is a subset of the disks in a disk group that could fail at the same time because they share hardware, e.g. a disk controller. The failure of common hardware must be tolerated. For example, four drives that are in a single removable tray of a large JBOD (Just a Bunch of Disks) array are in the same failure group because the tray could be removed, making all four drives fail at the same time. Conversely, drives in the same cabinet can be in multiple failure groups if the cabinet has redundant power and cooling, so that it is not necessary to protect against failure of the entire cabinet. However, Oracle ASM mirroring is not intended to protect against a fire in the computer room that destroys the entire cabinet. If voting files are stored on Oracle ASM with Normal or High redundancy and the storage hardware in one failure group suffers a failure, then, if another disk is available in an unaffected failure group of the disk group, Oracle ASM recovers the voting file in the unaffected failure group.
2.2 Voting File Changes
- The formation-critical data is stored in the voting file and no longer in the OCR. From a voting file perspective, the OCR is not touched at all. The critical data each node must agree on to form a cluster includes, for example, misscount and the list of configured voting files.
- In Oracle Clusterware 11g release 2 (11.2), it is no longer necessary to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk that is being added. If all voting disks are corrupted, however, you can restore them as described in the Oracle Clusterware Administration and Deployment
- New blocks added to the voting files are the voting file identifier block (needed for voting file stored in ASM), and it contains the cluster GUID and the file UID. The committed and pending configuration incarnation number (CCIN and PCIN) contain this formation critical
- To query the configured voting files and their locations, use crsctl query css votedisk:
$ crsctl query css votedisk
## STATE | File Universal Id | File Name | Disk group |
— —– | —————– | ——— | ———- |
1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdf10) [DATA]
2. ONLINE 138cbee15b394f3ebf57dbfee7cec633 (/dev/sdg11) [DATA]
3. ONLINE 462722bd24c94f70bf4d90539c42ad4c (/dev/sdu12) [DATA]
Located 3 voting file(s).
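- Voting files stored in ASM
o When voting files are stored in ASM, an existing voting file that becomes damaged may be removed and added back automatically.
- Voting files can be migrated from NAS to ASM, from ASM to NAS, and from ASM to ASM, for example: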
$ crsctl replace css votedisk /nas/vdfile1 /nas/vdfile2 /nas/vdfile3
$ crsctl replace css votedisk +OTHERDG
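- If all voting files are corrupted, you can restore them as follows. If the cluster is down and cannot restart because the voting files are lost, you must start CSS in exclusive mode and issue the following commands to replace the voting files:
o # crsctl start crs -excl (on one node only)
o # crsctl delete css votedisk FUID
o # crsctl add css votedisk path_to_voting_disk
In an extended Oracle Clusterware / extended RAC configuration, the third voting file must be placed on storage at a third location to guard against a data center outage. A third voting file on standard NFS is supported. For more information, see the appendix “Oracle Clusterware 11g release 2 (11.2) – Using standard NFS to support a third voting file on a stretch cluster configuration”.
See also: Oracle Clusterware Administration and Deployment Guide, “Voting file, Oracle Cluster Registry, and Oracle Local Registry” for more information. For information about extended clusters and how to configure the quorum voting file see the Appendix.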
2.3 Oracle Cluster Registry (OCR)
# cat /etc/oracle/ocr.loc
From a user and maintenance perspective, the rest remains the same. The OCR can only be configured in ASM once the cluster has completely migrated to 11.2 (crsctl query crs activeversion >= We still support mixed configurations, so one OCR could be stored in ASM and another on a supported NAS device, as we support up to 5 OCR locations in We do not support raw or block devices for either the OCR or voting files anymore.
There are small enhancements in ocrcheck, such as the -config option, which checks only the configuration. Run ocrcheck as root; otherwise the logical corruption check will not run. To check OLR data, use the -local keyword.
Usage: ocrcheck [-config] [-local]
Shows OCR version, total, used and available space
Performs OCR block integrity (header and checksum) checks
Performs OCR logical corruption checks (
‘-config’ checks just configuration (11.2)
‘-local’ checks OLR, default OCR
Can be run when stack is up or down
# ocrcheck
Status of Oracle Cluster Registry is as follows:
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3072
Available space (kbytes) : 259048
ID : 701301903
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File Name : /nas/cluster3/ocr3
Device/File integrity check succeeded
Device/File Name : /nas/cluster5/ocr1
Device/File integrity check succeeded
Device/File Name : /nas/cluster2/ocr2
Device/File integrity check succeeded
Device/File Name : /nas/cluster4/ocr4
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
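For routine checks, the space figures and configured locations can be pulled out of the ocrcheck output shown above. This is a minimal sketch that assumes the field labels appear exactly as in that output; `parse_ocrcheck` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Tuple


def parse_ocrcheck(output: str) -> Tuple[Dict[str, int], List[str]]:
    """Return (space in kbytes by kind, list of OCR locations) from ocrcheck output."""
    space = {
        kind.lower(): int(kbytes)
        for kind, kbytes in re.findall(
            r"(Total|Used|Available) space \(kbytes\)\s*:\s*(\d+)", output
        )
    }
    devices = re.findall(r"Device/File Name\s*:\s*(\S+)", output)
    return space, devices


sample = """Status of Oracle Cluster Registry is as follows:
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3072
Available space (kbytes) : 259048
ID : 701301903
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File Name : /nas/cluster3/ocr3
Device/File integrity check succeeded
"""
space, devices = parse_ocrcheck(sample)
print(f"OCR usage: {100 * space['used'] / space['total']:.1f}%")
print("locations:", ", ".join(devices))
```

Because the patterns are not anchored to line starts, the same function also works when several fields are run together on one line.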
2.4 Oracle Local Registry (OLR)
The OLR location is recorded in ‘/etc/oracle/olr.loc’ (on Linux) or an equivalent location on other operating systems. The default location after installing Oracle Clusterware is:
- RAC: Grid_home/cdata/<hostname>.olr
- Oracle Restart: Grid_home/cdata/localhost/<hostname>.olr
# ocrcheck -local -config
Oracle Local Registry configuration is :
Device/File Name : Grid_home/cdata/node1.olr
# ocrdump -local -stdout (or filename)
ocrdump -h to get the usage
See also: Oracle Clusterware Administration and Deployment Guide, “Managing the Oracle Cluster Registry and Oracle Local Registries” for more information about using ocrconfig and ocrcheck.
2.5 Bootstrap and Shutdown if OCR is located in ASM
OHASD maintains the resource dependency and will bring up ASM with the required diskgroup mounted before it starts CRSD.
Once ASM is up with the diskgroup mounted, the usual ocr* commands (ocrcheck, ocrconfig, etc.) can be used.
asmcmd lsct (v$asm_client)
DB_Name Status Software_Version Compatible_version Instance_Name Disk_Group
asmcmd lsof
DB_Name Instance_Name Path
+ASM +ASM2 +data.255.4294967295
2.6 OCR in ASM diagnostics
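- Confirm that the ASM instance is up and that the relevant disk group is mounted; check the ASM instance alert log.
- Verify that the OCR file has been created in the disk group, using asmcmd ls. Because the clusterware stack keeps accessing the OCR file, most problems show up as CRSD errors in the log. Errors from ocr* commands (which, like crsd, are treated as clients) produce trace files in the Grid_home/log/<hostname>/client directory; in other cases, look for kgfo / kgfp / kgfn at the top of the error stack.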
- Confirm that the ASM compatible.asm property of the diskgroup is set to at least
The ASM Diskgroup Resource
Typical success, failure and warning messages in the ASM alert.log:
NOTE: diskgroup resource ora.DATA.dg is offline
NOTE: diskgroup resource ora.DATA.dg is online
ERROR: failed to online diskgroup resource ora.DATA.dg
ERROR: failed to offline diskgroup resource ora.DATA.dg
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
This warning may appear when the stack is started
WARNING: unknown state for diskgroup resource ora.DATA.dg
“ERROR”: the resource operation failed; check CRSD log and Agent log for more details
“WARNING”: cannot communicate with CRSD.
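The state of the disk group resource and the state of the disk group should be consistent. In rare cases they can be out of sync for a short time. Run srvctl to synchronize the state, or wait a while for the agent to refresh it. If they stay out of sync for a long time, check the CRSD log and the ASM log for more details.
To turn on more comprehensive tracing, use event=”39505 trace name context forever, level 1“.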
2.7 The Quorum Failure Group
FAILGROUP fg1 DISK ‘<a disk in SAN1>’
FAILGROUP fg2 DISK ‘<a disk in SAN2>’
QUORUM FAILGROUP fg3 DISK ‘<another disk or file on a third location>’
ATTRIBUTE ‘compatible.asm’ = ’’;
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
— —– —————– ——— ———
1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdg10) [DATA]
2. ONLINE 138cbee15b394f3ebf57dbfee7cec633 (/dev/sdf11) [DATA]
3. ONLINE 462722bd24c94f70bf4d90539c42ad4c (/voting_disk/vote_node1) [DATA]
Located 3 voting file(s).
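The listing above is easy to parse when a script needs to confirm that every voting file is ONLINE and to see where the third (quorum) file lives. The sketch below assumes the row layout shown above, with each row starting with either a number or a dash; `VotingFile` and `parse_votedisk` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class VotingFile(NamedTuple):
    state: str       # e.g. ONLINE
    fuid: str        # File Universal Id (32 hex digits)
    path: str        # file name shown in parentheses
    disk_group: str  # disk group shown in brackets


# A row starts with "1." or "-", followed by state, FUID, (path) and [disk group].
ROW_RE = re.compile(
    r"^\s*(?:\d+\.|-)\s+(\w+)\s+([0-9a-f]{32})\s+\((.+?)\)\s+\[(\w+)\]\s*$",
    re.MULTILINE,
)


def parse_votedisk(output: str) -> List[VotingFile]:
    """Parse the rows printed by 'crsctl query css votedisk'."""
    return [VotingFile(*match.groups()) for match in ROW_RE.finditer(output)]


sample = """## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdg10) [DATA]
2. ONLINE 138cbee15b394f3ebf57dbfee7cec633 (/dev/sdf11) [DATA]
3. ONLINE 462722bd24c94f70bf4d90539c42ad4c (/voting_disk/vote_node1) [DATA]
Located 3 voting file(s).
"""
files = parse_votedisk(sample)
print(all(f.state == "ONLINE" for f in files))
for f in files:
    print(f.path, f.disk_group)
```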
If the disk group is created through SQL*Plus, crsctl replace css votedisk must be executed afterwards.
See also: Oracle Database Storage Administrator’s Guide, “Oracle ASM Failure Groups” for more information. Oracle Clusterware Administration and Deployment Guide, “Voting file, Oracle Cluster Registry, and Oracle Local Registry” for more information about backup and restore and failure recovery.
2.8 ASM spfile
- ASM spfile location
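Oracle recommends storing the ASM SPFILE in a disk group. You cannot create a new alias for an existing ASM SPFILE.
If you do not use a shared Oracle grid home, the Oracle ASM instance can use a PFILE. The same rules for file name, default location and search order that apply to database initialization parameter files also apply to ASM initialization parameter files.
- The location of the initialization parameter file specified in the GPnP profile.
- If the location is not specified in the GPnP profile, the search order changes to:
- SPFILE in the Oracle ASM instance home
For example, in a Linux environment, the default path of the SPFILE is in the Oracle grid home:
- PFILE in the Oracle ASM instance home
Backing Up and Moving an ASM spfile
You can back up, copy or move an ASM SPFILE with the ASMCMD commands spbackup, spcopy or spmove. For details about the ASMCMD commands, see the Oracle Database Storage Administrator’s Guide.
See also: Oracle Database Storage Administrator’s Guide “Configuring Initialization Parameters for an Oracle ASM Instance” for more information.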
3. Resources
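Oracle Clusterware manages applications and processes by managing the resources that you register in the cluster. The number of resources you register depends on your application. An application that consists of a single process usually has just one resource. More complex applications, made up of several processes or components, may require multiple resources.
3.1 Resource Types
- Manage only necessary resource attributes
- Manage all resources based on the resource type
- Base resource: the base type
- Local resource: a resource instance that is local to each server in the cluster (type name is local_resource), e.g. node14.vip.
- Cluster resource: resources of the cluster resource type (type name is cluster_resource) are aware of the cluster environment and are subject to cardinality and cross-server switchover and failover, e.g. asm.
Running crsctl stat type lists all defined types:
To list all attributes and default values of a type, run crsctl stat type <typeName> -f (for full configuration) or -p (for static configuration).
- Base Resource Type Definition
To see the names and default values of all attributes of the base resource type, run crsctl stat type resource -p.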
Name | History | Description |
NAME | From 10gR2 | The name of the resource. Resource names must be unique and may not be modified once the resource is created. |
TYPE | From 10gR2, modified | Semantics are unchanged; values other than application exist
Type: string Special Values: No |
CHECK_INTERVAL | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
DESCRIPTION | From 10gR2 | Unchanged Type: string
Special Values: No |
RESTART_ATTEMPTS | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
START_TIMEOUT | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
STOP_TIMEOUT | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
SCRIPT_TIMEOUT | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
UPTIME_THRESHOLD | From 10gR2 | Unchanged Type: string
Special Values: No Per-X Support: Yes |
AUTO_START | From 10gR2 | Unchanged Type: string
Format: restore|never|always Required: No Default: restore Special Values: No |
BASE_TYPE | New | The name of the base type from which this type extends. This is the value of the “TYPE” in the base type’s profile.
Type: string Format: [name of the base type] Required: Yes Default: empty string (none) Special Values: No Per-X Support: No |
DEGREE | New | This is the count of the number of instances of the resource that are allowed to run on a single server. Today’s application has a fixed degree of one. Degree supports multiplicity within a server
Type: unsigned integer Format: [number of attempts, >=1] Required: No Default: 1 Special Values: No |
ENABLED | New | The flag that governs the state of the resource as far as being managed by Oracle Clusterware, which will not attempt to manage a disabled resource whether directly or because of a dependency to another resource. However, stopping of the resource when requested by the administrator will be allowed
(so as to make it possible to disable a resource without having to stop it). Additionally, any change to the resource’s state performed by an ‘outside force’ will still be proxied into the clusterware. Type: unsigned integer Format: 1 | 0 Required: No Default: 1 Special Values: No Per-X Support: Yes |
START_DEPENDENCIES | New | Specifies a set of relationships that govern the start of the resource.
Type: string Required: No Default: Special Values: No |
STOP_DEPENDENCIES | New | Specifies a set of relationships that govern the stop of the resource.
Type: string Required: No Default: Special Values: No |
AGENT_FILENAME | New | An absolute filename (that is, inclusive of the path and file name) of the agent program that handles this type. Every resource type must have an agent program that handles its resources. Types can do so by either specifying the value for this attribute or inheriting it from their base type.
Type: string Required: Yes Special Values: Yes Per-X Support: Yes (per-server only) |
ACTION_SCRIPT | From 10gR2, modified |
An absolute filename (that is, inclusive of the path and file name) of the action script file. This attribute is used in conjunction with the AGENT_FILENAME. CRSD will invoke the script in the manner it did in 10g for all entry points (operations) not implemented in the agent binary. That is, if the agent program implements a particular entry point, it is invoked; if it does not, the script specified in this attribute will be executed.
Please note that for backwards compatibility with previous releases, a built-in agent for the application type will be included with CRS. This agent is implemented to always invoke the script specified with this attribute. Type: string Required: No Default: Special Values: Yes Per-X Support: Yes (per-server only) |
ACL | New | Contains permission attributes. The value is populated at resource creation time based on the identity of the process creating the resource, unless explicitly overridden. The value can subsequently be changed using the APIs/command line utilities, provided that such a change is allowed based on the existing permissions of the resource.
Format: owner:<user>:rwx,pgrp:<group>:rwx,other::r-- Where owner: the OS user of the resource owner, followed by the permissions that the owner has. Resource actions will be executed with this user ID. pgrp: the OS group that is the resource’s primary group, followed by the permissions that members of the group have. other: followed by the permissions that others have. Type: string Required: No Special Values: No |
STATE_CHANGE_EVENT_TEMPLATE | New | The template for the State Change events. Type: string Required: No
Default: Special Values: No |
PROFILE_CHANGE_EVENT_TEMPLATE | New | The template for the Profile Change events. Type: string Required: No
Default: Special Values: No |
ACTION_FAILURE_EVENT_TEMPLATE | New | The template for the State Change events.
Type: string Required: No Default: Special Values: No |
LAST_SERVER | New | An internally managed, read-only attribute that contains the name of the server on which the last start action has succeeded.
Type: string Required: No, read-only Default: empty Special Values: No |
OFFLINE_CHECK_INTERVAL | New | Used for controlling off-line monitoring of a resource. The value represents the interval (in seconds) to use for implicitly monitoring the resource when it is OFFLINE. The monitoring is turned off if the value is 0
Type: unsigned integer Required: No Default: 0 Special Values: No Per-X Support: Yes |
STATE_DETAILS | New | An internally managed, read-only attribute that contains details about the state of the resource. The attribute fulfills the following needs:
1. CRSD-understood resource states (Online, Offline, Intermediate, etc.) may map to different resource-specific values (mounted, unmounted, open, closed, etc.). To describe this mapping better, resource agent developers may choose to provide a 'state label' along with the value of STATE.
2. Providing the label, unlike the value of the resource state, is optional. If it is not provided, the Policy Engine will use CRSD-understood state values (Online, Offline, etc.). Additionally, if the agent is unable to provide the label (as may also happen to the value of STATE), the Policy Engine will do its best to set this attribute to details explaining why the resource is in its current state (why it is Intermediate and/or why it is Unknown).
Type: string Required: No, read-only Default: empty Special Values: No |
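The ACL format given in the table above (owner, pgrp and other entries, each with an rwx-style permission string) can be illustrated with a small parser. This is only a sketch in Python: the function name parse_acl and the sample user and group names are assumptions made for illustration, not part of any Oracle tool.

```python
def parse_acl(value):
    """Split an ACL attribute value into {entry kind: (name, permissions)}.

    Assumes the documented layout owner:<user>:rwx,pgrp:<group>:rwx,other::r--
    """
    acl = {}
    for entry in value.split(","):
        kind, name, perms = entry.strip().split(":")
        acl[kind] = (name, perms)
    return acl


# "oracle" and "oinstall" are example names only.
acl = parse_acl("owner:oracle:rwx,pgrp:oinstall:rwx,other::r--")
print(acl["owner"], acl["pgrp"], acl["other"])
```

The other entry carries an empty name, which is why the format shows two consecutive colons.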
- Local Resource Type Definition
The local_resource type is the basic building block for resources that are instantiated for each server but are cluster oblivious and have a locally visible state. While the definition of the type is global to the clusterware, the exact property values of the resource instantiation on a particular server are stored on that server. This resource type has no equivalent in Oracle Clusterware 10gR2 and is a totally new concept to Oracle Clusterware.
The following table specifies the attributes that make up the local_resource type definition. To see all default values, run the command crsctl stat type local_resource -p.
Name | Description |
ALIAS_NAME | Type: string Required: No Special Values: Yes Per-X Support: No |
LAST_SERVER | Overridden from resource: the name of the server to which the resource is assigned ("pinned").
Only Cluster Administrators will be allowed to register local resources.
- Cluster Resource Type Definition
The cluster_resource type is the basic building block for resources that are cluster aware and have globally visible state. The application type of 11.1 is a cluster_resource. The type's base is resource. The type definition is read-only.
The following table specifies the attributes that make up the cluster_resource type definition. Run crsctl stat type cluster_resource -p to see all default values.
Name | History | Description |
ACTIVE_PLACEMENT | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No |
FAILOVER_DELAY | From 10gR2 | Unchanged, Deprecated Special Values: No |
FAILURE_INTERVAL | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
FAILURE_THRESHOLD | From 10gR2 | Unchanged
Type: unsigned integer Special Values: No Per-X Support: Yes |
PLACEMENT | From 10gR2 | Format: value, where value is one of the following:
restricted: Only servers that belong to the associated server pool(s) or hosting members may host instances of the resource.
favored: If only one of the SERVER_POOLS and HOSTING_MEMBERS attributes is non-empty, servers belonging to the specified server pool(s)/hosting member list will be considered first if available; if/when none are available, any other server will be used. If both SERVER_POOLS and HOSTING_MEMBERS are populated, the former indicates preference while the latter restricts the choices to the servers within that preference.
balanced: Any ONLINE, enabled server may be used for placement. Less loaded servers will be preferred to more loaded ones. To measure how loaded a server is, clusterware will use the LOAD attribute of resources that are ONLINE on the server. The sum total of LOAD values is used as the absolute measure of the current server load.
Type: string Default: balanced Special Values: No |
HOSTING_MEMBERS | From 10g | The meaning of this attribute is taken from the previous release.
Although not officially deprecated, the use of this attribute is discouraged. Special Values: No Required: @see SERVER_POOLS
SERVER_POOLS | New | Format: * | [<pool name1> […]]
This attribute creates an affinity between the resource and one or more server pools as far as placement goes. The meaning of this attribute depends on what the value of PLACEMENT is. When a resource should be able to run on any server of the cluster, a special value of * needs to be used. Note that only Cluster Administrators can specify * as the value for this attribute.
Required: restricted PLACEMENT requires either SERVER_POOLS or HOSTING_MEMBERS; favored PLACEMENT requires either SERVER_POOLS or HOSTING_MEMBERS but allows both; balanced PLACEMENT does not require a value.
Type: string Default: * Special Values: No |
CARDINALITY | New | The count of the number of servers on which a resource wants to be running simultaneously. In other words, this is the ‘upper’ limit for resource cardinality. There’s currently no support for the ‘lower’ cardinality limit.
Please note CRS special values may be used for specifying values of this attribute. Type: string Format: max Required: No Default: 1 Special Values: Yes |
LOAD | New | A non-negative, numeric value designed to represent a quantitative measure of how much server capacity an instance of the resource consumes. The value of this parameter is interpreted in conjunction with that of the PLACEMENT attribute. For the balanced placement policy, the value of this attribute plays a role in determining where the resource is best placed. This value is an improvement over the original behavior of the balanced placement policy, which assumed that the load of every resource is a constant and equal number (1).
Type: unsigned integer Format: non-negative number Required: No Default:1 Special Values: No Per-X Support: Yes |
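The balanced placement rule described for PLACEMENT and LOAD (sum the LOAD of the resources ONLINE on each server and prefer the least loaded server) can be sketched as follows. This is a simplified illustration in Python, not the Policy Engine's actual algorithm; the function name, the data structures and the tie-breaking rule (first candidate wins) are assumptions.

```python
def pick_balanced_server(servers, online_resources, loads):
    """Return the candidate server with the smallest summed LOAD.

    servers:          candidate server names (assumed ONLINE and enabled)
    online_resources: dict server -> names of resources ONLINE on it
    loads:            dict resource name -> LOAD value (default 1)
    """
    def server_load(server):
        return sum(loads.get(res, 1) for res in online_resources.get(server, []))

    # min() keeps the first server on a tie; that choice is an assumption.
    return min(servers, key=server_load)


# Hypothetical cluster: node1 carries LOAD 5 + 1, node2 carries LOAD 3.
servers = ["node1", "node2"]
online = {"node1": ["db", "app"], "node2": ["web"]}
loads = {"db": 5, "app": 1, "web": 3}
print(pick_balanced_server(servers, online, loads))
```

With every LOAD left at its default of 1, the sketch degenerates to counting resources per server, which matches the original behavior mentioned in the LOAD description.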
3.2 Resource Dependencies
Oracle Clusterware 11.2 introduces a new dependency concept that allows dependencies for start and stop actions to be defined independently of each other and with much finer granularity.
- Hard Dependency
If resource A has a hard dependency on resource B, B must be ONLINE before A will be started. Please note there is no requirement that A and B be located on the same server.
A possible parameter to this dependency allows resource B to be in either the ONLINE or the INTERMEDIATE state. Such a variation is sometimes referred to as the intermediate dependency.
Another possible parameter to this dependency makes it possible to specify whether A requires B to be present on the same server or on any server in the cluster. In the former case, the presence of resource B on the same server as A is a must for resource A to start.
If the dependency is on a resource type, as opposed to a concrete resource, this should be interpreted as “any resource of the type”. The aforementioned modifiers for locality/state still apply accordingly.
- Weak Dependency
If resource A has a weak dependency on resource B, an attempt to start A will also attempt to start B if B is not ONLINE. The result of the attempt to start B is, however, of no consequence to the result of starting A (it is ignored). Additionally, if the start of A causes an attempt to start B, a failure to start A has no effect on B.
A possible parameter to this dependency is whether or not the start of A should wait for start of B to complete or may execute concurrently.
Another possible parameter to this dependency makes it possible to specify whether A desires that B be running on the same server or on any server in the cluster. In the former case, the presence of resource B on the same server as A is desired, but not required, for resource A to start. In addition to having the dependent resource started locally or on any server in the cluster, another possible parameter is to start the dependent resource on every server where it can run.
If the dependency is on a resource type, as opposed to a concrete resource, this should be interpreted as “every resource of the type”. The aforementioned modifiers for locality/state still apply accordingly.
- Attraction
If resource A attracts B, then whenever B needs to be started, servers that currently have A running will be first on the list of placement candidates. Since a resource may have more than one resource to which it is attracted, the number of attraction-exhibiting resources will govern the order of precedence as far as server placement goes.
If the dependency is on a resource type, as opposed to a concrete resource, this should be interpreted as “any resource of the type”.
A possible flavor of this relation is to require that a resource’s placement be re-evaluated when a related resource’s state changes. For example, resource A is attracted to B and C. At the time of starting A, A is started where B is. Resource C may either be running or started thereafter. Resource B is subsequently shut down/fails and does not restart. Then resource A requires that at this moment its placement be re-evaluated and it be moved to C. This is somewhat similar to the AUTOSTART attribute of the resource profile, with the dependent resource’s state change acting as a trigger as opposed to a server joining the cluster.
A possible parameter to this relation is whether or not resources in the INTERMEDIATE state should be counted as running and thus exhibit attraction.
- Exclusion
If resource A excludes resource B, starting resource A on a server where B is running is impossible. However, please see the dependency's namesake for STOP to find out how B may be stopped/relocated so that A may start.
- Pull-up
If a resource A needs to be auto-started whenever resource B is started, this dependency is used. Note that the dependency will only affect A if it is not already running. As is the case for other dependency types, pull-up may cause the dependent resource to start on any server or on the same server, which is parameterized. Another possible parameter to this dependency allows resource B to reach either the ONLINE or the INTERMEDIATE state to trigger the pull-up of A. Such a variation is sometimes referred to as the intermediate dependency. Note that if resource A has a pull-up relation to resources B and C, then it will only be pulled up when both B and C are started. In other words, the resources mentioned in the pull-up specification are interpreted as a Boolean AND.
Another variation in this dependency is if the value of the TARGET of resource A plays a role: in some cases, a resource needs to be pulled-up irrespective of its TARGET while in others only if the value of TARGET is ONLINE. To accommodate both needs, the relation offers a modifier to let users specify if the value of the TARGET is irrelevant; by default, pull-up will only start resources if their TARGET is ONLINE. Note that this modifier is on the relation, not on any of the targets as it applies to the entire relation.
If the dependency is on a resource type, as opposed to a concrete resource, this should be interpreted as “any resource of the type”. The aforementioned modifiers for locality/state still apply accordingly.
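The pull-up rules above (A must not already be running, its TARGET must be ONLINE unless the modifier says otherwise, and all listed resources must be started) can be condensed into one predicate. The following Python sketch is an illustration only: the function name, parameter names and the use of plain state strings are assumptions, and locality (same server versus any server) is not modeled.

```python
def should_pull_up(a_state, a_target, dep_states,
                   ignore_target=False, allow_intermediate=False):
    """Decide whether resource A would be pulled up.

    a_state:            current state of A, e.g. "ONLINE" or "OFFLINE"
    a_target:           value of A's TARGET attribute
    dep_states:         dict of the resources A has a pull-up relation to
                        -> their current state
    ignore_target:      modifier making the value of TARGET irrelevant
    allow_intermediate: count INTERMEDIATE as started (intermediate variant)
    """
    if a_state == "ONLINE":
        return False  # pull-up only affects A if it is not already running
    if not ignore_target and a_target != "ONLINE":
        return False  # by default only resources with TARGET=ONLINE start
    started = {"ONLINE", "INTERMEDIATE"} if allow_intermediate else {"ONLINE"}
    # Boolean AND across all resources named in the pull-up specification.
    return all(state in started for state in dep_states.values())


print(should_pull_up("OFFLINE", "ONLINE", {"B": "ONLINE", "C": "OFFLINE"}))
print(should_pull_up("OFFLINE", "ONLINE", {"B": "ONLINE", "C": "ONLINE"}))
```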
- Dispersion
The dispersion relation describes two resources that prefer not to be co-located but accept co-location if the only alternative is for one of them to be stopped. In other words, if resource A prefers to run on a different server than the one occupied by resource B, then resource A is said to have a dispersion relation to resource B at start time. This sort of relation between resources has an advisory effect, much like that of attraction: it is not binding, as the two resources may still end up on the same server.
A special variation on this relation is whether or not crsd is allowed/expected to disperse resources, once it is possible, that are already running. In other words, normally, crsd will not disperse co-located resources when, for example, a new server becomes online: it will not actively relocate resources once they are running, only disperse them when starting them. However, if the dispersion is ‘active’, then crsd will try to relocate one of the resources that disperse to the newly available server.
A possible parameter to this relation is whether or not resources in the INTERMEDIATE state should be counted as running and thus exhibit dispersion.
4. Fast Application Notification (FAN)
- Event Sources
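In 11.2, most events originate in CRSD, while RLB events originate in the database. If eONS is not running, the ReporterModule attempts to cache the events until eONS is up. Events are guaranteed to be sent and received in the order in which the underlying actions happened.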
- Event Processing architecture in oraagent
- database / ONS / eONS agents
- eONS subscriber threads
In the oraagent log, the eONS subscriber threads can be identified by the strings "Thread:[EonsSub ONS]", "Thread:[EonsSub EONS]" and "Thread:[EonsSub FAN]". In the following example a service has been stopped, and the crsd oraagent process on this node, with its three eONS subscribers, receives the event:
2009-05-26 23:36:40.479: [AGENTUSR][2868419488][UNKNOWN] Thread:[EonsSub FAN] process {
2009-05-26 23:36:40.500: [AGENTUSR][2868419488][UNKNOWN] Thread:[EonsSub FAN] process }
2009-05-26 23:36:40.540: [AGENTUSR][2934963104][UNKNOWN] Thread:[EonsSub ONS] process }
2009-05-26 23:36:40.558: [AGENTUSR][2934963104][UNKNOWN] Thread:[EonsSub ONS] process {
2009-05-26 23:36:40.563: [AGENTUSR][2924329888][UNKNOWN] Thread:[EonsSub EONS] process {
2009-05-26 23:36:40.564: [AGENTUSR][2924329888][UNKNOWN] Thread:[EonsSub EONS] process }
- Event Publishers/processors in general
On one node of the cluster, the eONS subscriber of the following agents also assumes the role of a publisher or processor or master (pick your favorite terminology):
- One dbagent’s eONS subscriber assumes the role “CLSN.FAN.pommi.FANPROC”; this subscriber is responsible for publishing ONS events (FAN events) to the HA alerts queue for database ‘pommi’. There is one FAN publisher per database in the
- One onsagent’s eONS subscriber assumes the role “CLSN.ONS.ONSPROC”, publisher for ONS events; this subscriber is responsible for sending eONS events to ONS clients.
- Each eonsagent’s eONS subscriber on every node publishes eONS events as user callouts. There is no single eONS publisher in the cluster. User callouts are no longer produced by
The publishers/processors can be identified by searching for “got lock”:
staiu01/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26 19:51:41.549: [AGENTUSR][2934959008][UNKNOWN] CssLock::tryLock, got lock CLSN.ONS.ONSPROC
staiu02/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26 19:51:41.626: [AGENTUSR][3992972192][UNKNOWN] CssLock::tryLock, got lock CLSN.ONS.ONSNETPROC
staiu03/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26 20:00:21.214: [AGENTUSR][2856319904][UNKNOWN] CssLock::tryLock, got lock
staiu02/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26 20:00:27.108: [AGENTUSR][3926576032][UNKNOWN] CssLock::tryLock, got lock CLSN.FAN.pommi.FANPROC
These CSS-based locks work in such a way that any node can grab the lock if it is not already held. If the process of the lock holder goes away, or CSS thinks the node went away, the lock is released and someone else tries to get the lock. The different processors try to grab the lock whenever they see an event. If a processor previously was holding the lock, it doesn’t have to acquire it again. There is currently no implementation of a “backup” or designated failover-publisher.
In a cluster of 2 or more nodes, one onsagent’s eONS subscriber will also assume the role of CLSN.ONS.ONSNETPROC, i.e. is responsible for just publishing network down events. The publishers with the roles of CLSN.ONS.ONSPROC and CLSN.ONS.ONSNETPROC cannot and will not run on the same node, i.e. they must run on distinct nodes.
If both the CLSN.ONS.ONSPROC and CLSN.ONS.ONSNETPROC simultaneously get their public network interface pulled down, there may not be any event.
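The lock behavior described above (any node can grab a free lock, the current holder does not need to acquire it again, and the lock is released when the holder goes away) can be modeled with a toy in-memory table. This Python sketch only illustrates the semantics and does not use CSS; the class and method names are assumptions, and the rule that ONSPROC and ONSNETPROC run on distinct nodes is not modeled.

```python
class ToyLockTable:
    """In-memory stand-in for the CSS-based publisher locks (illustrative)."""

    def __init__(self):
        self.holders = {}  # lock name -> node currently holding the lock

    def try_lock(self, name, node):
        """Grab the lock if it is free; return True if `node` holds it."""
        holder = self.holders.setdefault(name, node)
        return holder == node

    def node_gone(self, node):
        """Release every lock held by a node that went away."""
        for name in [n for n, h in self.holders.items() if h == node]:
            del self.holders[name]


locks = ToyLockTable()
print(locks.try_lock("CLSN.ONS.ONSPROC", "staiu01"))  # free lock is taken
print(locks.try_lock("CLSN.ONS.ONSPROC", "staiu02"))  # already held elsewhere
locks.node_gone("staiu01")                            # holder disappears
print(locks.try_lock("CLSN.ONS.ONSPROC", "staiu02"))  # next event: takes over
```

Because there is no designated failover publisher, the takeover in the last step only happens when another processor sees an event and tries the lock.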
- RLB publisher
Another thread, tied to the dbagent thread in the oraagent process on only one node in the cluster, is "Thread:[RLB:dbname]". It dequeues the LBA/RLB/affinity events from the SYS$SERVICE_METRICS queue and publishes them to eONS clients. It assumes the lock role CLSN.RLB.dbname. The CLSN.RLB.dbname publisher can run on any node and is not related to the location of the MMON master (which enqueues LBA events into the SYS$SERVICE_METRICS queue). Since the RLB publisher (RLB.dbname) can run on a different node than the ONS publisher (ONSPROC), RLB events can be dequeued on one node and published to ONS on another node. There is one RLB publisher per database in the cluster.
Sample trace, where Node 3 is the RLB publisher, and Node 2 has the ONSPROC role:
– Node 3:
2009-05-28 19:29:10.754: [AGENTUSR][2857368480][UNKNOWN] Thread:[RLB:pommi] publishing message srvname = rlb
2009-05-28 19:29:10.754: [AGENTUSR][2857368480][UNKNOWN] Thread:[RLB:pommi] publishing message payload = VERSION=1.0 database=pommi service=rlb { {instance=pommi_3 percent=25 flag=UNKNOWN aff=FALSE}{instance=pommi_4 percent=25 flag=UNKNOWN aff=FALSE}{instance=pommi_2 percent=25 flag=UNKNOWN aff=FALSE}{instance=pommi_1 percent=25 flag=UNKNOWN aff=FALSE} } timestamp=2009-05-28 19:29:10
The RLB events will be received by the eONS subscriber of the ONS publisher (ONSPROC) who then posts the event to ONS:
– Node 2:
2009-05-28 19:29:40.773: [AGENTUSR][3992976288][UNKNOWN] Publishing the ONS event type database/event/servicemetrics/rlb
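When reading such traces, it can help to pull the per-instance percentages out of the RLB payload shown in the Node 3 trace. The following Python sketch assumes the payload layout is exactly the one in that sample ({instance=… percent=… flag=… aff=…} groups); the function name is hypothetical.

```python
import re

# One group per instance, as it appears in the sample payload.
INSTANCE_RE = re.compile(r"\{instance=(\S+) percent=(\d+) flag=(\S+) aff=(\w+)\}")


def parse_rlb_instances(payload):
    """Return {instance name: percent} from an RLB message payload string."""
    return {m.group(1): int(m.group(2)) for m in INSTANCE_RE.finditer(payload)}


payload = (
    "VERSION=1.0 database=pommi service=rlb { "
    "{instance=pommi_3 percent=25 flag=UNKNOWN aff=FALSE}"
    "{instance=pommi_4 percent=25 flag=UNKNOWN aff=FALSE}"
    "{instance=pommi_2 percent=25 flag=UNKNOWN aff=FALSE}"
    "{instance=pommi_1 percent=25 flag=UNKNOWN aff=FALSE} } "
    "timestamp=2009-05-28 19:29:10"
)
shares = parse_rlb_instances(payload)
print(shares, sum(shares.values()))
```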
- Example
- Node 1
- assumes role of FAN/AQ publisher CLSN.FAN.dbname.FANPROC, enqueues HA events into HA alerts queue
- assumes role of eONS publisher to generate user callouts
- MMON enqueues RLB events into SYS$SERVICE_METRICS queue
- Node 2
- assumes role of ONS publisher CLSN.ONS.ONSPROC to publish ONS and RLB events to ONS subscribers (listener, JDBC ICC/UCP)
- assumes role of eONS publisher to generate user callouts
- Node 3
- assumes role of ONSNET publisher CLSN.ONS.ONSNETPROC to publish ONS events to ONS subscribers (listener, JDBC ICC/UCP)
- assumes role of eONS publisher to generate user callouts
- Node 4
- assumes role of RLB publisher CLSN.RLB.dbname, dequeues RLB events from SYS$SERVICE_METRICS queue and posts them to eONS
- assumes role of eONS publisher to generate user callouts
- Coming up in 2.0.2
The above description is only valid for In, the eONS proxy a.k.a eONS server will be removed, and its functionality will be assumed by evmd. In addition, the tracing as described above, will change significantly. The major reason for this change was the high resource usage of the eONS JVM.
In order to find the publishers in the oraagent.log in, search for these patterns:
"ONS.ONSNETPROC CssLockMM::tryMaster I am the master"
"ONS.ONSPROC CssLockMM::tryMaster I am the master"
"FAN.<dbname> CssLockMM::tryMaster I am the master"
"RLB.<dbname> CssSemMM::tryMaster I am the master"
5. Configuration best practices
- Cluster interconnect
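The cluster interconnect can be changed with oifcfg delif / setif. The Clusterware private interconnect can be changed in the same way; the change takes effect when Clusterware is restarted.
The Oracle RAC interconnect must use the same interface. Do not configure the private interconnect on a different interface that is not defined in Clusterware.
See also: Oracle Clusterware Administration and Deployment Guide, "Changing Network Addresses on Manually Configured Networks" for more information.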
- misscount
# crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
6 Clusterware Diagnostics and Debugging
6.1 Check Cluster Health
'crsctl check has' checks whether OHASD has been started on the local node and whether the daemon is healthy.
# crsctl check has
CRS-4638: Oracle High Availability Services is online
'crsctl check crs' checks the OHASD, CRSD, OCSSD and EVM daemons.
# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
'crsctl check cluster -all' checks all daemons on all nodes in the cluster.
# crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
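When the check output above is captured by a script, a simple filter can flag any CRS message that does not report "is online". The following Python sketch works on saved output text only and does not call crsctl; the function name is an assumption, and the CRS-0000 line in the second sample is a made-up placeholder, not a real Clusterware message.

```python
def unhealthy_lines(output):
    """Return the CRS-... lines that do not end in 'is online'."""
    problems = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("CRS-") and not line.endswith("is online"):
            problems.append(line)
    return problems


healthy = """CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online"""

# CRS-0000 is a hypothetical message used only to exercise the filter.
degraded = healthy + "\nCRS-0000: example daemon cannot be contacted"

print(unhealthy_lines(healthy))
print(unhealthy_lines(degraded))
```

Separator and node-name lines from 'crsctl check cluster -all' are ignored because they do not start with "CRS-".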
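When starting the cluster with crsctl start cluster, monitor the output: every attempt to start a resource should succeed. If a resource fails to start, look for the error message in the corresponding log.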
# crsctl start cluster
CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘node1’
CRS-2676: Start of ‘ora.cssdmonitor’ on ‘node1’ succeeded
CRS-2672: Attempting to start ‘ora.cssd’ on ‘node1’
CRS-2672: Attempting to start ‘ora.diskmon’ on ‘node1’
CRS-2676: Start of ‘ora.diskmon’ on ‘node1’ succeeded
CRS-2676: Start of ‘ora.cssd’ on ‘node1’ succeeded
CRS-2672: Attempting to start ‘ora.ctssd’ on ‘node1’
CRS-2676: Start of ‘ora.ctssd’ on ‘node1’ succeeded
CRS-2672: Attempting to start ‘ora.evmd’ on ‘node1’
CRS-2672: Attempting to start ‘ora.asm’ on ‘node1’
CRS-2676: Start of ‘ora.evmd’ on ‘node1’ succeeded
CRS-2676: Start of ‘ora.asm’ on ‘node1’ succeeded
CRS-2672: Attempting to start ‘ora.crsd’ on ‘node1’
CRS-2676: Start of ‘ora.crsd’ on ‘node1’ succeeded
6.2 crsctl command line tool
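The Oracle Clusterware management tool (crsctl) has commands to manage all entities in the Clusterware framework, including the Clusterware daemons and wallet management, on all nodes of the cluster. It can be used to:
- Start and stop Clusterware resources
- Start and stop the Clusterware daemons
- Check the health of the cluster
- Manage resources on behalf of third-party applications
- Integrate the Intelligent Platform Management Interface (IPMI) with Clusterware to provide failure isolation support and ensure cluster integrity
- Debug Oracle Clusterware components
See also: Oracle Clusterware Administration and Deployment Guide, "CRSCTL Utility Reference" for more information about using crsctl.
As root, you can use the crsctl set log command to enable dynamic debugging for CRS, CSS, EVM and the Clusterware subcomponents. You can change debugging levels dynamically with the crsctl debug command. Debugging information is stored in the OCR and is used at the next startup. You can also enable debugging for resources.
A complete list of debugging features and options is given in the "Troubleshooting and Diagnostic Output" section of the Oracle Clusterware Administration and Deployment Guide.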
6.3 Trace File Infrastructure and Location
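Oracle Clusterware uses a rotation approach for its log files. If you cannot find the details referenced by an alert in a log file, the file may have been rolled over to a rollover version, typically ending in *.l<number>, where the number starts at 01 and increases as more logs are generated, so that each rollover ends up in its own file. There is usually no need to consult these files unless Oracle Support asks for them; if needed, you can look in the rollover versions of the log file. The log retention policy, however, foresees that older logs are purged as required by the amount of logs generated.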
GRID_HOME/log/<host>/diskmon – Disk Monitor Daemon
GRID_HOME/log/<host>/client – OCRDUMP, OCRCHECK, OCRCONFIG, CRSCTL – edit the GRID_HOME/srvm/admin/ocrlog.ini file to increase the trace level
GRID_HOME/log/<host>/admin – not used
GRID_HOME/log/<host>/ctssd – Cluster Time Synchronization Service
GRID_HOME/log/<host>/gipcd – Grid Interprocess Communication Daemon
GRID_HOME/log/<host>/ohasd – Oracle High Availability Services Daemon
GRID_HOME/log/<host>/crsd – Cluster Ready Services Daemon
GRID_HOME/log/<host>/gpnpd – Grid Plug and Play Daemon
GRID_HOME/log/<host>/mdnsd – Multicast Domain Name Service Daemon
GRID_HOME/log/<host>/evmd – Event Manager Daemon
GRID_HOME/log/<host>/racg/racgmain – RAC RACG
GRID_HOME/log/<host>/racg/racgeut – RAC RACG
GRID_HOME/log/<host>/racg/racgevtf – RAC RACG
GRID_HOME/log/<host>/racg – RAC RACG (only used if pre-11.1 database is installed)
GRID_HOME/log/<host>/cssd – Cluster Synchronization Service Daemon
GRID_HOME/log/<host>/srvm – Server Manager
GRID_HOME/log/<host>/agent/ohasd/oraagent_oracle11 – HA Service Daemon Agent
GRID_HOME/log/<host>/agent/ohasd/oracssdagent_root – HA Service Daemon CSS Agent
GRID_HOME/log/<host>/agent/ohasd/oracssdmonitor_root – HA Service Daemon ocssdMonitor Agent
GRID_HOME/log/<host>/agent/ohasd/orarootagent_root – HA Service Daemon Oracle Root Agent
GRID_HOME/log/<host>/agent/crsd/oraagent_oracle11 – CRS Daemon Oracle Agent
GRID_HOME/log/<host>/agent/crsd/orarootagent_root – CRS Daemon Oracle Root Agent
GRID_HOME/log/<host>/agent/crsd/ora_oc4j_type_oracle11 – CRS Daemon OC4J Agent ( feature and not used in
GRID_HOME/log/<host>/gnsd – Grid Naming Services Daemon
6.4 Diagcollection
The best way to get all relevant trace files for an incident is Grid_home/bin/diagcollection.pl. To collect all traces and an OCRDUMP, run "diagcollection.pl --collect --crshome <GRID_HOME>" as root on all nodes.
# Grid_home/bin/diagcollection.pl
Production Copyright 2004, 2008, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
diagcollection
[--crs] For collecting crs diag information
[--adr] For collecting diag information for ADR
[--ipd] For collecting IPD-OS data
[--all] Default. For collecting all diag information.
[--core] UNIX only. Package core files with CRS data
[--afterdate] UNIX only. Collects archives from the specified date. Specify in mm/dd/yyyy format
[--aftertime] Supported with -adr option. Collects archives after the specified time. Specify in YYYYMMDDHHMISS24 format
[--beforetime] Supported with -adr option. Collects archives before the specified date. Specify in YYYYMMDDHHMISS24 format
[--crshome] Argument that specifies the CRS Home location
[--incidenttime] Collects IPD data from the specified time. Specify in MM/DD/YYYY24HH:MM:SS format. If not specified, IPD data generated in the past 2 hours are collected
[--incidentduration] Collects IPD data for the duration after the specified time. Specify in HH:MM format. If not specified, all IPD data after incidenttime are collected
- You can also do the following
./diagcollection.pl --collect --crs --crshome <CRS Home>
--clean cleans up the diagnosability information gathered by this script
--coreanalyze UNIX only. Extracts information from core files and stores it in a text file
6.5 Alert Messages Using Diagnostic Record Unique IDs
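Starting with 11.2, some Clusterware messages contain text enclosed in "(:" and ":)". Typically, as in the example below, the identifier appears in a clause that begins with "Details in…" and includes a log file path. This identifier is called a DRUID, or Diagnostic Record Unique ID: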
2009-07-16 00:18:44.472
‘/scratch/11.2/grid/bin/orarootagent_root’ disconnected from server.
Details at (:CRSAGF00117:) in
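Because a DRUID is delimited by "(:" and ":)", it is easy to extract from an alert log line and then search for in the referenced trace file. The following Python sketch assumes that a DRUID consists of upper-case letters and digits, which is inferred only from the single example above; the function name is hypothetical.

```python
import re

# Assumption: DRUIDs look like the sample, i.e. upper-case letters and digits.
DRUID_RE = re.compile(r"\(:([A-Z0-9]+):\)")


def find_druids(text):
    """Return all DRUID identifiers found in a piece of alert log text."""
    return DRUID_RE.findall(text)


line = "Details at (:CRSAGF00117:) in"
print(find_druids(line))
```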
6.6 OUI / SRVM / JAVA related GUI tracing
“setenv SRVM_TRACE true” (or “export SRVM_TRACE=true”)
“setenv SRVM_TRACE_LEVEL 2” (or “export SRVM_TRACE_LEVEL=2”)
If an OUI installation fails, the installer can be run with the -debug option (for example, "./runInstaller -debug").
6.7 Reboot Advisory
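Oracle Clusterware 11.2 has a new feature called Reboot Advisory, which improves the chances of preserving an explanation for a Clusterware-initiated reboot. When Clusterware initiates a reboot, a short explanatory message is produced, and an attempt is made to publish it in two ways: by writing it to a file on disk and by broadcasting it over the network.
These operations run in parallel and are time-limited, so they do not delay the reboot. Multiple disks and networks are tried, so at least one attempt is likely to succeed, and often all of them do. Reboot Advisory messages that were successfully stored or sent eventually appear in the alert log of one or more nodes in the cluster.
When the network broadcast of a Reboot Advisory succeeds, the corresponding messages appear in the alert logs of the other nodes in the cluster. This happens almost immediately, so the reason for the reboot can be seen and determined right away. The messages include the host name of the rebooting node to distinguish it from the other nodes in the cluster. Only nodes in the same cluster as the failing node display these messages.
If Reboot Advisory successfully writes the message to a disk file, the corresponding messages are produced near the beginning of the alert log the next time Clusterware starts on that node.
Reboot Advisories are timestamped, and the startup scan covers files from the previous 3 days. The scan does not empty the files or mark them as already published, so if a node restarts several times within 3 days, the same Reboot Advisory appears in the alert log multiple times.
Reboot Advisories use the same alert log messages in both cases, normally in two parts. The first is CRS-8011, which shows the host name of the rebooting node and a timestamp (the approximate time of the reboot); the second, CRS-8013, carries the explanatory text. For example: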
[ohasd(24687)]CRS-8011:reboot advisory message from host: sta00129, component: CSSMON, with timestamp: L-2009-05-05-10:03:25.340
[ohasd(24687)]CRS-8013:reboot advisory message text: Rebooting after limit 28500 exceeded; disk timeout 27630, network timeout 28500, last heartbeat from ocssd at epoch seconds 1241543005.340, 4294967295 milliseconds ago based on invariant clock value of 93235653
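To correlate Reboot Advisories across the alert logs of several nodes, the host, component and timestamp can be extracted from the CRS-8011 line. The following Python sketch assumes the message layout is exactly that of the sample above; the function name and the returned dictionary keys are hypothetical.

```python
import re

CRS_8011_RE = re.compile(
    r"CRS-8011:reboot advisory message from host: ([^,\s]+), "
    r"component: ([^,\s]+), with timestamp: (\S+)"
)


def parse_crs_8011(line):
    """Return host, component and timestamp of a CRS-8011 line, or None."""
    match = CRS_8011_RE.search(line)
    if match is None:
        return None
    host, component, timestamp = match.groups()
    return {"host": host, "component": component, "timestamp": timestamp}


sample = ("[ohasd(24687)]CRS-8011:reboot advisory message from host: sta00129, "
          "component: CSSMON, with timestamp: L-2009-05-05-10:03:25.340")
print(parse_crs_8011(sample))
```

Since the same advisory can be reported more than once, the (host, timestamp) pair is a reasonable key for de-duplicating entries in such a script.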
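In some cases, a Reboot Advisory may add binary diagnostic data to the text message. In that case, CRS-8014 and one or more CRS-8015 messages appear. This binary data is only useful when the reboot is reported to Oracle for resolution.
Different components can write to the alert log at the same time, so the messages of a Reboot Advisory may appear interleaved with other messages. However, the messages of different Reboot Advisories are never interleaved: all messages of one Reboot Advisory appear before those of the next.
For more information, see the discussion of messages CRS-8011 and CRS-8013 in the Oracle Errors manual.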
7. Other Tools
7.1 ocrpatch
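ocrpatch was developed in 2005 to give development and support staff a tool for fixing corruption and making changes in the OCR when the official tools, such as ocrconfig or crsctl, cannot handle the change. ocrpatch is not part of the Clusterware release. Its functionality is described in a separate document, so we do not go into detail here; the ocrpatch document is located in the public RAC Performance Group Folder on stcontent.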
7.2 vdpatch
$ vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release
Copyright (c) 2008, 2009, Oracle. All rights reserved.
[FATAL] not privileged
[OK] Exiting due to fatal error …
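The names and paths of the voting files can be obtained with the 'crsctl query css votedisk' command. This command can only be executed while OCSSD is running. If OCSSD is not up, crsctl reports an error: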
# crsctl query css votedisk
Unable to communicate with the Cluster Synchronization Services daemon.
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
— —– —————– ——— ———
- ONLINE 0909c24b14da4f89bfbaf025cd228109 (/dev/raw/raw100) [VDDG]
- ONLINE 9c74b39a1cfd4f84bf27559638812106 (/dev/raw/raw104) [VDDG]
- ONLINE 1bb06db216434fadbfa3336b720da252 (/dev/raw/raw108) [VDDG]
Located 3 voting file(s).
# vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release
Copyright (c) 2008, 2009, Oracle. All rights reserved.
vdpatch> op /dev/raw/raw100
[OK] Opened /dev/raw/raw100, type: ASM
- ONLINE 9f862a63239b4f52bfdbce6d262dc349 (/dev/raw/raw134) []
Located 3 voting file(s).
# vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release
Copyright (c) 2008, 2009, Oracle. All rights reserved.
vdpatch> op /dev/raw/raw126
[OK] Opened /dev/raw/raw126, type: Raw/FS
vdpatch> op /dev/raw/raw126
[OK] Opened /dev/raw/raw126, type: Raw/FS
vdpatch> op /dev/raw/raw130
[INFO] closing voting file /dev/raw/raw126
[OK] Opened /dev/raw/raw130, type: Raw/FS
vdpatch> h
Usage: vdpatch
BLOCK operations
op <path to voting file> open voting file
rb <block#> read block by block#
rb status|kill|lease <index> read named block
index=[0..n] => Devenv nodes 1..(n-1)
index=[1..n] => shiphome nodes 1..n
rb toc|info|op|ccin|pcin|limbo read named block
du dump native block from offset
di display interpreted block
of <offset> set offset in block, range 0-511
MISC operations
i show parameters, version, info
h this help screen
exit / quit exit vdpatch
- Common Use Case
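A voting file block can be read by block number or by block type name. The TOC, INFO, OP, CCIN, PCIN and LIMBO block types each occur only once in a voting file, so such a block can be read by name, for example with 'rb toc'. The output is a hex/ASCII dump of the 512-byte block, followed by an interpretation of the block contents:
vdpatch> rb toc
[OK] Read block 4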
[INFO] clssnmvtoc block
0 73734C63 6B636F54 01040000 00020000 00000000 ssLckcoT…………
20 00000000 40A00000 00020000 00000000 10000000 ….@……………
40 05000000 10000000 00020000 10020000 00020000 ………………..
420 00000000 00000000 00000000 00000000 00000000 ………………..
440 00000000 00000000 00000000 00000000 00000000 ………………..
460 00000000 00000000 00000000 00000000 00000000 ………………..
480 00000000 00000000 00000000 00000000 00000000 ………………..
500 00000000 00000000 00000000 …………
[OK] Displayed block 4 at offset 0, length 512
[INFO] clssnmvtoc block
magic1_clssnmvtoc: 0x634c7373 – 1665954675
magic2_clssnmvtoc: 0x546f636b – 1416586091
fmtvmaj_clssnmvtoc: 0x01 – 1
fmtvmin_clssnmvtoc: 0x04 – 4
resrvd_clssnmvtoc: 0x0000 – 0
maxnodes_clssnmvtoc: 0x00000200 – 512
incarn1_clssnmvtoc: 0x00000000 – 0
incarn2_clssnmvtoc: 0x00000000 – 0
filesz_clssnmvtoc: 0x0000a040 – 41024
blocksz_clssnmvtoc: 0x00000200 – 512
hdroff_clssnmvtoc: 0x00000000 – 0
hdrsz_clssnmvtoc: 0x00000010 – 16
opoff_clssnmvtoc: 0x00000005 – 5
statusoff_clssnmvtoc: 0x00000010 – 16
statussz_clssnmvtoc: 0x00000200 – 512
killoff_clssnmvtoc: 0x00000210 – 528
killsz_clssnmvtoc: 0x00000200 – 512
leaseoff_clssnmvtoc: 0x0410 – 1040
leasesz_clssnmvtoc: 0x0200 – 512
ccinoff_clssnmvtoc: 0x0006 – 6
pcinoff_clssnmvtoc: 0x0008 – 8
limbooff_clssnmvtoc: 0x000a – 10
volinfooff_clssnmvtoc: 0x0003 – 3
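The relationship between the hex dump and the interpreted fields can be reproduced by hand for the first 16 bytes of the TOC block. The following Python sketch assumes a little-endian layout with field widths inferred from the interpreted output above (two 32-bit magic numbers, two 8-bit format versions, a 16-bit reserved field and a 32-bit node count); this layout is an inference from the sample, not a documented structure, and the function name is hypothetical.

```python
import struct


def decode_toc_prefix(hex_words):
    """Decode the first 16 bytes of a TOC block from dump words."""
    raw = bytes.fromhex("".join(hex_words))
    # "<" = little-endian, no padding: I=uint32, B=uint8, H=uint16
    magic1, magic2, fmtvmaj, fmtvmin, resrvd, maxnodes = struct.unpack_from(
        "<IIBBHI", raw
    )
    return {
        "magic1": magic1,
        "magic2": magic2,
        "fmtvmaj": fmtvmaj,
        "fmtvmin": fmtvmin,
        "resrvd": resrvd,
        "maxnodes": maxnodes,
    }


# First four words of the dump line at offset 0.
words = ["73734C63", "6B636F54", "01040000", "00020000"]
fields = decode_toc_prefix(words)
print({name: hex(value) for name, value in fields.items()})
```

Read as little-endian, the bytes 73 73 4C 63 become 0x634c7373, which is the magic1_clssnmvtoc value in the interpreted output, while the same bytes read as ASCII give the "ssLc" seen in the right-hand column of the dump.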
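For the STATUS, KILL and LEASE block types there is one block per cluster node, so the 'rb' command must include an index that denotes the node number. In a development environment the index starts at 0; in a production (shiphome) environment it starts at 1. To read the KILL block of the fifth node, therefore, run 'rb kill 4' in a development environment and 'rb kill 5' in a production environment.
vdpatch> rb status 2
[OK] Read block 18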
[INFO] clssnmdsknodei vote block
0 65746F56 02000000 01040B02 00000000 73746169 etoV…………stai
20 75303300 00000000 00000000 00000000 00000000 u03……………..
40 00000000 00000000 00000000 00000000 00000000 ………………..
60 00000000 00000000 00000000 00000000 00000000 ………………..
80 00000000 3EC40609 8A340200 03000000 03030303 ….> 4……….
100 00000000 00000000 00000000 00000000 00000000 ………………..
120 00000000 00000000 00000000 00000000 00000000 ………………..
140 00000000 00000000 00000000 00000000 00000000 ………………..
160 00000000 00000000 00000000 00000000 00000000 ………………..
180 00000000 00000000 00000000 00000000 00000000 ………………..
200 00000000 00000000 00000000 00000000 00000000 ………………..
220 00000000 00000000 00000000 00000000 00000000 ………………..
240 00000000 00000000 00000000 00000000 00000000 ………………..
260 00000000 00000000 00000000 00000000 00000000 ………………..
280 00000000 00000000 00000000 00000000 00000000 ………………..
300 00000000 00000000 00000000 00000000 00000000 ………………..
320 00000000 00000000 00000000 00000000 00000000 ………………..
340 00000000 00000000 00000000 8E53DF4A ACE84A91 ………….S.J..J.
360 E4350200 00000000 03000000 441DDD4A 6051DF4A .5……….D..J`Q.J
380 00000000 00000000 00000000 00000000 00000000 ………………..
400 00000000 00000000 00000000 00000000 00000000 ………………..
420 00000000 00000000 00000000 00000000 00000000 ………………..
440 00000000 00000000 00000000 00000000 00000000 ………………..
460 00000000 00000000 00000000 00000000 00000000 ………………..
480 00000000 00000000 00000000 00000000 00000000 ………………..
500 00000000 00000000 00000000 …………
[OK] Displayed block 18 at offset 0, length 512
[INFO] clssnmdsknodei vote block
magic_clssnmdsknodei: 0x566f7465 – 1450144869
nodeNum_clssnmdsknodei: 0x00000002 – 2
fmtvmaj_clssnmdsknodei: 0x01 – 1
fmtvmin_clssnmdsknodei: 0x04 – 4
prodvmaj_clssnmdsknodei: 0x0b – 11
prodvmin_clssnmdsknodei: 0x02 – 2
killtime_clssnmdsknodei: 0x00000000 – 0
nodeName_clssnmdsknodei: staiu03
inSync_clssnmdsknodei: 0x00000000 – 0
reconfigGen_clssnmdsknodei: 0x0906c43e – 151438398
dskWrtCnt_clssnmdsknodei: 0x0002348a – 144522
nodeStatus_clssnmdsknodei: 0x00000003 – 3
node 0: 0x03 – 3 – MEMBER
node 1: 0x03 – 3 – MEMBER
node 2: 0x03 – 3 – MEMBER
node 3: 0x03 – 3 – MEMBER
timing_clssnmdsknodei.sts_clssnmTimingStmp: 0x4adf538e – 1256149902 – Wed Oct 21 11:31:42 2009
timing_clssnmdsknodei.stms_clssnmTimingStmp: 0x914ae8ac – 2437605548
timing_clssnmdsknodei.stc_clssnmTimingStmp: 0x000235e4 – 144868
timing_clssnmdsknodei.stsi_clssnmTimingStmp: 0x00000000 – 0
timing_clssnmdsknodei.flags_clssnmdsknodei: 0x00000003 – 3
unique_clssnmdsknodei.eptime_clssnmunique: 0x4add1d44 – 1256004932 – Mon Oct 19 19:15:32 2009
ccinid_clssnmdsknodei.cin_clssnmcinid: 0x4adf5160 – 1256149344 – Wed Oct 21 11:22:24 2009
ccinid_clssnmdsknodei.unique_clssnmcinid: 0x00000000 – 0
pcinid_clssnmdsknodei.cin_clssnmcinid: 0x00000000 – 0 – Wed Dec 31 16:00:00 1969
pcinid_clssnmdsknodei.unique_clssnmcinid: 0x00000000 – 0
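The decoded fields can be cross-checked against the raw hex dump by hand. The sketch below unpacks the first 24 bytes of the block shown above. The field layout (two 32-bit little-endian integers, four single bytes, one 32-bit integer, then a NUL-terminated name) is inferred from comparing the dump with the decoded output; it is an assumption, not a published format description.

```python
import struct

# First 24 bytes of the STATUS block, taken from the dump at offsets 0 and 20.
raw = bytes.fromhex(
    "65746F56"  # magic
    "02000000"  # nodeNum
    "01040B02"  # fmtvmaj, fmtvmin, prodvmaj, prodvmin
    "00000000"  # killtime
    "73746169"  # nodeName, first 4 bytes
    "75303300"  # nodeName, remaining bytes plus NUL terminator
)

# "<" selects little-endian with no padding: I = 4 bytes, B = 1 byte.
(magic, node_num, fmt_maj, fmt_min,
 prod_maj, prod_min, kill_time) = struct.unpack_from("<IIBBBBI", raw, 0)

# Assumed: the node name starts at byte 16 and ends at the first NUL.
node_name = raw[16:].split(b"\x00", 1)[0].decode("ascii")

print(hex(magic), magic.to_bytes(4, "big"))    # 0x566f7465 b'Vote'
print(node_num, fmt_maj, fmt_min, prod_maj, prod_min, kill_time)
print(node_name)                               # staiu03
```

Read as little-endian, the bytes 65 74 6F 56 give 0x566f7465, which is why the ASCII column of the dump shows "etoV" while the magic value spells "Vote".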
7.3 Appvipcfg – adding an application VIP
Production Copyright 2007, 2008, Oracle. All rights reserved
Usage:
appvipcfg create -network=<network_number>
delete -vipname=<vipname>
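The appvipcfg command-line tool can create an application VIP only on the default network (the resource ora.net.network that is created by default). To create an application VIP on a different network or subnet, it must be configured manually.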
srvctl add vip -n node1 -k 2 -A appsvip1/
crsctl add type coldfailover.vip.type -basetype ora.cluster_vip_net2.type
crsctl add resource coldfailover.vip -type coldfailover.vip.type -attr \
START_DEPENDENCIES=hard(ora.net2.network)pullup(ora.net2.network), \
STOP_DEPENDENCIES=hard(ora.net2.network), \
– 8623900 srvctl remove vip -i <ora.vipname> is removing the associated ora.netx.network
– 8620119 appvipcfg should be expanded to create a network resource
– 8632344 srvctl modify nodeapps -a will modify the vip even if the interface is not valid
– 8703112 appsvip should have the same behavior as ora.vip like vip failback
– 8758455 uservip start failed and orarootagent core dump in clsn_agent::agentassert
– 8761666 appsvipcfg should respect /etc/hosts entry for apps ip even if gns is configured
– 8820801 using a second network (k 2) I’m able to add and start the same ip twice
7.4 Application and Script Agent
- Action Entry Points
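start : Actions to be taken to start the resource
stop : Actions to gracefully stop the resource
check : Actions taken to check the status of the resource
clean : Actions to forcefully stop the resource
These action entry points can be defined in C++ code or in a script. If an action is not explicitly defined, the clusterware assumes by default that it is defined in a script, whose location is given by the ACTION_SCRIPT attribute. An agent can therefore be a hybrid, with some action entry points implemented in a script and others in C++.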
- Sample Agents
On startup : Create the file.
On shutdown : Gracefully delete the file.
On check command : Detect whether the file is present or not.
On clean command : Forcefully delete the file.
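The four behaviours listed above can be sketched as a single dispatch function. This is an illustration in Python rather than one of the Oracle demo agents, and it rests on assumptions: the function name `run_action`, the convention of returning 0 for success and 1 for failure, and passing the file path as an argument are all hypothetical.

```python
import os
import tempfile


def run_action(action: str, path: str) -> int:
    """Run one entry point for a file-backed resource; 0 = success, 1 = failure."""
    if action == "start":
        open(path, "a").close()        # create the file if it does not exist
        return 0
    if action == "stop":
        try:
            os.remove(path)            # graceful: report a failure to delete
        except FileNotFoundError:
            pass                       # already gone counts as stopped
        except OSError:
            return 1
        return 0
    if action == "check":
        return 0 if os.path.exists(path) else 1
    if action == "clean":
        try:
            os.remove(path)            # forceful: ignore any error
        except OSError:
            pass
        return 0
    return 1                           # unknown action


# Walk one resource through its life cycle in a scratch directory.
demo_path = os.path.join(tempfile.mkdtemp(), "r1.txt")
for step in ("check", "start", "check", "stop", "check"):
    print(step, run_action(step, demo_path))
```

The only difference between stop and clean in this sketch is error handling: stop reports a failed deletion, while clean always reports success so that the resource can be considered gone.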
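Once the resource type is defined, there are several options for writing a dedicated agent that performs the required tasks: the agent can be written as a script, as a C/C++ program, or as a hybrid of the two.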
- Option 1: Shell script agent
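The Grid_home/crs/demo/demo script file is a shell script that already contains all action entry points and acts as the agent for the file resource. To test this script, perform the following steps: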
$ crsctl add type test_type1 -basetype cluster_resource -attr \
"ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" -attr \
$ crsctl add resource r1 -type test_type1 -attr "PATH_NAME=/tmp/r1.txt"
$ crsctl add resource r2 -type test_type1 -attr "PATH_NAME=/tmp/r2.txt"
$ crsctl start res r1
$ crsctl start res r2
$ crsctl check res r1
$ crsctl stop res r2
- Option 2: C++ agent
$ crsctl add type test_type1 -basetype cluster_resource -attr \
"ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" -attr \
$ crsctl add resource r3 -type test_type1 -attr "PATH_NAME=/tmp/r1.txt"
$ crsctl add resource r4 -type test_type1 -attr "PATH_NAME=/tmp/r2.txt"
$ crsctl start res r3
$ crsctl start res r4
$ crsctl check res r3
$ crsctl stop res r4
- Option 3: Hybrid agent
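In the Grid_home/crs/demo directory, Oracle provides demoagent2.cpp. This is a simple C++ program whose functionality is similar to the shell script above; it also monitors a specified file on the local machine. However, this program defines only the check action entry point. All other action entry points are left undefined and are read from the ACTION_SCRIPT attribute. To test this program, perform the following steps: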
$ crsctl add type test_type1 -basetype cluster_resource -attr \
"ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" -attr \
$ crsctl add resource r5 -type test_type1 -attr "PATH_NAME=/tmp/r1.txt"
$ crsctl add resource r6 -type test_type1 -attr "PATH_NAME=/tmp/r2.txt"
$ crsctl start res r5
$ crsctl start res r6
$ crsctl check res r5
$ crsctl stop res r6
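The hybrid idea, where an entry point defined in the agent is used directly and every undefined one falls back to the action script, can be modelled in a few lines. The sketch below is a toy model in Python, not the Oracle agent framework API and not demoagent2.cpp: `HybridAgent`, `dispatch` and `fake_action_script` are hypothetical names, and the action script is replaced by a stand-in function that only records which actions reached it.

```python
from typing import Callable, Dict, List

NativeAction = Callable[[str], int]        # receives the resource path
ScriptRunner = Callable[[str, str], int]   # receives (action, resource path)


class HybridAgent:
    """Toy model: natively defined entry points win, the rest go to a script."""

    def __init__(self, native: Dict[str, NativeAction], run_script: ScriptRunner):
        self.native = native
        self.run_script = run_script

    def dispatch(self, action: str, path: str) -> int:
        handler = self.native.get(action)
        if handler is not None:
            return handler(path)               # entry point defined in the agent
        return self.run_script(action, path)   # fall back to the action script


script_calls: List[str] = []


def fake_action_script(action: str, path: str) -> int:
    # Stand-in for invoking the script named by ACTION_SCRIPT.
    script_calls.append(action)
    return 0


# Only "check" is defined natively, mirroring the hybrid case described above.
agent = HybridAgent(native={"check": lambda path: 1}, run_script=fake_action_script)
for step in ("start", "check", "stop", "clean"):
    agent.dispatch(step, "/tmp/r5.txt")
print(script_calls)  # ['start', 'stop', 'clean']
```

In this model the check action never reaches the script, while start, stop and clean all do.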
7.5 Oracle Cluster Health Monitor – OS Tool (IPD/OS)
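This tool (formerly known as the Instantaneous Problem Detection tool) is designed to detect and analyze operating system (OS) and cluster resource related degradation and failures, in order to bring more explanatory power to many Oracle Clusterware and Oracle RAC issues such as node evictions.
http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html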
- Install the Oracle Cluster Health Monitor
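– Unpack the package
– Create the user crfuser:oinstall on all nodes
– Make sure the home directory of crfuser is the same on all nodes
– Set the password for crfuser on all nodes
– Log in as crfuser and run crfinst.pl with the appropriate options
– To complete the installation, log in as root and run crfinst.pl -f on all installed nodes
– On Linux, CRF_home is set to /usr/lib/oracrf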
- Running the OS Tool stack
# /etc/init.d/init.crfd disable
- Overview of Monitoring Process (osysmond)
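– Monitors and periodically samples system metrics
– Runs as a real-time process
– Validates the system metrics against rules
– Flags color-coded alerts based on thresholds
– Sends the data to the master logger daemon
– Logs the data to local disk if sending fails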
- CPU usage < 5%
- CPU Iowait > 50%
- MemFree < 25%
- # Disk IOs persec < 10% of max possible Disk IOs persec
- # bytes of outbound n/w traffic limited to data sent by SYSMOND
- # tasks node-wide > 1024
Oracle Cluster Health Monitor ships with two data retrieval tools. One is the CRF GUI, which is the main GUI display.
Usage: crfgui [-m <node>] [-d <time>] [-r <sec>] [-h <sec>]
[-W <sec>] [-i] [-f <name>] [-D <int>]
-m <node> Name of the master node (tmp)
-d <time> Delayed at a past time point
-r <sec> Refresh rate
-h <sec> Highlight rate
-W <sec> Maximal poll time for connection
-i interactive with cmd prompt
-f <name> read from file, “.trc” added if no suffix given
-D <int> sets an internal debug level
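The package also includes a command-line tool, oclumon, that can be used to query the Berkeley DB back end and print node-specific metrics for a specified time period to the terminal. The tool also supports querying how long a resource on a node stayed in each state during a specified time period. These states are based on predefined thresholds for each resource metric and are denoted as red, orange, yellow and green, in decreasing order of criticality. For example, you can ask how many seconds the CPU on node "node1" remained in the RED state during the last hour. oclumon can also be used to perform miscellaneous administrative tasks, such as changing the debug level, querying the version of the tool, or changing the size of the metrics database.
The usage help for oclumon can be printed with oclumon -h. For more information about the options of each verb, run oclumon <verb> -h.
The supported verbs are showtrail, showobjects, dumpnodeview, manage, version, debug, quit and help.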
oclumon showobjects -n node -time "2009-10-07 15:11:00"
oclumon dumpnodeview -n node
oclumon showgaps -n node1 -s "2009-10-07 02:40:00" \
-e "2009-10-07 03:59:00"
Number of gaps found = 0
oclumon showtrail -n node1 -diskid sde qlen totalwaittime \
-s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
-c "red" "yellow" "green"
2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
2009-07-09 03:41:31 TO 2009-07-09 03:45:21 GREEN
2009-07-09 03:45:21 TO 2009-07-09 03:49:18 GREEN
2009-07-09 03:49:18 TO 2009-07-09 03:50:00 GREEN
oclumon showtrail -n node1 -sys cpuqlen -s \
"2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
-c "red" "yellow" "green"
Parameter=CPU QUEUELENGTH
2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
2009-07-09 03:41:31 TO 2009-07-09 03:45:21 GREEN
2009-07-09 03:45:21 TO 2009-07-09 03:49:18 GREEN
2009-07-09 03:49:18 TO 2009-07-09 03:50:00 GREEN
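A question such as "how many seconds did the resource stay in a given state" can be answered by summing the intervals in showtrail output. The sketch below parses lines of the form shown above; the function name `seconds_per_state` is hypothetical, and the parser assumes each interval line has exactly the shape "<date> <time> TO <date> <time> <STATE>".

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_per_state(lines):
    """Sum the duration of every interval line, grouped by state."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        # Skip anything that is not "<date> <time> TO <date> <time> <STATE>".
        if len(parts) != 6 or parts[2] != "TO":
            continue
        start = datetime.strptime(" ".join(parts[0:2]), TIME_FORMAT)
        end = datetime.strptime(" ".join(parts[3:5]), TIME_FORMAT)
        totals[parts[5]] += int((end - start).total_seconds())
    return dict(totals)


sample = """Parameter=CPU QUEUELENGTH
2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
2009-07-09 03:41:31 TO 2009-07-09 03:45:21 GREEN
2009-07-09 03:45:21 TO 2009-07-09 03:49:18 GREEN
2009-07-09 03:49:18 TO 2009-07-09 03:50:00 GREEN"""

print(seconds_per_state(sample.splitlines()))  # {'GREEN': 600}
```

For the sample, the four intervals add up to the full ten-minute query window, all of it in the GREEN state.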
- What to collect for cluster related issues
– As the IPD owner, run 'Grid_home/bin/diagcollection.pl --collect --ipd --incidenttime <inc time> --incidentduration <duration>' on the LOGGERD node, where --incidenttime has the format MM/DD/YYYY24HH:MM:SS and --incidentduration has the format HH:MM.
– Identify the LOGGERD node with the command /usr/lib/oracrf/bin/oclumon manage -getkey "MASTER=". Starting with 11.2.0.2, oclumon is located in the Grid_home/bin directory.
– Collect data for at least 30 minutes before and after the incident time:
masterloggerhost:$ ./bin/diagcollection.pl --collect --ipd --incidenttime 10/05/200909:10:11 --incidentduration 02:00
Starting with the CRS-integrated IPD/OS, the syntax to collect the IPD data is:
masterloggerhost:$ ./bin/diagcollection.pl --collect --crshome /scratch/grid_home_11.2/ --ipdhome /scratch/grid_home_11.2/ --ipd --incidenttime 01/14/201001:00:00 --incidentduration 04:00
– The IPD data file looks like ipdData_<hostname>_<curr time>.tar.gz, for example ipdData_node1_20091006_2321.tar.gz
– How long does it take to run diagcollect?
4 node cluster, 4 hour data – 10 min
32 node cluster, 1 hour data – 20 min
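The incident time format is easy to get wrong because the date and the 24-hour time are written with no separator between them. The sketch below builds the two values from ordinary Python date objects. The helper name `diagcollection_args` is hypothetical, and the double-hyphen option spelling is an assumption about how the options are typed on the command line.

```python
from datetime import datetime, timedelta


def diagcollection_args(incident: datetime, duration: timedelta) -> list:
    """Build the time-related arguments in MM/DD/YYYY24HH:MM:SS and HH:MM form."""
    total_minutes = int(duration.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return [
        "--collect", "--ipd",
        # No separator between the date and the time, as in the examples above.
        "--incidenttime", incident.strftime("%m/%d/%Y%H:%M:%S"),
        "--incidentduration", f"{hours:02d}:{minutes:02d}",
    ]


args = diagcollection_args(datetime(2009, 10, 5, 9, 10, 11), timedelta(hours=2))
print(" ".join(args))
# --collect --ipd --incidenttime 10/05/200909:10:11 --incidentduration 02:00
```

The printed values reproduce the incident time and duration used in the first diagcollection.pl example above.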
- Debugging
To enable debugging for osysmond or loggerd, run 'oclumon debug log all allcomp:5' as root. This turns on debugging for all components.
Starting with 11.2.0.2, the IPD/CH log files are located in:
Grid_home/log/<hostname>/crfmond
Grid_home/log/<hostname>/crfproxy
Grid_home/log/<hostname>/crflogd
- For ADE users
$ cd crfutl && make setup && runcrf
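osysmond usually starts immediately, while ologgerd and oproxyd may take a few seconds (or minutes, if your I/O subsystem is slow) to start because of the Berkeley Database (BDB) initialization. The first node on which 'runcrf' is called is configured as the master. The first node to run 'runcrf' after the master is configured as the replica. From there, the roles move to other nodes if required. The daemons to look for are osysmond (on all nodes), ologgerd (on the master and replica nodes) and oproxyd (on all nodes).
In a development environment, the IPD/OS processes do not run as root or in real time.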