[Oracle ASM Data Recovery] Analysis of ERROR: no PST quorum in group 1: required 2, found 0

First, let's look at what PST quorum means:

 

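If you cannot resolve the problem yourself, the 诗檀软件 professional Oracle database recovery team can help you recover your data.

诗檀软件 Professional Database Recovery Team

Service hotline: 13764045638   QQ: 47079569   Email: [email protected]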

 

Partner and Status Table

 

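In general, AU 1 (aun=1) of every ASM disk is reserved for a copy of the Partner and Status Table (PST), and typically up to 5 ASM disks actually hold a PST copy. A majority of the PST copies must be identical and pass validation. Otherwise ASM cannot determine which ASM disks actually hold the current data.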

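Each record in the PST corresponds to one ASM disk in the diskgroup. A record lists that disk's partner ASM disks and carries a flag indicating whether the disk is ONLINE and available for reads and writes. This information is essential for deciding whether recovery is possible.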

Block 0 (blkn=0) of the PST is the PST header, which stores the following:

  • Timestamp to indicate PST is valid
  • Version number to compare with other PST copies
  • List of disks containing PST copies
  • Bit map for shadow paging updates

The last block of the PST is the heartbeat block. While the diskgroup is mounted, it is updated every 3 seconds.
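The kfed dumps in this article let you check where these blocks sit inside AU 1. The following is a minimal Python sketch. It assumes 4 KB metadata blocks, which is inferred from the dumps rather than taken from Oracle documentation: aun=1 blkn=0 reports kfbh.block.blk=1024 with aus=4194304, which means 1024 blocks per AU. The helper names are hypothetical and are not part of kfed.

```python
# Sketch of the PST block arithmetic, inferred from the kfed dumps in this article.
# ASSUMPTION: 4096-byte metadata blocks (aun=1 blkn=0 shows
# kfbh.block.blk=1024 when aus=4194304, i.e. 1024 blocks per AU).
META_BLOCK_SIZE = 4096
PST_AU = 1  # AU 1 is reserved for the PST on every ASM disk


def blocks_per_au(au_size: int, block_size: int = META_BLOCK_SIZE) -> int:
    """Number of metadata blocks in one allocation unit."""
    if au_size <= 0 or au_size % block_size:
        raise ValueError("AU size must be a positive multiple of the block size")
    return au_size // block_size


def absolute_block(aun: int, blkn: int, au_size: int) -> int:
    """Block number kfed reports as kfbh.block.blk for a given aun/blkn."""
    per_au = blocks_per_au(au_size)
    if not 0 <= blkn < per_au:
        raise ValueError("blkn lies outside the AU")
    return aun * per_au + blkn


def heartbeat_blkn(au_size: int) -> int:
    """blkn of the last block of AU 1, where the heartbeat block lives."""
    return blocks_per_au(au_size) - 1


def disk_from_obj(obj: int) -> int:
    """ASSUMPTION: PST blocks in these dumps use kfbh.block.obj = 0x80000000 | disk."""
    return obj & 0x7FFFFFFF


AUS = 4194304
print(absolute_block(PST_AU, 0, AUS))  # 1024, the header block
print(absolute_block(PST_AU, 3, AUS))  # 1027, the data block dumped below
print(heartbeat_blkn(AUS))             # 1023, the blkn passed to kfed for the heartbeat
print(disk_from_obj(2147483649))       # 1, as in "disk=1"
```

With a 1 MB AU, the same arithmetic puts the heartbeat block at blkn=255.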

 

The PST header, as dumped by kfed:

kfed read /oracleasm/asm-disk01 aun=1 blkn=0 aus=4194304 |less 

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                    1024 ; 0x004: blk=1024
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3813974007 ; 0x00c: 0xe3549ff7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:32999670 ; 0x000: HOUR=0x16 DAYS=0x7 MNTH=0x2 YEAR=0x7de
kfdpHdrPairBv1.first.super.time.lo:1788841984 ; 0x004: USEC=0x0 MSEC=0x3e4 SECS=0x29 MINS=0x1a
kfdpHdrPairBv1.first.super.last:      2 ; 0x008: 0x00000002
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt:   5 ; 0x010: 0x05
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   1 ; 0x01a: 0x0001
kfdpHdrPairBv1.first.super.copy[2]:   2 ; 0x01c: 0x0002
kfdpHdrPairBv1.first.super.copy[3]:   3 ; 0x01e: 0x0003
kfdpHdrPairBv1.first.super.copy[4]:   4 ; 0x020: 0x0004
kfdpHdrPairBv1.first.super.dtaSz:    15 ; 0x022: 0x000f
kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000
kfdpHdrPairBv1.first.newCopy[0]:      0 ; 0x028: 0x0000
kfdpHdrPairBv1.first.newCopy[1]:      0 ; 0x02a: 0x0000
kfdpHdrPairBv1.first.newCopy[2]:      0 ; 0x02c: 0x0000
kfdpHdrPairBv1.first.newCopy[3]:      0 ; 0x02e: 0x0000
kfdpHdrPairBv1.first.newCopy[4]:      0 ; 0x030: 0x0000
kfdpHdrPairBv1.first.newCopyCnt:      0 ; 0x032: 0x00
kfdpHdrPairBv1.first.contType:        1 ; 0x033: 0x01
kfdpHdrPairBv1.first.spares[0]:       0 ; 0x034: 0x00000000
kfdpHdrPairBv1.first.spares[1]:       0 ; 0x038: 0x00000000
kfdpHdrPairBv1.first.spares[2]:       0 ; 0x03c: 0x00000000
kfdpHdrPairBv1.first.spares[3]:       0 ; 0x040: 0x00000000
kfdpHdrPairBv1.first.spares[4]:       0 ; 0x044: 0x00000000
kfdpHdrPairBv1.first.spares[5]:       0 ; 0x048: 0x00000000
kfdpHdrPairBv1.first.spares[6]:       0 ; 0x04c: 0x00000000
kfdpHdrPairBv1.first.spares[7]:       0 ; 0x050: 0x00000000
kfdpHdrPairBv1.first.spares[8]:       0 ; 0x054: 0x00000000
kfdpHdrPairBv1.first.spares[9]:       0 ; 0x058: 0x00000000
kfdpHdrPairBv1.first.spares[10]:      0 ; 0x05c: 0x00000000
kfdpHdrPairBv1.first.spares[11]:      0 ; 0x060: 0x00000000
kfdpHdrPairBv1.first.spares[12]:      0 ; 0x064: 0x00000000
kfdpHdrPairBv1.first.spares[13]:      0 ; 0x068: 0x00000000
kfdpHdrPairBv1.first.spares[14]:      0 ; 0x06c: 0x00000000
kfdpHdrPairBv1.first.spares[15]:      0 ; 0x070: 0x00000000
kfdpHdrPairBv1.first.spares[16]:      0 ; 0x074: 0x00000000
kfdpHdrPairBv1.first.spares[17]:      0 ; 0x078: 0x00000000
kfdpHdrPairBv1.first.spares[18]:      0 ; 0x07c: 0x00000000
kfdpHdrPairBv1.first.spares[19]:      0 ; 0x080: 0x00000000

 

  • super.time wall clock time of last PST commit
  • super.last  last committed content version number
  • super.next next available content version number
  • super.copyCnt  # of disks holding PST copies
  • super.version   version of PST header format
  • super.ub2spare  pad to ub4 align
  • super.incarn incarnation of <copy> list
  • super.copy[0]  disks holding the PST copies
  • super.dtaSz  data entries in PST
  • newCopy[0]   new disks holding PST copies
  • newCopyCnt  new # disks holding PST copies
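kfed already annotates super.time (and the heartbeat's ts) with HOUR/DAYS/MNTH/YEAR and USEC/MSEC/SECS/MINS. The sketch below converts the raw hi/lo pair into a Python datetime. Treat the bit layout as an assumption: it is inferred from the two sample values in this article, not taken from Oracle documentation. decode_kfed_time is a hypothetical helper.

```python
# Decode a kfed time.hi / time.lo pair into a datetime.
# ASSUMPTION: bit layout inferred from this article's kfed annotations, e.g.
# hi=32999670 -> HOUR=0x16 DAYS=0x7 MNTH=0x2 YEAR=0x7de and
# lo=1788841984 -> USEC=0x0 MSEC=0x3e4 SECS=0x29 MINS=0x1a.
from datetime import datetime


def decode_kfed_time(hi: int, lo: int) -> datetime:
    hour = hi & 0x1F              # bits 0-4
    day = (hi >> 5) & 0x1F        # bits 5-9
    month = (hi >> 10) & 0x0F     # bits 10-13
    year = hi >> 14               # bits 14 and up
    usec = lo & 0x3FF             # bits 0-9
    msec = (lo >> 10) & 0x3FF     # bits 10-19
    sec = (lo >> 20) & 0x3F       # bits 20-25
    minute = (lo >> 26) & 0x3F    # bits 26-31
    return datetime(year, month, day, hour, minute, sec, msec * 1000 + usec)


# super.time from the PST header dump: wall clock time of the last PST commit
print(decode_kfed_time(32999670, 1788841984))  # 2014-02-07 22:26:41.996000
```

Applied to the heartbeat block's ts.hi/ts.lo (32999734, 3968041984), the same function returns 2014-02-09 22:59:08.225000, two days after the last PST commit.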

 

Below is a PST table (data) block:

 

kfed read /oracleasm/asm-disk02 aun=1 blkn=3 aus=4194304 |less 

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           18 ; 0x002: KFBTYP_PST_DTA
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                    1027 ; 0x004: blk=1027
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                  4204644293 ; 0x00c: 0xfa9dc7c5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpDtaEv1[0].status:               127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[0].fgNum:                  1 ; 0x002: 0x0001
kfdpDtaEv1[0].addTs:         2022663849 ; 0x004: 0x788f66a9
kfdpDtaEv1[0].partner[0]:         49154 ; 0x008: P=1 P=1 PART=0x2
kfdpDtaEv1[0].partner[1]:         49153 ; 0x00a: P=1 P=1 PART=0x1
kfdpDtaEv1[0].partner[2]:         49155 ; 0x00c: P=1 P=1 PART=0x3
kfdpDtaEv1[0].partner[3]:         49166 ; 0x00e: P=1 P=1 PART=0xe
kfdpDtaEv1[0].partner[4]:         49165 ; 0x010: P=1 P=1 PART=0xd
kfdpDtaEv1[0].partner[5]:         49164 ; 0x012: P=1 P=1 PART=0xc
kfdpDtaEv1[0].partner[6]:         49156 ; 0x014: P=1 P=1 PART=0x4
kfdpDtaEv1[0].partner[7]:         49163 ; 0x016: P=1 P=1 PART=0xb
kfdpDtaEv1[0].partner[8]:         10000 ; 0x018: P=0 P=0 PART=0x2710
kfdpDtaEv1[0].partner[9]:             0 ; 0x01a: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[10]:            0 ; 0x01c: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[11]:            0 ; 0x01e: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[12]:            0 ; 0x020: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[13]:            0 ; 0x022: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[14]:            0 ; 0x024: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[15]:            0 ; 0x026: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[16]:            0 ; 0x028: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[17]:            0 ; 0x02a: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[18]:            0 ; 0x02c: P=0 P=0 PART=0x0
kfdpDtaEv1[0].partner[19]:            0 ; 0x02e: P=0 P=0 PART=0x0
kfdpDtaEv1[1].status:               127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[1].fgNum:                  2 ; 0x032: 0x0002
kfdpDtaEv1[1].addTs:         2022663849 ; 0x034: 0x788f66a9
kfdpDtaEv1[1].partner[0]:         49155 ; 0x038: P=1 P=1 PART=0x3
kfdpDtaEv1[1].partner[1]:         49152 ; 0x03a: P=1 P=1 PART=0x0
kfdpDtaEv1[1].partner[2]:         49154 ; 0x03c: P=1 P=1 PART=0x2
kfdpDtaEv1[1].partner[3]:         49166 ; 0x03e: P=1 P=1 PART=0xe
kfdpDtaEv1[1].partner[4]:         49157 ; 0x040: P=1 P=1 PART=0x5
kfdpDtaEv1[1].partner[5]:         49156 ; 0x042: P=1 P=1 PART=0x4
kfdpDtaEv1[1].partner[6]:         49165 ; 0x044: P=1 P=1 PART=0xd
kfdpDtaEv1[1].partner[7]:         49164 ; 0x046: P=1 P=1 PART=0xc
kfdpDtaEv1[1].partner[8]:         10000 ; 0x048: P=0 P=0 PART=0x2710
kfdpDtaEv1[1].partner[9]:             0 ; 0x04a: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[10]:            0 ; 0x04c: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[11]:            0 ; 0x04e: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[12]:            0 ; 0x050: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[13]:            0 ; 0x052: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[14]:            0 ; 0x054: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[15]:            0 ; 0x056: P=0 P=0 PART=0x0
kfdpDtaEv1[1].partner[16]:            0 ; 0x058: P=0 P=0 PART=0x0

 

 

  • kfdpDtaEv1[0].status: 127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1   disk status
  • fgNum   fail group number
  • addTs   timestamp of the addition to the diskgroup
  • kfdpDtaEv1[0].partner[0]:         49154 ; 0x008: P=1 P=1 PART=0x2   partner list
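Each partner value is packed. kfed prints it as two flag bits (P=... P=...) followed by PART, the partner disk number. The following is a minimal sketch of that unpacking. The layout is inferred from the dump (49154 = 0xC002 decodes to P=1 P=1 PART=0x2). decode_partner and partner_disks are hypothetical helpers. Treating only entries with both flags set as live partners is an assumption, based on the trailing P=0 entries (10000 and 0).

```python
# Split kfdpDtaEv1[n].partner[m] values into the fields kfed prints.
# ASSUMPTION: the two "P" flags are bits 15 and 14 and PART is the low 14 bits;
# entries with both flags set are treated as live partners.
from typing import Iterable, List, Tuple


def decode_partner(value: int) -> Tuple[int, int, int]:
    """Return (flag_bit15, flag_bit14, partner_disk_number)."""
    return (value >> 15) & 1, (value >> 14) & 1, value & 0x3FFF


def partner_disks(values: Iterable[int]) -> List[int]:
    """Partner disk numbers for entries whose two flag bits are both set."""
    return [part for hi, lo, part in map(decode_partner, values) if hi and lo]


# partner[0..9] of kfdpDtaEv1[0] from the dump above
disk0 = [49154, 49153, 49155, 49166, 49165, 49164, 49156, 49163, 10000, 0]
print(partner_disks(disk0))  # [2, 1, 3, 14, 13, 12, 4, 11]
```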

 

The last block of AU 1 is the KFBTYP_HBEAT heartbeat block (blkn=1023 when aus=4194304, as in the command below):

 

[oracle@mlab2 hzy]$ kfed read /oracleasm/asm-disk02 aun=1 blkn=1023 aus=4194304 |less  
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           19 ; 0x002: KFBTYP_HBEAT
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                    2047 ; 0x004: blk=2047
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                  1479766671 ; 0x00c: 0x5833728f
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHbeatB.instance:                  1 ; 0x000: 0x00000001
kfdpHbeatB.ts.hi:              32999734 ; 0x004: HOUR=0x16 DAYS=0x9 MNTH=0x2 YEAR=0x7de
kfdpHbeatB.ts.lo:            3968041984 ; 0x008: USEC=0x0 MSEC=0xe1 SECS=0x8 MINS=0x3b
kfdpHbeatB.rnd[0]:           1065296177 ; 0x00c: 0x3f7f2131
kfdpHbeatB.rnd[1]:            857037208 ; 0x010: 0x33155998
kfdpHbeatB.rnd[2]:           2779184235 ; 0x014: 0xa5a6fc6b
kfdpHbeatB.rnd[3]:           2660793989 ; 0x018: 0x9e987e85

 

 

  • kfdpHbeatB.instance   instance id
  • kfdpHbeatB.ts.hi timestamp
  • kfdpHbeatB.rnd[0]  random salt

 

  • External Redundancy: normally 1 PST copy
  • Normal Redundancy: at most 3 PST copies
  • High Redundancy: at most 5 PST copies
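Combining these counts with the majority rule described at the start of this article gives the number of PST copies that must agree before a diskgroup can be mounted. The following is a minimal sketch. It assumes that quorum means a strict majority of the copies. This is an interpretation, not a formula quoted from Oracle.

```python
# Maximum PST copies per redundancy level (from the list above) and the
# strict majority needed to mount.
# ASSUMPTION: quorum = strict majority of the PST copies.
MAX_PST_COPIES = {"EXTERNAL": 1, "NORMAL": 3, "HIGH": 5}


def required_quorum(copy_count: int) -> int:
    """Smallest number of copies that forms a strict majority."""
    if copy_count < 1:
        raise ValueError("a diskgroup needs at least one PST copy")
    return copy_count // 2 + 1


for redundancy, copies in MAX_PST_COPIES.items():
    print(f"{redundancy}: copies={copies} quorum={required_quorum(copies)}")
```

With 3 copies the sketch gives 2, which is consistent with the "required 2" in the alert.log message shown later in this article.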

 

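The PST may be relocated in the following situations:

  • An ASM disk holding a PST copy is unavailable (at ASM startup)
  • An ASM disk goes OFFLINE
  • An I/O error occurs while reading or writing the PST
  • A disk is dropped cleanly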

 

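  • The PST is checked before any other ASM metadata is read
  • When the ASM instance is asked to mount a diskgroup, the GMON process reads all disks in the diskgroup to locate and validate the PST copies
  • If it finds enough PST copies, it mounts the diskgroup
  • The PST is then cached in the ASM cache and in GMON's PGA, protected by an exclusive PT.n.0 lock
  • Other ASM instances in the same cluster also cache the PST in their GMON's PGA, protected by a shared PT.n.0 lock
  • Only the GMON holding the exclusive lock can update the PST on disk
  • AU 1 of every ASM disk is reserved for the PST, but only a few disks actually hold PST data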
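The mount-time check described in the list above can be modeled in a few lines. This is a simplified, hypothetical model, not GMON's actual logic. Each disk listed in super.copy[] is reduced to the PST version found in its AU 1, or None if the copy is missing or fails validation. The diskgroup is treated as mountable when a majority of the listed copies agree on one version.

```python
# Simplified model of the PST quorum check made at mount time.
# ASSUMPTIONS: copy_disks is the super.copy[] list from the PST header;
# found maps disk number -> PST version read from that disk, or None when
# the copy is unreadable/invalid. Disks that were not discovered are absent.
from collections import Counter
from typing import Dict, List, Optional, Tuple


def pst_quorum(copy_disks: List[int],
               found: Dict[int, Optional[int]]) -> Tuple[bool, int, int]:
    """Return (mountable, required, found), mirroring the alert.log message."""
    required = len(copy_disks) // 2 + 1
    votes = Counter(found.get(disk) for disk in copy_disks)
    votes.pop(None, None)          # missing or invalid copies do not count
    if not votes:
        return False, required, 0
    _, agreeing = votes.most_common(1)[0]
    return agreeing >= required, required, agreeing


# Three listed copies, none readable: the "required 2, found 0" situation.
print(pst_quorum([0, 1, 2], {}))   # (False, 2, 0)
```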

 

 

If a diskgroup fails to mount and alert.log contains messages like the following, the required number of PST copies has probably been lost:

 

Wed Jan 01 21:34:37 IST 2014
SQL> ALTER DISKGROUP ALL MOUNT 
Wed Jan 01 21:34:38 IST 2014
NOTE: cache registered group DATA number=1 incarn=0x1c58c060
Wed Jan 01 21:34:38 IST 2014
ERROR: no PST quorum in group 1: required 2, found 0 >>>>>>>>>>>>>>>>> HERE
Wed Jan 01 21:34:38 IST 2014
NOTE: cache dismounting group 1/0x1C58C060 (DATA) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Wed Jan 01 22:37:51 IST 2014
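When checking many alert logs, the quorum numbers can be extracted with a regular expression. The following is a minimal sketch. The pattern is copied from the message above, and other ASM versions may word it differently. find_pst_quorum_errors is a hypothetical helper.

```python
# Scan alert log text for the "no PST quorum" error shown above.
import re
from typing import Iterator, Tuple

PST_QUORUM = re.compile(
    r"ERROR: no PST quorum in group (\d+): required (\d+), found (\d+)")


def find_pst_quorum_errors(log_text: str) -> Iterator[Tuple[int, int, int]]:
    """Yield (group, required, found) for every matching message."""
    for match in PST_QUORUM.finditer(log_text):
        group, required, found = (int(g) for g in match.groups())
        yield group, required, found


sample = """NOTE: cache registered group DATA number=1 incarn=0x1c58c060
ERROR: no PST quorum in group 1: required 2, found 0 >>>>>>>>>>>>>>>>> HERE
ERROR: diskgroup DATA was not mounted"""
print(list(find_pst_quorum_errors(sample)))  # [(1, 2, 0)]
```

To scan a real alert log, pass it the file contents, e.g. `open(path, encoding="utf-8", errors="replace").read()`.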

 

 

Loss of PST quorum is usually caused by one of the following:

 

  1. Missing ASM disks
  2. Corrupted ASM disks
  3. The PST in AU 1 of some ASM disks is partially corrupted or incomplete
  4. An incorrect ASM_DISKSTRING setting
  5. Incorrect permissions on the ASM disks
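Causes 4 and 5 can be screened quickly before running kfod. The following is a minimal sketch. It assumes the discovery string is a comma-separated list of shell-style glob patterns for device paths, and it does not handle ASMLib `ORCL:*` labels. It should be run as the Grid/ASM software owner. check_diskstring is a hypothetical helper. kfod, shown in the diagnostic list later in this article, remains the authoritative discovery check.

```python
# Quick screen for ASM_DISKSTRING mismatches and device permission problems.
# ASSUMPTIONS: comma-separated glob patterns for device paths (no ORCL:*),
# run as the Grid/ASM software owner so os.access reflects that user.
import glob
import os
from typing import Dict, List, Tuple


def check_diskstring(diskstring: str) -> Dict[str, List[Tuple[str, bool]]]:
    """Map each pattern to [(path, readable_and_writable), ...]."""
    report = {}
    for pattern in (p.strip() for p in diskstring.split(",")):
        if not pattern:
            continue
        paths = sorted(glob.glob(pattern))
        report[pattern] = [(p, os.access(p, os.R_OK | os.W_OK)) for p in paths]
    return report


for pattern, hits in check_diskstring("/dev/oracleasm/disks/*").items():
    if not hits:
        print(f"{pattern}: no devices matched")
    for path, ok in hits:
        print(f"{path}: {'OK' if ok else 'NOT readable/writable'}")
```

A pattern that matches nothing points to cause 4. A device that matches but is not readable and writable points to cause 5.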

 

Common remedies for the no PST quorum problem:

  • Recreate the diskgroup
  • Repair the PST manually (very complex)

 

 

Collect the following diagnostic data:

 

1. The complete ASM alert file (please not just an excerpt). That will help me understand the history.

2. Have you used multipath devices (e.g. using Linux Device Mapper) as the base devices for creating
   your asmlib disks? Or have you just used single-path devices ('sd' devices) for your asmlib disks?

3. Which Oracle users have you created in the affected environment?
   Please list all Oracle users together with their groups.

4. Please upload a spool file with the output of the following commands:

 $> cat /etc/*release
 $> uname -a
 $> rpm -qa |grep oracleasm
 $> df -ha
 $> ls -l /dev/oracleasm/disks/*

 $> /etc/init.d/oracleasm status
 $> /usr/sbin/oracleasm-discover
 $> /usr/sbin/oracleasm-discover 'ORCL:*'

 $> /etc/init.d/oracleasm scandisks
 $> /etc/init.d/oracleasm listdisks
 $> /etc/init.d/oracleasm querydisk 

 $> ls -ltr $ORACLE_HOME/bin/oracle
 $> id -a oracle
 $> cat /etc/group

5. Upload the following files:

=> /var/log/messages
=> /etc/sysconfig/oracleasm
=> /etc/sysconfig/oracleasm-_dev_oracleasm

6. Provide a spool file with the following 'kfod' output:

$GRID_HOME/bin/kfod asm_diskstring='ORCL:*' disks=all
$GRID_HOME/bin/kfod disks=all
$GRID_HOME/bin/kfod asm_diskstring='/dev/oracleasm/disks/*' disks=all

7. Provide another spool file with the output of the following commands (run as the GRID user):

 $GRID_HOME/bin/crsctl check crs
 $GRID_HOME/bin/crsctl stat res -t
 $GRID_HOME/bin/crsctl stat res -t -init

8. To see the ASM discovery string in both the ASM instance spfile and the CRS profile (GPnP),
provide the following information:

a) Via ASMCMD:

    $ ./asmcmd dsget

b) Via GPNPTOOL:

   $ ./gpnptool get

c) $ cat $ORACLE_HOME/gpnp/*/profiles/peer/profile.xml
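Item 4 asks for one spool file that covers many commands. The following is a minimal Python sketch that runs them in order and records each command, its output, and its exit status. The command list is copied from item 4. run_and_spool and ITEM4_COMMANDS are hypothetical names.

```python
# Run the item-4 commands and write their output to a single spool file.
# ASSUMPTION: run as root on the affected host; failing commands are
# recorded with their exit code rather than skipped.
import subprocess
from typing import Iterable

ITEM4_COMMANDS = [
    "cat /etc/*release",
    "uname -a",
    "rpm -qa |grep oracleasm",
    "df -ha",
    "ls -l /dev/oracleasm/disks/*",
    "/etc/init.d/oracleasm status",
    "/usr/sbin/oracleasm-discover",
    "/usr/sbin/oracleasm-discover 'ORCL:*'",
    "/etc/init.d/oracleasm scandisks",
    "/etc/init.d/oracleasm listdisks",
    "/etc/init.d/oracleasm querydisk",
    "ls -ltr $ORACLE_HOME/bin/oracle",
    "id -a oracle",
    "cat /etc/group",
]


def run_and_spool(commands: Iterable[str], spool_path: str) -> None:
    """Append each command, its stdout/stderr and exit code to spool_path."""
    with open(spool_path, "w", encoding="utf-8") as spool:
        for cmd in commands:
            spool.write(f"$> {cmd}\n")
            result = subprocess.run(cmd, shell=True, capture_output=True,
                                    text=True, errors="replace")
            spool.write(result.stdout)
            spool.write(result.stderr)
            spool.write(f"[exit code {result.returncode}]\n\n")
```

For example, run `run_and_spool(ITEM4_COMMANDS, "asm_diag_spool.txt")` as root, with $ORACLE_HOME set in the environment so that the `ls -ltr` line resolves.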

 

