绝大多数教科书或网络教材都告诉初学者, Oracle中丢失Controlfile控制文件会造成不可挽回的损失, Oracle Instance实例会立即崩溃crash。
但是实际很多同学在实际操作时发现,在Unix/Linux上删除所有的Controlfile控制文件 后 实例并不会在第一时间crash, 相反居然还可以顺利完成FULL checkpoint 。
这是为什么呢?
注意这个问题仅适用于Unix/Linux, 在Windows平台上不允许删除 已经被其他进程打开的文件。
究其根本原因是在Linux/Unix上Read 或 Write一个文件,进程都会打开一个与此文件相关联的 打开文件描述符 Open File Descriptors (a file descriptor (FD) is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems.In POSIX, a file descriptor is an integer, specifically of the C type int.)
在C语言里 这样的一个File Descriptor以整形的形式出现, 举例来说:
进程A 在运行过程中 用 open 函数打开了一个文件/etc/hosts:
int open("/etc/hosts",O_RDONLY); 此时若open成功则会返回一个 文件描述符, 例如是该进程打开的第一个文件 那么 就返回1 若出错则返回 -1 之后进程A 就可以利用如 open ,write 这样的读写函数 要求对 文件描述符 1相关的实际文件进行读写 当进程不再需要读写文件或 进程退出时 一般使用close函数关闭open File Descriptor close(1); -- 例如上面获得的 描述符是 1 ,这里代入1就可以关闭本进程对/etc/hosts的FD
而在Linux/Unix上当一个文件被删除时,它只是简单地被”unlink”即删除一个硬链接hard link; 实际包含 文件数据的inode 在所有与该文件关联的 打开文件描述符(Open File Descriptors)被关闭之前 都不会被实际删除, 已经获得 打开文件描述符的进程 只要不用close函数关闭掉这些描述符, 都可以继续正常地对文件进行读写。 当所有该文件 管理的 打开文件符都被 关闭时 文件才真正意义上被删除。
“When a file is deleted in linux, it is simply ‘unlinked’. The inode, which contains the file’s data, is not deleted until all processes have finished with it. This is why processes can carry on writing to deleted files.
In Unix, when you remove a file, the file isn’t actually deleted until all open file descriptors to it are closed.”
我们来实际还原一下 这个问题现场:
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi PL/SQL Release 10.2.0.1.0 - Production CORE 10.2.0.1.0 Production TNS for Linux: Version 10.2.0.1.0 - Production NLSRTL Version 10.2.0.1.0 - Production SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn SQL> show parameter control_files NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ control_files string /s01/oradata/G10R21/controlfil e/o1_mf_7ch80zvk_.ctl SQL> select open_mode from v$database; OPEN_MODE ---------- READ WRITE
以上环境版本为10.2.0.1,只使用一个控制文件, 数据库已打开到READ WRITE MODE。
我们找到 Oracle中持续对Controlfile 每3 s做一次 heartbeat心跳的CKPT checkpoint进程的 SPID ,并dump其进程信息:
[oracle@vrh8 ~]$ ps -ef|grep ckpt|grep -v grep oracle 3617 1 0 21:35 ? 00:00:00 ora_ckpt_G10R21 SQL> oradebug setospid 3617; Oracle pid: 7, Unix process pid: 3617, image: [email protected] (CKPT) SQL> oradebug dump errorstack 4; Statement processed. SQL> oradebug tracefile_name; /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options
检查ckpt进程的errorstack dump可以发现一些有趣的内容
[oracle@vrh8 ~]$ vi /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc Process global information: process: 0x84e58dd0, call: 0x84f96b18, xact: (nil), curses: 0x84f785b0, usrses: 0x84f785b0 ---------------------------------------- SO: 0x84e58dd0, type: 2, owner: (nil), flag: INIT/-/-/0x00 (process) Oracle pid=7, calls cur/top: 0x84f96b18/0x84f96b18, flag: (6) SYSTEM int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 0 0 24 last post received-location: ksasnd last process to post me: 84e5cd10 5 0 last post sent: 0 0 25 last post sent-location: ksasnr last process posted by me: 84e5cd10 5 0 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 0x84ea05f8 O/S info: user: oracle, term: UNKNOWN, ospid: 3617 OSD pid info: Unix process pid: 3617, image: [email protected] (CKPT) (FOB) flags=2 fib=0x83995e10 incno=0 pending i/o cnt=0 fname=/s01/oradata/G10R21/datafile/o1_mf_system_7ch8004t_.dbf fno=1 lblksz=4 fsiz=86309888 ---------------------------------------- SO: 0x83995a70, type: 6, owner: 0x84e58dd0, flag: INIT/-/-/0x00 (FIB) flags=0 reference cnt=1 incno=0 seqno=1 fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl fno=0 lblksz=4 fsiz=86309888 (FOB) flags=393226 fib=0x83995a70 incno=0 pending i/o cnt=0 fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl fno=0 lblksz=6 fsiz=86309888
可以发现这里存在一个FOB 对象其 fname=/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl ==>指向 控制文件
在Linux下我们可以通过 proc文件系统了解CKPT进程当前所有 已打开的文件描述符(FD):
[oracle@vrh8 ~]$ cd /proc/3617/fd
[oracle@vrh8 fd]$ ls -l
total 0
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 0 -> /dev/null
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 1 -> /dev/null
l-wx------ 1 oracle oinstall 64 Mar 3 21:44 10 -> /s01/admin/G10R21/bdump/alert_G10R21.log
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 11 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 12 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 13 -> /s01/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 14 -> /s01/oracle/product/10.2.0/db_1/dbs/hc_G10R21.dat
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 15 -> /s01/oracle/product/10.2.0/db_1/dbs/lkG10R21
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 17 -> /s01/oracle/product/10.2.0/db_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 18 -> /s01/oradata/G10R21/datafile/o1_mf_system_7ch8004t_.dbf
l-wx------ 1 oracle oinstall 64 Mar 3 21:44 2 -> /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Mar 3 21:44 4 -> /dev/null
l-wx------ 1 oracle oinstall 64 Mar 3 21:44 5 -> /s01/admin/G10R21/udump/g10r21_ora_3605.trc
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 6 -> /s01/oracle/product/10.2.0/db_1/dbs/lkinstG10R21 (deleted)
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 7 -> /s01/admin/G10R21/adump/ora_3605.aud
l-wx------ 1 oracle oinstall 64 Mar 3 21:44 8 -> /s01/admin/G10R21/bdump/alert_G10R21.log
lrwx------ 1 oracle oinstall 64 Mar 3 21:44 9 -> /s01/oracle/product/10.2.0/db_1/dbs/hc_G10R21.dat
可以看到 o1_mf_7ch80zvk_.ctl 这个控制文件的 FD number是 16
我们开始利用strace 追踪 CKPT的 SYSTEM CALL , 并尝试删除唯一的一个控制文件
[root@vrh8 yum.repos.d]# strace -o ckpt.log -p 3617 Process 3617 attached - interrupt to quit [oracle@vrh8 fd]$ rm /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl oracle@vrh8 fd]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.1.0 - Production on Sat Mar 3 21:50:29 2012 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options SQL> alter system checkpoint; System altered. SQL> / System altered.
以上可以看到”ALTER SYSTEM CHECKPOINT” 正常完成 ,说明此时做FULL CHECKPOINT完全检查点也没问题。
此时用 lsof 命令可以 观察到有那几个进程已打开了o1_mf_7ch80zvk_.ctl这个控制文件这个控制文件的 文件描述符:
[root@vrh8 ~]# lsof|grep o1_mf_7ch80zvk_.ctl oracle 3613 oracle 16u REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) oracle 3615 oracle 16u REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) oracle 3617 oracle 16uW REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) oracle 3625 oracle 19u REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) oracle 3644 oracle 19u REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) oracle 3646 oracle 19u REG 253,0 12402688 6547239 /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) [oracle@vrh8 ~]$ cd /proc/3617/fd [oracle@vrh8 fd]$ ls -l 16 lrwx------ 1 oracle oinstall 64 Mar 3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) 可以看到(deleted)的标记,说明该文件已经被unlinked,但是其数据仍完好
此时我们甚至还可以通过DD 命令将 已被”unlinked”的 控制文件的内容复制出来, 这也是很多 意外删除场景中 实例未crash或重启时一种常用的恢复手段, 很多朋友可能觉得这个方法不可思议, 其实只要了解了原理就可以明白它的局限性。
[oracle@vrh8 ~]$ cd /proc/3617/fd [oracle@vrh8 fd]$ ls -l 16 lrwx------ 1 oracle oinstall 64 Mar 3 21:44 16 -> /s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl (deleted) [oracle@vrh8 fd]$ dd if=16 of=/tmp/controlfile 24224+0 records in 24224+0 records out 12402688 bytes (12 MB) copied, 0.0592615 seconds, 209 MB/s [oracle@vrh8 fd]$ file /tmp/controlfile /tmp/controlfile: data
我们通过之前一直在做的strace log 可以看到 CKPT仍正常地针对控制文件Controlfiles做 3s一次的heartbeat:
gettimeofday({1330830450, 587535}, NULL) = 0 times(NULL) = 429642464 pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4\211\273\0\0\2\0\0\0\0\0\0\0\f\0\0\0"..., 16384, 49152) = 16384 times(NULL) = 429642464 gettimeofday({1330830450, 588689}, NULL) = 0 gettimeofday({1330830450, 588905}, NULL) = 0
这里肯定有同学要问了: ” 那么到底何时Oracle的后台进程如CKPT才会释放OPEN FD 打开文件描述符 呢?”
答案是:” 视乎后台进程的实际需求,仅在需要时Close Open Descriptor; 实际上close Open Descriptor的机会并不多,甚至于在我们的测试里半个小时内都没有出现close , CKPT运行良好”。
实际上我们可以手动促成 后台进程释放指定的一个FD, 注意不要在产品数据库环境中做以下操作, 这仅仅是为了研究:
之前我们已经知道了在CKPT进程中 o1_mf_7ch80zvk_.ctl 控制文件的FD number是 16 , 而CKPT的SPID操作系统进程号是3617;
SQL> oradebug setospid 3617; Oracle pid: 7, Unix process pid: 3617, image: [email protected] (CKPT) SQL> oradebug call close 16; Function returned 0
以上使用oradebug call在CKPT进程中强制调用close(16),将关闭16 号文件描述符; 关闭后CKPT无法再使用原有的 16 FD,也没有办法再open已经被unlinked的 控制文件, 可以在 strace 日志和 ckpt trace 及 alert告警日志中看到 一连串反应:
ckpt strace :
close(16) = 0
gettimeofday({1330830800, 213316}, NULL) = 0
times(NULL) = 429677421
pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4L\273\0\0\2\0\0\0\0\0\0\0\v\0\0\0"..., 16384, 49152) = -1 EBADF (Bad file descriptor)
gettimeofday({1330830800, 213988}, NULL) = 0
==========================================================================================
之后ckpt 会终止实例 , kill掉其他后台进程
times(NULL) = 429677424
open("/proc/3670/stat", O_RDONLY) = 16
read(16, "3670 (oracle) S 1 3670 3670 0 -1"..., 999) = 239
close(16) = 0
kill(3670, SIGKILL) = 0
kill(3670, SIGCONT) = -1 ESRCH (No such process)
open("/proc/3673/stat", O_RDONLY) = 16
read(16, "3673 (oracle) S 1 3673 3673 0 -1"..., 999) = 239
close(16) = 0
kill(3673, SIGKILL) = 0
kill(3673, SIGCONT) = -1 ESRCH (No such process)
open("/proc/3650/stat", O_RDONLY) = 16
read(16, "3650 (oracle) S 1 3650 3650 0 -1"..., 999) = 239
..................................................................
open("/s01/admin/G10R21/bdump/alert_G10R21.log", O_WRONLY|O_CREAT|O_APPEND, 0664) = 8
writev(8, [{"Instance terminated by CKPT, pid"..., 39}, {"\n", 1}], 2) = 40
semctl(262145, 0, IPC_RMID, 0) = 0
close(17) = 0
close(13) = 0
shmctl(2129924, IPC_RMID, 0) = 0
shmdt(0x60000000) = 0
open("/proc/3617/stat", O_RDONLY) = 13
read(13, "3617 (oracle) R 1 3617 3617 0 -1"..., 999) = 224
close(13) = 0
kill(3617, SIGKILL
ckpt trace:
*** 2012-03-03 22:13:17.711
Received ORADEBUG command 'call close 16' from process Unix process pid: 3914, image:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
error 221 detected in background process
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
alert log:
Errors in file /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
Sat Mar 3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_ckpt_3617.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/s01/oradata/G10R21/controlfile/o1_mf_7ch80zvk_.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 9: Bad file descriptor
Additional information: 4
Additional information: 3
Additional information: -1
Sat Mar 3 22:13:20 2012
CKPT: terminating instance due to error 221
Sat Mar 3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_psp0_3609.trc:
ORA-00221: error on write to control file
Sat Mar 3 22:13:20 2012
Errors in file /s01/admin/G10R21/bdump/g10r21_mman_3611.trc:
ORA-00221: error on write to control file
Instance terminated by CKPT, pid = 3617
这个例子告诉我们 oracle数据库实例虽然很复杂, 有时候甚至”神奇” , 但是在Unix上它本质上和其他简单如shell的程序一样是作为用户向OS 申请资源,处理IO的。 要充分理解Oracle为什么这样设计,在运行时又为什么那样处理, 那么肯定要对Unix环境的原理有所了解。 包括: 文件I/O 、进程控制和环境、信号、进程间通信等等知识。 我本人推荐在学习Oracle之余 可以对Unix上的C语言编程有所涉猎, 推荐大家读一读 Richard Stevens的《Unix环境高级编程(第2版)》,这本书教会了我 Unix上的思考方式。
Leave a Reply