一套Linux x86-64上的11.2.0.2数据库在startup启动阶段遭遇了ORA-27103:internal error内部错误,其出错日志如下:
SQL> startup nomount; ORA-27103: internal error Linux-x86_64 Error: 2: No such file or directory Additional information: 9404423 Additional information: 2 oerr 27103 Usage: oerr facility error Facility is identified by the prefix string in the error message. For example, if you get ORA-7300, "ora" is the facility and "7300" is the error. So you should type "oerr ora 7300". If you get LCD-111, type "oerr lcd 111", and so on. ================= alert.log ==================== This instance was first to mount 2011-05-02 21:49:47.009000 +08:00 Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_asmb_14386.trc: ORA-04031: unable to allocate 393240 bytes of shared memory ("large pool","unknown object","large pool","ASM map operations hashtable") ASMB (ospid: 14386): terminating the instance due to error 4031 System state dump requested by (instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination]. System State dumped to trace file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_diag_14346.trc Dumping diagnostic data in directory=[cdmp_20110502214947], requested by (instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination]. Instance terminated by ASMB, pid = 14386 =============================system state dump============================ PROCESS 24: ASMB ---------------------------------------- SO: 0x92c955c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x92c955c8, name=process, file=ksu.h LINE:12451, pg=0 (process) Oracle pid:24, ser:1, calls cur/top: 0x9288c778/0x9288c778 flags : (0x6) SYSTEM flags2: (0x0), flags3: (0x0) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 2296 0 2 last post received-location: ksl2.h LINE:2293 ID:kslpsr last process to post me: 92c8e248 1 6 last post sent: 0 0 26 last post sent-location: ksa2.h LINE:282 ID:ksasnd last process posted by me: 92c8e248 1 6 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 0x92d24ae0 O/S info: user: oracle, term: UNKNOWN, ospid: 14386 OSD pid info: Unix process pid: 14386, image: [email protected] (ASMB) ---------------------------------------- SO: 0x92e80a58, type: 4, owner: 0x92c955c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x92c955c8, name=session, file=ksu.h LINE:12459, pg=0 (session) sid: 13 ser: 1 trans: (nil), creator: 0x92c955c8 flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x408) -/- DID: , short-term DID: txn branch: (nil) oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS ksuxds FALSE at location: 0 service name: SYS$BACKGROUND Current Wait Stack: Not in wait; last wait ended 1.501399 sec ago Wait State: fixed_waits=0 flags=0x21 boundary=(nil)/-1 Session Wait History: elapsed time of 1.501469 sec since last wait 0: waited for 'SGA: allocation forcing component growth' =0x0, =0x0, =0x0 wait_id=14 seq_num=27 snap_id=7 wait times: snap=0.000000 sec, exc=0.305374 sec, total=0.305390 sec wait times: max=infinite wait counts: calls=6 os=6 occurred after 0.000000 sec of elapsed time 1: waited for 'SGA: allocation forcing component growth' =0x0, =0x0, =0x0 wait_id=20 seq_num=26 snap_id=1 wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000000 sec of elapsed time 2: waited for 'SGA: allocation forcing component growth' =0x0, =0x0, =0x0 wait_id=14 seq_num=25 snap_id=6
在没有阅读告警日志前我的第一反应可能是上次shutdown时Oracle进程没有被清理干净,导致shared memory segments一直没有释放,从而造成了以上ORA-27103错误。
不过其实这个问题告警日志里有明确的信息,即RDBMS Instance数据库实例在mount阶段asmb进程(负责db instance与asm instance的交互)试图从large pool大池中分配390k的空间,但遭遇了ORA-04031错误,如果asmb后台进程无法正常工作将直接导致db实例无法找到asm存储上的必要Extent,因此导致出出现了”ORA-27103: internal error:Linux-x86_64 Error: 2: No such file or directory”。
换而言之ORA-04031错误才是罪魁祸首,我们来是看该实例初始化的内存参数:
[oracle@rh2 dbs]$ strings spfilePROD1.ora |egrep "sga|memory|pool" PROD1.__large_pool_size=16777216 *.memory_target=943718400 *.shared_pool_size=314572800 *.streams_pool_size=0
因为是11g的实例所以采用了automatic memory management特性管理直接设置了memory_target参数为900M,并设置了1号实例的large pool最小为16M,900M的大小对于10g的实例而言仍是绰绰有余的,但是显然在11gr2中设置memory_target为900M是不足以驱动这样一个”庞然大物”的。我们需要配置更多的内存,亦或者可以通过设置更大的large pool来解决令人郁闷的ORA-04031错误:
[oracle@rh2 dbs]$ strings spfilePROD1.ora > initPROD1.ora [oracle@rh2 dbs]$ rm spfilePROD1.ora [oracle@rh2 dbs]$ vi initPROD1.ora /* 修改memory_target为至少912M */ *.memory_target=1200M
/* 成功启动! */
SQL> startup ; ORACLE instance started. Total System Global Area 1252663296 bytes Fixed Size 2226072 bytes Variable Size 687868008 bytes Database Buffers 553648128 bytes Redo Buffers 8921088 bytes Database mounted. Database opened. Private Interface 'eth1:1' configured from GPnP for use as a private interconnect. [name='eth1:1', type=1, ip=169.254.236.169, mac=94-0c-6d-71-8c-c2, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'eth0' configured from GPnP for use as a public interface. [name='eth0', type=1, ip=192.168.1.121, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:1' configured from GPnP for use as a public interface. [name='eth0:1', type=1, ip=192.168.1.133, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:2' configured from GPnP for use as a public interface. [name='eth0:2', type=1, ip=192.168.1.122, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1] Picked latch-free SCN scheme 3 2011-05-02 22:28:04.408000 +08:00 WARNING: db_recovery_file_dest is same as db_create_file_dest Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options. Using parameter settings in server-side pfile /s01/oracle/product/11.2.0/dbhome_1/dbs/initPROD1.ora System parameters with non-default values: processes = 150 shared_pool_size = 304M streams_pool_size = 0 memory_target = 1200M control_files = "+DATA/prod/controlfile/current.261.747100215" control_files = "+DATA/prod/controlfile/current.260.747100215" db_block_size = 8192 db_flash_cache_file = "/flashcard/prod1cache.dsk" db_flash_cache_size = 20G compatible = "11.2.0.0.0" log_archive_dest_1 = "location=+DATA" cluster_database = TRUE db_create_file_dest = "+DATA" db_recovery_file_dest = "+DATA" db_recovery_file_dest_size= 40320M thread = 1 undo_tablespace = "UNDOTBS1" instance_number = 1 db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=PRODXDB)" remote_listener = "rh-cluster-scan:1521" remote_listener = "*.remote_login_pas" audit_file_dest = "/s01/orabase/admin/PROD/adump" audit_trail = "DB" db_name = "PROD" open_cursors = 300 diagnostic_dest = "/s01/orabase" Cluster communication is configured to use the following interface(s) for this instance 169.254.236.169 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 2011-05-02 22:28:07.675000 +08:00 ORA-00132: syntax error or unresolved network name '*.remote_login_pas' PMON started with pid=2, OS id=19807 PSP0 started with pid=3, OS id=19809 2011-05-02 22:28:08.754000 +08:00 VKTM started with pid=4, OS id=19811 at elevated priority GEN0 started with pid=5, OS id=19815 VKTM running at (1)millisec precision with DBRM quantum (100)ms DIAG started with pid=6, OS id=19817 DBRM started with pid=7, OS id=19819 PING started with pid=8, OS id=19821 ACMS started with pid=9, OS id=19823 DIA0 started with pid=10, OS id=19825 LMON started with pid=11, OS id=19827 LMD0 started with pid=12, OS id=19829 LMS0 started with pid=13, OS id=19831 at elevated priority RMS0 started with pid=14, OS id=19835 LMHB started with pid=15, OS id=19837 MMAN started with pid=16, OS id=19839 * Load Monitor used for high load check * New Low - High Load Threshold Range = [1920 - 2560] LGWR started with pid=18, OS id=19843 DBW0 started with pid=17, OS id=19841 CKPT started with pid=19, OS id=19845 SMON started with pid=20, OS id=19847 RECO started with pid=21, OS id=19849 RBAL started with pid=22, OS id=19851 ASMB started with pid=23, OS id=19853 MMON started with pid=24, OS id=19855 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... MMNL started with pid=25, OS id=19857 starting up 1 shared server(s) ... lmon registered with NM - instance number 1 (internal mem no 0) 2011-05-02 22:28:09.825000 +08:00 NOTE: initiating MARK startup Starting background process MARK MARK started with pid=28, OS id=19866 NOTE: MARK has subscribed Reconfiguration started (old inc 0, new inc 2) List of instances: 1 (myinst: 1) Global Resource Directory frozen * allocate domain 0, invalid = TRUE Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Post SMON to start 1st pass IR Submitted all GCS remote-cache requests Post SMON to start 1st pass IR Fix write in gcs resources Reconfiguration complete LCK0 started with pid=30, OS id=19872 Starting background process RSMN RSMN started with pid=31, OS id=19874 ORACLE_BASE from environment = /s01/orabase 2011-05-02 22:28:12.112000 +08:00 ALTER DATABASE MOUNT This instance was first to mount 2011-05-02 22:28:13.202000 +08:00 NOTE: Loaded library: System ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.122)(PORT=1521))))' SCOPE=MEMORY SID='PROD1'; SUCCESS: diskgroup DATA was mounted NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established /* 也可以直接增大large_pool_size来解决上述问题 */ large_pool_size=30M memory_target=912M
Leave a Reply