http://www.dbaleet.org/exadata_performance_monitoring_swiss_army_knife_cellsrvstat/
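If we need offloading, smart scan or storage index information for an Exadata cell (storage server), we can usually get it on the database side by querying and filtering dynamic performance views such as v$sql and v$sysstat. Is there a simpler way?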
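Starting with a certain release, every Exadata storage cell ships with a small utility called cellsrvstat. It collects statistics for the cell it runs on, and the information it gathers is very comprehensive, which makes it something of a legendary tool on Exadata.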
[root@slca04cel01 ~]# cellsrvstat
===Current Time===    Fri Aug 23 08:12:19 2013

== Input/Output related stats ==
Number of hard disk block IO read requests    0    1823855
Number of hard disk block IO write requests    0    849658
Hard disk block IO reads (KB)    0    1390317
Hard disk block IO writes (KB)    0    424990
Number of flash disk block IO read requests    0    0
Number of flash disk block IO write requests    0    0
Flash disk block IO reads (KB)    0    0
Flash disk block IO writes (KB)    0    0
Number of disk IO errors    0    0
Number of reads from flash cache    0    0
Number of writes to flash cache    0    0
Flash cache reads (KB)    0    0
Flash cache writes (KB)    0    0
Number of flash cache IO errors    0    0
Size of eviction from flash cache (KB)    0    0
Number of outstanding large flash IOs    0    0
Number of latency threshold warnings during job    0    33
Number of latency threshold warnings by checker    0    0
Number of latency threshold warnings for smart IO    0    0
Number of latency threshold warnings for redo log writes    0    0
Current read block IO to be issued (KB)    0    0
Total read block IO to be issued (KB)    0    1446974
Current write block IO to be issued (KB)    0    0
Total write block IO to be issued (KB)    0    424990
Current read blocks in IO (KB)    0    0
Total read block IO issued (KB)    0    1446974
Current write blocks in IO (KB)    0    0
Total write block IO issued (KB)    0    424990
Current read block IO in network send (KB)    0    0
Total read block IO in network send (KB)    0    1446974
Current write block IO in network send (KB)    0    0
Total write block IO in network send (KB)    0    424990
Current block IO being populated in flash (KB)    0    0
Total block IO KB populated in flash (KB)    0    0

== Memory related stats ==
SGA heap used - kgh statistics (KB)    0    438098
SGA heap free - cellsrv statistics (KB)    0    20655
OS memory allocated to SGA (KB)    0    458754
SGA heap used - cellsrv statistics - KB    0    438099
OS memory allocated to PGA (KB)    0    898
PGA heap used - cellsrv statistics (KB)    0    376
OS memory allocated to cellsrv (KB)    0    5754818
Top 5 SGA consumers (KB)
    storidx::arraySeqRIDX    0    88719
    SUBHEAP Networ    0    81937
    storidx:arrayRIDX    0    73816
    Thread IO Lat Stats    0    35158
    RemoteSendPort Fixed Size    0    33935
Top 5 SGA subheap consumers (KB)
    Network mem    0    81925
    Network heap chunk    0    2462
Number of allocation failures in 512 bytes pool    0    0
Number of allocation failures in 2KB pool    0    0
Number of allocation failures in 4KB pool    0    0
Number of allocation failures in 8KB pool    0    0
Number of allocation failures in 16KB pool    0    0
Number of allocation failures in 32KB pool    0    0
Number of allocation failures in 64KB pool    0    0
Number of allocation failures in 1MB pool    0    0
Allocation hwm in 512 bytes pool    0    620
Allocation hwm in 2KB pool    0    602
Allocation hwm in 4KB pool    0    620
Allocation hwm in 8KB pool    0    1002
Allocation hwm in 16KB pool    0    602
Allocation hwm in 32KB pool    0    601
Allocation hwm in 64KB pool    0    601
Allocation hwm in 1MB pool    0    55
Number of low memory threshold failures    0    0
Number of no memory threshold failures    0    0
Dynamic buffer allocation requests    0    0
Dynamic buffer allocation failures    0    0
Dynamic buffer allocation failures due to low mem    0    0
Dynamic buffer allocated size (KB)    0    0
Dynamic buffer allocation hwm (KB)    0    0

== Execution related stats ==
Incarnation number    0    5
Number of module version failures    0    0
Number of threads working    0    1
Number of threads waiting for network    0    19
Number of threads waiting for resource    0    0
Number of threads waiting for a mutex    0    0
Number of Jobs executed for each job type
    CacheGet    0    1838056
    CachePut    0    849658
    CloseDisk    0    711757
    OpenDisk    0    712141
    ProcessIoctl    0    14062328
    PredicateDiskRead    0    0
    PredicateDiskWrite    0    0
    PredicateFilter    0    0
    PredicateCacheGet    0    0
    PredicateCachePut    0    0
    FlashCacheMetadataWrite    0    0
    RemoteListenerJob    0    0
    FlashCacheResilveringTableUpdate    0    0
    CellDiskMetadataPrepare    0    0
SQL ids consuming the most CPU
    other    0000000000000    2
END SQL ids consuming the most CPU

== Network related stats ==
Total bytes received from the network    0    804684378
Total bytes transmitted to the network    0    7721296
Total bytes retransmitted to the network    0    0
Number of active sendports    0    7
Hwm of active sendports    0    15
Number of active remote open infos    0    6
HWM of remote open infos    0    65

== SmartIO related stats ==
Number of active smart IO sessions    0    0
High water mark of smart IO sessions    0    0
Number of completed smart IO sessions    0    0
Smart IO offload efficiency (percentage)    0    0
Size of IO avoided due to storage index (KB)    0    0
Current smart IO to be issued (KB)    0    0
Total smart IO to be issued (KB)    0    0
Current smart IO in IO (KB)    0    0
Total smart IO in IO (KB)    0    0
Current smart IO being cached in flash (KB)    0    0
Total smart IO being cached in flash (KB)    0    0
Current smart IO with IO completed (KB)    0    0
Total smart IO with IO completed (KB)    0    0
Current smart IO being filtered (KB)    0    0
Total smart IO being filtered (KB)    0    0
Current smart IO filtering completed (KB)    0    0
Total smart IO filtering completed (KB)    0    0
Current smart IO filtered size (KB)    0    0
Total smart IO filtered (KB)    0    0
Total cpu passthru output IO size (KB)    0    0
Total passthru output IO size (KB)    0    0
Current smart IO with results in send (KB)    0    0
Total smart IO with results in send (KB)    0    0
Current smart IO filtered in send (KB)    0    0
Total smart IO filtered in send (KB)    0    0
Total smart IO read from flash (KB)    0    0
Total smart IO initiated flash population (KB)    0    0
Total smart IO read from hard disk (KB)    0    0
Total smart IO writes (fcre) to hard disk (KB)    0    0
Number of smart IO requests < 512KB    0    0
Number of smart IO requests >= 512KB and < 1MB    0    0
Number of smart IO requests >= 1MB and < 2MB    0    0
Number of smart IO requests >= 2MB and < 4MB    0    0
Number of smart IO requests >= 4MB and < 8MB    0    0
Number of smart IO requests >= 8MB    0    0
Number of times smart IO buffer reserve failures    0    0
Number of times smart IO request misses    0    0
Number of times IO for smart IO not allowed to be issued    0    0
Number of times smart IO prefetch limit was reached    0    0
Number of times smart scan used unoptimized mode    0    0
Number of times smart fcre used unoptimized mode    0    0
Number of times smart backup used unoptimized mode    0    0
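As the output shows, cellsrvstat collects the following categories of statistics:
- I/O-related statistics;
- memory-related statistics;
- execution-related statistics;
- network-related statistics;
- smart I/O-related statistics.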
Running cellsrvstat with no options prints a single snapshot of the current values. Adding the -list option shows every available metric:
[root@dm01cel01 ~]# cellsrvstat -list
Statistic Groups:
io    Input/Output related stats
mem    Memory related stats
exec    Execution related stats
net    Network related stats
smartio    SmartIO related stats

Statistics:
[ * - Absolute values. Indicates no delta computation in tabular format]

io_nbiorr_hdd    Number of hard disk block IO read requests
io_nbiowr_hdd    Number of hard disk block IO write requests
io_nbiorb_hdd    Hard disk block IO reads (KB)
io_nbiowb_hdd    Hard disk block IO writes (KB)
io_nbiorr_flash    Number of flash disk block IO read requests
io_nbiowr_flash    Number of flash disk block IO write requests
io_nbiorb_flash    Flash disk block IO reads (KB)
io_nbiowb_flash    Flash disk block IO writes (KB)
io_ndioerr    Number of disk IO errors
io_nrfc    Number of reads from flash cache
io_nwfc    Number of writes to flash cache
io_fcrb    Flash cache reads (KB)
io_fcwb    Flash cache writes (KB)
io_nfioerr    Number of flash cache IO errors
io_nbpfce    Size of eviction from flash cache (KB)
io_nolfio    Number of outstanding large flash IOs
io_ltow    Number of latency threshold warnings during job
io_ltcw    Number of latency threshold warnings by checker
io_ltsiow    Number of latency threshold warnings for smart IO
io_ltrlw    Number of latency threshold warnings for redo log writes
io_bcrti    Current read block IO to be issued (KB)    *
io_btrti    Total read block IO to be issued (KB)
io_bcwti    Current write block IO to be issued (KB)    *
io_btwti    Total write block IO to be issued (KB)
io_bcrii    Current read blocks in IO (KB)    *
io_btrii    Total read block IO issued (KB)
io_bcwii    Current write blocks in IO (KB)    *
io_btwii    Total write block IO issued (KB)
io_bcrsi    Current read block IO in network send (KB)    *
io_btrsi    Total read block IO in network send (KB)
io_bcwsi    Current write block IO in network send (KB)    *
io_btwsi    Total write block IO in network send (KB)
io_bcfp    Current block IO being populated in flash (KB)    *
io_btfp    Total block IO KB populated in flash (KB)
mem_sgahu    SGA heap used - kgh statistics (KB)
mem_sgahf    SGA heap free - cellsrv statistics (KB)
mem_sgaos    OS memory allocated to SGA (KB)
mem_sgahuc    SGA heap used - cellsrv statistics - KB
mem_pgaos    OS memory allocated to PGA (KB)
mem_pgahuc    PGA heap used - cellsrv statistics (KB)
mem_allos    OS memory allocated to cellsrv (KB)
mem_sgatop    Top 5 SGA consumers (KB)    *
mem_sgasubtop    Top 5 SGA subheap consumers (KB)    *
mem_halfkaf    Number of allocation failures in 512 bytes pool
mem_2kaf    Number of allocation failures in 2KB pool
mem_4kaf    Number of allocation failures in 4KB pool
mem_8kaf    Number of allocation failures in 8KB pool
mem_16kaf    Number of allocation failures in 16KB pool
mem_32kaf    Number of allocation failures in 32KB pool
mem_64kaf    Number of allocation failures in 64KB pool
mem_1maf    Number of allocation failures in 1MB pool
mem_halfkhwm    Allocation hwm in 512 bytes pool
mem_2khwm    Allocation hwm in 2KB pool
mem_4khwm    Allocation hwm in 4KB pool
mem_8khwm    Allocation hwm in 8KB pool
mem_16khwm    Allocation hwm in 16KB pool
mem_32khwm    Allocation hwm in 32KB pool
mem_64khwm    Allocation hwm in 64KB pool
mem_1mhwm    Allocation hwm in 1MB pool
mem_lmtf    Number of low memory threshold failures
mem_nmtf    Number of no memory threshold failures
mem_dynar    Dynamic buffer allocation requests
mem_dynaf    Dynamic buffer allocation failures
mem_dynafl    Dynamic buffer allocation failures due to low mem
mem_dynam    Dynamic buffer allocated size (KB)
mem_dynamh    Dynamic buffer allocation hwm (KB)
exec_incno    Incarnation number    *
exec_versf    Number of module version failures    *
exec_ntwork    Number of threads working    *
exec_ntnetwait    Number of threads waiting for network    *
exec_ntreswait    Number of threads waiting for resource    *
exec_ntmutexwait    Number of threads waiting for a mutex    *
exec_njx    Number of Jobs executed for each job type
exec_topcpusqlid    SQL ids consuming the most CPU
net_rxb    Total bytes received from the network
net_txb    Total bytes transmitted to the network
net_rtxb    Total bytes retransmitted to the network
net_sps    Number of active sendports
net_sph    Hwm of active sendports
net_rois    Number of active remote open infos
net_roih    HWM of remote open infos
sio_ns    Number of active smart IO sessions    *
sio_hs    High water mark of smart IO sessions    *
sio_ncs    Number of completed smart IO sessions
sio_oe    Smart IO offload efficiency (percentage)    *
sio_sis    Size of IO avoided due to storage index (KB)
sio_ctb    Current smart IO to be issued (KB)    *
sio_ttb    Total smart IO to be issued (KB)
sio_cii    Current smart IO in IO (KB)    *
sio_tii    Total smart IO in IO (KB)
sio_cfp    Current smart IO being cached in flash (KB)    *
sio_tfp    Total smart IO being cached in flash (KB)
sio_cic    Current smart IO with IO completed (KB)    *
sio_tic    Total smart IO with IO completed (KB)
sio_cif    Current smart IO being filtered (KB)    *
sio_tif    Total smart IO being filtered (KB)
sio_cfc    Current smart IO filtering completed (KB)    *
sio_tfc    Total smart IO filtering completed (KB)
sio_cfo    Current smart IO filtered size (KB)    *
sio_tfo    Total smart IO filtered (KB)
sio_tcpo    Total cpu passthru output IO size (KB)
sio_tpo    Total passthru output IO size (KB)
sio_cis    Current smart IO with results in send (KB)    *
sio_tis    Total smart IO with results in send (KB)
sio_ciso    Current smart IO filtered in send (KB)    *
sio_tiso    Total smart IO filtered in send (KB)
sio_fcr    Total smart IO read from flash (KB)
sio_fcw    Total smart IO initiated flash population (KB)
sio_hdr    Total smart IO read from hard disk (KB)
sio_hdw    Total smart IO writes (fcre) to hard disk (KB)
sio_n512kb    Number of smart IO requests < 512KB
sio_n1mb    Number of smart IO requests >= 512KB and < 1MB
sio_n2mb    Number of smart IO requests >= 1MB and < 2MB
sio_n4mb    Number of smart IO requests >= 2MB and < 4MB
sio_n8mb    Number of smart IO requests >= 4MB and < 8MB
sio_ngt8mb    Number of smart IO requests >= 8MB
sio_nbrf    Number of times smart IO buffer reserve failures
sio_nrm    Number of times smart IO request misses
sio_ncio    Number of times IO for smart IO not allowed to be issued
sio_nplr    Number of times smart IO prefetch limit was reached
sio_nssuo    Number of times smart scan used unoptimized mode
sio_nfcuo    Number of times smart fcre used unoptimized mode
sio_nsbuo    Number of times smart backup used unoptimized mode
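In the listing, a trailing * marks the statistics that are absolute values, which are not delta-computed in tabular format. As a small sketch, the starred short names can be pulled out with awk. The function name list_absolute_stats is hypothetical, and it assumes one statistic per line with the short name as the first field and the * as the last field, as in the listing above.

```bash
#!/bin/bash
# Hypothetical helper: read "cellsrvstat -list" output on stdin and print the
# short names of the statistics flagged with "*" (absolute values, no delta
# computation in tabular format).
# Assumption: one statistic per line, short name (e.g. io_bcrti) in field 1,
# and the "*" flag as the last whitespace-separated field.
list_absolute_stats() {
  # The regex on $1 skips the legend line "[ * - Absolute values ...]",
  # the group lines (io, mem, ...) and headings such as "Statistics:".
  awk '$1 ~ /^[a-z]+_[a-z0-9_]+$/ && $NF == "*" { print $1 }'
}

# Usage on a cell (needs the real cellsrvstat binary):
#   cellsrvstat -list | list_absolute_stats
```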
Adding -h shows the help text:
[root@dm01cel01 ~]# cellsrvstat -h
Usage: cellsrvstat [-stat_group=<group name>,<group name>,] [-stat=<stat name>,<stat name>,] [-interval=<interval>] [-count=<count>] [-table] [-short] [-list]

    stat        A comma separated list of short strings representing the stats.
                Default is all. (unless - stat_group is specified.
                The -list option displays all stats.
                Example: -stat=io_nbiorr_hdd,io_nbiowr_hdd
    stat_group  A comma separated list of short strings representing groups of stats.
                Default: all (unless -stat is specified).
                Currently valid options are: io, mem, exec, net.
                Example: -stat_group=io,mem
    interval    At what interval the stats should be obtained and printed (in seconds).
                Default is 1 second.
    count       How many times the stats should be printed. Default is once.
    list        List all metric abbreviations and their descriptions.
                All other options are ignored.
    table       Use a tabular format for output. This option will be ignored if all
                metrics specified are not integer based metrics.
    short       Use abbreviated metric name instead of descriptive ones.
    error_out   An output file to print error messages to, mostly for debugging.

In non-tabular mode, The output has three columns. The first column is the name of the metric, the second one is the difference between the last and the current value(delta), and the third column is the absolute value.
In Tabular mode absolute values are printed as is without delta.
cellsrvstat -list command points out the statistics that are absolute values
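-stat_group= takes a comma-separated list of group names, such as io, mem, exec and net from the help text above; the -list output also shows a smartio group.
-stat= takes a comma-separated list of statistic short names as shown by -list, for example io_nbiorr_hdd,io_nbiowr_hdd.
-interval= sets the sampling interval in seconds (default: 1 second).
-count= sets how many times the statistics are sampled and printed (default: once).
-table prints the output in tabular format, one row per sample, with the short metric names as column headers. Replacing the descriptive metric names with their abbreviations in the normal output is the job of -short.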
For example, to collect sio_ttb and sio_tii once per second, ten samples in total:
[root@dm01cel01 ~]# cellsrvstat -table -interval=1 -count=10 -stat=sio_ttb,sio_tii
===Current Time===    sio_ttb    sio_tii
Fri Aug 23 08:29:46 2013    0    0
Fri Aug 23 08:29:47 2013    0    0
Fri Aug 23 08:29:48 2013    0    0
Fri Aug 23 08:29:49 2013    0    0
Fri Aug 23 08:29:50 2013    0    0
Fri Aug 23 08:29:51 2013    0    0
Fri Aug 23 08:29:52 2013    0    0
Fri Aug 23 08:29:53 2013    0    0
Fri Aug 23 08:29:54 2013    0    0
Fri Aug 23 08:29:55 2013    0    0
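Because tabular output carries the short metric names in its header row, a single column is easy to extract for plotting or spreadsheets. The following is a sketch only: the function name table_column_csv is hypothetical, and it assumes the layout of the output above, namely a header line starting with ===Current Time=== followed by the short names, and data rows made of a timestamp followed by one value per column. Columns are matched by their position counted from the right, so the number of words in the timestamp does not matter.

```bash
#!/bin/bash
# Hypothetical helper: convert "cellsrvstat -table" output (stdin) into
# "timestamp,value" CSV lines for the column whose short name is $1.
# Assumption: header = "===Current Time===" + short names; each row =
# timestamp words + one value per column, in header order.
table_column_csv() {
  awk -v col="$1" '
    BEGIN { off = -1; ncols = 0 }
    /^===Current Time===/ {
      ncols = NF - 2          # fields after "===Current" and "Time==="
      off = -1
      for (i = 3; i <= NF; i++) if ($i == col) off = NF - i
      next
    }
    off >= 0 && NF > ncols {
      ts = $1                 # timestamp = every field before the value columns
      for (i = 2; i <= NF - ncols; i++) ts = ts " " $i
      print ts "," $(NF - off)
    }'
}

# Usage on a cell:
#   cellsrvstat -table -interval=1 -count=10 -stat=sio_ttb,sio_tii | table_column_csv sio_ttb
```

Matching from the right rather than by absolute field number avoids having to know how many words the date occupies.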
Dropping -table prints the full, non-tabular output for every sample:
[root@dm01cel01 ~]# cellsrvstat -interval=1 -count=10 -stat=sio_ttb,sio_tii
===Current Time===    Fri Aug 23 08:30:25 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:26 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:27 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:28 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:29 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:30 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:31 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:32 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:33 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
===Current Time===    Fri Aug 23 08:30:34 2013
== SmartIO related stats ==
Total smart IO to be issued (KB)    0    0
Total smart IO in IO (KB)    0    0
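Non-tabular output repeats a ===Current Time=== block per sample, and per the help text each metric line ends with two numbers: the delta since the previous sample and the absolute value. A sketch that flattens one metric into timestamp,delta,absolute CSV lines could look like this. The function name stat_series_csv is hypothetical, and it assumes the timestamp sits on the same line as the ===Current Time=== marker and that the metric is selected by its descriptive name.

```bash
#!/bin/bash
# Hypothetical helper: from non-tabular cellsrvstat output (stdin), print
# "timestamp,delta,absolute" for every sample of the metric named $1.
# Assumptions: each sample starts with "===Current Time===  <date>", and each
# metric line is "<descriptive name> <delta> <absolute>".
stat_series_csv() {
  awk -v name="$1" '
    /^===Current Time===/ {
      ts = $3                 # date = everything after "===Current Time==="
      for (i = 4; i <= NF; i++) ts = ts " " $i
      next
    }
    NF >= 3 {
      n = $1                  # rebuild the name from all but the last two fields
      for (i = 2; i <= NF - 2; i++) n = n " " $i
      if (n == name) print ts "," $(NF - 1) "," $NF
    }'
}

# Usage on a cell:
#   cellsrvstat -interval=1 -count=10 -stat=sio_ttb | stat_series_csv "Total smart IO to be issued (KB)"
```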
In fact, OSWatcher on the cells invokes cellsrvstat by default, through the Exadata_cellsrvstat.sh collector:
[root@dm01cel01 ~]# ps -ef | grep osw
root      5219 17360  0 08:38 pts/0    00:00:00 grep osw
root     12914 23131  0 08:00 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_cellsrvstat.sh
root     31625 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_vmstat.sh
root     31626 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_mpstat.sh
root     31627 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_netstat.sh
root     31628 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_iostat.sh
root     31629 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_diskstats.sh
root     31633 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq ./Exadata_top.sh
root     31643 23131  0 04:02 ?        00:00:00 /bin/ksh ./oswsub.sh HighFreq /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh
root     31656 31643  0 04:02 ?        00:00:03 /bin/bash /opt/oracle.oswatcher/osw/ExadataRdsInfo.sh HighFreq
[root@slca04cel01 osw]# cat /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.sh
#!/bin/bash
# Copyright (c) 2009, 2011, Oracle and/or its affiliates. All rights reserved.
out_file=
zip_prog=
declare -i self_count=1
declare -i sample_interval=1
declare -i sample_duration=3
declare -i sample_count=1

/bin/touch /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock
echo $$ > /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock

while [ -e /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock ]; do
  if [ -f "archive/oswcellsrvstat/$1" ]; then
    if [ ! -z "$out_file" ] && [ ! -z "$zip_prog" ]; then
      $zip_prog $out_file &
    fi
    out_file=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 1`
    if [ $? -ne 0 ]; then
      /bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
      exit 1
    fi
    zip_prog=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 2`
    if [ $? -ne 0 ]; then
      /bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
      exit 1
    fi
    sample_interval=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 3`
    if [ $? -ne 0 ]; then
      /bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
      exit 1
    fi
    sample_duration=`/bin/cat archive/oswcellsrvstat/$1 | /bin/cut -d ' ' -f 4`
    if [ $? -ne 0 ]; then
      /bin/echo "[ERROR] archive/oswcellsrvstat/$1 not found or it is empty"
      exit 1
    fi
    /bin/rm -f "archive/oswcellsrvstat/$1"
  else
    break
  fi

  if [ ! -z "$out_file" ]; then
    if [ $sample_interval -gt 0 ] && [ $sample_duration -gt 0 ] && [ $sample_duration -gt $sample_interval ]; then
      sample_count=$((sample_duration / sample_interval))
      /bin/echo "zzz ***"`date`" Sample interval: $sample_interval secconds" >> ${out_file}
      $OSS_BIN/cellsrvstat -interval=$sample_interval -count=$sample_count >> ${out_file}
      bzip2 ${out_file}
      /bin/rm -f ${out_file}
    else
      /bin/echo "[ERROR] Invalid arguments for sample_duration and sample_interval"
      break
    fi
  fi
done

/bin/rm -f /opt/oracle.oswatcher/osw/Exadata_cellsrvstat.lock
exit 0
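Stripped of the lock file and archive handling, the core of that script is: check that interval and duration are positive and that duration is greater than interval, compute count = duration / interval with integer division, append a "zzz ***<date>" marker, then append one cellsrvstat -interval=... -count=... run to the output file, which the script then compresses with bzip2. A minimal standalone sketch of just that sampling step follows. The function name sample_cellsrvstat and the CELLSRVSTAT override variable are hypothetical (the real script calls $OSS_BIN/cellsrvstat), and compression is left out.

```bash
#!/bin/bash
# Hypothetical standalone version of the sampling step in Exadata_cellsrvstat.sh.
# Usage: sample_cellsrvstat <out_file> <interval_seconds> <duration_seconds>
# CELLSRVSTAT (hypothetical) overrides the binary; default is "cellsrvstat" from PATH.
sample_cellsrvstat() {
  local out_file=$1
  local -i interval=$2 duration=$3
  # Same sanity check as the OSWatcher script: both positive, duration > interval.
  if (( interval <= 0 || duration <= 0 || duration <= interval )); then
    echo "[ERROR] Invalid arguments for duration and interval" >&2
    return 1
  fi
  # Integer division, as the script computes sample_count.
  local -i count=$(( duration / interval ))
  # Marker line in the same spirit as the script's "zzz ***<date>" header.
  echo "zzz ***$(date) Sample interval: $interval seconds" >> "$out_file"
  "${CELLSRVSTAT:-cellsrvstat}" -interval="$interval" -count="$count" >> "$out_file"
}

# Example on a cell: 60 one-second samples appended to a file.
#   sample_cellsrvstat /tmp/cellsrvstat.out 1 60
```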