One of the main new features associated with Exadata systems is that I/O can be offloaded to storage servers rather than be done on the database server. Each of the storage servers may get a piece of the SQL statement to operate on, so the processing is also parallelized at the same time. This saves valuable database server processing cycles for other non-I/O related activities and can dramatically reduce response times. Smart Scan is another term that essentially means the same thing.
This feature is arguably the single most important part of the Exadata architecture and significantly reduces I/O bottlenecks. This is especially true in data warehouses and other large database applications, where significant amounts of data need to be moved from the disk subsystems into the database server. The offloaded I/O is also handled by storage servers that are designed for this purpose.
There are several primary Smart Scan optimizations in Exadata for SQL statement processing:
- Column Projection – only return the data for columns that are contained in the SELECT list or required for joins.
- Predicate Filtering – return only the rows of interest to the database server. Because predicate information is sent to the storage server, it can filter the result set before sending data back to the database server. For example, on a standard database server, a query like “select count(1) from table1” requires all of the table's rows to be read into the database server. In an offloading scenario, only the minimal data needed to produce the count is returned, which saves a large amount of processing time on the database server and puts much less strain on the buffer cache.
- Storage Indexes – in-memory structures on the storage cells that hold the minimum and maximum values for each MB of disk storage, which limits the physical I/O that must be done. This is essentially a filtering process. By reading through these memory structures, the storage cell can tell which disk regions may or may not contain the data being requested. Think of this as something akin to partitioning for a table in the database server. For example, for the query “select count(1) from table1 where col1 > 0”, the storage cell would use the storage indexes to determine which portions of the disk it needs to read to satisfy the “col1 > 0” criteria. Also, and probably more importantly, the storage cell knows which portions of the disk it DOES NOT need to read.
- Function Offloading – SQL functions can be broken up into two main categories, single-row and multi-row. Examples of single-row functions include SIN, COS, REPLACE, TRIM, TO_CHAR and TO_DATE. Most of these functions can be offloaded to the storage cells. Examples of multi-row functions include AVG, COUNT and SUM. None of the multi-row functions can be offloaded to the storage cells because they work on the entire result set, which no single storage cell has. The view V$SQLFN_METADATA shows, in its OFFLOADABLE column, which specific functions can and cannot be offloaded.
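To make the first three optimizations more concrete, here is a minimal Python sketch of how a storage cell might combine them for a query like “select id from table1 where col1 > 0”. It is a toy model only, not how Exadata is actually implemented: the `Region` and `StorageCell` classes, the `smart_scan` method and the sample rows are all hypothetical names and data invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A toy stand-in for one region of disk storage (hypothetical)."""
    rows: list  # each row is a dict of column name -> value

    def min_max(self, column):
        values = [row[column] for row in self.rows]
        return min(values), max(values)


class StorageCell:
    """Toy storage cell that keeps a min/max entry per region for one column."""

    def __init__(self, regions, indexed_column):
        self.regions = regions
        self.indexed_column = indexed_column
        # "Storage index": one (min, max) pair per region, held in memory.
        self.storage_index = [r.min_max(indexed_column) for r in regions]
        self.regions_read = 0

    def smart_scan(self, columns, lower_bound):
        """Return projected rows where indexed_column > lower_bound."""
        self.regions_read = 0
        result = []
        for region, (_, region_max) in zip(self.regions, self.storage_index):
            # Storage index: skip regions that cannot contain a match.
            if region_max <= lower_bound:
                continue
            self.regions_read += 1
            for row in region.rows:
                # Predicate filtering: drop rows that fail the predicate.
                if row[self.indexed_column] > lower_bound:
                    # Column projection: keep only the requested columns.
                    result.append({name: row[name] for name in columns})
        return result


cell = StorageCell(
    [
        Region([{"id": 1, "col1": -5, "note": "a"}, {"id": 2, "col1": -1, "note": "b"}]),
        Region([{"id": 3, "col1": 0, "note": "c"}, {"id": 4, "col1": 7, "note": "d"}]),
        Region([{"id": 5, "col1": 3, "note": "e"}, {"id": 6, "col1": 9, "note": "f"}]),
    ],
    indexed_column="col1",
)

rows = cell.smart_scan(columns=["id"], lower_bound=0)
print(rows, cell.regions_read)  # [{'id': 4}, {'id': 5}, {'id': 6}] 2
```

In this sketch the first region is never read because its maximum `col1` value is not greater than 0, and only the `id` column of the three matching rows is handed back to the caller, which plays the role of the database server.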
There are other features that help with offloading, but the ones listed above are used by almost all database types. The other features include join filtering with Bloom filters, Hybrid Columnar Compression (HCC), encryption and decryption, and virtual columns.