Tuesday, 2 March 2010

Oracle Text

Oracle LIKE clause searches with text indexes


One serious SQL performance problem occurs when you use the SQL "LIKE clause" operator to find a string within a large Oracle table column (e.g. VARCHAR(2000), CLOB, BLOB):


Select stuff from bigtab where text_column like '%ipod%';
Select stuff from bigtab where full_name like '%JONES';


Because standard Oracle cannot index into a large column, there "like" queries cause full-table scans, and Oracle must examine every row in the table, even when the result set is very small. These unnecessary full-table scans are a problem:


  1. Large-table full-table scans increase the load on the disk I/O sub-system

  2. Small table full table scans (in the data buffer) cause high consistent gets and drive-up CPU consumption

For details, see Oracle SQL "like clause" and index access.

Oracle*Text Indexes


The Oracle*Text utility (formally called Oracle ConText and Oracle Intermedia) allows us to parse through a large text column and index on the words within the column.

Unlike ordinary b-tree or bitmap indexes, Oracle context, ctxcat and ctxrule indexes can be set not to update as content is changed. Since most standard Oracle databases will use the ctxcat index with standard relational tables, you must decide on a refresh interval. Oracle provides the SYNC operator for this. The default is SY^NC=MANUAL and you must manually synchronize the index with CTX_DDL.SYNC_INDEX.

SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)]

Hence, Oracle Text indexes are only useful for removing full-table scans when the tables are largely read-only and/or the end-users don't mind not having 100% search recall:

· The target table is relatively static (e.g. nightly batch updates)

· Your end-users would not mind "missing" the latest row data

Oracle Text works with traditional data columns and also with XML, MS-Word docs and Adobe PDF files that are stored within Oracle. Oracle Text has several index types:

CTXCAT Indexes - A CTXCAT index is best for smaller text fragments that must be indexed along with other standard relational data (VARCHAR2).

WHERE CATSEARCH(text_column, 'ipod')> 0;

CONTEXT Indexes - The CONTEXT index type is used to index large amounts of text such as Word, PDF, XML, HTML or plain text documents.

WHERE CONTAINS(test_column, 'ipod', 1) > 0

CTXRULE Indexes - A CTXRULE index can be used to build document classification applications.
These types of indexes allow you to replace the old-fashioned SQL "LIKE" syntax with "CONTAINS" or "CATSEARCH" SQL syntax:

When we execute the query with the new index we see that the full-table scan is replaced with a index scan, greatly reducing execution speed and improving hardware stress:

Execution Plan

0 SELECT STATEMENT Optimizer=FIRST_ROWS

1 1 0 SORT (ORDER BY)

2 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BIGTAB'

3 3 2 DOMAIN INDEX OF 'TEXT-COLUMN_IDX'

Index re-synchronization

Because rebuilding an Oracle Text index (context, ctxcat, ctxrule) requires a full-table scan and lots of internal parsing, it is not practical to use triggers for instantaneous index updates.

Updating Oracle Text indexes is easy and they can be schedules using dbms_job or the Oracle 10g dbms_scheduler utility package: Oracle text provides a CTX_DDL package with the sync_index and optimize_index procedures:

SQL> EXEC CTX_DDL.SYNC_INDEX('text_column_idx');
SQL> EXEC CTX_DDL.OPTIMIZE_INDEX('text_column_idx','FULL');

For example, if you create a nightly dbms_scheduler job to call sync_index, your index will be refreshed, but the structure will become sub-optimal over time. Oracle recommends that you periodically use the optimize_index package to periodically re-build the whole index from scratch. Index optimization can be performed in three modes (FAST, FULL or TOKEN).

In sum, the Oracle Text indexes are great for removing unnecessary full-table scans from static Oracle tables and they can reduce I/O by several orders of magnitude, greatly improving overall SQL performance.


OK, I clearly had to try something else, so I decided to trace 1 session and go through the TKPROF output. I found that the app related SQL statements all executed in less than 5 second, but one statement stood out at 60 seconds elapsed time. That SQL statement was;

begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname); end;

Schema ctxsys means Oracle Text is in use and the Vendor then confirmed that Oracle Text indexes where build on an app related audit table almost 2 months ago. The TKPROF also showed INSERTs into the app audit table, so we were finally on the right track.

This information was uploaded in the SR and I had to check the SYNC intervals via a select from dr$index and confirmed that the 2 audit indexes had SYNC ON COMMIT. We changed the COMMIT interval to MANUAL , just to proof if this problem is related to the SYNC ON COMMIT. Once we changed the commit interval to MANUAL, the wait event disappeared.

exec CTX_DDL.REPLACE_INDEX_METADATA('XX','REPLACE METADATA SYNC (MANUAL)');

The SR engineer then supplied the following script to change the SYNC to every 5 minutes.

exec CTX_DDL.REPLACE_INDEX_METADATA('XX','REPLACE METADATA SYNC (every "SYSDATE+5/1440")');

So the “moral of the story” is that as a DBA you have more than one tool to solve a problem and you should use as many of them as possible until the root cause of the problem is clear.


1 comment:

Unknown said...

Hi Hemesh,

I tried changing synchronization method for a CTXCAT index and it worked correctly. When I asked Oracle for confirmation that it was a supported method they said the change was not possible. Can you please share your SR number so that I could refer Oracle engineer to it?

Thanks,

Pawel