Monday 28 June 2010

RMAN effective use



Restore Preview

The PREVIEW option of the RESTORE command allows you to identify the backups required to complete a specific restore operation. The output generated by the command is in the same format as that of the LIST command. In addition, the PREVIEW SUMMARY option can be used to produce a summary report in the same format as the LIST SUMMARY command. The following examples show how these commands are used:
# Preview 
RESTORE DATABASE PREVIEW; 
RESTORE TABLESPACE users PREVIEW;
# Preview Summary 
RESTORE DATABASE PREVIEW SUMMARY;

RESTORE DATABASE UNTIL TIME 'sysdate-2' PREVIEW SUMMARY;
RESTORE ARCHIVELOG UNTIL TIME 'sysdate-3' PREVIEW SUMMARY;
RESTORE TABLESPACE users PREVIEW SUMMARY;

# Bonus statement: point-in-time preview
RESTORE DATABASE UNTIL TIME "to_date('26-01-2012 18:00:00','dd-mm-yyyy hh24:mi:ss')" PREVIEW SUMMARY;


Oracle 11g Release 2 – RMAN compression comparison

http://blog.ronnyegner-consulting.de/2009/11/04/oracle-11g-release-2-rman-compression-comparison/

Oracle Data Pump uses direct path load

Does Oracle Data Pump use direct path load?


 

Yes. This is one of the features that makes expdp and impdp faster than conventional export and import. To use direct path loading through Oracle Data Pump, certain conditions must be met. Alternatively, Data Pump can use the external table method, by which the data is unloaded to flat files on the file system of the database server; afterwards those flat files can be used as a simple data source in a SELECT statement.


 

EXPDP will use DIRECT_PATH mode if:



The structure of a table allows a Direct Path unload, i.e.: 

     - The table does not have fine-grained access control enabled for SELECT. 

     - The table is not a queue table. 

     - The table does not contain one or more columns of type BFILE or opaque, or an object type containing opaque columns. 

     - The table does not contain encrypted columns. 

     - The table does not contain a column of an evolved type that needs upgrading. 

     - If the table has a column of datatype LONG or LONG RAW, then this column is the last column. 



The parameters QUERY, SAMPLE, and REMAP_DATA were not used for the specified table in the Export Data Pump job. 



The table or partition is relatively small (up to 250 MB), or the table or partition is larger but the job cannot run in parallel because the parameter PARALLEL was not specified (or was set to 1).
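
As a hedged illustration, an export that satisfies the conditions above (directory, dumpfile and table names are hypothetical); no QUERY, SAMPLE or REMAP_DATA is used and PARALLEL is left at its default of 1, so Data Pump is free to choose direct path:

%expdp system/manager DIRECTORY=my_dir DUMPFILE=expdp_emp.dmp \
LOGFILE=expdp_emp.log TABLES=scott.emp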


 

IMPDP will use DIRECT_PATH if:



The structure of a table allows a Direct Path load, i.e.: 

     - A global index does not exist on a multipartition table during a single-partition load. This includes object tables that are partitioned. 

     - A domain index does not exist for a LOB column. 

     - The table is not in a cluster. 

     - The table does not have BFILE columns or columns of opaque types. 

     - The table does not have VARRAY columns with an embedded opaque type. 

     - The table does not have encrypted columns. 

     - Supplemental logging is not enabled and the table does not have a LOB column. 

     - The table into which data is being imported is a pre-existing table and: 

        – There is not an active trigger, and: 

        – The table is partitioned and has an index, and: 

        – Fine-grained access control for INSERT mode is not enabled, and: 

        – A constraint other than table check does not exist, and: 

        – A unique index does not exist. 



The parameters QUERY and REMAP_DATA were not used for the specified table in the Import Data Pump job. 



The table or partition is relatively small (up to 250 MB), or the table or partition is larger but the job cannot run in parallel because the parameter PARALLEL was not specified (or was set to 1).
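
Similarly, a hedged sketch of an import that leaves Data Pump free to choose direct path (names are hypothetical); the pre-existing target table is assumed to have no active triggers, no unique index and no enabled constraints other than check constraints:

%impdp system/manager DIRECTORY=my_dir DUMPFILE=expdp_emp.dmp \
LOGFILE=impdp_emp.log TABLES=scott.emp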


 

How to enforce a specific load/unload method?


 

In very specific situations, the undocumented parameter ACCESS_METHOD can be used to enforce a specific method to unload or load the data. Example:


 

%expdp system/manager ... ACCESS_METHOD=DIRECT_PATH  

%expdp system/manager ... ACCESS_METHOD=EXTERNAL_TABLE  



or: 



%impdp system/manager ... ACCESS_METHOD=DIRECT_PATH  

%impdp system/manager ... ACCESS_METHOD=EXTERNAL_TABLE 


 

Important points to note when the parameter ACCESS_METHOD is specified for a job:


 


 

  • The parameter ACCESS_METHOD is an undocumented parameter and should only be used when requested by Oracle Support.


     

  • If the parameter is not specified, then Data Pump will automatically choose the best method to load or unload the data.


     

  • If import Data Pump cannot choose due to conflicting restrictions, an error will be reported:

    ORA-31696: unable to export/import TABLE_DATA:"SCOTT"."EMP" using client specified AUTOMATIC method


     

  • The parameter can only be specified when the Data Pump job is initially started (i.e. the parameter cannot be specified when the job is restarted).


     

  • If the parameter is specified, the method of loading or unloading the data is enforced on all tables that need to be loaded or unloaded with the job.


     

  • Enforcing a specific method may result in slower overall performance of the Data Pump job, or in errors such as:


     


 

... 

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA 

ORA-31696: unable to export/import TABLE_DATA:"SCOTT"."MY_TAB" using client specified DIRECT_PATH method 

...


 


 

  • To determine which access method is used, a Worker trace file can be created, e.g.:


     


 

%expdp system/manager DIRECTORY=my_dir \ 

DUMPFILE=expdp_s.dmp LOGFILE=expdp_s.log \ 

TABLES=scott.my_tab TRACE=400300


 

The Worker trace file shows the method with which the data was loaded (or unloaded for Import Data Pump):


 

... 

KUPW:14:57:14.289: 1: object: TABLE_DATA:"SCOTT"."MY_TAB" 

KUPW:14:57:14.289: 1: TABLE_DATA:"SCOTT"."MY_TAB" external table, parallel: 1 

...


 

EXPDP will use EXTERNAL_TABLE mode if:



Data cannot be unloaded in Direct Path mode, because of the structure of the table, i.e.: 

     - Fine-grained access control for SELECT is enabled for the table. 

     - The table is a queue table. 

     - The table contains one or more columns of type BFILE or opaque, or an object type containing opaque columns. 

     - The table contains encrypted columns. 

     - The table contains a column of an evolved type that needs upgrading. 

     - The table contains a column of type LONG or LONG RAW that is not last. 



Data could also have been unloaded in "Direct Path" mode, but the parameters QUERY, SAMPLE, or REMAP_DATA were used for the specified table in the Export Data Pump job. 



Data could also have been unloaded in "Direct Path" mode, but the table or partition is relatively large (> 250 MB) and parallel SQL can be used to speed up the unload even more.
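
For illustration, specifying QUERY forces the external table path even for a small table. A hedged sketch using a parameter file (the file, directory, table and predicate are hypothetical):

# Contents of parameter file emp_q.par:
DIRECTORY=my_dir
DUMPFILE=expdp_emp_q.dmp
TABLES=scott.emp
QUERY=scott.emp:"WHERE deptno = 10"

%expdp system/manager PARFILE=emp_q.par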


 

IMPDP will use EXTERNAL_TABLE if:



Data cannot be loaded in Direct Path mode, because at least one of the following conditions exists: 

     - A global index on multipartition tables exists during a single-partition load. This includes object tables that are partitioned. 

     - A domain index exists for a LOB column. 

     - A table is in a cluster. 

     - A table has BFILE columns or columns of opaque types. 

     - A table has VARRAY columns with an embedded opaque type. 

     - The table has encrypted columns. 

     - Supplemental logging is enabled and the table has at least one LOB column. 

     - The table into which data is being imported is a pre-existing table and at least one of the following conditions exists: 

        – There is an active trigger 

        – The table is partitioned and does not have any indexes 

        – Fine-grained access control for INSERT mode is enabled for the table. 

        – An enabled constraint exists (other than table check constraints) 

        – A unique index exists 



Data could also have been loaded in "Direct Path" mode, but the parameters QUERY or REMAP_DATA were used for the specified table in the Import Data Pump job. 



Data could also have been loaded in "Direct Path" mode, but the table or partition is relatively large (> 250 MB) and parallel SQL can be used to speed up the load even more.

Thursday 24 June 2010

Automatic PGA Memory Management [ID 223730.1]



RELATED DOCUMENTS

-----------------

http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams157.htm

http://www.comp.dit.ie/btierney/oracle11gdoc/server.111/b28320/initparams177.htm


http://www.cse.ust.hk/vldb2002/VLDB2002-proceedings/papers/S29P03.pdf



- Note: 147806.1 Oracle9i New Feature Automated SQL Execution Memory Management

- Note: 148346.1 Oracle9i Monitoring Automated SQL Execution Memory Management

- Note: 175216.1 Oracle9i Release Notes Release 1 (9.0.1) for Alpha OpenVMS

- Note: 43507.1 ALERT HP-UX Patch Levels Advised

- Oracle 9i Database Performance tuning Guide and reference, Chapter 14

- Note:443746.1 Automatic Memory Management(AMM) on 11g

Known RMAN Performance Problems [ID 247611.1]

PURPOSE

-------


 

Give an overview of known performance problems with RMAN Backups

and Restores


 


 

SCOPE & APPLICATION

-------------------


 

This document is intended for DBAs who are facing performance issues

during backup or restore via RMAN.


 

Its intention is to give an overview of the known performance

issues with RMAN and not to give any diagnosis. The diagnosis and

investigation are handled in more depth in other documents, as

referenced at the bottom of this document.


 


 

KNOWN RMAN PERFORMANCE ISSUES

-----------------------------


 

Note :

Generic Recommendation for Oracle 10G

Generate statistics on the fixed objects, using

DBMS_STATS.GATHER_FIXED_OBJECTS_STATS.


 

This should be done when the database has been running

for a while with a typical workload, so that the information

in the fixed objects reflects a reasonable state of the database load.


 

Gather the statistics with :

SQL> exec dbms_stats.gather_fixed_objects_stats


 

Alert Note 463227.1 Oracle10g RMAN Recovery Catalog Known Performance Issues


 

Bug 6051693 SLOW RESYNC AND SHOW ALL IS TAKING AROUND 2 MINUTES

Duplicate of Bug 5620640

Note 820856.1 High Waits On "Control File Sequential Read" During Rman Backup

Version affected: 10.2.0.1 to 10.2.0.3

Version fixed : some one-off backports for 10.2.0.3 , fixed in 10.2.0.4

Affects : Target database

Workaround : none


 


 

Unpublished Bug 8239313 Duplicate database for a very large database of around 3000 datafiles is very slow.

Version affected: 10.2.0.3 and higher

Version fixed : 11.2

Affects : Target database

Workaround : none


 


 

Bug 5251842 Poor plan against V$DATAFILE

Version affected: 10.2.0.3 and higher

Version fixed : 11.1.0.7 and 11.2, and some one-offs for 10.2.0.3 / 10.2.0.4

Affects : Target database

Workaround : none


 

Bug 7023147: V$LOG_HISTORY queries can be slow

Version affected: 10.2.0.2 and higher

Version fixed : 10.2.0.5 and 11.2

Affects : Target database

Workaround : Recreate the controlfile

Note : 882154.1 RMAN LIST BACKUP command takes a long time querying

v$log_history
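
A hedged sketch of the usual first step when recreating the controlfile (the output path is hypothetical):

SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS '/tmp/recreate_ctl.sql';

Edit the generated script as needed, then run it to recreate the controlfile.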


 


 

Bug 7595777 RMAN TAKES A LONG TIME BEFORE STARTING THE BACKUP

Version affected: 10.2.0.3 and higher

Version fixed : Unknown yet (FEBR-2009)

Affects : Target database

Workaround : Replace the wasresynced function in ?/rdbms/admin/recover.bsq with the following:

---------------------------------
function wasresynced(until_stamp IN number
                    ,high_stamp  IN number) return number is
  nodups      number;   -- number of duplicates
  high        number;
  low         number;
  resyncstamp number;
begin
  high := high_stamp;
  low  := until_stamp;
  nodups := 0;
  resyncstamp := 0;
  deb('resync', 'wasresynced high_stamp=' || high_stamp ||
      ' high_date=' || stamp2date(high_stamp), dbtype);

  for duprec in duprec_c(low, high) loop
    if (dbms_rcvcat.isDuplicateRecord(recid => duprec.recid
                                     ,stamp => duprec.stamp
                                     ,type  => duprec.type)) then
      if (resyncstamp = 0) then
        resyncstamp := duprec.stamp;
      end if;

      nodups := nodups + 1;
      if (nodups >= maxdups) then
        deb('resync', 'wasresynced resyncstamp=' || resyncstamp ||
            ' resyncdate=' || stamp2date(resyncstamp), dbtype);
        return resyncstamp;
      end if;
    else -- couldn't find 16 consecutive duplicate records.
      deb('resync', 'wasresynced could not find record recid=' ||
          duprec.recid || ' stamp=' || duprec.stamp || ' type=' ||
          duprec.type || ' maxdups=' || nodups, dbtype);
      return 0;
    end if;
  end loop;

  -- Timestamp range not enough to satisfy the number of duplicates.
  -- Retry using a higher timestamp.
  deb('resync', 'timestamp range not enough - nodups=' || nodups, dbtype);
  return -1;
end;
-------------------------------------


 


 

Bug 7206538 PERFORMANCE ISSUE WITH RMAN BACKUP

Bug 6412947 DRA PERFORMANCE ISSUES

Version affected: 11.1.0.6

Version fixed : 11.1.0.7

Affects : Target database

Workaround : set parameter "_dra_enable_offline_dictionary"=false

Reference : Note 605557.1 RMAN BACKUPS CAN TAKE A LONG TIME IN 11.1.0.6


 


 

Bug 7173341 - CLEANUPRSR IS TAKING VERY LONG TIME CAUSING SLOW RESYNC

Version affected: 10.2.0.4

Version fixed : 11.2

Affects : Catalog Schema

Workaround : Change the 2 following SQL-statements in the recover.bsq


 

Replace the 'SELECT max(rsr_key)' statement in cleanupROUT:

SELECT max(rsr_key) into high_session_key
  FROM rsr, dbinc
 WHERE dbinc.db_key = this_db_key
   AND rsr.dbinc_key = dbinc.dbinc_key
   AND rsr.rsr_stamp < high_stamp;

Replace the 'DELETE FROM rsr' statement in cleanupRSR:

DELETE FROM rsr
 WHERE rsr_end < nowTime-60
   AND rsr.dbinc_key IN
       (select dbinc_key from dbinc
         where dbinc.db_key = this_db_key);


 

    Upgrade the catalog to pick up the change in recover.bsq file.


 

Bug 6476935 - RMAN Resyncs may take a long time to complete

Version affected : 10.2.0.3

Version fixed : 11.2

Affects : Catalog Schema

Workaround : create index rsr_i_stamp on rsr(rsr_sstamp, rsr_srecid);


 


 

Note 433335.1 - RMAN Restore Of Incremental Backups Will Not Parallelise

and Uses Only One Channel

Version affected : 10.1.0.0 to 10.2.0.4

Versions fixed : Expected behaviour - media managers can influence how

backups are done at 10g

Affects : Target database

Workaround : SET PARALLELMEDIARESTORE OFF;

See Note 433335.1


 

Note 413098.1 - Extremely Poor RMAN Backup Performance to NFS After Upgrade

to 10.2 on Solaris

Version affected : 10.2.0.0 to 10.2.0.4

Version fixed : 10.2.0.3 for NON-RAC applications only:

Affects : Target database

Diagnosis : 10.2 forces the use of directio when writing to NFS in a

RAC implementation. This is done regardless of whether or

not you use the forcedirectio NFS mount option, and

results in very poor performance.

Workaround : None - do not write backups to NFS at 10.2 when using RAC

See Note 413098.1


 

Bug 5219484 - CATALOG RESYNCS ARE VERY SLOW - ROUT TABLE HAS 6 MILLION ROWS +

Version affected : 10.1

Version fixed : 10.2.0.4

Affects : Catalog Schema

Diagnosis : Many rows in ROUT-table.

Workaround : See Note 378234.1 Rman Catalog Resync Operation is Very slow at 10G


 


 

Bug 5047758 - RMAN SLOW PERFORMANCE ON EXIT (UPDATING STATUS STATISTICS)

Version affected : 10.1

Version fixed : 10.2 due to base bug 3325292 OPTIMIZER NOT ABLE TO USE INDEXES IN QUERY USING V$SQL_PLAN

Affects : Target database

Diagnosis : Expensive statement =

select round(sum(MBYTES_PROCESSED)), round(sum(INPUT_BYTES)),
       round(sum(OUTPUT_BYTES))
  from V$RMAN_STATUS
 start with (RECID=:b1 and STAMP=:b2)
 connect by prior RECID=parent_recid

Workaround : RMAN> sql "alter session set optimizer_mode=RULE";

Remark : It might be that bug 5247609 is a followup / new occurence of this bug


 

Bug 5247609 RMAN SLOW PERFORMANCE DURING REGISTER DATABASE/OPEN RESETLOGS

Version affected : 10.2

Version fixed : 10.4

Affects : Target database

Diagnosis : Expensive statement =

select round(sum(MBYTES_PROCESSED)), round(sum(INPUT_BYTES)),
       round(sum(OUTPUT_BYTES))
  from V$RMAN_STATUS
 start with (RECID=:b1 and STAMP=:b2)
 connect by prior RECID=parent_recid

Workaround : RMAN> sql "alter session set optimizer_mode=RULE";


 

Bug 1551773 RMAN RESTORE RUNS VERY SLOWLY WHEN THERE ARE MANY ROWS IN THE CKP TABLE

Version affected : 8.1

Version fixed : 9i

Affects : Catalog Schema

Diagnosis : The table CKP in the RMAN-catalog has many rows.

Workaround : Call Oracle Support to supply a cleanup-script.

Reference : Note.209780.1 RMAN Hangs During Restore if CKP Table is Excessively Large


 

Bug 2803823 RESYNC CATALOG AND MAINTENANCE COMMANDS ARE SLOW

Version affected : 9.2

Version fixed : 9205, 10g

Affects : Catalog Schema

Diagnosis : Bad execution plan for SQL-statement : DELETE FROM CKP

Reference : Note 248361.1 RMAN Command 'Resync Catalog' is Very Slow


 

Bug 4013855 - RMAN CROSSCHECK IS TAKING A LONG TIME TO COMPLETE

Version affected : 9i

Version fixed : 9207, 10.1.0.5, 10gRelease2

Affects : Catalog Schema

Diagnosis : Large number of backupsets.

LIST/CROSSCHECK commands (in 9iR2 and above) take more time.

REPORT/DELETE OBSOLETE (in 9iR2) takes more time.


 

Note 339964.1 RMAN 8.1.7 Crosscheck Command Is Very Slow

Version affected : 8i

Version fixed : No fix available.

Affects : Catalog Schema

Diagnosis : See this note for details on how to confirm if you have hit this problem.

Workaround : RMAN> sql "alter session set optimizer_mode=RULE";

Cleanup backup history - see this note for hints on how to do this at 8i.


 

Bug 4548861 RMAN RESTORE ARCHIVELOG UNTIL SEQUENCE TAKES A LONG TIME TO COMPLETE

Version affected : 9i and 10G

Version fixed : 10.2.0.2, 11G

Affects : RMAN executable, Target database, Catalog Schema

Diagnosis : Restore archivelog takes long compilation time either during

RESTORE command or during RECOVER command.

Workaround : None


 

Note 342999.1 First Resync on Production Host After Running RMAN on

Backup Server Takes a Long Time

Version affected : 8i

Version fixed : No fix available.

Affects : RMAN executable, Target database, Catalog Schema

Diagnosis : See this note for details on how to confirm if you have hit this problem.

Workaround : Recreate the controlfile.


 

Bug 2876717 RMAN backup starts SLOW if READ-ONLY device is used

Version affected : 9.2

Version fixed : 9204, 10g

Affects : Target database

Diagnosis : READ_ONLY devices, like CD-rom, are used.


 

Bug 1887868 RMAN RESYNC takes a long time with many tablespaces

Version affected : 8.1

Version fixed : 9201, 10g

Affects : Catalog Schema

Diagnosis : RMAN full resync (either implicit resync, or explicit

resync requested by the RESYNC CATALOG command) takes a

very long time when there are a lot of tablespaces

(probably 500 or more) in the database.


 

Internal Bug 1712720 RMAN MAINTENANCE COMMANDS ARE VERY SLOW

Version affected : 8.1, 9.0.1

Version fixed : 9201

Affects : Catalog Schema

Workaround : create index brl_i_dts on brl(dbinc_key, thread#, sequence#)


 

Bug 2174697 RMAN incremental backup performance degrades over time in NOCATALOG mode

Version affected : 8.1 9.0.1

Version fixed : 8174 9014 9201

Affects : Target database

Diagnosis : Many entries in X$KSFQP

Workaround : SQL> alter system set events 'immediate trace name ksfqp_limit level 1';


 

Bug 2385857 RMAN backup performance degrades over time for large DB_FILES

Version affected : 8.1 9.0.1

Version fixed : 9202 10g

Affects : Target database

Diagnosis : Contention of ksfqpl latch

Workaround : SQL> alter system set events 'immediate trace name ksfqp_limit level 1';

or in the init.ora :

event = "logon trace name ksfqp_limit level 1"


 

Bug 2484250 RMAN CUMULATIVE BACKUP / MAINTENANCE may appear to hang (spin)

Version affected : 8.1 9.0.1 9.2

Version fixed : 9203 10g

Affects : Catalog Schema

Workaround : Run dbms_utility.analyze_schema() on the RMAN catalog schema, as sketched below.
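
A minimal sketch of the call (the catalog owner RMAN is a hypothetical user name):

SQL> exec dbms_utility.analyze_schema('RMAN','COMPUTE');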


 

Bug 2610596 Huge CPU usage from RMAN server when TAPE not mounted with BACKUP_TAPE_IO_SLAVE=TRUE

Version affected : 8.1 9.0.1 9.2

Version fixed : 9203 10g

Affects : Target database

Workaround : Insert a tape in the tape drive, or

Set init.ora param BACKUP_TAPE_IO_SLAVES to FALSE


 

Bug 2968394 RESYNC TAKES A LONG TIME TO COMPLETE : LARGE BP TABLE

Version affected : 8.1 9i

Version fixed : 10g

Affects : Catalog Schema

Diagnosis : Many rows in tables BS and BP

Workaround : Create 2 additional indexes in the RMAN-catalog schema:

SQL> create index bs_i_1 on bs (db_key, bs_recid, bs_stamp);
SQL> create index db_i_2 on bp (db_key, bp_recid, bp_stamp);


 

Bug 1407413 NO INDEX ON RI CONSTRAINT BETWEEN BS AND BDF

Version affected : 8.1

Version fixed : 9i

Affects : Catalog Schema

Diagnosis : Many rows in table BDF

Workaround : Create 1 additional index in the RMAN-catalog schema :

SQL> create index bdf_i_bs_key on bdf(bs_key);


 

Bug 4110489 RMAN SKIP READ ONLY TABLESPACE CLAUSE TAKING TOO MUCH TIME TO SKIP

Version affected : 9i

Version fixed : 10G Release2

Affects : RMAN Executable

Diagnosis :

1) The SKIP READ ONLY or SKIP OFFLINE option takes too much time

when the NOT BACKED UP option is used in catalog mode.

2) The SKIP READ ONLY or SKIP OFFLINE option takes too much time when

BACKUP OPTIMIZATION is turned ON in catalog mode.

Workaround : Run the backup in NOCATALOG mode and

perform a RESYNC CATALOG afterwards.


 

Bug 3966722 CONCURRENT RMAN BACKUPS WAIT ON TM ENQUEUE ON CKP TABLE

Version affected : 9i, 10G

Version fixed : 9207, 10105, 10G Release2

Affects : Catalog Schema

Diagnosis :

Multiple resyncs are done in parallel for same/different databases.

There is no error signaled, but the other backup job will not continue

until the resync in first job is completed.

Workaround : Create indexes in the recovery catalog:

For 9i and 10G:

SQL> CREATE INDEX dfatt_i_sck on dfatt(start_ckp_key);
SQL> CREATE INDEX dfatt_i_eck on dfatt(end_ckp_key);
SQL> CREATE INDEX tsatt_i_sck on tsatt(start_ckp_key);
SQL> CREATE INDEX tsatt_i_eck on tsatt(end_ckp_key);
SQL> CREATE INDEX ckp_i_dbinc on ckp(dbinc_key);

For 10G Release 1 and 2:

SQL> CREATE INDEX rsr_i_dbinc on rsr(dbinc_key);

For 10G Release 2:

SQL> CREATE INDEX rout_i_db on rout(db_key);
SQL> CREATE INDEX rout_i_rsr on rout(rsr_key);


 

Generic Performance Issues which affect RMAN :

----------------------------------------------

Bug 2710321 X$KCCFN / V$DATAFILE much slower in 9i than 8i with many datafiles

Version affected : 9

Version fixed : 9203

Affects : Target database

Diagnosis : Many datafiles


 

RELATED DOCUMENTS

-----------------


 

Note 145624.1 : RMAN: Resolving an RMAN Hung Job

Note 228838.1 : RMAN Diagnostic Checklist

Note 605557.1 : RMAN BACKUPS CAN TAKE A LONG TIME IN 11.1.0.6

Friday 11 June 2010

How to stop running RMAN jobs in OEM Grid Control

Life became very easy after Oracle's invention of OEM Grid Control. At least, that is what Oracle promised us when they invented it. A couple of months ago one of my colleagues asked me to schedule backup jobs. In the past I wrote very nice OS scripts in order to make a backup, but now that OEM Grid Control has been available for quite some time, I thought: let's try making backups using OEM Grid Control.

 
 

And yes, it works fine. Grid Control generates quite interesting RMAN scripts, which you can schedule. At a glance you can see all your backup jobs and their status in the job activity list, and whether they have run successfully. For script kiddies OEM Grid Control is bad news, because it writes the scripts for you. But if you like being a wizard kiddy, you feel like you are in heaven.

 
 

But after a while, life with OEM Grid Control turned out to be not so nice. After some proper backup runs, I found a backup job which remained in status running. The next day a new backup job for the same database was automatically started, but this job immediately got the status "1 problems". It turned out that the job was not started because Grid Control says: "An execution in one of the previous runs of the job was still running."

 
 

So I thought: let's stop the running job. So I did. Then Grid Control told me: "The job execution was stopped successfully. Currently running steps will not be stopped." So I thought: life is easy again. But it turned out that this was not true. The running job got the status "Stop Pending" and remained in that status.

 
 

So I thought: let's kill the running step. So I did. But then Grid Control said: "The step was not killed because it has already completed." And the job remained in status running.

 
 

So I thought: let's delete the job (what else should you do if OEM Grid Control refuses to listen?). So I did, but then OEM Grid Control said: "The specified job, job run or execution is still active. It must finish running, or be stopped before it can be deleted. Filter on status 'Active' to see active executions."

 
 

Doom scenarios such as "I will never be able to make a backup of this database again" came to my mind. But finally Oracle Support sent me this script:

DECLARE
  jguid RAW(16);
BEGIN
  SELECT job_id
    INTO jguid
    FROM mgmt_job
   WHERE job_name  = '<name of your job>'
     AND job_owner = '<owner of your job>';
  mgmt_job_engine.stop_all_executions_with_id(jguid, TRUE);
  COMMIT;
END;
/

You have to run this script as user sysman on the OEM Grid Control repository database. You can find the name and owner of the job in the job activity list. Using this script I was able to solve my problem: the running job was deleted.
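
If you prefer to look up the job name and owner in the repository itself, a hedged sketch (the LIKE pattern is hypothetical):

SELECT job_name, job_owner FROM mgmt_job WHERE job_name LIKE 'BACKUP%';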

However, I had to schedule a new backup job for this database again, because this script deletes all runs of the job, including its next scheduled occurrences.

So if you ever run into a similar kind of problem, you can solve it by running this script and scheduling a new job again.

Now life is easy again using OEM Grid Control, but not as easy as they promised us at first.

It is not clear to me why a job sometimes remains in status running.

Thursday 3 June 2010

ORA-04030


 

Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.5.0 to 10.2.0.4 - Release: 8.1.5 to 10.2
Information in this document applies to any platform.

Purpose

This article is intended to give an overview of common bugs and frequently asked questions relating to the ORA-4030 error.

Common Bugs

Notes:
Backport possibilities are listed only to indicate that a technical backport or patchset exception may be possible.
Actual availability of backports/PSEs is subject to backport policies as per
Note 209768.1 Database, FMW, and OCS Software Error Correction Support Policy

Each bug below is listed with the versions in which it was reported, the version(s) in which it is fixed, backport notes, and details.

Bug 3130972
Reported in : Versions < 10.1.0.2
Fixed in : 9.2.0.6 / 10.1.0.2.0
Backports : possible
Details : The realfree allocator on Unix systems imposes a cap of 1Gb of memory per process. This fix relaxes that limit, as in some cases it is desirable to allow a process to use over 1Gb of private memory. If the limit is reached, an ORA-4030 occurs even though the system may have plenty of free memory available and can map that memory into the process address space. (The realfree allocator is used on Unix systems where PGA_AGGREGATE_TARGET is in use.)
Workaround : Configure the system to allow less than 1Gb of memory per process to avoid ORA-4030 errors.

Bug 3565920
Reported in : Versions < 10.1
Fixed in : 9.2.0.8 and higher
Backports : no longer possible
Details : If the shared pool is resized, then subsequent queries against views based on X$KSMSP, such as V$SHARED_POOL_RESERVED, result in a memory leak in the cursor work heap which then fails with ORA-4030.

Bug 4475206
Reported in : Versions < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : The PGA memory consumed becomes significantly higher than the value of the parameter pga_aggregate_target if a query has a very long chain of hash-join operations. This chain must be right-deep, i.e. the build is performed on a base table and the probe is produced by a hash-join sub-tree. This is not really a memory leak but excess memory use compared to what should be used.

Bug 4625938
Reported in : Versions < 10.2
Fixed in : 10.2
Backports : 10.1.x
Details : A memory leak involving 'peihstdep' and 'PEIDEF' can occur when TABLE functions are used and JDBC connection pooling is enabled.
Workaround : Disable connection pooling.

Bug 5118748
Reported in : Versions < 10.2.0.3
Fixed in : 10.2.0.3 / 11.1.0.6
Backports : possible
Details : ORA-4030 or a memory leak can occur when using a lot of collections in PL/SQL. Heapdumps of the "koh-kghu call" heap include multiple chunks of type "pmucp2upkl korfp

Bug 5220562
Reported in : Versions < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : An insert-select DML session's process size may increase / ORA-4030 (with sort sub heap chunks) when there is concurrent DDL on the partitioned tables / objects involved in the DML.
Workaround : Avoid concurrent execution of the DML and DDL, if possible.

Bug 5389854
Reported in : Versions < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : Execution of a procedure using bulk insert with SAVE EXCEPTIONS will consume large amounts of PGA memory during execution. If run with an extremely large number of rows or for a long period of time, this can lead to ORA-4030. The memory shows up on the "callheap,DARWIN" subheap as "koh-kghu call" memory.

Bug 5391505
Reported in : Versions < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : PGA memory may keep on increasing during query parsing and can reach a large amount (sometimes even over 1G) when OR expansion occurs. Ultimately ORA-4030 may be encountered as memory runs out. The memory shows as "perm" space in the "kxs-heap-c" subheap.
Workaround : alter session set "_no_or_expansion" = true

Bug 5464834
Reported in : Versions < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : ORA-4030 (kxs-heap-c, temporary memory) can occur when using EXPDP.

Bug 5866410
Reported in : Versions < 11
Fixed in : 11.1.0.6
Backports : possible
Details : Bulk insert in PL/SQL can consume a large amount of PGA memory, which can lead to ORA-4030 errors. A heapdump will show a lot of free memory in the free lists which is not used; instead, fresh allocations are made.
Workaround : Chunk the FORALL loop. Do a hybrid of FOR & FORALL so that the bulk_rowcount arrays don't grow abnormally large.

Bug 5947623
Reported in : Versions >= 10.2.0.1 but < 11.1.0.7
Fixed in : 10.2.0.4 / 11.1.0.7
Backports : possible
Details : It is possible for a query to allocate too much memory executing a hash join over large volumes of data with few distinct join key values. The impact on 64-bit systems is greater. This is not really a memory leak, as the fix only makes the query spill to disk earlier.
Workaround : set "_pga_max_size"

Bug 6011182
Reported in : Versions >= 10.2.0.1 but < 10.2.0.4
Fixed in : 10.2.0.4
Backports : possible
Details : High elapsed time and high memory consumption during parse can occur for queries with large numbers of query blocks. If you see high elapsed times and/or memory consumption during parse for a query, and the query has a large number of query blocks (e.g. many views, subqueries or UNION ALL branches), you may be hitting this bug. For error ORA-04030 the path of the leak is: top call heap -> call heap -> typecheck. Largest memory allocations w/comments: "logdef: qcopCre", "kkoFroAnn: qksf", "frodef: qksfroC"

Bug 6052169
Reported in : Versions < 11
Fixed in : 11.1.0.6
Backports : possible
Details : Slave memory grows unbounded and finally fails with ORA-4030. A heapdump of the memory shows the 'Statement Alloc' string.

Bug 6061892
Reported in : Versions >= 10.2.0.1 but < 10.2.0.4
Fixed in : 10.2.0.4 / 11.1.0.6
Backports : possible
Details : It is possible to get errors ORA-4030 and ORA-21780 when an application drops and/or recreates many PL/SQL packages or top-level procedures or functions which are used in calls from SQL to PL/SQL. The leak is found in the heaps: pga heap -> session heap -> PLS non-lib hp. Most of the chunks on heap 'PLS non-lib hp' are PEIDEF or peihstdep.
Note : This fix introduces the problem described in bug 6139254.

Bug 6408917
Reported in : Versions < 11
Fixed in : 11.1.0.6
Backports : possible
Details : Excessive PGA memory consumption can be seen when using ref cursors returned from Java stored procedures. Ultimately this can lead to out-of-memory errors such as:
ORA-29532: java call terminated by uncaught java exception: java.lang.outofmemory

Bug 6414844
Reported in : Versions < 11.2
Fixed in : 11.2
Backports : possible to 10.1
Details : Memory may be wasted in a subheap using kghu memory allocation. A heapdump will show free chunks in each extent with a "free" chunk which is not quite large enough to satisfy a request. This is not really a memory leak but inefficient use of memory in certain circumstances. e.g.:
EXTENT 0 addr=0xb3810008
Chunk b3810010 sz= 16272 free " "  Wasted space
Chunk b3813fa0 sz= 16416 freeable "koh-kghu call "
Chunk b3817fc0 sz= 16416 freeable "koh-kghu call " ds=0xb7a1daf0
Chunk b381bfe0 sz= 16416 freeable "koh-kghu call " ds=0xb7a1daf0





Questions and Answers

What is an ORA-4030?

Basically, an ORA-4030 indicates a limit has been reached with respect to the Oracle process private memory allocation.
Typical causes:

* OS Memory limit reached such as physical memory and/or swap/virtual paging
* OS limits reached such as kernel or user shell limits that limit overall, user level or process level memory usage
* OS limit on PGA memory size due to SGA attach address
Note 262540.1 Relocate SGABEG on 64-bit Oracle
* Oracle internal limit example Bug 3130972
* Application design causing limits to be reached
* Bug – space leaks, heap leaks

What is the difference between ORA-4030 and ORA-4031?

An ORA-4030 error is an error in the PGA: a memory limit related to an Oracle RDBMS server process has been reached.
This includes the database background processes and the Oracle shadow processes that are spawned directly by the database or the listener.

An ORA-4031 error is an error in the SGA: a memory limit in an SGA component such as the shared pool, large pool, Java pool or streams pool has been reached.

What are the contents of Program Global Area memory?

A Program Global Area (PGA) is a memory region that contains data and control information for a server process. It is a non-shared memory area created by Oracle when a server process is started. Access to it is exclusive to that server process and is read and written only by Oracle code acting on behalf of it. The total PGA memory allocated by each server process attached to an Oracle instance is also referred to as the aggregated PGA memory allocated by the instance.

Components of the PGA vary depending on how the database is configured. The PGA comprises four memory areas:

* User Session Memory

User session memory is the memory allocated to hold a session's variables (logon information) and other information related to the session. For a shared server, the session memory is shared and not private.

* Private SQL Areas - contains data such as bind information and runtime buffers

A private SQL area contains data such as bind information and runtime memory structures. Each session that issues a SQL statement has a private SQL area. Each user that submits the same SQL statement has his or her own private SQL area that uses a single shared SQL area. Thus, many private SQL areas can be associated with the same shared SQL area.

The private SQL area of a cursor is itself divided into two areas whose lifetimes are different:

* The persistent area, which contains, for example, bind information. It is freed only when the cursor is closed.
* The run-time area, which is freed when the execution is terminated.

The location of a private SQL area depends on the type of connection established for a session. If a session is connected through a Dedicated Server, private SQL areas are located in the server process's PGA. However, if a session is connected through a Shared Server, part of the private SQL area (specifically, the persistent area) is kept in the SGA.

The runtime area contains information used while the SQL statement is being executed. The size of the runtime area depends on the type and complexity of the SQL statement being executed and on the sizes of the rows that are processed by the statement. In general, the runtime area is somewhat smaller for INSERT, UPDATE, and DELETE statements than it is for SELECT statements, particularly when the SELECT statement requires a sort.

Oracle creates the runtime area as the first step of an execute request. For INSERT, UPDATE, and DELETE statements, Oracle frees the runtime area after the statement has been executed. For queries, Oracle frees the runtime area only after all rows are fetched or the query is canceled.

The location of a private SQL area depends on the type of connection established for a session. If a session is connected via a dedicated server, private SQL areas are located in the user's PGA. However, if a session is connected via the multi-threaded server, the persistent areas and, for SELECT statements, the runtime areas, are kept in the SGA.

* SQL Work Areas

The application developer of an Oracle precompiler program or OCI program can explicitly open cursors, or handles to specific private SQL areas, and use them as a named resource throughout the execution of the program. Recursive cursors that Oracle issues implicitly for some SQL statements also use shared SQL areas.

The management of private SQL areas is the responsibility of the user process. The allocation and deallocation of private SQL areas depends largely on which application tool you are using, although the number of private SQL areas that a user process can allocate is always limited by the initialization parameter OPEN_CURSORS. The default value of this parameter is 50.

A private SQL area continues to exist until the corresponding cursor is closed or the statement handle is freed. Although Oracle frees the runtime area after the statement completes, the persistent area remains waiting. Application developers close all open cursors that will not be used again to free the persistent area and to minimize the amount of memory required for users of the application.

* Cursor and SQL Areas

For complex queries (for example, decision-support queries), a big portion of the runtime area is dedicated to work areas allocated by memory-intensive operators such as the following:

* Sort-based operators (order by, group-by, rollup, window function)
* Hash-join
* Bitmap merge
* Bitmap create

For example, a sort operator uses a work area (sometimes called the sort area) to perform the in-memory sort of a set of rows. Similarly, a hash-join operator uses a work area (also called the hash area) to build a hash table from its left input. If the amount of data to be processed by these two operators does not fit into a work area, then the input data is divided into smaller pieces. This allows some data pieces to be processed in memory while the rest are spilled to temporary disk storage to be processed later. Although bitmap operators do not spill to disk when their associated work area is too small, their complexity is inversely proportional to the size of their work area. Thus, these operators run faster with larger work area.

In shared server mode, portions of the UGA are stored in the following:

* Large pool if configured.
* Shared pool if large pool is not configured.

Note that as of 10g, the work areas controlled by PGA_AGGREGATE_TARGET are located in the pga for shared servers. On 9i shared server work areas are defined by the *_area_size parameters and located in the sga.

Why do I see processes growing larger than the PGA_AGGREGATE_TARGET?

Parameter PGA_AGGREGATE_TARGET does not set a hard limit on pga size. It is only a target value used to dynamically size the process work areas.
It also does not affect other areas of the pga that are allowed to grow beyond this limit. See below for explanation.
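
To see how far actual PGA use has drifted from the target, a minimal sketch against v$pgastat (the statistic names below are as commonly seen in 10g; verify them on your version):

select name, value/1024/1024 as Mbytes
from v$pgastat
where name in ('aggregate PGA target parameter',
'total PGA allocated',
'maximum PGA allocated');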

Can you control the size of a process?

As a database administrator you can control the size of a process within limitations.
For example, you can configure certain database initialization parameters that affect the size of a process work area.
If you choose, you can manually control the maximum size of SQL work areas by setting the following parameters: SORT_AREA_SIZE, HASH_AREA_SIZE, BITMAP_MERGE_AREA_SIZE and CREATE_BITMAP_AREA_SIZE. Setting these parameters is difficult, because the maximum work area size is ideally selected from the data input size and the total number of work areas active in the system. These two factors vary a lot from one work area to another and from one time to another. Thus, the various *_AREA_SIZE parameters are hard to tune under the best of circumstances.

With 9i and higher, you can automatically and globally manage the size of SQL work areas by specifying the total size dedicated to PGA memory for the Oracle instance by setting the initialization parameter PGA_AGGREGATE_TARGET. The specified number (for example, 2G) is a global target for the Oracle instance, and Oracle tries to ensure that the total amount of PGA memory allocated across all database server processes never exceeds this target. The key word in this statement is "tries". It is possible that PGA memory will grow beyond the "target".

The initialization parameter WORKAREA_SIZE_POLICY is a session- and system-level parameter that can be set to MANUAL or AUTO. The default is AUTO. You can switch back and forth between automatic and manual management of the work areas by setting PGA_AGGREGATE_TARGET and changing the WORKAREA_SIZE_POLICY parameter.
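
For example, a hedged sketch of switching modes (the 2G target is an arbitrary value):

SQL> alter system set pga_aggregate_target = 2G;
SQL> alter session set workarea_size_policy = AUTO;
-- revert a single session to manual work area management:
SQL> alter session set workarea_size_policy = MANUAL;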

With PGA_AGGREGATE_TARGET, sizing of work areas for all dedicated sessions is automatic and all *_AREA_SIZE parameters are ignored for these sessions. Beginning with version 10 the PGA_AGGREGATE_TARGET is also applicable to shared server sessions.

At any given time, the total amount of PGA memory available to active work areas on the instance is automatically derived from the parameter PGA_AGGREGATE_TARGET. This amount is set to the value of PGA_AGGREGATE_TARGET minus the PGA memory allocated by other components of the system (for example, PGA memory allocated by sessions). The resulting PGA memory is then allotted to individual active work areas based on their specific memory requirement.

Keep in mind that the PGA_AGGREGATE_TARGET does not limit the amount of pga memory usage. It is only a target and is used to dynamically size the process work areas. It does not affect other areas of the pga that are allowed to grow beyond this limit.

There are certain areas of pga that cannot be controlled by initialization parameters. Such areas include pl/sql memory collections such as pl/sql tables and varrays.
Depending on the programming code and the amount of data being handled, these areas can grow very large (up to a 20G internal limit on 10g) and can consume large amounts of memory. This memory growth can be controlled by good programming practices. As an example, use the LIMIT clause with BULK COLLECT, as shown in the sketch below.

See
http://www.oracle.com/technology/oramag/oracle/08-mar/o28plsql.html
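
For example, a minimal sketch of chunking a bulk fetch with the LIMIT clause so the collection cannot grow unbounded (table and column names are hypothetical):

DECLARE
  TYPE t_ids IS TABLE OF big_table.id%TYPE;
  l_ids t_ids;
  CURSOR c IS SELECT id FROM big_table;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_ids LIMIT 1000; -- caps the PGA used by the collection
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT
      UPDATE big_table SET processed = 'Y' WHERE id = l_ids(i);
  END LOOP;
  CLOSE c;
END;
/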



Additionally, programming mistakes can also lead to excessive memory usage.
For example, a recursive or infinite loop can use up memory. An errorstack will show such code as follows:

----- PL/SQL Call Stack -----
object line object
handle number name
70000003d2d6198 17 package body SCOTT.REGISTRY_PKG
70000003d2d6198 133 package body SCOTT.REGISTRY_PKG
70000003d2d6198 155 package body SCOTT.REGISTRY_PKG
70000003d2d6198 31 package body SCOTT.REGISTRY_PKG
70000003d2d6198 133 package body SCOTT.REGISTRY_PKG
70000003d2d6198 155 package body SCOTT.REGISTRY_PKG
PL/SQL call stack truncated after 1024 bytes.

Refer to following notes for more details on Automatic PGA Memory Management

Note 223730.1 Automatic PGA Memory Management in 9i and 10g

Note 443746.1 Automatic Memory Management(AMM) on 11g

Can you limit the size of a process?

You can take steps to control the size of a process as discussed above.
However, from within the database framework you cannot place a hard limit on the size of a process by setting any initialization parameters or database configuration.
You can limit the size of a process from the OS side by setting kernel limits or user shell limits.
However, when such a limit is reached, this leads to an ORA-4030 and will cause transaction rollback.


What information needs to be gathered to diagnose?

Resolving memory issues is typically an iterative process requiring gathering of information over a period of time to identify possible trends that typically leads to further data gathering and diagnosis.

As an initial step, some basic questions should be answered, because a memory issue is usually triggered by some change.
When did problem start?
What changes have been made to system, including OS, applications, database installation?
Can the problem be isolated to any particular query, job or application?
If problem can be isolated to particular code, does problem reproduce on other systems?
Does problem code work when run through local connection?

Then data needs to be gathered from the database and the OS.

RDA - This will provide basic OS information that can be reviewed for memory/swap configuration, as well as database configuration.
Alert log - may list errors, trace files and non-default parameters for startup.
Related trace files - initial trace files are typically not of much use for memory issues; you need to set up an event for a proper heapdump as discussed below.
Run database monitoring queries - the queries provided later in this note can be used to identify a particular process growing large or total pga and sga usage per instance.
Run OS monitoring queries - OSwatcher is a good start to see OS information related to memory,swap and relative process sizes
Get heapdumps - methods discussed below on how to gather



Why does my code give ORA-4030 when run through listener connection, but not local connection?

On most unix/linux OS, for local connections, the shadow processes spawned on the database server inherit the ulimits of the OS user that starts the database.
For processes spawned through a listener connection, it is the ulimits of the OS user that starts the listener that are inherited.

Need to verify that memory related resource limits as listed by ulimit -a are set large enough to prevent the ORA-4030.
See below discussion on ulimits.

When PAM authentication is being used on linux, the limits as set by pam may be an issue if starting the db or listener.
See Note 261220.1 Ulimit Command Unusable on RHEL 3.0

What to look at in RDA?

Need to review the ulimits for the OS oracle user that is starting the database and listener.
Typically an RDA may not be run as oracle user, but as root. Need to distinguish if that is the case and then ask for appropriate oracle user ulimits. On most OS, for local connections, the shadow processes spawned on the database server inherit the ulimits of the OS user that starts the database.
For processes spawned through a listener connection, it is the ulimits of the OS user that starts the listener that are inherited.

Look at physical ram and swap/paging configuration.
Depending on the OS, some information may show the free memory and paging used. Look in both the Overview->System Information link
and the Performance -> Overview link for information related to memory. This may include information from various OS commands/utilities such as swap,free,top,vmstat
For some OS, may need to request specific OS related information or try using the OSwatcher utility.

What kernel or shell limits need to be checked?

While Oracle Support cannot address all cases due to the numerous OS releases and versions, the following general recommendations can be made for typical platforms.

Check the ulimits of OS user starting the database and listener.
In particular we are interested in the data (d) and memory limits (m or v).
The stack (s) is typically not an issue unless set too high on a 32 bit system which can prevent database startup.

Typically, all soft settings can be shown with:
>ulimit -a

For the soft limit of an individual resource, use S;
for example, for the data segment size limit:
>ulimit -dS

For the hard limit of an individual resource, use H:
>ulimit -dH

Note that specifying unlimited may actually correspond to a numeric value, although unlimited is listed in the ulimit display.
You may need to consult the OS documentation or vendor to confirm what the actual values are. If in doubt, you can always set a numeric value rather than unlimited.

Additionally, if on RAC, how are the database and listener started?
Is lsnrctl used directly to start the listener, or is srvctl or CRS used?

The difference relates back to the ulimits of the user starting the listener or srvctl,
or the root user limits when CRS starts them.

Need to review the limits for data and memory of root user.
Note 758131.1 SOME Client Connection FAIL WITH ORA-4030 ERROR when
connecting using RAC
Note 753516.1 The difference between using srvctl vs using sqlplus for
start/stop one or more database nodes

For further information refer to Note 188149.1 How to Display and Change UNIX Process Resource Limits
or specific OS documentation.

For common kernel limits on various OS see Note 169706.1 Oracle® Database on AIX®, HP-UX®, Linux®, Mac OS® X,
Solaris®, Tru64 Unix® Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.1)

On Solaris,AIX and linux, it is the ulimit settings that determine process memory related resource limits. No kernel settings are used to control the process memory limits.


HP-UX still uses kernel parameters such as the following that set hard limits:
maxtsiz(_64bit) / maxdsiz(_64bit) / maxssiz(_64bit)

Tru64 makes use of kernel limits:
/sbin/sysconfig -q proc
per-proc-data-size/max-per-proc-data-size
per-proc-address-space/max-per-proc-address-space
per-proc-stack-size/max-per-proc-stack-size
vm_rss_maxpercent

In some rare cases, misleading process-spawning or memory issues have occurred due to other kernel settings that limit the number of processes, such as a low setting of
NPROC. If such limits are set low and there are a large number of processes on the server, consider increasing this.
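
On Linux, a quick hedged check of the per-user process limit (which corresponds to nproc):

>ulimit -u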

How to monitor pga usage from within the database?

Queries can be run against the instance to show pga memory usage.
Depending on the time frame of the problem occurrence these queries need to be run at different time intervals to review for trends.

The queries are best run through a cron job, Oracle scheduler job or on MSwindows using the task scheduler.
An example unix shell script is contained in attachment memleak.sh that can be used to kick off the sql script at specified intervals. The script can be used to run pga_2010script1.sql containing pga memory queries.

For example, start memleak.sh collecting at 20-second increments:

>./memleak.sh 20 > memleak.out


Monitor the pga usage for all processes related to an instance from v$sesstat.
Look at trends of individual processes growing in size.

REM v$sesstat pga/uga memory size
select p.spid, s.sid, substr(n.name,1,25) memory, s.value as Bytes from v$sesstat s, v$statname n, v$process p, v$session vs
where s.statistic# = n.statistic#
/* this query currently looks at both uga and pga, if only one of these is desired modify the like clause to pga or uga */
and n.name like '%ga memory%'
and s.sid=vs.sid
and vs.paddr=p.addr
/* --remove comment delimiters to view only certain sizes, i.e. over 10Mbytes
and s.value > 10000000 */
order by s.value asc;


List the largest process.

/* Do Not eliminate all background process because certain background processes do need to be monitored at times */
select pid,spid,substr(username,1,20) "USER" ,program,PGA_USED_MEM,PGA_ALLOC_MEM,PGA_FREEABLE_MEM,PGA_MAX_MEM
from v$process
where pga_alloc_mem=(select max(pga_alloc_mem) from v$process
where program not like '%LGWR%');


Summation of pga based on v$process

REM allocated includes free PGA memory not yet released to the operating system by the server process
select sum(pga_alloc_mem)/1024/1024 as "Mbytes allocated", sum(PGA_USED_MEM)/1024/1024 as "Mbytes used" from v$process;


Summation of pga memory based on v$sesstat

select sum(value)/1024/1024 as Mbytes from v$sesstat s, v$statname n
where
n.STATISTIC# = s.STATISTIC# and
n.name = 'session pga memory';


PGA stats from v$pgastat

select substr(name,1,30), value, unit from v$pgastat;



List all processes, including pga size, from v$process.
The outer join will show whether any defunct processes exist without an associated session.

set linesize 120
set pagesize 120
column spid heading 'OSpid' format a8
column pid heading 'Orapid' format 999999
column sid heading 'Sess id' format 99999
column serial# heading 'Serial#' format 999999
column status heading 'Status' format a8
column pga_alloc_mem heading 'PGA alloc' format 99,999,999,999
column pga_used_mem heading 'PGA used' format 99,999,999,999
column username heading 'oracleuser' format a12
column osuser heading 'OS user' format a12
column program heading 'Program' format a20

SELECT
p.spid,
p.pid,
s.sid,
s.serial#,
s.status,
p.pga_alloc_mem,
p.PGA_USED_MEM,
s.username,
s.osuser,
s.program
FROM
v$process p,
v$session s
WHERE s.paddr (+) = p.addr
and p.BACKGROUND is null /* Remove if need to monitor background processes */
Order by p.pga_alloc_mem desc;


Summation of pga and sga gives a value of total memory usage by oracle instance
--look at total memory used by instance SGA and PGA

select sum(bytes)/1024/1024 as Mbytes from
(select value as bytes from v$sga
union all
select value as bytes from
v$sesstat s,
v$statname n
where
n.STATISTIC# = s.STATISTIC# and
n.name = 'session pga memory'
);

 
 

How to monitor memory usage from the OS on unix/linux?

Memory usage on the OS side needs to be monitored.
A good method on unix/linux boxes is to setup OSwatcher per note:
Note.301137.1 Ext/Pub OS Watcher User Guide

OS Watcher (OSW) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues. OSW operates as a set of background processes on the server and gathers OS data on a regular basis, invoking such Unix utilities as top, vmstat, netstat and iostat.

Depending on the issue, set the rate at which data is gathered to a reasonable value.
If the problem reproduces quickly, you may want low values such as 5 minutes or even lower.
If the problem is unknown or long running, you may want to set it to 30 minutes or higher.
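
For example, a hedged sketch of starting OSWatcher with a 60-second snapshot interval, keeping 10 hours of archive (argument order per the OSW User Guide; verify against your OSW version):

>./startOSW.sh 60 10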

Other methods may include customer scripts or 3rd party utilities that monitor OS memory usage.

How to monitor memory usage from the OS on MSwindows?

Run perfmon utility to monitor memory usage.
This tool shows the memory used by the Oracle process more accurately than the information in Task Manager.

Method to do this varies on the windows version, but typical method is as follows:
If you go to the Control Panel (under Start, Settings) and click on Administrative Tools, you can start up the Performance monitor.
You can also start up a Windows command window and type 'Perfmon'.

Under the console tree, select Performance Logs and Alerts. Right click on 'Counter Logs' and select 'New Log Settings' to start a new log.
Click on the Add Counters button to add new counters to the log.
Click on 'Process' under 'Performance Object'.
Select counters, 'Private Bytes', 'Thread Count', 'Handles', and 'Virtual Bytes' in the 'Select counters from list' window.
You need to select these counters for the process, 'Oracle' and 'Total' located in the 'Select instance from list' window. Click on the 'Add' button.
Also, select Performance object 'Memory' and select counter 'Available MBytes'. This will track
overall free memory available on the system.

Set the appropriate sample data interval. If problem occurs over a very long period of time such as several hours, then may set the interval to a few minutes.
If problem occurs relatively quickly on order of a few minutes, then set interval to a few seconds.
These logs can be uploaded to oracle support for review and can be imported into MSExcel for graphing to review for trends.
If the log is directly saved as a comma delimited file it can be imported directly. If saved as a binary file, it must first be opened with the perfmon tool and then saved as a comma delimited file for import into MSExcel.

Monitor memory usage on the 'Virtual Bytes' counter. 'Virtual Bytes' shows the actual "committed memory" that the process is allocated.

Why do we still get ORA-4030 or ORA-12500 on a MSwindows 32-bit database after adding more RAM?

In a windows 32 bit system the addressable memory is limited to 4G for a process. This translates to about 1.8G of actual memory usage that an oracle process can address including the SGA and PGA usage. This can be increased to about 3G with use of the /3GB switch configured in the boot.ini file.
So even if you have over 4G of ram, a single instance cannot address any more memory above this 4G. This includes the total memory usage of an SGA and the sum of all associated PGAs (threads associated with the oracle process).
An exception to this is Address Windowing Extensions (AWE).
AWE allows a process to address memory above the 4G limit when the /PAE switch is used. Oracle uses AWE to allow usage of additional buffer cache above the 4G limit
by setting the database parameter USE_INDIRECT_DATA_BUFFERS=TRUE.
The AWE window is allocated in the lower 4G addressable memory space and is 1G by default. The AWE window size can be explicitly set in the registry to reduce it below 1G, subject to the calculation determined by
Note 225349.1.
The threads attached to the oracle process (the PGAs) and other SGA components cannot make use of AWE and thus cannot use any memory above the 4G. So consider that even if you set up the /3GB switch and use AWE, it may reduce the addressable memory available to the PGA.
See Note 225349.1 Implementing Address Windowing Extensions (AWE) or VLM on Windows Platforms, which discusses configuration of /3GB and /PAE.
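
For illustration, a minimal pfile sketch for enabling the AWE buffer cache (the buffer count is a placeholder only; with indirect data buffers the cache is sized through DB_BLOCK_BUFFERS rather than DB_CACHE_SIZE):

use_indirect_data_buffers=true
db_block_buffers=<number of buffers, sized per Note 225349.1>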

Also note that with the /3GB switch set, the amount of nonpaged pool space available to the OS can be reduced. This space is used for async I/O, and on an I/O-intensive system a shortage could lead to I/O errors. The /USERVA switch may need to be set to increase the available nonpaged pool space. The default value is 3070;
you may need to set this lower, to 3030, or lower still such as 2090.
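
For illustration only, a boot.ini entry combining the switches discussed above might look like this (the exact entry varies by Windows version and installation):

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB /PAE /USERVA=3030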

How to create a heapdump?

Three methods of gathering a heapdump are shown below depending on problem occurrence.

Method A. Set event in the pfile if general cause of error is unknown
Method B. Set event at session level if problem can be easily reproduced from session
Method C. Attach using oradebug to specific process

Method A
Set the event at the instance level to fire on any occurrence of the ORA-4030.
The following sets the events in memory only (they do not persist across a restart):

ALTER SYSTEM SET EVENTS '4030 trace name heapdump level 536870917;name errorstack level 3';

To turn off these events:
ALTER SYSTEM SET EVENTS '4030 trace name heapdump off;name errorstack off';

If the instance will be restarted, the events need to be set in the spfile to persist across startup.

ALTER SYSTEM SET EVENT='4030 trace name heapdump level 536870917;name errorstack level 3' scope=spfile;
or
place in pfile immediately after any other events
event= '4030 trace name heapdump level 536870917;name errorstack level 3'

Note that on a RAC system you also need to specify the instance, or all instances, with the SID= clause in the ALTER statement, as shown in the example below.

Specify SID = '*' if you want Oracle Database to change the value of the parameter for all instances.

Specify SID = 'sid' if you want Oracle Database to change the value of the parameter only for the instance sid. This setting takes precedence over previous and subsequent ALTER SYSTEM SET statements that specify SID = '*'.
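
For example, a sketch combining the event syntax above with the SID clause:

ALTER SYSTEM SET EVENT='4030 trace name heapdump level 536870917;name errorstack level 3' SCOPE=SPFILE SID='*';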

To remove just these events from spfile:
ALTER SYSTEM SET EVENT='4030 trace name heapdump off;name errorstack off' scope=spfile;

Method B
Set the event only at the session level.
This sets the event at the session level; a trace file is created when the error occurs in that session only.

ALTER SESSION SET EVENTS '4030 trace name heapdump level 536870917;name errorstack level 3';

Method C
You can attach to a particular process using oradebug to get a heapdump.
This is useful, and required, when the process grows large but does not raise an error.

Reproduce the problem, and monitor process memory growth.
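
If the earlier query is not at hand, a minimal sketch to identify the target process (assuming you know the SID of the session in question):

SELECT p.spid ospid, p.pid orapid, s.sid, s.program
FROM v$process p, v$session s
WHERE s.paddr = p.addr
AND s.sid = <target SID>;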

Start a new sqlplus session to attach with oradebug

$ sqlplus /nolog
SQL> connect / as sysdba
SQL> alter system set max_dump_file_size=unlimited;
SQL> oradebug setospid <OSpid from query above for unix systems>

If on an MSwindows platform, you will need to attach using setorapid instead of setospid:
SQL> oradebug setorapid <Orapid from query above>

SQL> oradebug unlimit
SQL> oradebug dump errorstack 10
SQL> oradebug dump heapdump 536870917
SQL> oradebug tracefile_name (shows the path and filename information)
SQL> oradebug close_trace (closes the trace file)
SQL>exit

What heapdump level to gather?


With an ORA-4030 or process memory growth we are interested in the following heaps:

Heap            Level
top PGA         1   (on release 10 and higher includes top call heap)
top UGA         4
Current call    8
User call       16

As of release 9.2.0.5 and higher, we are also interested in generating the five largest subheaps for each subheap, which is requested with the additional level 536870912.

The typical level to gather would be 536870917, as broken down below.
On releases lower than 9.2.0.5 this level cannot be used; use level 5 instead.
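
The level is a bitmask, so the typical value is simply the sum of the flags above:

536870912 (five largest subheaps) + 4 (top UGA) + 1 (top PGA) = 536870917

and without the subheap flag, 4 + 1 = 5 for the pre-9.2.0.5 case.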

See high amount of 'perm' allocations. When to set 10235 event?

In some cases the heapdump may show large allocations of type 'perm'.
On releases 9.2.0.5 and higher, event 10235 level 65536 can be set in order to further diagnose what these perm allocations are.
However, this event should only be set under the direction of Oracle Support.

This level causes comments to be included with each perm allocation where possible.
The comments show as "cprm" chunks under the "perm" chunks in the PERMANENT section of the heapdump.

IMPORTANT: Do NOT set this event at session level. It can cause ORA-600 dumps.
Set at instance level only.

ALTER SYSTEM SET EVENTS '10235 trace name context forever, level 65536';

To turn off:
ALTER SYSTEM SET EVENTS '10235 trace name context off';

If the instance will be restarted, the event needs to be set in the spfile to persist across startup.

ALTER SYSTEM SET EVENT='10235 trace name context forever, level 65536' scope=spfile;
or
place in pfile immediately after any other events
event = '10235 trace name context forever, level 65536'

Note that on a RAC system you also need to specify the instance, or all instances, with the SID= clause in the ALTER statement.

To turn off:
ALTER SYSTEM SET EVENT='10235 trace name context off' scope=spfile;



Configurations leading to excessive memory usage problems

High swap usage on Solaris
-------------------------------
DISM may be in use.

Solaris Dynamic Intimate Shared Memory (DISM) provides shared memory that is dynamically resizable. DISM means that applications can respond to changes in memory availability by dynamically increasing or
decreasing the size of optimized shared memory segments. Oracle uses DISM for its dynamic System Global Area (SGA) capability.

DISM is available on Solaris 9 and higher.

Oracle database release 9i and higher uses DISM depending on certain database parameter settings.
On 9i and higher, if SGA_MAX_SIZE > SGA_TARGET (or the sum of the SGA components), then DISM is used.
On 11g, DISM is also used if MEMORY_TARGET or MEMORY_MAX_TARGET is set.
Otherwise, if DISM is not enabled, ISM is used.

There are two issues to be aware of when DISM is being used:

First, DISM (unlike ISM) requires a swap reservation for all pages, whether or not they are allocated.
This basically means you need at least as much swap as the SGA size for instances that
are using DISM. An ORA-4030 could result on systems with inadequate swap space.

Second, there is an additional issue with DISM per Sun bug 6559612.
If the ora_dism daemon fails to start, or dies for any reason, excessive memory/swap space usage can occur relative to the amount of async IO activity. ORA-4030 errors may occur, and you could also see other errors in the database alert log such as the following:

ORA-17500: ODM err:ODM ERROR V-41-4-1-253-12 Not enough space

To determine if DISM is being used for an Oracle instance, you can check on the OS side with the ps tool:
>ps -aef | grep dism

If DISM is in use, it will show as a process named ora_dism_$ORACLE_SID

To disable the use of DISM, simply unset SGA_MAX_SIZE, MEMORY_TARGET and MEMORY_MAX_TARGET,
or set SGA_MAX_SIZE=SGA_TARGET,
or allow SGA_MAX_SIZE to default to the sum of the SGA components.
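
For example, a minimal sketch (SCOPE=SPFILE so the change takes effect at the next restart; the SID clause covers all RAC instances):

ALTER SYSTEM RESET sga_max_size SCOPE=SPFILE SID='*';
or
ALTER SYSTEM SET sga_max_size=<value of sga_target> SCOPE=SPFILE SID='*';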


ZFS file system
---------------
Oracle processes, such as the exp utility, writing to a ZFS file system can appear to use all of system memory.
The ZFS adaptive replacement cache (ARC) tries to use most of a system's available memory to cache file system data.
The default is to use all of physical memory except 1 Gbyte. As memory pressure increases, the ARC relinquishes memory.
See the ZFS Best Practices Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_Administration_Considerations

or the latest Solaris 10 documentation for configuring zfs_arc_max to limit cache memory.
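
For illustration, zfs_arc_max is capped with a line in /etc/system (the value below is only an example of a 2 Gbyte cap; choose a limit appropriate for your system, and note the change takes effect after a reboot):

set zfs:zfs_arc_max = 0x80000000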


Mounting Oracle_home location in concurrent I/O mode
----------------------------------------------------
Having an ORACLE_HOME on a filesystem mounted with the "cio" option is not supported.
Such a configuration will cause installation, relinking and other unexpected problems.
One such problem is high memory use by the oracle background processes:
CIO creates private memory allocations for the executable files in ORACLE_HOME that cannot be shared or reused.
This leads to "orphan" memory segments that are not shown from the database side (v$process),
but the OS shows more and more memory growth, particularly for background processes.


Unused free memory on linux system with hugepages configured
------------------------------------------------------------
If the SGA size > the HugePages size setting, then hugepages cannot be used.
This results in the reserved free memory never being used.
Hugepages should be sized just greater than the SGA size in order for hugepages to be
used and to minimize wasted memory.

Verify if hugepages are being used.

>grep Huge /proc/meminfo

HugePages_Total: 1350
HugePages_Free: 5
Hugepagesize: 2048 kB

HugePages_Free should be much lower than HugePages_Total. If they are equivalent, hugepages may not be
in use because the SGA > hugepages allocation.

Example:

sum of SGA components or SGA_MAX_SIZE = 2648M

You need to configure at least that much space for hugepages.
Sizing for 2700M with a 2048 kB (2 MB) hugepage size: 2764800 kB / 2048 kB = 1350 pages required.
If you were to set the SGA larger than this, the hugepages could not be used; you would need to
adjust the number of hugepages higher, as in the sketch below.
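
For illustration, the page count computed above corresponds to this /etc/sysctl.conf entry (a sketch; the memlock limit in /etc/security/limits.conf must also accommodate the SGA):

vm.nr_hugepages = 1350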

See Note 361323.1 HugePages on Linux for further information regarding hugepages.


Excessive memory/swap on AIX when large pages not configured correctly
----------------------------------------------------------------------------
If the database parameter LOCK_SGA=TRUE, then large pages will be used.
If large pages are not configured properly, that can lead to excessive memory/swap usage.
Configure large pages for the SGA per
Note 372157.1 How to enable Large Page Feature on AIX-Based Systems
The utility svmon -G will show large page size=16 MB in use if properly configured.
Note also that on 10.2.0.4 large pages are not used properly due to unpublished Bug 7226548.
See Note 740603.1 10.2.0.4 not using large pages on AIX.





References

Note 199746.1 How to Resolve ORA-4030 Errors on UNIX
Note 233869.1 Diagnosing and Resolving ORA-4030 errors