Friday, 31 May 2013

RMAN has deleted backupset but not freed up space to OS

Hi,

I am having strange issue in our production system. And it caused to fail rman subsequent backup. 
The issue was previous deleted rman backupset file still being access by other process. And to resolve this issue lsof +L1 become boon to me. 

Here I have resolve the issue.

[root@PRODB01 backupset]#
[root@PRODB01 backupset]# df -h /flash_recovery_area/
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg1
                      1.5T  1.1T  343G  77% /flash_recovery_area

[root@PRODB01 backupset]# cd /flash_recovery_area/
data_pump/  PRODDB/   lost+found/

[root@PRODB01 backupset]# cd /flash_recovery_area/

[root@PRODB01 flash_recovery_area]# du -sh *
337G    data_pump
386G    PRODDB
16K     lost+found

[root@PRODB01 flash_recovery_area]# lsof +L1 /flash_recovery_area | grep backupset
rsync     25269    root    3r   REG 253,10 396905644032     0  83329027 /flash_recovery_area/PRODDB/backupset/2013_05_28/o1_mf_nnndf_BACKUP_LIVEDB_8t9txcvw_.bkp (deleted)



[root@PRODB01 flash_recovery_area]#
[root@PRODB01 flash_recovery_area]# ps -ef | grep 25269
root      8980 14727  0 11:55 pts/6    00:00:00 grep 25269
root     25269 25242  0 May28 ?        00:27:58 rsync --verbose --archive --hard-links --force --delete --stats -progress --links --times --recursive --relative --exclude /proc --exclude /dev --exclude /mnt --exclude /rman2 --exclude /a --exclude /vol --exclude /tmp --exclude /rman --exclude /willow \ --exclude /var/spool/lpd --delete-excluded --exclude core --ignore-errors --partial --backup --backup-dir=/scorpio/scorpio9/PRODB01/28-05-13 -e ssh -v / scorpio:/scorpio/scorpio9/PRODB01/current
root     25277 25269  3 May28 ?        02:26:35 ssh scorpio rsync --server -vvblHogDtprRe.is --backup-dir /scorpio/scorpio9/PRODB01/28-05-13 --delete-excluded --force --partial --ignore-errors . /scorpio/scorpio9/PRODB01/current

[root@PRODB01 flash_recovery_area]# kill -9 25269
[root@PRODB01 flash_recovery_area]# df -h /flash_recovery_area/
 1.5T  744G  691G  52% /3parrman

Job Done ! Space revert it back !



Wednesday, 29 May 2013

Prereq check warning for missing rpm package libstdc++43 on SUSE 11

on SuSE 11 SP1 - 11.2.0.3 Grid Installation - ./runInstaller and prereq check is failing because compat-libstdc++-33.3.2.3-47.3 is not installed.

on SuSE 11 SP2 - 11.2.0.3 Grid Installation - ./runInstaller and prereq check is failing because the following packages are not installed:-

 Package: compat-libstdc++-33-3.2.3-47.3
 Package: libstdc++43-4.3.4_20091019-0.7.35 (s390x)
 Package: libgcc43-4.3.4_20091019-0.7.35.

This prereq issue is also seen on generic Linux SuSE 11 platforms.

Changes

SuSE 11 SP1 SP2
11.2.0.3

Cause

This is documented in unpublished Bug 13445763 and Bug 14366019
The rpm compat-libstdc++-33.3.2.3-47.3 is not available on  SuSE 11. The rpm libstdc++-33 provides the required files.  Both the 32-bit version and 64-bit versions are required.
This is the same as the 11.2.0.2 requirements for this platform.
On SuSE 11 SP2 the libstdc++43 and libgcc43 checks fail as the rpm has changed to libstdc++46 but providing both the 32-bit version and 64-bit versions of libstdc++43-devel rpms are installed - this is not a problem.

Solution

You can safely ignore these warnings - click on ignore button in the top right hand corner of the screen.

You should, however, ensure that you meet all the other rpm prereqs.  To assist you, please refer to Getting Started - 11gR2 Grid Infrastructure,SI(Single Instance), ASM and DB (IBM: Linux on System z) Document 1306465.1 - specifically the rpm validation section.

For SLES11 - you should download the attachment at the end of the above article:

- SLES 11 - 11.2. Grid Infrastructure, SIHA, DB Install

To assist further - here is the "rpm -qa" output from a SuSE 11 SP2 in-house system which resulted in a successful installation of 11.2.0.3 Grid Infrastructure.:-
zhost:/home/oracle # rpm -qa | grep libstdc
libstdc++46-32bit-4.6.1_20110701-0.13.9
libstdc++43-devel-4.3.4_20091019-0.22.17
libstdc++33-3.3.3-11.9
libstdc++46-4.6.1_20110701-0.13.9
libstdc++33-32bit-3.3.3-11.9
libstdc++43-devel-32bit-4.3.4_20091019-0.22.17
libstdc++-devel-4.3-62.198

zhost:/home/oracle # rpm -qa | grep libgcc
libgcc46-32bit-4.6.1_20110701-0.13.9
libgcc46-4.6.1_20110701-0.13.9

 which gcc
/usr/bin/gcc
zhost:/home/oracle # gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

zhost:/home/oracle/sjc # c++ --version
c++ (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Friday, 17 May 2013

TROUBLESHOOTING GUIDE (TSG) - ORA-20: MAXIMUM NUMBER OF PROCESSES (%S) EXCEEDED [ID 1287854.1]



Applies to:

Oracle Server - Enterprise Edition - Version 10.1.0.2 to 11.2.0.3 [Release 10.1 to 11.2]
Information in this document applies to any platform.

Symptoms

What are the recommended steps to diagnose an ORA-20 - maximum number of processes (%s) exceeded?

Cause

Connections to the database have caused the current number of operating system processes associated with the Oracle Instance to exceeed the PROCESSES database parameter

Solution

The typical remedy to an ORA-20 - maximum number of processes (%s) exceeded ... is to increase the PROCESSES parameter ... This is usually only a temporary solution

Here are the steps to diagnose the cause of the ORA-20

1) The first step is to gather information about the processes that exist when the ORA-20 occurs

connect ... as sysdba

set markup html on
set pagesize 30
spool processes_sessions.html

select
p.username "V$PROCESS - OS USERNAME",
p.terminal,
p.program,
s.username "V$SESSION - USERNAME",
s.command,
s.status,
s.server,
s.process,
s.machine,
s.port,
s.terminal,
s.program,
s.sid,
s.serial#,
p.spid
FROM v$session s,v$process p
WHERE p.addr=s.paddr
order by p.background desc;

spool off
exit;
NOTE .. depending on version ... S.PORT may need to be removed from the query 

2) Look for patterns in the output generated in step #1 (processes_sessions.html)
SAMPLE OUTPUT
V$PROCESS - OS USERNAME
TERMINAL
PROGRAM
V$SESSION - USERNAME
COMMAND
STATUS
SERVER
PROCESS
MACHINE
PORT
TERMINAL
PROGRAM
oracle
UNKNOWN
oracle@filnx10 (TNS V1-V3)
SYS
3
ACTIVE
DEDICATED
27140
filnx10
0
pts/2
sqlplus@filnx10 (TNS V1-V3)
oracle
UNKNOWN
oracle@filnx10 (TNS V1-V3)









oracle
UNKNOWN
oracle@filnx10 (Q001)

0
ACTIVE
DEDICATED
24739
filnx10
0
UNKNOWN
oracle@filnx10 (Q001)
oracle
UNKNOWN
oracle@filnx10 (VKTM)

0
ACTIVE
DEDICATED
24534
filnx10
0
UNKNOWN
oracle@filnx10 (VKTM)
oracle
UNKNOWN
oracle@filnx10 (GEN0)

0
ACTIVE
DEDICATED
24538
filnx10
0
UNKNOWN
oracle@filnx10 (GEN0)
oracle
UNKNOWN
oracle@filnx10 (DIAG)

0
ACTIVE
DEDICATED
24540
filnx10
0
UNKNOWN
oracle@filnx10 (DIAG)
oracle
UNKNOWN
oracle@filnx10 (DBRM)

0
ACTIVE
DEDICATED
24542
filnx10
0
UNKNOWN
oracle@filnx10 (DBRM)
oracle
UNKNOWN
oracle@filnx10 (DIA0)

0
ACTIVE
DEDICATED
24544
filnx10
0
UNKNOWN
oracle@filnx10 (DIA0)
oracle
UNKNOWN
oracle@filnx10 (MMAN)

0
ACTIVE
DEDICATED
24546
filnx10
0
UNKNOWN
oracle@filnx10 (MMAN)
oracle
UNKNOWN
oracle@filnx10 (DBW0)

0
ACTIVE
DEDICATED
24548
filnx10
0
UNKNOWN
oracle@filnx10 (DBW0)
oracle
UNKNOWN
oracle@filnx10 (LGWR)

0
ACTIVE
DEDICATED
24550
filnx10
0
UNKNOWN
oracle@filnx10 (LGWR)
oracle
UNKNOWN
oracle@filnx10 (CKPT)

0
ACTIVE
DEDICATED
24552
filnx10
0
UNKNOWN
oracle@filnx10 (CKPT)
oracle
UNKNOWN
oracle@filnx10 (SMON)

0
ACTIVE
DEDICATED
24554
filnx10
0
UNKNOWN
oracle@filnx10 (SMON)
oracle
UNKNOWN
oracle@filnx10 (RECO)

0
ACTIVE
DEDICATED
24556
filnx10
0
UNKNOWN
oracle@filnx10 (RECO)
oracle
UNKNOWN
oracle@filnx10 (MMON)

0
ACTIVE
DEDICATED
24558
filnx10
0
UNKNOWN
oracle@filnx10 (MMON)
oracle
UNKNOWN
oracle@filnx10 (MMNL)

0
ACTIVE
DEDICATED
24560
filnx10
0
UNKNOWN
oracle@filnx10 (MMNL)
oracle
UNKNOWN
oracle@filnx10 (QMNC)

0
ACTIVE
DEDICATED
24576
filnx10
0
UNKNOWN
oracle@filnx10 (QMNC)
oracle
UNKNOWN
oracle@filnx10 (SMCO)

0
ACTIVE
DEDICATED
26415
filnx10
0
UNKNOWN
oracle@filnx10 (SMCO)
oracle
UNKNOWN
oracle@filnx10 (W000)

0
ACTIVE
DEDICATED
26417
filnx10
0
UNKNOWN
oracle@filnx10 (W000)
oracle
UNKNOWN
oracle@filnx10 (CJQ0)

0
ACTIVE
DEDICATED
24617
filnx10
0
UNKNOWN
oracle@filnx10 (CJQ0)
oracle
UNKNOWN
oracle@filnx10 (Q000)

0
ACTIVE
DEDICATED
24737
filnx10
0
UNKNOWN
oracle@filnx10 (Q000)
oracle
UNKNOWN
oracle@filnx10 (PMON)

0
ACTIVE
DEDICATED
24530
filnx10
0
UNKNOWN
oracle@filnx10 (PMON)
oracle
UNKNOWN
oracle@filnx10 (PSP0)

0
ACTIVE
DEDICATED
24532
filnx10
0
UNKNOWN
oracle@filnx10 (PSP0)


     * If there are many V$SESSION.STATUS = INACTIVE ...

            This means that there are many user sessions that have connected but are not doing anything

            POSSIBLE SOLUTION : Enable dead connection detection (DCD) and user resource limits

                    A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes (Doc ID 601605.1)

     * If there are many V$SESSION.PROGRAM (and possibly many V$SESSION.STATUS = INACTIVE) entries from the same program ...

            POSSIBLE SOLUTIONS :

            1) Examine the program (often a web server) to see if the program has a setting to abandon a connection after X number of seconds and reconnect (a respawn event)

                If the program has such a setting ... a slowdown in either network or database performance should to be investigated ... as this would cause a timeout in the application .. 
                  and thus a reconnect ... setting a longer 'timeout' in the application often will resolve future occurrences of ORA-20

             2) Like the solution above ... setting DCD and Resource Limits ... often resolves these as well 

     * If there are many rows with data for V$PROCESS but no data for V$SESSION ...

            This is a problem on the operating system side and such sessions will need to be killed manually with KILL -# (UNIX / LINUX) or ORAKILL (Windows)

             This condition needs to be examined by an operating system expert (System Administrator or OS Vendor)

             WHY? - When an Oracle session is terminated ... whether by the user logging out ... a kill session by a privileged user ... or even the user reaching a resource limit set in their profile
                           Oracle SMON (or PMON) will first clean up the database resources (rollback transactions ... release locks etc)
                           Oracle then will remove the entry in V$SESSION as the session is now 'cleaned up'
                           Oracle will then request that the operating system terminate any OS processes associated with the former Oracle Session

               The Oracle Kernel cannot force the OS to terminate the processes .. this occurs at the OS level only ... all it can do is request that they be terminated
NOTE: This is a living document ... Oracle encourages comments as to other solutions / scenarios ... so please feel free to add a remark

References

NOTE:1050281.1 - Getting ORA-00020/ ORA-00018 With A High Number Of CPUs Regardless Of How High The Related DB Parameters Are Set.
NOTE:169706.1 - Oracle Database (RDBMS) on Unix AIX,HP-UX,Linux,Mac OS X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 11.2)
NOTE:458527.1 - ORA-00020 on an ASM instance
NOTE:601605.1 - A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes

Large Number of Sessions in Database with Program Name Oraagent.bin

Applies to:
Oracle Net Services - Version 11.2.0.2 and later
Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
***Checked for relevance on 26-DEC-2012***

Symptoms
11.2.0.2 RAC database shows many sessions with program name of oraagent.bin<hostname>, which are never removed.
Sample output :
SQL> select s.sid, s.logon_time, p.spid, s.process, s.program from
v$session s, v$process p where s.paddr = p.addr and s.program like '%oraagent%' and s.logon_time < SYSDATE-1 order by s.logon_time;
SID      LOGON_TIME         SPID      PROCESS   PROGRAM
-------- -------------------------- ----------- --------------- --------------------------------------------------
1015    09-03-2011:20:44:18   10822      10623          oraagent.bin@sample.com (TNS V1-V3)
293      09-03-2011:20:44:18   10824      10623          oraagent.bin@sample.com (TNS V1-V3)
871      09-03-2011:20:44:18   10833      10623          oraagent.bin@sample.com (TNS V1-V3)
11        09-03-2011:22:25:56   30794      30464          oraagent.bin@sample.com (TNS V1-V3)
436      09-03-2011:22:47:58   20707      20514          oraagent.bin@sample.com (TNS V1-V3)
152      10-03-2011:00:30:51   2228        2093            oraagent.bin@sample.com (TNS V1-V3)
1017    10-03-2011:00:30:51   2232        2093            oraagent.bin@sample.com (TNS V1-V3)
294      10-03-2011:01:33:23   14541      14307          oraagent.bin@sample.com (TNS V1-V3)
1013    10-03-2011:01:33:23   14545      14307          oraagent.bin@sample.com (TNS V1-V3)
151      10-03-2011:01:33:23   14546      14307          oraagent.bin@sample.com (TNS V1-V3)
295      10-03-2011:03:54:16   32539      32384          oraagent.bin@sample.com (TNS V1-V3)
730      10-03-2011:03:54:16   32541      32384          oraagent.bin@sample.com (TNS V1-V3)
1018    10-03-2011:04:53:25   20203      19882          oraagent.bin@sample.com (TNS V1-V3)
874      10-03-2011:04:53:25   20197      19882          oraagent.bin@sample.com (TNS V1-V3)
586      10-03-2011:04:53:25   20195      19882          oraagent.bin@sample.com (TNS V1-V3)
731      10-03-2011:06:11:05   12921      12728          oraagent.bin@sample.com (TNS V1-V3)
153      10-03-2011:06:43:13   4417        4034            oraagent.bin@sample.com (TNS V1-V3)
431      10-03-2011:06:43:13   4419        4034            oraagent.bin@sample.com (TNS V1-V3)
297      10-03-2011:06:43:13   4425        4034            oraagent.bin@sample.com (TNS V1-V3)
732      10-03-2011:07:21:51   1027        32500          oraagent.bin@sample.com (TNS V1-V3)
8          10-03-2011:09:31:04   19914      18710          oraagent.bin@sample.com (TNS V1-V3)
734      10-03-2011:12:53:47   22039      21506          oraagent.bin@sample.com (TNS V1-V3)
4          10-03-2011:12:53:47   22040      21506          oraagent.bin@sample.com (TNS V1-V3)
309      10-03-2011:13:20:12   7483        6420            oraagent.bin@sample.com (TNS V1-V3)
156      10-03-2011:13:20:12   7481        6420            oraagent.bin@sample.com (TNS V1-V3)
597      10-03-2011:15:36:50   28148      27237          oraagent.bin@sample.com (TNS V1-V3)
882      10-03-2011:15:36:50   28151      27237          oraagent.bin@sample.com (TNS V1-V3)
450      10-03-2011:19:54:06   29750      28921          oraagent.bin@sample.com (TNS V1-V3)
1030    10-03-2011:19:54:06   29749      28921          oraagent.bin@sample.com (TNS V1-V3)
884      10-03-2011:21:39:50   32278      31873          oraagent.bin@sample.com (TNS V1-V3)
741      10-03-2011:21:39:50   32272      31873          oraagent.bin@sample.com (TNS V1-V3)
1016    10-03-2011:22:43:32   16557      16077          oraagent.bin@sample.com (TNS V1-V3)
446      10-03-2011:23:17:10   5454        4483            oraagent.bin@sample.com (TNS V1-V3)
13        10-03-2011:23:17:10   5458        4483            oraagent.bin@sample.com (TNS V1-V3)
306      12-03-2011:15:35:10   24931      24769          oraagent.bin@sample.com (TNS V1-V3)
590      13-03-2011:02:18:55   30230      29814          oraagent.bin@sample.com (TNS V1-V3)
441      13-03-2011:02:18:55   30232      29814          oraagent.bin@sample.com (TNS V1-V3)
875      13-03-2011:02:41:28   17367      1044            oraagent.bin@sample.com (TNS V1-V3)

These hung processes numbers will build up over time and can cause issues along the lines of maximum  process limits to be exhausted on the system.ORA-00020: maximum number of processes (xxx) exceeded, might be triggered.
PS output shows the processes are local, ie bequeath connection from the same machine.
$ ps -ef |grep -i 1044
oracle 1044 1 0 Mar15 ? 00:00:00 oracleOITM21 (DESCRIPTION=(LOCAL =YES)(ADDRESS=(PROTOCOL=beq)))

LSOF on the process will show multiple pipes for the process.
$ lsof -p 1044 | grep pipe
oracle 1044 oracle 188r FIFO 0,6 187781116 pipe
oracle 1044 oracle 189w FIFO 0,6 187781116 pipe
oracle 1044 oracle 212r FIFO 0,6 187781117 pipe
oracle 1044 oracle 213r FIFO 0,6 187781062 pipe
oracle 1044 oracle 214w FIFO 0,6 187781062 pipe
oracle 1044 oracle 215r FIFO 0,6 187781063 pipe
oracle 1044 oracle 216w FIFO 0,6 187781063 pipe
oracle 1044 oracle 217w FIFO 0,6 187781117 pipe
oracle 1044 oracle 218r FIFO 0,6 187781139 pipe
oracle 1044 oracle 221w FIFO 0,6 187781140 pipe

Strace of hung process shows
1044 07:35:57.232994 read(30, <unfinished ...>

A further sympton can be that the operating system process is seen, but there is no entry in the dictionary for the session.
Changes
Upgrade to 11.2.0.2
Cause
The cause of this problem has been identified in RAC Bug:11877079 Hundreds of oraagent.bin@hostname sessions in 11.2.0.2 database.
Solution
As the hung process are local , they are using IPC protocol. Timeout parameters such as DCD or send and receive timeout, can not be used. These are only for TCP.

Workaround would be to remove the process from operating system level, ie a kill -9 <PID>
Issue will be fixed in the patch set release 11.2.0.3

Download patch for your operating system as required. Oracle Developement recommended more than one patch to be installed.

At the time of writing the following are available

Linux
o MergePatch:11938226 for 11.2.0.2.1 (11804954 11744313 11871469 11877079)
o MergePatch:12334600 for 11.2.0.2.1 GI Bundle 1 (10299006 11877079)
o MergePatch:12347844 for 11.2.0.2.2 (11744313 10299006 11069614 1168409 11871469 11977079)
o MergePatch:12531175 for 11.2.0.2.2 + PSU ( 11877079 10299006)

Linux and Solaris
o MergePatch:12362639 for 11.2.0.2.2 GI bundle 2 (11877079 10299006)
References
NOTE:1287496.1 - Many Connections From oraagent.bin to ASM or Database Instance (Likely ORA-00020)
BUG:11877079 - HUNDREDS OF ORAAGENT.BIN@HOSTNAME SESSSIONS IN 11.2.0.2 DATABASE