ORA-00600问题排查与分析实例


昨天处理了一起ORA-00600的错误,其中也经历了各种曲折,真是雾里看花,看透了之后发现很多问题都是有原因的。起初是开发说有一个job运行的时候报错了,Oracle数据库版本是11.2.0.2.0

等到问题提交到我这,客户已经检查了一些信息了。但是还是没有结论。
 对于这个问题,我还是照例开始检查数据库日志。
 在那个时间段内里出现了ora-00600的错误。
Wed Jun 10 13:47:17 2015
 Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc  (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
 Wed Jun 10 13:48:04 2015
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.

在这个ora-00600错误前后又碰到了几个奇怪的ora错误,我把错误日志按照时间先后来排列一下。

Thread 1 advanced to log sequence 82829 (LGWR switch)
  Current log# 1 seq# 82829 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A1/redo/redo01A.log
  Current log# 1 seq# 82829 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B1/redo/redo01B.log
 Archived Log entry 82894 added for thread 1 sequence 82828 ID 0xb8c6d509 dest 1:
 Wed Jun 10 07:10:17 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18508.trc.
 Wed Jun 10 07:10:21 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18593.trc.
 Wed Jun 10 07:12:14 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_16505.trc.
 Wed Jun 10 07:26:54 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18558.trc.
 Wed Jun 10 07:52:47 2015
 Thread 1 advanced to log sequence 82830 (LGWR switch)
  Current log# 2 seq# 82830 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A2/redo/redo02A.log
  Current log# 2 seq# 82830 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B2/redo/redo02B.log

 Wed Jun 10 12:09:40 2015
 Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x210] [PC:0x193F4DF, kxfpqrclb()+71] [flags: 0x0, count: 1]
 Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p135_15651.trc  (incident=2116204):
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
 Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2116204/PRODB_p135_15651_i2116204.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Wed Jun 10 12:09:41 2015
 Dumping diagnostic data in directory=[cdmp_20150610120941], requested by (instance=1, osid=15651 (P135)), summary=[incident=2116204].
 Wed Jun 10 12:09:44 2015
 Sweep [inc][2116204]: completed
 Sweep [inc2][2116204]: completed
 Wed Jun 10 12:09:59 2015

 Wed Jun 10 13:46:27 2015
 Archived Log entry 82967 added for thread 1 sequence 82901 ID 0xb8c6d509 dest 1:
 Wed Jun 10 13:47:17 2015
 Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc  (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
 Wed Jun 10 13:48:04 2015

 Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
 Wed Jun 10 14:38:53 2015
 Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc  (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
 Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.

为了更加清晰,我把对应的ORA错误和数据库负载联系在一起。
对于deadlock的错误,很可能是应用死锁造成的,简单查看了下trace日志,做了基本确认,就交给开发去分析这部分了。
从后续的错误情况来看,似乎和后续的问题没有直接关系。我们暂时先放下这个deaklock的错误。所以在图中没有标注出来。
 其它三个都用红色标注出来。可以看出在负载开始增加的几个时间点里,依次发生了几个ORA错误。

第一个错误。
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
可以从日志信息看出,似乎是和并行是相关的。
 对于这个错误。在metalink 中查到一篇有些相似的文章。
ORA-07445:[kxfpqrclb()+72] [SIGSEGV] And ORA-10382 (Doc ID 1987833.1)
看了solution让我有些失望,需要升级到12.2版本。

SOLUTION

Unpublished Bug 16682786 : HIT ORA-7445 [KXFPQRCLB+1691] WHEN RUN SHARED CURSOR TEST

The base bug is fixed in 12.2 release.

Check patch downling link for availability of patch, if patch does not exists then create an SR with oracle support with output of "opatch lsinventory -details" to get a patch.. 
暂时也不明朗,继续跳过看下一个错误。
第二个错误
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
对于这个问题,metalink中确实有一个相关的文章 Query Fails with ORA-00600: Internal Error Code, Arguments: [srsnext_3] (Doc ID 1589589.1)
但是对于这个问题,提供的solution让我也有些无奈。因为段时间内确实没有升级的计划。

CAUSE

Bug 11852469 : TS11.2.0.3V3 - TRC - SRSNEXT.

Rediscovery information:

If the srsnext_3 internal error is raised and the query involves statistical functions or other aggregates that are treated as distinct aggregates then you may be encountering this problem.

SOLUTION

Apply patch 11852469 if it exists for your version/platform 

or

Apply patchset 11.2.0.3 where the fix is included

第三个错误。
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
 Wed Jun 10 14:38:53 2015
 Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc  (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
 Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 
 
对于这个错误,metalink中有一篇相关的文章。诊断并解决 ORA-4030 错误 (Doc ID 1548826.1)

从metalink的描述来看,该错误意味着 Oracle Server 进程无法从操作系统分配更多内存。该内存由 PGA(Program Global Area)组成,其内容取决于服务器配置。对于专用的服务器进程,内存包含堆栈以及用于保存用户会话数据、游标信息和排序区的 UGA(User Global Area)。在多线程配置中(共享服务器),UGA 被分配在 SGA(System Global Area)中,所以在这种配置下 UGA 不是造成 ORA-4030 错误的原因。
因此,ORA-4030 表示进程需要更多内存(堆栈 UGA 或 PGA)来执行其任务。
 看起来是缓存配置出问题了。
 先卖个关子,其实事实并非如此,而且ora-00600的错误,如果不是反复出现,严重影响,是不会直接去考虑打补丁,可能通过一些其它的方式去做。后续会结合一些分析方法来看。

更多详情见请继续阅读下一页的精彩内容:

  • 1
  • 2
  • 下一页

相关内容