Troubleshooting ORA-12516


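When I got to work this morning, a colleague told me he couldn't connect to the database and was getting "ORA-12516". I tried connecting remotely with PL/SQL Developer and, sure enough, got "ORA-12516: TNS:listener could not find available handler with matching protocol stack". I then logged on to the server through Remote Desktop and tried to connect to the database as sys, and got the same error. Strange: everything had been fine when I left work yesterday.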

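I searched online. This error is usually caused by the database running out of available sessions, and two parameters are involved: processes and sessions. I wanted to check both, but I couldn't even log in as sys, which was nerve-racking. On a friend's advice I followed the steps below and solved the problem.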

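a. Stop the listener to block new connections;
b. Kill some or all of the LOCAL=NO processes (depending on how critical each application is), just enough that sys can log in;
c. Log in, find out which application is misbehaving, and kill that user's processes;
d. Check the database;
e. Start the listener.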
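On Unix/Linux, every dedicated server process serving a remote client shows up in ps as something like "oracleSID (LOCAL=NO)", so step b is usually done by picking those PIDs out of ps -ef. The sketch below only does the filtering and prints kill commands for review rather than running them; the function name local_no_pids and the sample ps lines are made up for illustration. Note that this step does not carry over directly to Windows, where Oracle runs each session as a thread inside the single ORACLE.EXE process (the trace files further down show "image: ORACLE.EXE" with a Windows thread id); there you would use orakill, or, as I did, close the client applications.

```python
from typing import List


def local_no_pids(ps_output: str) -> List[int]:
    """Return PIDs of Oracle server processes that serve remote (listener) clients.

    Assumes `ps -ef` style output where the PID is the second column.
    """
    pids: List[int] = []
    for line in ps_output.splitlines():
        # Listener-spawned dedicated servers carry (LOCAL=NO) on their command line.
        if "(LOCAL=NO)" not in line:
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    # Hypothetical sample output for an instance named hoegh.
    sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
oracle    4101     1  0 08:01 ?        00:00:00 ora_pmon_hoegh
oracle    5230     1  0 08:15 ?        00:00:01 oraclehoegh (LOCAL=NO)
oracle    5231     1  0 08:15 ?        00:00:00 oraclehoegh (LOCAL=NO)
oracle    5302  5290  0 08:20 pts/0    00:00:00 oraclehoegh (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
"""
    # Print the commands for a human to review instead of killing anything.
    for pid in local_no_pids(sample):
        print(f"kill -9 {pid}")
```

Killing only a few of these PIDs, as step b suggests, is enough to free process slots for a sys login; the local bequeath session (LOCAL=YES) is deliberately left alone.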

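A bit about my environment:
Operating system: Windows Server 2008 R2
Database: Oracle 10g (10.2.0.4, 64-bit)

First, I stopped the listener with lsnrctl stop to block new connections, so that the next step could succeed.
Second, I closed two applications that were connected to the database and tried logging in as sys again; this time it worked.
Third, I checked the processes and sessions initialization parameters: 150 and 170 respectively, both defaults.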

SQL>
SQL> show parameter processes

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     0
db_writer_processes                  integer     3
gcs_server_processes                 integer     0
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     150

SQL> show parameter sessions

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size           integer     0
java_soft_sessionspace_limit         integer     0
license_max_sessions                 integer     0
license_sessions_warning             integer     0
logmnr_max_persistent_sessions       integer     1
sessions                             integer     170
shared_server_sessions               integer
SQL>
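Fourth, I ran select sid,serial#,program,terminal from v$session; to list all current sessions. The result had more than a hundred rows, enough to use up the database's connection capacity, and apart from a dozen or so of Oracle's own sessions, the remaining hundred-plus sessions all came from the same terminal. That pinpointed the fault: the terminal was a device that had been installed only the night before.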
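Spotting that one terminal owns most of the sessions is a grouping exercise. In SQL it is roughly select terminal, count(*) from v$session group by terminal order by 2 desc; the Python sketch below does the same on rows already fetched with the query above. The helper names, the 50% threshold, and the sample rows and terminal names are invented for illustration, and how the rows are fetched (SQL*Plus spool, a database driver, etc.) is left out.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One row of: select sid,serial#,program,terminal from v$session;
Row = Tuple[int, int, str, str]


def sessions_per_terminal(rows: List[Row]) -> List[Tuple[str, int]]:
    """Count sessions per terminal, busiest terminal first."""
    counts = Counter(terminal for _sid, _serial, _program, terminal in rows)
    return counts.most_common()


def suspicious_terminal(rows: List[Row], share: float = 0.5) -> Optional[str]:
    """Return the busiest terminal if it holds more than `share` of all sessions."""
    ranked = sessions_per_terminal(rows)
    if not ranked:
        return None
    terminal, n = ranked[0]
    return terminal if n > share * len(rows) else None


if __name__ == "__main__":
    rows = (
        [(1, 1, "ORACLE.EXE (PMON)", "SRV01")]
        + [(100 + i, 7, "app.exe", "TERM-NEW") for i in range(5)]
        + [(200, 3, "plsqldev.exe", "PC-OPS")]
    )
    print(sessions_per_terminal(rows))  # [('TERM-NEW', 5), ('SRV01', 1), ('PC-OPS', 1)]
    print(suspicious_terminal(rows))    # TERM-NEW
```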
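Fifth, I shut down the application on the faulty device and ran select sid,serial#,program,terminal from v$session; again. Only twenty-odd sessions were left, which looked normal given Oracle's own dozen or so sessions plus the few applications that had been started in the meantime.
Sixth, I started the listener again and tried connecting from other clients. Everything worked, and the outage was over.
Next, I wanted to find out what had actually caused the outage, so I kept digging.
Seventh, I checked the alert log and found a large number of "Process m000 died" messages (m000 is a slave process that MMON spawns on demand, so even Oracle's own background work could not get a new process):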

Wed Apr 29 21:27:31 2015
ksvcreate: Process(m000) creation failed
Wed Apr 29 21:28:32 2015
Process m000 died, see its trace file
Wed Apr 29 21:28:32 2015
ksvcreate: Process(m000) creation failed
Wed Apr 29 21:29:33 2015
Process m000 died, see its trace file
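With hundreds of lines like these, it helps to see exactly when the failures started and how often they repeat. A minimal sketch, assuming the 10g alert log layout shown above (a timestamp line followed by the message); find_events is a hypothetical helper name, and the two message prefixes are copied from the excerpt.

```python
import re
from datetime import datetime
from typing import List, Optional, Sequence, Tuple

# 10g alert log timestamp lines look like: "Wed Apr 29 21:27:31 2015"
TS_RE = re.compile(r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} [ \d]?\d \d{2}:\d{2}:\d{2} \d{4}$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

DEFAULT_PATTERNS = (
    "Process m000 died",
    "ksvcreate: Process(m000) creation failed",
)


def find_events(
    alert_text: str, patterns: Sequence[str] = DEFAULT_PATTERNS
) -> List[Tuple[Optional[datetime], str]]:
    """Return (timestamp, matched pattern) for every alert log line starting with a pattern."""
    events: List[Tuple[Optional[datetime], str]] = []
    current_ts: Optional[datetime] = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS_RE.match(line):
            # Remember the most recent timestamp; messages that follow belong to it.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        for pattern in patterns:
            if line.startswith(pattern):
                events.append((current_ts, pattern))
    return events


if __name__ == "__main__":
    sample = """Wed Apr 29 21:27:31 2015
ksvcreate: Process(m000) creation failed
Wed Apr 29 21:28:32 2015
Process m000 died, see its trace file"""
    for ts, msg in find_events(sample):
        print(ts, msg)
```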
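Eighth, I located the trace file for the matching time and found "ORA-00020: maximum number of processes 150 exceeded" followed by "Died during process startup with error 20 (seq=5413)". So the number of connections had hit the processes limit and the database could not start any new process, which is why connections failed.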

Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Wed Apr 29 21:28:31 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3339M/8181M, Ph+PgF:10815M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5413)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 00:19:05 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3347M/8181M, Ph+PgF:10813M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5582)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 01:27:31 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3350M/8181M, Ph+PgF:10812M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=5650)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

Dump file c:\oracle\product\10.2.0\admin\hoegh\bdump\hoegh_ora_8032.trc
Thu Apr 30 09:54:12 2015
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:3857M/8181M, Ph+PgF:11421M/16361M
Instance name: hoegh
Redo thread mounted by this instance: 1
Oracle process number: 0
Windows thread id: 8032, image: ORACLE.EXE
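Pulling the timestamps and seq numbers out of the trace shows how long the process table stayed full; in the excerpt above it runs from the evening of Apr 29 into the early hours of Apr 30. A minimal sketch, assuming the trace layout shown above; ora20_failures is a hypothetical helper name, and it simply pairs each "Died during process startup with error 20" line with the most recent timestamp and processes limit seen before it.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed Apr 29 21:28:31 2015"
DIED_RE = re.compile(r"Died during process startup with error (\d+) \(seq=(\d+)\)")
# Matches both "processes 150 exceeded" and "processes (150) exceeded".
LIMIT_RE = re.compile(r"ORA-00020: maximum number of processes \(?(\d+)\)? exceeded")


def ora20_failures(trace_text: str) -> List[Tuple[Optional[datetime], Optional[int], int]]:
    """Return (timestamp, processes limit, seq) for each process that died with ORA-00020."""
    failures: List[Tuple[Optional[datetime], Optional[int], int]] = []
    ts: Optional[datetime] = None
    limit: Optional[int] = None
    for raw in trace_text.splitlines():
        line = raw.strip()
        try:
            # A line that parses as a timestamp starts a new dump section.
            ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        m = LIMIT_RE.search(line)
        if m:
            limit = int(m.group(1))
            continue
        m = DIED_RE.search(line)
        if m and m.group(1) == "20":
            failures.append((ts, limit, int(m.group(2))))
    return failures
```

Run against the whole trace file, the result lists each failed process startup in time order, which makes it easy to line up with the m000 messages in the alert log.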
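As for why the new device opened so many connections, I still haven't figured it out. I suspect the operating system: the device runs a trimmed-down build of Windows XP Embedded, and I was told the installation hadn't gone smoothly. When I started the application on the faulty device and watched the sessions in real time with select sid,serial#,program,terminal from v$session;, the session count kept climbing until it hit the limit and the database raised the error, so the problem was reproduced.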
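Reproducing the problem came down to watching the session count climb. Below is a minimal polling sketch; count_sessions is a hypothetical callback that you would back with a query such as select count(*) from v$session where terminal = '<the device>' through whatever driver you use, and the limit, polling interval, and stop rule are all assumptions.

```python
import time
from typing import Callable, List


def watch_session_growth(
    count_sessions: Callable[[], int],
    limit: int,
    interval: float = 5.0,
    max_samples: int = 60,
) -> List[int]:
    """Sample the session count until it reaches `limit`, stops growing, or max_samples is hit."""
    samples: List[int] = []
    for _ in range(max_samples):
        n = count_sessions()
        samples.append(n)
        if n >= limit:
            print(f"limit reached: {n} >= {limit}")
            break
        # Stop if there has been no net growth across the last three intervals.
        if len(samples) >= 4 and samples[-1] <= samples[-4]:
            break
        time.sleep(interval)
    return samples
```

A healthy device levels off and the loop stops early; a leaking one keeps climbing until the limit check fires, which is the behavior described above.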
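We then tested another device with the same hardware configuration and the same operating system, and the problem did not occur. In the end, the only option was to reinstall the operating system on the faulty device.

To sum up, here is how to deal with ORA-12516:
 1. It is usually caused by the database running out of sessions or processes. Depending on business needs, increase the processes and sessions parameters. In Oracle 10g, sessions defaults to sessions = 1.1 * processes + 5 (which is where the 170 above comes from). Both are static parameters, so change them with scope=spfile and restart the instance.
 2. If there are runaway connections like the ones in this case, follow the steps above to find the problem sessions and kill the corresponding processes directly.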
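The two remedies can be sketched as follows. default_sessions just evaluates the 10g default formula from item 1 (for processes=150 it gives the 170 seen earlier), and kill_statements builds alter system kill session 'sid,serial#' statements from the (sid, serial#, program, terminal) rows returned by the v$session query. Both helper names and the sample rows are hypothetical, rounding for processes values that are not multiples of 10 is an assumption, and the statements are only printed for review, not executed.

```python
from typing import List, Tuple

Row = Tuple[int, int, str, str]  # (sid, serial#, program, terminal)


def default_sessions(processes: int) -> int:
    """Oracle 10g default: sessions = 1.1 * processes + 5.

    Integer arithmetic avoids float noise (1.1 * 150 == 165.00000000000003);
    truncation for values that are not multiples of 10 is an assumption.
    """
    return (11 * processes) // 10 + 5


def kill_statements(rows: List[Row], terminal: str) -> List[str]:
    """Build one ALTER SYSTEM KILL SESSION statement per session from `terminal`."""
    return [
        f"alter system kill session '{sid},{serial}';"
        for sid, serial, _program, term in rows
        if term == terminal
    ]


if __name__ == "__main__":
    print(default_sessions(150))  # 170
    rows = [(101, 7, "app.exe", "TERM-NEW"), (5, 1, "ORACLE.EXE", "SRV01")]
    for stmt in kill_statements(rows, "TERM-NEW"):
        print(stmt)
```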
