一般情况下,ORACLE DBA看到如下情况的第一反应是,数据库实例没有启动或者是数据库环境变量没有设置正确,今天遇到的情况均不是以上两种情况,有点特别,且来看看为哪般。

oracle@POC-SV12-I2KDB:~> sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Aug 13 11:31:58 2015

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> quit

Disconnected.

检查数据库实例是否启动

oracle@POC-SV12-I2KDB:~> ps -ef|grep ora_

oracle    8799     1  0  2011 ?        02:27:30 ora_pmon_inomc

oracle    8801     1  0  2011 ?        00:00:13 ora_vktm_inomc

oracle    8805     1  0  2011 ?        00:00:26 ora_diag_inomc

oracle    8807     1  0  2011 ?        00:00:53 ora_dbrm_inomc

oracle    8809     1  0  2011 ?        00:07:28 ora_psp0_inomc

oracle    8811     1  0  2011 ?        1-22:14:21 ora_dia0_inomc

oracle    8813     1  0  2011 ?        00:00:22 ora_mman_inomc

oracle    8815     1  0  2011 ?        01:09:25 ora_dbw0_inomc

oracle    8817     1  0  2011 ?        00:39:51 ora_lgwr_inomc

oracle    8819     1  0  2011 ?        04:23:53 ora_ckpt_inomc

oracle    8821     1  0  2011 ?        01:35:38 ora_smon_inomc

oracle    8823     1  0  2011 ?        00:00:05 ora_reco_inomc

oracle    8825     1  0  2011 ?        00:16:42 ora_mmon_inomc

oracle    8827     1  0  2011 ?        02:24:41 ora_mmnl_inomc

oracle    8829     1  0  2011 ?        00:00:07 ora_d000_inomc

oracle    8831     1  0  2011 ?        00:00:07 ora_s000_inomc

oracle    8875     1  0  2011 ?        00:03:26 ora_arc0_inomc

oracle    8877     1  0  2011 ?        00:03:27 ora_arc1_inomc

oracle    8879     1  0  2011 ?        00:03:28 ora_arc2_inomc

oracle    8881     1  0  2011 ?        00:00:16 ora_arc3_inomc

oracle    8883     1  0  2011 ?        00:00:16 ora_fbda_inomc

oracle    8885     1  0  2011 ?        00:00:10 ora_qmnc_inomc

oracle    8901     1  0  2011 ?        00:00:05 ora_q000_inomc

oracle    8903     1  0  2011 ?        00:08:33 ora_q001_inomc

oracle   10895     1  0  2011 ?        01:02:28 ora_cjq0_inomc

oracle   10965     1  0  2011 ?        00:00:26 ora_smco_inomc

oracle   23221 22576  0 11:31 pts/2    00:00:00 grep ora_

检查数据库监听状态是否正常

oracle@POC-SV12-I2KDB:~> lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 13-AUG-2015 11:31:34

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))

STATUS of the LISTENER

------------------------

Alias                     LISTENER

Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production

Start Date                17-SEP-2011 14:59:03

Uptime                    431 days 15 hr. 36 min. 44 sec

Trace Level               off

Security                  ON: Local OS Authentication

SNMP                      OFF

Listener Parameter File   /opt/oracle/app/product/11g/db/network/admin/listener.ora

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=POC-SV12-I2KDB)(PORT=1521)))

Services Summary...

Service "inomc" has 1 instance(s).

Instance "inomc", status READY, has 1 handler(s) for this service...

Service "inomcXDB" has 1 instance(s).

Instance "inomc", status READY, has 1 handler(s) for this service...

Service "inomc_XPT" has 1 instance(s).

Instance "inomc", status READY, has 1 handler(s) for this service...

The command completed successfully

看吧,数据库实例状态、监听状态都是正常的,但是ORACLE数据库用户SYS登录就是不能一如往常登录进去。

再排除一下环境变量设置不当(如果同一数据库安装多个实例,ORACLE环境变量又ORACLE_SID配置不当是有可能出现SYS登录空闲实例的)的问题

oracle@POC-SV12-I2KDB:/opt/oracle/app/product/11g/db> pwd

/opt/oracle/app/product/11g/db

oracle@POC-SV12-I2KDB:/opt/oracle/app/product/11g/db> ps -ef|grep smon

oracle    8821     1  0  2011 ?        01:35:38 ora_smon_inomc

oracle   32267 22576  0 11:36 pts/2    00:00:00 grep smon

oracle@POC-SV12-I2KDB:~> echo $ORACLE_HOME

/opt/oracle/app/product/11g/db

oracle@POC-SV12-I2KDB:~> echo $ORACLE_SID

inomc

数据库环境变量也都是正常配置了的,那我接下来尝试使用业务用户在本地登录看数据库报什么错误:

oracle@POC-SV12-I2KDB:/opt/oracle/app/product/11g/db/network/admin> sqlplus imap/imap

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Aug 13 11:42:21 2015

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

ERROR:

ORA-01034: ORACLE not available

ORA-27102: out of memory

Linux-x86_64 Error: 12: Cannot allocate memory

Additional information: 1

Additional information: 819200

Additional information: 8

Process ID: 0

Session ID: 0 Serial number: 0

这回很明显了,数据库报内存溢出了,接下来再看下当前数据库连接到数据库实例的进程,由于当前数据库SYS无法登录,只能ps -ef|grep LOCAL了,结果发现有300个之多。

oracle@POC-SV12-I2KDB:~> ps -ef|grep LOCAL

oracle   28688     1  0  2011 ?        00:03:05 oracleinomc (LOCAL=NO)

oracle   28690     1  0  2011 ?        00:00:00 oracleinomc (LOCAL=NO)

oracle   28737     1  0  2011 ?        00:00:00 oracleinomc (LOCAL=NO)

oracle   28762     1  0  2011 ?        00:00:00 oracleinomc (LOCAL=NO)

oracle   28782     1  0  2011 ?        00:00:16 oracleinomc (LOCAL=NO)

oracle   28785     1  0  2011 ?        00:00:16 oracleinomc (LOCAL=NO)

oracle   28791     1  0  2011 ?        00:00:16 oracleinomc (LOCAL=NO)

oracle   28795     1  0  2011 ?        00:00:16 oracleinomc (LOCAL=NO)

oracle   28798     1  0  2011 ?        00:00:16 oracleinomc (LOCAL=NO)

oracle   28815     1  0  2011 ?        00:00:00 oracleinomc (LOCAL=NO)

oracle   29217     1  0  2011 ?        00:11:59 oracleinomc (LOCAL=NO)

oracle   29226     1  0  2011 ?        00:09:16 oracleinomc (LOCAL=NO)

.

.

.

再看下数据库服务器的内存状态发现,16G的内存确实已经用完了,SWAP分区也已经使用殆尽了。

oracle@POC-SV12-I2KDB:~> free -t

total       used       free     shared    buffers     cached

Mem:       8096276    8039268   57008     0     195744    1732592

-/+ buffers/cache:    6110932    1985344

Swap:     16779852   16779844         8

Total:    24876128   24819112      57016

与现场业务、系统管理员方沟通后决定杀掉LOCAL=NO的进程释放内存。

oracle@POC-SV12-I2KDB:~> kill -9 28688

oracle@POC-SV12-I2KDB:~> kill -9 28690

oracle@POC-SV12-I2KDB:~> kill -9 28737

oracle@POC-SV12-I2KDB:~> kill -9 28762

oracle@POC-SV12-I2KDB:~> kill -9 28782

oracle@POC-SV12-I2KDB:~> kill -9 28785

oracle@POC-SV12-I2KDB:~> kill -9 28791

oracle@POC-SV12-I2KDB:~> kill -9 28795

oracle@POC-SV12-I2KDB:~> kill -9 28798

oracle@POC-SV12-I2KDB:~> kill -9 28815

oracle@POC-SV12-I2KDB:~> free

total       used       free     shared    buffers     cached

Mem:       8096276    8044760      51516          0     195652    1730480

-/+ buffers/cache:    6118628    1977648

Swap:     16779852   16754328      25524

oracle@POC-SV12-I2KDB:~> free

total       used       free     shared    buffers     cached

Mem:       8096276    8033788      62488          0     194856    1686568

-/+ buffers/cache:    6152364    1943912

Swap:     16779852   16753524      26328

oracle@POC-SV12-I2KDB:~> free

total       used       free     shared    buffers     cached

Mem:       8096276    8028440     67836          0     194856    1686568

-/+ buffers/cache:    6147016    1949260

Swap:     16779852   16753524     26328

oracle@POC-SV12-I2KDB:~> free

total       used       free     shared    buffers     cached

Mem:       8096276    8006900      89376         0     194856    1686568

-/+ buffers/cache:    6125476    1970800

Swap:     16779852   16753524      26328

释放完部分内存,再尝试oracle管理账户SYS本地登录,发现SYS用户能够正常登录了:

oracle@POC-SV12-I2KDB:~> sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Aug 13 11:48:29 2015

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

With the Partitioning, Oracle Label Security, OLAP, Data Mining,

Oracle Database Vault and Real Application Testing options

SQL>

后续分析,真的很奇怪,以前遇见过会话数超、进程数超、内存溢出SYS无法登录的,但是现状也只是到如下就卡主(这是比较明显的,要么是像业务用户本地登录一样直接报内存溢出)

oracle@POC-SV12-I2KDB:~> sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Aug 13 11:48:29 2015

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Connected to:

由于查杀进程前无法登录数据库,无法获知告警日志的确切位置,杀过进程后,查看告警日志,有大量如下报错:

Process J000 died, see its trace file

kkjcre1p: unable to spawn jobq slave process

Errors in file /opt/oracle/app/diag/rdbms/inomc/inomc/trace/inomc_cjq0_10895.trc:

.

.

.

Thu Aug 13 11:45:52 2015

Process W000 died, see its trace file

Process W000 died, see its trace file

Process W000 died, see its trace file

查看kkjcre1p: unable to spawn jobq slave process 的官方解释,说可能是数据库进程数超了,但是数据库告警日志没有报,并且查看数据库进程数设置有1000个之多,会话数设置有1101个之多,根据现有信息分析,不是数据库进程数超或会话数超引起的。

总结:由于数据库服务器内存小,而数据库会话数、进程数设置过大,当数据库进程数急剧增多时,数据库服务器内存耗尽,导致数据库管理账户SYS无法登陆;比较奇怪的是11G的SYS无法登陆不报服务器内存溢出却连接空闲实例,而使用业务用户登录时却报了明显的内存溢出。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29357786/viewspace-1770478/,如需转载,请注明出处,否则将追究法律责任。

Logo

魔乐社区(Modelers.cn) 是一个中立、公益的人工智能社区,提供人工智能工具、模型、数据的托管、展示与应用协同服务,为人工智能开发及爱好者搭建开放的学习交流平台。社区通过理事会方式运作,由全产业链共同建设、共同运营、共同享有,推动国产AI生态繁荣发展。

更多推荐