


環(huán)境說明:
DB:Oracle 12.1.2.0 ADG
OS:Windows NT Version V6.2
問題現(xiàn)象:
Oracle數(shù)據(jù)庫自動(dòng)宕機(jī),告警日志如下:
Archived Log entry 174990 added for thread 1 sequence 804504 ID 0x59d7346b dest 1:
Thu Jul 31 15:30:02 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC, 00007FFEB886A5BC]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_21380.trc (incident=790563) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790563\cjc_m000_21380_i790563.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:30:23 2025
Dumping diagnostic data in directory=[cdmp_20250731153023], requested by (instance=1, osid=21380 (M000)), summary=[incident=790563].
Thu Jul 31 15:30:32 2025
Sweep [inc][790563]: completed
Sweep [inc2][790563]: completed
Thu Jul 31 15:31:07 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331, 00007FFEB886A331]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_21772.trc (incident=790875) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A331] [ACCESS_VIOLATION] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790875\cjc_m000_21772_i790875.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:31:14 2025
Dumping diagnostic data in directory=[cdmp_20250731153114], requested by (instance=1, osid=21772 (M000)), summary=[incident=790875].
Thu Jul 31 15:31:31 2025
Sweep [inc][790875]: completed
Sweep [inc2][790875]: completed
Thu Jul 31 15:32:05 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x119E2AAC88] [PC:0x7FFEB886A5BC, 00007FFEB886A5BC]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_16864.trc (incident=789667) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x119E2AAC88] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789667\cjc_m000_16864_i789667.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:32:12 2025
Dumping diagnostic data in directory=[cdmp_20250731153212], requested by (instance=1, osid=16864 (M000)), summary=[incident=789667].
Thu Jul 31 15:32:32 2025
Sweep [inc][789667]: completed
Sweep [inc2][789667]: completed
Thu Jul 31 15:33:04 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x1089EF0000] [PC:0x7FFEB886A323, 00007FFEB886A323]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_5000.trc (incident=789763) (PDBNAME=CDB$ROOT):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A323] [ACCESS_VIOLATION] [ADDR:0x1089EF0000] [PC:0x7FFEB886A323] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789763\cjc_m000_5000_i789763.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:33:14 2025
Dumping diagnostic data in directory=[cdmp_20250731153314], requested by (instance=1, osid=5000 (M000)), summary=[incident=789763].
Thu Jul 31 15:33:34 2025
Sweep [inc][789763]: completed
Sweep [inc2][789763]: completed
Thu Jul 31 15:33:54 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x74] [PC:0x7FF662D84E1F, qecinisub()+63]
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_34180.trc (incident=791099) (PDBNAME=PDBcjc):
ORA-07445: 出現(xiàn)異常錯(cuò)誤: 核心轉(zhuǎn)儲(chǔ) [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791099\cjc_ora_34180_i791099.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_34180.trc (incident=791100) (PDBNAME=PDBcjc):
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-07445: 出現(xiàn)異常錯(cuò)誤: 核心轉(zhuǎn)儲(chǔ) [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791100\cjc_ora_34180_i791100.trc
Thu Jul 31 15:34:04 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc (incident=789691) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789691\cjc_m000_12928_i789691.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:05 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_791099\cjc_ora_34180_i791099.trc:
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-07445: 出現(xiàn)異常錯(cuò)誤: 核心轉(zhuǎn)儲(chǔ) [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
Thu Jul 31 15:34:05 2025
Dumping diagnostic data in directory=[cdmp_20250731153405], requested by (instance=1, osid=34180), summary=[incident=791100].
Thu Jul 31 15:34:13 2025
Sweep [inc][789691]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc (incident=789692) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_789692\cjc_m000_12928_i789692.trc
Sweep [inc][791100]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:18 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_m000_12928.trc:
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
Thu Jul 31 15:34:19 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc (incident=788659) (PDBNAME=PDBcjc):
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788659\cjc_pmon_25648_i788659.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sweep [inc][791099]: completed
Sweep [inc2][791100]: completed
Sweep [inc2][789691]: completed
Sweep [inc][789692]: completed
Sweep [inc][788659]: completed
Sweep [inc2][789692]: completed
Thu Jul 31 15:34:55 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc:
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc (incident=788660) (PDBNAME=CDB$ROOT):
KSBRDP-472 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788660\cjc_pmon_25648_i788660.trc
Thu Jul 31 15:34:56 2025
Dumping diagnostic data in directory=[cdmp_20250731153456], requested by (instance=1, osid=25648 (PMON)), summary=[incident=788659].
Thu Jul 31 15:34:58 2025
Errors in file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_8428.trc (incident=790107) (PDBNAME=PDBcjc):
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790107\cjc_ora_8428_i790107.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jul 31 15:34:59 2025
USER (ospid: 25648): terminating the instance due to error 472
Thu Jul 31 15:35:13 2025
System state dump requested by (instance=1, osid=25648 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
Thu Jul 31 15:35:50 2025
Instance terminated by USER, pid = 25648
問題分析:
問題期間,出現(xiàn)多個(gè)ORA-00600,ORA-07445報(bào)錯(cuò):
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A5BC] [ACCESS_VIOLATION] [ADDR:0x1181C97878] [PC:0x7FFEB886A5BC] [UNABLE_TO_WRITE] []
ORA-07445: exception encountered: core dump [PC:0x7FFEB886A331] [ACCESS_VIOLATION] [ADDR:0x11AA8B44F0] [PC:0x7FFEB886A331] [UNABLE_TO_WRITE] []
ORA-07445: 出現(xiàn)異常錯(cuò)誤: 核心轉(zhuǎn)儲(chǔ) [qecinisub()+63] [ACCESS_VIOLATION] [ADDR:0x74] [PC:0x7FF662D84E1F] [UNABLE_TO_READ] []
ORA-00600: internal error code, arguments: [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [qkexrXCopn1], [0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []
最后,PMON進(jìn)程異常
System state dump requested by (instance=1, osid=25648 (PMON)), summary=[abnormal instance termination].
自動(dòng)生成了SSD日志:
System State dumped to trace file E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
Thu Jul 31 15:35:50 2025
Instance terminated by USER, pid = 25648
問題原因:
宕機(jī)前最后一個(gè)600錯(cuò)誤是:
ORA-00600: 內(nèi)部錯(cuò)誤代碼, 參數(shù): [kghfrh:ds], [0x13159E5C8], [], [], [], [], [], [], [], [], [], []
根據(jù),在MOS上進(jìn)行搜索(support.oracle.com):
ORA-600/ORA-7445/ORA-700 Error Look-up Tool(Doc ID 153788.1)


沒有搜到?
直接搜索關(guān)鍵字:ORA-00600: [kghfrh:ds]

ORA-600 [kghfrh:ds] (Doc ID 300602.1) 有 27個(gè)bug。

根據(jù) 12.1.0.2 和 Crash關(guān)鍵字進(jìn)行搜索,沒有完全吻合的,和 Bug 18388363,Bug 22243719相似。
Bug 22243719 - Several Internal Errors Due to Shared Pool Memory Corruptions in 11.2.0.4 and Later. Instance May Crash (Doc ID 22243719.8)
MOS內(nèi)容如下:
Instance Termination with ORA-07445 [kghsrch()+144], ORA-00600 [kghfrh:ds] (Doc ID 2128933.1)




解決方案:
結(jié)合多篇mos文章:
永久解決方案:
打 12.1.0.2 最新補(bǔ)丁、或升級(jí)到12.2、或升級(jí)到19C(最徹底)。
臨時(shí)解決方案:
alter system set "_enable_shared_pool_durations"=false scope=spfile;
重啟實(shí)例生效,不一定能解決
日志分析:
需要哪些日志:
1.alert_cjc.log
2.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_pmon_25648.trc
3.E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788660\cjc_pmon_25648_i788660.trc
4.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_ora_8428.trc
5.E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_790107\cjc_ora_8428_i790107.trc
6.E:\APP\CJC\diag\rdbms\cjc\cjc\trace\cjc_diag_25976_20250731153513.trc
通過 cjc_pmon_25648.trc 進(jìn)行分析:
基本信息:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Windows NT Version V6.2
CPU : 16 - type 8664, 16 Physical Cores
Process Affinity : 0x0x0000000000000000
Memory (Avail/Total): Ph:46765M/130237M, Ph+PgF:81254M/168109M
Instance name: cjc
Redo thread mounted by this instance: 1
Oracle process number: 2
Windows thread id: 25648, image: ORACLE.EXE (PMON)
Oracle版本:12.1.0.2。
操作系統(tǒng):Windows Server 2012(NT 6.2)。
資源:16核CPU,物理內(nèi)存130GB(可用47GB),無嚴(yán)重內(nèi)存壓力。
進(jìn)程:PMON(進(jìn)程號(hào)2),負(fù)責(zé)清理失效進(jìn)程和會(huì)話。
容器:操作發(fā)生在PDBcjc(Container ID=3)。
錯(cuò)誤觸發(fā)點(diǎn):
Incident 788659 created, dump file: E:\APP\CJC\diag\rdbms\cjc\cjc\incident\incdir_788659\cjc_pmon_25648_i788659.trc
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x1314CD428], [], [], [], [], [], [], [], [], [], []
PMON: fatal error while deleting s.o. 0000000580827100 in this tree:
----------------------------------------
SO: 0x000000106900CCD0, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x000000106900CCD0, name=process, file=ksu.h LINE:14165, pg=0 conuid=0
(process) Oracle pid:307, ser:251, calls cur/top: 0x0000000FE5880488/0x0000000FE5880488
flags : (0x1) DEAD icon_uid:0
flags2: (0x8000), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 37 sec ago
Total Cleanup attempts = 1, Total time = 0 sec,
Cleanup timer = 0.000000 sec
Last Cleanup attempt (full) started 37 sec ago, Length = in progress,
Cleanup timer = (Total = 0.000000 sec, Current = 0.000000 sec, Timeouts = 0)
(post info) last post received: 140 0 2
last post received-location: ksl2.h LINE:3108 ID:kslpsr
last process to post me: 0x105901aae8 1 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:290 ID:ksasnd
last process posted by me: 0x105901aae8 1 6
waiter on post event: 0
(latch info) hold_bits=0x0
Process Group: DEFAULT, pseudo proc: 0x0000001059163D78
O/S info: user: SYSTEM, term: CJC-SVR, ospid: 34180 (DEAD)
錯(cuò)誤代碼:ORA-00600 [kghfrh:ds]。
kghfrh: Kernel Generic Heap Free Heap,Oracle內(nèi)存管理模塊。
ds: Data Structure,表示操作涉及損壞的內(nèi)存數(shù)據(jù)結(jié)構(gòu)。
參數(shù) 0x1314CD428 是目標(biāo)內(nèi)存地址,指向一個(gè)無效的堆結(jié)構(gòu)。
觸發(fā)場(chǎng)景:PMON嘗試刪除一個(gè)死亡進(jìn)程(PID 307,OS PID 34180)。該進(jìn)程已標(biāo)記為DEAD(死亡37秒),但清理過程中訪問了無效內(nèi)存。
根本原因:共享池(Shared Pool)中的庫對(duì)象(Library Object)損壞或并發(fā)沖突,導(dǎo)致PMON無法安全釋放內(nèi)存。
詳細(xì)說明:
PMON: fatal error while deleting s.o. 0000000580827100 in this tree:
(1)PMON: 進(jìn)程監(jiān)視器 (Process Monitor),Oracle 后臺(tái)核心進(jìn)程之一,負(fù)責(zé)清理失敗的用戶進(jìn)程、回滾事務(wù)、釋放鎖和資源等。
(2)fatal error: 表明 PMON 在執(zhí)行清理任務(wù)時(shí)遇到了一個(gè)嚴(yán)重、無法恢復(fù)的錯(cuò)誤。
(3)deleting s.o.: s.o. 通常指代 Session Object 或更廣義的 Server Process Object。PMON 正在嘗試從內(nèi)部?jī)?nèi)存結(jié)構(gòu)(樹狀結(jié)構(gòu))中刪除這個(gè)對(duì)象。
(4)0000000580827100: 這是 PMON 試圖刪除的那個(gè)特定 s.o. 對(duì)象在 SGA 內(nèi)存中的十六進(jìn)制地址。
(5)in this tree: 指這個(gè) s.o. 對(duì)象所屬的 Oracle 內(nèi)部?jī)?nèi)存管理結(jié)構(gòu)(通常是一個(gè)樹狀結(jié)構(gòu),如空閑列表或活動(dòng)列表)。PMON 在遍歷或操作這個(gè)結(jié)構(gòu)時(shí)遇到了問題。
被清理的進(jìn)程對(duì)象 (SO - Session Object / Server Process Object) 詳細(xì)信息:
SO: 0x000000106900CCD0, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
(1)SO: 0x000000106900CCD0: 這是描述目標(biāo)進(jìn)程(Oracle pid 307)的主 Session Object 結(jié)構(gòu)在 SGA 中的內(nèi)存地址。它與 PMON 試圖刪除的 s.o. (0000000580827100) 是不同的對(duì)象,但緊密相關(guān)(s.o. 很可能是 SO 結(jié)構(gòu)內(nèi)部或關(guān)聯(lián)的一個(gè)組件)。
(2)type: 2: 對(duì)象類型代碼。2 通常代表 PROCESS 類型,即一個(gè)服務(wù)器進(jìn)程或后臺(tái)進(jìn)程對(duì)象。
(3)owner: 0x0000000000000000: 對(duì)象的所有者。0 通常表示該對(duì)象當(dāng)前沒有被任何其他結(jié)構(gòu)(如 latch, enqueue)顯式鎖定或持有。
(4)flag: INIT/-/-/0x00: 對(duì)象的狀態(tài)標(biāo)志。
(5)INIT: 表示對(duì)象處于初始化狀態(tài)。這在一個(gè)正在被清理的死進(jìn)程對(duì)象上出現(xiàn)是異常且關(guān)鍵的。PMON 在清理過程中期望該對(duì)象處于某種可清理狀態(tài)(如 DEAD),但它卡在了 INIT 狀態(tài),這很可能就是導(dǎo)致 fatal error 的根本原因。
(6)-/-/0x00: 其他標(biāo)志位未設(shè)置或?yàn)榭铡?br />
(7)if: 0x3: Instance Flags (實(shí)例標(biāo)志)。0x3 的具體含義需查內(nèi)部文檔,通常表示該對(duì)象與當(dāng)前實(shí)例相關(guān)。
(8)c: 0x3: 引用計(jì)數(shù) (Reference Count)。0x3 表示有 3 個(gè)地方引用了這個(gè) SO 結(jié)構(gòu)。PMON 需要等待引用計(jì)數(shù)降為 0 才能安全釋放它。如果計(jì)數(shù)因某種原因無法下降,就會(huì)導(dǎo)致清理掛起。
proc=0x000000106900CCD0, name=process, file=ksu.h LINE:14165, pg=0 conuid=0
(1)確認(rèn) proc 指向同一個(gè) SO 地址 (0x000000106900CCD0)。
(2)name=process: 再次確認(rèn)這是一個(gè)進(jìn)程對(duì)象。
(3)file=ksu.h LINE:14165: 指示這個(gè) SO 結(jié)構(gòu)是在內(nèi)核服務(wù)工具層 (Kernel Services Utility) 的 ksu.h 頭文件第 14165 行附近定義的。
(4)pg=0: Process Group ID (進(jìn)程組 ID) 為 0,通常指默認(rèn)進(jìn)程組。
(5)conuid=0: Container UID (容器用戶 ID) 為 0,通常表示屬于根容器 (CDB$ROOT),或者在不涉及多租戶的環(huán)境中未使用。
(process) Oracle pid:307, ser:251, calls cur/top: 0x0000000FE5880488/0x0000000FE5880488
(1)Oracle pid:307: 該進(jìn)程在 Oracle 實(shí)例內(nèi)部的進(jìn)程標(biāo)識(shí)符。
(2)ser:251: 序列號(hào) (Serial#)。與 pid 一起唯一標(biāo)識(shí)一個(gè)進(jìn)程(例如在 V$PROCESS 視圖中)。307,251 就是這個(gè)進(jìn)程的唯一標(biāo)識(shí)。
(3)calls cur/top: 0x0000000FE5880488/0x0000000FE5880488: 指向當(dāng)前調(diào)用堆棧和頂層調(diào)用堆棧的內(nèi)存地址。兩者地址相同 (0x0000000FE5880488),表明該進(jìn)程在死亡時(shí)沒有活動(dòng)的 SQL 調(diào)用堆棧在執(zhí)行。它處于空閑或等待狀態(tài)。這對(duì)于 PMON 清理來說通常是個(gè)好跡象,因?yàn)橐馕吨鴽]有未完成的事務(wù)需要復(fù)雜回滾。
flags : (0x1) DEAD icon_uid:0
(1)flags: (0x1) DEAD: 這是最重要的標(biāo)志之一。0x1 對(duì)應(yīng) DEAD 狀態(tài)位被置位。這明確告訴 Oracle 內(nèi)核(包括 PMON)這個(gè)進(jìn)程已經(jīng)被操作系統(tǒng)終止或內(nèi)部檢測(cè)為失效,需要被清理。
(2)icon_uid:0: 實(shí)例連接 UID (Instance Connection UID),通常為 0。
intr error: 0, call error: 0, sess error: 0, txn error 0
這些計(jì)數(shù)器記錄進(jìn)程生命周期中發(fā)生的特定類型的錯(cuò)誤次數(shù)(中斷錯(cuò)誤、調(diào)用錯(cuò)誤、會(huì)話錯(cuò)誤、事務(wù)錯(cuò)誤)。全為 0 表明這個(gè)進(jìn)程在死亡前沒有記錄到這類內(nèi)部錯(cuò)誤。它的死亡可能是由外部因素(如 OOM Killer, kill -9, 網(wǎng)絡(luò)中斷)或未捕獲的內(nèi)部問題導(dǎo)致的。
intr queue: empty
中斷隊(duì)列為空。表明該進(jìn)程在死亡時(shí)沒有待處理的中斷請(qǐng)求。這也簡(jiǎn)化了清理工作。
ksudlp FALSE at location: 0
(1)ksudlp 是 Kernel Service User Process Dead Process Cleanup 的縮寫,這是 PMON 用來清理死進(jìn)程的核心函數(shù)。
(2)FALSE: 表明 ksudlp 函數(shù)在某個(gè)檢查點(diǎn)返回了 FALSE,意味著它未能成功完成對(duì)該進(jìn)程的清理。
(3)at location: 0: 指示在 ksudlp 函數(shù)內(nèi)部哪個(gè)代碼點(diǎn)返回了 FALSE。0 通常代表函數(shù)入口或一個(gè)非常早期的檢查點(diǎn)。這說明清理在剛開始嘗試時(shí)就失敗了,甚至沒能取得進(jìn)展。這與 SO 狀態(tài)卡在 INIT 相吻合。
清理嘗試的詳細(xì)統(tǒng)計(jì)信息:
Cleanup details:
Found dead = 37 sec ago
PMON 在 37 秒前 首次檢測(cè)到或被告知該進(jìn)程 (pid 307, ser 251) 已經(jīng)死亡 (DEAD)。
Total Cleanup attempts = 1, Total time = 0 sec,
Cleanup timer = 0.000000 sec
(1)PMON 到目前為止只嘗試了 1 次 清理這個(gè)進(jìn)程。
(2)這 1 次 嘗試消耗的總時(shí)間是 0 秒。這非常異常!表明清理嘗試幾乎立即失敗或被阻止,沒有花費(fèi)任何實(shí)際清理時(shí)間。這與 ksudlp FALSE at location: 0 的瞬間失敗一致。
(3)Cleanup timer 記錄了本次清理的計(jì)時(shí)器值(也是 0 秒)。
Last Cleanup attempt (full) started 37 sec ago, Length = in progress,
Cleanup timer = (Total = 0.000000 sec, Current = 0.000000 sec, Timeouts = 0)
最后一次(也是唯一一次)清理嘗試:
(1)Started 37 sec ago: 開始于 37 秒前(與 Found dead 時(shí)間相同,表明一發(fā)現(xiàn)死亡就立即嘗試清理)。
(2)Length = in progress: 狀態(tài)顯示為 in progress (進(jìn)行中)。這是關(guān)鍵矛盾點(diǎn)。雖然計(jì)時(shí)器顯示耗時(shí) 0 秒且 ksudlp 返回了 FALSE,但 PMON 的內(nèi)部狀態(tài)機(jī)卻認(rèn)為這次清理嘗試仍在進(jìn)行中。這種狀態(tài)不一致是導(dǎo)致 fatal error 的直接原因。PMON 無法推進(jìn)清理,也無法放棄。
(3)Cleanup timer: 再次確認(rèn)總耗時(shí) 0 秒,當(dāng)前耗時(shí) 0 秒,超時(shí)次數(shù)為 0。
進(jìn)程間通信 (Post) 信息:
(post info) last post received: 140 0 2
last post received-location: ksl2.h LINE:3108 ID:kslpsr
記錄該進(jìn)程最后一次接收到的其他進(jìn)程的“Post”信號(hào)(一種進(jìn)程間通信機(jī)制,用于喚醒等待)。
(1)140 0 2: 包含發(fā)送者信息(pid?/sender id?)和 post 類型的編碼。需內(nèi)部解析。
(2)ksl2.h LINE:3108 ID:kslpsr: 接收發(fā)生在 kslpsr (Kernel Service Latch Post Receive) 函數(shù)中。這表明該進(jìn)程死亡前可能在等待一個(gè) latch 或 post 事件。
last process to post me: 0x105901aae8 1 6
最后一次向該進(jìn)程發(fā)送 post 的源進(jìn)程的 SO 地址 (0x105901aae8) 及其標(biāo)識(shí)信息 (1 6,可能是 pid/ser 或其他)。
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:290 ID:ksasnd
記錄該進(jìn)程最后一次發(fā)送給其他進(jìn)程的“Post”信號(hào)。
0 0 26: 包含接收者信息和 post 類型的編碼。需內(nèi)部解析。
ksa2.h LINE:290 ID:ksasnd: 發(fā)送發(fā)生在 ksasnd (Kernel Service Asynchronous Send) 函數(shù)中。
last process posted by me: 0x105901aae8 1 6
該進(jìn)程最后一次發(fā)送 post 的目標(biāo)進(jìn)程的 SO 地址 (0x105901aae8) 及其標(biāo)識(shí)信息 (1 6)。注意: 這個(gè)地址 (0x105901aae8) 和標(biāo)識(shí) (1 6) 與 last process to post me 中的完全相同。這表明 pid 307 在死亡前與 pid=1, ser=6 (很可能是 PMON 本身或其他關(guān)鍵后臺(tái)進(jìn)程) 有雙向的 post 通信。
waiter on post event: 0
該進(jìn)程在死亡時(shí)沒有在等待任何 post 事件。這與 last post received 顯示它之前確實(shí)接收過 post 不矛盾,只是表明在死亡那一刻它沒有處于等待狀態(tài)。
Latch 信息:
(latch info) hold_bits=0x0
hold_bits: 0x0: 表明該進(jìn)程在死亡時(shí)沒有持有任何 latch。這是好消息,意味著它的死亡不會(huì)直接導(dǎo)致 latch 泄漏或阻塞其他需要這些 latch 的進(jìn)程。PMON 清理時(shí)不需要處理 latch 釋放問題。
進(jìn)程組信息:
Process Group: DEFAULT, pseudo proc: 0x0000001059163D78
Process Group: DEFAULT: 該進(jìn)程屬于默認(rèn)進(jìn)程組。
pseudo proc: 0x0000001059163D78: 指向其所屬進(jìn)程組的“偽進(jìn)程”結(jié)構(gòu)在 SGA 中的地址。
操作系統(tǒng) (O/S) 信息:
O/S info: user: SYSTEM, term: CJC-SVR, ospid: 34180 (DEAD)
user: SYSTEM: 該進(jìn)程在操作系統(tǒng)層面是由 SYSTEM 用戶(或運(yùn)行 Oracle 軟件的操作系統(tǒng)用戶)運(yùn)行的。
term: CJCTEST-SVR: 終端或客戶端信息(可能不準(zhǔn)確,特別是對(duì)于后臺(tái)進(jìn)程)。這里顯示 CJCTEST-SVR,可能是服務(wù)器主機(jī)名或一個(gè)標(biāo)識(shí)符。
ospid: 34180 (DEAD): 該進(jìn)程的操作系統(tǒng)進(jìn)程 ID (OSPID) 是 34180。(DEAD) 標(biāo)簽明確確認(rèn) Oracle 檢測(cè)到操作系統(tǒng)進(jìn)程 34180 已經(jīng)不存在了。這是進(jìn)程狀態(tài)
被標(biāo)記為 DEAD 的直接證據(jù)。常見原因包括:
(1)被操作系統(tǒng) OOM (Out-Of-Memory) Killer 終止。
(2)被 kill -9 命令強(qiáng)制終止。
(3)進(jìn)程自身崩潰 (Segmentation Fault, Abort 等)。
(4)網(wǎng)絡(luò)連接中斷導(dǎo)致操作系統(tǒng)關(guān)閉連接。
(5)操作系統(tǒng)重啟或資源限制。
service name: pdbcjc
client details:
O/S info: user: sys_cjc, term: CJC-SVR, ospid: 33496:2
machine: IIS APPPOOL\CJC-SVR program: w3wp.exe
Current Wait Stack:
Not in wait; last wait ended 1 min 1 sec ago
Wait State:
fixed_waits=0 flags=0x21 boundary=0x0000000000000000/-1
Session Wait History:
elapsed time of 1 min 1 sec since last wait
0: waited for 'SQL*Net message from client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=6 seq_num=7 snap_id=1
wait times: snap=0.005437 sec, exc=0.005437 sec, total=0.005437 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000003 sec of elapsed time
1: waited for 'SQL*Net message to client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=5 seq_num=6 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000017 sec of elapsed time
2: waited for 'SQL*Net message from client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=4 seq_num=5 snap_id=1
wait times: snap=0.000481 sec, exc=0.000481 sec, total=0.000481 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000038 sec of elapsed time
3: waited for 'SQL*Net message to client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=3 seq_num=4 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000036 sec of elapsed time
4: waited for 'log file sync'
buffer#=0xca56, sync scn=0xecdc383, =0x0
wait_id=2 seq_num=3 snap_id=1
wait times: snap=0.067455 sec, exc=0.067455 sec, total=0.067455 sec
wait times: max=infinite
wait counts: calls=1 os=1
occurred after 0.005086 sec of elapsed time
5: waited for 'SQL*Net message from client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=1 seq_num=2 snap_id=1
wait times: snap=0.045936 sec, exc=0.045936 sec, total=0.045936 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000016 sec of elapsed time
6: waited for 'SQL*Net message to client'
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=0 seq_num=1 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
服務(wù)與客戶端信息
service name: pdbcjc
該進(jìn)程連接的數(shù)據(jù)庫服務(wù)名為 pdbcjc(通常是一個(gè)PDB級(jí)別的服務(wù)名),表明連接來自特定應(yīng)用或租戶。
client details:
O/S info: user: sys_cjc, term: CJC-SVR, ospid: 33496:2
machine: IIS APPPOOL\CJC-SVR program: w3wp.exe
客戶端詳細(xì)信息:
user: sys_cjc:操作系統(tǒng)用戶名為 sys_cjc(可能是應(yīng)用服務(wù)賬戶)。
term: CJC-SVR:終端/主機(jī)名為 CJC-SVR。
ospid: 33496:2:客戶端進(jìn)程的操作系統(tǒng)PID為 33496,2 可能是線程ID或子進(jìn)程標(biāo)識(shí)。
machine: IIS APPPOOL\CJC-SVR:客戶端機(jī)器標(biāo)識(shí)為 IIS應(yīng)用程序池(微軟Web服務(wù)器),主機(jī)名 CJC-SVR。
program: w3wp.exe:客戶端程序?yàn)?IIS工作進(jìn)程(用于運(yùn)行ASP.NET等Web應(yīng)用),表明這是一個(gè)Web應(yīng)用發(fā)起的數(shù)據(jù)庫連接。
關(guān)鍵結(jié)論:該Oracle進(jìn)程由運(yùn)行在IIS上的Web應(yīng)用(ASP.NET等)創(chuàng)建,通過服務(wù)名 pdbcjc 連接到PDB。
當(dāng)前等待狀態(tài)
Current Wait Stack:
Not in wait; last wait ended 1 min 1 sec ago
當(dāng)前無等待:進(jìn)程在死亡時(shí)未處于任何等待狀態(tài)。
上次等待結(jié)束時(shí)間:最后一次等待結(jié)束于 1分1秒前,表明進(jìn)程在死亡前已空閑超過1分鐘。
會(huì)話等待歷史 (Session Wait History)
按時(shí)間倒序列出死亡前的最后7個(gè)等待事件(0=最近,6=最早):
0. 最近等待:SQLNet message from client (結(jié)束于1分鐘前)
waited for 'SQLNet message from client’
driver id=0x28444553, #bytes=0x1, =0x0
wait times: snap=0.005437 sec, total=0.005437 sec
occurred after 0.000003 sec of elapsed time
事件:等待客戶端發(fā)送請(qǐng)求(SQL*Net message from client)。
驅(qū)動(dòng):0x28444553 = (DES)(Oracle TNS驅(qū)動(dòng))。
參數(shù):預(yù)期接收1字節(jié)數(shù)據(jù)(心跳包或控制信號(hào)),實(shí)際收到0字節(jié)。
等待時(shí)間:5.4毫秒(短暫等待)。
發(fā)生時(shí)機(jī):會(huì)話空閑后極短時(shí)間內(nèi)觸發(fā)(0.000003秒)。
關(guān)鍵事件:log file sync (事務(wù)提交等待)
waited for ‘log file sync’
buffer#=0xca56, sync scn=0xecdc383
wait times: snap=0.067455 sec, total=0.067455 sec
wait counts: calls=1 os=1
occurred after 0.005086 sec of elapsed time
事件:log file sync - 提交事務(wù)時(shí)等待重做日志寫入磁盤。
參數(shù):
buffer#=0xca56:重做日志緩沖區(qū)編號(hào)(十進(jìn)制 51798)。
sync scn=0xecdc383:事務(wù)提交的SCN(System Change Number)。
等待時(shí)間:67.5毫秒(顯著高于其他事件)。
系統(tǒng)調(diào)用:calls=1 os=1 表明涉及磁盤I/O操作。
影響:這是Web應(yīng)用執(zhí)行的最后一次有效操作(事務(wù)提交)。
總結(jié):
進(jìn)程異常死亡DEAD—>PMON清理死進(jìn)程—>嘗試釋放INVL游標(biāo)—>共享池內(nèi)存損壞kghfrh:ds—>實(shí)例自動(dòng)終止 error 472
參考:
Instance Termination with ORA-07445 [kghsrch()+144], ORA-00600 [kghfrh:ds] (Doc ID 2128933.1)
歡迎關(guān)注我的公眾號(hào)《IT小Chen》




