問題現象
客戶在執行一條查詢語句的時候,數據庫報出ORA-01578錯誤
報錯截圖如下:

問題初步分析
通過錯誤可以看出,很明顯是因為SQL訪問到的數據塊(file#=7,block#=113906)損壞導致報錯。
客戶反饋出問題的是一個開發庫,因該庫多次處理過該庫的問題,我對這個庫的大概信息有所了解,該庫沒有備份,數據也不是很重要。
修復損壞的數據塊,最好的方式是通過備份集,執行blockrecover來修復,數據不會丟失。或者dump該塊的結構,通過bbed進行修復(技能要求較高)。
既然數據不是很重要,如果損壞的就這一個塊,那么最快修復的方式是跳過損壞的數據塊,然后重建表,一個數據塊的數據損失對一個開發數據庫來講,幾乎可以忽略不計。
與客戶簡單溝通后,同意該方案后開干。
問題處理
(1)檢查數據文件
通過backup validate檢查數據文件,看是否還有其它block損壞,也可以通過dbv來檢查。
RMAN> backup validate check logical datafile 7;
Starting backup at 2022-04-14 16:33:38
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1119 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:07:25
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7 OK 1 52893 4194302 18110743018595
File Name: /u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1399377
Index 0 2569304
Other 0 172728
Finished backup at 2022-04-14 16:41:04
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE
---------- ---------- ---------- ------------------ ---------------------------
7 113906 1 1.8111E+13 CORRUPT
可以看到該文件只有一個數據塊損壞。
(2)確認對象
select OWNER,
SEGMENT_NAME,
PARTITION_NAME,
SEGMENT_TYPE,
TABLESPACE_NAME,
FILE_ID,
BLOCK_ID,
BLOCKS,
RELATIVE_FNO
from dba_extents
where file_id = 7
AND 113906 BETWEEN block_id AND block_id + blocks - 1;
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME FILE_ID BLOCK_ID BLOCKS RELATIVE_FNO
--------------- -------------------- -------------------- -------------------- ------------------ ---------- ---------- ---------- ------------
PROJECTLIB GLF_BAL TABLE GFMIS_SYS_DATA 7 113904 8 7
對象名與報錯SQL語句中要查的對象一致。
(3)跳過壞塊
這一步,我們可以參考MOS中提供的幾種方法,任選一種來處理:
ORA-1578 Methods to Skip Block Corruption (Doc ID 2199133.1)
我這里采用DBMS_REPAIR.SKIP_CORRUPT_BLOCKS程序來進行跳過。
SQL> col table_name for a15
SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='GLF_BAL';
OWNER TABLE_NAME SKIP_CORRUPT
--------------- --------------- ------------------------
PROJECTLIB GLF_BAL DISABLED
SQL> BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
(
SCHEMA_NAME => 'PROJECTLIB',
OBJECT_NAME => 'GLF_BAL',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.SKIP_FLAG
);
END;
/ 2 3 4 5 6 7 8 9 10
PL/SQL procedure successfully completed.
執行后可以通過dba_tables里的SKIP_CORRUPT字段來驗證。
SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='GLF_BAL';
OWNER TABLE_NAME SKIP_CORRUPT
--------------- --------------- ------------------------
PROJECTLIB GLF_BAL ENABLED
官方文檔中關于SKIP_CORRUPT字段的解釋如下,一看就明白。
Indicates whether Oracle Database ignores blocks marked corrupt during table and index scans (ENABLED) or raises an error (DISABLED). To enable this feature, run the DBMS_REPAIR.SKIP_CORRUPT_BLOCKS procedure.
4. 驗證
標記壞塊后,查詢之前報錯SQL句正常了。
但是觀察alert log,發現還是在報錯。
Thu Apr 14 17:17:08 2022
Errors in file /u01/app/oracle/diag/rdbms/yktdb/yktdb/trace/yktdb_smon_22147.trc (incident=819024):
ORA-01578: ORACLE data block corrupted (file # 7, block # 113906)
ORA-01110: data file 7: '/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/yktdb/yktdb/incident/incdir_819024/yktdb_smon_22147_i819024.trc
ORACLE Instance yktdb (pid = 15) - Error 1578 encountered while recovering transaction (58, 11) on object 172902.
Errors in file /u01/app/oracle/diag/rdbms/yktdb/yktdb/trace/yktdb_smon_22147.trc:
ORA-01578: ORACLE data block corrupted (file # 7, block # 113906)
ORA-01110: data file 7: '/u01/app/oracle/oradata/yktdb/GFMIS_SYS_DATA_2.dbf'
Thu Apr 14 17:17:09 2022
Dumping diagnostic data in directory=[cdmp_20220414171709], requested by (instance=1, osid=22147 (SMON)), summary=[incident=819024].
從alert log里的報錯來看,錯誤是smon進程拋出來的,while recovering transaction表明正在進行事務恢復。查詢可以跳過,但是事務恢復不能,符合常理。
5. 繼續處理
不得不說MOS實在太強大了,這個問題如下文檔有完整的記錄:
ORA-01578 Encountered while Recovering Transaction on Object (Doc ID 2470377.1)
(1)為了避免smon導致數據庫崩潰,設置_smu_debug_mode
SQL> alter system set "_smu_debug_mode"=1024;
設置"_smu_debug_mode"=1024來停止事務恢復,該參數可以動態修改。
(2)導出表
[oracle@yth132 ~]$ exp \'\/ as sysdba\' tables=PROJECTLIB.GLF_BAL log=GLF_BAL.log
Export: Release 11.2.0.4.0 - Production on Fri Apr 15 18:03:59 2022
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in AL32UTF8 character set and AL16UTF16 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)
About to export specified tables via Conventional Path ...
Current user changed to PROJECTLIB
. . exporting table GLF_BAL
Export terminated successfully without warnings.
(3)刪除表
SQL> drop table PROJECTLIB.GLF_BAL;
Table dropped.
(4)導入表
[oracle@yth132 ~]$ imp \'\/ as sysdba\' log=imp_GLF_BAL.log file=GLF_BAL.dmp IGNORE=Y fromuser=PROJECTLIB touser=PROJECTLIB
Import: Release 11.2.0.4.0 - Production on Fri Apr 15 18:24:37 2022
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export file created by EXPORT:V11.02.00 via conventional path
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. importing PROJECTLIB's objects into PROJECTLIB
. . importing table "GLF_BAL" 273155 rows imported
Import terminated successfully without warnings.
(5)取消表上的SKIP CORRUPT標記
SQL> BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
(
SCHEMA_NAME => 'PROJECTLIB',
OBJECT_NAME => 'GLF_BAL',
FLAGS => dbms_repair.NOSKIP_FLAG
);
END; 2 3 4 5 6 7 8
9 /
PL/SQL procedure successfully completed.
(6)重置_smu_debug_mode參數
SQL> alter system reset "_smu_debug_mode";
System altered.
完畢后再觀察alert log,沒有再報錯。




