一、實驗環境
| 序號 | 主機名 | IP地址 | 系統版本 |
|---|---|---|---|
| 1 | ora11g | 192.168.116.3 | Centos7.4 |
二、一鍵安裝ORACLE
為了快速實驗,省去安裝的繁瑣步驟。本次實驗直接一鍵安裝ORACLE(含安裝數據庫軟件、創建監聽、創建實例)
[root@oracle11g soft]# pwd
/soft
[root@oracle11g soft]# ll
total 1900
-rwxrwxr-x 1 oracle oinstall 651780 Nov 5 2021 bison-2.4.1-5.el6.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 19112 Jul 11 16:15 compat-libcap1-1.10-7.el7.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 228841 Nov 5 2021 compat-libstdc++-33-3.2.3-47.3.x86_64.rpm
-rw-r--r-- 1 oracle oinstall 502 Aug 1 11:16 db.rsp
-rwxrwxr-x 1 oracle oinstall 54070 Jun 1 2022 elfutils-libelf-0.137-3.el5.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 24980 Jun 1 2022 elfutils-libelf-devel-0.137-3.el5.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 64118 Jun 1 2022 elfutils-libelf-devel-static-0.137-3.el5.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 291380 Nov 5 2021 flex-2.5.35-9.el6.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 17327 Aug 1 11:22 oracleAllSilent_20230801111436.log
-rwxrwxr-x 1 oracle oinstall 166252 Nov 5 2021 ORACLE-INSTALL.sh
-rwxrwxr-x 1 oracle oinstall 210877 Nov 5 2021 pdksh-5.2.14-37.el5_8.1.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 59332 Nov 5 2021 telnet-0.17-48.el6.x86_64.rpm
-rwxrwxr-x 1 oracle oinstall 124812 Nov 5 2021 xinetd-2.3.14-40.el6.x86_64.rpm
[root@oracle11g soft]# ./ORACLE-INSTALL.sh -i 192.168.116.3
---------------------------------------------------------------------------------------------
安裝過程略
---------------------------------------------------------------------------------------------
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Congratulations, Install Successful! Please Reboot Later.
[oracle@orcl:/home/oracle]$ sqlplus / as sysdba
SQL> shutdown immediate
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;
SQL> alter system switch logfile;
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /archivelog
Oldest online log sequence 3
Next log sequence to archive 5
Current log sequence 5
三、數據庫壞塊實驗
3.1、壞塊介紹
出現壞塊現象是指:在Oracle數據庫的一個或多個數據塊(一個數據塊的容量在創建數據庫時由db_block_size參數指定,缺省為8K)內出現內容混亂的現象。由于正常的數據塊都有固定的合法內容格式,壞塊的出現,導致數據庫進程無法正常解析數據塊的內容,進而使數據庫進程報錯乃至掛起,并級聯導致整個數據庫實例出現異常。
對于 Oracle 數據塊物理損壞的情形,通常可以分為兩種情況:
-
有備份,通過 RMAN 恢復
有備份的情況下,這是很理想的情形,可以直接通過 RMAN 塊介質恢復(BLOCK MEDIA RECOVERY)功能來完成受損塊的恢復。
-
無備份,通過 ORACLE自帶的DBMS_REPAIR 包來修復
?? 注意: 使用 DBMS_REPAIR 包來修復,并非完全恢復,而是標記壞塊,然后不對起進行訪問,這部分被標記的數據也就丟失了,這是無法避免的。
1,壞塊產生的原因
壞塊產生的原因大致有以下幾種:
-
硬件問題
Oracle進程在處理一個數據塊時,首先將其讀入物理內存空間,在處理完成后,再由特定進程將其寫回磁盤;如果在這個過程中,出現內存故障,CPU計算失誤,都會導致內存數據塊的內容混亂,最后反映到寫回磁盤的數據塊內容有誤。同樣,如果存儲子系統出現異常,數據塊損壞也就隨之出現了。
-
操作系統BUG
由于Oracle進程對數據塊的讀寫,都是以操作系統內核調用(system call)的方式完成的,如果操作系統在內核調用存在問題,必然導致Oracle進程寫入非法的內容。
-
操作系統的I/O錯誤或緩沖問題
-
內存或paging問題
-
OracleBUG
Oracle軟件特定版本上,可能出現導致數據塊的內容出現異常BUG。
-
非Oracle進程擾亂Oracle共享內存區域
如上文所述,在當數據塊的內容被讀入主機的物理內存時,如果其他非Oracle進程,對Oracle使用的共享內存區域形成了擾亂,最終導致寫回磁盤的數據塊內容混亂。
-
異常關機,掉電,終止服務
異常關機,掉電,終止服務使進程異常終止,而破壞數據塊的完整性,導致壞塊產生。注:這也是為什么突然斷電會導致數據庫無法啟動
由上可見,壞塊的形成原因復雜。當出現壞塊時,為了找到確切的原因,需要大量的分析時間和排查操作,甚至需要多次重現才能找出根本原因。但當故障發生在生產系統上,我們為了減少停機時間,會盡快實施應急權變措施以保證系統的可用性,這樣就破壞了故障現場,對根本原因的分析因而也更加困難了。
2,壞塊的預防
壞塊問題破壞性大,但并非不可預防。
-
在Metalink.oracle.com網站,Oracle定期發布基于特定軟件版本的“已知問題(known issues)說明”。對于可能導致壞塊的Oracle軟件BUG,在Oracle公司內部,是作為高嚴重級別的問題進行處理,在“已知問題(known issues)說明”中,這些BUG以嚴重(Noticable)問題標出(標記為*或+),部分問題,Oracle還會發布警告(Alert)通告。在文檔中,Oracle會提供相應的補丁或應對措施。
-
Oracle提供恢復工具-Recovery Manager,提供了掃描文件檢查壞塊的功能。
在Recovery Manager界面中,使用:
RMAN> BACKUP CHECK LOGICAL VALIDATE DATAFILE n ;
可以檢查數據文件是否包含壞塊,同時并不產生實際的備份輸出。
-
Dbv工具檢查
因為dbv要求file后面跟的必須是一個文件擴展名,所以如果用裸設備存儲的,就必須使用ln鏈接裸設備到一個文件,然后再用dbv對這個鏈接文件進行檢查。ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE它執行壞塊的檢查,但是不會標記壞塊為corrupt,檢測的結果保存在USER_DUMP_DEST目錄下的用戶trace文件中。
-
利用exp工具導出整個數據庫可以檢測壞塊
對以下情況的壞塊是檢測不出來的:
HWM以上的壞塊是不會發現的
索引中存在的壞塊是不會發現的
數據字典中的壞塊是不會發現的
結合數據庫性能綜合考慮db_block_checksum和db_blockchecking參數。
當我們使用Recovery Manager進行實際的數據庫備份時,同時也就進行了壞塊檢查。但要注意的是,在線使用Recovery Manager掃描壞塊和備份時,需要數
據庫運行在歸檔模式(archive log),否則只能在數據庫未打開的情況下進行。對于操作系統問題和硬件故障,則需要相應廠商的配合支持。同時,避免在數
據庫主機運行其他用戶進程,避免異常停機,也會減少壞塊發生的幾率。
3,壞塊故障的識別
遇到壞塊問題時,數據庫的異常表現通常有:
報告ORA-01578錯誤。
報告Ora-1110錯誤。
報告ORA-00600錯誤,其中,第一個參數為2000-8000,Cache layer 2000 – 4000,Transaction layer 4000 – 6000, layer 6000 - 8000。
Trace文件中出現Corrupt block dba: 0x160c5958 . found。
分析對象失敗。
后臺進程,如DBWR,LGWR出現長時間異常等待,如“LGWR wait for redo copy”。
3.2、模擬標記壞塊方法的恢復
1,準備數據
--創建用于演示的data file
SQL> create tablespace tbs_tmp datafile '/oradata/orcl/tbs_tmp.dbf' size 10m autoextend on;
SQL> create user test identified by test;
SQL> grant dba to test;
SQL> conn test/test
--基于新的數據文件創建對象tb_tmp
SQL> create table tb_tmp tablespace tbs_tmp as select * from dba_objects;
SQL> select count(*) from TB_TMP;
COUNT(*)
----------
86953
SQL> col file_name format a60
SQL> select file_id,file_name from dba_data_files where tablespace_name='TBS_TMP';
FILE_ID FILE_NAME
---------- ------------------------------------------------------------
6 /oradata/orcl/tbs_tmp.dbf
--表對象tb_tmp上的信息,包含對應的文件信息,頭部塊,總塊數
SQL> col SEGMENT_NAME for a20
SQL> select segment_name , header_file , header_block,blocks
from dba_segments
where segment_name = 'TB_TMP' and owner='TEST';
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
-------------------- ----------- ------------ ----------
TB_TMP 6 130 1280
2,模擬壞塊
此處模擬物理壞塊,注意,在沒有備份的情況下,物理壞塊基本無救,但是我們需要對壞塊進行跳過處理。
破壞137 數據塊的內容seek=n從輸出文件開頭跳過 n個blocks 個塊后再開始復制。
conv=notrunc不截短輸出文件
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/tbs_tmp.dbf bs=8192 conv=notrunc seek=137 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000214285 s, 38.2 MB/s
--清空buffer cache
SQL> alter system flush buffer_cache;
--查詢表對相 tb_tmp,收到ORA-01578的報錯
SQL> select count(*) from tb_tmp;
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 137)
ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf'
同時,alert文件也收到對應的報錯信息
Hex dump of (file 6, block 137) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44345.trc
Corrupt block relative dba: 0x01800089 (file 6, block 137)
Completely zero block found during multiblock buffer read
Reading datafile '/oradata/orcl/tbs_tmp.dbf' for corruption at rdba: 0x01800089 (file 6, block 137)
Reread (file 6, block 137) found same corrupt data (no logical check)
Tue Aug 01 14:30:34 2023
Corrupt Block Found
TSN = 7, TSNAME = TBS_TMP
RFN = 6, BLK = 137, RDBA = 25165961
OBJN = 88605, OBJD = 88605, OBJECT = TB_TMP, SUBOBJECT =
SEGMENT OWNER = TEST, SEGMENT TYPE = Table Segment
Corrupt Block Found
TSN = 7, TSNAME = TBS_TMP
RFN = 6, BLK = 137, RDBA = 25165961
OBJN = 88605, OBJD = 88605, OBJECT = TB_TMP, SUBOBJECT =
SEGMENT OWNER = TEST, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44345.trc (incident=5025):
ORA-01578: ORACLE data block corrupted (file # 6, block # 137)
ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5025/orcl_ora_44345_i5025.trc
Tue Aug 01 14:30:35 2023
Sweep [inc][5025]: completed
Hex dump of (file 6, block 137) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5025/orcl_m000_44572_i5025_a.trc
Corrupt block relative dba: 0x01800089 (file 6, block 137)
Completely zero block found during validation
Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data
Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data
Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data
Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data
Reread of blocknum=137, file=/oradata/orcl/tbs_tmp.dbf. found same corrupt data
Tue Aug 01 14:30:35 2023
Dumping diagnostic data in directory=[cdmp_20230801143035], requested by (instance=1, osid=44345), summary=[incident=5025].
Checker run found 1 new persistent data failures
--此時dbv命令已經可以檢查出壞塊了:
[oracle@orcl:/home/oracle]$ dbv file=/oradata/orcl/tbs_tmp.dbf
DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 1 14:32:15 2023
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
DBVERIFY - Verification starting : FILE = /oradata/orcl/tbs_tmp.dbf
Page 137 is marked corrupt
Corrupt block relative dba: 0x01800089 (file 6, block 137)
Completely zero block found during dbv:
DBVERIFY - Verification complete
Total Pages Examined : 1408
Total Pages Processed (Data) : 1241
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 155
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 11
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 997436 (0.997436)
--通過backup validate檢查數據文件,看是否還有其它block損壞
RMAN> backup validate check logical datafile 6;
Starting backup at 01-AUG-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=39 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/oradata/orcl/tbs_tmp.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6 FAILED 0 139 1536 997839
File Name: /oradata/orcl/tbs_tmp.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1241
Index 0 0
Other 1 156
validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_44745.trc for details
Finished backup at 01-AUG-23
--查詢視圖v$database_block_corruption,提示有壞塊,可以看到該文件只有一個數據塊損壞。注意該視圖可能不會返回任何數據,如無返回,先在RMAN執行backup validate。
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
6 137 1 0 ALL ZERO
--查看壞塊的對象,這里很明顯是個表對象,如果是索引可以重建,如果是表,則有可能會丟數據
SQL> set linesize 200
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME
FROM DBA_EXTENTS A
WHERE FILE_ID = 6
AND 137 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
/
OWNER SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME
------------------------------ -------------------- ------------------ ------------------------------
TEST TB_TMP TABLE TBS_TMP
3,跳過壞塊
--產生壞塊的時候,exp導出的時候會發現壞塊同時報錯無法導出
[oracle@orcl:/home/oracle]$ exp test/test tables=tb_tmp file=tb_tmp.dump
EXP-00056: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 6, block # 137)
ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf'
Export terminated successfully with warnings.
--我們可以參考MOS中提供的幾種方法,任選一種來處理:
SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='TB_TMP';
OWNER TABLE_NAME SKIP_COR
------------------------------ --------------- --------
TEST TB_TMP DISABLED
--這里先采用DBMS_REPAIR.SKIP_CORRUPT_BLOCKS程序來將表上進行SKIP CORRUPT標記跳過。
[oracle@orcl:/archivelog]$ sqlplus / as sysdba
BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
(
SCHEMA_NAME => 'TEST',
OBJECT_NAME => 'TB_TMP',
OBJECT_TYPE => dbms_repair.table_object,
FLAGS => dbms_repair.SKIP_FLAG
);
END;
/
PL/SQL procedure successfully completed.
SQL> select owner,table_name,skip_corrupt from dba_tables where table_name='TB_TMP';
OWNER TABLE_NAME SKIP_COR
------------------------------ ------------------------------ --------
TEST TB_TMP ENABLED
4,驗證
上面將TB_TMP表標記壞塊后,查詢之前報錯的SQL句就正常了。
SQL> select count(*) from tb_tmp;
COUNT(*)
----------
86873
再次導出,沒有報錯
[oracle@orcl:/home/oracle]$ exp test/test tables=tb_tmp file=tb_tmp.dump
About to export specified tables via Conventional Path ...
. . exporting table TB_TMP 86873 rows exported
Export terminated successfully without warnings.
--取消表上的SKIP CORRUPT標記
[oracle@orcl:/archivelog]$ sqlplus / as sysdba
BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
(
SCHEMA_NAME => 'TEST',
OBJECT_NAME => 'TB_TMP',
FLAGS => dbms_repair.NOSKIP_FLAG
);
END;
/
PL/SQL procedure successfully completed.
SQL> select count(*) from tb_tmp;
select count(*) from tb_tmp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 137)
ORA-01110: data file 6: '/oradata/orcl/tbs_tmp.dbf'
后臺alert繼續報錯
--刪除表
SQL> conn test/test
SQL> drop table TB_TMP;
--導入表
[oracle@orcl:/home/oracle]$ imp test/test tables=tb_tmp file=tb_tmp.dump
Export file created by EXPORT:V11.02.00 via conventional path
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
. importing TEST's objects into TEST
. importing TEST's objects into TEST
. . importing table "TB_TMP" 86873 rows imported
Import terminated successfully without warnings.
SQL> select count(*) from tb_tmp;
COUNT(*)
----------
86873
3.3、模擬blockrecover方法的恢復
1,準備數據
--創建用于演示的data file
[oracle@orcl:/home/oracle]$ sqlplus / as sysdba
SQL> create tablespace rman_tmp datafile '/oradata/orcl/rman_tmp.dbf' size 100m autoextend on;
SQL> create user test01 identified by test01;
SQL> grant dba to test01;
SQL> conn test01/test01;
--基于新的數據文件創建表對象tb_rman
SQL> create table tb_rman tablespace rman_tmp as select * from dba_objects;
SQL> select count (*) from tb_rman;
COUNT(*)
----------
86953
SQL> col file_name format a60
SQL> select file_id,file_name from dba_data_files where tablespace_name='RMAN_TMP';
FILE_ID FILE_NAME
---------- ------------------------------------------------------------
6 /oradata/orcl/rman_tmp.dbf
--表對象tb_rman上的信息,包含對應的文件信息,頭部塊,總塊數
SQL> col SEGMENT_NAME for a20
SQL> select segment_name , header_file , header_block,blocks
from dba_segments
where segment_name = 'TB_RMAN' and owner='TEST01';
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
-------------------- ----------- ------------ ----------
TB_RMAN 6 130 1280
2,RMAN備份數據庫
[oracle@orcl:/home/oracle]$ rman target /
RMAN> backup full database;
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T160510_ldkh5q28_.bkp tag=TAG20230801T160510 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_ncsnf_TAG20230801T160510_ldkh674r_.bkp tag=TAG20230801T160510 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 01-AUG-23
3,模擬單個壞塊
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=130 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000152197 s, 53.8 MB/s
--清空buffer cache
SQL> alter system flush buffer_cache;
--查詢表tb_rman,收到ORA-01578報錯
SQL> select count (*) from tb_rman;
select count (*) from tb_rman
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
--同時alert也有報錯日志輸出
Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_46480.trc
Corrupt block relative dba: 0x01800082 (file 6, block 130)
Completely zero block found during buffer read
Reading datafile '/oradata/orcl/rman_tmp.dbf' for corruption at rdba: 0x01800082 (file 6, block 130)
Reread (file 6, block 130) found same corrupt data (no logical check)
Tue Aug 01 16:35:04 2023
Corrupt Block Found
TSN = 8, TSNAME = RMAN_TMP
RFN = 6, BLK = 130, RDBA = 25165954
OBJN = 88611, OBJD = 88611, OBJECT = TB_RMAN, SUBOBJECT =
SEGMENT OWNER = TEST01, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_46480.trc (incident=5027):
ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_ora_46480_i5027.trc
Tue Aug 01 16:35:05 2023
Sweep [inc][5027]: completed
Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_m000_46980_i5027_a.trc
Corrupt block relative dba: 0x01800082 (file 6, block 130)
Completely zero block found during validation
Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data
Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data
Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data
Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data
Reread of blocknum=130, file=/oradata/orcl/rman_tmp.dbf. found same corrupt data
Hex dump of (file 6, block 130) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5027/orcl_m000_46980_i5027_a.trc
Corrupt block relative dba: 0x01800082 (file 6, block 130)
Completely zero block found during buffer read
Reading datafile '/oradata/orcl/rman_tmp.dbf' for corruption at rdba: 0x01800082 (file 6, block 130)
Reread (file 6, block 130) found same corrupt data (no logical check)
Corrupt Block Found
TSN = 8, TSNAME = RMAN_TMP
RFN = 6, BLK = 130, RDBA = 25165954
OBJN = -1, OBJD = 88611, OBJECT = RMAN_TMP, SUBOBJECT =
SEGMENT OWNER = , SEGMENT TYPE = Temporary Segment
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_46980.trc (incident=5049):
ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_5049/orcl_m000_46980_i5049.trc
Tue Aug 01 16:35:06 2023
Dumping diagnostic data in directory=[cdmp_20230801163506], requested by (instance=1, osid=46480), summary=[incident=5027].
Checker run found 1 new persistent data failures
Dumping diagnostic data in directory=[cdmp_20230801163507], requested by (instance=1, osid=46980 (M000)), summary=[incident=5049].
----查詢視圖v$database_block_corruption,提示有壞塊,注意該視圖可能不會返回任何數據,如無返回,先執行backup validate
SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
6 130 1 0 ALL ZERO
4,blockrecover恢復單個壞塊
[oracle@orcl:/home/oracle]$ rman target /
RMAN> blockrecover datafile 6 block 130;
Starting recover at 01-AUG-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=47 device type=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp tag=TAG20230801T162916
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 01-AUG-23
--再次查詢表tb_rman正常無報錯
SQL> select count (*) from tb_rman;
COUNT(*)
----------
86953
5,模擬多個數據壞塊
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=133 count=1
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=143 count=1
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=153 count=1
SQL> alter system flush buffer_cache;
--下面提示塊133被損壞,注意我們損壞了多塊數據塊,但查詢時,從塊號最小的開始提示,如133被修復后還有壞塊則繼續提示133之后的壞塊
SQL> select count (*) from tb_rman;
select count (*) from tb_rman
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 133)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
后臺alert也只報了133塊的錯誤
ORA-01578: ORACLE data block corrupted (file # 6, block # 133)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
--查詢視圖v$database_block_corruption無任何記錄
SQL> select * from v$database_block_corruption;
no rows selected
--下面使用backup validate來校驗數據文件
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6 FAILED 0 11403 12800 1002692
File Name: /oradata/orcl/rman_tmp.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1239
Index 0 0
Other 3 158
--再次查詢v$database_block_corruption,表明有3個損壞的塊
SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
6 133 1 0 ALL ZERO
6 143 1 0 ALL ZERO
6 153 1 0 ALL ZERO
--使用dbv工具來校驗數據文件,查出3個壞塊
[oracle@orcl:/home/oracle]$ dbv file=/oradata/orcl/rman_tmp.dbf
DBVERIFY: Release 11.2.0.4.0 - Production on Tue Aug 1 16:48:21 2023
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
DBVERIFY - Verification starting : FILE = /oradata/orcl/rman_tmp.dbf
Page 133 is marked corrupt
Corrupt block relative dba: 0x01800085 (file 6, block 133)
Completely zero block found during dbv:
Page 143 is marked corrupt
Corrupt block relative dba: 0x0180008f (file 6, block 143)
Completely zero block found during dbv:
Page 153 is marked corrupt
Corrupt block relative dba: 0x01800099 (file 6, block 153)
Completely zero block found during dbv:
DBVERIFY - Verification complete
Total Pages Examined : 12800
Total Pages Processed (Data) : 1239
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 155
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 11403
Total Pages Marked Corrupt : 3
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 1002692 (0.1002692)
6,blockrecover恢復多個壞塊
--下面直接使用blockrecover corruption list來恢復,剛剛被校驗的壞塊都會被恢復
[oracle@orcl:/home/oracle]$ rman target /
RMAN> blockrecover corruption list;
Starting recover at 01-AUG-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=42 device type=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_01/o1_mf_nnndf_TAG20230801T162916_ldkjlwgs_.bkp tag=TAG20230801T162916
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 01-AUG-23
--校驗結果,查詢表tb_rman無報錯
SQL> select count (*) from tb_rman;
COUNT(*)
----------
86953
SQL> col OWNER for a20
SQL> col OBJECT_NAME for a20
SQL> set linesize 200
SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id,
dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id
from TEST01.tb_rman where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2;
OBJECT_ID FILE_ID BLOCK_ID OWNER OBJECT_NAME OBJECT_ID
---------- ---------- ---------- -------------------- -------------------- ----------
88611 6 163 PUBLIC GV$BACKUP_SET 2364
88611 6 163 SYS GV_$BACKUP_PIECE 2365
3.4、壞塊的對象定位與影響
--下面我們查詢塊號為163上的對象
SQL> set linesize 200
SQL> col OBJECT_NAME for a20
SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id,
dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id
from test01.tb_rman where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2;
OBJECT_ID FILE_ID BLOCK_ID OWNER OBJECT_NAME OBJECT_ID
---------- ---------- ---------- ------------------------------ -------------------- ----------
88611 6 163 PUBLIC GV$BACKUP_SET 2364
88611 6 163 SYS GV_$BACKUP_PIECE 2365
--使用上面的方法,我們損塊塊163,173
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=163 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000175403 s, 46.7 MB/s
[oracle@orcl:/home/oracle]$ dd if=/dev/zero of=/oradata/orcl/rman_tmp.dbf bs=8192 conv=notrunc seek=173 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000140643 s, 58.2 MB/s
--對于壞塊對象無法進行聚合匯總等操作
SQL> alter system flush buffer_cache;
SQL> select count(*) from test01.tb_rman;
select count(*) from test01.tb_rman
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
--對于壞塊上的記錄無法被查詢,我們使用基于之前查詢到的OBJECT_ID來查詢
SQL> select owner,object_name,object_id from test01.tb_rman where object_id in(2364,2365);
select owner,object_name,object_id from test01.tb_rman where object_id in(2364,2365)
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
ORA-01110: data file 6: '/oradata/orcl/rman_tmp.dbf'
--如下面的查詢,位于損壞塊上的數據無法被查詢到,但對于未損壞的依舊可以查詢。下面的查詢時塊161上的對象
SQL> select owner,object_name,object_id from test01.tb_rman
2 where dbms_rowid.rowid_block_number(rowid)=161 and rownum<3;
OWNER OBJECT_NAME OBJECT_ID
------------------------------ -------------------- ----------
SYS GV_$LATCHNAME 2203
PUBLIC GV$LATCHNAME 2204
--定位受損塊所對應的對象
SQL> col SEGMENT_NAME for a20
SQL> run get_obj_name_from_corrupt_block
1 SELECT tablespace_name,
2 segment_type,
3 owner,
4 segment_name,
5 partition_name
6 FROM dba_extents
7* WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
Enter value for file_id: 6
Enter value for block_id: 163
old 7: WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
new 7: WHERE file_id = 6 AND 163 BETWEEN block_id AND block_id + blocks - 1
TABLESPACE_NAME SEGMENT_TYPE OWNER SEGMENT_NAME PARTITION_NAME
------------------------------ ------------------ ------------------------------ -------------------- ------------------------------
RMAN_TMP TABLE TEST01 TB_RMAN
SQL> run get_obj_name_from_corrupt_block
1 SELECT tablespace_name,
2 segment_type,
3 owner,
4 segment_name,
5 partition_name
6 FROM dba_extents
7* WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
Enter value for file_id: 6
Enter value for block_id: 173
old 7: WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
new 7: WHERE file_id = 6 AND 173 BETWEEN block_id AND block_id + blocks - 1
TABLESPACE_NAME SEGMENT_TYPE OWNER SEGMENT_NAME PARTITION_NAME
------------------------------ ------------------ ------------------------------ -------------------- --------------------------
RMAN_TMP TABLE TEST01 TB_RMAN
--對于損壞的數據文件,缺省情況下,不能對其進行備份,如下
RMAN> backup datafile 6 tag='corruption';
Starting backup at 02-AUG-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=40 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/oradata/orcl/rman_tmp.dbf
channel ORA_DISK_1: starting piece 1 at 02-AUG-23
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/02/2023 09:51:08
ORA-19566: exceeded limit of 0 corrupt blocks for file /oradata/orcl/rman_tmp.dbf
--需要設定允許損壞塊的數量之后才能進行備份
RMAN> run{
2> set maxcorrupt for datafile 6 to 2;
3> backup datafile 6 tag='corruption';
4> }
executing command: SET MAX CORRUPT
Starting backup at 02-AUG-23
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/oradata/orcl/rman_tmp.dbf
channel ORA_DISK_1: starting piece 1 at 02-AUG-23
channel ORA_DISK_1: finished piece 1 at 02-AUG-23
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2023_08_02/o1_mf_nnndf_CORRUPTION_ldmfpw3v_.bkp tag=CORRUPTION comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 02-AUG-23
--查看備份信息如下,應在修復壞塊后重新備份以避免由于保留策略導致先前可用的備份被aged out
RMAN> list backup summary;
List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1 B F A DISK 01-AUG-23 1 1 NO TAG20230801T160510
2 B F A DISK 01-AUG-23 1 1 NO TAG20230801T160510
3 B F A DISK 01-AUG-23 1 1 NO TAG20230801T162916
4 B F A DISK 01-AUG-23 1 1 NO TAG20230801T162916
5 B F A DISK 02-AUG-23 1 1 NO CORRUPTION
3.5、總結
- 對于受損的數據塊,僅僅壞塊上的數據無法被查詢或讀取,其余正常塊的數據依舊可以使用。
- 對于受損的表對象進行聚合等相關運算時收到錯誤提示,因為壞塊上的數據無法被統計。如果你聚合的是索引列,索引未損壞的情形則可正常返回。
- 可以基于RMAN可用的備份文件實現塊介質恢復,其數據文件無需offline,開銷最小,影響最小。
- 對于多個數據塊的損壞,先執行backup validate校驗數據庫或相應的數據文件以便標記受損的壞塊后,填充v$database_block_corruption以及后續恢復。
- 對于使用backup validate 校驗后的情形,壞塊恢復時可以直接使用blockrecover corruption list一次性恢復所有的壞塊。
- 缺省情況下,存在壞塊的數據文件無法成功備份,也會導致自動備份腳本失敗。




