故障描述
最近遷移中遇到一個free block 問題。模擬記錄。
xtts遷移備份報錯(測試環(huán)境數(shù)據(jù)):
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf
原因分析
之前確實有壞塊,因為沒有rman備份,無法直接修復。使用event 10231和dbms.repair跳過,遷移前也已經(jīng)drop。這個網(wǎng)上教程很多,處理方式也有很多,但都沒有后續(xù)。使用上述辦法數(shù)據(jù)層次上已經(jīng)正常,其實數(shù)據(jù)修復后這個塊只是不在具體對象中,是一個free block,在下一次format前仍是壞塊。
xtts調用rman,rman對空塊不備份并不是直接跳過,也會檢測。
!!!再次說明rman備份的重要性。有一個rman就不用這么大費周章。
處理思路:
1、 bbed制造一個空塊替換 (慎用)
2、 allocate extent datafile方式覆蓋 (推薦)
3、 resize或10g環(huán)境中的truncate或許有用,但都是有限制的,本來也不是針對這個問題的處理方式。成功率并不高
故障模擬
說的有點繞,模擬一下就完事了。
模擬壞塊及沒有rman備份的修復
1、新建一個表插入6行數(shù)據(jù)。通過 rowid查看測試數(shù)據(jù)的文件號(rel_fno)、塊號( blockno ,)和行號(rowno)
select rowid,
dbms_rowid.rowid_relative_fno(rowid) rel_fno,
dbms_rowid.rowid_block_number(rowid) blockno,
dbms_rowid.rowid_row_number(rowid) rowno
from t11;
ROWID REL_FNO BLOCKNO ROWNO
------------------ ---------- ---------- ----------
AAAMpCAAFAAAAAOAAA 5 14 0
AAAMpCAAFAAAAAOAAB 5 14 1
AAAMpCAAFAAAAAOAAC 5 14 2
AAAMpCAAFAAAAAOAAD 5 14 3
AAAMpCAAFAAAAAOAAE 5 14 4
AAAMpCAAFAAAAAOAAF 5 14 5
2、使用bbed對file#=5,block#=14進行破壞
bbed網(wǎng)上教程也很多,簡單記錄下:
set dba 5,14
dump /v dba 5,14 offset 0
modify /x 12345678 dba 5,14 offset 0 --修改
sum dba 5,14 apply 提交
3、測試數(shù)據(jù)是否正常
alter system flush buffer_cache; -數(shù)據(jù)已經(jīng)緩存到內存中,手工刷新,不然不會報錯
查詢提示壞塊:
09:42:48 SQL> select * from t11;
select * from t11
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 14)
ORA-01110: data file 5: '/u01/app/oracle/oradata/orcl10/t1.dbf'
4、修復壞塊
修復壞塊方法很多:
- event 10231
- dbms.repair
- truncate 重新導入
- drop 重建
- rebulid 索引重建
09:42:57 SQL> truncate table t11;
Table truncated.
09:43:04 SQL> select * from t11;
no rows selected
確實可以了查詢了,之前查行號顯示6行數(shù)據(jù),這里0行,沒有備份的修復代價就是這部分數(shù)據(jù)丟失。再次提醒備份的重要性。
網(wǎng)上大部分的教程也都是這一部分,基本到這就結束了。這個工作真的徹底完成了嗎?
5、dbv與rman檢測
- DBA檢測
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 09:47:02 2020
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
Page 14 is marked corrupt
Corrupt block relative dba: 0x0140000e (file 5, block 14)
Bad header found during dbv:
Data in bad block:
type: 18 format: 4 rdba: 0x0140000e
last change scn: 0x0000.000959ff seq: 0x8 flg: 0x06
spare1: 0x56 spare2: 0x78 spare3: 0x0
consistency value in tail: 0x59ff0608
check value in block header: 0x588c
computed block checksum: 0x0
DBVERIFY - Verification complete
Total Pages Examined : 1280
Total Pages Processed (Data) : 4
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 11
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 1264
**Total Pages Marked Corrupt : 1**
Total Pages Influx : 0
Highest block SCN : 613140 (0.613140)
- RMAN檢測
backup validate check logical datafile 5;
--校驗后查詢
09:50:36 SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
5 14 1 0 CORRUPT
--還不死心,直接備份:
RMAN> backup datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/24/2020 09:45:47
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/orcl10/t1.dbf
基本驗證了上述理論。重建方式或手工標記只是解決了數(shù)據(jù)上的。壞塊還是存在的,被置為free block。如果確認以后不再使用rman的情況下,也可以這么做。
模擬故障處理
1、根據(jù)以上dbv或rman信息確定壞塊信息
- 文件號: FILE#=5
- 塊 號: BLOCK#=14
- 文件名: /u01/app/oracle/oradata/orcl10/t1.dbf --T1表空間
2、檢查塊是否是某個對象的一部分
這個查詢在大庫中會很慢,嘗試從v$bh中抓取失敗,等一會。
select segment_name, segment_type, owner
from dba_extents
where file_id = 5
and 14 between block_id
and block_id + blocks -1;
no rows selected --dba_extents已經(jīng)查不到了,不屬于任何對象
3、如果塊不屬于任何對象,查詢dba_free_space確認塊是否屬于數(shù)據(jù)文件的可用空間
Select * from dba_free_space where file_id= 5
and 14 between block_id and block_id + blocks -1;
TABLESPACE_NAME FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
T1 5 9 65536 8 5
運行以下SQL,確認塊是位于可用空間中還是已占用空間中
這sql也很慢,模擬的時候可以執(zhí)行找到原因,方便理解。生產(chǎn)修復可以略過。
set lines 200 pages 10000
col segment_name format a30
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
, 'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
, 'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
ORDER BY file#, corr_start_block#;
--最后列,free block
OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_NAME FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION
------------------------------ ------------------ ------------------------------ ------------------------------ ---------- ----------------- --------------- ---------------- --------------
5 14 14 1 Free Block
4、創(chuàng)建中間表
創(chuàng)建一個表 ,該表位于出現(xiàn)壞塊的表空間T1中,使用nologging選項,避免生成redo
conn t1/t1
create table s (n number,c varchar2(4000)) nologging;
--驗證表是否創(chuàng)建在了正確的表空間
select segment_name,tablespace_name from user_segments where segment_name='S' ;
SEGMENT_NAME TABLESPACE_NAME
------------------------------ ------------------------------
S T1
5、在表上創(chuàng)建觸發(fā)器
根據(jù)dbv的結果輸入變量,塊號14 ,數(shù)據(jù)文件5
conn / as sysdba
CREATE OR REPLACE TRIGGER corrupt_trigger
AFTER INSERT ON t1.s
REFERENCING OLD AS p_old NEW AS new_p
FOR EACH ROW
DECLARE
corrupt EXCEPTION;
BEGIN
IF (dbms_rowid.rowid_block_number(:new_p.rowid)=&blocknumber)
and (dbms_rowid.rowid_relative_fno(:new_p.rowid)=&filenumber) THEN
RAISE corrupt;
END IF;
EXCEPTION
WHEN corrupt THEN
RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted');
END;
/
6、計算分配空間
Select BYTES/1024 from dba_free_space where file_id=5 and 1280 between block_id and block_id + blocks -1;
BYTES/1024
----------
64
為受影響的數(shù)據(jù)文件中的表分配空間,這里計算下來是64K。
7、 關閉datafile 自動擴展
先查詢下,壞塊修復在改回原值保持一致。
select FILE_NAME,AUTOEXTENSIBLE from dba_data_files where FILE_NAME='/u01/app/oracle/oradata/orcl10/t1.dbf';
alter database datafile '/u01/app/oracle/oradata/orcl10/t1.dbf' autoextend off;
8、分配空間
alter table t1.s allocate extent (DATAFILE '/u01/app/oracle/oradata/orcl10/t1.dbf' SIZE 64K);
如果在這個數(shù)據(jù)文件中有多個空閑extent,或者計算的分配空間很大,也可以使用這個循環(huán):
BEGIN
for i in 1..1000000 loop
EXECUTE IMMEDIATE 'alter table t1.s allocate extent (DATAFILE '||'''/u01/app/oracle/oradata/orcl10/t1.dbf''' ||'SIZE 64K) ';
end loop;
end ;
/
使用這個循環(huán)也有一個好處,第9步中的查詢會很慢(測試環(huán)境可能幾秒鐘跑完了,實際環(huán)境中跑了接近2小時),使用for 循環(huán)可以避免多次查詢,由于datafile自動擴展關了,受限于datafile 32G并不會造成太多數(shù)據(jù)。
9、確認壞塊成為新建表的一部分
select segment_name, segment_type, owner
from dba_extents
where file_id = 5
and 1280 between block_id
and block_id + blocks -1 ;
SEGMENT_NAME SEGMENT_TYPE OWNER
------------------------------ ------------------ ------------------------------
S TABLE T1
10、向表中插入數(shù)據(jù)格式化塊
BEGIN
FOR i IN 1..1000000000 LOOP
INSERT /*+ APPEND */ INTO t1.s select i, lpad('REFORMAT',3092, 'R') from dual;
commit ;
END LOOP;
END;
每向表中插入一行就會觸發(fā)觸發(fā)器,當向壞塊中插入第一行數(shù)據(jù)的時候,會產(chǎn)生ORA-2000異常 。
ERROR at line 1:
ORA-20000: Corrupt block has been formatted
ORA-06512: at “SYS.CORRUPT_TRIGGER”, line 10
ORA-04088: error during execution of trigger ‘SYS.CORRUPT_TRIGGER’
ORA-06512: at line 4
11、 重新驗證
直接dbv不行,要先使用rman在用dbv就正常了
- RMAN
RMAN> backup validate check logical datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=159 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 24-NOV-20
SQL> select * from v$database_block_corruption;
no rows selected
RMAN> backup datafile 5;
Starting backup at 24-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=145 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl10/t1.dbf
channel ORA_DISK_1: starting piece 1 at 24-NOV-20
channel ORA_DISK_1: finished piece 1 at 24-NOV-20
piece handle=/u01/app/oracle/product/10.2.0/db_1/dbs/05vganpf_1_1 tag=TAG20201124T110415 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 24-NOV-20
- DBV
[oracle@rhel6 ~]$ dbv file='/u01/app/oracle/oradata/orcl10/t1.dbf'
DBVERIFY: Release 10.2.0.4.0 - Production on Tue Nov 24 11:03:53 2020
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl10/t1.dbf
DBVERIFY - Verification complete
Total Pages Examined : 7680
Total Pages Processed (Data) : 1147
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 134
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 6399
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 631762 (0.631762)
12、清理
drop table,切換日志和checkpoint后刪除觸發(fā)器
drop table scott.s;
alter system switch logfile;
alter system switch logfile;
alter system checkpoint;
DROP trigger corrupt_trigger ;




