A customer's database had its ASM disk header damaged by a careless operation. The command that was run was similar to:

dd if=/dev/zero of=/dev/dm-29 bs=1024K count=10

The ASM disk group could no longer be mounted and reported ORA-15042:

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:1712:2} */
2026-05-19T21:57:16.517284-04:00
NOTE: cache registered group DATA 5/0xAE103296
NOTE: cache began mount (first) of group DATA 5/0xAE103296
NOTE: Assigning number (5,10) to disk (/dev/dm-29)
…………
NOTE: Assigning number (5,18) to disk (/dev/dm-15)
NOTE: Assigning number (5,0) to disk (/dev/dm-7)
NOTE: Assigning number (5,15) to disk (/dev/dm-5)
2026-05-19T21:57:16.650481-04:00
cluster guid (b216d47e2bf86f6aff34a36119b0c161) generated for PST Hbeat for instance 1
2026-05-19T21:57:22.659262-04:00
NOTE: GMON heartbeating for grp 5 (DATA)
GMON querying group 5 at 28 for pid 42, osid 12996
2026-05-19T21:57:22.661447-04:00
NOTE: Assigning number (5,14) to disk ()
2026-05-19T21:57:22.662476-04:00
GMON querying group 5 at 29 for pid 42, osid 12996
2026-05-19T21:57:22.663144-04:00
NOTE: cache dismounting (clean) group 5/0xAE103296 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 12996, image: oracle@dlycdb1 (TNS V1-V3)
NOTE: dbwr not being msgd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 5/0xAE103296 (DATA)
NOTE: cache ending mount (fail) of group DATA number=5 incarn=0xae103296
NOTE: cache deleting context for group DATA 5/0xae103296
2026-05-19T21:57:22.720673-04:00
GMON dismounting group 5 at 30 for pid 42, osid 12996
2026-05-19T21:57:22.721055-04:00
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
…………
NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0015 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0016 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0017 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0018 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "14" is missing from group number "5"
2026-05-19T21:57:22.745140-04:00
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:1712:2} */

Because this customer is on 19c, the disk header was restored directly from the backup AU, after which the disk group mounted successfully:

SQL> alter diskgroup data mount
2026-05-19T22:40:59.137657+08:00
NOTE: cache registered group DATA 2/0xB126DA41
NOTE: cache began mount (first) of group DATA 2/0xB126DA41
NOTE: Assigning number (2,14) to disk (/dev/dm-29)
NOTE: Assigning number (2,8) to disk (/dev/dm-28)
…………
NOTE: Assigning number (2,2) to disk (/dev/dm-7)
NOTE: Assigning number (2,0) to disk (/dev/dm-6)
NOTE: Assigning number (2,1) to disk (/dev/dm-3)
2026-05-19T22:40:59.303908+08:00
cluster guid (b216d47e2bf86f6aff34a36119b0c161) generated for PST Hbeat for instance 2
2026-05-19T22:41:05.312792+08:00
NOTE: GMON heartbeating for grp 2 (DATA)
GMON querying group 2 at 80 for pid 35, osid 53051
2026-05-19T22:41:05.314925+08:00
NOTE: cache is mounting group DATA created on 2024/11/28 17:55:45
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/dm-6
NOTE: 05/20/26 16:41:04 DATA.F1X0 found on disk 0 au 10 fcn 0.0 datfmt 1
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/dm-3
…………
NOTE: cache opening disk 13 of grp 2: DATA_0013 path:/dev/dm-18
NOTE: cache opening disk 14 of grp 2: DATA_0014 path:/dev/dm-29
NOTE: cache opening disk 15 of grp 2: DATA_0015 path:/dev/dm-16
NOTE: cache opening disk 16 of grp 2: DATA_0016 path:/dev/dm-17
NOTE: cache opening disk 17 of grp 2: DATA_0017 path:/dev/dm-20
NOTE: cache opening disk 18 of grp 2: DATA_0018 path:/dev/dm-21
2026-05-19T22:41:05.317307+08:00
NOTE: cache mounting (first) external redundancy group 2/0xB126DA41 (DATA)
2026-05-19T22:41:05.522191+08:00
NOTE: attached to recovery domain 2
2026-05-19T22:41:05.558136+08:00
validate pdb 2, flags x4, valid 0, pdb flags x204
* validated domain 2, flags = 0x200
NOTE: cache recovered group 2 to fcn 0.46336611
NOTE: redo buffer size is 512 blocks (2105344 bytes)
2026-05-19T22:41:05.569479+08:00
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 110.6507 lock domain=0 inc#=0 instnum=1
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
2026-05-19T22:41:05.574499+08:00
mntstmp 2026/05/20 16:41:05.572000
2026-05-19T22:41:05.574854+08:00
NOTE: cache mounting group 2/0xB126DA41 (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0xb126da41
2026-05-19T22:41:05.616227+08:00
NOTE: Instance updated compatible.asm to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.616754+08:00
NOTE: Instance updated compatible.asm to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.617932+08:00
NOTE: Instance updated compatible.rdbms to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.618303+08:00
NOTE: Instance updated compatible.rdbms to 19.0.0.0.0 for grp 2 (DATA).
2026-05-19T22:41:05.643844+08:00
SUCCESS: diskgroup DATA was mounted

Although the disk group mounted, ASM kept reporting errors. The customer was also remarkably bold: with the disk group mounted in this state, they went on to add a disk to it, which triggered a rebalance:

2026-05-20T09:18:34.292103-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1014] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1014] [0 != 1]
NOTE: cache repaired a corrupt block: group=2(DATA) dsk=14 blk=1014 on disk 14 from disk=14 (DATA_0014) incarn=4042690154 au=11 blk=1014 count=1
2026-05-20T09:18:36.721669-04:00
WARNING: cache read a corrupt block: group=2(DATA) dsk=14 blk=1015 disk=14 (DATA_0014) incarn=4042690154 au=0 blk=1015 count=1
2026-05-20T09:18:36.721982-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=14 blk=1015 disk=14 (DATA_0014) incarn=4042690154 au=0 blk=1015 count=1
2026-05-20T09:18:36.724279-04:00
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_60220.trc:
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] [2147483662] [1015] [0 != 1]
NOTE: cache repaired a corrupt block: group=2(DATA) dsk=14 blk=1015 on disk 14 from disk=14 (DATA_0014) incarn=4042690154 au=11 blk=1015 count=1
2026-05-20T09:44:20.469792-04:00
NOTE: Starting expel slave for group 2/0xcd67eb4 (DATA)
2026-05-20T09:44:20.473880-04:00
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
2026-05-20T09:44:20.530101-04:00
NOTE: membership refresh pending for group 2/0xcd67eb4 (DATA)
2026-05-20T09:44:20.534097-04:00
GMON querying group 2 at 178 for pid 27, osid 70130
2026-05-20T09:44:20.557651-04:00
SUCCESS: refreshed membership for 2/0xcd67eb4 (DATA)
2026-05-20T09:44:23.489304-04:00
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
2026-05-20T11:20:49.804413-04:00
NOTE: stopping process ARB0
NOTE: stopping process ARBA
2026-05-20T11:20:51.538777-04:00
SUCCESS: rebalance completed for group 2/0xcd67eb4 (DATA)
2026-05-20T23:20:51.541462+08:00
SUCCESS: ALTER DISKGROUP DATA ADD DISK '/dev/dm-20' SIZE 4194304M REBALANCE WAIT

They were lucky: the ORA-15196: invalid ASM block header [kfc.c:30747] [endian_kfbh] errors kept the rebalance from actually running, so the disk group was not dismounted. (19c is noticeably more robust here; earlier versions would most likely have dismounted the disk group outright.)

The customer then tried to start the database on the disk group mounted this way. Startup failed with ORA-01578:

2026-05-20T17:46:07.060794+08:00
Error attempting to elevate LMHB's priority: no further priority changes will be attempted for this process
2026-05-20T17:46:07.647114+08:00
Undo initialization recovery: Parallel FPTR complete: start:26091908 end:26096229 diff:4321 ms (4.3 seconds)
Undo initialization recovery: err:0 start: 26091907 end: 26096229 diff: 4322 ms (4.3 seconds)
[50880] Successfully onlined Undo Tablespace 5.
Undo initialization online undo segments: err:0 start: 26096229 end: 26096576 diff: 347 ms (0.3 seconds)
Undo initialization finished serial:0 start:26091907 end:26096591 diff:4684 ms (4.7 seconds)
Database Characterset is AL32UTF8
No Resource Manager plan active
2026-05-20T17:46:08.734113+08:00
Corrupt block relative dba: 0x004030ee (file 1, block 12526)
Completely zero block found during buffer read
Reread (file 1, block 12526) found same corrupt data (no logical check)
2026-05-20T17:46:08.811455+08:00
Corrupt Block Found
    TIME STAMP (GMT) = 05/20/2026 17:46:07
    CONT = 0, TSN = 0, TSNAME = SYSTEM
    RFN = 1, BLK = 12526, RDBA = 4206830
    OBJN = 37, OBJD = 37, OBJECT = I_OBJ2, SUBOBJECT =
    SEGMENT OWNER = SYS, SEGMENT TYPE = Index Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc (incident=800708):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12526)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T05:46:09.047772-04:00
ALTER SYSTEM SET remote_listener='xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:09.049615-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:09.812271+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
Corrupt block relative dba: 0x004030de (file 1, block 12510)
Completely zero block found during buffer read
Reread (file 1, block 12510) found same corrupt data (no logical check)
2026-05-20T17:46:10.350573+08:00
Corrupt Block Found
    TIME STAMP (GMT) = 05/20/2026 17:46:09
    CONT = 0, TSN = 0, TSNAME = SYSTEM
    RFN = 1, BLK = 12510, RDBA = 4206814
    OBJN = 83, OBJD = 83, OBJECT = DEPENDENCY$, SUBOBJECT =
    SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc (incident=800709):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12510)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:10.694365+08:00
Corrupt block relative dba: 0x00403000 (file 1, block 12288)
Completely zero block found during validation
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12288, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Corrupt block relative dba: 0x00403001 (file 1, block 12289)
Completely zero block found during validation
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12289, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
………………
Corrupt block relative dba: 0x004030ff (file 1, block 12543)
Completely zero block found during validation
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Reread of blocknum=12543, file=+DATA/xff/DATAFILE/system.257.1186720165. found same corrupt data
Corrupt block relative dba: 0x0040301d (file 1, block 12317)
Completely zero block found during buffer read
Reread (file 1, block 12317) found same corrupt data (no logical check)
2026-05-20T17:46:12.183545+08:00
Corrupt Block Found
    TIME STAMP (GMT) = 05/20/2026 17:46:11
    CONT = 0, TSN = 0, TSNAME = SYSTEM
    RFN = 1, BLK = 12317, RDBA = 4206621
    OBJN = 37, OBJD = 37, OBJECT = I_OBJ2, SUBOBJECT =
    SEGMENT OWNER = SYS, SEGMENT TYPE = Index Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc (incident=800710):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff2/incident/incdir_800710/xff2_ora_50880_i800710.trc
2026-05-20T05:46:12.371040-04:00
ALTER SYSTEM SET remote_listener='xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:12.372924-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:13.814346+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814407+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T17:46:13.814442+08:00
Error 604 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_ora_50880.trc (incident=800711):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 12317)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
Corrupt block relative dba: 0x00403097 (file 1, block 12439)
Completely zero block found during buffer read
Reread (file 1, block 12439) found same corrupt data (no logical check)
2026-05-20T17:46:14.259044+08:00
Corrupt Block Found
    TIME STAMP (GMT) = 05/20/2026 17:46:13
    CONT = 0, TSN = 0, TSNAME = SYSTEM
    RFN = 1, BLK = 12439, RDBA = 4206743
    OBJN = 18, OBJD = 18, OBJECT = OBJ$, SUBOBJECT =
    SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_gen0_48604.trc (incident=798852):
ORA-01578: ORACLE data block corrupted (file # 1, block # 12439)
ORA-01110: data file 1: '+DATA/xff/DATAFILE/system.257.1186720165'
2026-05-20T05:46:14.436597-04:00
ALTER SYSTEM SET remote_listener='xff-scan:11521' SCOPE=MEMORY SID='xff2';
2026-05-20T05:46:14.438492-04:00
ALTER SYSTEM SET listener_networks='' SCOPE=MEMORY SID='xff2';
2026-05-20T17:46:15.486758+08:00
opiodr aborting process unknown ospid (50880) as a result of ORA-603
2026-05-20T17:46:15.498707+08:00
ORA-603 : opitsk aborting process
License high water mark = 423
USER(prelim) (ospid: 50880): terminating the instance due to ORA error 604
2026-05-20T17:46:15.536740+08:00
opiodr aborting process unknown ospid (69547) as a result of ORA-1092
2026-05-20T05:46:16.321597-04:00
ORA-1092 : opitsk aborting process

The log makes the cause fairly clear: while opening, the database runs recursive SQL against the data dictionary, those statements hit corrupt blocks, and startup fails. Checking the SYSTEM data file with dbv found 256 corrupt blocks.

The 256 corrupt blocks are contiguous and completely zeroed, which suggests that 2 MB of data (256 × 8 KB) was overwritten with zeros by dd. With an AU size of 4 MB, a 10 MB write from the start of the disk covers AU 0, AU 1 and the first 2 MB of AU 2, so the suspicion is that this 2 MB region of AU 2 is what was overwritten. The next step was to analyze how the SYSTEM file's data is distributed across the disks.

This confirmed that the 24th AU of the SYSTEM file (counting from 0) is stored in AU 2 of disk 14, so the damaged blocks are blocks 12288 to 12543 (24 × 4 MB / 8 KB = 12288; block 0 has to be taken into account).

Blocks in SYSTEM that are destroyed this thoroughly and sit this close to the start of the file are repaired by reconstructing them by hand. Both of our in-house tools, Oracle Recovery Tools and obet, provide this function. With some luck, after this repair the data could be exported directly with expdp without major problems, and the failure was recovered almost perfectly.
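The block arithmetic behind this conclusion can be checked with a short script. This is only a sketch under stated assumptions: the 4 MB AU size, 8 KB block size, and the dd parameters come from the text, while the function names are hypothetical and the mapping assumes that data file block N sits at byte offset N × block size and that one file extent equals one AU at this position in the file. The relative DBA split (upper 10 bits for the relative file number, lower 22 bits for the block number) is the standard layout for smallfile tablespaces.

```python
AU_SIZE = 4 * 1024 * 1024     # 4 MB ASM allocation unit (from the text)
BLOCK_SIZE = 8 * 1024         # 8 KB database block (from the text)
DD_BYTES = 10 * 1024 * 1024   # dd bs=1024K count=10 (from the text)


def dd_coverage(dd_bytes, au_size=AU_SIZE):
    """Return (fully overwritten AUs, bytes overwritten in the next AU)."""
    return divmod(dd_bytes, au_size)


def blocks_in_file_au(file_au, zeroed_bytes, au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Data file blocks hit when the first `zeroed_bytes` of file AU `file_au` are zeroed.

    Assumption: block N of the data file starts at byte N * block_size
    (block 0 is the OS header block), and one extent equals one AU.
    Returns (first_block, last_block, block_count).
    """
    first = file_au * au_size // block_size
    count = zeroed_bytes // block_size
    return first, first + count - 1, count


def decode_rdba(rdba):
    """Split a relative DBA into (relative file number, block number)."""
    return rdba >> 22, rdba & ((1 << 22) - 1)


if __name__ == "__main__":
    full_aus, partial = dd_coverage(DD_BYTES)
    print("AUs fully overwritten:", full_aus, "bytes into next AU:", partial)
    print("blocks hit in file AU 24:", blocks_in_file_au(24, partial))
    print("0x004030ee ->", decode_rdba(0x004030EE))
```

Run as is, the script reports two fully overwritten AUs plus 2 MB of the third, the block range 12288 to 12543 (256 blocks) for file AU 24, and file 1, block 12526 for the first relative DBA in the alert log, which matches the dbv result and the log entries quoted above.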