Replacing a failed disk that holds MetaDB Replicas
Incidentally, I discovered that before you can remove non-existent replicas,
the partition with the non-existent replicas has to have a non-zero length
(which is the likely state with a new disk). The recovery process is:
- Obtain a disk of the same make/model as the dead disk. An identical disk matters: it lets you clone the surviving disk's partition table exactly, so the replica and submirror slices line up.
- Boot with new disk (system fails because it can't find a majority of database replicas and drops to single-user mode)
- Label and format new disk (Hint: Disk VTOC Cloning).
# metadb -d (partition on new disk where ODS thinks replicas are)
(ignore complaint about read-only filesystem)
- Reboot. (System now finds the majority of replicas valid and boots normally; probably complains about submirrors requiring maintenance.)
- For each mirror that uses the new disk:
  metattach (mirror) (submirror on new disk)
- metadb -a -c 2 (partition on new disk where ODS used to think the replicas were)
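The steps above can be sketched as a shell script. All device names here are hypothetical (c0t0d0 as the surviving mirror disk, c0t1d0 as the replacement, slice s4 as the replica slice, d22/d23 as a mirror/submirror pair); adapt them to your layout. By default the script only echoes each command, since metadb/metattach exist only on Solaris with ODS installed:

```shell
#!/bin/sh
# Dry-run sketch of the replica-disk recovery. Hypothetical devices:
#   c0t0d0 = surviving mirror disk, c0t1d0 = new replacement disk,
#   s4     = slice that held the database replicas,
#   d22    = a mirror, d23 = its submirror on the new disk.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# 1. Label/format the new disk by cloning the VTOC from the surviving disk.
run sh -c 'prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2'

# 2. In single-user mode, delete the replicas ODS thinks are on the new
#    disk (ignore the complaint about the read-only filesystem), then reboot.
run metadb -d c0t1d0s4

# 3. After the reboot, reattach each submirror that lived on the new disk.
run metattach d22 d23

# 4. Recreate two replicas on the slice that used to hold them.
run metadb -a -c 2 c0t1d0s4
```

Flip DRYRUN to 0 (or swap each echoed line in by hand) only once you have confirmed the device names against metastat and metadb -i on the actual machine.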
Replacing a failed submirror
To get detailed status information, use metastat.
To replace a subdevice:
metareplace metamirror old-component new-component
For example:
metareplace d22 c0t1d0s2 c0t6d0s2
… replaces the subcomponent of d22 using c0t1d0s2 with c0t6d0s2
Important: when referring to the metamirror, give the metamirror's own name,
not a submirror's name; metareplace operates on the top-level mirror and will
not accept the submirror there.
Example: on parsnip, /inst-images has d66 mounted on it. d66 is
actually metamirror d22. d22 has submirrors d21 and d23. If d23 fails,
we run the metareplace using d22 as the metamirror name, NOT d23 -- but
we use the actual device name for d23 (c0t1d0s2 or whatever).
(Confused? So were we. :)
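The parsnip example can be sketched the same way. The metadevice names (d22, d23) and components (c0t1d0s2, c0t6d0s2) come from the example above; the script echoes commands rather than running them, since metastat/metareplace are Solaris-only:

```shell
#!/bin/sh
# Dry-run sketch of replacing a failed submirror component, using the
# parsnip example: metamirror d22 has submirrors d21 and d23; d23's
# component c0t1d0s2 has failed, and c0t6d0s2 is the spare.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# See which component needs maintenance.
run metastat d22

# Replace it. Note the arguments: the metamirror's name (d22, NOT the
# submirror d23), then the failed component's real device name, then
# the replacement device.
run metareplace d22 c0t1d0s2 c0t6d0s2
```

The only trap is the first argument: it is always the metamirror, even though the failing component physically belongs to a submirror.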