Blackcairn - Recovery Playbook
Use this file when things are not normal. This is the calm, step-by-step path back to stability.
Symptom: Cannot SSH / Services Unreachable
First checks (remote)
- Can the public IP be pinged?
- Do any forwarded ports respond?
- If nothing responds: likely host down or boot issue
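The remote checks above can be scripted. The following is a minimal sketch, assuming a Linux `ping` and a netcat that supports `-z` and `-w` on the machine you are checking from. The address `203.0.113.10`, the port list, and the function names are placeholders for illustration, not values from this playbook.

```shell
#!/bin/sh
# Remote reachability sketch. Host and ports are hypothetical placeholders.

# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# verdict PING_OK OPEN_PORTS: turn the two observations into one summary line.
# PING_OK is 1 if the host answered ping, 0 otherwise.
verdict() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo "nothing responds: likely host down or boot issue"
    elif [ "$2" -eq 0 ]; then
        echo "host answers ping but no ports respond"
    else
        echo "$2 port(s) respond: host is up"
    fi
}

# check_host HOST PORT...: run the checks and print the verdict.
check_host() {
    host=$1
    shift
    ping_ok=0
    open=0
    if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
        ping_ok=1
    fi
    for port in "$@"; do
        if port_open "$host" "$port"; then
            open=$((open + 1))
        fi
    done
    verdict "$ping_ok" "$open"
}

# Only runs when arguments are given, e.g.:
#   sh check.sh 203.0.113.10 22 80 443
if [ "$#" -gt 0 ]; then
    check_host "$@"
fi
```

A host that blocks ICMP can still have open ports, which is why the sketch counts both signals before concluding that nothing responds.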
If physical access is possible, proceed below.
Symptom: Boot Loop / No SSH / No Services
Hardware triage
- Power off
- Unplug mains
- Hold power button 10–15 seconds
- Reconnect and power on
If looping persists:
- Test with one RAM stick
- Reseat GPU and power cables
- Perform CMOS reset if needed
Symptom: Dropped to (initramfs)
This usually means filesystem inconsistency after a freeze or power loss.
Activate LVM
lvm vgscan
lvm vgchange -ay
Identify root filesystem
ls /dev/mapper
blkid
Typical root volume:
/dev/mapper/ubuntu--vg-ubuntu--lv
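The doubled hyphens in that path are not a typo. Device-mapper joins the volume group and logical volume names with a single hyphen and doubles any hyphen inside either name, so `ubuntu-vg` and `ubuntu-lv` appear as `ubuntu--vg-ubuntu--lv`. A small sketch of that rule, useful when the names on a given host differ (the helper name `mapper_path` is made up for illustration):

```shell
# mapper_path VG LV: print the /dev/mapper path for a volume group and
# logical volume, doubling hyphens inside each name as device-mapper does.
mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

mapper_path ubuntu-vg ubuntu-lv
# prints /dev/mapper/ubuntu--vg-ubuntu--lv
```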
Run filesystem repair
fsck -f -y /dev/mapper/ubuntu--vg-ubuntu--lv
Expected (good) output:
- "FILE SYSTEM WAS MODIFIED"
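Besides the printed message, `fsck` reports its result through an exit status that is a sum of bit flags, as documented in the fsck(8) manual page. The helper below is a sketch for decoding it at the prompt; the function name `explain_fsck_status` is hypothetical.

```shell
# explain_fsck_status CODE: print one line per flag set in fsck's exit status.
# Flag meanings are taken from fsck(8).
explain_fsck_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((rc & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((rc & 2)) -ne 0 ]; then echo "system should be rebooted"; fi
    if [ $((rc & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((rc & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((rc & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((rc & 32)) -ne 0 ]; then echo "checking canceled by user request"; fi
    if [ $((rc & 128)) -ne 0 ]; then echo "shared-library error"; fi
    return 0
}

# Usage, right after the repair command:
#   fsck -f -y /dev/mapper/ubuntu--vg-ubuntu--lv; explain_fsck_status $?
```

A status of 1 matches the "modified" message: errors were found and fixed. A status that includes 4 means some errors remain, so the repair is worth running again before rebooting.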
Reboot
reboot
After Successful Boot
Check logs from failed boot
sudo journalctl -b -1 -p err
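`-b -1` can only show the previous boot if the journal was kept across the reboot, which with systemd's default storage setting depends on `/var/log/journal` existing. A hedged sketch that checks this first (the function names are hypothetical, and the directory check assumes the default `Storage=auto` behaviour):

```shell
# has_persistent_journal [DIR]: succeeds if the persistent journal
# directory exists (defaults to /var/log/journal).
has_persistent_journal() {
    [ -d "${1:-/var/log/journal}" ]
}

# show_previous_boot_errors: list known boots, then errors from the last one.
show_previous_boot_errors() {
    if has_persistent_journal; then
        sudo journalctl --list-boots --no-pager
        sudo journalctl -b -1 -p err --no-pager
    else
        echo "no persistent journal: logs from the failed boot were not kept"
    fi
}

# Only runs when asked to, e.g.: sh bootlogs.sh run
if [ "${1:-}" = "run" ]; then
    show_previous_boot_errors
fi
```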
Confirm disk health
sudo smartctl -a /dev/nvme0n1
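For a quick pass/fail reading, `smartctl` also encodes its findings in the exit status as bit flags, as described in the smartctl(8) manual page. The sketch below reduces that status to one word; the function name and the four labels are illustrative choices, not smartmontools output.

```shell
# smart_verdict CODE: summarize smartctl's exit status.
# Bits 0-2 (values 1, 2, 4): command or device access problems.
# Bit 3 (value 8): SMART status check returned "DISK FAILING".
# Bits 4-7: threshold or error-log findings worth a closer look.
smart_verdict() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ $((rc & 8)) -ne 0 ]; then
        echo "failing"
    elif [ $((rc & 7)) -ne 0 ]; then
        echo "unreadable"
    else
        echo "warnings"
    fi
}

# Usage:
#   sudo smartctl -H /dev/nvme0n1; smart_verdict $?
```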
Common Non-Issues (Ignore These)
- SGX disabled by BIOS
- systemd-networkd-wait-online timeout (network still works)
- Scary-looking fsck output during repair
These are informational, not failures.
Hardening After Recovery (Recommended)
- Smart plug for remote power cycling
- Secondary access path (Tailscale / Cloudflare Tunnel)
- SMART monitoring with alerts
- Watchdog or auto-reboot on kernel lockup
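For the last item, one possible approach is to have the kernel panic on a soft lockup and reboot shortly after a panic. The sketch below writes such a sysctl fragment; the file name `90-lockup-reboot.conf`, the 10-second delay, and the function name are assumptions for illustration, and `kernel.softlockup_panic` is only available on kernels built with the soft lockup detector.

```shell
# write_lockup_sysctl FILE: write a sysctl fragment that turns lockups
# and oopses into panics and reboots 10 seconds after a panic.
write_lockup_sysctl() {
    cat > "$1" <<'EOF'
# Panic on soft lockup or oops, then reboot automatically.
kernel.softlockup_panic = 1
kernel.panic_on_oops = 1
kernel.panic = 10
EOF
}

# Usage (as root):
#   write_lockup_sysctl /etc/sysctl.d/90-lockup-reboot.conf
#   sysctl --system
```

This trades a hung machine for an automatic reboot, so it pairs well with the log check from the previous boot to see what triggered it.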
Incident Notes
- Filesystem corruption usually comes from freezes or power loss
- Editors, aliases, and user config do not cause boot failures
- Repairing is preferable to reinstalling