Blackcairn – Recovery Playbook

Use this file when things are not normal. This is the calm, step-by-step path back to stability.


🚨 Symptom: Cannot SSH / Services Unreachable

First checks (remote)

  • Can the public IP be pinged?
  • Do any forwarded ports respond?
  • If nothing responds → likely host down or boot issue
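
The first checks above can be scripted so they run the same way every time. This is a minimal sketch, not part of the original playbook: the function name `remote_triage`, the address 203.0.113.10 (a documentation-range placeholder), and ports 22 and 443 are assumptions to be replaced with your own public IP and forwarded ports. It assumes a Linux `ping` (iputils) and an `nc` that supports `-z`.

```shell
# Remote first checks: ping the host, then probe each forwarded port.
# Usage: remote_triage HOST PORT [PORT...]
remote_triage() {
  local host="$1"; shift
  local ping_ok=no open="" port

  # Three echo requests, two-second wait per reply (iputils ping flags)
  if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
    ping_ok=yes
  fi

  # nc -z only tests whether a TCP connection can be opened
  for port in "$@"; do
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      open="$open $port"
    fi
  done

  echo "ping: $ping_ok"
  echo "open ports:${open:- none}"

  if [ "$ping_ok" = no ] && [ -z "$open" ]; then
    echo "verdict: nothing responds, likely host down or boot issue"
    return 1
  fi
  echo "verdict: host is at least partly reachable"
}
```

A call might look like `remote_triage 203.0.113.10 22 443`. A failed ping alone is not conclusive, since many routers drop ICMP, which is why the verdict only reports a likely outage when the ports are silent as well.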

If physical access is possible, proceed below.


🖥 Symptom: Boot Loop / No SSH / No Services

Hardware triage

  1. Power off
  2. Unplug mains
  3. Hold power button 10–15 seconds
  4. Reconnect and power on

If looping persists:

  • Test with one RAM stick
  • Reseat GPU and power cables
  • Perform CMOS reset if needed


🧊 Symptom: Dropped to (initramfs)

This usually means the root filesystem failed to mount because it was left inconsistent by a freeze or power loss.

Activate LVM

lvm vgscan
lvm vgchange -ay

Identify root filesystem

ls /dev/mapper
blkid

Typical root volume:

/dev/mapper/ubuntu--vg-ubuntu--lv
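
The doubled hyphens in that path are not a typo. device-mapper joins the volume group and logical volume names with a single hyphen and doubles any hyphen that belongs to a name. The sketch below illustrates the rule; the helper name `mapper_path` is hypothetical, and the names `ubuntu-vg` and `ubuntu-lv` are inferred from the path above.

```shell
# Build the /dev/mapper path for a volume group and logical volume.
# Hyphens inside a name are doubled; one hyphen separates VG from LV.
mapper_path() {
  local vg lv
  vg=$(printf '%s' "$1" | sed 's/-/--/g')
  lv=$(printf '%s' "$2" | sed 's/-/--/g')
  printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

mapper_path ubuntu-vg ubuntu-lv
# prints /dev/mapper/ubuntu--vg-ubuntu--lv
```

If your volume group or logical volume is named differently, the output of `ls /dev/mapper` remains the authoritative source.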

Run filesystem repair

fsck -f -y /dev/mapper/ubuntu--vg-ubuntu--lv

Expected (good) output:

  • “FILE SYSTEM WAS MODIFIED”
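
Besides the printed message, fsck reports its result through an exit status built from bit flags, as documented in the fsck(8) man page. The following is a small sketch for decoding that status; the function name `explain_fsck_status` is an assumption, and it uses only POSIX shell arithmetic so it should also work in a minimal shell.

```shell
# Decode the fsck exit status, which is a sum of the flags below.
explain_fsck_status() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  [ $((code & 1)) -ne 0 ] && echo "errors corrected"
  [ $((code & 2)) -ne 0 ] && echo "system should be rebooted"
  [ $((code & 4)) -ne 0 ] && echo "errors left uncorrected"
  [ $((code & 8)) -ne 0 ] && echo "operational error"
  [ $((code & 16)) -ne 0 ] && echo "usage or syntax error"
  [ $((code & 32)) -ne 0 ] && echo "checking canceled by user request"
  [ $((code & 128)) -ne 0 ] && echo "shared-library error"
  return 0
}
```

Call it immediately after the repair, for example `explain_fsck_status $?`. A status of 1 or 3 matches the "modified" outcome above. Any status with the 4 flag set means errors remain, so rerun the repair before rebooting.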

Reboot

reboot

✅ After Successful Boot

Check logs from failed boot

sudo journalctl -b -1 -p err
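
Note that `-b -1` only works when the journal is persisted across reboots (for example, when `/var/log/journal` exists); `journalctl --list-boots` shows which boots are available. As a rough sketch, the helper below counts the error-level lines from the previous boot so incidents can be compared at a glance. The function name `prev_boot_error_count` is hypothetical.

```shell
# Count journal lines of priority "err" or more severe from the previous boot.
# Prints the count, or returns 1 if the previous boot is not in the journal.
prev_boot_error_count() {
  local out
  if ! out=$(journalctl -b -1 -p err --no-pager -q 2>/dev/null); then
    echo "previous boot not available in the journal" >&2
    return 1
  fi
  if [ -z "$out" ]; then
    echo 0
  else
    printf '%s\n' "$out" | wc -l | tr -d ' '
  fi
}
```

Run it with sufficient privileges to read the system journal, for example from a root shell.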

Confirm disk health

sudo smartctl -a /dev/nvme0n1
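
For a quick pass/fail answer instead of the full report, `smartctl -H` can be combined with its exit status, which is a set of bit flags described in the smartctl man page. This is a minimal sketch under the assumption that only the "could not query" flags (bits 0 to 2) and the "DISK FAILING" flag (bit 3) matter for triage; the function name `disk_health` is hypothetical.

```shell
# Summarize the SMART overall-health check for one device.
# Returns 0 if passed, 1 if the disk is reported failing, 2 if the query failed.
disk_health() {
  local dev="$1" status=0
  smartctl -H "$dev" >/dev/null 2>&1 || status=$?

  # Bits 0-2: bad command line, device open failed, SMART command failed
  if [ $((status & 7)) -ne 0 ]; then
    echo "smartctl could not query $dev (exit status $status)"
    return 2
  fi
  # Bit 3: SMART status check returned "DISK FAILING"
  if [ $((status & 8)) -ne 0 ]; then
    echo "$dev: SMART reports DISK FAILING"
    return 1
  fi
  echo "$dev: overall health check passed"
}
```

Run it as root, for example `disk_health /dev/nvme0n1`. A pass here does not replace reading the full `-a` output, which also shows wear and logged errors.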

⚠️ Common Non-Issues (Ignore These)

  • SGX disabled by BIOS
  • systemd-networkd-wait-online timeout (network still works)
  • Scary-looking fsck output during repair

These are informational, not failures.


Recommended Hardening

  • Smart plug for remote power cycling
  • Secondary access path (Tailscale / Cloudflare Tunnel)
  • SMART monitoring with alerts
  • Watchdog or auto-reboot on kernel lockup
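
As one possible way to implement the last item in the list above, the kernel can be told to panic on a soft lockup and to reboot shortly after any panic. The sketch below writes those two sysctl settings to a file. The function name, the 10-second delay, and the file name used in the usage note are assumptions, not settings taken from this host.

```shell
# Write sysctl settings that turn a soft lockup into a panic and
# reboot automatically after a panic. Usage: write_lockup_sysctl FILE
write_lockup_sysctl() {
  local target="$1"
  cat > "$target" <<'EOF'
# Reboot 10 seconds after a kernel panic (assumed delay)
kernel.panic = 10
# Treat a soft lockup (CPU stuck in kernel code) as a panic
kernel.softlockup_panic = 1
EOF
}
```

As root, this could be applied with `write_lockup_sysctl /etc/sysctl.d/99-lockup-reboot.conf` followed by `sysctl --system`. These settings only help when the kernel is still able to detect the lockup; a full hardware hang would still need a hardware watchdog or the smart plug.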

πŸ“ Incident Notes

  • Filesystem corruption usually comes from freezes or power loss
  • Editors, aliases, and user config do not cause boot failures
  • Repairing is preferable to reinstalling