Yes – **run `e2fsck` on your root partition first, then let `emerge` finish**.
The timeout logs you just saw are *not* an NVMe‑controller fault; they’re a symptom of a
**dirty/partially corrupted ext4** root that keeps the kernel re‑issuing writes which stall and never
complete within the driver’s I/O timeout.
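Before touching anything else, it’s worth confirming how often the pattern shows up. A minimal check, assuming the messages are still in the kernel ring buffer:
```bash
# Count how many NVMe timeout/abort messages are currently in the kernel log
sudo dmesg | grep -ci 'nvme.*timeout'
# Show the most recent ones with their timestamps
sudo dmesg | grep -i 'nvme.*timeout' | tail -n 20
```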
---
## Why this happens
```
nvme nvme0: I/O tag X … QID 9 timeout, aborting req_op:WRITE …
```
* **QID 9** is just the I/O submission queue the request was sitting on (the driver typically creates one
  queue per CPU); the message means a write on that queue never completed within the driver’s timeout, so
  the kernel aborted it.
* The root filesystem had been remounted read‑only with a dirty journal, so each time it went back to
  read‑write the kernel had journal‑recovery work to do on top of your build’s writes (a quick way to check
  the recorded state is shown right after this list).
* When you hit **Ctrl‑C** the `emerge` build stops mid‑write; the NVMe controller aborts the pending
requests and the kernel logs a timeout.
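If you want evidence before running the full check, the ext4 superblock records the last filesystem state and any error count, and reading it is safe even while the device is mounted. A quick look, using the same partition as in the steps below:
```bash
# Show whether ext4 has recorded errors and what state the journal was left in
sudo dumpe2fs -h /dev/nvme0n1p3 | grep -iE 'state|error'
# Confirm whether / is currently mounted read-only or read-write
findmnt -o TARGET,SOURCE,OPTIONS /
```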
---
## What to do next
**1. Clean the root filesystem** – removes orphaned inodes and the dirty journal, which is what is stalling the writes.
```bash
# Boot from a rescue system (or any environment that leaves /dev/nvme0n1p3 unmounted)
sudo e2fsck -f /dev/nvme0n1p3 # -f forces a check even if the journal looks clean
```
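The exit status of `e2fsck` tells you how the check went (standard e2fsck convention; the codes combine bitwise):
```bash
# Run this immediately after the e2fsck above; the exit status encodes the result
echo $?   # 0 = no errors, 1 = errors corrected,
          # 2 = errors corrected but a reboot is required,
          # >= 4 = errors left uncorrected or the check itself failed
```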
**2. Check the NVMe error log** – catches any controller‑level errors that `smart-log` missed.
```bash
sudo nvme error-log /dev/nvme0n1 | less
```
The log always lists a fixed number of entry slots, most of which are usually all zeros; what matters is any
entry with a non‑zero `error_count` or status field. Non‑zero entries are a red flag that the drive may be
aging or that the firmware/driver is misbehaving.
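A rough filter for the interesting entries – the exact field layout depends on your nvme-cli version, so treat this as a sketch:
```bash
# Print only error_count lines whose value is not zero
sudo nvme error-log /dev/nvme0n1 | grep 'error_count' | grep -v ': 0$'
```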
**3. Double‑check SMART / health** – confirms that the drive itself is still healthy.
```bash
sudo smartctl -a /dev/nvme0n1 | grep -E 'Temperature|Error|Power|Percentage'
```
The fields to watch are **Media and Data Integrity Errors** and **Error Information Log Entries** (both
should be 0) and **Percentage Used**, which should be well below 100 %.
**4. Verify / upgrade firmware** – some NVMe firmware revisions mishandle high queue depths and abort commands under sustained write load.
```bash
sudo nvme id-ctrl /dev/nvme0n1 | grep '^fr '   # 'fr' is the firmware-revision field
```
If it’s older than the vendor’s latest release, flash it (the procedure depends on the vendor – usually
`nvme fw-download` followed by `nvme fw-commit`, or a vendor‑specific tool).
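For drives that accept the generic nvme-cli path, the sequence looks roughly like the sketch below; the image name, slot, and commit action are placeholders – the vendor’s own instructions take precedence.
```bash
# Download the image into the controller's staging area (filename is a placeholder)
sudo nvme fw-download /dev/nvme0n1 --fw=vendor_firmware.bin
# Commit it to a firmware slot and activate it (slot/action depend on the drive)
sudo nvme fw-commit /dev/nvme0n1 --slot=1 --action=1
```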
**5. Reduce the kernel’s outstanding I/O** – a quick mitigation while you rebuild is to lower the block layer’s request‑queue depth.
```bash
# Reduce the max outstanding requests from the kernel
echo 16 | sudo tee /sys/block/nvme0n1/queue/nr_requests
```
(Pick whatever depth you want – 16 is a conservative value. The setting is per‑device and does not survive a
reboot; see the udev sketch below if you want it to stick.)
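If the lower depth helps, a udev rule is one way to re-apply it automatically at boot; the rule filename is arbitrary and the device is assumed to stay `nvme0n1`:
```bash
# Write a udev rule that re-applies the lower request-queue depth on every boot
sudo tee /etc/udev/rules.d/60-nvme0n1-nr-requests.rules <<'EOF'
ACTION=="add|change", KERNEL=="nvme0n1", ATTR{queue/nr_requests}="16"
EOF
# Apply it immediately without rebooting
sudo udevadm control --reload && sudo udevadm trigger --name-match=nvme0n1
```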
**6. Re‑run `emerge`** – start with the safest job count.
```bash
MAKEOPTS="-j1" emerge --quiet-build -uDN @world```
If it finishes **without** any new timeout messages, then bump:
```bash
MAKEOPTS="-j2" emerge --quiet-build -uDN @world
# after that, try -j3, then -j4```
> **Tip** – never press Ctrl‑C during an `emerge` run.
> Instead, let it finish, or use `emerge --quiet-build` to suppress the long build log if you’re worried
> about accidental interrupts.
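If a run does get interrupted anyway, Portage can usually pick up where it left off rather than starting over:
```bash
# Resume the most recently interrupted emerge operation
emerge --resume
```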
---
## Why the timeouts were happening
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| QID 9 timeouts | Too many outstanding writes on a dirty ext4 root | `e2fsck` → clean journal |
| Frequent read‑only → read‑write toggles | Filesystem corruption / bad journal | `e2fsck -f` |
| Controller stuck after a few seconds | Firmware/driver bug on that particular NVMe controller or PCIe link | Update firmware, downgrade kernel NVMe driver (or use a newer kernel) |
| Timeout spikes on a specific QID | High queue depth (`nr_requests`) vs. limited controller bandwidth | Reduce `nr_requests` or the `-j` value |
If you still see *any* timeout logs after cleaning the filesystem and (optionally) lowering the queue
depth, that usually indicates a **hardware‑level issue** (bad firmware, bad PCIe lane, power supply
instability). In that case:
1. **Check the vendor’s errata and firmware release notes** for that drive and its current firmware
revision – command‑timeout‑under‑load bugs are the kind of thing documented there.
2. **Test the drive in a different slot** or even a different machine.
3. **Replace the drive** if it’s within the warranty period.
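Before re-seating or replacing anything, it can also help to see what PCIe link the drive actually negotiated and whether the bus has logged errors. A rough check – the `-d ::0108` filter matches NVMe-class devices, so adjust it if you have more than one:
```bash
# Show the PCIe link speed/width the NVMe controller negotiated
sudo lspci -vv -d ::0108 | grep -E 'LnkCap:|LnkSta:'
# Look for bus-level (AER) errors in the kernel log
sudo dmesg | grep -iE 'aer|pcie bus error'
```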
---
### Bottom line
1. **Clean the filesystem** (`e2fsck -f /dev/nvme0n1p3`).
2. **Reboot** – make sure `dmesg` is clean.
3. **Retry emerge** with `-j1` → if it completes, slowly raise to `-j2` → `-j4`.
4. If you still hit timeouts, follow the “check error‑log / SMART / firmware” routine above; those steps
will tell you whether the NVMe itself is at fault.
Let me know how the `e2fsck` runs and what `dmesg` looks like after the reboot. Good luck!