Yes – **run `e2fsck` on your root partition first, then let `emerge` finish**.
The timeout logs you just saw are *not* an NVMe‑controller fault; they’re a symptom of a
**dirty/partially corrupted ext4** root that keeps the kernel re‑issuing writes which stall and never
complete within the driver’s I/O timeout.
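Before touching anything else, it’s worth confirming how often the pattern shows up. A minimal check, assuming the messages are still in the kernel ring buffer:
```bash
# Count how many NVMe timeout/abort messages are currently in the kernel log
sudo dmesg | grep -ci 'nvme.*timeout'
# Show the most recent ones with their timestamps
sudo dmesg | grep -i 'nvme.*timeout' | tail -n 20
```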
---
## Why this happens
```
nvme nvme0: I/O tag X … QID 9 timeout, aborting req_op:WRITE …
```
* **QID 9** is just the I/O submission queue the request was sitting on (the driver typically creates one
  queue per CPU); the message means a write on that queue never completed within the driver’s timeout, so
  the kernel aborted it.
* The root filesystem had been remounted read‑only with a dirty journal, so each time it went back to
  read‑write the kernel had journal‑recovery work to do on top of your build’s writes (a quick way to check
  the recorded state is shown right after this list).
* When you hit **Ctrl‑C** the `emerge` build stops mid‑write; the NVMe controller aborts the pending
requests and the kernel logs a timeout.
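If you want evidence before running the full check, the ext4 superblock records the last filesystem state and any error count, and reading it is safe even while the device is mounted. A quick look, using the same partition as in the steps below:
```bash
# Show whether ext4 has recorded errors and what state the journal was left in
sudo dumpe2fs -h /dev/nvme0n1p3 | grep -iE 'state|error'
# Confirm whether / is currently mounted read-only or read-write
findmnt -o TARGET,SOURCE,OPTIONS /
```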
---
## What to do next
**1. Clean the root filesystem** – removes orphaned inodes and the dirty journal, which is what is stalling the writes.
```bash
# Boot from a rescue system (or any environment that leaves /dev/nvme0n1p3 unmounted)
sudo e2fsck -f /dev/nvme0n1p3 # -f forces a check even if the journal looks clean
```
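The exit status of `e2fsck` tells you how the check went (standard e2fsck convention; the codes combine bitwise):
```bash
# Run this immediately after the e2fsck above; the exit status encodes the result
echo $?   # 0 = no errors, 1 = errors corrected,
          # 2 = errors corrected but a reboot is required,
          # >= 4 = errors left uncorrected or the check itself failed
```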
**2. Check the NVMe error log** – catches any controller‑level errors that `smart-log` missed.
```bash
sudo nvme error-log /dev/nvme0n1 | less
```
The log always lists a fixed number of entry slots, most of which are usually all zeros; what matters is any
entry with a non‑zero `error_count` or status field. Non‑zero entries are a red flag that the drive may be
aging or that the firmware/driver is misbehaving.
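A rough filter for the interesting entries – the exact field layout depends on your nvme-cli version, so treat this as a sketch:
```bash
# Print only error_count lines whose value is not zero
sudo nvme error-log /dev/nvme0n1 | grep 'error_count' | grep -v ': 0$'
```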
**3. Double‑check SMART / health** – confirms that the drive itself is still healthy.
```bash
sudo smartctl -a /dev/nvme0n1 | grep -E 'Temperature|Error|Power|Percentage'
```
The fields to watch are **Media and Data Integrity Errors** and **Error Information Log Entries** (both
should be 0) and **Percentage Used**, which should be well below 100 %.
**4. Verify / upgrade firmware** – some NVMe firmware revisions mishandle high queue depths and abort commands under sustained write load.
```bash
sudo nvme id-ctrl /dev/nvme0n1 | grep '^fr '   # 'fr' is the firmware-revision field
```
If it’s older than the vendor’s latest release, flash it (the procedure depends on the vendor – usually
`nvme fw-download` followed by `nvme fw-commit`, or a vendor‑specific tool).
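For drives that accept the generic nvme-cli path, the sequence looks roughly like the sketch below; the image name, slot, and commit action are placeholders – the vendor’s own instructions take precedence.
```bash
# Download the image into the controller's staging area (filename is a placeholder)
sudo nvme fw-download /dev/nvme0n1 --fw=vendor_firmware.bin
# Commit it to a firmware slot and activate it (slot/action depend on the drive)
sudo nvme fw-commit /dev/nvme0n1 --slot=1 --action=1
```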
**5. Reduce the kernel’s outstanding I/O** – a quick mitigation while you rebuild is to lower the block layer’s request‑queue depth.
```bash
# Reduce the max outstanding requests from the kernel
echo 16 | sudo tee /sys/block/nvme0n1/queue/nr_requests
```
(Pick whatever depth you want – 16 is a conservative value. The setting is per‑device and does not survive a
reboot; see the udev sketch below if you want it to stick.)
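If the lower depth helps, a udev rule is one way to re-apply it automatically at boot; the rule filename is arbitrary and the device is assumed to stay `nvme0n1`:
```bash
# Write a udev rule that re-applies the lower request-queue depth on every boot
sudo tee /etc/udev/rules.d/60-nvme0n1-nr-requests.rules <<'EOF'
ACTION=="add|change", KERNEL=="nvme0n1", ATTR{queue/nr_requests}="16"
EOF
# Apply it immediately without rebooting
sudo udevadm control --reload && sudo udevadm trigger --name-match=nvme0n1
```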
**6. Re‑run `emerge`** – start with the safest job count.
```bash
MAKEOPTS="-j1" emerge --quiet-build -uDN @world```
If it finishes **without** any new timeout messages, then bump:
```bash
MAKEOPTS="-j2" emerge --quiet-build -uDN @world
# after that, try -j3, then -j4```
> **Tip** – never press Ctrl‑C during an `emerge` run.
> Instead, let it finish, or use `emerge --quiet-build` to suppress the long build log if you’re worried
> about accidental interrupts.
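If a run does get interrupted anyway, Portage can usually pick up where it left off rather than starting over:
```bash
# Resume the most recently interrupted emerge operation
emerge --resume
```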
---
## Why the timeouts were happening
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| QID 9 timeouts | Too many outstanding writes on a dirty ext4 root | `e2fsck` → clean journal |
| Frequent read‑only → read‑write toggles | Filesystem corruption / bad journal | `e2fsck -f` |
| Controller stuck after a few seconds | Firmware/driver bug on that particular NVMe controller or PCIe link | Update firmware, downgrade kernel NVMe driver (or use a newer kernel) |
| Timeout spikes on a specific QID | High queue depth (`nr_requests`) vs. limited controller bandwidth | Reduce `nr_requests` or the `-j` value |
If you still see *any* timeout logs after cleaning the filesystem and (optionally) lowering the queue
depth, that usually indicates a **hardware‑level issue** (bad firmware, bad PCIe lane, power supply
instability). In that case:
1. **Check the vendor’s errata and firmware release notes** for that drive and its current firmware
revision – command‑timeout‑under‑load bugs are the kind of thing documented there.
2. **Test the drive in a different slot** or even a different machine.
3. **Replace the drive** if it’s within the warranty period.
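Before re-seating or replacing anything, it can also help to see what PCIe link the drive actually negotiated and whether the bus has logged errors. A rough check – the `-d ::0108` filter matches NVMe-class devices, so adjust it if you have more than one:
```bash
# Show the PCIe link speed/width the NVMe controller negotiated
sudo lspci -vv -d ::0108 | grep -E 'LnkCap:|LnkSta:'
# Look for bus-level (AER) errors in the kernel log
sudo dmesg | grep -iE 'aer|pcie bus error'
```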
---
### Bottom line
1. **Clean the filesystem** (`e2fsck -f /dev/nvme0n1p3`).
2. **Reboot** – make sure `dmesg` is clean.
3. **Retry emerge** with `-j1` → if it completes, slowly raise to `-j2` → `-j4`.
4. If you still hit timeouts, follow the “check error‑log / SMART / firmware” routine above; those steps
will tell you whether the NVMe itself is at fault.
Let me know how the `e2fsck` runs and what `dmesg` looks like after the reboot. Good luck!