DDR4 routing — the only rules that matter.
TL;DR
- DDR4 uses fly-by for address/command/control with on-DIMM termination — route past each load, not in a star.
- Match what the timing budget cares about: each DQ to its byte-lane strobe, intra-pair skew on the differential strobe and clock, and fly-by skew kept within the range the controller's write-leveling can remove.
- Half the “rules” floating around (matching every DQ to ±5 mil board-wide, 3W everywhere) cost layout time without moving the eye. Spend that budget on stackup and reference integrity instead.
Fly-by, not star
DDR3 moved the industry to fly-by topology for the address, command, and control (ACC) group, and DDR4 keeps it. The clock and ACC signals route from the controller past each SDRAM load in sequence and are terminated after the last load. This trades a known, monotonic clock arrival skew from device to device (which the controller compensates with write-leveling, aligning each DQS to the CK its device sees) for much cleaner edges than a branched topology gives you.
The DQ/DQS byte lanes are point-to-point between the controller and each device, so they’re simpler — but they carry the tightest timing.
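To make the "known, monotonic skew" concrete, here is a minimal sketch that accumulates clock arrival time along a fly-by chain. The segment lengths and the 170 ps/in delay figure are illustrative assumptions, not values from any real board or controller guideline.

```python
# Sketch only: segment lengths and delay per inch are assumed, illustrative values.
PS_PER_INCH = 170.0  # assumed stripline delay on FR-4 (er ~ 4)


def flyby_arrival_ps(segment_lengths_mil, ps_per_inch=PS_PER_INCH):
    """Return the CK/ACC arrival time (ps) at each SDRAM along a fly-by chain.

    segment_lengths_mil[0] is controller -> first device; each later entry
    is the hop from one device to the next.
    """
    arrivals = []
    total_mil = 0.0
    for seg in segment_lengths_mil:
        total_mil += seg
        arrivals.append(total_mil / 1000.0 * ps_per_inch)
    return arrivals


def is_monotonic(arrivals):
    """True when every device sees the clock later than the one before it."""
    return all(b > a for a, b in zip(arrivals, arrivals[1:]))


# Hypothetical four-device chain: 2 in to the first device, 0.6 in hops after.
segments = [2000, 600, 600, 600]
arrivals = flyby_arrival_ps(segments)
skew_to_first = [t - arrivals[0] for t in arrivals]

for i, (t, s) in enumerate(zip(arrivals, skew_to_first)):
    print(f"device {i}: arrival {t:.0f} ps, skew vs first device {s:.0f} ps")
```

The skew grows device by device and never reverses, which is the property that lets the controller compensate it per byte lane. Whether a given total is within range depends on the controller, so check its datasheet rather than this sketch.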
What you actually have to match
The timing budget is a sum of skews. Match the things that land in the budget:
- Intra-pair (P/N) skew on DQS and CK: keep it tight. Target ≤ ~1 ps (≈ 6 mil on FR-4); tighter is always better. Intra-pair skew directly degrades the differential crossing, and fixing it is the cheapest win.
- DQ-to-DQS within a byte lane: match each DQ to its strobe within the controller’s per-bit deskew window — typically ±5 to ±10 mil within the lane. Lanes do not need to match each other tightly.
- Address/command/control to CK: match the group to the clock along the fly-by chain. The CK-to-DQS skew this leaves at each device must stay within the range the controller's write-leveling can remove.
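The propagation delay on FR-4 is roughly 170 ps/in for stripline (εr ≈ 4), which is where the 1 ps ≈ 6 mil conversion comes from. Outer-layer microstrip is somewhat faster, closer to 140 to 150 ps/in.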
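As a rough sketch of that conversion, the snippet below derives delay per inch from an effective dielectric constant and uses it to check DQ-to-DQS matching within one byte lane. The εr value of 4.0, the ±10 mil tolerance, and the net lengths are assumptions for illustration; use your fabricator's stackup data and your controller's deskew window in practice.

```python
import math

C_IN_PER_PS = 0.011803  # speed of light in vacuum, inches per picosecond


def ps_per_inch(er_eff):
    """Propagation delay in ps/in for a given effective dielectric constant."""
    return math.sqrt(er_eff) / C_IN_PER_PS


def mil_to_ps(length_mil, er_eff=4.0):
    """Convert a length (or length mismatch) in mils to delay in ps."""
    return length_mil / 1000.0 * ps_per_inch(er_eff)


def ps_to_mil(delay_ps, er_eff=4.0):
    """Convert a delay budget in ps to an equivalent length in mils."""
    return delay_ps / ps_per_inch(er_eff) * 1000.0


def lane_violations(dqs_mil, dq_lengths_mil, tol_mil=10.0):
    """Return {net: mismatch_mil} for DQs outside +/- tol_mil of their strobe."""
    return {
        name: length - dqs_mil
        for name, length in dq_lengths_mil.items()
        if abs(length - dqs_mil) > tol_mil
    }


# Hypothetical byte lane, lengths in mils.
lane = {"DQ0": 1504, "DQ1": 1488, "DQ2": 1512, "DQ3": 1500}
violations = lane_violations(dqs_mil=1500, dq_lengths_mil=lane)

print(f"{ps_per_inch(4.0):.1f} ps/in, 1 ps = {ps_to_mil(1.0):.1f} mil")
for net, delta in violations.items():
    print(f"{net}: {delta:+d} mil ({mil_to_ps(delta):+.2f} ps) vs DQS")
```

Note that the check is per lane: each DQ is compared only to its own strobe, never to DQs in other lanes.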
What you can stop sweating
- Board-wide ±5 mil on every DQ. Cross-lane matching beyond the controller’s deskew range buys nothing — it just eats router time and adds serpentine coupling.
- 3W spacing everywhere. 3W between an aggressor and a sensitive victim net matters; 3W between a DQ and an unrelated static net is wasted area. Spend the spacing where crosstalk actually lands (within a lane, near the strobe).
- Obsessing over total length. DDR4 at 3200 MT/s tolerates generous absolute lengths if the relative matching and reference returns are clean.
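A back-of-envelope way to see why a few mils of extra mismatch rarely matters is to express it as a fraction of the unit interval. The sketch below does that for 3200 MT/s. The 170 ps/in delay is an assumed stripline figure, and the calculation ignores everything else in the timing budget (jitter, crosstalk, controller deskew range), so treat it as a sense of scale rather than a sign-off check.

```python
def unit_interval_ps(data_rate_mtps):
    """Unit interval (one bit time) in ps for a data rate in MT/s."""
    return 1e6 / data_rate_mtps


def mismatch_fraction_of_ui(mismatch_mil, data_rate_mtps, ps_per_inch=170.0):
    """Fraction of the unit interval consumed by a length mismatch."""
    skew_ps = mismatch_mil / 1000.0 * ps_per_inch
    return skew_ps / unit_interval_ps(data_rate_mtps)


ui = unit_interval_ps(3200)
print(f"UI at 3200 MT/s: {ui} ps")
for mismatch in (5, 10, 50, 100):
    frac = mismatch_fraction_of_ui(mismatch, 3200)
    print(f"{mismatch:>4} mil mismatch -> {frac * 100:.2f}% of UI")
```

At this data rate a 10 mil mismatch is roughly half a percent of the bit time, which is why chasing ±5 mil across unrelated lanes tends to cost more in serpentine coupling than it returns in margin.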
Termination & reference
DDR4 uses on-die termination (ODT) on the data group, plus far-end termination on the fly-by ACC bus. On a DIMM that termination lives on the module; on a memory-down board you place it yourself, after the last device. You rarely add discrete termination on a point-to-point DQ. What you do own is the reference plane: every DQ/DQS must run over a continuous reference (usually ground) with no plane splits. A split under a byte lane is the most common DDR4 SI failure we re-spin. Keep return vias near layer transitions.
If you only fix one thing on a marginal DDR4 board, it’s the reference plane under the byte lanes — not the length matching.
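The return-via part of that advice is easy to screen for once you can export via coordinates from the layout tool. The following is a minimal sketch: the coordinates and net names are made up, the export format is an assumption, and the 40 mil limit is simply the figure used in the checklist further down.

```python
import math


def nearest_return_via_mil(signal_via, return_vias):
    """Distance (mil) from a signal via to the closest return (ground) via.

    Assumes return_vias is non-empty; coordinates are (x, y) in mils.
    """
    return min(math.dist(signal_via, rv) for rv in return_vias)


def flag_transitions(signal_vias, return_vias, max_dist_mil=40.0):
    """Return {net: distance_mil} for signal vias with no return via nearby."""
    flagged = {}
    for net, xy in signal_vias.items():
        d = nearest_return_via_mil(xy, return_vias)
        if d > max_dist_mil:
            flagged[net] = round(d, 1)
    return flagged


# Hypothetical via locations in mils.
signal_vias = {
    "DQ0": (1000, 1000),
    "DQ1": (1050, 1000),
    "DQS0_P": (1200, 1000),
}
return_vias = [(1020, 1020), (1300, 1000)]

flagged = flag_transitions(signal_vias, return_vias)
for net, dist in flagged.items():
    print(f"{net}: nearest return via is {dist} mil away")
```

This only checks distance. It does not confirm that the return via actually connects the two reference planes the signal moves between, which still needs a look at the stackup.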
The DDR4 checklist
- ☐ Fly-by ACC, point-to-point DQ/DQS
- ☐ Intra-pair skew ≤ ~1 ps (≈ 6 mil) on DQS and CK
- ☐ Each DQ matched to its strobe within the deskew window (±5–10 mil)
- ☐ Continuous reference plane under every byte lane — no splits
- ☐ Return vias within ~40 mil of every signal-layer transition
- ☐ VREFCA routed quiet and decoupled close to the pins it feeds (DDR4 generates and trains VrefDQ internally, so there is no VrefDQ net to route)
- ☐ Stackup hits 40 Ω SE / 80 Ω diff (DDR4 targets) before you route a single net
Need it routed?
This is bread-and-butter high-speed digital work for us — DDR4 at 3200 and DDR5 at 6400 MT/s, trained on the first article. Scope a project and we’ll size it in 60 seconds.