Advanced: project structure and customization

This section is intended for users who want to modify the reference design — adding IP to the block design, changing constraints, adding packages or drivers to the PetaLinux project, and so on. It describes how the repository is laid out, how the build flow works, how the block design assembles the MRMAC subsystem, how the PetaLinux BSP is composed from layered fragments, and what modifications have been added on top of the stock AMD BSP.

The actual build instructions are in build_instructions; this section is about understanding the project well enough to modify it.

Repository layout

.
├── build.py                   <- Cross-platform build runner (the build logic)
├── build.sh / build.bat       <- Shims that invoke build.py (Linux/git bash, Windows)
├── Makefile                   <- Deprecated thin wrapper around build.sh (to be removed in the next version)
├── README.md
├── config/                    <- Source-of-truth design metadata and auto-generation
│   ├── data.json
│   └── update.py
├── docs/                      <- This documentation (Sphinx + Read the Docs)
├── PetaLinux/
│   └── bsp/                   <- Board and port-config BSP fragments
│       ├── vck190/            <-   board-specific overlay
│       ├── ports-versal-0/    <-   port-config overlay: port 0 only
│       └── ports-versal-01/   <-   port-config overlay: ports 0 and 1
└── Vivado/
    ├── scripts/
    │   ├── build.tcl          <- Project creation + block design assembly
    │   └── xsa.tcl            <- Synthesis, implementation, XSA export
    └── src/
        ├── bd/
        │   └── bd_versal.tcl  <- Block design for all (Versal) targets
        ├── constraints/
        │   └── <target>.xdc   <- One XDC per target (pin assignments)
        └── hdl/
            └── mrmac_axis_adapter.v  <- MRMAC 100G client ↔ AXI4-Stream adapters

Per-target build outputs are written to Vivado/<target>/ and PetaLinux/<target>/; packaged boot-image zips are written to bootimages/. None of these are committed.

Target naming

A target label is the canonical handle for a single design and is passed to every build command via --target. It encodes the board and the FMC connector:

<board>_<connector>

For this repo the (currently single) target is vck190_fmcp1. The first underscore-delimited token (vck190) is taken as the target board and is what PetaLinux/Makefile uses to select the BSP under PetaLinux/bsp/<board>/.

The complete list of valid targets comes from config/data.json; run ./build.sh list (or ./build.sh labels for one per line) to print it.
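The label rule is simple enough to sketch. The helper below is a hypothetical illustration, not code taken from build.py or PetaLinux/Makefile. It splits on the first underscore, as the Makefile's board selection does, and treats everything after that underscore as the connector.

```python
def parse_target(label: str) -> tuple[str, str]:
    """Return (board, connector) for a label such as 'vck190_fmcp1'.

    The board is the first underscore-delimited token; this is the part
    that selects PetaLinux/bsp/<board>/.
    """
    board, sep, connector = label.partition("_")
    if not sep or not board or not connector:
        raise ValueError(f"not a <board>_<connector> label: {label!r}")
    return board, connector


def board_bsp_dir(label: str) -> str:
    """Board BSP directory for a target label (hypothetical helper)."""
    board, _ = parse_target(label)
    return f"PetaLinux/bsp/{board}/"


print(parse_target("vck190_fmcp1"))   # ('vck190', 'fmcp1')
print(board_bsp_dir("vck190_fmcp1"))  # PetaLinux/bsp/vck190/
```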

`config/data.json` and `config/update.py`

config/data.json is the canonical source of truth for the set of supported designs and their per-target metadata (board name, board URL, line rate, FMC connector, etc.). The build.py runner reads it directly at runtime, so the target list is never maintained by hand. config/update.py reads data.json and regenerates the auto-managed files that are not read at runtime: the target tables in the top-level README.md, the .gitignore, and the residual per-board UPDATER block still embedded in PetaLinux/Makefile. In each of these files, the regenerated content is the section delimited by UPDATER START / UPDATER END comment markers. The Sphinx documentation also reads data.json directly to render the supported-board and target-design tables.

Note

Terminology: the lanes field of each design holds the list of QSFP28 ports the design instantiates (["0"] for port 0 only, ["0","1"] for both ports). Each QSFP28 port is a single 100GbE (CAUI-4) MRMAC that uses four GTY lanes internally. This mirrors how the Quad SFP28 FMC repo uses lanes to mean SFP28 ports, so the same update machinery is reused unchanged.

When adding or modifying a target, edit data.json and re-run update.py (from the config/ directory). Do not hand-edit content between the updater markers; it will be overwritten on the next regeneration. Note that update.py derives the PetaLinux port-config overlay name from the populated ports: lanes=["0"] selects bsp/ports-versal-0/, lanes=["0","1"] selects bsp/ports-versal-01/.
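A minimal sketch of the overlay naming rule, assuming it simply concatenates the port indices in the lanes list. The sorting and de-duplication are additions of this sketch; update.py may implement the rule differently.

```python
def port_config_overlay(lanes: list[str]) -> str:
    """Return the PetaLinux port-config overlay directory name for a design.

    `lanes` is the design's lanes field from config/data.json, i.e. the
    list of populated QSFP28 ports, e.g. ["0"] or ["0", "1"].
    """
    if not lanes:
        raise ValueError("a design must populate at least one QSFP28 port")
    # Order-insensitive: sort the port indices numerically before joining.
    ports = sorted(set(lanes), key=int)
    return "ports-versal-" + "".join(ports)


print(port_config_overlay(["0"]))       # ports-versal-0
print(port_config_overlay(["0", "1"]))  # ports-versal-01
```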

Build runner

All build stages are driven by the cross-platform build.py runner at the root of the repository, invoked through the build.sh shim on Linux / git bash or build.bat on Windows (identical arguments). It reads the target list and per-target attributes straight from config/data.json, builds whatever a requested stage depends on automatically, skips anything already built, and locates and sources the AMD tools itself — so there is no need to source the Vivado / PetaLinux settings scripts beforehand.

The build is organised into stages, each available as a sub-command:

Command	Stage
`project`	Create the Vivado project (`.xpr`) and block design.
`xsa`	Synthesise, implement and export the hardware (`.xsa`).
`petalinux`	Create the PetaLinux project from the XSA, apply the BSP overlays, build and package.
`package`	Gather the built boot artifacts into `bootimages/*.zip`.
`all`	Build every stage the target supports, then `package`.

Run ./build.sh list to see the targets and their attributes, ./build.sh status --target <t> for per-stage artifact state, and ./build.sh --help for the full command list.

Because each stage builds its prerequisites first, a single ./build.sh all --target <t> cascades the whole pipeline:

./build.sh all --target <t>
  -> xsa         : vivado creates the project + block design (build.tcl),
                   then synth/impl/device-image/XSA export (xsa.tcl)
  -> petalinux   : petalinux-create --template versal -> petalinux-config --get-hw-description <XSA>
                   -> copy bsp/<board>/project-spec/* -> copy bsp/<port-config>/project-spec/* (overlay)
                   -> petalinux-config --silentconfig -> petalinux-build
                   -> petalinux-package boot --plm --psmfw --u-boot --dtb
  -> package     : zip the boot files into bootimages/

Build a single stage on its own with ./build.sh <stage> --target <t>; the runner still builds any missing prerequisite stages first.
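The prerequisite cascade can be modelled as a walk over an ordered stage list. This is a hypothetical sketch of the behaviour described above, not build.py's code. It assumes every target supports every stage, and it ignores staleness (for example, a rebuilt XSA invalidating an existing PetaLinux project), which this sketch does not attempt to handle.

```python
from typing import Callable, List

# Stage order from the table above; each stage needs every stage before it.
STAGES = ["project", "xsa", "petalinux", "package"]


def plan(requested: str, is_built: Callable[[str], bool]) -> List[str]:
    """Stages that must run, in order, to satisfy `requested`.

    Prerequisites come first, and any stage whose artifacts already exist
    is skipped. `is_built` is a hypothetical artifact check, e.g. "does
    the .xsa exist".
    """
    if requested == "all":
        wanted = STAGES
    elif requested in STAGES:
        wanted = STAGES[: STAGES.index(requested) + 1]
    else:
        raise ValueError(f"unknown stage: {requested!r}")
    return [stage for stage in wanted if not is_built(stage)]


built = {"project", "xsa"}
print(plan("petalinux", built.__contains__))  # ['petalinux']
print(plan("all", built.__contains__))        # ['petalinux', 'package']
```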

Per-target lock files (.<target>.lock at the repository root) prevent two concurrent builds of the same target from clobbering each other, so two terminals can both safely run ./build.sh all --target all.
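One portable way to get this behaviour is a lock file created exclusively. The class below is a hypothetical sketch, not build.py's actual mechanism. It relies on os.O_CREAT | os.O_EXCL being atomic on both Linux and Windows, and it does not attempt to recover a lock left behind by a crashed build.

```python
import os
from typing import Optional


class TargetLock:
    """Hypothetical per-target lock file, .<target>.lock in the repo root."""

    def __init__(self, repo_root: str, target: str) -> None:
        self.path = os.path.join(repo_root, f".{target}.lock")
        self._fd: Optional[int] = None

    def acquire(self) -> bool:
        """Try to take the lock; return False if another build holds it."""
        try:
            # O_EXCL makes creation fail if the file already exists.
            self._fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self._fd, str(os.getpid()).encode())  # record the owner PID
        return True

    def release(self) -> None:
        """Drop the lock by closing and deleting the lock file."""
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.path)
            self._fd = None


lock = TargetLock(".", "vck190_fmcp1")
if lock.acquire():
    try:
        pass  # run the build stages for this target here
    finally:
        lock.release()
```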

Tip

./build.sh project --target <t> creates the block design and runs validate_bd_design without synthesis — use it to catch block-design wiring errors fast before committing to the long XSA build.

Vivado side

Block design

There is one block-design TCL, Vivado/src/bd/bd_versal.tcl. It is parameterised by the ports list (passed in from build.tcl’s target_dict), and builds the design as follows:

CIPS + NoC + DDR. The Versal CIPS is added and configured by the xilinx.com:bd_rule:cips automation (DDR branch, one DDR memory controller), then extended with a per-board PS_PMC_CONFIG (M_AXI_LPD enabled, 16 PL→PS interrupts, two PL clocks).
Clocking. A clock wizard generates the 100 MHz system clock, which clocks all AXI-Lite control and the MCDMA/NoC datapath, and a second clock wizard generates the 390.625 MHz MRMAC AXIS client clock.
Per-port GT quad. For each port, a gt_quad_base (GTY) is configured for four lanes and given its own reference clock (gt_ref_clk_0 = GBTCLK0 for port 0, gt_ref_clk_1 = GBTCLK1 for port 1) and an APB3 bridge for its DRP.
Per-port MRMAC subsystem. The create_qsfp_port proc (called once per port) builds the MRMAC, its per-lane GT user-clock buffers, the MRMAC client AXIS adapter, the width-converter/CDC-FIFO datapath, the AXI MCDMA, the AXI-Lite control SmartConnect, the GT-control GPIO, the QSFP sideband GPIO, the per-port module-management IIC, and the user LEDs.
Structural scaling. The NoC slave-port count, the control SmartConnect master count, the interrupt list, and the shared Si5328 IIC are sized from the number of ports so the same script builds both single-port and two-port designs.

After sourcing the BD script, build.tcl runs validate_bd_design -force, which triggers parameter propagation and connection automation. To see the netlist as actually built, inspect the saved .bd under Vivado/<target>/<target>.gen/sources_1/bd/qsfp/ or use write_bd_tcl.

build.tcl checks XILINX_VIVADO against the version_required constant (2025.2) and refuses to build with a different Vivado version — the BD TCL APIs are not stable across major releases.

The MRMAC datapath, in detail

These are the design choices that are specific to driving a QSFP28 port as 1x100GbE CAUI-4. If you modify the block design, these are the parts most likely to need care.

GT Quad configuration

The GT quad uses PRESET None and specifies the full PROT0 field set manually: GTY, four lanes at 25.78125 Gb/s, LCPLL integer-N, 322.265625 MHz reference clock, 80-bit RAW datapath. The MRMAC requires an 80-bit RAW GT datapath that no named Ethernet preset provides, which is why PRESET None is used and every field is set explicitly. The field set is merged onto the 2025.2 IP’s default LR0 dictionary, applying only field names that exist in this IP version (so the configuration stays robust if the IP’s field set changes between releases).
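build.tcl performs this merge in Tcl. The Python sketch below only models the rule: a requested field is applied only if the IP's default dictionary already has that key. The field names and values in the usage lines are illustrative placeholders, not a verified list of gt_quad_base parameters.

```python
from typing import Dict, List, Tuple


def merge_gt_fields(
    defaults: Dict[str, str], wanted: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Overlay `wanted` onto the IP's default field dictionary.

    Only field names that already exist in `defaults` are applied; names
    this IP version does not know are returned in `skipped` instead.
    """
    merged = dict(defaults)  # never mutate the caller's defaults
    skipped: List[str] = []
    for name, value in wanted.items():
        if name in merged:
            merged[name] = value
        else:
            skipped.append(name)
    return merged, skipped


# Illustrative field names and values only.
defaults = {"TX_LINE_RATE": "10.3125", "TX_DATA_ENCODING": "64B66B"}
wanted = {"TX_LINE_RATE": "25.78125", "TX_DATA_ENCODING": "RAW", "OLD_FIELD": "1"}
merged, skipped = merge_gt_fields(defaults, wanted)
print(merged)   # {'TX_LINE_RATE': '25.78125', 'TX_DATA_ENCODING': 'RAW'}
print(skipped)  # ['OLD_FIELD']
```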

Per-lane CAUI-4 user clocking

CAUI-4 bonds four lanes and requires them to align, so each lane’s recovered RX clock must drive that lane’s MRMAC serdes/core clock:

RX: each of the four GT lanes gets its own pair of BUFG_GT buffers — a full-rate usrclk and a half-rate (/2) usrclk2. The MRMAC rx_serdes_clk/rx_core_clk buses take the per-lane full-rate clocks; rx_alt_serdes_clk takes the per-lane half-rate clocks; the GT chN_rxusrclk inputs take the per-lane half-rate clocks.
TX: all four lanes share the TX PLL, so a single ch0 pair drives all four TX lanes (tx_core_clk = ch0 full-rate ×4; tx_alt_serdes_clk and the GT chN_txusrclk inputs = ch0 half-rate).

Warning

Driving the MRMAC RX serdes/core clocks from ch0 alone (broadcasting one lane’s recovered clock to all four) leaves lanes 1–3 sampled in the wrong clock domain — those PCS lanes never block-lock and 100G alignment never completes, even with a passive loopback. The per-lane clocking above is mandatory for CAUI-4. (The AXIS client clocks tx_axi_clk/rx_axi_clk are a separate, single 390.625 MHz domain — do not confuse the two clock buses.)
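The clocking rules are easier to check as a table. The sketch below encodes them as a destination-to-source map and flags RX lanes that are clocked from another lane's recovered clock. The pin and clock names are hypothetical labels for this model, not the block design's cell or net names.

```python
from typing import Dict, List

LANES = range(4)


def caui4_clock_map() -> Dict[str, str]:
    """Destination pin -> clock source for the per-lane scheme above.

    Hypothetical source labels: rx<i>_full / rx<i>_half are lane i's
    BUFG_GT full-rate and /2 RX usrclks; tx0_full / tx0_half are the one
    shared ch0 TX pair.
    """
    m: Dict[str, str] = {}
    for i in LANES:
        m[f"mrmac.rx_serdes_clk[{i}]"] = f"rx{i}_full"
        m[f"mrmac.rx_core_clk[{i}]"] = f"rx{i}_full"
        m[f"mrmac.rx_alt_serdes_clk[{i}]"] = f"rx{i}_half"
        m[f"gt.ch{i}_rxusrclk"] = f"rx{i}_half"
        m[f"mrmac.tx_core_clk[{i}]"] = "tx0_full"
        m[f"mrmac.tx_alt_serdes_clk[{i}]"] = "tx0_half"
        m[f"gt.ch{i}_txusrclk"] = "tx0_half"
    return m


def misclocked_rx_lanes(m: Dict[str, str]) -> List[int]:
    """Lanes whose RX serdes or core clock is not their own recovered clock."""
    return [
        i for i in LANES
        if m[f"mrmac.rx_serdes_clk[{i}]"] != f"rx{i}_full"
        or m[f"mrmac.rx_core_clk[{i}]"] != f"rx{i}_full"
    ]


good = caui4_clock_map()
bad = dict(good)
for i in LANES:  # the broken variant from the warning: broadcast ch0's clock
    bad[f"mrmac.rx_serdes_clk[{i}]"] = "rx0_full"
    bad[f"mrmac.rx_core_clk[{i}]"] = "rx0_full"
print(misclocked_rx_lanes(good))  # []
print(misclocked_rx_lanes(bad))   # [1, 2, 3]
```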

MRMAC client AXIS adapter

The MRMAC 100G “Independent 384b Non-Segmented” client is not a standard AXI4-Stream bus. In the block design its axis_rx_port0 / axis_tx_port0 interfaces are handshake-only (they map only TVALID/TLAST/TREADY, so IP integrator reports TDATA_NUM_BYTES=0). The 384-bit data actually rides on six separate 64-bit lane ports (rx/tx_axis_tdata0..5) plus six per-lane tkeep_user0..5[10:0] control words. Feeding that handshake-only interface straight into a stock axis_dwidth_converter mis-delineates frames (one packet per 384-bit beat — frames arrive fragmented into ~48-byte pieces).

Vivado/src/hdl/mrmac_axis_adapter.v provides two purely-combinational adapters (mrmac_rx_axis_adapter, mrmac_tx_axis_adapter) that pack/unpack the six 64-bit lanes into a single standard 384-bit AXIS stream (tdata[383:0], tkeep[47:0], tlast, tvalid), so the downstream width-converter / CDC-FIFO / MCDMA chain delineates frames correctly (one TLAST per Ethernet frame). build.tcl adds src/hdl/*.v to the project before sourcing the BD so create_bd_cell -type module -reference can resolve the modules.

Note

This packing is specific to 1x100GbE CAUI-4, which bonds all six client lanes into one frame. A design running independent 10G/25G ports (one 64-bit lane straight into its own MCDMA per port) would not need it.
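The lane-to-bus packing can be modelled in a few lines. This is a behavioural sketch, not the RTL in mrmac_axis_adapter.v. It assumes lane 0 occupies the least significant bits and that the low byte of each tkeep_user word carries that lane's byte enables; check the Verilog for the actual bit mapping and for how tlast/tvalid are passed through.

```python
from typing import List, Tuple

LANE_MASK = (1 << 64) - 1


def pack_client_lanes(tdata: List[int], tkeep_user: List[int]) -> Tuple[int, int]:
    """Pack six 64-bit client lanes into one 384-bit AXIS beat.

    Assumptions of this sketch: lane i lands in tdata[64*i+63:64*i], and
    bits [7:0] of each 11-bit tkeep_user word are that lane's byte enables
    (bits [10:8] are dropped here).
    """
    if len(tdata) != 6 or len(tkeep_user) != 6:
        raise ValueError("expected exactly six client lanes")
    data = keep = 0
    for i in range(6):
        data |= (tdata[i] & LANE_MASK) << (64 * i)
        keep |= (tkeep_user[i] & 0xFF) << (8 * i)
    return data, keep


# Last beat of a frame that ends 3 bytes into lane 1: 8 + 3 = 11 valid bytes.
data, keep = pack_client_lanes([0x11, 0x22, 0, 0, 0, 0], [0xFF, 0x07, 0, 0, 0, 0])
print(hex(keep))             # 0x7ff
print(bin(keep).count("1"))  # 11
```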

Width conversion, CDC and MCDMA

The MRMAC client runs at 390.625 MHz / 384-bit; the MCDMA and NoC run at the 100 MHz system clock / 512-bit. Each direction has an axis_dwidth_converter (384 ↔ 512 bit) and an asynchronous axis_data_fifo for the clock-domain crossing. The AXI MCDMA (c_num_mm2s_channels/c_num_s2mm_channels = 1, 512-bit, 64-bit addressing) moves packet data to/from DDR over three NoC AXI master ports (scatter-gather, MM2S, S2MM).
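The rates quoted above can be checked with exact arithmetic. This sketch computes peak raw bus rates only (one full-width beat every clock cycle) and ignores AXI, NoC and DDR overheads.

```python
from fractions import Fraction


def bus_gbps(width_bits: int, clock_mhz: Fraction) -> Fraction:
    """Peak raw bus rate in Gb/s: one full-width beat every clock cycle."""
    return width_bits * clock_mhz / 1000


# CAUI-4 line side: four GTY lanes, 64b/66b coding.
line_rate = 4 * Fraction("25.78125")        # Gb/s on the wire
mac_rate = line_rate * Fraction(64, 66)     # Gb/s after 64b/66b

client = bus_gbps(384, Fraction("390.625"))  # MRMAC client, 390.625 MHz
dma = bus_gbps(512, Fraction(100))           # MCDMA / NoC, 100 MHz

for name, rate in [("line", line_rate), ("mac", mac_rate), ("client", client), ("dma", dma)]:
    print(f"{name:>6}: {float(rate):g} Gb/s")
```

This gives 103.125 Gb/s on the line, 100 Gb/s after 64b/66b, 150 Gb/s peak on the 384-bit client bus and 51.2 Gb/s peak on a 512-bit / 100 MHz datapath.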

MRMAC placement

Both MRMACs default to MRMAC_LOCATION_C0 = MRMAC_X0Y0, so port 1 fails placement (“bel is occupied”). The create_qsfp_port proc therefore pins each port’s MRMAC to the integrated-MAC site in the clock region of its GT quad:

port 0 : GTY_QUAD_X1Y1 (region X9Y1) -> MRMAC_X0Y0
port 1 : GTY_QUAD_X1Y2 (region X9Y2) -> MRMAC_X0Y2

QSFP module sideband and power-on reset

Each port has a dual-channel AXI GPIO (axi_gpio_qsfp*) for the QSFP module’s sideband signals:

Channel 1 (outputs): bit 0 = ModSelL, bit 1 = ResetL, bit 2 = LPMode.
Channel 2 (inputs): bit 0 = ModPrsL, bit 1 = IntL.

ResetL is active-low — the module is held in reset while the line is 0. The GPIO is given a power-on output default of 0x2 (CONFIG.C_DOUT_DEFAULT on the axi_gpio), so the lines come up ModSelL = 0, ResetL = 1 (deasserted), LPMode = 0 (high power): the module is enabled the instant the device image is loaded, before Linux boots.

Important

Without this default the GPIO powers up at 0x0, so ResetL = 0 and the QSFP module is held in reset — its laser stays off and no link comes up. A passive electrical loopback still works in that state (it needs no powered module), which masks the problem; a real optical module or AOC stays dark until ResetL is deasserted.

The MRMAC / xilinx_axienet driver manages only the GT reset (via the gt-*-gpios); it never touches the optical module’s reset, which is board glue. This mirrors the Quad SFP28 FMC design, where the SFP TX_DISABLE is tied to a constant in fabric. Here the line is left software-controllable through the GPIO at 0x80020000 / 0x80030000, so ResetL / LPMode / ModSelL can still be driven at runtime (e.g. devmem 0x80020000 32 0x2) if a module ever needs to be power-cycled.
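The sideband bit assignments can be captured in a small helper. This is an illustrative sketch. The channel-2 data offset (0x8) is the standard AXI GPIO GPIO2_DATA register from AMD PG144, assumed rather than checked against this design, and no minimum reset pulse width is enforced between the two devmem writes.

```python
from typing import Dict, List

# QSFP sideband GPIO bit positions, from the channel lists above.
MODSELL, RESETL, LPMODE = 0, 1, 2   # channel 1, outputs
MODPRSL, INTL = 0, 1                # channel 2, inputs

# AXI GPIO register offsets: channel 1 data, channel 2 data.
GPIO_DATA, GPIO2_DATA = 0x0, 0x8


def sideband_out(modsel_l: int, reset_l: int, lpmode: int) -> int:
    """Channel-1 output word. ModSelL and ResetL are active-low."""
    return (modsel_l << MODSELL) | (reset_l << RESETL) | (lpmode << LPMODE)


def sideband_in(word: int) -> Dict[str, bool]:
    """Decode the channel-2 input word, undoing the active-low encoding."""
    return {
        "module_present": not ((word >> MODPRSL) & 1),
        "interrupt_asserted": not ((word >> INTL) & 1),
    }


def reset_pulse_cmds(gpio_base: int) -> List[str]:
    """devmem commands that assert, then deassert, ResetL on one port."""
    addr = gpio_base + GPIO_DATA
    held = sideband_out(modsel_l=0, reset_l=0, lpmode=0)
    run = sideband_out(modsel_l=0, reset_l=1, lpmode=0)
    return [f"devmem 0x{addr:08x} 32 0x{held:x}", f"devmem 0x{addr:08x} 32 0x{run:x}"]


print(hex(sideband_out(0, 1, 0)))    # 0x2, the power-on default
print(reset_pulse_cmds(0x80020000))  # ['devmem 0x80020000 32 0x0', 'devmem 0x80020000 32 0x2']
```

To read a module's ModPrsL/IntL inputs under the same assumption, read 32 bits at the GPIO base plus 0x8 (e.g. devmem 0x80020008 32) and pass the value to sideband_in.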

Address and interrupt maps

The control peripherals are mapped from M_AXI_LPD:

Peripheral	Port 0	Port 1
MRMAC `s_axi`	`0x80000000`	`0x80010000`
QSFP sideband GPIO	`0x80020000`	`0x80030000`
GT-control GPIO	`0x80070000`	`0x80090000`
AXI MCDMA	`0x80080000`	`0x800A0000`
QSFP module IIC	`0x80050000`	`0x80060000`
Si5328 clock IIC (shared)	`0x80040000`

Interrupts are connected to pl_ps_irq0..6 (SPI = 84 + index):

`pl_ps_irq`	SPI	Source
0	84	Port 0 MCDMA `mm2s`
1	85	Port 0 MCDMA `s2mm`
2	86	Port 0 QSFP module IIC
3	87	Port 1 MCDMA `mm2s`
4	88	Port 1 MCDMA `s2mm`
5	89	Port 1 QSFP module IIC
6	90	Si5328 clock IIC
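The two-port table follows a regular pattern: three interrupts per port, then the shared Si5328 IIC. The sketch below reproduces it. What it returns for a single-port design is an extrapolation from that pattern, so confirm it against the generated device tree.

```python
from typing import List, Tuple

SPI_BASE = 84  # SPI number of pl_ps_irq0, per the table above


def irq_map(num_ports: int) -> List[Tuple[int, int, str]]:
    """(pl_ps_irq index, SPI, source) in the order shown in the table.

    Per port: MCDMA mm2s, MCDMA s2mm, QSFP module IIC. The shared Si5328
    IIC is assumed to follow the last per-port entry.
    """
    sources: List[str] = []
    for p in range(num_ports):
        sources += [f"Port {p} MCDMA mm2s", f"Port {p} MCDMA s2mm", f"Port {p} QSFP module IIC"]
    sources.append("Si5328 clock IIC")
    return [(i, SPI_BASE + i, src) for i, src in enumerate(sources)]


for row in irq_map(2):
    print(row)
```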

Constraints

Vivado/src/constraints/<target>.xdc contains the pin assignments. For vck190_fmcp1 it covers both FMC slots: the eight GTY lanes (DP0–3 for port 0, DP4–7 for port 1), the two GT reference clocks (GBTCLK0/GBTCLK1), the three IIC buses (shared Si5328 on LA02, QSFP0 on LA03, QSFP1 on LA17_CC), and the per-slot QSFP module sideband I/O and user LEDs.

Modifying the block design

Edit Vivado/src/bd/bd_versal.tcl. Most per-port logic lives in the create_qsfp_port proc, which is called once per entry in ports; structural counts (NoC slave ports, control SmartConnect masters, interrupts) are derived from the number of ports, so adding or removing a port is largely a matter of changing the lanes list in data.json. After editing, delete the existing project directory and rebuild:

rm -rf Vivado/<target>
./build.sh xsa --target <target>

PetaLinux side

BSP composition

The PetaLinux project is composed at build time from two BSP fragments copied into the target’s project directory:

A board BSP at PetaLinux/bsp/vck190/. Provides the board kernel and U-Boot configuration, system-user.dtsi (which includes port-config.dtsi), the kernel patches, and the rootfs configuration.
A port-config overlay at PetaLinux/bsp/ports-versal-<ports>/. Provides port-config.dtsi — the device-tree fragment that wires up the MRMAC, MCDMA, IIC, GPIO and Si5328 nodes for the ports active on this target. ports-versal-0 is port 0 only; ports-versal-01 is both ports.

The mapping from target to (board BSP, port-config overlay) is encoded in PetaLinux/Makefile’s UPDATER block, for example:

vck190_fmcp1_target := versal 0 0 ports-versal-01

The first column is the PetaLinux template (versal); the last is the port-config overlay name. The board BSP is derived from the first token of the target name (vck190). At build time both project-spec/ trees are copied in, with the port-config overlay copied after the board BSP.
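For scripting around the Makefile (for example, checking that every target's overlay directory exists), an UPDATER line can be parsed as shown below. This is a hypothetical helper that relies only on the column meanings given above; the middle columns are left uninterpreted.

```python
import re
from typing import Dict

# Matches UPDATER lines such as:
#   vck190_fmcp1_target := versal 0 0 ports-versal-01
TARGET_LINE = re.compile(r"^(?P<target>\S+)_target\s*:=\s*(?P<fields>.*\S)\s*$")


def parse_updater_line(line: str) -> Dict[str, str]:
    """Pull the template, overlay and board BSP out of one UPDATER line."""
    m = TARGET_LINE.match(line)
    if m is None:
        raise ValueError(f"not a <target>_target line: {line!r}")
    target = m.group("target")
    fields = m.group("fields").split()
    return {
        "target": target,
        "template": fields[0],          # PetaLinux template, e.g. versal
        "overlay": fields[-1],          # port-config overlay directory
        "board": target.split("_")[0],  # board BSP, from the target name
    }


print(parse_updater_line("vck190_fmcp1_target := versal 0 0 ports-versal-01"))
```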

The `port-config.dtsi` overlay

This is the device-tree fragment that makes the MRMAC ports work. Per port it sets, on the SDT-generated mrmac@… node:

axistream-connected → the port’s MCDMA node, plus the MCDMA channel interrupts (mm2s_ch1_introut/s2mm_ch1_introut) and their GIC interrupts. The xilinx_axienet MCDMA probe looks the interrupts up by name on the MRMAC node, so both interrupt-names and the matching interrupt-parent/interrupts must be present here.
local-mac-address, xlnx,channel-ids, xlnx,num-queues, xlnx,addrwidth.
max-speed = <100000> and xlnx,mrmac-rate = <100000>. The driver reads max-speed first; without it, the auto-generated pl.dtsi max-speed = <25000> (the per-lane GT rate) wins and the port comes up at 25G single-lane instead of 100G.
gt-ctrl-gpios, gt-tx-dpath-gpios, gt-rx-dpath-gpios, gt-ctrl-rate-gpios, gt-tx-rst-done-gpios, gt-rx-rst-done-gpios (all on the port’s GT-control AXI GPIO) and xlnx,gtlane = <0>. These let the driver reset the GT and poll reset-done (see below).

It also overrides each MCDMA node’s compatible to "xlnx,eth-dma", and disables the three phantom mrmac_1/_2/_3 nodes the SDT emits per MRMAC. Finally it instantiates the Si5328 (see below).
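The precedence described above (the overlay's value replaces the generated one) can be modelled by treating each fragment as a flat property dictionary applied in include order. This is a simplification of how dtc merges nodes, and the property values are placeholders. REQUIRED is just the checklist from the property list above, not something the driver exports.

```python
from typing import Dict, Set

Props = Dict[str, object]

# Properties port-config.dtsi sets on each mrmac@... node, per the list above.
REQUIRED: Set[str] = {
    "axistream-connected", "interrupt-names", "interrupt-parent", "interrupts",
    "local-mac-address", "xlnx,channel-ids", "xlnx,num-queues", "xlnx,addrwidth",
    "max-speed", "xlnx,mrmac-rate", "xlnx,gtlane",
    "gt-ctrl-gpios", "gt-tx-dpath-gpios", "gt-rx-dpath-gpios",
    "gt-ctrl-rate-gpios", "gt-tx-rst-done-gpios", "gt-rx-rst-done-gpios",
}


def effective(*fragments: Props) -> Props:
    """Merge one node's properties in include order; later fragments win."""
    merged: Props = {}
    for frag in fragments:
        merged.update(frag)
    return merged


def missing(node: Props) -> Set[str]:
    """Checklist properties not present on the merged node."""
    return REQUIRED.difference(node)


pl_dtsi: Props = {"max-speed": 25000}  # SDT-generated per-lane value
port_config: Props = {"max-speed": 100000, "xlnx,mrmac-rate": 100000}

print(effective(pl_dtsi)["max-speed"])               # 25000
print(effective(pl_dtsi, port_config)["max-speed"])  # 100000
```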

Modifications layered on the stock BSP

The board BSP started as the stock AMD VCK190 reference BSP. This list is the answer to “what would I lose if I overwrote the BSP with the stock one?”

AXI Ethernet + MCDMA driver. Kernel configs enable the xilinx_axienet driver with MCDMA support (CONFIG_XILINX_AXI_EMAC, CONFIG_AXIENET_HAS_MCDMA, CONFIG_GPIO_XILINX, CONFIG_I2C_XILINX). The MRMAC binds to xilinx_axienet, not to phylink/SFP as the Quad SFP28 FMC design does.
MCDMA compatible override (device tree). xilinx_axienet ioremaps the MCDMA registers itself, but the standalone xilinx_dma dmaengine driver also matches the MCDMA node and claims the region first, so axienet’s probe fails with -EBUSY. Because CONFIG_XILINX_AXI_EMAC depends on XILINX_DMA, the dmaengine driver cannot simply be disabled. The fix is the compatible = "xlnx,eth-dma" override in port-config.dtsi: xilinx_dma’s of-match table does not bind eth-dma, and axienet finds the MCDMA via the axistream-connected phandle (not by compatible), so the region is left for axienet.
GT-control GPIO binding. The xilinx_axienet MRMAC driver needs to reset the GT and read reset-done/PLL status from the PS. The block design exposes a dual-channel AXI GPIO per port (axi_gpio_gt*): five outputs (gt_reset_all, gt_reset_tx_datapath, gt_reset_rx_datapath, plus two spare gt-ctrl-rate lines) and two inputs (gt_tx/rx_reset_done). The gt-*-gpios properties in port-config.dtsi point the driver at these GPIO lines. Without this, the driver fails with unable to get GT PLL resource.
Si5328 clock generator (device tree). port-config.dtsi instantiates a silabs,si5328 clock-generator@68 node on the shared clock IIC bus, with a fixed-clock 114.285 MHz crystal input and a clk0@0 output programmed to 322.265625 MHz (the 100G GT reference clock). The Linux clk-si5324 driver programs the device from this node on probe.
Si5328 CKOUT2 kernel patch. For the two-port design, port 1’s GT reference clock is GBTCLK1, which the FMC routes from the Si5328’s CKOUT2 output. The stock Xilinx clk-si5324 driver hard-disables CKOUT2 and only programs CKOUT1’s divider, so port 1’s GT never gets a reference clock and fails with GT TX Reset Done not achieved. The kernel patch recipes-kernel/linux/linux-xlnx/0001-clk-si5324-enable-ckout2-for-2x-qsfp28-fmc.patch enables CKOUT2 (sets the dual-LVDS output format and clears the CKOUT2 disable bit) and mirrors CKOUT1’s divider to CKOUT2 (both outputs share the same PLL, so they run at the same frequency). It is registered via SRC_URI:append in recipes-kernel/linux/linux-xlnx_%.bbappend.
MRMAC link carrier-monitor kernel patch. The MRMAC has no PHY/phylink and raises no link-change interrupt, and the stock xilinx_axienet driver checks RX block lock only once in axienet_open() (a 1 ms poll), failing the open with -ENODEV if the link is not already up. On a cold bring-up the GT/PCS take longer than that to lock, so the open loses the race and the port needs a manual ip link bounce; a link partner that powers on later never brings the port up; and the driver never sets the netdev carrier (ip link shows state UNKNOWN). The patch recipes-kernel/linux/linux-xlnx/0002-net-axienet-mrmac-carrier-link-monitor.patch adds a delayed-work monitor that drives the netdev carrier from RX block-lock + status. open() now brings the interface up with carrier off and starts the monitor instead of failing. While the link is down, the monitor re-issues the MRMAC core/serdes reset (axienet_mrmac_reset) on each cycle to re-attempt lock. This is needed because the Versal GTY RX does not re-acquire block lock on a partner signal that stabilises after its last reset, and the GT reset is one-shot, so the MRMAC reset is the part that re-aligns the lanes. Once the link is up, the monitor polls for loss of lock. The net effect is that a port comes up automatically at boot and recovers on cable re-seat or partner power-on, with link state reflected in the netdev carrier (MRMAC link up at 100000 / MRMAC link down). It is registered via SRC_URI:append in recipes-kernel/linux/linux-xlnx_%.bbappend.
Loopback self-test app. The mrmac-loopback-test recipe (recipes-apps/mrmac-loopback-test/) installs the self-test script described in petalinux; it is force-installed via IMAGE_INSTALL:append in meta-user/conf/petalinuxbsp.conf.
Root filesystem additions. ethtool and iperf3.
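The carrier-monitor patch is C kernel code; the Python model below only mirrors the control flow described in the list above so it can be reasoned about in isolation. block_locked and mrmac_reset are hypothetical stand-ins for the driver's RX block-lock check and axienet_mrmac_reset(), and resetting on the same cycle that detects a loss of lock is an assumption of this model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CarrierMonitor:
    """Behavioural model of the delayed-work MRMAC link monitor."""

    block_locked: Callable[[], bool]   # stand-in: RX block lock + status
    mrmac_reset: Callable[[], None]    # stand-in: axienet_mrmac_reset()
    carrier: bool = False              # open() starts with carrier off
    log: List[str] = field(default_factory=list)

    def tick(self) -> None:
        """One run of the delayed work."""
        if self.block_locked():
            if not self.carrier:
                self.carrier = True
                self.log.append("MRMAC link up at 100000")
            return  # link up: just keep polling for loss
        if self.carrier:
            self.carrier = False
            self.log.append("MRMAC link down")
        # Link down: re-issue the MRMAC core/serdes reset to retry lane alignment.
        self.mrmac_reset()


# Simulated partner whose signal only locks after three MRMAC resets.
resets: List[int] = []
mon = CarrierMonitor(block_locked=lambda: len(resets) >= 3,
                     mrmac_reset=lambda: resets.append(1))
for _ in range(5):
    mon.tick()
print(len(resets), mon.carrier, mon.log)  # 3 True ['MRMAC link up at 100000']
```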

Adding a kernel config option, patch, package or device-tree node

The mechanisms are the standard PetaLinux ones:

Kernel config: append CONFIG_<name>=y to bsp/vck190/project-spec/meta-user/recipes-kernel/linux/linux-xlnx/bsp.cfg.
Kernel patch: drop the .patch into recipes-kernel/linux/linux-xlnx/ and add a SRC_URI:append line to linux-xlnx_%.bbappend. Force a re-patch with petalinux-build -c kernel -x cleansstate && petalinux-build.
Rootfs package: add CONFIG_<package>=y to configs/rootfs_config (and declare it in meta-user/conf/user-rootfsconfig if it is not in the default menu).
Per-board device tree: edit meta-user/recipes-bsp/device-tree/files/system-user.dtsi.
Per-port device tree: edit the bsp/ports-versal-<ports>/…/port-config.dtsi overlay.
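Appending by hand works, but a script that edits bsp.cfg should avoid duplicate or conflicting entries. The helper below is a hypothetical convenience, not part of the repository. It treats the fragment as plain Kconfig lines and does not resolve dependencies; those are still resolved when PetaLinux configures the kernel.

```python
from pathlib import Path


def ensure_kconfig(cfg: Path, option: str, value: str = "y") -> bool:
    """Make a kernel config fragment contain CONFIG_<option>=<value>.

    Any other line for the same option ("CONFIG_X=..." or
    "# CONFIG_X is not set") is dropped first, so re-running is harmless.
    Returns True if the file was rewritten.
    """
    name = option if option.startswith("CONFIG_") else f"CONFIG_{option}"
    wanted = f"{name}={value}"
    lines = cfg.read_text().splitlines() if cfg.exists() else []
    kept = [
        line for line in lines
        if not line.startswith(f"{name}=") and line.strip() != f"# {name} is not set"
    ]
    if lines.count(wanted) == 1 and len(kept) == len(lines) - 1:
        return False  # already set exactly once, nothing conflicting
    kept.append(wanted)
    cfg.write_text("\n".join(kept) + "\n")
    return True


bsp_cfg = Path("bsp/vck190/project-spec/meta-user/recipes-kernel/linux/linux-xlnx/bsp.cfg")
# ensure_kconfig(bsp_cfg, "I2C_XILINX")  # run from the PetaLinux/ directory
```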

Tip

After a structurally-changed XSA (new peripherals/addresses), a petalinux-config --get-hw-description on an existing project keeps the stale SDT (“workspace already set up, leaving as-is”). Remove <target>/components/plnx_workspace before re-importing to force a fresh SDT, then rebuild (this reuses the sstate cache, so it is incremental).

Where build outputs land

Path	Contents
`Vivado/<target>/`	Vivado project. `qsfp_wrapper.xsa` is the export.
`Vivado/<target>/<target>.runs/impl_1/qsfp_wrapper.bit`	Device image / bitstream.
`Vivado/logs/`	Per-target Vivado build logs (xpr + xsa).
`PetaLinux/<target>/`	PetaLinux project. All Yocto build state lives here.
`PetaLinux/<target>/images/linux/`	`BOOT.BIN`, `image.ub`, `boot.scr`, `rootfs.tar.gz`, etc.
`bootimages/`	Per-target zipped boot files (`<prj>_<target>_petalinux-<ver>.zip`).

None of these directories are committed to the repository.