Advanced: project structure and customization
This section is intended for users who want to modify the reference design — adding IP to the block design, changing constraints, adding packages or drivers to the PetaLinux project, and so on. It describes how the repository is laid out, how the Make-driven build flow works, how the block design assembles the MRMAC subsystem, how the PetaLinux BSP is composed from layered fragments, and what modifications have been added on top of the stock AMD BSP.
The actual build instructions are in build_instructions; this section is about understanding the project well enough to modify it.
Repository layout
.
├── Makefile                     <- Top-level build entry point
├── README.md
├── config/                      <- Source-of-truth design metadata and auto-generation
│   ├── data.json
│   └── update.py
├── docs/                        <- This documentation (Sphinx + Read the Docs)
├── PetaLinux/
│   ├── Makefile                 <- PetaLinux build orchestration
│   └── bsp/                     <- Board and port-config BSP fragments
│       ├── vck190/              <- board-specific overlay
│       ├── ports-versal-0/      <- port-config overlay: port 0 only
│       └── ports-versal-01/     <- port-config overlay: ports 0 and 1
└── Vivado/
    ├── Makefile                 <- Vivado build orchestration
    ├── build-vivado.bat         <- Windows project-creation helper
    ├── scripts/
    │   ├── build.tcl            <- Project creation + block design assembly
    │   └── xsa.tcl              <- Synthesis, implementation, XSA export
    └── src/
        ├── bd/
        │   └── bd_versal.tcl    <- Block design for all (Versal) targets
        ├── constraints/
        │   └── <target>.xdc     <- One XDC per target (pin assignments)
        └── hdl/
            └── mrmac_axis_adapter.v <- MRMAC 100G client ↔ AXI4-Stream adapters
Per-target build outputs are written to Vivado/<target>/ and
PetaLinux/<target>/; packaged boot-image zips are written to
bootimages/. None of these are committed.
Target naming
A TARGET is the canonical handle for a single design and is the only
parameter passed through the build flow. It encodes the board and the
FMC connector:
<board>_<connector>
For this repo the (currently single) target is vck190_fmcp1. The first
underscore-delimited token (vck190) is taken as the target board and
is what PetaLinux/Makefile uses to select the BSP under
PetaLinux/bsp/<board>/.
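The naming rule is simple enough to express in a few lines. The following is a minimal Python sketch of the convention only, not code taken from the Makefiles. `parse_target` and `board_bsp_dir` are hypothetical helper names.

```python
from typing import NamedTuple


class Target(NamedTuple):
    board: str      # first underscore-delimited token, e.g. "vck190"
    connector: str  # everything after the first underscore, e.g. "fmcp1"


def parse_target(target: str) -> Target:
    """Split a <board>_<connector> target name on its first underscore."""
    board, sep, connector = target.partition("_")
    if not sep or not board or not connector:
        raise ValueError(f"not a <board>_<connector> target: {target!r}")
    return Target(board, connector)


def board_bsp_dir(target: str) -> str:
    """Board BSP directory selected by the target's board token."""
    return f"PetaLinux/bsp/{parse_target(target).board}/"


if __name__ == "__main__":
    print(parse_target("vck190_fmcp1"))   # Target(board='vck190', connector='fmcp1')
    print(board_bsp_dir("vck190_fmcp1"))  # PetaLinux/bsp/vck190/
```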
The complete list of valid targets is in the UPDATER START block of
each Makefile and is generated from config/data.json (see below).
config/data.json and config/update.py
config/data.json is the canonical source of truth for the set of
supported designs and their per-target metadata (board name, board URL,
line rate, FMC connector, etc.). config/update.py reads data.json
and regenerates the auto-managed sections of the Makefiles, the Vivado
build.tcl target dictionary, the top-level README.md, and
.gitignore — the sections delimited by UPDATER START / UPDATER END
(or <!-- updater start --> / <!-- updater end -->) comment markers.
The Sphinx documentation also reads data.json directly to render the
supported-board and target-design tables.
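The marker-delimited regeneration works roughly as shown below. This is a minimal sketch, not the code in update.py. `replace_between_markers` is a hypothetical name, and the sketch assumes each marker sits on its own line. The marker lines are kept and everything between them is discarded, which is why hand edits inside an updater block do not survive a rerun.

```python
import re


def replace_between_markers(text: str, start: str, end: str, body: str) -> str:
    """Replace the lines between a start-marker line and an end-marker line.

    The marker lines themselves are preserved. Every start/end pair in
    `text` is rewritten. Raises ValueError if no pair is found.
    """
    pattern = re.compile(
        rf"(^[^\n]*{re.escape(start)}[^\n]*\n)"  # line holding the start marker
        rf".*?"                                  # previously generated content
        rf"(^[^\n]*{re.escape(end)}[^\n]*$)",    # line holding the end marker
        re.DOTALL | re.MULTILINE,
    )
    if not body.endswith("\n"):
        body += "\n"
    # A function replacement avoids backslash interpretation inside `body`.
    new_text, count = pattern.subn(lambda m: m.group(1) + body + m.group(2), text)
    if count == 0:
        raise ValueError(f"markers {start!r} / {end!r} not found")
    return new_text
```

Because the output is a pure function of the markers and the new body, running it twice with the same data leaves the file unchanged.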
Note
Terminology: the lanes field of each design holds the list of QSFP28
ports the design instantiates (["0"] for port 0 only, ["0","1"]
for both ports). Each QSFP28 port is a single 100GbE (CAUI-4) MRMAC that
uses four GTY lanes internally. This mirrors how the Quad SFP28 FMC repo
uses lanes to mean SFP28 ports, so the same update machinery is reused
unchanged.
When adding or modifying a target, edit data.json and re-run
update.py (from the config/ directory). Do not hand-edit content
between the updater markers; it will be overwritten on the next
regeneration. Note that update.py derives the PetaLinux port-config
overlay name from the populated ports: lanes=["0"] selects
bsp/ports-versal-0/, lanes=["0","1"] selects bsp/ports-versal-01/.
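The overlay-name derivation amounts to concatenating the port indices. The sketch below illustrates that rule and is not lifted from update.py. `port_overlay_name` is a hypothetical helper, and sorting the ports in ascending order is an assumption.

```python
def port_overlay_name(lanes: list[str]) -> str:
    """Map a design's populated QSFP28 ports to its port-config overlay name.

    ["0"] -> "ports-versal-0", ["0", "1"] -> "ports-versal-01".
    Assumption: indices are concatenated in ascending numeric order.
    """
    if not lanes:
        raise ValueError("a design needs at least one populated port")
    return "ports-versal-" + "".join(sorted(lanes, key=int))


def overlay_dir(lanes: list[str]) -> str:
    """Directory holding the overlay's project-spec tree."""
    return f"PetaLinux/bsp/{port_overlay_name(lanes)}/"
```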
Make-driven build flow
There are three Makefiles in the repository, each scoped to a stage of the build:
| Makefile | Scope |
|---|---|
| Makefile | Top-level orchestration; assembles boot-image zips for one or all targets. |
| Vivado/Makefile | Creates the Vivado project, runs synthesis and implementation, exports the XSA. |
| PetaLinux/Makefile | Creates the PetaLinux project from the XSA, applies BSP overlays, builds, packages. |
A make bootimage TARGET=<t> invocation at the top level cascades:
make bootimage TARGET=t
  -> ensures PetaLinux build output exists
     PetaLinux/Makefile petalinux TARGET=t
       -> ensures Vivado XSA exists
          Vivado/Makefile xsa TARGET=t
            -> vivado -mode batch -source scripts/build.tcl  (creates project + block design)
            -> vivado -mode batch -source scripts/xsa.tcl    (synth, impl, device image, XSA export)
       -> petalinux-create --template versal --name t
       -> petalinux-config --get-hw-description <XSA>
       -> copy bsp/<board>/project-spec/* into the project
       -> copy bsp/<port-config>/project-spec/* into the project (overlay)
       -> petalinux-config --silentconfig
       -> petalinux-build
       -> petalinux-package boot --plm --psmfw --u-boot --dtb
  -> zip the resulting boot files into bootimages/
The dependency chain means a clean make bootimage TARGET=t from
scratch will perform every step in order. Re-running after an
intermediate step has succeeded picks up where the previous run left
off. Per-target lock files (.<target>.lock) prevent two concurrent
builds of the same target from clobbering each other.
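The resume-and-lock behaviour can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the Makefiles' logic. `pending_steps` checks only whether each stage's output exists; Make also compares timestamps. `target_lock` shows one race-free way to hold a `.<target>.lock` file. Both names, and the step/output pairs in the usage, are hypothetical.

```python
import os
from contextlib import contextmanager
from pathlib import Path


def pending_steps(steps: list[tuple[str, Path]]) -> list[str]:
    """Given (step name, output that proves it finished) in cascade order,
    return every step from the first missing output onwards."""
    for i, (_, output) in enumerate(steps):
        if not output.exists():
            return [name for name, _ in steps[i:]]
    return []


@contextmanager
def target_lock(lock_dir: Path, target: str):
    """Hold .<target>.lock for the duration of a build.

    O_CREAT | O_EXCL makes creation atomic, so two concurrent builds of the
    same target cannot both acquire it. A build that fails to acquire the
    lock raises without touching the holder's lock file.
    """
    lock = lock_dir / f".{target}.lock"
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"{target} is already being built ({lock} exists)") from None
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        yield lock
    finally:
        lock.unlink(missing_ok=True)   # release even if the build step raised
```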
Tip
make project TARGET=<t> (in Vivado/) creates the block design and
runs validate_bd_design without synthesis — use it to catch
block-design wiring errors fast before committing to the long XSA build.
Vivado side
Block design
There is one block-design TCL, Vivado/src/bd/bd_versal.tcl. It is
parameterised by the ports list (passed in from build.tcl’s
target_dict), and builds the design as follows:
- CIPS + NoC + DDR. The Versal CIPS is added and configured by the xilinx.com:bd_rule:cips automation (DDR branch, one DDR memory controller), then extended with a per-board PS_PMC_CONFIG (M_AXI_LPD enabled, 16 PL→PS interrupts, two PL clocks).
- Clocking. A clock wizard generates the 100 MHz system clock (used by all AXI-Lite control and by the MCDMA/NoC datapath), and a second clock wizard generates the 390.625 MHz MRMAC AXIS client clock.
- Per-port GT quad. For each port, a gt_quad_base (GTY) is configured for four lanes and given its own reference clock (gt_ref_clk_0 = GBTCLK0 for port 0, gt_ref_clk_1 = GBTCLK1 for port 1) and an APB3 bridge for its DRP.
- Per-port MRMAC subsystem. The create_qsfp_port proc (called once per port) builds the MRMAC, its per-lane GT user-clock buffers, the MRMAC client AXIS adapter, the width-converter/CDC-FIFO datapath, the AXI MCDMA, the AXI-Lite control SmartConnect, the GT-control GPIO, the QSFP sideband GPIO, the per-port module-management IIC, and the user LEDs.
- Structural scaling. The NoC slave-port count, the control SmartConnect master count, the interrupt list, and the shared Si5328 IIC are sized from the number of ports, so the same script builds both single-port and two-port designs.
After sourcing the BD script, build.tcl runs validate_bd_design -force, which triggers parameter propagation and connection automation.
To see the netlist as actually built, inspect the saved .bd under
Vivado/<target>/<target>.gen/sources_1/bd/qsfp/ or use write_bd_tcl.
build.tcl checks XILINX_VIVADO against the version_required constant
(2025.2) and refuses to build with a different Vivado version — the BD
TCL APIs are not stable across major releases.
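An equivalent guard is easy to write outside Tcl. The Python sketch below assumes the release number (for example 2025.2) appears as one component of the `XILINX_VIVADO` install path. This holds for both `/tools/Xilinx/Vivado/2025.2` and `/tools/Xilinx/2025.2/Vivado` layouts. build.tcl's actual check may work differently, and both function names are hypothetical.

```python
import os
import re
from typing import Optional

VERSION_REQUIRED = "2025.2"  # mirrors build.tcl's version_required constant


def vivado_version_from_path(path: str) -> Optional[str]:
    """Return the last YYYY.N component of an install path, if any."""
    for part in reversed(re.split(r"[\\/]", path)):
        if re.fullmatch(r"\d{4}\.\d+", part):
            return part
    return None


def check_vivado(env=None) -> str:
    """Raise unless XILINX_VIVADO points at the required release."""
    env = os.environ if env is None else env
    path = env.get("XILINX_VIVADO")
    if not path:
        raise RuntimeError("XILINX_VIVADO is not set; source the Vivado settings script first")
    found = vivado_version_from_path(path)
    if found != VERSION_REQUIRED:
        raise RuntimeError(f"Vivado {VERSION_REQUIRED} required, found {found or 'unknown'} at {path}")
    return found
```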
The MRMAC datapath, in detail
These are the design choices that are specific to driving a QSFP28 port as 1x100GbE CAUI-4. If you modify the block design, these are the parts most likely to need care.
GT Quad configuration
The GT quad uses PRESET None and specifies the full PROT0 field set
manually: GTY, four lanes at 25.78125 Gb/s, LCPLL integer-N,
322.265625 MHz reference clock, 80-bit RAW datapath. The MRMAC requires
an 80-bit RAW GT datapath that no named Ethernet preset provides, which
is why PRESET None is used and every field is set explicitly. The
field set is merged onto the 2025.2 IP’s default LR0 dictionary, applying
only field names that exist in this IP version (so the configuration
stays robust if the IP’s field set changes between releases).
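The rates above are self-consistent, and exact fractions make this easy to check. The lane rate is exactly 80 times the 322.265625 MHz reference clock, and four lanes of 64b/66b-coded data carry exactly 100 Gb/s. `merge_known_fields` sketches the "apply only fields this IP version knows" step in Python. It is not the Tcl used by the build, and the field names in the usage are illustrative placeholders, not the real PROT0 field list.

```python
from fractions import Fraction

LANE_RATE_GBPS = Fraction("25.78125")  # GTY line rate per lane
REFCLK_MHZ = Fraction("322.265625")    # GT reference clock
LANES = 4                              # CAUI-4


def pll_ratio(lane_rate_gbps=LANE_RATE_GBPS, refclk_mhz=REFCLK_MHZ) -> Fraction:
    """Lane rate (converted to Mb/s) divided by the reference clock in MHz."""
    return lane_rate_gbps * 1000 / refclk_mhz


def payload_rate_gbps(lanes=LANES, lane_rate_gbps=LANE_RATE_GBPS) -> Fraction:
    """Aggregate serial rate with the 64b/66b line-coding overhead removed."""
    return lanes * lane_rate_gbps * Fraction(64, 66)


def merge_known_fields(defaults: dict, wanted: dict):
    """Overlay `wanted` onto a copy of `defaults`, skipping unknown field names.

    Returns (merged settings, list of skipped names).
    """
    merged = dict(defaults)
    skipped = []
    for name, value in wanted.items():
        if name in defaults:
            merged[name] = value
        else:
            skipped.append(name)  # not present in this IP version's default dictionary
    return merged, skipped
```

Here `pll_ratio()` evaluates to exactly 80 and `payload_rate_gbps()` to exactly 100.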
Per-lane CAUI-4 user clocking
CAUI-4 bonds four lanes and requires them to align, so each lane’s recovered RX clock must drive that lane’s MRMAC serdes/core clock:
- RX: each of the four GT lanes gets its own pair of BUFG_GT buffers: a full-rate usrclk and a half-rate (/2) usrclk2. The MRMAC rx_serdes_clk/rx_core_clk buses take the per-lane full-rate clocks; rx_alt_serdes_clk takes the per-lane half-rate clocks; the GT chN_rxusrclk inputs take the per-lane half-rate clocks.
- TX: all four lanes share the TX PLL, so a single ch0 pair drives all four TX lanes (tx_core_clk = ch0 full-rate ×4; tx_alt_serdes_clk and the GT chN_txusrclk inputs = ch0 half-rate).
Warning
Driving the MRMAC RX serdes/core clocks from ch0 alone (broadcasting
one lane’s recovered clock to all four) leaves lanes 1–3 sampled in the
wrong clock domain — those PCS lanes never block-lock and 100G alignment
never completes, even with a passive loopback. The per-lane clocking
above is mandatory for CAUI-4. (The AXIS client clocks
tx_axi_clk/rx_axi_clk are a separate, single 390.625 MHz domain — do
not confuse the two clock buses.)
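The clocking rules can be written as a connection map and checked mechanically. The Python sketch below encodes the RX/TX rules described above and flags the ch0-broadcast mistake from the warning. Source labels such as `rx_full[n]` are hypothetical names for the BUFG_GT outputs, and modelling every bus as four bits wide is an assumption.

```python
LANES = 4


def mrmac_clock_map() -> dict[str, str]:
    """Return {clock pin: BUFG_GT source} for one CAUI-4 port.

    rx_full[n] / rx_half[n]: lane n's recovered-clock BUFG_GT pair (hypothetical labels)
    tx_full[0] / tx_half[0]: the single ch0 TX pair shared by all lanes
    """
    m = {}
    for n in range(LANES):
        # RX: each lane uses its own recovered clock.
        m[f"rx_serdes_clk[{n}]"] = f"rx_full[{n}]"
        m[f"rx_core_clk[{n}]"] = f"rx_full[{n}]"
        m[f"rx_alt_serdes_clk[{n}]"] = f"rx_half[{n}]"
        m[f"ch{n}_rxusrclk"] = f"rx_half[{n}]"
        # TX: all lanes share the TX PLL, so ch0's pair is broadcast.
        m[f"tx_core_clk[{n}]"] = "tx_full[0]"
        m[f"tx_alt_serdes_clk[{n}]"] = "tx_half[0]"
        m[f"ch{n}_txusrclk"] = "tx_half[0]"
    return m


def rx_broadcast_errors(m: dict[str, str]) -> list[str]:
    """List RX pins driven by another lane's clock (the failure in the warning)."""
    bad = []
    for n in range(LANES):
        for pin in (f"rx_serdes_clk[{n}]", f"rx_core_clk[{n}]",
                    f"rx_alt_serdes_clk[{n}]", f"ch{n}_rxusrclk"):
            if not m[pin].endswith(f"[{n}]"):
                bad.append(pin)
    return bad
```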
MRMAC client AXIS adapter
The MRMAC 100G “Independent 384b Non-Segmented” client is not a
standard AXI4-Stream bus. In the block design its axis_rx_port0 /
axis_tx_port0 interfaces are handshake-only (they map only
TVALID/TLAST/TREADY, so IP integrator reports TDATA_NUM_BYTES=0). The
384-bit data actually rides on six separate 64-bit lane ports
(rx/tx_axis_tdata0..5) plus six per-lane tkeep_user0..5[10:0]
control words. Feeding that handshake-only interface straight into a
stock axis_dwidth_converter mis-delineates frames (one packet per
384-bit beat — frames arrive fragmented into ~48-byte pieces).
Vivado/src/hdl/mrmac_axis_adapter.v provides two purely-combinational
adapters (mrmac_rx_axis_adapter, mrmac_tx_axis_adapter) that
pack/unpack the six 64-bit lanes into a single standard 384-bit AXIS
stream (tdata[383:0], tkeep[47:0], tlast, tvalid), so the
downstream width-converter / CDC-FIFO / MCDMA chain delineates frames
correctly (one TLAST per Ethernet frame). build.tcl adds
src/hdl/*.v to the project before sourcing the BD so
create_bd_cell -type module -reference can resolve the modules.
Note
This packing is specific to 1x100GbE CAUI-4, which bonds all six client lanes into one frame. A design running independent 10G/25G ports (one 64-bit lane straight into its own MCDMA per port) would not need it.
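A bit-level Python model helps when reasoning about, or writing a testbench for, the packing. It is not a translation of mrmac_axis_adapter.v. Two details are assumptions to verify against the Verilog and the MRMAC documentation: lane N occupies tdata[64N+63:64N], and bits [7:0] of each tkeep_userN are that lane's byte enables. The remaining tkeep_user bits are ignored here. `frames_from_beats` shows why a correct TLAST matters downstream: a frame closes only on TLAST.

```python
LANES = 6                   # six 64-bit client lanes
LANE_BITS = 64
LANE_BYTES = LANE_BITS // 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack_rx_beat(lane_data, lane_keep_user, tlast):
    """RX direction: six 64-bit lanes -> one (tdata[383:0], tkeep[47:0], tlast) beat."""
    if len(lane_data) != LANES or len(lane_keep_user) != LANES:
        raise ValueError("expected exactly six lanes")
    tdata = 0
    tkeep = 0
    for n in range(LANES):
        tdata |= (lane_data[n] & LANE_MASK) << (LANE_BITS * n)
        tkeep |= (lane_keep_user[n] & 0xFF) << (LANE_BYTES * n)  # low 8 bits only (assumption)
    return tdata, tkeep, bool(tlast)


def unpack_tx_beat(tdata, tkeep):
    """TX direction: one 384-bit beat -> six lanes and six 8-bit byte-enable fields."""
    lane_data = [(tdata >> (LANE_BITS * n)) & LANE_MASK for n in range(LANES)]
    lane_keep = [(tkeep >> (LANE_BYTES * n)) & 0xFF for n in range(LANES)]
    return lane_data, lane_keep


def frames_from_beats(beats):
    """Reassemble frames from (tdata, tkeep, tlast) beats; a frame ends only on TLAST."""
    frames, current = [], bytearray()
    for tdata, tkeep, tlast in beats:
        raw = tdata.to_bytes(LANES * LANE_BYTES, "little")  # byte i = tdata[8i+7:8i]
        current += bytes(b for i, b in enumerate(raw) if (tkeep >> i) & 1)
        if tlast:
            frames.append(bytes(current))
            current = bytearray()
    return frames
```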
Width conversion, CDC and MCDMA
The MRMAC client runs at 390.625 MHz / 384-bit; the MCDMA and NoC run at
the 100 MHz system clock / 512-bit. Each direction has an
axis_dwidth_converter (384 ↔ 512 bit) and an asynchronous
axis_data_fifo for the clock-domain crossing. The AXI MCDMA
(c_num_mm2s_channels/c_num_s2mm_channels = 1, 512-bit, 64-bit
addressing) moves packet data to/from DDR over three NoC AXI master
ports (scatter-gather, MM2S, S2MM).
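A worked check of the raw bus bandwidth on each side of the CDC FIFO follows directly from the clocks and widths above. It assumes one full-width beat per cycle and ignores idle cycles and protocol overhead. The figures are arithmetic, not measurements, and `bus_mbps` is a hypothetical helper.

```python
def bus_mbps(clock_hz: int, width_bits: int) -> int:
    """Raw bus bandwidth in Mb/s, assuming one full-width beat every clock cycle."""
    return clock_hz * width_bits // 1_000_000


CLIENT_MBPS = bus_mbps(390_625_000, 384)  # MRMAC client side: 150 000 Mb/s
DMA_MBPS = bus_mbps(100_000_000, 512)     # MCDMA / NoC side: 51 200 Mb/s
LINE_MBPS = 100_000                       # 100GbE payload rate
```

On these figures the 390.625 MHz × 384-bit client side has headroom over the line rate. The 100 MHz × 512-bit side, which each direction has separately, tops out at 51.2 Gb/s, so the DMA path rather than the MRMAC sets the raw ceiling. Achieved throughput also depends on descriptor handling, NoC/DDR efficiency and software.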
MRMAC placement
Both MRMACs default to MRMAC_LOCATION_C0 = MRMAC_X0Y0, which makes
port 1 fail placement (“bel is occupied”). The proc pins each port’s
MRMAC to the integrated-MAC site in the clock region of its GT quad:
port 0 : GTY_QUAD_X1Y1 (region X9Y1) -> MRMAC_X0Y0
port 1 : GTY_QUAD_X1Y2 (region X9Y2) -> MRMAC_X0Y2
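The "bel is occupied" failure is a uniqueness problem: two ports claim the same MRMAC site. The small Python check below uses the table above. `PLACEMENT` and `site_conflicts` are hypothetical names. The real pinning happens inside the create_qsfp_port proc.

```python
# Port -> (GT quad, clock region, MRMAC site) for vck190_fmcp1, from the table above.
PLACEMENT = {
    0: ("GTY_QUAD_X1Y1", "X9Y1", "MRMAC_X0Y0"),
    1: ("GTY_QUAD_X1Y2", "X9Y2", "MRMAC_X0Y2"),
}
DEFAULT_SITE = "MRMAC_X0Y0"  # what MRMAC_LOCATION_C0 defaults to


def site_conflicts(sites: dict[int, str]) -> dict[str, list[int]]:
    """Return each MRMAC site claimed by more than one port, with its ports."""
    users: dict[str, list[int]] = {}
    for port, site in sorted(sites.items()):
        users.setdefault(site, []).append(port)
    return {site: ports for site, ports in users.items() if len(ports) > 1}
```

With every port left at `DEFAULT_SITE`, both ports land on MRMAC_X0Y0. Pinning each port to its quad's site clears the conflict.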
Address and interrupt maps
The control peripherals are mapped from M_AXI_LPD:
| Peripheral | Port 0 | Port 1 |
|---|---|---|
| MRMAC | | |
| QSFP sideband GPIO | | |
| GT-control GPIO | | |
| AXI MCDMA | | |
| QSFP module IIC | | |
| Si5328 clock IIC (shared) | | |
Interrupts are connected to pl_ps_irq0..6 (SPI = 84 + index):
| pl_ps_irq | SPI | Source |
|---|---|---|
| 0 | 84 | Port 0 MCDMA |
| 1 | 85 | Port 0 MCDMA |
| 2 | 86 | Port 0 QSFP module IIC |
| 3 | 87 | Port 1 MCDMA |
| 4 | 88 | Port 1 MCDMA |
| 5 | 89 | Port 1 QSFP module IIC |
| 6 | 90 | Si5328 clock IIC |
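Because the interrupt list is sized from the number of ports, the map can be generated rather than hand-maintained. This Python sketch reproduces the two-port table above, with SPI = 84 + index. It assumes each port contributes its two MCDMA entries and then its QSFP module IIC, followed by the shared Si5328 IIC. Whether the single-port build orders them the same way is an assumption, and `interrupt_map` is a hypothetical helper.

```python
BASE_SPI = 84  # pl_ps_irq0 -> SPI 84


def interrupt_map(num_ports: int) -> list[tuple[int, int, str]]:
    """Return (pl_ps_irq index, SPI, source) rows for num_ports QSFP28 ports."""
    sources = []
    for port in range(num_ports):
        sources += [f"Port {port} MCDMA", f"Port {port} MCDMA", f"Port {port} QSFP module IIC"]
    sources.append("Si5328 clock IIC")  # shared, always last
    return [(i, BASE_SPI + i, src) for i, src in enumerate(sources)]
```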
Constraints
Vivado/src/constraints/<target>.xdc contains the pin assignments. For
vck190_fmcp1 it covers both FMC slots: the eight GTY lanes (DP0–3 for
port 0, DP4–7 for port 1), the two GT reference clocks (GBTCLK0/GBTCLK1),
the three IIC buses (shared Si5328 on LA02, QSFP0 on LA03, QSFP1 on
LA17_CC), and the per-slot QSFP module sideband I/O and user LEDs.
Modifying the block design
Edit Vivado/src/bd/bd_versal.tcl. Most per-port logic lives in the
create_qsfp_port proc, which is called once per entry in ports;
structural counts (NoC slave ports, control SmartConnect masters,
interrupts) are derived from the number of ports, so adding or removing a
port is largely a matter of changing the lanes list in data.json.
After editing, delete the existing project directory and rebuild:
rm -rf Vivado/<target>
cd Vivado
make xsa TARGET=<target>
PetaLinux side
BSP composition
The PetaLinux project is composed at build time from two BSP fragments copied into the target’s project directory:
- A board BSP at PetaLinux/bsp/vck190/, which provides the board kernel and U-Boot configuration, system-user.dtsi (which includes port-config.dtsi), the kernel patches, and the rootfs configuration.
- A port-config overlay at PetaLinux/bsp/ports-versal-<ports>/, which provides port-config.dtsi: the device-tree fragment that wires up the MRMAC, MCDMA, IIC, GPIO and Si5328 nodes for the ports active on this target. ports-versal-0 is port 0 only; ports-versal-01 is both ports.
The mapping from target to (board BSP, port-config overlay) is encoded
in PetaLinux/Makefile’s UPDATER block, for example:
vck190_fmcp1_target := versal 0 0 ports-versal-01
The first column is the PetaLinux template (versal); the last is the
port-config overlay name. The board BSP is derived from the first token
of the target name (vck190). At build time both project-spec/ trees
are copied in, with the port-config overlay copied after the board BSP.
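The target line can be decoded as shown in the following Python sketch. It is not PetaLinux/Makefile's own parsing, and the helper names are hypothetical. The two middle columns ("0 0" in the example) are not described in this section, so the sketch leaves them uninterpreted. `copy_order` lists the project-spec trees in the order they are copied, so files from the overlay overwrite same-named files from the board BSP.

```python
import re
from typing import NamedTuple

LINE_RE = re.compile(r"^(?P<target>\w+)_target\s*:=\s*(?P<fields>.+?)\s*$")


class PetalinuxTarget(NamedTuple):
    target: str        # e.g. "vck190_fmcp1"
    template: str      # first column, e.g. "versal"
    middle: list       # columns between template and overlay (left uninterpreted)
    overlay: str       # last column, e.g. "ports-versal-01"
    board: str         # first token of the target name, e.g. "vck190"


def parse_target_line(line: str) -> PetalinuxTarget:
    m = LINE_RE.match(line)
    if not m:
        raise ValueError(f"not a <target>_target := ... line: {line!r}")
    fields = m.group("fields").split()
    if len(fields) < 2:
        raise ValueError("expected at least a template and an overlay column")
    target = m.group("target")
    return PetalinuxTarget(target, fields[0], fields[1:-1], fields[-1], target.split("_", 1)[0])


def copy_order(t: PetalinuxTarget) -> list[str]:
    """project-spec trees in copy order; later copies overwrite earlier files."""
    return [f"PetaLinux/bsp/{t.board}/project-spec", f"PetaLinux/bsp/{t.overlay}/project-spec"]
```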
The port-config.dtsi overlay
This is the device-tree fragment that makes the MRMAC ports work. Per
port it sets, on the SDT-generated mrmac@… node:
- axistream-connected, pointing to the port's MCDMA node, plus the MCDMA channel interrupts (mm2s_ch1_introut/s2mm_ch1_introut) and their GIC interrupts. The xilinx_axienet MCDMA probe looks the interrupts up by name on the MRMAC node, so both interrupt-names and the matching interrupt-parent/interrupts must be present here.
- local-mac-address, xlnx,channel-ids, xlnx,num-queues, xlnx,addrwidth.
- max-speed = <100000> and xlnx,mrmac-rate = <100000>. The driver reads max-speed first; without it, the auto-generated pl.dtsi max-speed = <25000> (the per-lane GT rate) wins and the port comes up at 25G single-lane instead of 100G.
- gt-ctrl-gpios, gt-tx-dpath-gpios, gt-rx-dpath-gpios, gt-ctrl-rate-gpios, gt-tx-rst-done-gpios, gt-rx-rst-done-gpios (all on the port's GT-control AXI GPIO) and xlnx,gtlane = <0>. These let the driver reset the GT and poll reset-done (see below).
It also overrides each MCDMA node’s compatible to "xlnx,eth-dma", and
disables the three phantom mrmac_1/_2/_3 nodes the SDT emits per MRMAC.
Finally it instantiates the Si5328 (see below).
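The max-speed pitfall comes down to property precedence, which a toy model makes concrete. In this Python sketch, later device-tree fragments override earlier ones property by property, and max-speed is consulted before xlnx,mrmac-rate. Using xlnx,mrmac-rate as a fallback when max-speed is absent is an assumption for illustration, not a reading of the xilinx_axienet source. Both function names are hypothetical.

```python
def merge_fragments(*fragments: dict) -> dict:
    """Later fragments override earlier ones, one property at a time."""
    merged = {}
    for frag in fragments:
        merged.update(frag)
    return merged


def effective_speed_mbps(props: dict) -> int:
    """max-speed wins if present; xlnx,mrmac-rate is an assumed fallback."""
    if "max-speed" in props:
        return props["max-speed"]
    return props.get("xlnx,mrmac-rate", 0)


PL_DTSI = {"max-speed": 25000}  # auto-generated value: the per-lane GT rate
```

An overlay that sets only xlnx,mrmac-rate leaves the merged node at 25000. Setting max-speed = <100000> as well yields 100000.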
Modifications layered on the stock BSP
The board BSP started as the stock AMD VCK190 reference BSP. This list is the answer to “what would I lose if I overwrote the BSP with the stock one?”
- AXI Ethernet + MCDMA driver. Kernel configs enable the xilinx_axienet driver with MCDMA support (CONFIG_XILINX_AXI_EMAC, CONFIG_AXIENET_HAS_MCDMA, CONFIG_GPIO_XILINX, CONFIG_I2C_XILINX). The MRMAC binds to xilinx_axienet, not to phylink/SFP as the Quad SFP28 FMC design does.
- MCDMA compatible override (device tree). xilinx_axienet ioremaps the MCDMA registers itself, but the standalone xilinx_dma dmaengine driver also matches the MCDMA node and claims the region first, so axienet's probe fails with -EBUSY. Because CONFIG_XILINX_AXI_EMAC depends on XILINX_DMA, the dmaengine driver cannot simply be disabled. The fix is the compatible = "xlnx,eth-dma" override in port-config.dtsi: xilinx_dma's of-match table does not bind eth-dma, and axienet finds the MCDMA via the axistream-connected phandle (not by compatible), so the region is left for axienet.
- GT-control GPIO binding. The xilinx_axienet MRMAC driver needs to reset the GT and read reset-done/PLL status from the PS. The block design exposes a dual-channel AXI GPIO per port (axi_gpio_gt*): five outputs (gt_reset_all, gt_reset_tx_datapath, gt_reset_rx_datapath, plus two spare gt-ctrl-rate lines) and two inputs (gt_tx/rx_reset_done). The gt-*-gpios properties in port-config.dtsi point the driver at these GPIO lines. Without them, the driver fails with "unable to get GT PLL resource".
- Si5328 clock generator (device tree). port-config.dtsi instantiates a silabs,si5328 clock-generator@68 node on the shared clock IIC bus, with a fixed-clock 114.285 MHz crystal input and a clk0@0 output programmed to 322.265625 MHz (the 100G GT reference clock). The Linux clk-si5324 driver programs the device from this node on probe.
- Si5328 CKOUT2 kernel patch. For the two-port design, port 1's GT reference clock is GBTCLK1, which the FMC routes from the Si5328's CKOUT2 output. The stock Xilinx clk-si5324 driver hard-disables CKOUT2 and only programs CKOUT1's divider, so port 1's GT never gets a reference clock and fails with "GT TX Reset Done not achieved". The kernel patch recipes-kernel/linux/linux-xlnx/0001-clk-si5324-enable-ckout2-for-2x-qsfp28-fmc.patch enables CKOUT2 (sets the dual-LVDS output format and clears the CKOUT2 disable bit) and mirrors CKOUT1's divider to CKOUT2 (both outputs share the same PLL, so they run at the same frequency). It is registered via SRC_URI:append in recipes-kernel/linux/linux-xlnx_%.bbappend.
- Loopback self-test app. The mrmac-loopback-test recipe (recipes-apps/mrmac-loopback-test/) installs the self-test script described in petalinux; it is force-installed via IMAGE_INSTALL:append in meta-user/conf/petalinuxbsp.conf.
- Root filesystem additions. ethtool and iperf3.
Adding a kernel config option, patch, package or device-tree node
The mechanisms are the standard PetaLinux ones:
- Kernel config: append CONFIG_<name>=y to bsp/vck190/project-spec/meta-user/recipes-kernel/linux/linux-xlnx/bsp.cfg.
- Kernel patch: drop the .patch into recipes-kernel/linux/linux-xlnx/ and add a SRC_URI:append line to linux-xlnx_%.bbappend. Force a re-patch with petalinux-build -c kernel -x cleansstate && petalinux-build.
- Rootfs package: add CONFIG_<package>=y to configs/rootfs_config (and declare it in meta-user/conf/user-rootfsconfig if it is not in the default menu).
- Per-board device tree: edit meta-user/recipes-bsp/device-tree/files/system-user.dtsi.
- Per-port device tree: edit the bsp/ports-versal-<ports>/…/port-config.dtsi overlay.
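The kernel-config step is easy to script so that reruns are harmless. This is a minimal, idempotent Python sketch and not part of the repository's tooling. `enable_kernel_option` is a hypothetical helper, and `BSP_CFG` is the path given above, assumed to be relative to PetaLinux/. An existing `# CONFIG_<name> is not set` or `CONFIG_<name>=m` line is rewritten to `=y` rather than duplicated.

```python
from pathlib import Path

BSP_CFG = Path("bsp/vck190/project-spec/meta-user/recipes-kernel/linux/linux-xlnx/bsp.cfg")


def enable_kernel_option(cfg: Path, name: str) -> bool:
    """Ensure CONFIG_<name>=y is present in a kernel config fragment.

    Returns True if the file was changed, False if it already had the line.
    """
    key = name if name.startswith("CONFIG_") else f"CONFIG_{name}"
    want = f"{key}=y"
    lines = cfg.read_text().splitlines() if cfg.exists() else []
    out, found = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(f"{key}=") or stripped == f"# {key} is not set":
            out.append(want)  # normalise =m / =n / "is not set" to =y
            found = True
        else:
            out.append(line)
    if not found:
        out.append(want)
    if out == lines:
        return False
    cfg.write_text("\n".join(out) + "\n")
    return True
```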
Tip
After a structurally-changed XSA (new peripherals/addresses), a
petalinux-config --get-hw-description on an existing project keeps the
stale SDT (“workspace already set up, leaving as-is”). Remove
<target>/components/plnx_workspace before re-importing to force a fresh
SDT, then rebuild (this reuses the sstate cache, so it is incremental).
Where build outputs land
| Path | Contents |
|---|---|
| Vivado/<target>/ | Vivado project. |
| | Device image / bitstream. |
| | Per-target Vivado build logs (xpr + xsa). |
| PetaLinux/<target>/ | PetaLinux project. All Yocto build state lives here. |
| | |
| bootimages/ | Per-target zipped boot files. |
None of these directories are committed to the repository.