fix: preserve binary file caps through Kaniko multi-stage COPY #55

Merged
gofix merged 1 commit from fix/preserve-file-capabilities into master 2026-05-03 15:48:32 +00:00

Closes #54.

Problem

Kaniko strips the security.capability xattr when it executes COPY --chown in a multi-stage build, so every image the new daemonless workflow pushed has been missing cap_net_admin and cap_net_bind_service on the relay binary. The container then crash-loops at startup with:

initialize wireguard overlay runtime
configure kernel wireguard interface wg-portal on udp port 51820
Operation not permitted (os error 1)

This only surfaced now because the previous deploys to rly.best loaded a locally-built image (BuildKit, which preserves xattrs through COPY --chown) rather than the registry one. The first attempt to deploy the registry-pushed v2.1.8+rs.1-arm64 image hit the crash loop and was rolled back.

Fix

Move setcap out of the build stage and into a new capstamp intermediate stage that only depends on libcap2-bin. The distroless final stage then pulls the cap-stamped binary in via a plain COPY (no --chown):

FROM --platform=$BUILDPLATFORM debian:bookworm-slim AS capstamp
RUN apt-get update \
    && apt-get install -y --no-install-recommends libcap2-bin \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /usr/local/bin/portal-relay /portal-relay
RUN setcap cap_net_admin,cap_net_bind_service=+ep /portal-relay

FROM --platform=$TARGETPLATFORM gcr.io/distroless/cc-debian12:nonroot
COPY --from=build --chown=65532:65532 /portal-certs /portal-certs
COPY --from=capstamp /portal-relay /usr/local/bin/portal-relay

Key choices:

  • The cap-stamped COPY into distroless drops --chown=65532:65532. Linux file caps are uid-independent and distroless nonroot runs the binary as 65532 regardless of file owner. Removing --chown is what sidesteps the Kaniko xattr strip.
  • capstamp runs on BUILDPLATFORM. setcap only writes a filesystem xattr; it never executes the target-arch binary, so the stage does not need QEMU.
  • libcap2-bin is removed from the build stage, since it no longer runs setcap.
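To make the "setcap only writes a filesystem xattr" point concrete, here is a minimal sketch that decodes the raw bytes of a `security.capability` value and checks for the two capabilities the relay needs. It is not code from this repository: `FileCaps`, `decode_file_caps`, and `relay_caps_present` are hypothetical names, and the constants are assumed to match the kernel's `<linux/capability.h>` (`struct vfs_cap_data`, revisions 2 and 3).

```rust
/// Capability bit numbers, assumed to match <linux/capability.h>.
const CAP_NET_BIND_SERVICE: u32 = 10;
const CAP_NET_ADMIN: u32 = 12;

/// Layout constants for the `security.capability` xattr (struct vfs_cap_data).
const VFS_CAP_REVISION_MASK: u32 = 0xFF00_0000;
const VFS_CAP_REVISION_2: u32 = 0x0200_0000;
const VFS_CAP_REVISION_3: u32 = 0x0300_0000;
const VFS_CAP_FLAGS_EFFECTIVE: u32 = 0x0000_0001;

/// Decoded file capability sets (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct FileCaps {
    pub permitted: u64,
    pub inheritable: u64,
    pub effective: bool,
}

/// Reads a little-endian u32 at `offset`, or None if the slice is too short.
fn le32(raw: &[u8], offset: usize) -> Option<u32> {
    let b = raw.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decodes a raw xattr value. Returns None for unknown revisions or bad sizes.
pub fn decode_file_caps(raw: &[u8]) -> Option<FileCaps> {
    let magic = le32(raw, 0)?;
    let expected_len = match magic & VFS_CAP_REVISION_MASK {
        VFS_CAP_REVISION_2 => 20,
        // Revision 3 appends a 4-byte rootid, which this sketch ignores.
        VFS_CAP_REVISION_3 => 24,
        _ => return None,
    };
    if raw.len() != expected_len {
        return None;
    }
    // Each set is stored as two 32-bit halves: data[0] (low) and data[1] (high).
    let permitted = u64::from(le32(raw, 4)?) | (u64::from(le32(raw, 12)?) << 32);
    let inheritable = u64::from(le32(raw, 8)?) | (u64::from(le32(raw, 16)?) << 32);
    Some(FileCaps {
        permitted,
        inheritable,
        effective: (magic & VFS_CAP_FLAGS_EFFECTIVE) != 0,
    })
}

/// True when the xattr grants cap_net_admin and cap_net_bind_service as "ep".
pub fn relay_caps_present(raw: &[u8]) -> bool {
    let wanted = (1u64 << CAP_NET_BIND_SERVICE) | (1u64 << CAP_NET_ADMIN);
    matches!(
        decode_file_caps(raw),
        Some(caps) if caps.effective && (caps.permitted & wanted) == wanted
    )
}
```

Nothing in this decoding touches the ELF contents of the binary, which is why the capstamp stage can run on the build platform. A binary whose xattr was stripped would have no value to decode, so a check like `relay_caps_present` would report false for it.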

Validation

  • cargo fmt --check: clean.

  • cargo test --locked -p portal-relay: 86 passed, 3 ignored.

  • cargo clippy --locked --all-targets -- -D warnings: clean.

  • Local BuildKit build of an equivalent multi-stage Dockerfile, followed by extracting /usr/local/bin/portal-relay from the final image with sudo docker cp (sudo so that xattrs are preserved on the host filesystem) and listing the file capabilities of the extracted copy:

    /tmp/cap-verify-bin-asroot cap_net_bind_service,cap_net_admin=ep
    

    i.e. caps survive the multi-stage plain COPY into distroless.
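The manual check above could be turned into an automated guard. The following is a rough sketch, not part of this PR: it parses one line in the format shown above and reports whether every required capability is granted with both the effective and permitted flags. `line_grants` is a hypothetical helper; it assumes a single capability clause per line, and it also accepts the `path = caps+ep` form that older libcap releases are believed to print.

```rust
/// Returns true when a capability listing line grants every name in `wanted`
/// with both the effective (e) and permitted (p) flags.
///
/// Assumption: the capability clause is the last whitespace-separated token
/// and has the shape `name[,name...]=flags` or `name[,name...]+flags`.
pub fn line_grants(line: &str, wanted: &[&str]) -> bool {
    let clause = match line.split_whitespace().last() {
        Some(token) => token,
        None => return false, // empty line: nothing is granted
    };

    // Split the clause into capability names and flag letters.
    let split_at = match clause.rfind(|c: char| c == '=' || c == '+') {
        Some(index) => index,
        None => return false, // no operator, so no capability clause
    };
    let names: Vec<&str> = clause[..split_at].split(',').collect();
    let flags = &clause[split_at + 1..];

    let flags_ok = flags.contains('e') && flags.contains('p');
    flags_ok && wanted.iter().all(|cap| names.contains(cap))
}
```

Called with the output line above and `&["cap_net_admin", "cap_net_bind_service"]`, the helper returns true. For a line that lacks either capability, or for an empty line, it returns false.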

This PR does not address #53 (the amd64 build is still failing in CI). After this PR is merged, we will deploy the cap-fixed arm64 image to rly.best as a stopgap, then continue with #53 to unblock the proper multi-arch v2.1.8+rs.1 release.

fix: preserve binary file caps through Kaniko multi-stage COPY
All checks were successful
Rust CI / Format, lint, and test (pull_request) Successful in 1m16s
0174584a0f
Kaniko strips the `security.capability` xattr when applying `COPY --chown`
in a multi-stage build, so the published distroless image was shipping the
relay binary without `cap_net_admin` or `cap_net_bind_service`. The
container then crash-loops at startup when running as the non-root user
because configuring the kernel WireGuard overlay returns
`Operation not permitted (os error 1)`.

Move the `setcap` call into a separate `capstamp` intermediate stage that
has `libcap2-bin`, then bring the cap-stamped binary into the distroless
final stage with a plain `COPY` (no `--chown`). The plain COPY preserves
xattrs in both BuildKit and Kaniko; chown is no longer needed because
Linux file caps are uid-independent and distroless runs the binary at uid
65532 regardless of file ownership.

The capstamp stage runs on BUILDPLATFORM since setcap only writes xattrs
and does not need to execute the target-arch binary.

Verified locally with BuildKit: extracting the binary from the final
image with sudo (to preserve xattrs on the host fs) shows
`cap_net_bind_service,cap_net_admin=ep` as expected.

Closes #54.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gofix merged commit 7b7679f7ed into master 2026-05-03 15:48:32 +00:00
Reference
gofix/portal-tunnel-rs!55