ci: published arm64 image runs without file capabilities (rly.best deploy crashes) #54
Summary
The arm64 image built and pushed by the new Kaniko-based daemonless workflow (#50) is missing the file capabilities that the relay binary needs to configure the kernel WireGuard overlay as a non-root process. The container starts and immediately crash-loops with Operation not permitted (os error 1) from the WireGuard interface configuration step.
This came up trying to deploy v2.1.8+rs.1 (commit ab8f612) onto the production relay at rly.best. The deploy was rolled back; rly.best is back on the prior locally-built image (portal-relay-gofix:169c301-arm64).
Evidence
The Dockerfile already runs setcap in the build stage:
Verified by extracting the binary from each image on the relay VM:
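
One way to run such a check, as a minimal sketch rather than the commands actually used: read the security.capability xattr from a binary that has already been extracted in a way that preserves extended attributes, and decode it. The function names are hypothetical, and the sketch assumes CAP_NET_ADMIN is the capability of interest (the exact setcap arguments are not shown in this issue). The constants follow the vfs_cap_data layout in the Linux UAPI header linux/capability.h.

```python
import os
import struct

# Constants from the Linux UAPI header <linux/capability.h>.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
# Number of (permitted, inheritable) 32-bit pairs stored per revision.
PAIRS_BY_REVISION = {0x01000000: 1, 0x02000000: 2, 0x03000000: 2}
CAP_NET_ADMIN = 12  # bit index; assumed to be the capability the relay needs


def decode_file_caps(raw: bytes) -> dict:
    """Decode a raw security.capability value (little-endian vfs_cap_data)."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    pairs = PAIRS_BY_REVISION.get(revision)
    if pairs is None:
        raise ValueError(f"unknown capability revision {revision:#x}")
    permitted = inheritable = 0
    for i in range(pairs):
        # Each pair holds 32 bits of the permitted and inheritable sets.
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    return {
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "inheritable": inheritable,
    }


def has_net_admin(path: str) -> bool:
    """True if the file carries CAP_NET_ADMIN as permitted with the effective flag."""
    try:
        raw = os.getxattr(path, "security.capability")  # Linux only
    except OSError:
        # No such xattr on the file, or xattrs unsupported on this filesystem.
        return False
    caps = decode_file_caps(raw)
    return caps["effective"] and bool((caps["permitted"] >> CAP_NET_ADMIN) & 1)
```

Calling has_net_admin on the binary extracted from each image should return True for a build that kept the xattr and False for one that lost it.
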
Same Dockerfile line, different result. The two builds differ in:
- The builder: previously docker buildx, now Kaniko via gcr.io/kaniko-project/executor:debug (introduced in #50).
- setcap is applied in the build stage; the binary is then copied with COPY --from=build --chown=65532:65532 into a gcr.io/distroless/cc-debian12:nonroot final stage.
Kaniko has a known class of issues where xattrs (security.capability is one) do not survive COPY --chown across multi-stage builds. The local docker buildx path preserves them.
Impact
The published v2.1.8+rs.1 container image (arm64) is non-functional out of the box. Any user pulling code.rly.best/gofix/portal-tunnel-rs:v2.1.8-rs.1-arm64 and running it with --cap-add NET_ADMIN --user 65532:65532 (the documented setup) will hit the same crash loop. Together with #53 (amd64 not building at all), this means no user of the published v2.1.8+rs.1 release can actually run the relay.
Suggested fixes
1. Run setcap after the COPY in the final stage. Distroless has no setcap, so this would require either a multi-FROM trick or a small intermediate image that holds setcap.
2. [...] libcap2-bin, then RUN setcap ... && rm -rf /var/lib/apt/lists/*.
3. [...] prctl(PR_CAP_AMBIENT_RAISE) after gaining caps from --cap-add. Less invasive but adds a launcher dependency.
4. [...] COPY --chown).
Fix path 1 with a builder image alongside distroless is probably the cleanest:
Running setcap in a stage whose output is COPY'd straight in (no --chown in the offending step, or using a cap-preserving builder for that one COPY) should keep the xattr.
Release status implication
With #53 (amd64 not built) and this issue (arm64 published image broken), v2.1.8+rs.1 is not actually shippable. Recommend:
- Treat v2.1.8+rs.1 as a botched release and re-cut as v2.1.8+rs.2 once both this issue and #53 are resolved.
Reproduction
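
A minimal sketch of the documented setup as a scripted invocation, assuming Docker is available on an arm64 host. The image tag and the --cap-add and --user flags are taken from the Impact section; --rm and the helper names are additions for illustration, and the relay may need further arguments or configuration that this issue does not show.

```python
import subprocess

# Image tag as published, per the Impact section of this issue.
IMAGE = "code.rly.best/gofix/portal-tunnel-rs:v2.1.8-rs.1-arm64"


def repro_command(image: str = IMAGE) -> list[str]:
    """Build the docker invocation for the documented non-root setup."""
    return [
        "docker", "run",
        "--rm",                     # illustrative addition: clean up after exit
        "--cap-add", "NET_ADMIN",   # documented flag
        "--user", "65532:65532",    # documented non-root user
        image,
    ]


def run_repro() -> int:
    """Run the container in the foreground and return its exit code."""
    return subprocess.run(repro_command()).returncode
```
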
Expect a crash loop with Operation not permitted (os error 1) from the WireGuard interface configuration step.
Related: #53 (amd64 not building).