ci: amd64 container image build fails deterministically (release v2.1.8+rs.1 blocked) #53
Summary
The `Build and publish container image` workflow consistently fails on the `linux/amd64` matrix job. The `linux/arm64` job succeeds with the same code, Dockerfile, build args, and registry auth. As a result, multi-arch manifest publishing is skipped and tagged releases (e.g., `v2.1.8+rs.1`) ship with only the arm64 image in the registry; pulling `:<tag>` (no arch suffix) fails.

This blocks the first stable Rust port release.
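To illustrate why a single failed arm leaves the untagged-arch reference unpullable, here is a minimal sketch of assembling an OCI image index from per-arch manifests. It is not the workflow's actual publish step (which this issue does not show); the function name `build_image_index` and the digest and size values in the usage note are hypothetical. Only the index field names come from the OCI image format.

```python
# Sketch only: shows that a multi-arch index needs a manifest descriptor
# for every architecture, so one failed matrix arm blocks the whole index.

REQUIRED_ARCHES = ("amd64", "arm64")  # the two matrix arms named in this issue


def build_image_index(per_arch):
    """Assemble an OCI image index from per-arch manifest descriptors.

    per_arch maps an architecture name to a (digest, size_in_bytes) tuple.
    Raises ValueError if any required architecture is missing.
    """
    missing = [arch for arch in REQUIRED_ARCHES if arch not in per_arch]
    if missing:
        raise ValueError(
            "cannot publish multi-arch index, missing: " + ", ".join(missing)
        )
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.index.v1+json",
        "manifests": [
            {
                "mediaType": "application/vnd.oci.image.manifest.v1+json",
                "digest": per_arch[arch][0],
                "size": per_arch[arch][1],
                "platform": {"architecture": arch, "os": "linux"},
            }
            for arch in REQUIRED_ARCHES
        ],
    }
```

Calling `build_image_index({"arm64": ("sha256:" + "b" * 64, 1234)})` with placeholder values raises `ValueError`, which mirrors the current state: the arm64 manifest exists, but no index can be assembled without amd64.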
Evidence
Last two workflow runs for tag `v2.1.8+rs.1` (commit `ab8f612`):

The second attempt fails ~3× faster than the first, which strongly suggests a deterministic Kaniko-step failure being short-circuited by warm caches rather than a flaky runner.
Full container-image workflow history on this repo:
The daemonless Kaniko workflow introduced in #50 was validated only against `linux/arm64` with `--no-push`; the `linux/amd64` matrix arm has never produced a successful image.

What is in the registry now
- `code.rly.best/gofix/portal-tunnel-rs:v2.1.8-rs.1-arm64` ✓ published
- `code.rly.best/gofix/portal-tunnel-rs:v2.1.8-rs.1` ✗ multi-arch index missing (publish job skipped)
- `code.rly.best/gofix/portal-tunnel-rs:latest` ✗ not updated

The git tag `v2.1.8+rs.1` and the corresponding Forgejo release entry exist but reference a release whose container image is only partially shipped.

Steps that succeed before the failure
From the run page state of run #40, job `Build linux/amd64 image`:

So: tag-to-image-tag conversion (`+` → `-`), `rs.N` extraction from the ref, and registry auth are all working. The failure is inside the Kaniko executor invocation itself.

Why this issue does not include a log excerpt
Forgejo's job log endpoints return `task with job_id N and attempt 0: resource does not exist` for both run #39 and run #40 amd64 jobs. The runner appears to either drop logs on failed cleanup or never upload them in this configuration. The UI may still show partial output that the API does not expose.

Suspected causes (none confirmed without logs)
- … `-C lto=...`, `aws-lc-sys`, `ring` C deps are memory-hungry).
- … `portal-cargo-target-amd64` has been used by every previous failed run; the cache is `sharing=locked` so it persists across runs).
- … `code.rly.best/gofix/portal-tunnel-rs-cache-amd64` specifically (`REGISTRY_TOKEN` may have access to the main repo but not this side repo).
- … `gcc-x86-64-linux-gnu` / `libc6-dev-amd64-cross` / `linux-libc-dev-amd64-cross` packages from a base-image change.
- … `CC_x86_64_unknown_linux_gnu=x86_64-linux-gnu-gcc` is set even when building natively, which forces use of a cross GCC that may behave differently from the default).

Suggested next steps
- … `CARGO_BUILD_JOBS=1`) for amd64 only, or move LTO off for the release profile.
- … `portal-cargo-target-amd64` cache mount and `portal-tunnel-rs-cache-amd64` cache repo on the registry.
- … `TARGETARCH == amd64` and the runner is also amd64; let cargo use the default linker.

Release status
Until this is resolved, `v2.1.8+rs.1` is effectively arm64-only. Consider one of:

- … `draft` until amd64 is fixed.
- … `v2.1.8+rs.2` once amd64 builds.

Filed while attempting `v2.1.8+rs.1`. Not specific to that tag: the symptom predates the tag and predates the k3s-style versioning convention adopted in #51 / #52.
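For reference, the tag handling that the issue reports as working (`+` → `-` conversion and `rs.N` extraction from the ref) can be sketched as follows. This is an illustration, not the workflow's actual script: the `refs/tags/` prefix, the exact tag pattern, and the names `parse_release_ref` and `arch_image_tag` are assumptions. The conversion itself is needed because container image tags may not contain `+`.

```python
import re

# Assumed tag shape: v<major>.<minor>.<patch>+rs.<N>, optionally prefixed
# with "refs/tags/" as a CI ref would be. Both are assumptions.
_TAG_RE = re.compile(r"^(?:refs/tags/)?(v\d+\.\d+\.\d+)\+rs\.(\d+)$")


def parse_release_ref(ref):
    """Split a release ref into its git tag, image tag, and rs.N counter."""
    match = _TAG_RE.match(ref)
    if match is None:
        raise ValueError("not a v<X.Y.Z>+rs.<N> release ref: " + repr(ref))
    upstream, rs_n = match.group(1), int(match.group(2))
    git_tag = "{}+rs.{}".format(upstream, rs_n)
    return {
        "git_tag": git_tag,
        # '+' is not a valid character in an image tag, so map it to '-'.
        "image_tag": git_tag.replace("+", "-"),
        "rs": rs_n,
    }


def arch_image_tag(image_tag, arch):
    """Per-arch tag in the form seen in the registry, e.g. '...-arm64'."""
    return "{}-{}".format(image_tag, arch)
```

With these assumptions, `parse_release_ref("refs/tags/v2.1.8+rs.1")["image_tag"]` yields `v2.1.8-rs.1`, and `arch_image_tag("v2.1.8-rs.1", "arm64")` yields the `v2.1.8-rs.1-arm64` tag currently present in the registry.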