Make SALSA_CI_BUILD_TIMEOUT_ARGS automatically 10-15 minutes less than CI_JOB_TIMEOUT
In GitLab CI (or at least in how the Salsa CI pipeline is currently configured), when a job times out, the GitLab CI cache is not saved. For example, in https://salsa.debian.org/otto/mariadb-server/-/jobs/7566665/viewer one can see:
/usr/lib/ccache/c++ -DBTR_CUR_ADAPT -DBTR_CUR_HASH_ADAPT -DHAVE_CONFIG_H -DHAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE=1 -DHAVE_LIBNUMA=1 -DHAVE_PMEM -DHAVE_SYSTEM_REGEX -DHAVE_URING -DPCRE_STATIC=1 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE=1 -Dmariadb_backup_EXPORTS -I/builds/otto/mariadb-server/debian/output/source_dir/wsrep-lib/include -I/builds/otto/mariadb-server/debian/output/source_dir/wsrep-lib/wsrep-API/v26 -I/builds/otto/mariadb-server/debian/output/source_dir/builddir/include -I/builds/otto/mariadb-server/debian/output/source_dir/include/providers -I/builds/otto/mariadb-server/debian/output/source_dir/storage/innobase/include -I/builds/otto/mariadb-server/debian/output/source_dir/storage/innobase/handler -I/builds/otto/mariadb-server/debian/output/source_dir/libbinlogevents/include -I/builds/otto/mariadb-server/debian/output/source_dir/tpool -I/builds/otto/mariadb-server/debian/output/source_dir/include -I/builds/otto/mariadb-server/debian/output/source_dir/sql -I/builds/otto/mariadb-server/debian/output/source_dir/storage/maria -I/builds/otto/mariadb-server/debian/output/source_dir/extra/mariabackup/quicklz -I/builds/otto/mariadb-server/debian/output/source_dir/extra/mariabackup -g -O2 -ffile-prefix-map=/builds/otto/mariadb-server/debian/output/source_dir=. 
-fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wdate-time -D_FORTIFY_SOURCE=2 -Wdate-time -D_FORTIFY_SOURCE=2 -pie -fPIC -fstack-protector --param=ssp-buffer-size=4 -Wconversion -Wno-sign-conversion -O3 -g -DNDEBUG -g -fno-omit-frame-pointer -fno-strict-aliasing -Wno-uninitialized -fno-omit-frame-pointer -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wenum-compare -Wenum-conversion -Wextra -Wformat-security -Wmissing-braces -Wno-format-truncation -Wno-init-self -Wno-nonnull-compare -Wno-unused-parameter -Wnon-virtual-dtor -Woverloaded-virtual -Wvla -Wwrite-strings -std=gnu++17 -Wdate-time -D_FORTIFY_SOURCE=2 -DHAVE_OPENSSL -DOPENSSL_API_COMPAT=0x10100000L -UMYSQL_SERVER -DHAVE_OPENSSL -DOPENSSL_API_COMPAT=0x10100000L -MD -MT extra/mariabackup/CMakeFiles/mariadb-backup.dir/backup_copy.cc.o -MF CMakeFiles/mariadb-backup.dir/backup_copy.cc.o.d -o CMakeFiles/mariadb-backup.dir/backup_copy.cc.o -c /builds/otto/mariadb-server/debian/output/source_dir/extra/mariabackup/backup_copy.cc
Terminated
make[4]: *** [extra/mariabackup/CMakeFiles/mariadb-backup.dir/build.make:319: extra/mariabackup/CMakeFiles/mariadb-backup.dir/encryption_plugin.cc.o] Error 143
make[4]: *** Waiting for unfinished jobs....
make[4]: Leaving directory '/builds/otto/mariadb-server/debian/output/source_dir/builddir'
make[3]: *** [CMakeFiles/Makefile2:7250: extra/mariabackup/CMakeFiles/mariadb-backup.dir/all] Error 2
make[3]: Leaving directory '/builds/otto/mariadb-server/debian/output/source_dir/builddir'
make[2]: *** [Makefile:169: all] Error 2
make[2]: Leaving directory '/builds/otto/mariadb-server/debian/output/source_dir/builddir'
make[1]: *** [debian/rules:133: override_dh_auto_build] Error 2
make[1]: Leaving directory '/builds/otto/mariadb-server/debian/output/source_dir'
make: *** [debian/rules:242: binary-indep] Error 2
dpkg-buildpackage: error: debian/rules binary-indep subprocess returned exit status 2
WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts
ERROR: Job failed: execution took longer than 2h0m0s seconds
This is not ideal, because for C/C++ builds the re-run could be much faster if the ccache from the failed build were used to prime the next build. Therefore, the Salsa CI pipeline has the SALSA_CI_BUILD_TIMEOUT_ARGS variable, which is passed to the timeout command and is meant to abort a long-running build before the CI job timeout is reached.
However, there is no automation to ensure that SALSA_CI_BUILD_TIMEOUT_ARGS is actually less than the job timeout. As seen in the log linked above, the build does run with
$ su salsaci -c "timeout ${SALSA_CI_BUILD_TIMEOUT_ARGS} ${BUILD_COMMAND} && if [ "${BUILD_TWICE}" = "true" ]; then ${BUILD_COMMAND}; fi" |& OUTPUT_FILENAME=${BUILD_LOGFILE} filter-output
but the environment variables are:
- SALSA_CI_BUILD_TIMEOUT_ARGS= -v 2.75h
- CI_JOB_TIMEOUT=7200 (2.0h)
With these values the timeout command would only fire 45 minutes after GitLab has already terminated the job, so it never gets the chance to stop the build.
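The mismatch between the two values can be verified with a little arithmetic. The following is a minimal sketch, not part of the Salsa CI pipeline: the helper names to_seconds and build_timeout_fits and the 600-second default margin are assumptions made for illustration.

```shell
#!/bin/sh
# Convert a duration in the style accepted by GNU timeout
# (a number with an optional s/m/h/d suffix) into whole seconds.
to_seconds() {
    awk -v d="$1" 'BEGIN {
        mult["s"] = 1; mult["m"] = 60; mult["h"] = 3600; mult["d"] = 86400
        unit = substr(d, length(d))
        if (unit in mult) { n = substr(d, 1, length(d) - 1) }
        else              { n = d; unit = "s" }
        printf "%d\n", n * mult[unit]
    }'
}

# Hypothetical check: succeed only if the build timeout plus a margin
# (default 600 s, an assumed value) still fits into the job timeout.
build_timeout_fits() {
    build_s=$(to_seconds "$1")
    job_s="$2"
    margin_s="${3:-600}"
    [ $((build_s + margin_s)) -le "$job_s" ]
}

# Values from the job log above: 2.75h is 9900 s, the job limit is 7200 s
if ! build_timeout_fits 2.75h 7200; then
    echo "build timeout $(to_seconds 2.75h)s does not fit into job timeout 7200s"
fi
```

A check like this could run early in the build job and fail fast, or only warn, when the configured build timeout cannot possibly take effect.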
Main issue: fix timeout logic to always stop builds so there is time to upload cache
We should fix this by either:
- Adding automation to ensure that SALSA_CI_BUILD_TIMEOUT_ARGS is always ~10 min less than CI_JOB_TIMEOUT
OR
- Redesigning the whole timeout feature using the new built-in variables RUNNER_SCRIPT_TIMEOUT and RUNNER_AFTER_SCRIPT_TIMEOUT introduced in GitLab 16.4
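As a rough sketch of the first option, the build timeout could be derived from CI_JOB_TIMEOUT (which GitLab provides in seconds) instead of being set independently. The function name derive_build_timeout, the 600-second default margin and the 60-second floor are assumptions for illustration, not existing Salsa CI behaviour.

```shell
#!/bin/sh
# Hypothetical helper: derive a duration for the timeout command
# from the job timeout (in seconds) minus a safety margin.
derive_build_timeout() {
    job_timeout="${1:?job timeout in seconds required}"
    margin="${2:-600}"    # assumed default: 10 minutes for cache upload
    build_timeout=$((job_timeout - margin))
    # Assumed floor so that very short job timeouts still allow a build
    if [ "$build_timeout" -lt 60 ]; then
        build_timeout=60
    fi
    printf '%ss\n' "$build_timeout"
}

# With a 2 h job timeout (7200 s) this leaves 6600 s for the build
SALSA_CI_BUILD_TIMEOUT_ARGS="-v $(derive_build_timeout "${CI_JOB_TIMEOUT:-7200}")"
echo "$SALSA_CI_BUILD_TIMEOUT_ARGS"
```

Whether a value set explicitly by a project should still override the derived one is a design decision this sketch leaves open.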
Extra: review cache name
Related, we should also make sure that the cache key makes sense. We don't want projects to have too many unique caches as it is wasteful, but also we don't want them to be overloaded so that builds constantly evict cached objects that are still useful for many builds.
Current..
.build-definition: &build-definition
  stage: build
  image: $SALSA_CI_IMAGES_BASE
  cache:
    when: always
    key: "build-${BUILD_ARCH}_${HOST_ARCH}"
    paths:
      - .ccache
..results in e.g.
Checking cache for build-amd64_-non_protected...
Downloading cache from https://storage.googleapis.com/debian-salsa-prod-runner-cache/project/74127/build-amd64_-non_protected ETag="8fd5af663f88e9890bc428e1b6071216"
..when the build has BUILD_ARCH=amd64 and HOST_ARCH= (empty).
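The way the key template collapses when HOST_ARCH is empty can be reproduced in plain shell. This is only an illustration of the string expansion: the cache_key function is hypothetical, and the -non_protected suffix seen in the log appears to be added by GitLab itself rather than by the template.

```shell
#!/bin/sh
# Hypothetical helper mirroring the key template
# "build-${BUILD_ARCH}_${HOST_ARCH}" from the pipeline definition.
cache_key() {
    # $1 = BUILD_ARCH, $2 = HOST_ARCH (may be empty)
    printf 'build-%s_%s\n' "$1" "$2"
}

cache_key amd64 ""       # native build: key ends in a dangling underscore
cache_key amd64 arm64    # cross build: both architectures appear in the key
```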
Extra: show info about downloaded cache
Additionally, the current zeroing of ccache before the build starts is unhelpful for debugging what the downloaded cache contained:
$ ccache -z
Statistics zeroed
It would be better to replace the above with ccache --show-stats --zero-stats, as it would list things such as "files in cache" and "cache size" while also zeroing the stats for the run that is about to start.