Advice on dealing with packages with very large artifacts
First of all, I would like to express my thanks for all of your excellent work on Salsa CI. It makes all of our lives better.
I am seeking some advice on the best way to handle packages with very large artifacts, so large that they cannot currently be uploaded. Specifically, I am trying to enable Salsa CI for the qt6-webengine package.
As a little bit of background, I have ended up working on a number of packages whose builds or tests run longer than the typical timeouts allow. As a solution, I have set up my own custom runner, which I can configure to run as long as needed (sometimes up to 10 hours). This has generally solved my problems, and I am able to use the default Salsa CI pipeline for these repositories.
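For reference, the relevant override looks roughly like the following. The tag name below is just a placeholder for however the custom runner is registered, and the runner's own configuration must also permit timeouts this long:

```yaml
# Rough sketch of extending a job in salsa-ci.yml to use a custom
# runner with a long timeout. The tag is a placeholder, and the runner
# itself must be configured to allow jobs to run this long.
build:
  tags:
    - my-long-running-runner   # placeholder tag for the custom runner
  timeout: 10h                 # GitLab's job-level timeout keyword
```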
But in the case of qt6-webengine, there is an additional problem: the artifacts are too large to upload at the end of the build. For example, at the end of this recent build they were 833M, which is above the current limit of 750M:
https://salsa.debian.org/soren/qt-6-webengine-salsa-ci/-/jobs/8371730#L520
The purpose of this post is to seek guidance about the best way to work around this issue. I have thought of a few possibilities.
- Speak to whoever sets the artifact upload limit and see if they would be amenable to raising it to 1G. I am not sure who sets that limit, but if you know who they are, I would be willing to reach out to them if you think they might consider it. I like this solution the best, because then I could use the default Salsa CI pipelines. My assumption is that a 1G limit should work for qt6-webengine for the foreseeable future, although there might be other projects for which it is insufficient.
As a side note, this option only recently became possible with the refactoring of the default Salsa CI pipeline to remove the provisioning step. Prior to that, the unpacked source consumed 4.5G, which I was almost certain nobody was going to allow to be uploaded as an artifact.
- I previously had a workaround that involved deleting some of the artifacts from the full build. The later tests would get some of the information they needed from the full build and the rest from the source build. Specifically, if I removed the .orig.tar.xz from the full build, that dropped the artifacts from 833M to 204M. In the source build, the total artifacts, including the .orig.tar.xz, were 630M. The tests would then download all of these artifacts and combine them. However, in refactoring my custom salsa-ci.yml to better match the current default, this combination logic appears to have been removed, and only the artifacts from the full build are downloaded. I could look at reimplementing it (see the first sketch after this list), but I wasn't sure how fragile it would be or how much effort it would take to maintain long term. As a general rule, I would like to keep my custom salsa-ci.yml as close to upstream as possible.
- Instead of relying on the artifacts for the .orig.tar.xz, I could regenerate it inside each test (see the second sketch after this list). Doing so is more CPU-intensive than for most projects, because there is a very large number of excluded files (as well as more bandwidth-intensive, as the upstream tarball is significantly larger). However, if this is the best way to go, I am willing to do it, because my custom runner is on my own hardware, so it isn't eating into anyone else's resources.
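To make the second option concrete, here is a rough sketch of the combination logic. It assumes the full build job is named `build`, the source-only job is named `build source`, and the artifacts live under `${WORKING_DIR}`; the actual job names and paths in the current pipeline may differ:

```yaml
# Sketch only: drop the tarball from the full build's artifacts, then
# have a test job pull and merge the artifacts from both build jobs.
build:
  after_script:
    - rm -f ${WORKING_DIR}/*.orig.tar.*   # prune the tarball before upload

autopkgtest:
  needs:
    - job: build            # binaries, .changes, etc. (minus the tarball)
      artifacts: true
    - job: build source     # supplies the .orig.tar.xz
      artifacts: true
```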
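And here is a sketch of the third option, regenerating the tarball inside each test job with uscan, which applies the Files-Excluded repacking rules from debian/copyright. The job name and the assumption that the packaging checkout is available at `${CI_PROJECT_DIR}` are mine:

```yaml
autopkgtest:
  before_script:
    - apt-get update && apt-get install --no-install-recommends -y devscripts
    # uscan reads debian/watch and debian/changelog from the current
    # directory, so run it from the packaging checkout (assumed to be
    # at ${CI_PROJECT_DIR} here).
    - cd ${CI_PROJECT_DIR}
    # --download-current-version fetches the upstream tarball matching
    # debian/changelog and repacks it, honouring Files-Excluded, to
    # produce the .orig.tar.xz.
    - uscan --download-current-version --destdir ${WORKING_DIR}
```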
I really want to get Salsa CI running for qt6-webengine, because we are about to embark on a project to partially automate cherry-picking security fixes from Chromium (the upstream's upstream) directly into the Qt6 WebEngine packages in Debian. Traditionally there is a multi-month delay before security fixes make it from Chromium into upstream Qt6 WebEngine and then into the Debian packages. This cherry-picking of security fixes will inevitably lead to some regressions, and having a functioning Salsa CI will help us catch them before we push them to end users. More information about this effort is at: