
Add FileBackend for external Debian archives

From the milestone 2 requirements:

MUST: Implement a way to mirror an external repository in debusine, building on a new concept of “collection”
  ...
  Implement a [FileBackend] that makes the content of an external repository available within debusine

Note that this is not the same as a FileBackend that stores files in a cloud service such as S3. The repository must be assumed to be managed outside debusine (by dak or similar) and so it must be treated as read-only. It's also expected to be laid out as a Debian repository rather than as a general content-addressed file store. I don't think there are any serious blockers for this, so here are some notes on what I think needs to be implemented.

  • FileBackendInterface.get_url takes a File, which only knows the file's hash and size, and that isn't enough to determine its URL in this case. However, FileInStore has a data column (currently unused, I think?), which according to #63 (closed) was intended for the case "where it might be required to store supplementary data to be able to retrieve the file through the cloud API". We'll need to require externally-hosted files to have something like a path key set there (see the backend sketch after this list).
  • We need an ExternalDebianArchiveFileBackend (or a similar name) that implements get_url but not get_local_path, _remove_file, or add_file; a rough sketch follows this list.
  • FileBackendInterface.get_stream only supports local files right now, and needs to be extended to support streaming from a URL (see the streaming sketch below).
  • The monthly_cleanup command needs to be changed to consider only LocalFileBackends. (A file store on S3 or whatever would also be fine; the necessary distinction isn't between local and remote, but between read-write and read-only. See the cleanup sketch below.)
  • It's important to note that this file store can't honour invariants that files in the store never change their contents or never disappear without debusine's knowledge, since the files themselves are managed externally. I think that's OK for our current uses, but it may be worth auditing existing code.
  • We'll need something to create File rows in the database too, but that isn't something that can be done generically, so I suggest that we don't do that as part of this work. Instead, the mirror operation will decide what File rows it needs to create or delete based on the archive's index files.
  • I think we should slightly reconsider how the debusine client's download_artifact method works. It currently always downloads a compressed tarball of the artifact's files, and then either saves it as a tarball or decompresses it into the given destination. This forces the server to stream all the files from the store and build a tarball of them even when they're going to be split back out into individual files, which is quite a bit of work it really doesn't need to do; it's especially wasteful for remote files, where a request for an individual file could simply be redirected to the appropriate URL. Tasks (and indeed all callers that I can find) always use the tarball=False mode right now.
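
To make the first two points more concrete, here is a minimal sketch of what such a backend could look like. The exact shape of FileBackendInterface, the FileInStore lookup, and the archive_root_url configuration key are assumptions for illustration, not debusine's actual API.

```python
from urllib.parse import urljoin

# from debusine.db.models import FileInStore  # assumed import path


class ExternalDebianArchiveFileBackend:
    """Read-only backend for files hosted in an externally managed archive."""

    def __init__(self, file_store):
        self.file_store = file_store
        # Hypothetical configuration key holding the archive's base URL,
        # e.g. "https://deb.debian.org/debian/".
        self.base_url = file_store.configuration["archive_root_url"]

    def get_url(self, fileobj):
        # A File row (hash + size) is not enough to locate the file in a
        # Debian archive layout, so read the path stored in FileInStore.data.
        file_in_store = FileInStore.objects.get(store=self.file_store, file=fileobj)
        path = file_in_store.data["path"]  # e.g. "pool/main/h/hello/hello_2.10-3.dsc"
        return urljoin(self.base_url, path)

    def get_local_path(self, fileobj):
        # There is never a local copy; callers must fall back to get_url.
        return None

    def add_file(self, local_path, fileobj=None):
        raise NotImplementedError("external Debian archives are read-only")

    def _remove_file(self, fileobj):
        raise NotImplementedError("external Debian archives are read-only")
```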
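For the get_stream point, one possible extension is to fall back to streaming over HTTP when the backend has no local path. The use of requests and the context-manager shape are assumptions about how this could be structured, not necessarily how debusine does it today:

```python
from contextlib import contextmanager

import requests


@contextmanager
def get_stream(self, fileobj):
    """Yield a file-like object for fileobj, whether local or remote."""
    local_path = self.get_local_path(fileobj)
    if local_path is not None:
        with open(local_path, "rb") as f:
            yield f
        return
    # No local copy: stream the body from the backend's URL instead.
    url = self.get_url(fileobj)
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        # response.raw is a file-like object reading straight from the socket.
        yield response.raw
```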
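Finally, for the monthly_cleanup point, the read-write/read-only distinction could be expressed as a property of the backend rather than an isinstance check against LocalFileBackend. The read_only flag, the get_backend_object accessor, and the helper functions below are hypothetical names used only to illustrate the idea:

```python
def cleanup_orphan_files(file_stores):
    """Remove orphaned files, skipping stores that debusine must not modify."""
    for file_store in file_stores:
        backend = file_store.get_backend_object()  # hypothetical accessor
        if backend.read_only:  # hypothetical flag: True for external archives
            # Externally managed archives must never have files added or
            # removed by debusine's cleanup jobs.
            continue
        for fileobj in find_orphan_files(file_store):  # hypothetical helper
            backend._remove_file(fileobj)
```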