  • Owner

    DSA sometimes has the postgresql host separate to the service. Do we want a dedicated postgresql on that host or do we want to use an existing database server and just get a database?

  • Maintainer

    I tried to comment via git clone + git push and I can't:

    carles@pinux:[main]~/git/debusine/665$ git commit --amend
    [main a892ade] Update snippet
     Date: Tue Oct 24 15:00:16 2023 +0100
     1 file changed, 11 insertions(+)
    carles@pinux:[main]~/git/debusine/665$ git push origin main 
    remote: 
    remote: ========================================================================
    remote: 
    remote: You are not allowed to update this snippet.
    remote: 
    remote: ========================================================================
    remote: 
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.

    I will copy-paste in a new comment for now.

  • Maintainer

    Regarding RAM:

    CPE: side note: if I remember correctly (I can check if needed), the upload is
    managed by Django and handled quite well (at most 2.5 MB is kept in memory by
    default, as per
    https://docs.djangoproject.com/en/3.2/topics/http/file-uploads/#where-uploaded-data-is-stored).
    It's the download of artifacts as a .tar.gz that we struggle with, due to how
    it's implemented to build the .tar.gz on the fly (and to the generator being
    faster than the sender).

    Regarding Apache:

    CPE: Using Apache should be possible if needed. We use "daphne" as a server
    application and nginx as a reverse proxy. We should be able to do the same using
    Apache as a frontend.
  • Author Maintainer

    @helmutg I'd go with just getting a database, without needing to manage a dedicated PostgreSQL on that host, as I never found the separate PostgreSQL access to be lacking for Django usage.

  • Author Maintainer

    @carlespina ouch, I have no idea how to give others write access to a snippet :/

    I'll manually fold in your notes. Is there a better way in GitLab to do this?

    The snippet now looks good enough to start a discussion with DSA

  • Owner

    Regarding artifact download being RAM hungry: Is it reasonable to consider this a fixable implementation detail? As far as I can see, we're using ASGI here, and ASGI can feed back on the speed of transmission, so the on-the-fly creation of a .tar.gz is not fundamentally RAM hungry, but our current implementation is, and it can be fixed when there is a need to fix it. Do you confirm?
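
    For illustration (a hypothetical, standalone ASGI application, not debusine
    code; generate_tar_gz_chunks() is an assumed helper), the feedback works
    because the next chunk is only built after the previous await send(...)
    returns, so a server that implements flow control slows the generator down
    to the client's speed:

    async def stream_archive(scope, receive, send):
        """Minimal ASGI app that streams a response body chunk by chunk."""
        assert scope["type"] == "http"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"application/gzip")],
        })
        for chunk in generate_tar_gz_chunks():  # assumed chunk generator
            # Only once this send() completes do we build the next chunk,
            # which is where a flow-controlled server can apply backpressure.
            await send({
                "type": "http.response.body",
                "body": chunk,
                "more_body": True,
            })
        await send({"type": "http.response.body", "body": b"", "more_body": False})

    The sketch assumes the server actually applies such flow control; whether
    the current setup does is part of what makes this an implementation detail
    rather than a fundamental limit.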

  • Owner

    We should describe how we will be doing deployment.

    I assume deployment will be Debian packages? I guess we'll need a mechanism to update them, as non-root.

    Alternatively, we run out of a git checkout and virtualenvs. Either way, we should describe our needs.

  • Owner

    I very much assume that we will not be deploying Debian packages, because that removes our ability to quickly update the software. I expect us to run debusine from a git checkout while adding a dependency package. Are there any debusine dependencies that are not available or too old in bookworm?

  • Owner

    We aim to support bookworm. So right, virtualenv shouldn't be required.

    We'll just need to give DSA a list of dependencies for their debusine puppet role / metapackage (can't remember how they handle them).

  • Owner

    We git send-email a patch against https://salsa.debian.org/dsa-team/mirror/debian.org to debian-admin@lists.debian.org, adding a metapackage probably called debian.org-debusine.debian.org that has a pile of dependencies and no content. Let me know if I should write or review said patch.

  • Maintainer

    Regarding @helmutg's question:

    Regarding artifact download being RAM hungry: Is it reasonable to consider this a fixable implementation detail? As far as I can see, we're using ASGI here, and ASGI can feed back on the speed of transmission, so the on-the-fly creation of a .tar.gz is not fundamentally RAM hungry, but our current implementation is, and it can be fixed when there is a need to fix it. Do you confirm?

    I've been trying to remember.

    For uploading:

    • Upload from debusine client: this is chunked and memory consumption is ok
    • Upload from a Web client: there were some problems some months ago when I implemented it, due to daphne. I tested it with uvicorn IIRC and it was all good there; described in https://github.com/django/daphne/issues/483. In Debusine we decided that we have the Nginx maximum POST size of 100 MB, that bigger files can be uploaded using the client, and that if we want to upload bigger files via the Web we should look at implementing it in Javascript to give progress, chunking, etc. (I can find the notes in the MR if needed; we discussed this with Enrico and Raphaël I think)

    For downloading:

    • We could download the artifact's files in chunks and avoid generating and streaming the .tar.gz

    Is it fixable? Yes, for download we could:

    • Upgrade to Django 4.0 if this is an option (and re-test)
    • Download the artifacts file-by-file (and in chunks if needed) instead of generating a .tar.gz, or make other changes to the downloads (sketched below)

    To upload bigger files via the Web, one of:

    • Replace daphne with uvicorn (I tested it and it was good, but it needs re-testing)
    • Implement upload of big files in Javascript, in chunks
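
    A minimal sketch of the file-by-file download alternative mentioned above
    (the view, URL parameters and ARTIFACT_ROOT layout are assumptions, not
    debusine's actual API): serving each artifact file with FileResponse
    streams it from disk in small blocks instead of building a .tar.gz in
    memory, and the client can then fetch the files one by one (and in chunks
    if needed).

    import os

    from django.http import FileResponse, Http404

    ARTIFACT_ROOT = "/srv/debusine/artifacts"  # assumed storage layout


    def download_artifact_file(request, artifact_id, filename):
        """Stream a single artifact file instead of an on-the-fly .tar.gz."""
        safe_name = os.path.basename(filename)  # avoid path traversal
        path = os.path.join(ARTIFACT_ROOT, str(artifact_id), safe_name)
        if not os.path.isfile(path):
            raise Http404("No such file in this artifact")
        # FileResponse reads and sends the file in fixed-size blocks, so memory
        # use stays flat regardless of the file size.
        return FileResponse(open(path, "rb"), as_attachment=True, filename=safe_name)
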
  • Owner

    Thanks for your research on request sizes. What you write sounds a lot like this is not a fundamental, insurmountable problem, but something that can be fixed at a later time given effort (e.g. upgrading Django). It also sounds like the current state works "well enough for now", and that this is therefore a non-trivial, non-essential and deferrable work item.

  • Maintainer

    @stefanor said:

    We aim to support bookworm. So right, virtualenv shouldn't be required.

    At the moment we are supporting bullseye and bookworm. If we can drop support for bullseye it would be great :-)

  • Owner

    I really meant to say bullseye there :)

  • I very much assume that we will not be deploying Debian packages, because that removes our ability to quickly update the software. I expect us to run debusine from a git checkout while adding a dependency package

    FWIW I assume the same, that's also how we handle tracker.debian.org.

  • Author Maintainer

    I added these specs to a draft mail to DSA at https://semestriel.framapad.org/p/qfwikuhrv0-a40v?lang=en

    I didn't go into details about specific dependencies and how we deploy, as running webapps from git is quite standard in my experience.
