Commit 2ef38a61 authored by Bastian Blank 🙉

Update salsa-postmortem-docker-registry from reviews

parent 402f7627
@@ -5,62 +5,45 @@ Author: Bastian Blank
Tags: debian, salsa
Status: draft
GitLab uses the registry tool out of [Docker distribution](https://github.com/docker/distribution) to provide a Docker image registry.
This software supports multiple backends for file storage, including a local filesystem, S3 and Google Cloud Storage (GCS).
As [Salsa](https://salsa.debian.org/) already uses GCS for data storage, we decided to move all the Docker registry data off to it too.
The [Salsa](https://salsa.debian.org/) admin team provides the following report about the failed migration of the Docker container registry.
  • The Salsa admin team provides the following report about the failed migration of the Docker container registry.

    The container registry is part of the Docker distribution toolset and is used to store Docker images, which are for example used in the Salsa CI toolset.

    This container registry system supports multiple backends for file storage: local, Amazon Simple Storage Service (Amazon S3) and Google Cloud Storage (GCS).

    As Salsa already uses GCS for data storage, the Salsa admin team decided to migrate all of the Docker registry data to GCS, which would have lowered the used file system space on Debian systems significantly.

The Docker container registry stores Docker images,
which are for example used in the Salsa CI toolset.
This migration would have moved all data off to Google Cloud Storage (GCS)
and would have lowered the used file system space on Debian systems significantly.
## Migration and roolback
The Docker container registry is part of the [Docker distribution](https://github.com/docker/distribution) toolset.
This system supports multiple backends for file storage: local, Amazon Simple Storage Service (Amazon S3) and Google Cloud Storage (GCS).
As Salsa already uses GCS for data storage, the Salsa admin team decided to move all the Docker registry data off to GCS too.
On 2019-08-06 we started the migration process.
The migration itself went fine, even if it took a bit longer then anticipated.
Everything looked fine and user access worked fine.
However as not all parts of the migration had been properly tested,
## Migration and rollback
On 2019-08-06 the migration process was started.
The migration itself went fine, although it took a bit longer than anticipated.
However, as not all parts of the migration had been properly tested,
a test of the garbage collection triggered a [bug](https://github.com/docker/distribution/issues/2975) in the software.
On 2019-08-10 we started to see problems with garbage collection.
On 2019-08-10 the Salsa admins started to see problems with garbage collection.
The job running it timed out after one hour.
Within this timeframe it did not even manage to collect information about all used layers to see what it could clean up.
A source code analysis showed that this can't be fixed.
A source code analysis showed that this design flaw can't be fixed.
On 2019-08-13 we switched back to storing data on the filesystem.
On 2019-08-13 the change was rolled back to storing data on the file system.
## Docker registry data storage
The Docker registry stores all data in a file system like structure.
There is no sort of index of the contents.
There isn't anything that would make searching for stuff easy.
Everything is in the file system.
Within this structure it saves four kinds of information.
First are the manifests, which make up images and show what they contain.
It saves tags that provide a name to manifests.
There are deduplicated layers or blobs, storing the real data.
Links show which deduplicated blobs belong to an image.
All of that is stored without any reverse references.
The whole structure is built as append-only.
You can add blobs, you can also add manifests.
You can add, change and delete tags.
However, cleaning up anything apart from tags is not really a thing.
There is a garbage collection process.
According to the documentation you must only use it while the registry is read-only.
It can clean up unreferenced blobs.
Since the last release it can also clean up unreferenced manifests.
However, it can't clean up links.
The Docker registry stores all of its data, without indexing or reverse references, in a file-system-like structure comprising four separate types of information:
manifests of images and their contents, tags for the manifests, deduplicated layers (or blobs) which store the actual data, and links which show which deduplicated blobs belong to their respective images.
This structure does not allow for easy searching within the data.
## Docker registry garbage collection on external storage
The file system structure is built as append-only, which allows for the addition of blobs and manifests and for the addition, modification, or deletion of tags.
However, cleanup of items other than tags is not achievable with the maintenance tools.
There is a garbage collection process which can be used to clean up unreferenced blobs. However, according to the documentation, the process can only be used while the registry is set to read-only, and unfortunately it cannot be used to clean up unused links.
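To illustrate why the lack of reverse references forces a full scan, here is a minimal sketch of a mark-and-sweep pass in Python. It models the registry as plain in-memory dictionaries. The names `manifests`, `blobs`, and `find_unreferenced_blobs` are hypothetical and chosen only for this example; this is not the code used by the Docker distribution registry.

```python
def find_unreferenced_blobs(manifests, blobs):
    """Return the set of blob digests that no manifest references.

    manifests: dict mapping a manifest digest to a list of blob digests
    blobs: set of all blob digests present in storage
    (Both structures are a simplified, hypothetical model.)
    """
    referenced = set()
    # Mark phase: without reverse references, every manifest must be read
    # to learn which blobs are still in use.
    for layer_digests in manifests.values():
        referenced.update(layer_digests)
    # Sweep phase: anything that was never marked is a deletion candidate.
    return blobs - referenced


manifests = {
    "sha256:m1": ["sha256:a", "sha256:b"],
    "sha256:m2": ["sha256:b", "sha256:c"],
}
blobs = {"sha256:a", "sha256:b", "sha256:c", "sha256:d"}
unreferenced = find_unreferenced_blobs(manifests, blobs)
```

In this toy model only `sha256:d` is unreferenced. The point of the sketch is that the mark phase has to visit every manifest before a single blob can be safely removed.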
For a garbage collection the registry tool needs to read a lot of information.
Remember, there is no index it could use to see what's in there.
So it goes out to the storage and downloads … everything, or at least lists every object.
The registry attached to Salsa contains around 110k files.
## Docker registry garbage collection on external storage
It has to download every manifest to get the referenced blobs.
This process somehow takes over a second for each manifest.
I haven't counted the currently available manifests,
but it is clear that this may take a lot of time.
So in the used configuration with the external storage it is simply impossible to run any cleanup.
For garbage collection the registry tool needs to read a lot of information, as there is no indexing of the data.
The tool connects to the storage medium and proceeds to download … everything: every single manifest and the information about the referenced blobs, which takes over one second per manifest.
This process takes a significant amount of time, which in the current configuration with external storage makes cleanup nearly impossible.
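A rough back-of-the-envelope calculation shows how the one-hour job timeout and the per-manifest cost interact. The sketch below assumes a strictly sequential scan and uses one second per manifest as a lower bound. The manifest count of 20,000 is a hypothetical figure for illustration only, since the real number was not counted.

```python
TIMEOUT_SECONDS = 60 * 60    # the garbage collection job timed out after one hour
SECONDS_PER_MANIFEST = 1.0   # lower bound: each manifest took over a second


def max_manifests_within_timeout(timeout_s, per_manifest_s):
    """Upper bound on the manifests a sequential scan can read before the timeout."""
    return int(timeout_s // per_manifest_s)


def hours_needed(manifest_count, per_manifest_s):
    """Lower bound on wall-clock hours for a sequential scan of all manifests."""
    return manifest_count * per_manifest_s / 3600


limit = max_manifests_within_timeout(TIMEOUT_SECONDS, SECONDS_PER_MANIFEST)
# Hypothetical manifest count, chosen only for illustration.
example_hours = hours_needed(20_000, SECONDS_PER_MANIFEST)
```

Under these assumptions at most 3600 manifests fit into the timeout, and the hypothetical 20,000 manifests would need more than five and a half hours, during which the registry would have to stay read-only.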
## Lessons learned
@@ -68,4 +51,10 @@ The Docker registry is a data storage tool that can only properly be used in app
If you never clean up, it works well.
As soon as you want to actually remove data, it goes bad.
For Salsa we actually want to remove stuff, as the registry currently grows about 20GB per day.
For Salsa, cleanup of old data is actually a necessity, as the registry currently grows by about 20 GB per day.
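To put the growth rate into perspective, the following small sketch projects the reported 20 GB per day forward, assuming the rate stays constant and nothing is ever removed. The function name `projected_growth_gb` is hypothetical.

```python
GROWTH_GB_PER_DAY = 20  # growth rate reported for the Salsa registry


def projected_growth_gb(days, gb_per_day=GROWTH_GB_PER_DAY):
    """Storage added after `days` days if nothing is ever removed."""
    return days * gb_per_day


per_month = projected_growth_gb(30)
per_year = projected_growth_gb(365)
```

With a constant rate this comes to 600 GB per 30 days and 7300 GB per year.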
## Next steps
Sadly there is not much that can be done using the existing Docker container registry.
Maybe GitLab or someone else would like to contribute a new implementation of a Docker registry,
either integrated into GitLab itself or stand-alone?