
File storage

Files uploaded by workers and users are stored locally in a temporary directory before being added to their final "file store". The temporary directory is $DEBUSINE_DATA_PATH/uploads/ and the files are named after their SHA256 checksum.
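As an illustration of the naming scheme above, here is a minimal sketch that computes the temporary path for an uploaded file. The helper names `sha256_hexdigest` and `temporary_upload_path` are hypothetical. The sketch also assumes the file name is the lowercase hexadecimal form of the checksum, which the text does not state explicitly.

```python
import hashlib
from pathlib import Path


def sha256_hexdigest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large uploads are never read in one go."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def temporary_upload_path(data_path: Path, uploaded_file: Path) -> Path:
    """Return <data_path>/uploads/<sha256> for the given file.

    data_path stands for the value of DEBUSINE_DATA_PATH; naming the file
    after the lowercase hex digest is an assumption of this sketch.
    """
    return data_path / "uploads" / sha256_hexdigest(uploaded_file)
```

Naming by checksum means that two uploads of the same content map to the same temporary file name.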

Database models for file storage

Files can only be accessed through file stores, so in the database a file is just a (size, SHA256 checksum) tuple mapped to a unique numeric identifier. A file store, on the other hand, is identified by a name (a string) and carries a backend implementation and some configuration data.

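from django.db import models

class File(models.Model):
	class Meta:
		unique_together = ("checksum", "size")
	# max_length=64 assumes the checksum is stored as SHA256 hex digits
	checksum = models.CharField(max_length=64)
	size = models.BigIntegerField()

class FileStore(models.Model):
	# the max_length values below are illustrative assumptions
	name = models.CharField(max_length=255, unique=True)
	backend = models.CharField(max_length=255)
	configuration = models.JSONField()
	files = models.ManyToManyField(File, through="FileInStore")

class FileInStore(models.Model):
	class Meta:
		unique_together = ("store", "file")
	# Django requires on_delete; PROTECT is an assumption
	store = models.ForeignKey(FileStore, on_delete=models.PROTECT)
	file = models.ForeignKey(File, on_delete=models.PROTECT)
	data = models.JSONField()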

An initial "default" file store, using the "local" backend with an empty configuration dict, is created by the database migrations. In the future, there might be other ways to create file stores, and possibly to restrict them to specific workspaces.

The generic "data" field attached to each "FileInStore" is meant for more advanced file stores, notably cloud-based ones, which may need to store supplementary data in order to retrieve the file through the cloud API.

Note that the need to reclaim space efficiently and to ensure that all files are properly backed up will shortly require extensions to this model, so that we keep a log of the add/remove operations made on each FileStore.

Interface of a file store

We expect to support multiple file stores, both local and remote (cloud-based). The interface of a file store is very basic: its main purpose is to make it possible to retrieve the content of a file stored in it.

import contextlib

class FileStoreMinimalInterface:
	# Return a URL pointing to the content when possible, otherwise return None.
	def get_url(self, fileobj):
		pass

	# Return the path to a local copy of the file when possible, otherwise return None.
	def get_local_path(self, fileobj):
		pass

	# Remove the file from the store.
	def remove_file(self, fileobj):
		pass

	# Add a file to the store.
	def add_file(self, local_path, fileobj=None):
		pass

	# Return a file-like object that can be read. First try get_local_path() and open the file,
	# otherwise make a request to the URL.
	@contextlib.contextmanager
	def get_stream(self, fileobj):
		...
		yield fd
		...
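The fallback described for get_stream() (local path first, URL otherwise) could be sketched as follows. This is only one possible reading of the comment above, not the actual implementation: the class name `FileStoreStreamSketch` is hypothetical, and so are the use of `urllib.request` for the remote case and the choice to raise `FileNotFoundError` when neither source is available.

```python
import contextlib
import urllib.request


class FileStoreStreamSketch:
    """Hypothetical base class; real stores would override the two getters."""

    def get_local_path(self, fileobj):
        return None

    def get_url(self, fileobj):
        return None

    @contextlib.contextmanager
    def get_stream(self, fileobj):
        # Prefer a local copy: opening a file avoids any network round trip.
        local_path = self.get_local_path(fileobj)
        if local_path is not None:
            with open(local_path, "rb") as stream:
                yield stream
            return
        # Otherwise fall back to the URL (assumed fetchable with urllib).
        url = self.get_url(fileobj)
        if url is None:
            raise FileNotFoundError(f"{fileobj!r} has neither a local path nor a URL")
        with urllib.request.urlopen(url) as response:
            yield response
```

Because both branches yield from inside a `with` block, the underlying file or HTTP response is closed when the caller leaves its own `with store.get_stream(fileobj) as stream:` block.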

Local file store

The local file store is a very simple store. It takes a single configuration key "path" giving the path of the root directory where all files are stored. If that path is not set, then it uses "$DEBUSINE_DATA_PATH/store" as the default path.
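A minimal sketch of how the root directory might be resolved from the configuration. The function name `resolve_store_root` is hypothetical, and the value of DEBUSINE_DATA_PATH is passed in explicitly because the text does not say where it comes from. Treating a missing key and a `None` value both as "not set" is also an assumption.

```python
from pathlib import Path


def resolve_store_root(configuration: dict, data_path: Path) -> Path:
    """Return the configured "path", or <data_path>/store when it is not set."""
    configured = configuration.get("path")
    if configured is None:
        # Default location described in the text: $DEBUSINE_DATA_PATH/store
        return data_path / "store"
    return Path(configured)
```

With the empty configuration dict of the "default" store, this returns the default location under the data path.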

Inside that directory, files are stored according to their SHA256 checksum and size as "$CHAR1_2/$CHAR3_4/$CHAR5_6/$CHECKSUM-$HEXSIZE", where $CHAR1_2 is the first two characters of the checksum, $CHAR3_4 the next two, and so on, and $HEXSIZE is the size of the file encoded in hexadecimal. Note that the probability of a SHA256 collision between two different files is already negligible, and I believe that requiring the size to match as well brings it even closer to zero.
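The layout above can be illustrated with a short sketch. The function name `relative_store_path` is hypothetical, and encoding the size as lowercase hexadecimal without prefix or padding is an assumption, since the text only says "encoded in hexadecimal".

```python
from pathlib import PurePosixPath


def relative_store_path(checksum: str, size: int) -> PurePosixPath:
    """Build CHAR1_2/CHAR3_4/CHAR5_6/CHECKSUM-HEXSIZE relative to the store root."""
    hex_size = format(size, "x")  # assumed: lowercase, no "0x" prefix, no padding
    return PurePosixPath(
        checksum[0:2],
        checksum[2:4],
        checksum[4:6],
        f"{checksum}-{hex_size}",
    )
```

The three levels of two-character directories spread files across many subdirectories, which keeps any single directory from growing very large.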

Edited by Raphaël Hertzog