Confection: The sweetest config system for Python
Configuration is a huge challenge for machine-learning code because you may want to expose almost any detail of any function as a hyperparameter. The setting you want to expose might be arbitrarily far down in your call stack, so it might need to pass all the way through the CLI or REST API, through any number of intermediate functions, affecting the interface of everything along the way. And then once those settings are added, they become hard to remove later. Default values also become hard to change without breaking backwards compatibility.
To solve this problem, confection
offers a config system that lets you easily
describe arbitrary trees of objects. The objects can be created via function
calls you register using a simple decorator syntax. You can even version the
functions you create, allowing you to make improvements without breaking
backwards compatibility. The most similar config system we’re aware of is
Gin, which uses a similar syntax, and
also allows you to link the configuration system to functions in your code using
a decorator. confection
's config system is simpler and emphasizes a different
workflow via a subset of Gin’s functionality.
⏳ Installation
pip install confection
conda install -c conda-forge confection
👩‍💻 Usage
The configuration system parses a .cfg
file like
[training]
patience = 10
dropout = 0.2
use_vectors = false
[training.logging]
level = "INFO"
[nlp]
# This uses the value of training.use_vectors
use_vectors = ${training.use_vectors}
lang = "en"
and resolves it to a dictionary:
{
"training": {
"patience": 10,
"dropout": 0.2,
"use_vectors": false,
"logging": {
"level": "INFO"
}
},
"nlp": {
"use_vectors": false,
"lang": "en"
}
}
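The folding of dotted section names into nested dictionaries can be sketched with the standard library alone. This is a simplified illustration, not confection's actual implementation: values stay raw strings here, whereas confection also JSON-parses them.

```python
import configparser


def nest_sections(text: str) -> dict:
    # Parse INI text, then fold dotted section names such as
    # "training.logging" into nested dictionaries.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result: dict = {}
    for section in parser.sections():
        node = result
        for part in section.split("."):
            node = node.setdefault(part, {})
        node.update(dict(parser[section]))
    return result


cfg = """
[training]
patience = 10

[training.logging]
level = "INFO"
"""
nested = nest_sections(cfg)
# nested["training"]["logging"]["level"] == '"INFO"'
```

Note that confection would additionally strip the JSON quoting, turning '"INFO"' into the plain string INFO.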
The config is divided into sections, with the section name in square brackets –
for example, [training]
. Within the sections, config values can be assigned to
keys using =
. Values can also be referenced from other sections using the dot
notation and placeholders indicated by the dollar sign and curly braces. For
example, ${training.use_vectors}
will receive the value of use_vectors in the
training block. This is useful for settings that are shared across components.
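The substitution step can be sketched as follows. The resolve_placeholders helper is hypothetical (not part of confection's API) and works against a flattened {"section.key": value} mapping; the key detail it mirrors is that a value consisting of a single placeholder keeps the referenced value's type instead of being stringified.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(value: str, flat: dict):
    # If the entire value is one placeholder, return the referenced
    # value unchanged so booleans and numbers keep their type.
    match = PLACEHOLDER.fullmatch(value.strip())
    if match:
        return flat[match.group(1)]
    # Otherwise substitute placeholders textually inside the string.
    return PLACEHOLDER.sub(lambda m: str(flat[m.group(1)]), value)


flat = {"training.use_vectors": False}
resolve_placeholders("${training.use_vectors}", flat)  # -> False, not "False"
```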
The config format has three main differences from Python’s built-in
configparser:

- JSON-formatted values. confection passes all values through json.loads
  to interpret them. You can use atomic values like strings, floats,
  integers or booleans, or you can use complex objects such as lists or
  maps.
- Structured sections. confection uses a dot notation to build nested
  sections. If you have a section named [section.subsection], confection
  will parse that into a nested structure, placing subsection within
  section.
- References to registry functions. If a key starts with @, confection
  will interpret its value as the name of a function registry, load the
  function registered for that name and pass in the rest of the block as
  arguments. If type hints are available on the function, the argument
  values (and the return value of the function) will be validated against
  them. This lets you express complex configurations, like a training
  pipeline where batch_size is populated by a function that yields floats.
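The first of these differences - JSON-formatted values - can be sketched in a few lines. This is a simplification of confection's value handling: each value is tried as JSON first, and anything that fails to parse is kept as a plain string.

```python
import json


def parse_value(raw: str):
    # Try the value as JSON; fall back to the raw string for bare
    # tokens like INFO that are not valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


parse_value("10")         # -> 10 (int)
parse_value("false")      # -> False (bool)
parse_value("[1, 2, 3]")  # -> [1, 2, 3] (list)
parse_value("INFO")       # -> "INFO" (string fallback)
```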
There’s no pre-defined scheme you have to follow; how you set up the top-level sections is up to you. At the end of it, you’ll receive a dictionary with the values that you can use in your script – whether that’s fully initialized functions, or just basic settings.
For instance, let’s say you want to define a new optimizer. You'd define its
arguments in config.cfg
like so:
[optimizer]
@optimizers = "my_cool_optimizer.v1"
learn_rate = 0.001
gamma = 1e-8
To load and parse this configuration using a catalogue
registry (install
catalogue
separately):
import dataclasses
from typing import Union, Iterable

import catalogue
from confection import registry, Config

# Create a new registry.
registry.optimizers = catalogue.create("confection", "optimizers", entry_points=False)

# Define a dummy optimizer class.
@dataclasses.dataclass
class MyCoolOptimizer:
    learn_rate: float
    gamma: float

@registry.optimizers.register("my_cool_optimizer.v1")
def make_my_optimizer(learn_rate: Union[float, Iterable[float]], gamma: float):
    return MyCoolOptimizer(learn_rate, gamma)

# Load the config file from disk, resolve it and fetch the instantiated optimizer object.
config = Config().from_disk("./config.cfg")
resolved = registry.resolve(config)
optimizer = resolved["optimizer"]  # MyCoolOptimizer(learn_rate=0.001, gamma=1e-08)
⚠️ Caution: Type checkers such as mypy will flag adding new attributes to registry this way - i.e. registry.new_attr = ... - as an error, because the attribute is added to the class after initialization. If you are using a type checker, you can either ignore this (e.g. with # type: ignore for mypy) or use a type-safe alternative: instead of registry.new_attr = ..., use setattr(registry, "new_attr", ...).
Under the hood, confection
will look up the "my_cool_optimizer.v1"
function
in the "optimizers" registry and then call it with the arguments learn_rate
and gamma
. If the function has type annotations, it will also validate the
input. For instance, if learn_rate
is annotated as a float and the config
defines a string, confection
will raise an error.
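The validation step can be sketched like this. It is a rough approximation: confection actually delegates validation to pydantic, which also coerces compatible types, while this sketch only checks plain builtin types against the annotations. The validate_args helper and make_optimizer function are illustrative, not part of confection's API.

```python
import typing


def validate_args(func, kwargs: dict) -> None:
    # Compare each argument against the parameter's type hint,
    # handling only simple builtin types (no Union, no coercion).
    hints = typing.get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, got {type(value).__name__}"
            )


def make_optimizer(learn_rate: float, gamma: float):
    return {"learn_rate": learn_rate, "gamma": gamma}


validate_args(make_optimizer, {"learn_rate": 0.001, "gamma": 1e-8})  # passes
# validate_args(make_optimizer, {"learn_rate": "fast", "gamma": 1e-8})  # TypeError
```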
The Thinc documentation offers further information on the configuration system:
- recursive blocks
- defining variable positional arguments
- using interpolation
- using custom registries
- advanced type annotations with Pydantic
- using base schemas
- filling a configuration with defaults
🎛️ API
Config class

This class holds the model and training configuration and can load and save
the INI-style configuration format from/to a string, file or bytes. The
Config class is a subclass of dict and uses Python’s ConfigParser under the
hood.
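As a rough illustration of the INI round-trip that Config performs, here is the equivalent using only the standard library's ConfigParser. Config layers JSON values, nested sections and interpolation on top of this; the string round-trip itself is the same idea.

```python
import configparser
import io

# Write a section out to INI text, then read it back - mirroring the
# string round-trip that Config provides (minus confection's JSON
# handling and nested sections).
parser = configparser.ConfigParser(interpolation=None)
parser["training"] = {"patience": "10", "dropout": "0.2"}

buffer = io.StringIO()
parser.write(buffer)
text = buffer.getvalue()

reloaded = configparser.ConfigParser(interpolation=None)
reloaded.read_string(text)
reloaded["training"]["patience"]  # "10"
```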