PyCDDL: Deserialize CBOR and/or do CDDL schema validation
CDDL is a schema language for the CBOR serialization format.
pycddl allows you to:
- Validate that CBOR documents match a particular CDDL schema, based on the Rust cddl library.
- Optionally, decode CBOR documents.
Usage
Validation
Here we use the cbor2 library to serialize a dictionary to CBOR, and then validate it:
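```python
from pycddl import Schema
import cbor2

uint_schema = Schema("""
    object = {
        xint: uint
    }
""")

# {"xint": 2} is a dictionary (not a set) whose value is an unsigned integer,
# so it matches the schema and validate_cbor() returns without raising.
uint_schema.validate_cbor(cbor2.dumps({"xint": 2}))
```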
If validation fails, a pycddl.ValidationError is raised.
Validation + deserialization
You can deserialize CBOR to Python objects using cbor2.loads(). However:
- cbor2 uses C code by default, and the C programming language is prone to memory safety issues. If you are reading untrusted CBOR, it is better to use a Rust library to decode the data.
- You will need to parse the CBOR twice, once for validation and once for decoding, adding performance overhead.
By deserializing with pycddl, you solve the first problem, and a future version of pycddl will solve the second problem (see https://gitlab.com/tahoe-lafs/pycddl/-/issues/37).
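```python
from pycddl import Schema
import cbor2

uint_schema = Schema("""
    object = {
        xint: uint
    }
""")

# Passing True as the second argument makes validate_cbor() also return
# the decoded Python object.
deserialized = uint_schema.validate_cbor(cbor2.dumps({"xint": 2}), True)
assert deserialized == {"xint": 2}
```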
Deserializing without schema validation
If you don't care about schemas, you can just deserialize the CBOR like so:
```python
from pycddl import Schema

# "any" matches every CBOR data item, so no document is rejected for its shape.
ACCEPT_ANYTHING = Schema("main = any")

def loads(encoded_cbor_bytes):
    return ACCEPT_ANYTHING.validate_cbor(encoded_cbor_bytes, True)
```
In a future release this will become a standalone, more efficient API; see https://gitlab.com/tahoe-lafs/pycddl/-/issues/36.
Reducing memory usage and safety constraints
In order to reduce memory usage, you can pass in any Python object that implements the buffer API and stores bytes, e.g. a memoryview() or a mmap object.
The passed-in object must be read-only, and the data must not change during validation! If you mutate the data while validation is happening the result can be memory corruption or other undefined behavior.
Supported CBOR types for deserialization
If you are deserializing a CBOR document into Python objects, you can deserialize:
- Null/None.
- Booleans.
- Floats.
- Integers up to 64-bit size. Larger integers aren't supported yet.
- Bytes.
- Strings.
- Lists.
- Maps/dictionaries.
- Sets.
Other types will be added in the future if there is user demand.
Schema validation is not restricted to this list, but rather is limited by the functionality of the cddl Rust crate.
Release notes
0.6.3
Features:
- Support the final release of Python 3.13.