
Add support for `.pyc` file introspection

In https://github.com/NixOS/nixpkgs/issues/139292 we encountered non-determinism in pytest-xdist's `.pyc` files.

Running examples:

Unfortunately, diffoscope does not support diffing them and falls back to a hex dump:

$ diffoscope a.pyc b.pyc
--- a.pyc
+++ b.pyc
@@ -1,8 +1,8 @@
-00000000: 610d 0d0a 0000 0000 bd10 3561 8937 0000  a.........5a.7..
+00000000: 610d 0d0a 0000 0000 ae81 4d61 8937 0000  a.........Ma.7..
 00000010: e300 0000 0000 0000 0000 0000 0000 0000  ................
 00000020: 0003 0000 0040 0000 0073 6800 0000 6400  .....@...sh...d.
 00000030: 6401 6c00 5a01 6400 6401 6c02 6d03 0200  d.l.Z.d.d.l.m...
 00000040: 0100 6d04 5a05 0100 6400 6402 6c06 6d07  ..m.Z...d.d.l.m.
 00000050: 5a07 0100 6400 6403 6c08 6d09 5a09 0100  Z...d.d.l.m.Z...
 00000060: 6400 6404 6c0a 6d0b 5a0b 0100 6400 6405  d.d.l.m.Z...d.d.
 00000070: 6c0c 6d0d 5a0d 0100 6400 6406 6c0e 6d0f  l.m.Z...d.d.l.m.

The `.pyc` file format depends on the Python version, but I think it would not be too bad to use a simple differ based on the currently running Python.

Simple PoC:

$ diff -u <(./dump-pyc.py a.pyc) <(./dump-pyc.py b.pyc)
--- /dev/fd/63  2021-09-27 00:05:02.413774549 +0100
+++ /dev/fd/62  2021-09-27 00:05:02.413774549 +0100
@@ -1,5 +1,5 @@
 magic b'610d0d0a'
-moddate b'bd103561' (Sun Sep  5 19:47:25 2021)
+moddate b'ae814d61' (Fri Sep 24 08:43:42 2021)
 files sz 14217
...
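As a sanity check, the differing bytes in the hex dump can be decoded by hand. The sketch below assumes the Python 3.7+ header layout (magic, flags, mtime, source size, each a little-endian unsigned 32-bit word). The byte strings are copied from the diffoscope output above; `decode_header` is a hypothetical helper, not part of dump-pyc.py.

```python
import struct
from datetime import datetime, timezone

# First 16 bytes of each file, copied from the diffoscope hex dump above:
# magic (4) | flags (4) | mtime (4) | source size (4)
HEADER_A = bytes.fromhex("610d0d0a" "00000000" "bd103561" "89370000")
HEADER_B = bytes.fromhex("610d0d0a" "00000000" "ae814d61" "89370000")


def decode_header(header: bytes) -> tuple:
    """Return (flags, mtime, source_size) from a 16-byte timestamp-based header."""
    # "<4x3I": skip the 4-byte magic, then read three little-endian uint32 fields.
    return struct.unpack("<4x3I", header)


for name, header in (("a.pyc", HEADER_A), ("b.pyc", HEADER_B)):
    flags, mtime, size = decode_header(header)
    when = datetime.fromtimestamp(mtime, tz=timezone.utc)
    print(f"{name}: flags={flags} mtime={mtime} ({when:%Y-%m-%d %H:%M:%S} UTC) size={size}")
```

This yields mtime 1630867645 (2021-09-05 18:47:25 UTC) for `a.pyc` and 1632469422 (2021-09-24 07:43:42 UTC) for `b.pyc`, with the same source size of 14217 bytes. The one-hour offset from the PoC output is presumably the local timezone (the diff headers show +0100), so the recorded source mtime is the only header difference.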

The `.pyc` format changes slightly between Python versions, but generally it is a tiny header followed by a marshal-serialized code object: https://github.com/python/cpython/blob/main/Lib/importlib/_bootstrap_external.py#L694-L701

def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
    "Produce the data for a timestamp-based pyc."
    data = bytearray(MAGIC_NUMBER)
    data.extend(_pack_uint32(0))
    data.extend(_pack_uint32(mtime))
    data.extend(_pack_uint32(source_size))
    data.extend(marshal.dumps(code))
    return data
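Since Python 3.7 (PEP 552), the word after the magic number is a flags field. If bit 0 is clear, the layout is the timestamp-based one shown above. If bit 0 is set, the pyc is hash-based: the next 8 bytes hold a hash of the source, and bit 1 (`check_source`) says whether the import system validates it. Below is a minimal header parser sketch under those assumptions. `PycHeader` and `parse_pyc_header` are hypothetical names, and the sketch does not handle the 12-byte headers of Python 3.3 to 3.6 or the 8-byte ones before that.

```python
import struct
from dataclasses import dataclass
from typing import Optional

HEADER_SIZE = 16  # PEP 552 layout, Python 3.7+


@dataclass
class PycHeader:
    magic: bytes
    flags: int
    mtime: Optional[int] = None          # timestamp-based pycs only
    source_size: Optional[int] = None    # timestamp-based pycs only
    source_hash: Optional[bytes] = None  # hash-based pycs only

    @property
    def hash_based(self) -> bool:
        return bool(self.flags & 0b01)

    @property
    def check_source(self) -> bool:
        return bool(self.flags & 0b10)


def parse_pyc_header(data: bytes) -> PycHeader:
    """Split a Python 3.7+ .pyc header into its (little-endian) fields."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    if flags & 0b01:
        # Hash-based pyc: 8 bytes of source hash instead of mtime + size.
        return PycHeader(magic, flags, source_hash=data[8:16])
    mtime, source_size = struct.unpack("<II", data[8:16])
    return PycHeader(magic, flags, mtime=mtime, source_size=source_size)
```

Everything after byte 16 is the `marshal.dumps(code)` payload. That payload can only be loaded safely with `marshal.loads` on an interpreter whose `importlib.util.MAGIC_NUMBER` equals the file's magic.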

WDYT of adding a simple differ for `.pyc` files?
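A minimal standalone sketch of such a differ follows. It assumes the pycs were written by the running interpreter and uses the Python 3.7+ header. It prints the header fields and, when the magic number matches `importlib.util.MAGIC_NUMBER`, unmarshals the code object and disassembles it recursively with `dis`. The two dumps are then compared with `difflib.unified_diff`. `dump_pyc` and `diff_pyc` are hypothetical names; this is not diffoscope's comparator API, and the output format differs from the PoC above.

```python
import dis
import difflib
import importlib.util
import marshal
import struct
import sys
import types
from datetime import datetime, timezone


def _dump_code(code: types.CodeType, out: list) -> None:
    """Append a disassembly of `code` and, recursively, of nested code objects."""
    out.append(f"code {code.co_name} file={code.co_filename!r} line={code.co_firstlineno}")
    for ins in dis.get_instructions(code):
        arg = ins.argval
        # repr() of a code object embeds its memory address, which would show
        # up as a spurious difference, so print only its name.
        shown = f"<code {arg.co_name}>" if isinstance(arg, types.CodeType) else ins.argrepr
        out.append(f"  {ins.offset:>5} {ins.opname:<24} {shown}")
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            _dump_code(const, out)


def dump_pyc(data: bytes) -> list:
    """Render a Python 3.7+ .pyc as text lines suitable for diffing."""
    magic = data[:4]
    flags, word2, word3 = struct.unpack("<3I", data[4:16])
    out = [f"magic {magic.hex()}", f"flags {flags}"]
    if flags & 1:
        out.append(f"source hash {data[8:16].hex()}")
    else:
        when = datetime.fromtimestamp(word2, tz=timezone.utc)
        out.append(f"mtime {word2} ({when:%Y-%m-%d %H:%M:%S} UTC)")
        out.append(f"source size {word3}")
    if magic != importlib.util.MAGIC_NUMBER:
        # marshal format and bytecode are version-specific: show the header only.
        out.append("(bytecode from a different Python version; not disassembled)")
    else:
        _dump_code(marshal.loads(data[16:]), out)
    return out


def diff_pyc(path_a: str, path_b: str) -> str:
    """Unified diff of the textual dumps of two .pyc files."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = dump_pyc(fa.read()), dump_pyc(fb.read())
    return "\n".join(difflib.unified_diff(a, b, path_a, path_b, lineterm=""))


if __name__ == "__main__":
    print(diff_pyc(sys.argv[1], sys.argv[2]))
```

Run as `python diff_pyc.py a.pyc b.pyc`. For two pycs that differ only in the recorded mtime, the diff reduces to the `mtime` line, while the disassembly lines match. Falling back to the header for foreign magic numbers keeps the tool usable when the pyc was produced by a different Python than the one running it.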
