Reproducible Builds / diffoscope, issue #278
Issue created Sep 26, 2021 by Sergei Trofimovich (@trofi, Contributor)

Add support for `.pyc` file introspection

In https://github.com/NixOS/nixpkgs/issues/139292 we encountered non-determinism in pytest-xdist files.

Running examples:

  • a.pyc
  • b.pyc

Unfortunately, diffoscope does not support diffing them and only shows a hex dump:

$ diffoscope a.pyc b.pyc
--- a.pyc
+++ b.pyc
@@ -1,8 +1,8 @@
-00000000: 610d 0d0a 0000 0000 bd10 3561 8937 0000  a.........5a.7..
+00000000: 610d 0d0a 0000 0000 ae81 4d61 8937 0000  a.........Ma.7..
 00000010: e300 0000 0000 0000 0000 0000 0000 0000  ................
 00000020: 0003 0000 0040 0000 0073 6800 0000 6400  .....@...sh...d.
 00000030: 6401 6c00 5a01 6400 6401 6c02 6d03 0200  d.l.Z.d.d.l.m...
 00000040: 0100 6d04 5a05 0100 6400 6402 6c06 6d07  ..m.Z...d.d.l.m.
 00000050: 5a07 0100 6400 6403 6c08 6d09 5a09 0100  Z...d.d.l.m.Z...
 00000060: 6400 6404 6c0a 6d0b 5a0b 0100 6400 6405  d.d.l.m.Z...d.d.
 00000070: 6c0c 6d0d 5a0d 0100 6400 6406 6c0e 6d0f  l.m.Z...d.d.l.m.

The .pyc file format depends on the Python version, but I think it would be good enough to use a simple differ built on the currently running Python.

Simple PoC:

  • dump-pyc.py is a simple dumper for .pyc files:
$ diff -u <(./dump-pyc.py a.pyc) <(./dump-pyc.py b.pyc)
--- /dev/fd/63  2021-09-27 00:05:02.413774549 +0100
+++ /dev/fd/62  2021-09-27 00:05:02.413774549 +0100
@@ -1,5 +1,5 @@
 magic b'610d0d0a'
-moddate b'bd103561' (Sun Sep  5 19:47:25 2021)
+moddate b'ae814d61' (Fri Sep 24 08:43:42 2021)
 files sz 14217
...
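
For readers without the attachment, here is a minimal sketch of how the header fields in the output above could be decoded. It is not the dump-pyc.py from this issue: `parse_pyc_header` is a hypothetical helper, and it assumes the 16-byte header layout used by Python 3.7 and later (magic, flags, then either mtime and source size or a source hash).

```python
import struct
from datetime import datetime, timezone


def parse_pyc_header(data: bytes) -> dict:
    """Decode the first 16 bytes of a .pyc file (hypothetical helper).

    Assumes the Python 3.7+ layout: 4-byte magic, 4-byte flags, then
    8 bytes that are either mtime + source size or a source hash.
    """
    if len(data) < 16:
        raise ValueError("too short for a 16-byte .pyc header")
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    fields = {"magic": magic.hex(), "flags": flags}
    if flags & 0x1:
        # Hash-based pyc: the remaining 8 bytes are a hash of the source.
        fields["source_hash"] = data[8:16].hex()
    else:
        # Timestamp-based pyc: little-endian mtime and source size.
        mtime, size = struct.unpack("<II", data[8:16])
        fields["mtime"] = mtime
        fields["mtime_utc"] = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
        fields["source_size"] = size
    return fields


# First 16 bytes of a.pyc and b.pyc, copied from the hex dump in this issue.
header_a = bytes.fromhex("610d0d0a00000000bd10356189370000")
header_b = bytes.fromhex("610d0d0a00000000ae814d6189370000")
print(parse_pyc_header(header_a))
print(parse_pyc_header(header_b))
```

With these inputs only the mtime field differs between the two headers, which matches the moddate lines in the diff above. The sketch prints UTC, so the times are one hour behind the local (+0100) times shown there.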

The .pyc file format changes slightly between Python versions, but in general it is a tiny header followed by serialized bytecode: https://github.com/python/cpython/blob/main/Lib/importlib/_bootstrap_external.py#L694-L701

def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
    "Produce the data for a timestamp-based pyc."
    data = bytearray(MAGIC_NUMBER)
    data.extend(_pack_uint32(0))
    data.extend(_pack_uint32(mtime))
    data.extend(_pack_uint32(source_size))
    data.extend(marshal.dumps(code))
    return data
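
To illustrate what such a differ might look like, here is a rough sketch that reverses the layout above: it splits off the header, unmarshals the code object, and diffs a text dump. The names `build_pyc`, `dump_pyc` and `diff_pyc` are made up for this example and are not diffoscope APIs. It also assumes the file was written by the same Python version that runs the differ, since `marshal` data is not portable across versions.

```python
import difflib
import dis
import importlib.util
import io
import marshal
import struct
from typing import List


def build_pyc(source: str, mtime: int) -> bytes:
    """Build a timestamp-based .pyc in memory, using the layout quoted above."""
    code = compile(source, "<example>", "exec")
    header = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, mtime, len(source))
    return header + marshal.dumps(code)


def dump_pyc(data: bytes) -> List[str]:
    """Turn a timestamp-based .pyc into text lines (header + disassembly)."""
    magic = data[:4]
    if magic != importlib.util.MAGIC_NUMBER:
        # marshal data is only safe to load with the Python that wrote it.
        raise ValueError("pyc was written by a different Python version")
    flags, mtime, size = struct.unpack("<III", data[4:16])
    code = marshal.loads(data[16:])
    out = io.StringIO()
    dis.dis(code, file=out)
    return [
        f"magic {magic.hex()}",
        f"flags {flags}",
        f"mtime {mtime}",
        f"source size {size}",
        *out.getvalue().splitlines(),
    ]


def diff_pyc(a: bytes, b: bytes) -> List[str]:
    """Unified diff of the text dumps of two .pyc blobs."""
    return list(
        difflib.unified_diff(dump_pyc(a), dump_pyc(b), "a.pyc", "b.pyc", lineterm="")
    )


source = "x = 1\nprint(x + 1)\n"
old = build_pyc(source, 1630867645)
new = build_pyc(source, 1632469422)
print("\n".join(diff_pyc(old, new)))
```

One caveat for a real implementation: when a module contains functions or classes, `dis` prints nested code objects with their memory addresses, so those would probably need to be normalized before diffing.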

WDYT of adding a simple differ for .pyc files?
