1. 21 May, 2018 1 commit
  2. 18 Mar, 2018 1 commit
  3. 22 Jan, 2017 2 commits
  4. 03 Jan, 2017 1 commit
  5. 28 Dec, 2016 3 commits
  6. 27 Dec, 2016 1 commit
  7. 18 Dec, 2016 1 commit
  8. 15 Aug, 2016 1 commit
  9. 30 May, 2016 1 commit
  10. 05 Dec, 2015 3 commits
    • Always compare container content if applicable · af6a0f09
      Jérémy Bobbio authored
      Now that we have a common interface for containers, we can compare their
      content as part of the default procedure. This saves having to do so
      in every compare_details() method.
    • Make container class available on demand instead of bracketing · c359af51
      Jérémy Bobbio authored
      To access the content of a container, we used to require a call to the open()
      method. Such a bracketed operation makes it difficult to implement parallel
      processing. Instead, we now make container objects associated with a given file
      type available through the as_container property.
      
      The as_container property will lazily initialize an associated container
      object. The reference to the container object will be removed when the
      cleanup() method is called. Resources can thus be deallocated on garbage
      collection.
      
      With this change, we can finally make Container.compare() return an iterator
      that can be lazily consumed instead of an evaluated list: the various files that
      need to be extracted to perform the comparison will be made available when
      required.
      
      File types which can be treated as containers must now define the
      CONTAINER_CLASS constant to point at a Container subclass handling this
      particular file type.
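The on-demand container access described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: only as_container, cleanup() and CONTAINER_CLASS come from the commit message, while TarContainer, TarFile, get_member_names() and the member names are hypothetical stand-ins.

```python
class Container:
    """Hypothetical base class: wraps a file that contains other files."""

    def __init__(self, source):
        self.source = source

    def get_member_names(self):
        raise NotImplementedError


class TarContainer(Container):
    def get_member_names(self):
        # A real implementation would list the archive; this is a stand-in.
        return ["./control", "./md5sums"]


class File:
    CONTAINER_CLASS = None  # plain files are not containers

    def __init__(self, name):
        self.name = name
        self._as_container = None

    @property
    def as_container(self):
        # Lazily build the container the first time it is requested.
        if self.CONTAINER_CLASS is None:
            return None
        if self._as_container is None:
            self._as_container = self.CONTAINER_CLASS(self)
        return self._as_container

    def cleanup(self):
        # Drop the reference so the container can be garbage collected.
        self._as_container = None


class TarFile(File):
    CONTAINER_CLASS = TarContainer
```

Because nothing has to be opened or closed around the access, callers can request the container from several places without coordinating a bracketed open() call.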
    • Use lazy extraction instead of explicit bracketing · 303aee94
      Jérémy Bobbio authored
      Previously, code requiring access to file content had to be explicitly
      bracketed using get_content() for files to be extracted and then deleted.
      Such construction is problematic for parallel processing as a file might be
      processed by multiple operations concurrently (e.g. multiple files being extracted
      from a unique archive at the same time).
      
      We thus remove the get_content() context and @needs_content decorator to
      prefer lazy path initialization: actual content will be made available through
      the path property. The extraction will happen then if necessary.
      
      The extracted file should normally be deleted when Python's garbage collector
      reclaims the object. As a safety net, we still have a global registry of all
      temporary files and directories and remove them on exit.
  11. 21 Sep, 2015 3 commits
    • Switch to Python 3 · 84a58ee4
      Jérémy Bobbio authored
      This is the “red pill” commit where we jump from Python 2.7 to Python 3.3+:
      
       * She-bang now calls /usr/bin/python3.
       * debian/control file is updated to depend on python3-* packages.
       * py.test-3 is called instead of pytest to run the test suite.
       * We no longer need to wrap sys.stdout as its output will be properly
         encoded depending on the system settings.
       * All “import __future__” statements are removed.
       * items() replaces viewitems() and keys() replaces viewkeys() when working
         on dicts.
         https://docs.python.org/3/whatsnew/3.0.html#views-and-iterators-instead-of-lists
       * We use the new metaclass syntax:
         https://www.python.org/dev/peps/pep-3115/
       * Msgunfmt.CHARSET_RE will work on raw bytes, so we need to add the 'b' modifier
         to the regex string.
       * str() replaces unicode() and chr() replaces unichr(): all strings are
         unicode strings now.
         https://docs.python.org/3/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
       * reduce() is no longer a built-in, so use functools.reduce() instead.
       * TarContainer.extract doesn't need to decode the returned path as all paths
         are now unicode strings.
       * We have to tweak the expected output when testing the RPM header conversion
         because repr() no longer includes a trailing L.
         https://docs.python.org/3/whatsnew/3.0.html#integers
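Several of the changes listed above can be illustrated in a few lines of Python 3. The data, the Registry and Comparator classes, and the regular expression are made up for the example; in particular the pattern is not the actual Msgunfmt.CHARSET_RE.

```python
import re
from functools import reduce  # no longer a built-in in Python 3

sizes = {"a.deb": 10, "b.deb": 32}

# items() and keys() return views in Python 3 (viewitems()/viewkeys() are gone).
common = sizes.keys() & {"a.deb", "c.deb"}
total = reduce(lambda acc, item: acc + item[1], sizes.items(), 0)

# Patterns applied to raw bytes need the b prefix.
# This pattern is illustrative only.
CHARSET_RE = re.compile(rb'charset=([^\\]+)')
match = CHARSET_RE.search(b'"Content-Type: text/plain; charset=UTF-8\\n"')
charset = match.group(1).decode("ascii")


# PEP 3115 metaclass syntax replaces the __metaclass__ attribute.
class Registry(type):
    known = []

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        Registry.known.append(name)


class Comparator(metaclass=Registry):
    pass


# repr() of a large integer no longer carries a trailing L.
big = repr(2 ** 64)
```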
    • Improve RPM header conversion · 7c645a66
      Jérémy Bobbio authored
      We now convert lists to have one item on each line, special case unicode
      strings to prepare for Python 3, and use repr to transform other types.
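The conversion rules could look roughly like this in Python 3, where the unicode special case becomes a check for str. The function name convert_header_value is hypothetical and the exact output layout of the real code is not given in the commit message.

```python
def convert_header_value(value):
    """Turn an RPM header value into text suitable for diffing."""
    if isinstance(value, list):
        # One item per line, so a changed item shows up as a one-line diff.
        return "\n".join(convert_header_value(item) for item in value)
    if isinstance(value, str):
        # Text is emitted as-is rather than through repr().
        return value
    # Everything else (integers, bytes, ...) falls back to repr().
    return repr(value)
```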
    • Differentiate text and bytes when computing diffs · 555298a4
      Jérémy Bobbio authored
      In some situations, we want to compare streams of already encoded UTF-8 bytes;
      in others, we compare streams of unicode strings. To make it clear, we split
      Difference.from_file into Difference.from_raw_readers and
      Difference.from_text_readers.
      
      We want to avoid an overhead when we use the unfiltered output of a command to
      feed diff directly. So we assume commands output UTF-8 encoded bytes, as it's
      true in most cases. This leaves it up to filters to similarly return UTF-8 bytes.
      They might have to decode their input from UTF-8 or other encoding if required.
      
      We also rename Difference.from_unicode into Difference.from_text to keep
      names in line.
      
      These changes are required for Python 3 which clearly separates unicode strings
      and bytes, but this should clearly help avoid encoding issues in the future.
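The split between raw and text readers might be sketched like this. The function names are hypothetical, and difflib (with diff_bytes, available from Python 3.5) stands in for the external diff command that the tool actually feeds; the point is only that the byte path does no decoding while the text path encodes to UTF-8 first.

```python
import difflib
import io


def diff_raw_readers(reader1, reader2):
    """Diff two binary streams assumed to already hold UTF-8 encoded lines."""
    lines1 = reader1.readlines()
    lines2 = reader2.readlines()
    diff = list(difflib.diff_bytes(difflib.unified_diff, lines1, lines2))
    return b"".join(diff) or None


def diff_text_readers(reader1, reader2):
    """Diff two text streams by encoding them to UTF-8 first."""
    def encode(reader):
        return io.BytesIO(reader.read().encode("utf-8"))
    return diff_raw_readers(encode(reader1), encode(reader2))
```

Unfiltered command output can go straight to the raw variant, which avoids a decode and re-encode round trip.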
  12. 18 Sep, 2015 1 commit
  13. 03 Sep, 2015 1 commit
    • Improve overloading of what gets compared in a container · e86d4449
      Jérémy Bobbio authored
      Most often, containers need to overload the set of files that gets
      compared and not the way comparisons are performed. So we introduce
      a new method get_members that returns a dictionary of names and
      members. The names will be used to match which file should actually be
      compared.
      
      We also split the compare method using a 'comparisons' generator.
      This should also allow other less conventional extensions in the future.
      
      This reduces some code duplication (especially visible in .changes)
      and makes the situation for .gzip, .bz2, and .xz more straightforward.
      
      We also take the opportunity to remove the useless 'source' argument
      in Container.compare which was unnecessarily complicating the code.
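A minimal sketch of the idea, assuming members can be modelled as plain values in a dictionary. Only get_members, comparisons and Container.compare are named in the commit message; GzipContainer, the "content" key and the return values are illustrative.

```python
class Container:
    def __init__(self, members):
        self._members = members  # hypothetical: name -> content

    def get_members(self):
        """Return a dictionary mapping names to members."""
        return dict(self._members)

    def comparisons(self, other):
        """Yield (name, mine, theirs) for names present on both sides."""
        mine, theirs = self.get_members(), other.get_members()
        for name in sorted(mine.keys() & theirs.keys()):
            yield name, mine[name], theirs[name]

    def compare(self, other):
        # Stand-in for the real comparison: report names whose members differ.
        return [name for name, a, b in self.comparisons(other) if a != b]


class GzipContainer(Container):
    def get_members(self):
        # A compressed file has a single member; use a fixed name so that
        # foo.gz and bar.gz still get their content matched.
        (content,) = self._members.values()
        return {"content": content}
```

A subclass that only needs to change which files are paired overrides get_members and inherits the rest.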
  14. 20 Aug, 2015 1 commit
  15. 03 Aug, 2015 1 commit
    • Rename to diffoscope · 98ff014e
      Jérémy Bobbio authored
      debbindiff has grown way beyond being just a tool to compare Debian packages.
      Let's rename it to better reflect this state of things.
      
      Kudos to Jocelyn Delalande for the name “diffoscope”!
      
      We introduce a new transitional binary package to keep too many things from
      breaking.
  16. 29 Jul, 2015 1 commit
    • Massive rearchitecturing: make each file type have their own class · 5c02e000
      Jérémy Bobbio authored
      A good amount of the code for comparators is now based on classes
      instead of methods. Each file type gets its own class.
      
      The base class, File, is an abstract class that can represent files
      on the filesystem but also files that can be extracted from an archive.
      This design makes room for future implementation of fuzzy-matching.
      
      Each file type class implements a class method recognizes() that will
      receive an unspecialized File instance. This is way more flexible than
      the old constrained regex table approach. The new identification method
      used for Haskell interfaces is a good illustration. Appropriate caching
      for calls to libmagic methods is there as they are still frequently used
      and tend to be rather slow.
      
      An unspecialized File object will then be typecast into the class that
      recognized it. If that does not happen, binary comparison is implemented
      by the File class.
      
      Instead of redefining the compare() method which returns a single
      Difference or None, file type classes can implement compare_details()
      which returns an array of “inside” differences. An empty array means no
      differences were found.
      
      This new approach makes room to handle special file types better. As an
      example, device files can now be compared directly as their extraction
      from archives is problematic without root access.
      
      To reduce a good amount of boilerplate code, the Container class and its
      subclass Archive have been introduced to represent anything that
      “contains” more files to be compared. While the API might still be
      improved, this already helped a good amount of code become more
      consistent. This will also make it pretty straightforward to implement
      parallel processing in the near future.
      
      Some archive formats (at least cpio and iso9660) were pretty annoying
      to work with. To get rid of some painful code, we now use
      libarchive—through the ctypes based wrapper libarchive-c—to handle these
      archives in a generic manner. One downside is that libarchive is very
      stream-oriented which is not really suited to our random-access model.
      We'll see how this impacts performance in the future.
      
      Other less crucial changes:
      
       - `find` is now used to compare directory listings.
       - The fallback code in case the `rpm` module cannot be found has been
         isolated to a `comparators.rpm_fallback` module.
       - Symlinks and devices are now compared in a consistent manner.
       - `md5sums` files in Debian packages are now only recognized when
         they are part of a Debian package.
       - Files in squashfs are now extracted one by one.
       - Text files with different encodings can be compared and this difference
         is recorded as well.
       - Test coverage is now at 92% for comparators.
      
      Sincere apologies for this unreviewable commit.
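The recognition and specialization flow described in this commit can be sketched as below. File, recognizes(), compare() and compare_details() are named in the message; TextFile, FILE_CLASSES, specialize() and the dictionary returned in place of a Difference object are assumptions made for the example.

```python
class File:
    """Unspecialized file; provides the binary comparison fallback."""

    def __init__(self, name, content):
        self.name = name
        self.content = content

    @classmethod
    def recognizes(cls, file):
        return False

    def compare_details(self, other):
        return []  # no "inside" differences known for a generic file

    def compare(self, other):
        if self.content == other.content:
            return None
        details = self.compare_details(other)
        return {"source": self.name,
                "details": details or ["binary content differs"]}


class TextFile(File):
    @classmethod
    def recognizes(cls, file):
        try:
            file.content.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True

    def compare_details(self, other):
        mine = self.content.decode("utf-8").splitlines()
        theirs = other.content.decode("utf-8").splitlines()
        return [f"line {i} differs"
                for i, (a, b) in enumerate(zip(mine, theirs), 1) if a != b]


FILE_CLASSES = [TextFile]


def specialize(file):
    for cls in FILE_CLASSES:
        if cls.recognizes(file):
            file.__class__ = cls  # "typecast" the instance in place
            break
    return file
```

Since recognizes() receives the whole File object, a file type can base its decision on content, name, or anything else, which a fixed regex table could not express.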
  17. 27 Jun, 2015 1 commit
    • Comparators now return a single Difference · 2d362505
      Jérémy Bobbio authored
      The forest approach was often clumsy and ill-specified. Now
      comparators are expected to return a single Difference, or None.
      
      To make it easy for comparators that produce details, a new decorator
      `returns_details` will create a wrapping Difference object for free. This
      was previously a side-effect of using the `binary_fallback` decorator.
      This new decorator will filter None from the list of differences,
      removing some boilerplate from the comparators.
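One possible shape for such a decorator, as a sketch only: the Difference class here is a bare stand-in with assumed attributes, and compare_archives is a hypothetical comparator.

```python
import functools


class Difference:
    def __init__(self, source1, source2, details=()):
        self.source1, self.source2 = source1, source2
        self.details = list(details)


def returns_details(func):
    """Wrap the list returned by a comparator into a single Difference."""
    @functools.wraps(func)
    def wrapper(path1, path2):
        # Filter out None so comparators do not have to.
        details = [d for d in func(path1, path2) if d is not None]
        if not details:
            return None
        return Difference(path1, path2, details)
    return wrapper


@returns_details
def compare_archives(path1, path2):
    # Hypothetical comparator: each step yields a Difference or None.
    return [None, Difference("file list", "file list"), None]
```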
  18. 24 Jun, 2015 1 commit
  19. 31 Mar, 2015 1 commit
  20. 30 Mar, 2015 1 commit
    • Refactor Difference constructor · e9d72ec4
      Jérémy Bobbio authored
      Difference() now takes a unified diff directly. Computing the diff is
      moved to a new static method from_content() which returns None when there are
      no differences.
      
      This paves the way for passing file descriptors to from_content() to avoid
      loading entire outputs in memory.
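The constructor split might look like the following sketch. The argument names are assumptions, and difflib replaces the external diff command purely to keep the example self-contained.

```python
import difflib


class Difference:
    def __init__(self, unified_diff, source1, source2):
        # The constructor only stores an already computed diff.
        self.unified_diff = unified_diff
        self.source1 = source1
        self.source2 = source2

    @staticmethod
    def from_content(content1, content2, source1, source2):
        """Compute the diff; return None when the contents are identical."""
        if content1 == content2:
            return None
        diff = "".join(difflib.unified_diff(
            content1.splitlines(keepends=True),
            content2.splitlines(keepends=True)))
        return Difference(diff, source1, source2)
```

Callers can then write `difference = Difference.from_content(...)` and simply test the result against None.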
  21. 27 Mar, 2015 1 commit
    • Perform content comparison when creating Difference objects · 3d5f0d7b
      Jérémy Bobbio authored
      Instead of storing the full content twice when creating Difference objects, we
      now directly run `diff` and store the unified diff. Large blocks in the diff
      are still trimmed. This results in huge memory savings and debbindiff can now
      happily compare changes for installation-guide.
      
      As tool_required() is not only for comparators anymore, we move it to
      debbindiff, together with logger.
      
      Text output becomes really straightforward as we just have to write what
      we've previously recorded.
      
      For the HTML output, we stop using vim and instead borrow code from
      diff2html.py found at <http://git.droids-corp.org/?p=diff2html.git>.
      
      Closes: #772029
      Closes: #779476
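The commit message does not say how large blocks are trimmed. One way to do it, shown here purely as an assumption, is to cap the number of consecutive added or removed lines in a hunk body and leave a marker behind; trim_large_blocks, max_block and the marker text are all hypothetical.

```python
def trim_large_blocks(diff_lines, max_block=3):
    """Keep at most max_block consecutive lines sharing a +/- prefix."""
    trimmed = []
    prefix, run, skipped = None, 0, 0

    def flush():
        nonlocal skipped
        if skipped:
            trimmed.append(f"[ {skipped} lines removed ]")
            skipped = 0

    for line in diff_lines:
        current = line[0] if line[:1] in ("+", "-") else None
        if current is not None and current == prefix:
            run += 1
        else:
            flush()  # a block just ended: record how much was dropped
            prefix, run = current, 1
        if current is not None and run > max_block:
            skipped += 1
        else:
            trimmed.append(line)
    flush()
    return trimmed
```

Storing only the trimmed unified diff instead of both full contents is what keeps memory use bounded for very large inputs.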
  22. 18 Mar, 2015 1 commit
    • Improve optional usage of external commands · ce420f6f
      Jérémy Bobbio authored
      The `tool_required` decorator now raises an exception when
      the command cannot be found. This enables more flexible handling.
      The associated Debian package is now suggested in the comment.
      
      The list of external tools is now available through the `--list-tools`
      command-line option. We also use this output to generate the Recommends field.
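A decorator with this behaviour could be sketched as follows. Only tool_required and the --list-tools option are named in the commit message; RequiredToolNotFound, REGISTERED_TOOLS and the use of shutil.which are assumptions for illustration.

```python
import functools
import shutil


class RequiredToolNotFound(Exception):
    def __init__(self, command):
        super().__init__(f"{command} not found")
        self.command = command


REGISTERED_TOOLS = set()  # what a --list-tools option could print


def tool_required(command):
    def decorator(func):
        REGISTERED_TOOLS.add(command)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check at call time so a missing tool only affects this function.
            if shutil.which(command) is None:
                raise RequiredToolNotFound(command)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Raising an exception instead of silently skipping lets the caller decide what to do, for example fall back to a binary comparison and mention the missing tool in a comment.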
  23. 06 Mar, 2015 1 commit
  24. 06 Feb, 2015 1 commit
  25. 05 Feb, 2015 1 commit