......@@ -3,6 +3,7 @@
/.pybuild
/*.eggs
/.cache
/.pytest_cache
/diffoscope.egg-info/
/dist/
/doc/diffoscope.1
......
before_script:
- apt-get -q update
- mount -o remount,rw /dev
- env DEBIAN_FRONTEND=noninteractive apt-get -q -y install --no-install-recommends aspcud apt-cudf
- env DEBIAN_FRONTEND=noninteractive apt-get -q -y --solver aspcud -o APT::Solver::Strict-Pinning=0 -o Debug::pkgProblemResolver=yes build-dep .
.test_template: &test
script:
- py.test-3 -vv -l -r a --cov=diffoscope --cov-report=term-missing
unstable:
<<: *test
image: debian:unstable
testing:
<<: *test
image: debian:testing
stable-bpo:
<<: *test
image: debian:stable-backports
ubuntu-devel:
<<: *test
image: ubuntu:devel
......@@ -2,44 +2,38 @@ Contributing
============
The preferred way to report bugs about diffoscope, as well as suggest fixes and
requests for improvements, is to submit reports to the Debian bug tracker for
the ``diffoscope`` package. You can do this over e-mail, simply write an email
as follows:
::
To: submit@bugs.debian.org
Subject: <subject>
Source: diffoscope
Version: <version>
Severity: <grave|serious|important|normal|minor|wishlist>
There are `more detailed instructions available
<https://www.debian.org/Bugs/Reporting>`__ about reporting a bug in the Debian bug tracker.
If you're on a Debian-based system, you can install and use the ``reportbug``
package to help walk you through the process.
You can also submit patches to the Debian bug tracker. Start by cloning the `Git
repository <https://salsa.debian.org/reproducible-builds/diffoscope.git/>`__,
make your changes and commit them as you normally would. You can then use
Git's ``format-patch`` command to save your changes as a series of patches that
can be attached to the report you submit. For example:
::
git clone https://salsa.debian.org/reproducible-builds/diffoscope.git
cd diffoscope
git checkout origin/master -b <topicname>
# <edits>
git commit -a
git format-patch -M origin/master
The ``format-patch`` command will create a series of ``.patch`` files in your
checkout. Attach these files to your submission in your e-mail client or
reportbug.
requests for improvements, is to submit reports to the issue tracker at
https://salsa.debian.org/reproducible-builds/diffoscope/issues
You can also submit patches via *merge request* to Salsa, Debian's GitLab. Start
by forking the `diffoscope Git
repository <https://salsa.debian.org/reproducible-builds/diffoscope>`__
(see
`documentation <https://salsa.debian.org/help/gitlab-basics/fork-project.md>`__),
make your changes and commit them as you normally would. You can then push your
changes and submit a *merge request* via Salsa. See the `GitLab documentation
<https://salsa.debian.org/help/gitlab-basics/add-merge-request.md>`__ about
*merge requests*.
You can also submit bugs about Debian-specific issues to the Debian bug tracker.
Add a comparator
================
Diffoscope doesn't support a specific file type? Please contribute to the
project! Each file type is handled by a comparator, and writing a new one is
usually very easy.
Here are the steps to add a new comparator:
- Add the new comparator in ``diffoscope/comparators/`` (have a look at the
other comparators in the same directory to have an idea of what to do)
- Declare the comparator File class in ``ComparatorManager`` in
``diffoscope/comparators/__init__.py``
- Add a test in ``tests/comparators/``
- If required, update the ``Build-Depends`` list in ``debian/control``
- If required, update the ``EXTERNAL_TOOLS`` list in
``diffoscope/external_tools.py``
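The general shape of a comparator can be sketched as follows. This is a minimal, self-contained illustration, not diffoscope's actual code: the ``File`` class here is a hypothetical stand-in for the real base class, ``ExampleFile`` and the "Example data" format are invented, and a real comparator would return ``Difference`` objects rather than plain strings. Only the attribute names ``DESCRIPTION`` and ``FILE_TYPE_RE`` and the ``compare_details`` method are taken from existing comparators.

```python
import re


class File:
    """Hypothetical stand-in for diffoscope's File base class."""

    FILE_TYPE_RE = None

    def __init__(self, path, magic_file_type):
        self.path = path
        self.magic_file_type = magic_file_type

    @classmethod
    def recognizes(cls, file):
        # Match against the description that file(1) reports for the file.
        return bool(cls.FILE_TYPE_RE.search(file.magic_file_type))


class ExampleFile(File):
    """Hypothetical comparator for an imaginary 'Example data' format."""

    DESCRIPTION = "Example data files"
    FILE_TYPE_RE = re.compile(r'^Example data\b')

    def compare_details(self, other, source=None):
        # A real comparator would build Difference objects here, typically
        # from the output of an external tool; this sketch compares bytes.
        with open(self.path, 'rb') as a, open(other.path, 'rb') as b:
            if a.read() == b.read():
                return []
        return ["{} and {} differ".format(self.path, other.path)]
```

Keeping recognition declarative (a regular expression on the detected file type) means most new comparators only need to supply ``compare_details``.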
Uploading the package
=====================
......
......@@ -2,3 +2,4 @@ debian/diffoscope.1
debian/diffoscope.bash-completion
diffoscope.egg-info/
.cache/
.pytest_cache/
......@@ -10,13 +10,13 @@ Uploaders:
Ximin Luo <infinity0@debian.org>,
Build-Depends:
abootimg <!nocheck>,
apktool <!nocheck>,
apktool [!ppc64el !s390x] <!nocheck>,
bash-completion,
binutils-multiarch <!nocheck>,
caca-utils <!nocheck>,
colord <!nocheck>,
db-util <!nocheck>,
debhelper (>= 11~),
debhelper-compat (= 11),
default-jdk-headless <!nocheck> | default-jdk <!nocheck>,
device-tree-compiler (>= 1.4.2) <!nocheck>,
dh-python (>= 2.20160818~),
......@@ -26,7 +26,7 @@ Build-Depends:
enjarify <!nocheck>,
flake8 <!nocheck>,
fontforge-extras <!nocheck>,
fp-utils <!nocheck>,
fp-utils [!ppc64el !s390x] <!nocheck>,
ghc <!nocheck>,
ghostscript <!nocheck>,
giflib-tools <!nocheck>,
......@@ -40,14 +40,18 @@ Build-Depends:
libjs-jquery-isonscreen <!nocheck>,
libjs-jquery-tablesorter <!nocheck>,
libjs-jquery-throttle-debounce <!nocheck>,
linux-image-amd64 [amd64] <!nocheck> | linux-image-generic [amd64] <!nocheck>,
llvm <!nocheck>,
lz4 <!nocheck> | liblz4-tool <!nocheck>,
mono-utils <!nocheck>,
mplayer <!nocheck>,
ocaml-nox <!nocheck>,
odt2txt <!nocheck>,
oggvideotools <!nocheck>,
# oggvideotools [!s390x] <!nocheck>,
openssh-client <!nocheck>,
pdftk <!nocheck>,
pgpdump <!nocheck>,
poppler-utils <!nocheck>,
# procyon-decompiler <!nocheck>,
python-argcomplete,
python3-all,
python3-binwalk <!nocheck>,
......@@ -59,6 +63,7 @@ Build-Depends:
python3-libarchive-c,
python3-magic,
python3-progressbar <!nocheck>,
python3-pypdf2 <!nocheck>,
python3-pytest <!nocheck>,
python3-pytest-cov <!nocheck>,
python3-pyxattr <!nocheck>,
......@@ -74,7 +79,8 @@ Build-Depends:
unzip <!nocheck>,
xmlbeans <!nocheck>,
xxd <!nocheck> | vim-common <!nocheck>,
Standards-Version: 4.1.4
Build-Conflicts: graphicsmagick-imagemagick-compat
Standards-Version: 4.3.0
Rules-Requires-Root: no
Homepage: https://diffoscope.org
Vcs-Git: https://salsa.debian.org/reproducible-builds/diffoscope.git
......@@ -84,10 +90,6 @@ Package: diffoscope
Architecture: all
Suggests:
libjs-jquery,
Breaks:
debbindiff (<< 29),
Replaces:
debbindiff (<< 29),
Depends:
python3-distutils | libpython3.5-stdlib | libpython3.6-stdlib (<< 3.6.5~rc1-2),
python3-pkg-resources,
......
......@@ -44,24 +44,11 @@ override_dh_auto_build: debian/diffoscope.bash-completion
dh_auto_build -O--buildsystem=pybuild
dh_auto_build -O--buildsystem=makefile -Ddoc
override_dh_auto_clean:
dh_auto_clean -O--buildsystem=pybuild
dh_auto_clean -O--buildsystem=makefile -Ddoc
find -type d -name '__pycache__' -empty -delete
override_dh_python3:
dh_python3 -p diffoscope \
--depends=distro \
--recommends=argcomplete \
--recommends=binwalk \
--recommends=defusedxml \
--recommends=guestfs \
--recommends=jsondiff \
--recommends=progressbar \
--recommends=python-debian \
--recommends=pyxattr \
--recommends=rpm-python \
--recommends=tlsh \
--depends-section=distro_detection \
--recommends-section=cmdline \
--recommends-section=comparators \
override_dh_gencontrol:
bin/diffoscope --list-debian-substvars >> debian/diffoscope.substvars
......@@ -76,3 +63,35 @@ diffoscope/presenters/icon.py: favicon.png
favicon.png: logo.svg
inkscape -w 32 -h 32 -e $@ $<
override_dh_auto_clean:
@echo "Generating the debian/tests/control file..."
@echo "# DON'T MANUALLY MODIFY!" > debian/tests/control.tmp
@echo "# EDIT debian/tests/control.in INSTEAD!" >> debian/tests/control.tmp
@echo "#" >> debian/tests/control.tmp
@cat debian/tests/control.in >> debian/tests/control.tmp
@sed -i "s#%RECOMMENDS%#$(shell bin/diffoscope --list-debian-substvars | cut -d= -f2)#" debian/tests/control.tmp
@sed -i "s#%PYRECOMMENDS%#$(shell python3 -c "import distutils.core; \
setup = distutils.core.run_setup('setup.py'); \
print(', '.join(sorted(['python3-{}'.format(x) for y in setup.extras_require.values() for x in y])))" \
)#" debian/tests/control.tmp
@sed -i "s,python3-python-debian,python3-debian," debian/tests/control.tmp
@sed -i "s,python3-rpm-python,python3-rpm," debian/tests/control.tmp
@sed -i "s,apktool,apktool [!ppc64el !s390x]," debian/tests/control.tmp
@sed -i "s,fp-utils,fp-utils [!ppc64el !s390x]," debian/tests/control.tmp
#@sed -i "s,oggvideotools,oggvideotools [!s390x]," debian/tests/control.tmp
@sed -i "s/oggvideotools, //" debian/tests/control.tmp
@sed -i "s/procyon-decompiler, //" debian/tests/control.tmp
@set -e ; if ! diff -q debian/tests/control debian/tests/control.tmp ; then \
echo ;\
echo "The generated control file differs from the actual one." ;\
echo "A sourceful upload of this package is needed." ;\
echo ;\
echo "Differences:" ;\
diff -u debian/tests/control debian/tests/control.tmp ;\
else \
rm debian/tests/control.tmp ;\
fi
dh_auto_clean -O--buildsystem=pybuild
dh_auto_clean -O--buildsystem=makefile -Ddoc
find -type d -name '__pycache__' -empty -delete
# This will mainly be used to double check that what we upload to Debian
# is also in our "archive".
debian-watch-file-in-native-package
# We like to share our own keys for others to check the released signature.
public-upstream-key-in-native-package
Tests: pytest
Depends: diffoscope, python3-pytest
Restrictions: needs-recommends
# DON'T MANUALLY MODIFY!
# EDIT debian/tests/control.in INSTEAD!
#
# To regenerate:
#
# $ debian/rules clean
# $ mv debian/tests/control.tmp debian/tests/control
Tests: pytest-with-recommends
Depends: diffoscope, python3-pytest, file, linux-image-amd64 [amd64] | linux-image-generic [amd64], abootimg, acl, apktool [!ppc64el !s390x], binutils-multiarch, bzip2, caca-utils, colord, db-util, default-jdk-headless | default-jdk | java-sdk, device-tree-compiler, docx2txt, e2fsprogs, enjarify, ffmpeg, fontforge-extras, fp-utils [!ppc64el !s390x], genisoimage, gettext, ghc, ghostscript, giflib-tools, gnumeric, gnupg, imagemagick, jsbeautifier, libarchive-tools, llvm, lz4 | liblz4-tool, mono-utils, ocaml-nox, odt2txt, openssh-client, pgpdump, poppler-utils, r-base-core, rpm2cpio, sng, sqlite3, squashfs-tools, tcpdump, unzip, xmlbeans, xxd | vim-common, xz-utils, zip, python3-argcomplete, python3-binwalk, python3-defusedxml, python3-distro, python3-guestfs, python3-jsondiff, python3-progressbar, python3-pypdf2, python3-debian, python3-pyxattr, python3-rpm, python3-tlsh
Test-Command: debian/tests/pytest
Depends: diffoscope, python3-pytest
# but without Recommends
Tests: pytest
Depends: diffoscope, python3-pytest, file
Tests: basic-command-line
Depends: diffoscope
......
# To regenerate:
#
# $ debian/rules clean
# $ mv debian/tests/control.tmp debian/tests/control
Tests: pytest-with-recommends
Depends: diffoscope, python3-pytest, file, linux-image-amd64 [amd64] | linux-image-generic [amd64], %RECOMMENDS%, %PYRECOMMENDS%
Tests: pytest
Depends: diffoscope, python3-pytest, file
Tests: basic-command-line
Depends: diffoscope
Restrictions: allow-stderr
# without Recommends
pytest
\ No newline at end of file
......@@ -17,4 +17,4 @@
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
VERSION = "95"
VERSION = "111"
......@@ -257,7 +257,6 @@ class Changes(object):
self.get_changes_file()],
shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
gpg_output, gpg_output_stderr = pipe.communicate()
print(gpg_output)
if pipe.returncode != 0:
raise ChangesFileException(
......@@ -265,7 +264,6 @@ class Changes(object):
# contains verbose human readable GPG information
gpg_output_stderr = str(gpg_output_stderr, encoding='utf8')
print(gpg_output_stderr)
gpg_output = gpg_output.decode(encoding='UTF-8')
......
......@@ -19,8 +19,13 @@
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
import sys
import logging
import importlib
import traceback
from ..logging import line_eraser
logger = logging.getLogger(__name__)
......@@ -53,17 +58,20 @@ class ComparatorManager(object):
('elf.StaticLibFile',),
('llvm.LlvmBitCodeFile',),
('sqlite.Sqlite3Database',),
('wasm.WasmFile',),
('fonts.TtfFile',),
('fontconfig.FontconfigCacheFile',),
('gettext.MoFile',),
('ipk.IpkFile',),
('rust.RustObjectFile',),
('ffprobe.FfprobeFile',),
('gnumeric.GnumericFile',),
('gzip.GzipFile',),
('haskell.HiFile',),
('icc.IccFile',),
('iso9660.Iso9660File',),
('java.ClassFile',),
('lz4.Lz4File',),
('mono.MonoExeFile',),
('pdf.PdfFile',),
('png.PngFile',),
......@@ -77,9 +85,10 @@ class ComparatorManager(object):
('xz.XzFile',),
('apk.ApkFile',),
('odt.OdtFile',),
('ocaml.OcamlInterfaceFile',),
('docx.DocxFile',),
('zip.ZipFile',),
('zip.MozillaZipFile',),
('zip.ZipFile',),
('image.JPEGImageFile',),
('image.ICOImageFile',),
('cbfs.CbfsFile',),
......@@ -107,6 +116,7 @@ class ComparatorManager(object):
self.classes = []
for xs in self.COMPARATORS:
errors = []
for x in xs:
package, klass_name = x.rsplit('.', 1)
......@@ -114,16 +124,22 @@ class ComparatorManager(object):
mod = importlib.import_module(
'diffoscope.comparators.{}'.format(package)
)
except ImportError:
except ImportError as e:
errors.append((x, e))
continue
self.classes.append(getattr(mod, klass_name))
break
else: # noqa
raise ImportError("Could not import {}{}".format(
"any of" if len(xs) > 1 else '',
logger.error("Could not import {}{}".format(
"any of " if len(xs) > 1 else '',
', '.join(xs)
))
for x in errors:
logger.error("Original error for %s:", x[0])
sys.stderr.buffer.write(line_eraser())
traceback.print_exception(None, x[1], x[1].__traceback__)
sys.exit(2)
logger.debug("Loaded %d comparator classes", len(self.classes))
......
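The import loop above relies on Python's ``for``/``else``: the ``else`` branch runs only when no candidate was imported, and the collected exceptions are reported at that point. A simplified sketch of the same pattern is below. It assumes a hypothetical helper named ``load_first_available`` and raises ``ImportError`` instead of logging a traceback and exiting, so it is an illustration of the control flow rather than the code in this change.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def load_first_available(candidates):
    """Return the first importable module among `candidates`.

    Hypothetical helper: every ImportError is collected so that all of
    them can be reported if no candidate could be imported.
    """
    errors = []
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            errors.append((name, exc))
            continue
        break
    else:
        # Only reached when the loop finished without hitting `break`.
        for name, exc in errors:
            logger.error("Original error for %s: %s", name, exc)
        raise ImportError("Could not import {}{}".format(
            "any of " if len(candidates) > 1 else "",
            ", ".join(candidates),
        ))
    return module
```

For example, ``load_first_available(['no_such_module_xyz', 'json'])`` skips the missing module and returns the standard library ``json`` module.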
......@@ -36,6 +36,14 @@ try:
import binwalk
except ImportError:
binwalk = None
else:
# Disable binwalk's own user configuration for predictable results and to
# ensure it does not create (!) unnecessary directories, etc. (re. #903444)
def fn(self):
if not hasattr(fn, '_temp_dir'):
fn._temp_dir = get_temporary_directory('binwalk').name
return fn._temp_dir
binwalk.core.settings.Settings._get_user_config_dir = fn
logger = logging.getLogger(__name__)
......
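The replacement function above caches its result as an attribute on the function object, so the temporary directory is created once and reused on every later call. A minimal sketch of that caching idiom, assuming a hypothetical ``get_config_dir`` function and the standard library's ``tempfile`` module in place of diffoscope's ``get_temporary_directory``:

```python
import tempfile


def get_config_dir():
    """Return the path of a temporary directory, created on first use."""
    if not hasattr(get_config_dir, '_temp_dir'):
        # Keep the TemporaryDirectory object itself so that the directory
        # is not cleaned up while the cached path is still in use.
        get_config_dir._temp_dir = tempfile.TemporaryDirectory(
            prefix='example_')
    return get_config_dir._temp_dir.name
```

Storing the state on the function avoids a module-level global while still guaranteeing that repeated calls return the same path.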
......@@ -91,6 +91,11 @@ class DebContainer(LibarchiveContainer):
def perform_fuzzy_matching(self, my_members, other_members):
matched = set()
# Create local copies because they will be modified by consumer
my_members = dict(my_members)
other_members = dict(other_members)
for name1 in my_members.keys():
main, ext = os.path.splitext(name1)
candidates = [name2 for name2 in other_members.keys() - matched
......@@ -110,7 +115,7 @@ class DebFile(File):
control_tar = self.as_container.control_tar
md5sums_file = control_tar.as_container.lookup_file(
'./md5sums') if control_tar else None
if md5sums_file:
if isinstance(md5sums_file, Md5sumsFile):
self._md5sums = md5sums_file.parse()
else:
logger.debug("Unable to find a md5sums file")
......@@ -201,6 +206,6 @@ class DebDataTarFile(File):
isinstance(file.container.source.container.source, DebFile)
def compare_details(self, other, source=None):
return [Difference.from_text_readers(list_libarchive(self.path),
list_libarchive(other.path),
return [Difference.from_text_readers(list_libarchive(self.path, ignore_errors=True),
list_libarchive(other.path, ignore_errors=True),
self.path, other.path, source="file list")]
......@@ -158,7 +158,7 @@ def compare_meta(path1, path2):
try:
differences.append(Difference.from_command(Getfacl, path1, path2))
except RequiredToolNotFound:
logger.warning(
logger.info(
"Unable to find 'getfacl', some directory metadata differences might not be noticed.")
try:
lsattr1 = lsattr(path1)
......
......@@ -25,6 +25,7 @@ import collections
from diffoscope.exc import OutputParsingError
from diffoscope.tools import get_tool_name, tool_required
from diffoscope.config import Config
from diffoscope.tempfiles import get_named_temporary_file
from diffoscope.difference import Difference
......@@ -234,7 +235,7 @@ class ObjdumpDisassembleSection(ObjdumpSection):
# disassembled instructions.
# objdump can get the debugging information from the elf or from the
# stripped symbols file specified in the .gnu_debuglink section
return ['--line-numbers', '--disassemble', '--demangle']
return ['--line-numbers', '--disassemble', '--demangle', '--reloc']
def filter(self, line):
line = super().filter(line)
......@@ -417,8 +418,12 @@ class ElfContainer(Container):
super().__init__(*args, **kwargs)
logger.debug("Creating ElfContainer for %s", self.source.path)
cmd = [get_tool_name('readelf'), '--wide',
'--section-headers', self.source.path]
cmd = [
get_tool_name('readelf'),
'--wide',
'--section-headers',
self.source.path,
]
output = subprocess.check_output(
cmd, shell=False, stderr=subprocess.DEVNULL)
has_debug_symbols = False
......@@ -480,6 +485,15 @@ class ElfContainer(Container):
if not isinstance(deb, DebFile) or not deb.container:
return
# If the .deb in question is the top-level of the source we have passed
# a .deb directly to diffoscope (versus finding one specified in a
# .changes or .buildinfo file). In this case, don't automatically
# search for a -dbgsym file unless the user specified
# `Config().use_dbgsym`.
if not hasattr(deb.container.source, 'container') and \
not Config().use_dbgsym:
return
# Retrieve the Build ID for the ELF file we are examining
build_id = get_build_id(self.source.path)
debuglink = get_debug_link(self.source.path)
......@@ -581,6 +595,8 @@ class StaticLibFile(File):
FILE_TYPE_RE = re.compile(r'\bar archive\b')
FILE_EXTENSION_SUFFIX = '.a'
ENABLE_FALLBACK_RECOGONIZES = False
def compare_details(self, other, source=None):
differences = [Difference.from_text_readers(
list_libarchive(self.path),
......
# -*- coding: utf-8 -*-
#
# diffoscope: in-depth comparison of files, archives, and directories
#
# Copyright © 2019 Chris Lamb <lamby@debian.org>
#
# diffoscope is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# diffoscope is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
import re
from diffoscope.tools import tool_required
from diffoscope.difference import Difference
from .utils.file import File
from .utils.command import Command
class Ffprobe(Command):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.flag = False

    def start(self):
        super().start()
        self.stderr = ''

    @property
    def stdout(self):
        return self._process.stderr.splitlines(True)

    @tool_required('ffprobe')
    def cmdline(self):
        return ('ffprobe', self.path)

    def filter(self, line):
        if self.flag:
            return line
        elif line == b'  Metadata:\n':
            self.flag = True
        return b''


class FfprobeFile(File):
    DESCRIPTION = "Multimedia metadata"
    FILE_TYPE_RE = re.compile(r'^Audio file')

    def compare_details(self, other, source=None):
        return [Difference.from_command(
            Ffprobe,
            self.path,
            other.path,
            source='ffprobe',
        )]
......@@ -22,6 +22,7 @@ import logging
import os.path
from diffoscope.difference import Difference
from diffoscope.exc import ContainerExtractionError
from .utils.file import File
from .utils.archive import Archive
......@@ -66,21 +67,35 @@ class FsImageContainer(Archive):
self.g.close()
def get_member_names(self):
if not guestfs:
return []
return [os.path.basename(self.source.path) + '.tar']
def extract(self, member_name, dest_dir):
dest_path = os.path.join(dest_dir, member_name)
logger.debug('filesystem image extracting to %s', dest_path)
self.g.tar_out('/', dest_path)
return dest_path
class FsImageFile(File):
DESCRIPTION = "ext2/ext3/ext4/btrfs filesystems"
DESCRIPTION = "ext2/ext3/ext4/btrfs/fat filesystems"
CONTAINER_CLASS = FsImageContainer
FILE_TYPE_RE = re.compile(r'^(Linux.*filesystem data|BTRFS Filesystem).*')
@classmethod
def recognizes(cls, file):
# Avoid the DOS / MBR file type as it generates a lot of false positives;
# manually check the "System identifier string" instead
with open(file.path, 'rb') as f:
f.seek(54)
if f.read(8) in (b'FAT12   ', b'FAT16   '):
return True
f.seek(82)
if f.read(8) == b'FAT32   ':
return True
return super().recognizes(file)
def compare_details(self, other, source=None):
differences = []
my_fs = ''
......
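The FAT check in ``recognizes`` reads the file system type field directly from the boot sector: FAT12 and FAT16 store it as an eight-byte, space-padded string at offset 54, while FAT32 stores it at offset 82. The sketch below isolates that logic. The helper name ``detect_fat_type``, its return values, and the fabricated sector in the usage lines are assumptions made for illustration; the offsets and strings follow the check above.

```python
import io


def detect_fat_type(f):
    """Return 'FAT12', 'FAT16', 'FAT32' or None for a binary file object.

    Hypothetical helper: the name and return values are illustrative.
    """
    # FAT12/FAT16 keep the file system type at offset 54 of the boot sector.
    f.seek(54)
    label = f.read(8)
    if label in (b'FAT12   ', b'FAT16   '):
        return label.decode('ascii').strip()
    # FAT32 has a longer parameter block, so the field sits at offset 82.
    f.seek(82)
    if f.read(8) == b'FAT32   ':
        return 'FAT32'
    return None


# Fabricated 512-byte sector that carries only the FAT16 type string.
sector = bytearray(512)
sector[54:62] = b'FAT16   '
print(detect_fat_type(io.BytesIO(bytes(sector))))
```

Checking these fields avoids relying on the generic "DOS/MBR boot sector" description, which matches many files that are not FAT images.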
......@@ -61,4 +61,4 @@ class GnumericFile(File):
))
with open(t.name) as f:
return f.read()
return f.read().strip()
......@@ -37,9 +37,9 @@ class ProcyonDecompiler(Command):
super().__init__(path, *args, **kwargs)
self.real_path = os.path.realpath(path)
@tool_required('procyon-decompiler')
@tool_required('procyon')
def cmdline(self):
return ['procyon-decompiler', '-ec', self.path]
return ['procyon', '-ec', self.path]
def filter(self, line):
if re.match(r'^(//)', line.decode('utf-8')):
......@@ -80,14 +80,17 @@ class ClassFile(File):
decompilers = [ProcyonDecompiler, Javap]
def compare_details(self, other, source=None):
diff = None
diff = []
for decompiler in self.decompilers:
try:
diff = [
Difference.from_command(decompiler, self.path, other.path)
]
if diff:
single_diff = Difference.from_command(
decompiler,
self.path,
other.path
)
if single_diff:
diff.append(single_diff)
break
except RequiredToolNotFound:
logger.debug("Unable to find %s. Falling back...",
......
......@@ -35,18 +35,12 @@ class JSONFile(File):
@classmethod
def recognizes(cls, file):
with open(file.path, 'rb') as f:
# Try fuzzy matching for JSON files
is_text = any(
file.magic_file_type.startswith(x)
for x in ('ASCII text', 'UTF-8 Unicode text'),
)
if is_text and not file.name.endswith('.json'):
buf = f.read(10)
if not any(x in buf for x in b'{['):
return False
f.seek(0)
# Try fuzzy matching for files not called .json
if not file.name.endswith('.json'):
if b'{' not in file.file_header and b'[' not in file.file_header:
return False
with open(file.path, 'rb') as f:
try:
file.parsed = json.loads(
f.read().decode('utf-8', errors='ignore'),
......@@ -66,18 +60,7 @@ class JSONFile(File):
)
if difference:
if jsondiff is not None:
a = getattr(self, 'parsed', {})
b = getattr(other, 'parsed', {})
diff = {repr(x): y for x, y in jsondiff.diff(a, b).items()}
difference.add_comment("Similarity: {}%".format(
jsondiff.similarity(a, b),
))
difference.add_comment("Differences: {}".format(
json.dumps(diff, indent=2, sort_keys=True),
))
self.compare_with_jsondiff(difference, other)
return [difference]
......@@ -91,6 +74,26 @@ class JSONFile(File):
return [difference]
def compare_with_jsondiff(self, difference, other):
if jsondiff is None:
return
a = getattr(self, 'parsed', {})
b = getattr(other, 'parsed', {})
try:
diff = {repr(x): y for x, y in jsondiff.diff(a, b).items()}
except Exception:
return
difference.add_comment("Similarity: {}%".format(
jsondiff.similarity(a, b),
))
difference.add_comment("Differences: {}".format(
json.dumps(diff, indent=2, sort_keys=True),
))
@staticmethod
def dumps(file, sort_keys=True):
if not hasattr(file, 'parsed'):
......
......@@ -32,6 +32,11 @@ class LlvmBcAnalyzer(Command):
def cmdline(self):
return ['llvm-bcanalyzer', '-dump', self.path]
def filter(self, line):
if line.decode('utf-8', 'ignore').startswith('Summary of '):
return b'Summary:'
return line
class LlvmBcDisassembler(Command):
@tool_required('llvm-dis')
......
# -*- coding: utf-8 -*-
#
# diffoscope: in-depth comparison of files, archives, and directories
#
# Copyright © 2018 Xavier Briand <xavierbriand@gmail.com>
#
# diffoscope is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# diffoscope is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
import re
import os.path
import logging
import subprocess
from diffoscope.tools import tool_required
from .utils.file import File
from .utils.archive import Archive
logger = logging.getLogger(__name__)
class Lz4Container(Archive):
    def open_archive(self):
        return self

    def close_archive(self):
        pass

    def get_member_names(self):
        return [self.get_compressed_content_name('.lz4')]

    @tool_required('lz4')
    def extract(self, member_name, dest_dir):
        dest_path = os.path.join(dest_dir, member_name)
        logger.debug('lz4 extracting to %s', dest_path)
        with open(dest_path, 'wb') as fp:
            subprocess.check_call(
                ["lz4", "-d", "-c", self.source.path],
                shell=False, stdout=fp, stderr=None)
        return dest_path


class Lz4File(File):
    DESCRIPTION = "LZ4 compressed files"
    CONTAINER_CLASS = Lz4Container
    FILE_TYPE_RE = re.compile(r'^LZ4 compressed data \([^\)]+\)$')

    # Work around file(1) Debian bug #876316
    FALLBACK_FILE_EXTENSION_SUFFIX = ".lz4"
    FALLBACK_FILE_TYPE_HEADER_PREFIX = b"\x04\x22M\x18"
......@@ -44,7 +44,7 @@ class Otool(Command):
def filter(self, line):
# Strip filename
prefix = '{}:'.format(self._path)
if line.decode('utf-8', 'ignore').index(prefix) == 0:
if line.decode('utf-8', 'ignore').startswith(prefix):
return line[len(prefix):].strip()
return line
......
# -*- coding: utf-8 -*-
#
# diffoscope: in-depth comparison of files, archives, and directories
#
# Copyright © 2018 Chris Lamb <lamby@debian.org>
#
# diffoscope is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# diffoscope is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
import re
from diffoscope.tools import tool_required
from diffoscope.difference import Difference
from .utils.file import File
from .utils.command import Command
class Ocamlobjinfo(Command):
    @tool_required('ocamlobjinfo')
    def cmdline(self):
        return ('ocamlobjinfo', self.path)

    def filter(self, line):
        val = line.decode('utf-8')
        if val.startswith('File '):
            return b''
        return line


class OcamlInterfaceFile(File):
    DESCRIPTION = "OCaml interface files"
    FILE_TYPE_RE = re.compile(r'^OCaml interface file ')

    def compare_details(self, other, source=None):
        return [Difference.from_command(
            Ocamlobjinfo,
            self.path,
            other.path,
            source="ocamlobjinfo",
        )]
......@@ -40,7 +40,7 @@ class Tcpdump(Command):
class PcapFile(File):
DESCRIPTION = "tcpdump capture files (.pcap)"
FILE_TYPE_RE = re.compile(r'^tcpdump capture file\b')
FILE_TYPE_RE = re.compile(r'^(tcpdump|pcap) capture file\b')
def compare_details(self, other, source=None):
return [Difference.from_command(
......
......@@ -25,6 +25,11 @@ from diffoscope.difference import Difference
from .utils.file import File
from .utils.command import Command
try:
import PyPDF2
except ImportError: # noqa
PyPDF2 = None
class Pdftotext(Command):
@tool_required('pdftotext')
......@@ -32,19 +37,38 @@ class Pdftotext(Command):
return ['pdftotext', self.path, '-']
class Pdftk(Command):
@tool_required('pdftk')
def cmdline(self):
return ['pdftk', self.path, 'output', '-', 'uncompress']
def filter(self, line):
return line.decode('latin-1').encode('utf-8')
class PdfFile(File):
DESCRIPTION = "PDF documents"
FILE_TYPE_RE = re.compile(r'^PDF document\b')
def compare_details(self, other, source=None):
return [Difference.from_command(Pdftotext, self.path, other.path),
Difference.from_command(Pdftk, self.path, other.path)]
xs = []
if PyPDF2 is not None:
difference = Difference.from_text(
self.dump_pypdf2_metadata(self),
self.dump_pypdf2_metadata(other),
self.path,
other.path,
)
if difference:
difference.add_comment("Document info")
xs.append(difference)
xs.append(Difference.from_command(Pdftotext, self.path, other.path))
return xs
@staticmethod
def dump_pypdf2_metadata(file):
try:
pdf = PyPDF2.PdfFileReader(file.path)
document_info = pdf.getDocumentInfo()
except PyPDF2.utils.PdfReadError as exc:
return "(Could not extract metadata: {})".format(exc)
xs = []
for k, v in sorted(document_info.items()):
xs.append("{}: {!r}".format(k.lstrip('/'), v))
return "\n".join(xs)
......@@ -65,11 +65,12 @@ class PpuFile(File):
def recognizes(cls, file):
if not super().recognizes(file):
return False
with open(file.path, 'rb') as f:
magic = f.read(3)
if magic != b"PPU":
return False
ppu_version = f.read(3).decode('ascii', errors='ignore')
if not file.file_header.startswith(b'PPU'):
return False
ppu_version = file.file_header[3:6].decode('ascii', errors='ignore')
if not hasattr(PpuFile, 'ppu_version'):
try:
with profile('command', 'ppudump'):
......
......@@ -23,7 +23,7 @@ from .utils.file import File
class AbstractRpmFile(File):
FILE_TYPE_RE = re.compile('^RPM\s')
FILE_TYPE_RE = re.compile(r'^RPM\s')
class RpmFile(AbstractRpmFile):
......
......@@ -157,7 +157,9 @@ class Container(object, metaclass=abc.ABCMeta):
for my_name, other_name, score in self.perform_fuzzy_matching(my_members, other_members):
comment = "Files similar despite different names" \
" (difference score: {})".format(score)
" (score: {}, lower is more similar)".format(score)
if score == 0:
comment = "Files identical despite different names"
yield prep_yield(my_name, other_name, comment)
if Config().new_file:
......
......@@ -154,6 +154,7 @@ class File(object, metaclass=abc.ABCMeta):
return _run_tests(all, all_tests) if all_tests else False
ENABLE_FALLBACK_RECOGONIZES = True
FALLBACK_FILE_EXTENSION_SUFFIX = None
FALLBACK_FILE_TYPE_HEADER_PREFIX = None
......@@ -176,6 +177,9 @@ class File(object, metaclass=abc.ABCMeta):
# not valid, they have to re-implement it
return False
if not cls.ENABLE_FALLBACK_RECOGONIZES:
return False
all_tests = [test for test in (
(cls.FALLBACK_FILE_EXTENSION_SUFFIX,
str.endswith, file.name),
......@@ -373,7 +377,8 @@ class File(object, metaclass=abc.ABCMeta):
if self.magic_file_type != 'data' else ''
difference.add_comment(
"Format-specific differences are supported for this "
"file format but none were detected{}".format(suffix))
"file format, but no file-specific differences were "
"detected. Falling back to a binary diff.{}".format(suffix))
except subprocess.CalledProcessError as e:
difference = self.compare_bytes(other, source=source)
if e.output:
......
......@@ -34,7 +34,7 @@ def perform_fuzzy_matching(members1, members2):
if tlsh is None or Config().fuzzy_threshold == 0:
return
already_compared = set()
# Perform local copies because they will be modified by consumer
# Create local copies because they will be modified by consumer
members1 = dict(members1)
members2 = dict(members2)
for name1, (file1, _) in members1.items():
......
......@@ -80,6 +80,14 @@ if not hasattr(libarchive.ffi, 'entry_gname'):
'entry_gname', [libarchive.ffi.c_archive_entry_p], ctypes.c_char_p)
libarchive.ArchiveEntry.gname = property(
lambda self: libarchive.ffi.entry_gname(self._entry_p))
# Monkeypatch libarchive-c (>= 2.8)
# Wire mtime_nsec attribute as some libarchive versions (>=2.8) don't expose it
# for ArchiveEntry. Doing this allows a unified API no matter which version is
# available.
if not hasattr(libarchive.ArchiveEntry, 'mtime_nsec') and hasattr(libarchive.ffi, 'entry_mtime_nsec'):
libarchive.ArchiveEntry.mtime_nsec = property(
lambda self: libarchive.ffi.entry_mtime_nsec(self._entry_p))
# Monkeypatch libarchive-c so we always get pathname as (Unicode) str
# Otherwise, we'll get sometimes str and sometimes bytes and always pain.
......@@ -87,32 +95,36 @@ libarchive.ArchiveEntry.pathname = property(lambda self: libarchive.ffi.entry_pa
self._entry_p).decode('utf-8', errors='surrogateescape'))
def list_libarchive(path):
with libarchive.file_reader(path) as archive:
for entry in archive:
if entry.isblk or entry.ischr:
size_or_dev = '{major:>3},{minor:>3}'.format(
major=entry.rdevmajor, minor=entry.rdevminor)
else:
size_or_dev = entry.size
mtime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(entry.mtime)
) + '.{:06d}'.format(entry.mtime_nsec // 1000)
if entry.issym:
name_and_link = '{entry.name} -> {entry.linkname}'.format(
entry=entry)
else:
name_and_link = entry.name
if entry.uname:
user = '{user:<8} {uid:>7}'.format(user=entry.uname.decode(
'utf-8', errors='surrogateescape'), uid='({})'.format(entry.uid))
else:
user = entry.uid
if entry.gname:
group = '{group:<8} {gid:>7}'.format(group=entry.gname.decode(
'utf-8', errors='surrogateescape'), gid='({})'.format(entry.gid))
else:
group = entry.gid
yield '{strmode} {entry.nlink:>3} {user:>8} {group:>8} {size_or_dev:>8} {mtime:>8} {name_and_link}\n'.format(strmode=entry.strmode.decode('us-ascii'), entry=entry, user=user, group=group, size_or_dev=size_or_dev, mtime=mtime, name_and_link=name_and_link)
def list_libarchive(path, ignore_errors=False):
try:
with libarchive.file_reader(path) as archive:
for entry in archive:
if entry.isblk or entry.ischr:
size_or_dev = '{major:>3},{minor:>3}'.format(
major=entry.rdevmajor, minor=entry.rdevminor)
else:
size_or_dev = entry.size
mtime = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(entry.mtime)
) + '.{:06d}'.format(entry.mtime_nsec // 1000)
if entry.issym:
name_and_link = '{entry.name} -> {entry.linkname}'.format(
entry=entry)
else:
name_and_link = entry.name
if entry.uname:
user = '{user:<8} {uid:>7}'.format(user=entry.uname.decode(
'utf-8', errors='surrogateescape'), uid='({})'.format(entry.uid))
else:
user = entry.uid
if entry.gname:
group = '{group:<8} {gid:>7}'.format(group=entry.gname.decode(
'utf-8', errors='surrogateescape'), gid='({})'.format(entry.gid))
else:
group = entry.gid
yield '{strmode} {entry.nlink:>3} {user:>8} {group:>8} {size_or_dev:>8} {mtime:>8} {name_and_link}\n'.format(strmode=entry.strmode.decode('us-ascii'), entry=entry, user=user, group=group, size_or_dev=size_or_dev, mtime=mtime, name_and_link=name_and_link)
except libarchive.exception.ArchiveError:
if not ignore_errors:
raise
class LibarchiveMember(ArchiveMember):
......@@ -203,11 +215,14 @@ class LibarchiveContainer(Archive):
raise KeyError('%s not found in archive', member_name)
def get_filtered_members(self):
with libarchive.file_reader(self.source.path) as archive:
for entry in archive:
if any_excluded(entry.pathname):
continue
yield entry.pathname, self.get_subclass(entry)
try:
with libarchive.file_reader(self.source.path) as archive:
for entry in archive:
if any_excluded(entry.pathname):
continue
yield entry.pathname, self.get_subclass(entry)
except libarchive.exception.ArchiveError:
pass
def extract(self, member_name, dest_dir):
self.ensure_unpacked()
......
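The `list_libarchive` change above wraps a generator body in `try`/`except` so that entries read before a failure are still yielded, and the error is re-raised only when `ignore_errors` is false. Below is a minimal sketch of that pattern in isolation. `ArchiveError`, `broken_archive` and `list_entries` are hypothetical stand-ins for illustration, not libarchive-c or diffoscope names.

```python
class ArchiveError(Exception):
    """Hypothetical stand-in for libarchive.exception.ArchiveError."""


def broken_archive():
    # Simulates an archive that yields two entries and then fails to read.
    yield 'a'
    yield 'b'
    raise ArchiveError('truncated archive')


def list_entries(source, ignore_errors=False):
    # Entries yielded before the failure have already reached the consumer;
    # the exception only decides whether iteration ends quietly or loudly.
    try:
        for entry in source:
            yield entry
    except ArchiveError:
        if not ignore_errors:
            raise


print(list(list_entries(broken_archive(), ignore_errors=True)))
```

With `ignore_errors=False` the same call raises `ArchiveError` after the entries that were read successfully have been yielded.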
# -*- coding: utf-8 -*-
#
# diffoscope: in-depth comparison of files, archives, and directories
#
# Copyright © 2018 Joachim Breitner <nomeata@debian.org>
#
# diffoscope is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# diffoscope is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
from diffoscope.tools import tool_required
from diffoscope.difference import Difference
from .utils.file import File
from .utils.command import Command
WASM_MAGIC = b"\x00asm"
class Wasm2Wat(Command):
@tool_required('wasm2wat')
def cmdline(self):
return ['wasm2wat', '--no-check', self.path]
class WasmFile(File):
DESCRIPTION = "WebAssembly binary module"
FILE_EXTENSION_SUFFIX = '.wasm'
@classmethod
def recognizes(cls, file):
if not super().recognizes(file):
return False
return file.file_header.startswith(WASM_MAGIC)
def compare_details(self, other, source=None):
return [Difference.from_command(Wasm2Wat, self.path, other.path)]
......@@ -25,6 +25,7 @@ import zipfile
from diffoscope.tools import tool_required
from diffoscope.difference import Difference
from diffoscope.exc import ContainerExtractionError
from .utils.file import File
from .directory import Directory
......@@ -56,6 +57,43 @@ class ZipinfoVerbose(Zipinfo):
return ['zipinfo', '-v', self.path]
class Zipnote(Command):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.flag = False
@tool_required('zipnote')
def cmdline(self):
return ['zipnote', self.path]
def filter(self, line):
"""
Example output from zipnote(1):
@ foo
hello
@ (comment above this line)
@ (zip file comment below this line)
goodbye
"""
if line == b'@ (zip file comment below this line)\n':
self.flag = True
return b'Zip file comment: '
if line == b'@ (comment above this line)\n':
self.flag = False
return b'\n\n' # spacer
if line.startswith(b'@ '):
filename = line[2:-1].decode()
self.flag = True
return "Filename: {}\nComment: ".format(filename).encode()
return line[:-1] if self.flag else b''
class BsdtarVerbose(Command):
@tool_required('bsdtar')
def cmdline(self):
......@@ -98,9 +136,18 @@ class ZipContainer(Archive):
# any weird character so we can get to the bytes.
targetpath = os.path.join(dest_dir, os.path.basename(member_name)).encode(
sys.getfilesystemencoding(), errors='replace')
with self.archive.open(member_name) as source, open(targetpath, 'wb') as target:
shutil.copyfileobj(source, target)
return targetpath.decode(sys.getfilesystemencoding())
try:
with self.archive.open(member_name) as source, \
open(targetpath, 'wb') as target:
shutil.copyfileobj(source, target)
return targetpath.decode(sys.getfilesystemencoding())
except RuntimeError as exc:
# Handle encrypted files; see line 1292 of zipfile.py
is_encrypted = self.archive.getinfo(member_name).flag_bits & 0x1
if is_encrypted:
raise ContainerExtractionError(member_name, exc)
raise
def get_member(self, member_name):
zipinfo = self.archive.getinfo(member_name)
......@@ -112,13 +159,18 @@ class ZipContainer(Archive):
class ZipFile(File):
CONTAINER_CLASS = ZipContainer
FILE_TYPE_RE = re.compile(
r'^(Zip archive|Java archive|EPUB document|OpenDocument (Text|Spreadsheet|Presentation|Drawing|Formula|Template|Text Template))\b')
r'^(Zip archive|Java archive|EPUB document|OpenDocument (Text|Spreadsheet|Presentation|Drawing|Formula|Template|Text Template)|Google Chrome extension)\b')
def compare_details(self, other, source=None):
differences = []
zipinfo_difference = Difference.from_command(Zipinfo, self.path, other.path) or \
Difference.from_command(ZipinfoVerbose, self.path, other.path) or \
Difference.from_command(BsdtarVerbose, self.path, other.path)
return [zipinfo_difference]
zipnote_difference = Difference.from_command(Zipnote, self.path, other.path)
for x in (zipinfo_difference, zipnote_difference):
if x is not None:
differences.append(x)
return differences
class MozillaZipCommandMixin(object):
......
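The `Zipnote.filter` method added above is a small line-by-line state machine over `zipnote(1)` output. The sketch below restates the same logic as a standalone function so it can be run on the sample from the docstring. The function name `filter_zipnote` and the variable `in_comment` are illustrative assumptions, not part of diffoscope.

```python
def filter_zipnote(lines):
    # in_comment mirrors self.flag in Zipnote.filter: when set, plain lines
    # are comment text and are kept (without their trailing newline).
    in_comment = False
    out = []
    for line in lines:
        if line == b'@ (zip file comment below this line)\n':
            in_comment = True
            out.append(b'Zip file comment: ')
        elif line == b'@ (comment above this line)\n':
            in_comment = False
            out.append(b'\n\n')  # spacer between entries
        elif line.startswith(b'@ '):
            in_comment = True
            filename = line[2:-1].decode()
            out.append('Filename: {}\nComment: '.format(filename).encode())
        elif in_comment:
            out.append(line[:-1])
    return b''.join(out)


sample = [
    b'@ foo\n',
    b'hello\n',
    b'@ (comment above this line)\n',
    b'@ (zip file comment below this line)\n',
    b'goodbye\n',
]
print(filter_zipnote(sample).decode())
```

The marker checks come before the generic `@ ` prefix check because both marker lines also start with `@ `.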
......@@ -63,6 +63,7 @@ class Config(object):
self.exclude_directory_metadata = False
self.compute_visual_diffs = False
self.max_container_depth = 50
self.use_dbgsym = False
self.force_details = False
def __setattr__(self, k, v):
......
......@@ -72,6 +72,9 @@ EXTERNAL_TOOLS = {
'debian': 'device-tree-compiler',
'arch': 'dtc',
},
'ffprobe': {
'debian': 'ffmpeg',
},
'file': {
'debian': 'file',
'arch': 'file',
......@@ -145,6 +148,10 @@ EXTERNAL_TOOLS = {
'arch': 'e2fsprogs',
'FreeBSD': 'e2fsprogs',
},
'lz4': {
'debian': 'lz4 | liblz4-tool',
'FreeBSD': 'lz4',
},
'msgunfmt': {
'debian': 'gettext',
'arch': 'gettext',
......@@ -166,6 +173,9 @@ EXTERNAL_TOOLS = {
'debian': 'binutils-multiarch',
'arch': 'binutils',
},
'ocamlobjinfo': {
'debian': 'ocaml-nox',
},
'odt2txt': {
'debian': 'odt2txt',
'arch': 'odt2txt',
......@@ -177,10 +187,6 @@ EXTERNAL_TOOLS = {
'debian': 'pgpdump',
'arch': 'pgpdump',
},
'pdftk': {
'debian': 'pdftk',
'FreeBSD': 'pdftk',
},
'pdftotext': {
'debian': 'poppler-utils',
'arch': 'poppler',
......@@ -237,6 +243,9 @@ EXTERNAL_TOOLS = {
'arch': 'sqlite',
'FreeBSD': 'sqlite3',
},
'wasm2wat': {
'arch': 'wabt',
},
'tar': {
'debian': 'tar',
'arch': 'tar',
......@@ -264,11 +273,14 @@ EXTERNAL_TOOLS = {
'arch': 'unzip',
'FreeBSD': 'unzip',
},
'procyon-decompiler': {
'zipnote': {
'debian': 'zip',
},
'procyon': {
'debian': 'procyon-decompiler',
},
'dumpxsb': {
'debian': 'xmlutils',
'debian': 'xmlbeans',
},
}
......
......@@ -17,10 +17,27 @@
# You should have received a copy of the GNU General Public License
# along with diffoscope. If not, see <https://www.gnu.org/licenses/>.
import sys
import contextlib
import logging
def line_eraser(fd=sys.stderr) -> bytes:
eraser = b'' # avoid None to avoid 'NoneType + str/bytes' failures
if fd.isatty():
from curses import tigetstr, setupterm
setupterm(fd=fd.fileno())