Skip to content
Commit d3c7ac8e authored by Chris Lamb's avatar Chris Lamb 👀
Browse files

Add support to Difference.from_command_exc and friends to ignore specific...

Add support to Difference.from_command_exc and friends to ignore specific returncodes from the called program and treat them as "no" difference.
parent cc9a7301
Loading
Loading
Loading
Loading
  • Chris Lamb 👀 @lamby

    mentioned in commit 13283c47

    ·

    mentioned in commit 13283c47

    Toggle commit list
  • Contributor

    This commit causes regressions in the use cases I previously fixed in commit efd5f959.

    1. To reproduce, generate two .a files as described below. I understand .a files are "not supported any more", however I don't think this regression is specific to .a files, it's just the only example I have. I suspect this can be reproduced with any other container file and recursive comparison.
    rm mix1.a rm mix2.a
    ar qU mix1.a some.o COPYING
    ar qU mix2.a some.o COPYING

    mix1.a and mix2.a differ only by some timestamp(s). The only purpose of the random text file "COPYING" is to trigger readelf failures.

    Before this commit, a useful comparison:

    --- mix1.a
    +++ mix2.a
    │┄ Command `readelf --wide --file-header mix1.a` exited with return code 1. Output:
    │┄     readelf: mix1.a: Error: Not an ELF file - it has the wrong magic bytes at the start
    │┄
    @@ -1,10 +1,10 @@
     00000000: 213c 6172 6368 3e0a 2f20 2020 2020 2020  !<arch>./       
     00000010: 2020 2020 2020 2020 3135 3636 3934 3035          15669405
    -00000020: 3736 2020 3020 2020 2020 3020 2020 2020  76  0     0     
    +00000020: 3830 2020 3020 2020 2020 3020 2020 2020  80  0     0     
     00000030: 3020 2020 2020 2020 3134 2020 2020 2020  0       14      
     00000040: 2020 600a 0000 0001 0000 0052 6d61 696e    `........Rmain
     00000050: 0000 7838 362e 6f2f 2020 2020 2020 2020  ..x86.o/        
     00000060: 2020 3135 3636 3933 3930 3138 2020 3130    1566939018  10
     00000070: 3031 2020 3130 3034 2020 3130 3036 3434  01  1004  100644
     00000080: 2020 3838 3936 2020 2020 2020 600a 7f45    8896      `..E
     00000090: 4c46 0101 0100 0000 0000 0000 0000 0100  LF..............

    After this commit, three large and relatively cryptic stack traces and zero comparison result:

      ... long stack traces skipped ...
    
      File "diffoscope/diffoscope/difference.py", line 292, in from_command_exc
        assert False, exc.__dict__
    AssertionError: {'returncode': 1, 'cmd': ['readelf', '--wide', '--file-header', 'mix1.a'], 'output':
     'readelf: mix1.a: Error: Not an ELF file - it has the wrong magic bytes at the start\n', 'stderr': None}
    1. Another, similar use case is to trigger objdump failures with a single, cross-compiled .o file:

    Before this commit, a fantastically useful comparison:

    2019-08-27 21:19:48 E: diffoscope.comparators.elf: Command '['objdump', '--disassemble', '--demangle',
     '--section=.text.main', 'em1.a']' returned non-zero exit status 1.
    --- em1.a
    +++ em2.a
     ├── file list
     @@ -1,2 +1,2 @@
    -----------   0        0        0       14 2019-07-17 00:19:18.000000 /
    +----------   0        0        0       14 2019-07-17 00:19:07.000000 /
     ?rw-r--r--   0        0        0   199684 1970-01-01 00:00:00.000000 main.c.obj

    After this commit, three large and relatively cryptic stack traces and zero comparison result:

      ... long stack traces skipped ...
    
      File "//diffoscope/diffoscope/difference.py", line 292, in from_command_exc
        assert False, exc.__dict__
    AssertionError: {'returncode': 1, 'cmd': ['objdump', '--line-numbers', '--disassemble', '--demangle',
     '--reloc', '--section=.text', 'em1.a'], 'output': "objdump: can't disassemble for architecture
    UNKNOWN!\n\nobjdump: section '.text' mentioned in a -j option, but not found in any input file\n", 'stderr': None}
    Edited by Marc Herbert
  • Contributor

    SORRY I bisected too fast, didn't notice the errors evolved over the git history.

    • Use case number 2 above (only one foreign .o in the .a) is already fixed in the commits after this one and works again on the current master branch. Ignore use case 2, again sorry for the noise.

    • The more artificial use case 1 (specially crafted when trying to understand what's going on) still fails on the current master branch (tag 121 and later) but with a completely different and typical python3 unicode error:

       ...
    subprocess.CalledProcessError: Command '['readelf', '--wide', '--section-headers', 'mix1.a']' returned
     non-zero exit status 1.
    
    During handling of the above exception, another exception occurred:
      ...
      File "diffoscope/diffoscope/comparators/utils/file.py", line 458, in compare
        re.sub(r'^', '    ', e.output, flags=re.MULTILINE)
      File "/usr/lib64/python3.7/re.py", line 192, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: cannot use a string pattern on a bytes-like object

    This quick and dirty hack avoids this unicode issue temporarily:

    --- a/diffoscope/comparators/utils/file.py
    +++ b/diffoscope/comparators/utils/file.py
    @@ -455,7 +455,7 @@ class File(object, metaclass=abc.ABCMeta):
                     output = '<none>'
                     if e.output:
                         output = '\n{}'.format(
    -                        re.sub(r'^', '    ', e.output, flags=re.MULTILINE)
    +                        re.sub(r'^', '    ', e.output.decode("utf-8"), flags=re.MULTILINE)
                         )
                     difference.add_comment(
                         "Command `{}` exited with return code {}. Output: {}".format(
    
  • Contributor
    File "/usr/lib64/python3.7/re.py", line 192, in sub
        return _compile(pattern, flags).sub(repl, string, count)
    TypeError: cannot use a string pattern on a bytes-like object

    This unicode error is also reproduced when trying (and failing) to compare the two files strip-nondeterminism/t/fixtures/ar/one.a.{in,out} BUT ONLY when they have been renamed ar/in.a and ar/out.a?!? (with a .a suffix)

  • Author Owner

    This unicode error is also reproduced when trying (and failing) to compare the two files strip-nondeterminism/t/fixtures/ar/one.a.{in,out} BUT ONLY when they have been renamed ar/in.a and ar/out.a?!? (with a .a suffix)

    I could reproduce this. Inspecting the output of --debug and a small bit of Git archeology reveals this is because of this commit of mine. This might need a separate issue.

    For the rest of these comments after poking at it for about an hour, can I ask you to:

    • Ensure you are using the very latest Git master (I made some changes this morning)

    • If you can still reproduce this problem please copy-paste any relevant sections into a new issue with the exact SHA1 whenever you say "current master", for obvious reasons. :)

    • Please attach your test files (eg. mix1.a) so that my own method of constructing them is not flawed in anyway (I just want to 100% rule this time-consuming and useless diversion if so...)

    Thanks for such a detailed and good report here and my apologies I can't quickly fix it for you without asking you to do more "busywork" of sorts here...

  • Contributor

    Just tested again with HEAD at 2f101b8b. Pretty much everything works now. Interesting to see my utf8 hardcoding hack become production :-)

    One issue left: diffoscope strip-nondeterminism/t/fixtures/ar/one.a.{in,out} looks good. However when I rename the files to more common names in.a and out.a then I get a readelf error message and the nice, ls -l listing of file permissions and dates is gone! Just a hexdump instead. More test files at strip-nondeterminism!4 (closed) (the pipeline seems out of order)

    1. I think I understand why the readelf triggers a hexdump, however it's sad to lose the ls -l style display when this happens. Can't we have both the hexdump and the listing?

    2. So it looks like the decision to run readelf on the archive is made based on the .a extension of the archive file. I think it would make more sense to make that decision based on the .o, .obj, ... extensions of the files inside the archive instead. For instance: if no file inside looks like a compiled ELF, then don't run readelf. Note I'm not suggesting extracting members and running readelf on each. Hybrid archives are rare.

    Let me know if you want the above in a new bug or something.

    Thanks for such a detailed and good report

    I wish!

    Edited by Marc Herbert
  • Author Owner
    1. So it looks like the decision to run readelf on the archive is made based on the .a extension of the archive file. […]

    Please file this (or the other issue) as a new bug, we are unfortunately very much losing context in the multiple conversations here (eg. I think you missed my reference to this previous commit of mine) but, yes, each issue new bug please with enough "fresh" context; it is far easier to merge than to split.. :)

  • Marc Herbert @marc-guest

    mentioned in issue #64 (closed)

    ·

    mentioned in issue #64 (closed)

    Toggle commit list
  • Contributor

    OK, I did more testing and got to the bottom of this. The tentative fix looks simple: #64 (closed) Remove elf.StaticLibFile: it's a serious design flaw

0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment