Remove elf.StaticLibFile
The primary purpose of .a files is to implement "static libraries" which are for the most part containers of .o/.obj/etc. "object files". HOWEVER this venerable archive format is generic and not limited to that specific purpose.
- https://sourceware.org/binutils/docs-2.32/binutils/ar.html "ar is considered a binary utility because archives of this sort are MOST OFTEN used as libraries holding commonly needed subroutines."
- https://www.freebsd.org/cgi/man.cgi?query=ar "The normal use of ar is for the creation and maintenance of libraries suitable for use with the link editor ld(1), although it is NOT RESTRICTED TO THIS PURPOSE"
(emphasis mine)
For convenience, the readelf utility supports .a files: it can recursively examine members of .a archives. This of course makes sense only when the .a file argument is actually a static ELF library made of ELF members. The .a archive format is certainly not limited to ELF files; some operating systems and toolchains use alternatives to ELF.
As of version 121-16-g2f101b8b1b75, diffoscope invokes readelf on .a files (elf.StaticLibFile). This is:
- harmful because it runs even when the .a file is not (just) a static ELF library.
- unnecessary because diffoscope is already very capable of recursing into archive files. See multiple demonstrations below.
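Whether a given .a file really is a static ELF library can be decided from its content rather than its name. The following is a minimal sketch of such a check, not diffoscope's actual code: the helper names `ar_members` and `looks_like_elf_static_lib` are hypothetical, and resolving GNU long names ("/N" references into the "//" table) is deliberately left out because only the member payloads matter here.

```python
AR_MAGIC = b"!<arch>\n"
ELF_MAGIC = b"\x7fELF"
# Index members that are not user data: GNU symbol/long-name tables.
GNU_SPECIAL = (b"/", b"//", b"/SYM64/")


def ar_members(data):
    """Yield (raw_name, payload) for each member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].rstrip(b" ")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + 60
        payload = data[start:start + size]
        # BSD variant: "#1/N" means the real name is the first N payload bytes.
        if name.startswith(b"#1/"):
            n = int(name[3:])
            name = payload[:n].rstrip(b"\0")
            payload = payload[n:]
        yield name, payload
        pos = start + size + (size % 2)  # member data is padded to an even size


def looks_like_elf_static_lib(data):
    """True only if every regular member starts with the ELF magic."""
    payloads = [
        payload
        for name, payload in ar_members(data)
        if name not in GNU_SPECIAL and not name.startswith(b"__.SYMDEF")
    ]
    return bool(payloads) and all(p.startswith(ELF_MAGIC) for p in payloads)
```

With a check along these lines, an archive holding only a text file, a mixed archive, a Mach-O archive, or a Go archive would all be reported as "not an ELF static library", which are exactly the cases where running readelf on the whole .a file is harmful.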
Test 1: one short ASCII file
- Version 121-16-g2f101b8b1b75 falls back on hexdump. Even the members' metadata is lost:
--- text1.a
+++ text2.a
│┄ Command `readelf --wide --section-headers text1.a` exited with return code 1. (No output)
@@ -1,5 +1,5 @@
00000000: 213c 6172 6368 3e0a 7465 7874 5f66 696c !<arch>.text_fil
-00000010: 652f 2020 2020 2020 3135 3637 3138 3334 e/ 15671834
-00000020: 3238 2020 3130 3031 2020 3130 3034 2020 28 1001 1004
-00000030: 3130 3036 3634 2020 3131 2020 2020 2020 100664 11
-00000040: 2020 600a 736f 6d65 5f77 6f72 6473 0a0a `.some_words..
+00000010: 652f 2020 2020 2020 3135 3637 3138 3530 e/ 15671850
+00000020: 3738 2020 3130 3031 2020 3130 3034 2020 78 1001 1004
+00000030: 3130 3036 3634 2020 3132 2020 2020 2020 100664 12
+00000040: 2020 600a 736f 6d65 5f77 6f72 6473 320a `.some_words2.
- But after either this one-line patch...
--- a/diffoscope/comparators/__init__.py
+++ b/diffoscope/comparators/__init__.py
@@ -55,7 +55,7 @@ class ComparatorManager(object):
('elf.ElfFile',),
('macho.MachoFile',),
('fsimage.FsImageFile',),
- ('elf.StaticLibFile',),
+# ('elf.StaticLibFile',),
('llvm.LlvmBitCodeFile',),
('sqlite.Sqlite3Database',),
('wasm.WasmFile',),
... OR renaming testN.a files to testN.a.anything (!!), OR comparing .tar OR .cpio containers instead of .a archives, beautiful diff:
--- text1.a
+++ text2.a
├── file list
│ @@ -1 +1 @@
- -rw-rw-r-- 0 1001 1004 11 2019-08-30 16:43:48.000000 text_file
+ -rw-rw-r-- 0 1001 1004 12 2019-08-30 17:11:18.000000 text_file
├── text_file
│ @@ -1 +1 @@
-some_words
+some_words2
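The commands used to build text1.a and text2.a are not shown. As a sketch for reproducing Test 1 without depending on a particular ar implementation, the following writes a comparable pair by hand in the GNU-style layout visible in the hexdump above. The function names `ar_bytes` and `write_pair` are hypothetical; the timestamps, uid, gid and mode are copied from that hexdump.

```python
import os


def ar_bytes(members, mtime, uid=1001, gid=1004, mode=0o100664):
    """Serialize (name, payload) pairs as a GNU-style ar archive."""
    out = bytearray(b"!<arch>\n")
    for name, payload in members:
        if len(name) > 15:
            raise ValueError("long names need a '//' table, not handled here")
        # 60-byte header: name, mtime, uid, gid, octal mode, size, end marker.
        header = "{:<16}{:<12}{:<6}{:<6}{:<8o}{:<10}`\n".format(
            name + "/", mtime, uid, gid, mode, len(payload)
        ).encode("ascii")
        out += header + payload
        if len(payload) % 2:
            out += b"\n"  # member data is padded to an even size
    return bytes(out)


def write_pair(directory):
    """Write text1.a and text2.a, each holding one short ASCII member."""
    pairs = {
        "text1.a": (1567183428, b"some_words\n"),
        "text2.a": (1567185078, b"some_words2\n"),
    }
    for filename, (mtime, payload) in pairs.items():
        with open(os.path.join(directory, filename), "wb") as fh:
            fh.write(ar_bytes([("text_file", payload)], mtime))


if __name__ == "__main__":
    write_pair(".")
```

Running the script and then `diffoscope text1.a text2.a` should exercise the same code path as the report above, with or without the one-line patch applied.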
Test 2: one short ELF file + one short ASCII file
- Version 121-16-g2f101b8b1b75 falls back on hexdump. Even the members' metadata is lost:
--- 1mix.a
+++ 2mix.a
│┄ Command `readelf --wide --section-headers 1mix.a` exited with return code 1. Standard output:
│┄ File: 1mix.a(return42.o)
│┄ There are 7 section headers, starting at offset 0x128:
│┄
│┄ Section Headers:
│┄ [Nr] Name Type Address Off Size ES Flg Lk Inf Al
│┄ [ 0] [...]
@@ -1,21 +1,21 @@
00000000: 213c 6172 6368 3e0a 2f20 2020 2020 2020 !<arch>./
00000010: 2020 2020 2020 2020 3135 3637 3138 3334 15671834
-00000020: 3436 2020 3020 2020 2020 3020 2020 2020 46 0 0
+00000020: 3533 2020 3020 2020 2020 3020 2020 2020 53 0 0
00000030: 3020 2020 2020 2020 3138 2020 2020 2020 0 18
00000040: 2020 600a 0000 0001 0000 0056 7265 7475 `........Vretu
00000050: 726e 3432 0000 7265 7475 726e 3432 2e6f rn42..return42.o
-00000060: 2f20 2020 2020 3135 3637 3138 3332 3930 / 1567183290
+00000060: 2f20 2020 2020 3135 3637 3138 3333 3637 / 1567183367
00000070: 2020 3130 3031 2020 3130 3034 2020 3130 1001 1004 10
00000080: 3036 3634 2020 3734 3420 2020 2020 2020 0664 744
00000090: 600a 7f45 4c46 0201 0100 0000 0000 0000 `..ELF..........
000000a0: 0000 0100 3e00 0100 0000 0000 0000 0000 ....>...........
000000b0: 0000 0000 0000 0000 0000 2801 0000 0000 ..........(.....
000000c0: 0000 0000 0000 4000 0000 0000 4000 0700 ......@.....@...
-000000d0: 0600 5548 89e5 b82a 0000 005d c300 0000 ..UH...*...]....
+000000d0: 0600 5548 89e5 b82b 0000 005d c300 0000 ..UH...+...]....
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0100 0000 0400 ................
00000100: f1ff 0000 0000 0000 0000 0000 0000 0000 ................
- Successful disassembly diff after the same one-line patch that disables elf.StaticLibFile, OR renaming testN.a files to testN.a.anything, OR using .tar files OR using .cpio files:
--- 1mix.a
+++ 2mix.a
├── file list
│ @@ -1,3 +1,3 @@
----------- 0 0 0 18 2019-08-30 16:44:06.000000 /
--rw-rw-r-- 0 1001 1004 744 2019-08-30 16:41:30.000000 return42.o
+---------- 0 0 0 18 2019-08-30 16:44:13.000000 /
+-rw-rw-r-- 0 1001 1004 744 2019-08-30 16:42:47.000000 return42.o
│ -rw-rw-r-- 0 1001 1004 11 2019-08-30 16:43:48.000000 text_file
├── return42.o
│ ├── objdump --line-numbers --disassemble --demangle --reloc --section=.text {}
│ │ @@ -3,10 +3,10 @@
│ │
│ │ Disassembly of section .text:
│ │
│ │ 0000000000000000 <return42>:
│ │ return42():
│ │ 0: 55 push %rbp
│ │ 1: 48 89 e5 mov %rsp,%rbp
- 4: b8 2a 00 00 00 mov $0x2a,%eax
+ 4: b8 2b 00 00 00 mov $0x2b,%eax
│ │ 9: 5d pop %rbp
│ │ a: c3 retq
Test 3: while very popular, the ELF format is not universal. Here's a macOS example with version 121-16-g2f101b8b1b75 and one small Mach-O object file:
--- 1.a
+++ 2.a
│┄ 'readelf' not available in path. Falling back to binary comparison.
@@ -1,18 +1,18 @@
00000000: 213c 6172 6368 3e0a 2331 2f32 3020 2020 !<arch>.#1/20
00000010: 2020 2020 2020 2020 3135 3637 3132 3537 15671257
-00000020: 3334 2020 3530 3220 2020 3230 2020 2020 34 502 20
+00000020: 3438 2020 3530 3220 2020 3230 2020 2020 48 502 20
00000030: 3130 3036 3434 2020 3434 2020 2020 2020 100644 44
00000040: 2020 600a 5f5f 2e53 594d 4445 4620 534f `.__.SYMDEF SO
00000050: 5254 4544 0000 0000 0800 0000 0000 0000 RTED............
00000060: 7000 0000 0800 0000 5f6d 6169 6e00 0000 p......._main...
00000070: 2331 2f31 3220 2020 2020 2020 2020 2020 #1/12
-00000080: 3135 3637 3132 3436 3738 2020 3530 3220 1567124678 502
+00000080: 3135 3637 3132 3435 3930 2020 3530 3220 1567124590 502
00000090: 2020 3230 2020 2020 3130 3036 3434 2020 20 100644
-000000a0: 3131 3536 2020 2020 2020 600a 312e 6f00 1156 `.1.o.
+000000a0: 3131 3536 2020 2020 2020 600a 322e 6f00 1156 `.2.o.
000000b0: 0000 0000 0000 0000 cffa edfe 0700 0001 ................
000000c0: 0300 0000 0100 0000 0400 0000 0802 0000 ................
@@ -51,15 +51,15 @@
00000370: 55c0 488d 3d69 0000 0089 45b4 b000 e800 U.H.=i....E.....
00000380: 0000 0048 8b0d 0000 0000 488b 0948 8b55 ...H......H..H.U
-00000390: f848 39d1 8945 b00f 850b 0000 00b8 0300 .H9..E..........
+00000390: f848 39d1 8945 b00f 850b 0000 00b8 0200 .H9..E..........
000003a0: 0000 4883 c450 5dc3 e800 0000 000f 0b73 ..H..P]........s
000003b0: 697a 656f 6628 6170 293d 257a 640a 0061 izeof(ap)=%zd..a
000003c0: 7272 3d25 702c 2061 703d 2570 0a00 257a rr=%p, ap=%p..%z
...
Beautiful disassembly diff on macOS when storing the same Mach-O files in .tar or .cpio archives instead:
--- 1.cpio
+++ 2.cpio
├── file list
│ @@ -1 +1 @@
--rw-r--r-- 1 502 20 1140 2019-08-30 00:24:38.000000 1.o
+-rw-r--r-- 1 502 20 1140 2019-08-30 00:23:10.000000 2.o
│ --- 1.o
├── +++ 2.o
│ ├── otool -arch x86_64 -tdvV {}
│ │┄ Code for architecture x86_64
│ │ @@ -38,13 +38,13 @@
│ │ 000000000000009e callq _printf
│ │ 00000000000000a3 movq ___stack_chk_guard(%rip), %rcx
│ │ 00000000000000aa movq _main(%rcx), %rcx
│ │ 00000000000000ad movq -0x8(%rbp), %rdx
│ │ 00000000000000b1 cmpq %rdx, %rcx
│ │ 00000000000000b4 movl %eax, -0x50(%rbp)
│ │ 00000000000000b7 jne 0xc8
-00000000000000bd movl $0x3, %eax
+00000000000000bd movl $0x2, %eax
│ │ 00000000000000c2 addq $0x50, %rsp
│ │ 00000000000000c6 popq %rbp
│ │ 00000000000000c7 retq
│ │ 00000000000000c8 callq ___stack_chk_fail
│ │ 00000000000000cd ud2
Note that the simpler workarounds don't work yet on macOS; more portability work will be required after elf.StaticLibFile is removed.
Other examples and/or more background
- d3c7ac8e: Add support to Difference.from_command_exc and frie...
- strip-nondeterminism!4 (closed): ar.pm: Don't corrupt tables of symbols and long filenames
- 16d519a9: Revert "Don't assume all files called ".a" are ELF binaries.
- 63ce5bf2: TODO: this would also be useful for Go archives. Currently those are handled by StaticLibFile, but then readelf complains with "Error: Not an ELF file".