
Remove elf.StaticLibFile

The primary purpose of .a files is to implement "static libraries" which are for the most part containers of .o/.obj/etc. "object files". HOWEVER this venerable archive format is generic and not limited to that specific purpose.

(emphasis mine)

For convenience, the readelf utility supports .a files: it can recursively examine members of .a archives. This of course makes sense only when the .a file argument is actually a static ELF library made of ELF members. The .a archive format is certainly not limited to ELF files; some operating systems and toolchains use alternatives to ELF.

As of version 121-16-g2f101b8b1b75, diffoscope invokes readelf on .a files (elf.StaticLibFile). This is:

  • harmful because it runs even when the .a file is not (just) a static ELF library.
  • unnecessary because diffoscope is already very capable of recursing into archive files. See multiple demonstrations below.
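
To make the first point concrete, here is a minimal Python sketch of a check that tells a static ELF library apart from a generic .a archive before any ELF-specific tool is run. It is not diffoscope code: the names `iter_ar_members` and `is_elf_static_library` are hypothetical, and the parser assumes only the common ar layout visible in the hexdumps in this issue (an 8-byte `!<arch>\n` magic followed by 60-byte member headers ending in a backquote and a newline), plus the BSD `#1/N` long-name convention seen in the macOS dump. GNU long-name references of the form `/N` are returned unresolved.

```python
from typing import Iterator, Tuple

AR_MAGIC = b"!<arch>\n"
ELF_MAGIC = b"\x7fELF"
# Header fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) end(2)
HEADER_SIZE = 60


def iter_ar_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, payload) for each member of an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {offset}")
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        start = offset + HEADER_SIZE
        payload = data[start:start + size]
        # BSD/macOS variant: "#1/N" means the real name is stored in the
        # first N bytes of the member data, padded with NUL bytes.
        if name.startswith("#1/"):
            name_len = int(name[3:])
            name = payload[:name_len].rstrip(b"\0").decode("utf-8", "replace")
            payload = payload[name_len:]
        yield name, payload
        # Member data is padded to an even offset.
        offset = start + size + (size % 2)


def is_elf_static_library(data: bytes) -> bool:
    """Return True only if every regular member starts with the ELF magic."""
    saw_object = False
    for name, payload in iter_ar_members(data):
        # Skip symbol tables and the GNU long-name table; they are not objects.
        if name in ("/", "//", "/SYM64/") or name.startswith("__.SYMDEF"):
            continue
        if not payload.startswith(ELF_MAGIC):
            return False
        saw_object = True
    return saw_object
```

With a guard like this, the archives from Test 1 (text only), Test 2 (ELF plus text) and Test 3 (Mach-O) would all be rejected as ELF static libraries and could be left to the generic archive handling instead.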

Test 1: one short ASCII file

  • Version 121-16-g2f101b8b1b75 falls back to a hexdump. Even the members' metadata is lost:
--- text1.a
+++ text2.a
│┄ Command `readelf --wide --section-headers text1.a` exited with return code 1. (No output)
@@ -1,5 +1,5 @@
 00000000: 213c 6172 6368 3e0a 7465 7874 5f66 696c  !<arch>.text_fil
-00000010: 652f 2020 2020 2020 3135 3637 3138 3334  e/      15671834
-00000020: 3238 2020 3130 3031 2020 3130 3034 2020  28  1001  1004  
-00000030: 3130 3036 3634 2020 3131 2020 2020 2020  100664  11      
-00000040: 2020 600a 736f 6d65 5f77 6f72 6473 0a0a    `.some_words..
+00000010: 652f 2020 2020 2020 3135 3637 3138 3530  e/      15671850
+00000020: 3738 2020 3130 3031 2020 3130 3034 2020  78  1001  1004  
+00000030: 3130 3036 3634 2020 3132 2020 2020 2020  100664  12      
+00000040: 2020 600a 736f 6d65 5f77 6f72 6473 320a    `.some_words2.
  • But after either applying this one-line patch...
--- a/diffoscope/comparators/__init__.py
+++ b/diffoscope/comparators/__init__.py
@@ -55,7 +55,7 @@ class ComparatorManager(object):
         ('elf.ElfFile',),
         ('macho.MachoFile',),
         ('fsimage.FsImageFile',),
-        ('elf.StaticLibFile',),
+#        ('elf.StaticLibFile',),
         ('llvm.LlvmBitCodeFile',),
         ('sqlite.Sqlite3Database',),
         ('wasm.WasmFile',),

... OR renaming testN.a files to testN.a.anything (!!), OR comparing .tar OR .cpio containers instead of .a archives, the result is a beautiful diff:

--- text1.a
+++ text2.a
├── file list
│ @@ -1 +1 @@
- -rw-rw-r--   0     1001     1004       11 2019-08-30 16:43:48.000000 text_file
+ -rw-rw-r--   0     1001     1004       12 2019-08-30 17:11:18.000000 text_file
├── text_file
│ @@ -1 +1 @@
-some_words
+some_words2
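
For anyone who wants to reproduce Test 1 without an `ar` binary at hand, the following Python sketch rebuilds the two archives byte for byte from the values visible in the hexdump above (member name, timestamps, uid 1001, gid 1004, mode 100664). The helper names `ar_member`, `build_ar` and `write_test1_archives` are hypothetical, and the writer handles only short member names in the GNU style (name followed by `/`), which is all this test needs.

```python
from pathlib import Path


def ar_member(name: str, payload: bytes, mtime: int, uid: int, gid: int, mode: int) -> bytes:
    """Serialize one archive member: 60-byte header, data, optional padding."""
    ar_name = name + "/"  # GNU-style terminator for short names
    header = (
        f"{ar_name:<16}{mtime:<12}{uid:<6}{gid:<6}{mode:<8o}{len(payload):<10}"
    ).encode("ascii") + b"`\n"
    # Member data is padded with a newline so the next header starts at an even offset.
    padding = b"\n" if len(payload) % 2 else b""
    return header + payload + padding


def build_ar(members) -> bytes:
    """Concatenate the global magic and all serialized members."""
    return b"!<arch>\n" + b"".join(ar_member(*m) for m in members)


# Values copied from the hexdump of text1.a and text2.a in Test 1.
TEXT1 = build_ar([("text_file", b"some_words\n", 1567183428, 1001, 1004, 0o100664)])
TEXT2 = build_ar([("text_file", b"some_words2\n", 1567185078, 1001, 1004, 0o100664)])


def write_test1_archives(directory: str) -> None:
    """Write text1.a and text2.a into the given directory."""
    Path(directory, "text1.a").write_bytes(TEXT1)
    Path(directory, "text2.a").write_bytes(TEXT2)
```

Calling `write_test1_archives(".")` and then running diffoscope on the two files should allow the behaviour described in this test to be checked against a given diffoscope version.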

Test 2: one short ELF file + one short ASCII file

  • Version 121-16-g2f101b8b1b75 falls back to a hexdump. Even the members' metadata is lost:
--- 1mix.a
+++ 2mix.a
│┄ Command `readelf --wide --section-headers 1mix.a` exited with return code 1. Standard output:
│┄     File: 1mix.a(return42.o)
│┄     There are 7 section headers, starting at offset 0x128:
│┄
│┄     Section Headers:
│┄       [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
│┄       [ 0]                 [...]
@@ -1,21 +1,21 @@
 00000000: 213c 6172 6368 3e0a 2f20 2020 2020 2020  !<arch>./       
 00000010: 2020 2020 2020 2020 3135 3637 3138 3334          15671834
-00000020: 3436 2020 3020 2020 2020 3020 2020 2020  46  0     0     
+00000020: 3533 2020 3020 2020 2020 3020 2020 2020  53  0     0     
 00000030: 3020 2020 2020 2020 3138 2020 2020 2020  0       18      
 00000040: 2020 600a 0000 0001 0000 0056 7265 7475    `........Vretu
 00000050: 726e 3432 0000 7265 7475 726e 3432 2e6f  rn42..return42.o
-00000060: 2f20 2020 2020 3135 3637 3138 3332 3930  /     1567183290
+00000060: 2f20 2020 2020 3135 3637 3138 3333 3637  /     1567183367
 00000070: 2020 3130 3031 2020 3130 3034 2020 3130    1001  1004  10
 00000080: 3036 3634 2020 3734 3420 2020 2020 2020  0664  744       
 00000090: 600a 7f45 4c46 0201 0100 0000 0000 0000  `..ELF..........
 000000a0: 0000 0100 3e00 0100 0000 0000 0000 0000  ....>...........
 000000b0: 0000 0000 0000 0000 0000 2801 0000 0000  ..........(.....
 000000c0: 0000 0000 0000 4000 0000 0000 4000 0700  ......@.....@...
-000000d0: 0600 5548 89e5 b82a 0000 005d c300 0000  ..UH...*...]....
+000000d0: 0600 5548 89e5 b82b 0000 005d c300 0000  ..UH...+...]....
 000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000000f0: 0000 0000 0000 0000 0000 0100 0000 0400  ................
 00000100: f1ff 0000 0000 0000 0000 0000 0000 0000  ................
  • Successful disassembly diff after the same one-line patch that disables elf.StaticLibFile, OR renaming testN.a files to testN.a.anything, OR using .tar files, OR using .cpio files:
--- 1mix.a
+++ 2mix.a
├── file list
│ @@ -1,3 +1,3 @@
-----------   0        0        0       18 2019-08-30 16:44:06.000000 /
--rw-rw-r--   0     1001     1004      744 2019-08-30 16:41:30.000000 return42.o
+----------   0        0        0       18 2019-08-30 16:44:13.000000 /
+-rw-rw-r--   0     1001     1004      744 2019-08-30 16:42:47.000000 return42.o
│  -rw-rw-r--   0     1001     1004       11 2019-08-30 16:43:48.000000 text_file
├── return42.o
│ ├── objdump --line-numbers --disassemble --demangle --reloc --section=.text {}
│ │ @@ -3,10 +3,10 @@
│ │  
│ │  Disassembly of section .text:
│ │  
│ │  0000000000000000 <return42>:
│ │  return42():
│ │     0:	55                   	push   %rbp
│ │     1:	48 89 e5             	mov    %rsp,%rbp
-   4:	b8 2a 00 00 00       	mov    $0x2a,%eax
+   4:	b8 2b 00 00 00       	mov    $0x2b,%eax
│ │     9:	5d                   	pop    %rbp
│ │     a:	c3                   	retq
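
The renaming workaround mentioned in Tests 1 and 2 can be scripted without touching the original files. The sketch below is only an illustration under stated assumptions: the helper names `copy_with_suffix`, `diffoscope_command` and `run_workaround` are hypothetical, the `.anything` suffix is arbitrary, and the only diffoscope usage assumed is the plain two-file invocation with `diffoscope` available on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def copy_with_suffix(path, dest_dir, suffix=".anything"):
    """Copy a file into dest_dir under its own name plus an extra suffix."""
    src = Path(path)
    dst = Path(dest_dir) / (src.name + suffix)
    shutil.copyfile(src, dst)
    return dst


def diffoscope_command(first, second, dest_dir):
    """Build the diffoscope command line for the renamed copies."""
    return [
        "diffoscope",
        str(copy_with_suffix(first, dest_dir)),
        str(copy_with_suffix(second, dest_dir)),
    ]


def run_workaround(first, second):
    """Compare renamed copies of two .a files; returns diffoscope's exit code."""
    with tempfile.TemporaryDirectory() as work_dir:
        cmd = diffoscope_command(first, second, work_dir)
        # check=False: a non-zero exit code is passed back to the caller
        # instead of raising an exception.
        return subprocess.run(cmd, check=False).returncode
```

For example, `run_workaround("1mix.a", "2mix.a")` would compare `1mix.a.anything` with `2mix.a.anything`, so the file names no longer end in `.a`.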

Test 3: while very popular, the ELF format is not universal. Here's a macOS example with version 121-16-g2f101b8b1b75 and one small Mach-O object file:

--- 1.a
+++ 2.a
│┄ 'readelf' not available in path. Falling back to binary comparison.
@@ -1,18 +1,18 @@
 00000000: 213c 6172 6368 3e0a 2331 2f32 3020 2020  !<arch>.#1/20   
 00000010: 2020 2020 2020 2020 3135 3637 3132 3537          15671257
-00000020: 3334 2020 3530 3220 2020 3230 2020 2020  34  502   20    
+00000020: 3438 2020 3530 3220 2020 3230 2020 2020  48  502   20    
 00000030: 3130 3036 3434 2020 3434 2020 2020 2020  100644  44      
 00000040: 2020 600a 5f5f 2e53 594d 4445 4620 534f    `.__.SYMDEF SO
 00000050: 5254 4544 0000 0000 0800 0000 0000 0000  RTED............
 00000060: 7000 0000 0800 0000 5f6d 6169 6e00 0000  p......._main...
 00000070: 2331 2f31 3220 2020 2020 2020 2020 2020  #1/12           
-00000080: 3135 3637 3132 3436 3738 2020 3530 3220  1567124678  502 
+00000080: 3135 3637 3132 3435 3930 2020 3530 3220  1567124590  502 
 00000090: 2020 3230 2020 2020 3130 3036 3434 2020    20    100644  
-000000a0: 3131 3536 2020 2020 2020 600a 312e 6f00  1156      `.1.o.
+000000a0: 3131 3536 2020 2020 2020 600a 322e 6f00  1156      `.2.o.
 000000b0: 0000 0000 0000 0000 cffa edfe 0700 0001  ................
 000000c0: 0300 0000 0100 0000 0400 0000 0802 0000  ................
@@ -51,15 +51,15 @@
 00000370: 55c0 488d 3d69 0000 0089 45b4 b000 e800  U.H.=i....E.....
 00000380: 0000 0048 8b0d 0000 0000 488b 0948 8b55  ...H......H..H.U
-00000390: f848 39d1 8945 b00f 850b 0000 00b8 0300  .H9..E..........
+00000390: f848 39d1 8945 b00f 850b 0000 00b8 0200  .H9..E..........
 000003a0: 0000 4883 c450 5dc3 e800 0000 000f 0b73  ..H..P]........s
 000003b0: 697a 656f 6628 6170 293d 257a 640a 0061  izeof(ap)=%zd..a
 000003c0: 7272 3d25 702c 2061 703d 2570 0a00 257a  rr=%p, ap=%p..%z
...

Beautiful disassembly diff on macOS when storing the same Mach-O files in .tar or .cpio archives instead:

--- 1.cpio
+++ 2.cpio
├── file list
│ @@ -1 +1 @@
--rw-r--r--   1      502       20     1140 2019-08-30 00:24:38.000000 1.o
+-rw-r--r--   1      502       20     1140 2019-08-30 00:23:10.000000 2.o
│   --- 1.o
├── +++ 2.o
│ ├── otool -arch x86_64 -tdvV {}
│ │┄ Code for architecture x86_64
│ │ @@ -38,13 +38,13 @@
│ │  000000000000009e	callq	_printf
│ │  00000000000000a3	movq	___stack_chk_guard(%rip), %rcx
│ │  00000000000000aa	movq	_main(%rcx), %rcx
│ │  00000000000000ad	movq	-0x8(%rbp), %rdx
│ │  00000000000000b1	cmpq	%rdx, %rcx
│ │  00000000000000b4	movl	%eax, -0x50(%rbp)
│ │  00000000000000b7	jne	0xc8
-00000000000000bd	movl	$0x3, %eax
+00000000000000bd	movl	$0x2, %eax
│ │  00000000000000c2	addq	$0x50, %rsp
│ │  00000000000000c6	popq	%rbp
│ │  00000000000000c7	retq
│ │  00000000000000c8	callq	___stack_chk_fail
│ │  00000000000000cd	ud2

Note that the simpler workarounds don't work yet on macOS; more portability work will be required after elf.StaticLibFile is removed.

Other examples and/or more background

  • d3c7ac8e Add support to Difference.from_command_exc and frie...
  • strip-nondeterminism!4 (closed) ar.pm: Don't corrupt tables of symbols and long filenames
  • 16d519a9 Revert "Don't assume all files called ".a" are ELF binaries.
  • 63ce5bf2 TODO: this would also be useful for Go archives. Currently those are handled by StaticLibFile, but then readelf complains with "Error: Not an ELF file".
Edited by Chris Lamb