BUGFIX: more gracefully handle sockets or named pipes within the scope of the comparison
Diffoscope versions through 195 (and the current git head) misbehave when recursing through a directory that contains UNIX domain sockets or named pipes. If the socket/pipe is not currently connected, or the diffoscope process doesn't have appropriate permissions, you'll get a crash (in "diffoscope/comparators/utils/file.py", line 103, in guess_file_type) when diffoscope attempts to open or read the file. Furthermore, if the socket/pipe is connected but no data or producer is ready, diffoscope can hang indefinitely waiting on a synchronous read, which wouldn't happen with a regular file.
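For anyone who wants to see the underlying OS behavior outside of diffoscope, here is a minimal sketch (not code from diffoscope or from the patch; `probe_open` is a hypothetical helper name). It attempts a non-blocking read-only open and reports the result. On Linux, opening a socket path this way typically fails with ENXIO, while a plain blocking open of a FIFO with no writer would simply wait.

```python
import errno
import os


def probe_open(path):
    """Try a non-blocking, read-only open and report what happened.

    Hypothetical helper for illustration only; not part of diffoscope.
    """
    try:
        # O_NONBLOCK makes opening a FIFO return immediately even when no
        # writer is attached; without it the open would block.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        # Report the symbolic errno name (e.g. "ENOENT", "EACCES").
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "opened"
```

Calling `probe_open()` on a socket path versus a FIFO path shows the two failure modes described above: an error on open for the former, and a read that has nothing to return for the latter.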
Intentionally trying to 'diff' a socket doesn't sound like something reasonable users often do, but it's very easy to hit this with a directory diff, and the only workaround is to manually exclude the pipe/socket files. The attached patch adds socket/FIFO recognition to diffoscope, including for supported archive types that might embed a named pipe. It defines a 'match' between sockets/FIFOs as having the same type (socket or FIFO) and the same apparent filesystem endpoint. All tests pass except for test_epub, which is also broken on my machine at the git head on which this patch is based (i.e. the test fails even before the patch is applied).
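To illustrate the idea (this is a rough sketch, not the contents of the attached patch), the type check can be done from lstat() metadata alone, without ever opening the file. The function names below are hypothetical, and treating "same apparent filesystem endpoint" as "same (device, inode) pair" is my assumption for the sake of the example.

```python
import os
import stat


def special_kind(path):
    """Return "socket", "fifo", or None, using metadata only (no open/read)."""
    mode = os.lstat(path).st_mode
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "fifo"
    return None


def same_special(path_a, path_b):
    """Hypothetical 'match' rule: same kind and same (device, inode).

    ASSUMPTION: "same endpoint" is interpreted as identical st_dev/st_ino.
    """
    kind_a, kind_b = special_kind(path_a), special_kind(path_b)
    if kind_a is None or kind_a != kind_b:
        return False
    st_a, st_b = os.lstat(path_a), os.lstat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

Because only lstat() is used, neither function can block on a pipe with no writer or fail on an unconnected socket.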
One very minor caveat: since diffoscope (after applying the patch) no longer tries to access the data stream for a socket/FIFO, it's not possible to use it to compare the socket/FIFO data stream against the contents of an ordinary file (as unpatched diffoscope can do if there happens to be an active data producer on the other end of the socket/pipe). I think this is an extremely rare use case, and anyone who wants to do that (e.g. for testing an IPC protocol or server) can just use regular diff, so it's not worth any extra codebase complexity to support. Still, if anyone feels it's important, I'm happy to add a command-line flag to force actual socket/pipe reads instead of only looking at their metadata, and/or to automatically invoke that behavior when the socket/pipe is manually specified on the command line (as opposed to being encountered while recursing through a directory). For "sockets" stored within an archive format (waiting to be extracted into real socket endpoints), there generally won't be an active IPC channel, so we'd have to stick with the metadata comparison.
diffoscope-handle-sockets.patch
New file (for diffoscope/comparators): socket_or_fifo.py