APK diff is slow b/c libmagic takes minutes to identify 40k .smali files
Edit: update
Running file (or using magic from Python like diffoscope does) on all 40k .smali files takes just under two minutes.
Twice that -- since we have two APKs -- is pretty close to the overhead we see.
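For anyone who wants to reproduce that measurement, here is a minimal Python sketch. It assumes the file command is on PATH and that apktool's output has already been unpacked to a directory. The function names and the apktool-out path are only illustrative and are not part of diffoscope. It spawns one file process per path, so it will likely overstate the cost compared with calling libmagic in-process.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def identify_with_file(path: Path) -> str:
    """Return the description printed by `file --brief` for one path."""
    result = subprocess.run(
        ["file", "--brief", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def smali_files(root: Path) -> List[Path]:
    """Collect every .smali file below root (hypothetical apktool output dir)."""
    return sorted(root.rglob("*.smali"))


def time_identification(
    paths: Iterable[Path],
    identify: Callable[[Path], str] = identify_with_file,
) -> Tuple[int, float]:
    """Identify every path and return (number of files, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for path in paths:
        identify(path)
        count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # "apktool-out" is a placeholder for wherever apktool wrote its output.
    count, seconds = time_identification(smali_files(Path("apktool-out")))
    print(f"identified {count} files in {seconds:.1f}s")
```

The identify parameter is there so the same loop can time a different identification method (for example an in-process libmagic binding) without changing the rest.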
Original: APK diff is (unnecessarily) slow because it decompiles classes in .dex files twice
I was using diffoscope on two APKs that differ only in classes8.dex.
It seems to be wasting a significant amount of time somewhere:
$ time diffoscope --text diff-apk.txt --text-color always a.apk b.apk
real 9m35.038s
user 15m58.099s
sys 0m41.146s
$ mkdir A B
$ unzip -d A a.apk classes8.dex
$ unzip -d B b.apk classes8.dex
$ time diffoscope --text diff-dex.txt --text-color always A/classes8.dex B/classes8.dex
real 2m31.845s
user 8m26.106s
sys 0m25.237s
Running apktool only takes 30 seconds, so that's not the cause; it's what happens afterwards.
My hypothesis is that the difference is caused by comparing the 42228 .smali files generated by apktool (for this particular APK) as well.
So the issue is that apktool decompiles the .dex files into .smali files, whereas we also convert the .dex into a .jar using enjarify and then subsequently decompile the .class files in the .jar with procyon.
I can't speak for other users, but I personally don't see the value in performing the .smali comparison as well in this case.
So I would suggest ignoring the .smali files (I don't know if you can tell apktool not to generate them) if enjarify and procyon are available, at least by default.
I can't rule out that it could be useful to have both in some cases (since the different methods of decompilation do not produce identical output), so being able to opt in to the double comparison does seem useful.
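To make the suggestion concrete, here is a rough Python sketch of the default I have in mind. It is not diffoscope's actual code: the function names, the compare_smali flag, and the assumption that the tools are installed as executables named enjarify and procyon are all mine.

```python
import shutil
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Optional

# Assumed executable names; adjust if an installation uses different ones.
JAVA_DECOMPILER_TOOLS = ("enjarify", "procyon")


def java_decompilation_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """True if every tool needed for the .dex -> .jar -> Java path is found."""
    return all(which(tool) is not None for tool in JAVA_DECOMPILER_TOOLS)


def members_to_compare(
    members: Iterable[str],
    compare_smali: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Drop .smali members unless the user opted in or the Java path is missing.

    compare_smali is a hypothetical opt-in switch for the double comparison.
    """
    members = list(members)
    if compare_smali or not java_decompilation_available(which):
        return members
    return [m for m in members if PurePosixPath(m).suffix != ".smali"]
```

With both tools present, the .smali files are skipped by default; passing compare_smali=True restores the double comparison, and if either tool is missing the .smali comparison stays on as the fallback.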