
add specialize_as(), use it to speed up .smali comparison in APKs

FC (Fay) Stegerman requested to merge obfusk/diffoscope:smali-fix into master

Possible fix for #324.

NB: this does not currently account for files in an APK that have the extension .smali but are not text files. I've never encountered any and I see no reason for them to be present, but theoretically they could be.


To summarise:

  • as part of ApkContainer, diffoscope calls apktool which decompiles the classes in the .dex files in the APK to smali{,_classesN}/**/*.smali files;
  • these are text files and there are often thousands of them;
  • using libmagic to identify the file type of all these files (as happens automatically via .recognizes()) takes minutes;
  • we know we don't need this identification step (and can safely skip it) since these files are generated by diffoscope itself using apktool.
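
For illustration, the apktool output layout described above can be recognised from the member path alone, without inspecting file contents. This is only a sketch: `SMALI_PATH` and `is_apktool_smali()` are hypothetical names, and the regular expression is my reading of the `smali{,_classesN}/**/*.smali` pattern, not necessarily what the MR uses.

```python
import re

# Hypothetical pattern (an assumption, not taken from the MR): a top-level
# "smali" or "smali_classesN" directory, any number of subdirectories,
# and a file name ending in ".smali".
SMALI_PATH = re.compile(r"^smali(_classes\d+)?/(.+/)?[^/]+\.smali$")


def is_apktool_smali(member_name: str) -> bool:
    """Return True if the path looks like a smali file generated by apktool."""
    return SMALI_PATH.match(member_name) is not None
```

A check like this is cheap compared with a libmagic call per file, which is the point of the optimisation.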

This MR:

  • splits off specialize_as() from try_recognize() to allow us to explicitly specialize a file in those rare cases we need to;
  • modifies specialize()
    • to perform the isinstance() check on all ComparatorManager().classes before running .recognizes() on any of them (instead of just before calling .recognizes() on each class in turn);
    • which IMO makes more sense regardless of this optimisation (i.e. avoiding libmagic via .recognizes()): currently, a file already specialized as a BarFile is not returned as-is but is also turned into a FooFile if FooFile comes earlier in the list than BarFile and .recognizes() the file;
  • uses specialize_as() in ApkContainer.get_member() to explicitly specialize smali{,_classesN}/**/*.smali ArchiveMembers as TextFile to avoid the libmagic call that is for these specific cases both unnecessary and expensive;
  • does not affect any other files that happen to have the extension .smali or rely on the order of ComparatorManager().classes.
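
The changes listed above can be sketched roughly as follows. This is a simplified stand-in, not diffoscope's actual code: the `File`, `FooFile`, `BarFile` and `TextFile` classes, the `CLASSES` list (standing in for `ComparatorManager().classes`), the toy `.recognizes()` rules and the `__class__` reassignment are all assumptions made for illustration.

```python
class File:
    def __init__(self, name):
        self.name = name


class FooFile(File):
    @classmethod
    def recognizes(cls, file):
        # Stand-in for an expensive libmagic-based check.
        return "foo" in file.name


class BarFile(File):
    @classmethod
    def recognizes(cls, file):
        return "bar" in file.name


class TextFile(File):
    @classmethod
    def recognizes(cls, file):
        return file.name.endswith(".txt")


# Stand-in for ComparatorManager().classes; order matters.
CLASSES = [FooFile, BarFile, TextFile]


def specialize_as(cls, file):
    """Explicitly specialize a file as cls, without calling .recognizes()."""
    file.__class__ = cls
    return file


def specialize(file):
    # First pass: if the file is already an instance of any known class,
    # return it as-is before any .recognizes() call is made.
    for cls in CLASSES:
        if isinstance(file, cls):
            return file
    # Second pass: fall back to recognition in list order.
    for cls in CLASSES:
        if cls.recognizes(file):
            return specialize_as(cls, file)
    return file


# A file explicitly specialized as BarFile stays a BarFile, even though
# FooFile is earlier in the list and would also recognize it.
f = specialize(specialize_as(BarFile, File("foobar")))

# A smali member can be specialized as TextFile without any recognition step.
smali = specialize(specialize_as(TextFile, File("smali/com/example/Foo.smali")))
```

With the previous per-class ordering of the isinstance() check, `f` in this sketch would have been handed to `FooFile.recognizes()` first and re-specialized.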
Edited by FC (Fay) Stegerman
