New upstream version 0.0~git20161124~dcd2a9e

bin
build
# LibScout
LibScout is a light-weight and effective static analysis tool to detect third-party libraries in Android apps. The detection is resilient against<br>
common bytecode obfuscation techniques such as identifier renaming or code-based obfuscations such as reflection-based API hiding or control-flow randomization.<br>
LibScout requires the original library SDKs (compiled .jar/.aar files) to extract library profiles that can be used for detection on Android apps.
Unique features:
* Library detection resilient against many kinds of bytecode obfuscation
* Capability of pinpointing the exact library version (in some cases to a set of 2-3 candidate versions)
* Capability of handling dead-code elimination, by computing a similarity score against baseline SDKs
For technical details and large-scale evaluation results, please refer to our publication:<br>
> Reliable Third-Party Library Detection in Android and its Security Applications<br>
> https://www.infsec.cs.uni-saarland.de/~derr/publications/pdfs/derr_ccs16.pdf
For comments, feedback, etc. contact: Erik Derr [lastname@cs.uni-saarland.de]
## LibScout Repo Structure
<pre><code>
|_ build.xml (ant build file to generate runnable .jar)
|_ data
| |_ library-data.sqlite (library meta data)
| |_ library-profiles.zip (all library profiles)
| |_ app-version-codes.csv (app packages with valid version codes)
|_ lib
| pre-compiled WALA libs, Apache commons*, log4j, Android SDK
|_ logging
| |_ logback.xml (logback configuration file)
|_ scripts
| |_ mvn-central
| |_ mvn-central-crawler.py (script to retrieve complete library histories from mvn-central)
|_ src
source directory of LibScout (de/infsec/tpl). Includes some open-source,
third-party code to parse AXML resources / app manifests etc.
</code></pre>
## Getting Started
<ol>
<li>LibScout requires Java 1.7 or higher. If you're using OpenJDK, use either 1.7 <b>or</b> 1.9 (1.8 seems to have a bytecode verification bug).<br>
A runnable jar (build/LibScout.jar) can be generated with the ant build file build.xml.</li>
<li>LibScout has three modes of operation:<br>
<ol type="a">
<li>
Generate library profiles from original library SDKs:<br>
<pre>java -jar LibScout.jar -o profile -a lib/android-X.jar -x ${lib-dir/library.xml} ${lib-dir/lib.jar} </pre>
</li>
<li>
Detect libraries in apps using pre-generated profiles (log to directory + serialize results):<br>
<pre>java -jar LibScout.jar -o match -a lib/android-X.jar -p &lt;path-to-lib-profiles&gt; -s -d &lt;log-dir&gt; $someapp.apk </pre>
</li>
<li>
Generate a SQLite database from library profiles and serialized app stats:<br>
<pre>java -jar LibScout.jar -o db -p &lt;path-to-lib-profiles&gt; -s &lt;path-to-app-stats&gt; </pre>
</li>
</ol>
</li>
<li>
Some classes to start with:
<ul>
<li><b>de.infsec.tpl.TplCLI</b>: &nbsp;&nbsp; Starting class including CLI parsing and logging init</li>
<li><b>de.infsec.tpl.LibraryHandler</b>:&nbsp;&nbsp; Starting class to extract library profiles</li>
<li><b>de.infsec.tpl.LibraryIdentifier</b>:&nbsp;&nbsp; Code to match lib profiles and application bytecode</li>
<li><b>de.infsec.tpl.hash.HashTree</b>:&nbsp;&nbsp; main data structures used for profiles</li>
</ul>
</li>
<li><i>How to aggregate per-app results during large-scale evaluation?</i><br>
While the tool consumes one app at a time, it can serialize the app results to disk. Using
operation mode c), LibScout loads all app results to generate one convenient SQLite file<br>
(the DB structure can be found in class de.infsec.tpl.stats.SQLStats)
</li>
</ol>
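
For a large-scale run, modes (b) and (c) can be chained: one match invocation per app, then a single db invocation. The following is a minimal sketch that only assembles the command lines from the flags shown above and does not execute anything. The helper names (`match_command`, `db_command`, `plan_run`) and all paths are hypothetical placeholders. Where mode (b) writes its serialized results is not stated here, so `stats_dir` is an assumption you will need to adjust.

```python
from pathlib import Path


def match_command(jar, android_sdk, profiles_dir, log_dir, apk):
    """Assemble the mode (b) command line: match one app against library profiles."""
    return [
        "java", "-jar", str(jar),
        "-o", "match",
        "-a", str(android_sdk),
        "-p", str(profiles_dir),
        "-s",                      # serialize results, as in the example above
        "-d", str(log_dir),
        str(apk),
    ]


def db_command(jar, profiles_dir, stats_dir):
    """Assemble the mode (c) command line: build the SQLite database."""
    return [
        "java", "-jar", str(jar),
        "-o", "db",
        "-p", str(profiles_dir),
        "-s", str(stats_dir),      # assumed location of the serialized app stats
    ]


def plan_run(jar, android_sdk, profiles_dir, log_dir, stats_dir, apks):
    """Return one match command per APK, followed by a single db command."""
    commands = [match_command(jar, android_sdk, profiles_dir, log_dir, apk)
                for apk in sorted(apks)]
    commands.append(db_command(jar, profiles_dir, stats_dir))
    return commands


if __name__ == "__main__":
    # Hypothetical paths; adjust to your checkout and data layout.
    apks = Path("apps").glob("*.apk")
    for cmd in plan_run("build/LibScout.jar", "lib/android-23.jar",
                        "profiles", "logs", "app-stats", apks):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run` as-is. Keeping the commands as lists avoids shell quoting problems with APK file names that contain spaces.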
## Library Profiles
While we cannot make the original library SDKs publicly available for legal reasons, we provide the following:<br>
<ul>
<li>all library profiles (ready-to-use for detection in apps)&nbsp;&nbsp; [data/library-profiles.zip]</li>
<li>an accompanying SQLite DB with parsed library data (name, version, release date, ..)&nbsp;&nbsp; [data/library-data.sqlite]</li>
<li>a python script to automatically download complete version histories from maven-central
incl. config script&nbsp;&nbsp; [scripts/mvn-central/mvn-central-crawler.py]</li>
</ul>
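
The crawler's config file is not shown here. Judging from the keys that `mvn-central-crawler.py` reads (a top-level `libraries` list whose entries carry `name`, `category`, `comment`, `groupid` and `artefactid`), a matching `libraries.json` could be generated as in the sketch below. The example entry and the helper name `build_config` are illustrative assumptions, not content taken from the shipped config.

```python
import json

# Keys the crawler reads from every entry of the "libraries" list.
REQUIRED_KEYS = ("name", "category", "comment", "groupid", "artefactid")


def build_config(entries):
    """Validate entries and wrap them in the top-level "libraries" list."""
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(
                "entry {!r} lacks keys: {}".format(entry.get("name"), missing))
    return {"libraries": list(entries)}


if __name__ == "__main__":
    # Illustrative entry only; replace with the libraries you want to track.
    config = build_config([
        {
            "name": "OkHttp",
            "category": "Utilities",
            "comment": "",
            "groupid": "com.squareup.okhttp",
            "artefactid": "okhttp",
        },
    ])
    with open("libraries.json", "w") as handle:
        json.dump(config, handle, indent=4)
```

Validating the keys up front means a typo in the config surfaces before the crawler starts downloading, rather than as a `KeyError` halfway through a run.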
<project name="LibScout" default="build" basedir=".">
<property name="app.name" value="LibScout"/>
<property name="app.developer" value="Erik Derr"/>
<property name="main.class" value="de.infsec.tpl.TplCLI"/>
<property name="build.dir" location="build"/>
<property name="bin.dir" location="bin"/>
<property name="src.dir" location="src"/>
<property name="lib.dir" location="lib"/>
<!-- dependencies -->
<path id="myclasspath">
<fileset dir="lib">
<include name="**/*.jar"/>
</fileset>
</path>
<!-- compile target -->
<target name="compile" depends="clean" description="compile the source">
<mkdir dir="${bin.dir}"/>
<echo message="Using Java version ${ant.java.version}."/>
<javac srcdir="${src.dir}" destdir="${bin.dir}" optimize="true" debug="false" deprecation="off" includeantruntime="false">
<classpath refid="myclasspath"/>
</javac>
</target>
<!-- build tpl.jar file -->
<target name="build" depends="compile" description="build the jar file">
<mkdir dir="${build.dir}"/>
<!-- necessary for android xml parsing -->
<unjar src="${lib.dir}/android-23.jar" dest="${bin.dir}">
<patternset>
<include name="android/content/res/*.class" />
<include name="org/xmlpull/v1/*.class" />
<include name="android/util/AttributeSet.class" />
</patternset>
</unjar>
<unjar src="${lib.dir}/joana.api.jar" dest="${bin.dir}">
<patternset>
<exclude name="com/ibm/wala/cast/" />
<exclude name="edu/kit/joana/api/" />
<exclude name="edu/kit/joana/ui/" />
<exclude name="junit/" />
<exclude name="ch/" /> <!-- logback -->
<exclude name="net/"/> <!-- cscott -->
<exclude name="gnu/"/> <!-- trove -->
<exclude name="stubs/"/> <!-- stubs -->
<exclude name="tests/"/> <!-- tests -->
<exclude name="org/junit/" />
<exclude name="org/slf4j/" />
<exclude name="org/apache/commons/io/" />
</patternset>
</unjar>
<unjar src="${lib.dir}/wala-dalvik.jar" dest="${bin.dir}">
<patternset>
<include name="com/ibm/" />
<include name="com/google/common/base/*.class" />
<include name="org/" />
</patternset>
</unjar>
<unjar dest="${bin.dir}">
<fileset dir="${lib.dir}">
<include name="**/commons-cli-1.2.jar" />
<include name="**/commons-io-2.4.jar" />
<include name="**/sqlite-jdbc-3.7.15-SNAPSHOT.jar" />
</fileset>
<fileset dir="${lib.dir}/logging">
<include name="**/*.jar" />
</fileset>
</unjar>
<!-- Put the jar file into build dir -->
<jar jarfile="${build.dir}/LibScout.jar" basedir="${bin.dir}" compress="true">
<manifest>
<attribute name="Built-By" value="${app.developer}" />
<attribute name="Application" value="${app.name}" />
<attribute name="Main-Class" value="${main.class}" />
</manifest>
</jar>
</target>
<!-- clean target -->
<target name="clean" description="clean up">
<delete dir="${bin.dir}"/>
<delete dir="${build.dir}"/>
</target>
</project>
<configuration>
<!-- WALA dex frontend -->
<logger name="com.ibm.wala.dalvik" level="off"/>
<!-- LibScout log config -->
<logger name="de.infsec.tpl.LibraryHandler" level="info"/>
<logger name="de.infsec.tpl.LibraryIdentifier" level="info"/>
<logger name="de.infsec.tpl.profile.ProfileMatch" level="info"/>
<logger name="de.infsec.tpl.hash.HashTree" level="debug"/>
<logger name="de.infsec.tpl.utils.WalaUtils" level="info"/>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss} %-5level %-25logger{0} : %msg%n</pattern>
<!--<pattern>%d{HH:mm:ss} %-5level %-25logger{0} : %msg%n</pattern>-->
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.classic.sift.SiftingAppender">
<discriminator>
<key>appPath</key>
<defaultValue>./defaultApp</defaultValue>
</discriminator>
<sift>
<appender name="${appPath}" class="ch.qos.logback.core.FileAppender">
<file>${appPath}.log</file>
<append>false</append>
<layout class="ch.qos.logback.classic.PatternLayout">
<pattern>%d{HH:mm:ss} %-5level %-25logger{0} : %msg%n</pattern>
</layout>
</appender>
</sift>
</appender>
<root level="info">
<appender-ref ref="CONSOLE" />
<appender-ref ref="FILE" />
</root>
</configuration>
#!/usr/bin/python
#
# Copyright (c) 2015-2016 Erik Derr [derr@cs.uni-saarland.de]
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
# file except in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under
# the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the specific language governing
# permissions and limitations under the License.
#
#
# Crawler for libraries hosted at mvn central
# Retrieves jar|aar files along with some meta data
import json
import urllib2
import datetime
import os
import errno
import zipfile
import traceback
from retrying import retry # may require "pip install retrying"
## functions ##
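def unix2Date(unixTime):
    # the timestamp is given in milliseconds; drop the last three digits to get seconds
    unixTime = int(str(unixTime)[:-3])
    return datetime.datetime.fromtimestamp(unixTime).strftime('%d.%m.%Y')


def make_sure_path_exists(path):
    # create the directory, tolerating the case that it already exists
    try:
        os.makedirs(path)
    except OSError as exception:
        if exception.errno != errno.EEXIST:
            raise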
def write_library_description(fileName, libName, category, version, date, comment):
    make_sure_path_exists(os.path.dirname(fileName))

    # write lib description in xml format
    with open(fileName, "w") as desc:
        desc.write("<?xml version=\"1.0\"?>\n")
        desc.write("<library>\n")
        desc.write(" <!-- library name -->\n")
        desc.write(" <name>{}</name>\n".format(libName))
        desc.write("\n")
        desc.write(" <!-- Advertising, Analytics, Android, SocialMedia, Cloud, Utilities -->\n")
        desc.write(" <category>{}</category>\n".format(category))
        desc.write("\n")
        desc.write(" <!-- optional: version string -->\n")
        desc.write(" <version>{}</version>\n".format(version))
        desc.write("\n")
        desc.write(" <!-- optional: date (format: DD.MM.YYYY) -->\n")
        desc.write(" <releasedate>{}</releasedate>\n".format(date))
        desc.write("\n")
        desc.write(" <!-- optional: comment -->\n")
        desc.write(" <comment>{}</comment>\n".format(comment))
        desc.write("</library>\n")
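# retry up to 3 times, waiting 3 seconds between attempts, but only on URLError
@retry(stop_max_attempt_number=3, wait_fixed=3000,
       retry_on_exception=lambda e: isinstance(e, urllib2.URLError))
def urlopen_with_retry(URL):
    return urllib2.urlopen(URL)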
def downloadFile(targetDir, groupid, artefactid, version, filetype):
    make_sure_path_exists(os.path.dirname(targetDir + "/"))

    # assemble download URL
    baseURL = "http://search.maven.org/remotecontent?filepath="
    artefactid_r = artefactid.replace(".","/")
    groupid_r = groupid.replace(".","/")
    URL = baseURL + groupid_r + "/" + artefactid_r + "/"

    # sometimes it just returns the type "bundle", we then access the jar file
    if filetype == "bundle":
        filetype = "jar"

    fileName = artefactid_r + "-" + version + "." + filetype
    URL = URL + version + "/" + fileName

    # retrieve and save file
    targetFile = targetDir + "/" + fileName
    try:
        libFile = urllib2.urlopen(URL)
        with open(targetFile,'wb') as output:
            output.write(libFile.read())

        # if filetype is aar unzip classes.jar (since WALA currently does not handle aar's directly)
        if filetype == "aar":
            fh = open(targetFile, 'rb')
            z = zipfile.ZipFile(fh)
            for f in z.namelist():
                if f == "classes.jar":
                    z.extract(f, targetDir)
            fh.close()

        return 0
    except urllib2.HTTPError as e:
        print 'HTTPError = ' + str(e.code)
        return 1
    except urllib2.URLError as e:
        print 'URLError = ' + str(e.reason)
        return 1
    except Exception as excp:
        print 'Download failed (' + str(excp) + ')'
        return 1
def updateLibrary(libName, category, comment, groupId, artefactId):
    # replace all blanks with dash
    libName = libName.replace(" ", "-")
    print " # check library " + libName + " [" + category + "] (g:\"" + groupId + "\" AND a:\"" + artefactId + "\")"

    # join path components explicitly, since rootDir may lack a trailing separator
    baseDirName = os.path.join(rootDir, category, libName) + "/"
    make_sure_path_exists(os.path.dirname(baseDirName))

    # Assemble mvn central search URL and retrieve meta data
    try:
        mvnSearchURL = "http://search.maven.org/solrsearch/select?q=g:%22" + groupId + "%22+AND+a:%22" + artefactId + "%22&rows=100&core=gav"
        response = urllib2.urlopen(mvnSearchURL)
        data = json.loads(response.read())
    except urllib2.URLError as e:
        print 'URLError = ' + str(e.reason)
        return
    except Exception as excp:
        print 'Could not retrieve meta data for ' + libName + ' [SKIP] (' + str(excp) + ')'
        return

    # DEBUG: pretty print json
    #print json.dumps(data, indent=4, sort_keys=True)
    #print

    numberOfVersions = data["response"]["numFound"]
    print " - retrieved meta data for " + str(numberOfVersions) + " versions:"

    numberOfUpdates = 0
    if numberOfVersions > 0:
        for version in data["response"]["docs"]:
            # skip lib version if already existing
            if not os.path.isfile(baseDirName + version["v"] + "/" + libDescriptorFileName):
                numberOfUpdates += 1
                date = unix2Date(version["timestamp"])
                targetDir = baseDirName + version["v"]
                print " - update version: {} type: {} date: {} target-dir: {}".format(version["v"], version["p"], date, targetDir)

                result = downloadFile(targetDir, groupId, artefactId, version["v"], version["p"])
                if result == 0:
                    # write lib description
                    fileName = targetDir + "/" + "library.xml"
                    write_library_description(fileName, libName, category, version["v"], date, comment)

    if numberOfUpdates == 0:
        print " -> all versions up-to-date"
## Main functionality ##
inputFile = "libraries.json"
libDescriptorFileName = "library.xml"
rootDir = "lib-sdks" ### change this directory to your lib-sdks dir ###

print "== mvn central crawler =="

# load and iterate over lib json
with open(inputFile) as ifile:
    data = json.load(ifile)

# update each lib
for lib in data["libraries"]:
    updateLibrary(lib["name"], lib["category"], lib["comment"], lib["groupid"], lib["artefactid"])
/*
* Copyright 2008 Android4ME
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package android.content.res;
import java.io.IOException;
/**
* @author Dmitry Skiba
*
*/
class ChunkUtil {
public static final void readCheckType(IntReader reader,int expectedType) throws IOException {
int type=reader.readInt();
if (type!=expectedType) {
throw new IOException(
"Expected chunk of type 0x"+Integer.toHexString(expectedType)+
", read 0x"+Integer.toHexString(type)+".");
}
}
}
/*
* Copyright 2008 Android4ME
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package android.content.res;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
/**
* @author Dmitry Skiba
*
* Simple helper class that allows reading of integers.
*
* TODO:
* * implement buffering
*
*/
public final class IntReader {
public IntReader() {
}
public IntReader(InputStream stream,boolean bigEndian) {
reset(stream,bigEndian);
}
public final void reset(InputStream stream,boolean bigEndian) {
m_stream=stream;
m_bigEndian=bigEndian;
m_position=0;
}
public final void close() {
if (m_stream==null) {
return;
}
try {
m_stream.close();
}
catch (IOException e) {
}
reset(null,false);
}
public final InputStream getStream() {
return m_stream;
}
public final boolean isBigEndian() {
return m_bigEndian;
}
public final void setBigEndian(boolean bigEndian) {
m_bigEndian=bigEndian;
}
public final int readByte() throws IOException {
return readInt(1);
}
public final int readShort() throws IOException {
return readInt(2);
}
public final int readInt() throws IOException {
return readInt(4);
}
public final int readInt(int length) throws IOException {
if (length<0 || length>4) {
throw new IllegalArgumentException();
}
int result=0;
if (m_bigEndian) {
for (int i=(length-1)*8;i>=0;i-=8) {
int b=m_stream.read();
if (b==-1) {
throw new EOFException();
}
m_position+=1;
result|=(b<<i);
}
} else {
length*=8;
for (int i=0;i!=length;i+=8) {
int b=m_stream.read();
if (b==-1) {
throw new EOFException();
}
m_position+=1;
result|=(b<<i);
}
}
return result;
}
public final int[] readIntArray(int length) throws IOException {
int[] array=new int[length];
readIntArray(array,0,length);
return array;
}
public final void readIntArray(int[] array,int offset,int length) throws IOException {
for (;length>0;length-=1) {
array[offset++]=readInt();
}
}
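public final byte[] readByteArray(int length) throws IOException {
byte[] array=new byte[length];
int offset=0;
// InputStream.read may return fewer bytes than requested, so loop until the array is full
while (offset<length) {
int read=m_stream.read(array,offset,length-offset);
if (read==-1) {
throw new EOFException();
}
offset+=read;
m_position+=read;
}
return array;
}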
public final void skip(int bytes) throws IOException {
if (bytes<=0) {
return;
}
long skipped=m_stream.skip(bytes);
m_position+=skipped;
if (skipped!=bytes) {
throw new EOFException();
}
}
public final void skipInt() throws IOException {
skip(4);
}
public final int available() throws IOException {
return m_stream.available();
}
public final int getPosition() {
return m_position;
}
/////////////////////////////////// data
private InputStream m_stream;
private boolean m_bigEndian;
private int m_position;
}
/*
* Copyright 2008 Android4ME
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package android.content.res;
import java.io.IOException;
/**
* @author Dmitry Skiba
*
* Block of strings, used in binary xml and arsc.
*
* TODO:
* - implement get()
*
*/
public class StringBlock {
/**
* Reads whole (including chunk type) string block from stream.
* Stream must be at the chunk type.
*/
public static StringBlock read(IntReader reader) throws IOException {
ChunkUtil.readCheckType(reader,CHUNK_TYPE);
int chunkSize=reader.readInt();
int stringCount=reader.readInt();
int styleOffsetCount=reader.readInt();
/*?*/reader.readInt();
int stringsOffset=reader.readInt();
int stylesOffset=reader.readInt();
StringBlock block=new StringBlock();
block.m_stringOffsets=reader.readIntArray(stringCount);
if (styleOffsetCount!=0) {
block.m_styleOffsets=reader.readIntArray(styleOffsetCount);
}
{
int size=((stylesOffset==0)?chunkSize:stylesOffset)-stringsOffset;
if ((size%4)!=0) {
throw new IOException("String data size is not multiple of 4 ("+size+").");
}
block.m_strings=reader.readIntArray(size/4);
}
if (stylesOffset!=0) {
int size=(chunkSize-stylesOffset);
if ((size%4)!=0) {
throw new IOException("Style data size is not multiple of 4 ("+size+").");
}
block.m_styles=reader.readIntArray(size/4);
}
return block;
}
/**
* Returns number of strings in block.
*/
public int getCount() {
return m_stringOffsets!=null?
m_stringOffsets.length:
0;
}
/**
* Returns raw string (without any styling information) at specified index.
*/
public String getString(int index) {
if (index<0 ||
m_stringOffsets==null ||
index>=m_stringOffsets.length)
{
return null;
}
int offset=m_stringOffsets[index];
int length=getShort(m_strings,offset);