Commits on Source (14)
Release 1.20 - 12/17/2018
* Upgrade to POI 4.0.1 (TIKA-2751).
* Integrate/parameterize new angles handling in
PDFBox (TIKA-2779).
* Upgrade to PDFBox 2.0.13 (TIKA-2788).
* Prevent content within <style/> and <script/> elements
to be written in the ToTextContentHandler (TIKA-2550).
* Switch child to parent communication to a shared memory-mapped
file in tika-server's -spawnChild mode.
* Fix bug in tika-server when run in legacy mode (not -spawnChild)
that caused it to return 503 on documents submitted after
it hit an OutOfMemoryError (TIKA-2776).
* Upgrade jaxb-runtime and javax.activation (TIKA-2778).
* tika-app in batch mode now requires an interrupt or
kill signal to the parent process to stop the parent
and the child processes (TIKA-2780).
* Bulk upgrade of dependencies (TIKA-2775).
* Improve language id efficiency in tika-eval (TIKA-2777).
* Upgrade sqlite "provided" dependency to 3.25.2 (TIKA-2773).
* Remove duplication of notes in PPT slides (TIKA-2735)
* Use -javaHome or $JAVA_HOME (if they exist) when
spawning child in tika-server's -spawnChild mode.
* Fixed closing of styles around Hyperlinks in Word Parser
Contributed by Ronan O'Sullivan (TIKA-2599).
Release 1.19.1 - 10/4/2018
* Update PDFBox to 2.0.12, jempbox to 1.8.16
and jbig2 to 3.0.2 (TIKA-2745).
* Fix regression in parser for MP3 files (TIKA-2730).
* Updated Python Dependency Check for TesseractOCR (TIKA-2740).
* Improve SAXParser robustness (TIKA-2727).
* Remove dependency on slf4j-log4j12 by upgrading jmatio (TIKA-2742).
* Replace com.sun.xml.bind:jaxb-impl and jaxb-core with
org.glassfish.jaxb:jaxb-runtime and jaxb-core (TIKA-2743)
Release 1.19 - 9/14/2018
* Require Java 8 (TIKA-2679).
* Enable building with Java 11 (TIKA-2668)
* Add an option to make tika-server robust against infinite loops,
OOMs, and memory leaks (TIKA-2725).
* Allow configuration of the Tesseract parser via the standard
tika-config.xml options (TIKA-2705).
* Improve handling of empty cells across table-based
formats (TIKA-2479).
* Add a Standards compliant HTML encoding detector
via Gerard Bouchar (TIKA-2673).
* Improved XML parsing -- limited default entity expansions to 20.
To raise this limit, add -Djdk.xml.entityExpansionLimit=XXX to
your commandline.
* Mime magic improvements for Olympus RAW (TIKA-2658), interpreted
server-side languages via HTTP (TIKA-2648), MHTML (TIKA-2723)
* Add absolute timeout to ForkParser rather than testing
for active (TIKA-2656).
* Make the RecursiveParserWrapper work with the ForkParser (TIKA-2655).
* Allow the ForkParser to specify a directory containing tika-app.jar
for use by the ForkServer. This allows users to keep most of the
parser dependencies out of their code; and it allows for an easy
addition of optional jars for Parser dependencies,
such as the xerial sqlite jar (TIKA-2653).
* Use a pool for SAXParsers and DOMBuilders rather than creating
a new parser/builder for every parse.
For better performance, set XMLReaderUtils.setPoolSize() to the
number of threads you're using with Tika (TIKA-2645).
* Add the RecursiveParserWrapperHandler to improve the RecursiveParserWrapper
API slightly (TIKA-2644).
* Upgraded to Commons-Compress 1.18 (TIKA-2707).
* Upgraded to Apache POI 4.0.0 (TIKA-2552).
* Upgraded to Apache PDFBox 2.0.11 (TIKA-2681).
* Upgraded to deeplearning4j 1.0.0-beta2 (TIKA-2672).
* Upgraded jmatio to 1.4 (TIKA-2667)
* Upgraded Apache Lucene to 7.4.0 in tika-eval and tika-examples (TIKA-2695).
* Upgraded junrar to 1.0.1 (TIKA-2664).
* Numerous other upgrades (TIKA-2692).
* Excluded Spring as a transitive dependency (TIKA-2721).
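
The entity expansion note in the 1.19 entries above refers to the standard JDK property jdk.xml.entityExpansionLimit. The sketch below is only an illustration of how that property behaves with the JDK's built-in SAX parser; it is not Tika's own code. The class and method names (EntityLimitDemo, docWithExpansions, parsesWithLimit) are hypothetical, and the example assumes Java 9 or later for SAXParserFactory.newDefaultInstance(). Setting the property in code stands in for passing -Djdk.xml.entityExpansionLimit=XXX on the command line.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
public class EntityLimitDemo {
    private static final String LIMIT_PROPERTY = "jdk.xml.entityExpansionLimit";

    // Builds a small document that references one internal entity `count` times.
    static String docWithExpansions(int count) {
        StringBuilder xml = new StringBuilder("<!DOCTYPE r [<!ENTITY a \"x\">]><r>");
        for (int i = 0; i < count; i++) {
            xml.append("&a;");
        }
        return xml.append("</r>").toString();
    }

    // Parses `xml` with the JDK's default SAX parser under the given limit.
    // Returns true if parsing succeeds, false if the parser rejects the document.
    // Note: this changes a JVM-wide system property for the duration of the call.
    static boolean parsesWithLimit(String xml, int limit) throws Exception {
        String previous = System.getProperty(LIMIT_PROPERTY);
        System.setProperty(LIMIT_PROPERTY, Integer.toString(limit));
        try {
            // newDefaultInstance() (Java 9+) selects the JDK's built-in parser.
            SAXParser parser = SAXParserFactory.newDefaultInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } finally {
            // Restore whatever value was in place before the call.
            if (previous == null) {
                System.clearProperty(LIMIT_PROPERTY);
            } else {
                System.setProperty(LIMIT_PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("5 expansions, limit 20: " + parsesWithLimit(docWithExpansions(5), 20));
        System.out.println("30 expansions, limit 20: " + parsesWithLimit(docWithExpansions(30), 20));
    }
}
```

A document that stays under the limit should parse, while one that exceeds it should be rejected by the parser; raising the limit is then a matter of choosing a larger value for the property.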
Release 1.18 - 4/20/2018
* Upgrade Jackson to 2.9.5 (TIKA-2634).
* Upgrade jackson to 2.9.5 (TIKA-2634).
* Add support for brotli (TIKA-2621).
......@@ -58,11 +175,26 @@ Release 1.18 - 4/20/2018
* Fixed bug where TesseractOCRParser ignores configured ImageMagickPath,
and set rotation script to ignore Python warnings (TIKA-2509)
* Upgrade geo-apis to 3.0.1 (TIKA-2535).
* Upgrade geo-apis to 3.0.1 (TIKA-2535)
* Mime definition and magic improvements for text-based programming
and config formats (TIKA-2554, TIKA-2567, TIKA-1141)
* Added local Docker image build using dockerfile-maven-plugin to allow
images to be built from source (TIKA-1518).
* Support for SAS7BDAT data files (TIKA-2462)
* Handle .epub files using .htm rather than .html extensions for the
embedded contents (TIKA-1288)
* Mime magic for ACES Images (TIKA-2628) and DPX Images (TIKA-2629)
* For sparse XLSX and XLSB files, always output missing cells to
the left of filled ones (matching XLS), and optionally output
missing rows on all 3 formats if requested via the
OfficeParserContext (TIKA-2479)
Release 1.17 - 12/8/2017
***NOTE: THIS IS THE LAST VERSION OF TIKA THAT WILL RUN
......
......@@ -342,194 +342,101 @@ JcWAy7md7XR9MiVgSQuw040wqSzcSA5M6RCFZ9gN+G0kP1CNZ5vDz+JktV4nJZzh
JF8xV9E4P/Msl8hqmOOocZ4LDJdw/nt1UWlUmattMLBVWdSeuu0=
=pYQ7
-----END PGP PUBLIC KEY BLOCK-----
pub 4096R/EF0CF38A 2017-05-16
pub rsa4096 2018-12-04 [SC]
184454FAD8697760F3E00D2E4A51A45B944FFD51
uid [ultimate] Tim Allison (ASF signing key) <tallison@apache.org>
sig 3 EF0CF38A 2017-05-16 Tim Allison (ASF signing key) <tallison@apache.org>
sig DE7B39EC 2017-05-17 [User ID not found]
sig 13B86349 2017-05-18 Ismaël Mejía <iemejia@gmail.com>
sig 5ECBB314 2017-05-17 Rob Tompkins <chtompki@apache.org>
sig 00B6899D 2017-05-17 Christopher L Tubbs II (Christopher) <ctubbsii@gmail.com>
sig 26518FEE 2017-05-18 Dale LaBossiere (CODE SIGNING KEY) <dlaboss@apache.org>
sig 3 A400FD50 2017-05-16 Akira Ajisaka <aajisaka@apache.org>
sig 9FCC82D0 2017-05-17 Benedikt Ritter (CODE SIGNING KEY) <britter@apache.org>
sig 1AD84DFF 2017-05-26 Daniel Ruggeri (http://home.apache.org/~druggeri/) <druggeri@apache.org>
sig 9C1750C5 2017-05-26 Coty Sutherland <csutherl@fedoraproject.org>
sig 8BD1DCE8 2017-05-26 BigBlueHat <byoung@bigbluehat.com>
sig 47085518 2017-05-26 Marcus Christie (CODE SIGNING KEY) <machristie@apache.org>
sig 3 0FB52BC6 2017-05-24 Ashish Paliwal <apaliwal@apache.org>
sub 4096R/B0007A00 2017-05-16
sig EF0CF38A 2017-05-16 Tim Allison (ASF signing key) <tallison@apache.org>
sig 3 4A51A45B944FFD51 2018-12-04 Tim Allison (ASF signing key) <tallison@apache.org>
sig E4032DC4EF0CF38A 2018-12-04 Tim Allison (ASF signing key) <tallison@apache.org>
sig 214DFD8C4C75EA05 2018-12-06 Kevin A. McGrail (CODE SIGNING KEY) <kmcgrail@apache.org>
sig 3 317C6DF83C7705CF 2018-12-13 David Fisher <dave2wave@comcast.net>
sub rsa4096 2018-12-04 [E]
sig 4A51A45B944FFD51 2018-12-04 Tim Allison (ASF signing key) <tallison@apache.org>
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2
mQINBFkbU+0BEADI5xZ5rYbBfbZb52mxVwzhNcVoOBOC3zc/AQ2QuBgLo5MNfWFv
C+ns8Ze/H63r/5CKXz8k2pHbdqggRKJopW37L3/L4IpG3i2331XUNnFJ4UtPkIHN
FKW1FRs+nSaWCjy4gpQ7qyFMWS1G6iI3HvuvpdBWYzbBc2XvmIRTBL45AJFp3Z+p
4jOc0nKVIxjeTkk8RVOrzn3OemNQX5LlyVwBXhAcDZeVij8qyjh+/cpBBP8Z8T8t
jyoCc0YEz9DD+p9feFdJ4s4MmGO8XNU9AjkrgVy9CtkrUk7rbE4JRfUP1RAQZgBs
uzCcZ110oggPimk4wIf3rKsHZwglz4jO3Tfw3asC5YSsuUL3DTaXzohBeMvY5d9y
QHBC850IRBGlYTeP03Luwkdh+NQ++Hb4wk6wO9CUTREHxXRhQeOrk/1FUEZJrr5d
RknXX542dNQrmMO9Tv+I7AFljuNui2VO0nnZXypO1/PH2TsFeg+VoP2eNdGqOaKT
y9gv87g/fBDcQrqYBRdh4Sk1pGwd4wurfLWLie9GOeZSw6yfmY4xi+3qkzzy/A+8
YzXxRuSaSPN2S1wrQgtt0ldkVd8Ls1k4VTKZOu85dNk/yAoXv7gQSi7L3+SOgmur
NfeCae4tNWDGLOWZcQ+qhESTZIjZADtPgJSesdU91OWmy9ZEbx7QQfkolwARAQAB
mQINBFwGkIQBEADBkuDMpFPE7nepfcYCWYYE3htn9apn6TudnT+GgeYue1g+XKct
c5fq/TiX0OOEpBt9xKOikSOwMTgdW6A2LKm568qgDN79wMleHFHel2VOiV59z8bH
AquBxLk8ZV26QODEjm9voOMXsf/qe2Y+lRiulL+TJP9Ze5LklB4s0rc7EKERLhQ6
EQC7GXDTp5IY5OHp+ghXknekjgN/KbQqmZm892qQ1qgRV3Rb54CxPJeEebWK2epC
wQSjGraka4Hrre7eIB0Uuhri2ll9O03eaBqTAQzVdz5Chl1WnbGWsxIb7tL+rw8J
uO+R2vHazZZIVoRSXfsohPu/ZkmpRZXp1IPLeJWs3Z4YeZ6w/PDCYRx0rbSlwT45
DUQQ3AK6iT5nh7FqPAnYom1JXELogw+bMQtWVSGHujazC62cox9UmvZdZs9H+bIW
xRqhnqltuZaON/u7bKsHZWLifWJOd/8D/r3u+Ly4pf+LNyQY0cSC7hfcwg+s8cb8
IbyuNIAoQvOfrwgjlHqfeWErWe26aQkBjtFtiI9AytkeofRFwhwjzBhCa3L8xMIX
Ee+1y7yrvsgyblMq0be1+amQ3KXdf7TI+6nar6Hy/+G95kMvynwzwd1ph+CyXF3p
Ym0u6Mgu8CEf+Bx0YZSEFfHO8r8w+m4K5tQIRE9fOYs5s67+FdTOqXl2KwARAQAB
tDNUaW0gQWxsaXNvbiAoQVNGIHNpZ25pbmcga2V5KSA8dGFsbGlzb25AYXBhY2hl
Lm9yZz6JAjkEEwEIACMFAlkbU+0CGwMHCwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIX
gAAKCRDkAy3E7wzzigaOD/wP+TsPCzYASlMaZARXD6rNUSKx1CCasJFKBUL5vbhs
8X3LBp9KTVkuSsURQkTnT5swOOWWuDvCASxmbtbjZJfs1b7/lDnDl9ggeU9wGjQ7
/9tyzRvhf1a56qiIQ6Tc0F91rdeBssIIe95jgw4IhoCxp121RWTXgUaSVfUuQru8
aQbpdRt5qWCs3s3x3X1UWh4L2i7f7+3JUEiq/1LIVMZZRbkm6MaoBuKm9GXJJX29
zbg243vA7cGZJNgdZSWz3B5qmmjwEovsKqnY/oV8f2Kom0Ggm1fGiWJUnVc5N4FF
qmXooraY04dW374osKK4MZopp4OSxBhmEGMsfE/YX0jK2tsjTCHZlMDQZBfj/5J/
XJusRSXYkQBKIWbIJQz5sT2FgfCrAoI1VQAmAZS7IPr2s7px9xoIDAZ9w2XA6k0j
tjJXXZ6bMJiV6PzCHXnQ5BEKTOE59+7TDxrlUrSYaDO5Aix3qEiyMhZZdZGv+2J2
vXoThynV5N1GL3WDlctrg/r/BfbDW6y1OnhqZCdWy23wa6XV/IcNlr/YpwWNeYwC
M/THN3qdwRj4o0FRbZIytQT+f0J1aPiKrTvZA30S3ocm99vwdI+zAW6iuEqZlVbB
nPWFf4tJS69CjQKjHJ5WDjjs78vj+mIfoz/KUOln8ilwnl9Nq70hWGl1LIRgGA6C
WIkBMwQQAQgAHRYhBIc4OGrKFRr/qK4z3S1tQQDeeznsBQJZG6h+AAoJEC1tQQDe
eznsj4oIAJDeoDhPzqFCXoaFrDXMl6NYwz4JJxUsacEclvwXTwUT2xgt4HIsp/mg
7JjTjyl7jZt05yKl9tciKhhXANpLErFWDdAZf6ogb5n8rnxwmqQ0V/kEHt+4NqiT
ijPJ2USmvbuD6X8n2pq9q5G3HzVflHOi9QWzySa9Zbtbbx8oECL/b0T1Z4xyJY/i
/Q51lpRpw5o8bh9vfFwAbKK0GBaXsqJFlkXEv5iRdjCOAwK6rUnrQKFrGj4AUVKi
cHEomUA9jmTAR9otI1FWF/1e3R6AMj7127GOqi/hDsxnq8mBfwmbvqS/c1vF/EQT
TERu7apr2uTGgfymLn3QPD+Cpgb9x3WJAhwEEAECAAYFAlkdeUkACgkQCp2vZxO4
Y0lMMQ//TY92TfHGNrSHzC2Al+zMbyTJpasOIKLG57ZqiG0/kNacUPeURo/nrIso
OmjsoAkgw6eIglrPmIrcpT8LzPOGI2FSkYqXf3rsWkbBTLYGbx/cGXwjTZLxVOR2
uDPKaDN4pSt2IviiKqdYRt0vC8DUedCj77NeY3oG3R/dwBStYI6iSTh4Fpd1KjkV
V4RS/Q1UX44iRyZECAYjO50YrQcF3Pe0Ki/UWVGpozC5chi1ilHlwyNLkqzT0di6
S8wVp2RG0T+B6+UQejmAeSQPJopBtcXWz1RCu3CiT5uSG7fdQO7HTLWtdC4cvi01
idReRbywE4sqe1UMm0d7LfTfi+60GPN5Okmx5aHuXJEIg61R+k8Y8S1mn/dyc5HP
V/nvCyoDItaNNMUOBgQDDWGgndOX4BSqlIUG0JYi97rvhdxtyKo20Bzn//CF7bea
ABbi2LuVMl5IuNtw5XwaYJr5nF/aVQUSwXUghps2bfaHpCDIdjF/hLJ6d/YT1PX0
Cpzxs91RSkWRePO+FYQ4gkKJpcT+kaEd09xfwjDXoi1loRJM20IDmR4Skp9lWCVl
9qcL+ynJ6KWEKnRdDpIz+n0FBnzg/FEqJKQnjUyFPhLS1e8O3stMOiQax4N4U1tT
JID2KtzXxmd3TvwOVAyGBH9abYybp7PEWRtX8WlGe5VliQ3zMtCJAhwEEAEIAAYF
AlkclMoACgkQP6rSzV7LsxRrLw//Rnl2dHj1e+2F2cckT8YQWv29Pp/2M3LV5cC5
i35igmow5d+SSFZUqYCQTa+WCAMaGt27ImL1Bkw73dPf7eTAvOcZEoLQ384W4g4b
5CYAysvoCY1nEr4PnZ9MYLW6W1F8OKCrD1elB5HEaDrfzbaqGB3dxU5DsZHYeGzT
1C/MqV31SDwztiZ7wwAmT2mgEHwFesqKNeolP45knh6YMaNjrdGw08ecaiEUe1Xl
iL1CKeFO8OMPxEErsQdlFNK9AG0/boqTjDqYMZtztPy3WMCEKhyDXOwdWwlyY0t3
PpMwH3VWEGngyFtnZgMEUwsMDGGQKTugmTt7Apjpb0Ytv9HPwIZNMkG1BdaVITcB
imZATH+zYEVQIPBZ69ZOib/0eyVwUBtIDjdAAmxXdT4H1RcyCIV3yz1AbWU/KBa1
Pxri+bFGnbpL9dia4mG1ODbM2Ffdhl0YDn/p5UPWXTTrd/tL6SCna2VG/8i75yJZ
apxd2lIRBIPNiFZpPYRjSVd5WnSinMHzNudjXjAWDNPVe/B5pcjCgzxTbhD0Qh/3
6RGq0mduHLnoDtndqQXS0N73Unx3kO5/ED1mULSMFXtC/NVDiZrODpFx3bVTIlt6
pnUQyD4QuinKYrVYCmIdI5IZkqd4yCO8x7IfZ/jxCWaF6NOU6M8oVjh6qWDmzkoi
NOflf5eJAhwEEAEKAAYFAlkbqTsACgkQbwza5wC2iZ0CbQ/8CAAfDCFUCkOPadT4
U+ZzNXRUY4OVFi6dRipjYk01G7UJEkMSKBe7agEYFU31sx2NZYsn4GPTDAChRHzO
w2WQPx6pjHyY7Skj4B54p5cv8RZJZp+F3hAoRJ6/P1/zqEN4PdtPhbQvGrDk8S/u
Y24MH0EfACaPhCRnDsrXwToEOlN2YYk651ZWLb04X3rSMb6C4aSU79HBbKTrScHc
t6sWKqISNztQzzFUXIDZWW97iuWw5+QNTAJ9bPxTOTDz4eb8hfpjRTu1FDn23yP4
CDW9O0mSc7ad5R9Vrji+EDMvTHYORW/hAJ8XvxlPlmOgbuCN9wQlKtYwjZ4L+by4
3bKqSpG19soURgfjmYubD0Prkl7wQx1N1uh0S/T1R5EjlQ4pro3vWdlRliPQs3/j
ue3x3eFXIRqHlvTVbtgTE+VuhKTEBb8eLtvQdjhN4GpWIqY1VUllb8sDCbyOQiVl
v4eXBHssS/6onNPGIFjyKDjxseXjhsrjs+Ovo53NnU+rkoTomfT0KHWdgn3h/Inz
31OXCqNzsgfrW2pDLCQn/GG8Kb5nOOfMaWmG2CsjmHyNXwAyRnzS0bVmSLbdI4wd
R/8093GvSgGB8NpPIRMTYcwPNZMNz64+Uk0A2SfuH+u7Exsk44OpMPtZzrTbP1bP
4W/ftBGkeiIHmFCltNDrji/yviCJAhwEEAEKAAYFAlkeK/MACgkQoqsIHyZRj+7v
dA//RxcgtmpcaUOZaEUZgay7x8oqVntf4BKuj5BmWmjOFbShA3V5erqm2o9OlC1V
B5pG0JTA7EnQ26XlVSHe3TyfcJyuHubHKoYjZBUoOT0Wj+9CwmXZyfuoDmq6zTpK
Fn9jQXnJniF58soxW0vtlUMzgi2uuU7VOKjG5P27aoFaqSLPc3U7apSwZkU15JjV
6Q2P2PC25cRLgA9WRLCOwptkzqlGi1RihXSmyX5i2ROTEwbhcCCw/C6U7xu+Vh3T
RkL58PGpmMGngXCDZNJzlqeqh9qIxIVI/lNONEh62ig35AhmZp8aZ/zDOX8+s7PH
fHCIAsKM+LdTGtkFxdSXbo4n4Ov5v/QWzdGcWTl4j08gG/ikToR3Jc/BYqPbPHcl
Ib2AFAFSmQysdV5hsHQNojWSc6NBbpfdJmg4rgL0VPou5aUazm+Izv8aLcyNlDmf
cP93o19hVbu25LT5ZEBtBI1dFfjSwXTUWSFonkm9LY7i+wy4/o/sJf9Yp39jAWb9
ZwumVjQJOSSM9AQFw6Aia3JyRvUI4E/4FBJi9MUuIi3b4ojlqO7cvQs8x3ngdr9C
0WUgGTHq3mWvb+zXEJxM4jx2sPzHMGvvufx4WIqICovvXtrlGn8cwCt3iTSp5yQG
w2WDNgw3N/pAHc4FU9dftj0uSwf5NRpWNUPZKvyzQECc4haJAiIEEwEKAAwFAlkb
kPcFgweGH4AACgkQwe27nKQA/VBeQhAAmimPj9Vb6zMvpVDopj2v7rdKbLH3wijg
ntvhRCQn7RApw9I3sm6p3QXc9SXiRthHEef4zJyR3DFt38okqjWuN1UZGGPVJuTp
Fwpig1IheVzHXucltpa9WtMJDLS38MVg4jo5b/a/dG96rNZMi5nc0SR7yg7L6FsL
jzhMeeJ9MoIVtNQHFQ2+rShF+8JrQFji1RrXqMIxS/vglWYdt2DSfLHgQ7MRstyW
sbAi0mLdh91anPX9tCAnHTc3X0Uomp/pM8BVHEINJTOTqyfHLf7fSna98rB+kkKN
eUjGN6cPGa0oPCSWrTquC8AhxFCwgpd+8x3fuTMwrPkqFC3pvReXrUwTvjJsJotn
QfQVZ3I2OcVCavPN7Topwwg/sryw0qJ2TvTciXzAIC6rgQSkajYO2vrY+UOAdsNv
W9lYhovSL70N8WTmDDB6Nk5U91oZ4Oq03OoL8Nxoqk+r6Oj0qbWO4QMefcEoAuSh
PyROrKfrl2dPbqsYQGeCBjeUjvm3ipaMpp3zjw3m8NsbmlI95NCdcW1eKG+LSrTE
N7g5c46b2XHLOilrwDogwPCvx8+NnFh+YdqMS8Cc/+gTvi2JTyg1g0wSATGzCQO0
VkWHH5/n3Ds4U3mnGUL2TqC2xx9QPH+u537j4b+AnJe8xQGfsBqLvBsYMGyBYYwu
JSBpR+cFkRSJAjMEEAEKAB0WIQTNVGQxXwuYx35ujs2dqtwcn8yC0AUCWRxKegAK
CRCdqtwcn8yC0FhuD/9l+pvZaEsZtn/5E8E6EUFJzQ+lLb0SG2bE4gRa48eLfGdd
6hbz/wOYm3u/PBlL3hd7Yj5eMo40/WUdHbOk7e/x4yBe5RH+i/ZwMhGfZsodYQF7
b90z/EAbClZ0ni7qg1MFbL9ZQYPLXJAP0mzCBKHjz9FefpHgO1wjY99hMmZakJjh
gpg6axCr6AJRKZBWSWusw9BwGQQtdgeva+1SBS4ZNkN/jXnyBbqbtt1Q9ZQJX7IN
RUG93SyWWNQ0FHEs9q5jHbFcQm6jpgxPfNEDWluZt+Z6AGLUEdlBgg1saou/NPUA
xfrLG9TYCl8F9LdiZlvXk5EkNFDmsYHc5iRFPnXPPLm3kDoVf5AXEmBfwJVOqeOu
o1BmEeJkf3GTsVaVww3In3UniqLhdYzgQGuZaJNsq5nzIcmcx2APjEjC8ih9r6FD
xBeS2s2PdmlcCBhpjaftBfbpfWh9tL8pi10ZWKbk/ZfSGLobjLxFpIz0jt4ya9EN
fEIREOWPug/d2qCAliPwl0+ckXU2JVp6Oo6b192OYxmjpUH5AzrbSvoE5MNeP55J
jvxxwL1ue3J6oiZ7b9d/OURoQhJDEHK2Fz4HSHUxSgS56bhWuu0z0SZF6mmTTpMY
R8et+0WFBBp3jOMAeEyzYp0gAjlBkQGIgMg0Zr7QjZ/VlYnelddQpDjXNowJ4IkB
HAQQAQIABgUCWSib9gAKCRCZXjUiGthN/4oUB/oDXVmi9aIT4409stS9VdKCxSaP
DUs9V00n6QUSoPREiIL9nCh8W66J08vB14WlCP36unhWgctLFDD+UMzyV0nLxCIp
qGkYLuai370JvbQcFLgp+cJeC7igaIA5HOGCEhhmlxMLfVMePZqWszFccgk2lxDp
9LVCJSImKIC8kRjSaORLqHQrugvWPkQGOCJR0TuS6CTkVBejZalV/iJcwql5dZAu
Mf0QZEMu2Pxf701SzyMSMn+9J0OCNZ+xgN26af3ZPrhQKLP8c3q1Gg+tkhd70l89
XkOkevkg0r6x576SoL74OEzrQep795AjnFGqNYtFhkjtJGBkwWZoyGTKwUpziQEc
BBABCAAGBQJZKDiUAAoJEAPClpScF1DFYdoH/jv0kHnPk971+MLmYDHjaLJzkwC6
m42nHIFTQruVJfYr8JBPZBhRb+mfeNTbeJfuzxUqBJnsp8ZqhW6JLTm5A4l6im6o
6w1wE0hfKqIdSvZDtqZ/qNIqEcuRh8t9lLDgvQcpwwIfxbWgqqEcSnPnZ0JkZVD5
KQXIjYJngJns4e3RA7sfFPXlGAW5jPM3DwEub6/nYvj3IlSVpBWd5jzJB9y1VcDs
Jt782/PBFiPwuKoSWGLrjPII9+zzs7RiZzvML+ajDgaAYDH8qq2lBP6K5QKgnIff
mzTHg4JMYAt4DkcidQJsGE/FBCrZY7V1NJpm24MFDQLYTu1VZf73ScrMrSSJAhwE
EAECAAYFAlkoorYACgkQIXRkoovR3OhwnBAAhbnzAYPjoabyyUvOABzvT7KEAyBe
TJaQ7bpWyzDpULvSef4HECEhiC5VVGq+vU6xLhjuVG0vwV+ipx4jIxtFjRhDh07m
AT0q07FnGQEDLOGQCPcbSy2L57bXEh8HRyaTMU0tzz+hKrG9ZdDxsLnPk1rxbBa5
y4FwUzdA1odZmCAYyHvBMpcCbXrFnNvcncFsmxCs3PfE1EkxoTRJ6soqO18xELRV
Jo4J5BBDoul2u4u1nuNVtHm2nGHiu3jmrk49YcBaw+OTKZ9Dx2tQbQMlld/4CGr7
H09l7NUcsDIbS06V6icCACJ86KtTGZRwalQKm7ScBf9tpmAnt+0mJIYyhykZWdzi
swW5TOzxghfg/wa0bAiJ6LRwoh4pJ6qSk6XLPdf7Kgje+ivD32Jv9uG/AtG9mNu1
8jmzaNcrdvqzIsTqGTJOISvTX3dT+5IgdjTgBp65e40uFMeQ3aT3c1EMAqU806zq
QOGgESRid9F6gtxXwXpdmE0Sp8s7MFsOubsdbCxuvupXIeFZ2TQqsqJMyWLiDYWZ
vWTbK4nmryNj+M8HsPRY3mqVIOqw0SUTIGg9MvyyxKCspI6R9NhLw/TCqGbRfizN
HX6ZhLhtkESVym1tGpl4aWmEy8mYl5fccntUrBiIMJv1XbXFfG/dEnfQSRYXuktn
mjIqJaicTCgVoBCJAhwEEAEKAAYFAlkojR4ACgkQnEn0IUcIVRg5Pg//R0OdbG/9
pZmGIxJeRExWdurVLUxrzwI3YCOJ0U/9an6QeLm13J6o3UTqPwaLwMZsX/9GUmcr
YHRiDmOkL0RLPYbRTaEsEnOmaxrse2jCXhFl1A0pFUfpeMs8iatRc28DOFztLeyo
rvUiidkScwvhBnRV8N0S54dAIUwXnLD5ApoE0xqPNTzKmVsl1/vdL7TAnWL19JVQ
nVN1UHPB4+8rDkTlierA7uTfzAe5VTIVYhglAPan/bRzwXnW025nHQttd4px5mO/
iEVWjOwg5/JTbJVDLLKkgKdI2TGRZH5xkV96rlvLuMs26pC3UfzGu7lBgf3Y2UP+
touSJqnXFyKWt0JBLcQwNzSOZR7Ryfp9bWjbFd+vF+749fgIsTZejcAAk0pf3W94
hgNEYUauO3GevDIXJB8eVTJq2DKKDEW/3+naU+Ezs16DUicWPGM/KquB4Bifn1Bp
qPWxC0GzBYFVCBEJIaQnlxRXRFuZ3QoSTO5i1K5nZdOhR77/mQuuyQz9bbEpTsGD
9NxlSIvKennUE2NiYHleXRbiREpPTRsG0spTYp/iGJ4shAlZrE0CMaCSv2mJELsI
thJ/xbIEaeZzqgbBtKPiuK95eAgbxXQFl0bDMXTtzXrjiH2/EHIRMxYuQQ7ByyyN
3Ve/wH3GxPS5xxRyNJhZ52NsPdvKnXSYJjaJAiIEEwEKAAwFAlkk9GkFgweGH4AA
CgkQA+K/Hg+1K8aPzg//e52C5JNm6BpMSQSrwpG1YV8J52dJZ+JizpI0iOYyLVAu
FSJ4q8oe+gMW4wG4QAbyuwKpEGC/bM/zJLs2UxgUF1tYkMWPx1FTGOOzCauNFSOo
/PgJZoVHbyNY0Ltxb1MIFPbSuSz9V/jkOmbFXDwibvQgLH4Iy0opyHbOAjuf+55T
7oQ+2CVtMjY+NFDx1Z2AfWXwOjsttmqGEeCA/NGtOJzNrARArM05kBBY7BQUNJvD
iluoYWAGCduseGnUL+aTxLdefZ4VU+8f7cbYELhaYFSkdSORroWYYcyvl6i8foVi
0G3DT3UIZl029oSbxyZRoa09X02oqPluyq3/KaGmDNGIYBchxF0Za7+x97IoYtVi
wUgh76UGnPU16GKBCjevnWYMW2beJ+0ry0PG3fMDOe+0q4uhWSEx89yr9vjNP2q2
KnW6hiRAvIOC0QUo/7A8hLcdQjb7xQd76VVinxNsUhVtpf3N6q8NudIrR6HLKNlf
W6RXR0U3Y6bW2AiQWyehN095GounpOK4XU7JvetP6+79wB7gPAcbKxIWfsh1N5Kc
VqbE0q5kac+C2t7gO7N5Ac7DkgxfZgSZ0/QojC8Hn15OswwSVmpPdjAXj1YkktYl
3htV4+uDr/3LYNrllkbvLkgnybJcF2dyP7myDdvsBVf1J7ygKt7IqE9iqS3JRTi5
Ag0EWRtT7QEQALcx2V96b1JAX5Bp+xjcU7YGOQK0bVx61AQTIW2uCwyjFHqJ5Tmc
/u8ndQA6Fw7ANzrpXEmMQfVzg/1tqpYBJa8xfDTOSnDHGvbXvqz3IthpatbJszqC
xL0qh/8pHHIJcW1FzF6+aL9BjuMTsOF/jIOmr08T29Tl7WH2lS5V9XDgFgooXumh
5Vm/dGNqqszWHdrrqLkDgdo+CjjicdADLoEhPK9f6PDTL1GZfTmIfES0jDmv1bG3
Zosj/XZQRbkrhkvKwXjOolTCaArqc5G5ZvBccAP3ZpyZvASM2KbdASapcHc4I8UW
TtsYBm8ecjntnxZHLfQxKNrj+EFOOzNelsVESjmNdu2Q4d70tD2Bjm3pfnRAGp/t
qwBi7P/9mYTYECH3LfyFR8/ZiSKl6+DSDeGh0eojhgLbIJi1/mZhvd8i5Ezr50ys
yHW7UGq+zyXkB86sDsHTh30N7y4gxfHxdKq0TjC1Vc7aNtA6Cpv3QKj9s0FCwvBE
awMtAG5TJXu5LjcH3aXIfnwkJqzntB3dAzh70YUaID19ySzPc4vUzH9rj6ccoN9s
zgmSTM8I2IcDcbIKbT2dmqr07ma/VrJr1rTTG6vJ4CaO/4qHHCi+ZmHGcaf7mR4b
kGZoXP3hCGSDvU+QNkC7DV5wstoMvzxnqAKr7+evN024H4iUM9Qt5sQTABEBAAGJ
Ah8EGAEIAAkFAlkbU+0CGwwACgkQ5AMtxO8M84rlthAAi9XH8Y7N3OmIv5MLB+MZ
C3BjZs08C0CcThsdPg1mzyEzOJp+FF1rHTMA3edHBuo6l72rUqdk7/Ddt/4p/oEE
rWSfwj+iLUoP0Zl4bNg5EH7wGfufQ4AAl7Vi3S8L0xliF1w9QNgEy5pu342EC8jJ
gcxXVwqpU8c+p+3M1xW6hWO/qODjhngHPRmnK2E8ULdsVL2eoz63ZXuNjNnac+fY
SYfAjdOuXYIOZFO9tX9b1Kll6XZPy4fvhSYKPzunxmaJ2S2Rlv7tAcKSg0Hu8VkU
h3hie2AHr2saJnSqlNx675Aw85F8D43/OT617V+/pzMDRnxlPHsBe8gqhYadSQCy
v36VNlXrHe1YNqiSNmI6QvmnYTdcXa80g9zqybBS2Uju22EMHP/7O/VyJnPH9AaV
LL1YUuOepblLnJYKVqF2mnPpDg2vwhmrqKMWsWHVxHMNalwXN4AfeFFuy5BrZ6JG
j2+9b4FHq2BFRpEJKSiUw7HTii20blgFr41A8M37VHifSWObGw2ynmIKJ++yRTIn
ZDUTTDfgOsmugADD2JBNbm6ydRtwtknsJPUlhmTovePrTi3QyoBvLMhK9Po3Eg5x
L0CyRJS02Y9AQHH8UVAOMihn/qUkdP5sQNYqjl4DyOXO/GfCiVmjLJEMGrTW4Ozu
sgtLYRPuFHFHzap0YZCqEGM=
=/R0q
Lm9yZz6JAk4EEwEIADgWIQQYRFT62Gl3YPPgDS5KUaRblE/9UQUCXAaQhAIbAwUL
CQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRBKUaRblE/9UcF7D/wJ4+f6BQSwjSXv
cer6Idxk8EI9BBS29ShiczfumED1pW4jvNdnwfloOc6BnuX1Vp4/ip3RNqZSwnBE
c2Dzn99FIvhqlQVhjv+271sZDF0ADX3B49Qx1GAW7i9khpjuaw+2b0+uPoEWDrKE
GaZP9aKPHmiT9CpPgLqEG8E5JZdxnBGU41xM/iL2UVpRnh8wQyUGBS+Q7CdZRDLQ
TCd6s97sVmPqniX6EjGznhon93skRkAMMfYFAsXCXEHGZzhybZDfBzf1p5T8WJqp
rr2i9dv7HEyPeBWR1LavlbCTxSYshTyLTFhYY9QnjmXZyPcNmA2pNq87xdvaWqav
eSf6Zk0ZWMPKpsQzEm9Ls07hX+j7EuEEvCkyc9/wClCv99K6KLZpKGQLHL+V5RWR
c2MJTLSHalTYnOzbsFncd8ysE3k7kneqMyIH84jFo9HekGlpZkYWfhilEXbkIbjL
ONtmerAKO3DnIwH1CXbleFawQwFE1Hcc0PfDkq1Gj4szr85GYAl/xBbRmX9NKdEz
6kL8O9ar99UYyBnzPMnKS6JCLRguKKFNVbyqz8nSHsTVuMFqqOMjB7NcBTcwY3iq
mPA7xxXWqTZ3sbEcqCf9I1jcwgEL8Z2ICdA+RP0Pc+Fcj220xDa9ieDYpouP0TwJ
Rj5hpXXh8N2p08Ff9/8uRfmdMlTzjYkCMwQQAQgAHRYhBIM8HMSSbB3eKbuHMeQD
LcTvDPOKBQJcBpHiAAoJEOQDLcTvDPOKzaIP/2KUdzsMUpU70MDxlgq9oECs+2Ko
FHb/FWiO+NqXYuSvmvLm0TsUasXlB2KZCJ2dS7pzNulbsIblBZX331/qF4Uw7zww
SwzK+hksZ3mtGANGCHN8lBxi6yVP83JgXoy2hTnWrV13W43k5+zrkAJmGE1lss/U
rU2KVpc8SXe4nGLu/pR50qLFdJ//NYRsTcR8ChkwGm8LKfzqQijOGB1aenjM9Tgg
tzD5C8uCaizjkEj6Z9z8/+6J7uQNEz4SJrDUcmVhXWU6Du8U8UHlrbTJS5AhPSvQ
fkvJ6WAz+dde3DjNX8a01DgPfGF1pVC2DKpIS2dIrfpeqT3yUr331fV6Jsh6Z8BB
Z6kRTQV4lAmv1V0JzXgN71vthb2iJowuJvrGh9VLOxv7FcdGm8EUnmSo+cWMCx1z
FTlF8L2BW3ONL5s2MlfQKqo07mBlgxeHEpWofuPGutKP1JBj+8PgQP0bFXAIBhe2
sBHmlWeXlHv1XcthUqpQXuUq8Do9krcTauFqnDBjoYuOHnsdYaxmFt5CxI7rUlbM
Oo03gXI8NmPbTtyrej7HrHIO/u9sx7DUsI1EhMRQrGS0oesOXfWNekrGOrW4RhhC
WHHbtaxIGpr6foKslaJCeNFWcr2Pt32+wEm83zLTyDhmTguXje3x+/LwpdejXIwn
4/EAgz45rqH3dTMkiQIcBBABAgAGBQJcCUt1AAoJECFN/YxMdeoFleYP/1++dmUZ
CnRQICBe55WQG1Objftx+XdSiqFuGH0GJFbBXoSyGKkUyHbxI6sYiIHv73UgqmwR
Tj053jFXINBAmho50objoTWDAaJDjsPRGUHZBXrpWMbX8OTiaYQTC3i4pX/6xMQ3
iT5SXmkXa56f2tcgo+HgUhzRMw/QKoxVSg9iuU6ug5sRRK2Fiam467xbj9OCiJg0
hd2yS7RravcX8vmuIYGYfsxKu1o75Exq6KfnE3gf8geuXEXpTya++F2ZdQegTNP3
zULlmOCpZJ1zwvSAnW29kumma4GlE30daCpJKJ3gmi9XF6IaqZy+XbGU3G0AkP+z
8vj5kF2oNDrJMTSNnEDJypUwaf/T9vJLH6yRRtqCmLbbR0NRxbvgEJzPtub9UblF
qmUjmkSSRpbxF/RTJHbUOA3/Xo8t6nVGYOlqZIGpSZ3lH83CZUaDCgWT8Fr9m2cR
8KQlYxZzRkN9XsUI5k9LlwgAVsafLPBn/fd88RAMm62ZXCY/mxlnb17pEiTxZHUP
RKh0ypZHMap+Qx/YyNkDO2zSIYXixUOHgBGSQY3fX0hUBWZI7GqL5apQ/NZ2JsSb
Dug+1lKo7JdHF/XO43q8KiQgvDcgjCvNYbdA7ReMqQsNnhWFWryzXPXOgfMaaAVw
QZy2HtbCVsorPzj8mFACe4WrHLZTdkY1ImfHiQIzBBMBCgAdFiEE1wGSaCaYM3G/
6+XhMXxt+Dx3Bc8FAlwSsNsACgkQMXxt+Dx3Bc8DcBAArMK3cIpJxqsvJli34Pbm
A8ZizGRlg0inBddFDp2s4xrtTwZhzwIvgTbELXalH3SiB0NdOQ7+OeA8K5YS+KqT
M7IXr3NMqOJM2Ozo+7Bvs8KGwXBW76c++SnJKhKytmAQqNSakYQ7bjUvOOLADOSz
H6c/pwETN2b9eKxgf14kQq1ysLUnRb0vMQfMuK9lrm/rtHzwVV7K0XB4HuunmhmL
X/NQp1JyPEKdewS+eJhA9D2F2bsY2XkusSxjkCO0/zUcb1ud9UqXxHvYN5V1wMBh
2W+fyUDi4uNWr/SzlmHm3d3XOjd29GJfmOmx8Utz1/DxNIQWZFePAJ3n+2HGn3/U
2Vg/0z1kvvEoyHgG0ds2RZi7hMGuKOhIBG/iju8PCWgyRihrbRAIT8EN/+mWf9iT
4/3tGxJn+tEbFB1YeS4L1RPn5PT8Rea66284I68bTf99vAi6cvQhzV5zBdPZJPIm
npRiFwMH7rXZRR/aJ9+Od7dai4D+u35ZhLkSenBJKcabxOZPH5Raqyye4QBI6T7I
iCb3ZmvDc2r60sKJlcaJairOMfCEF54/hAseStKEMR7zIjU0+46cKWi6PVKI91eb
ZpMjW7UJQV1a2jfGtxXTOmdVrH+sT1MGApD4G+vOD2i+JF3maOjO2NfGFZJd4zEh
R2/q/qO1y7gGuyM0TXe7kIC5Ag0EXAaQhAEQAOU04C5DBsaZqJ5jvTHGeVhF1JNu
mLwJ9rn/UBsAtqGv3Z4TRF/RwecqXrKxanIWgLwTPNmFyW2s5lRTBsoYsfsjkZd1
0lRqofijSmwdqEb65lVGMeYf38BGYoJG+1aVKY4cqFl8hXgbvmGQlHgBq3ctfjZ0
d1v8lePuQB+jXZDzNfcH1zFifj0PRViVDb/RrB2+EyOyK/bXsszOqqxvtEYc2c98
qi2t4pssravadVUIw6PIhUHnxUKEe9Z8Dx1J74hCr1k9xLD6wAaY1I2TS4wP+WKL
edNwes/ynK5sAH6barEr4EZZqcvpgNvVBm1lLiNd1OHJyhiJEvr9Yt+2ymdHfc1s
1K9Q3NnoEEaTPS1bRMld3W4VfvOkp0IK1WaJGtGoZFk4vxwpaxleAXKNkmOUex52
jCqegB8s/hIOWIfJTyFFVHFj10dqtDVMp4gpv6RvsiVC656m1s6ab63B0CfBM4U3
8h/PacOddPCF+5duG4DDRqaY4mErcVDf1bL+gPMo0JhDtOvwRrd91nIetx+k9g6D
VLVTM/rOzoHXTCTCxKEyVIm/kGhVCvprh1uHFtKXY62VoijnUrJrx8Jco3OxkRjJ
NZEwg2lkj5w8Ub9tESi3sgKcSIDxJgIphRdC4f0ZmYqYNmXl3OELZZRwxPidI9fE
jYRZXeq6K5A6rl5vABEBAAGJAjUEGAEIACAWIQQYRFT62Gl3YPPgDS5KUaRblE/9
UQUCXAaQhAIbDAAKCRBKUaRblE/9UX6kD/i0B4k6knoMmT5/7g/JWJlKreHYcRmm
QrhWci8jFovIGq30HguI14WjzXnIcePyer1iTzUrczjUstxh4/lGd4HigsdF3JVi
bgN+2eF3NDhUqGVgYgDSerC5SlRTxAEgGZtG0z7b80poqPo1gkxcRXkO302KlQ35
w2+CBhi3LZp397XNfL135vg3xwIUX2pAfdIf8iDDgFIYPgtO6W0KRAK2jOgf8yNU
8WCJBKVVxxyf+iIRzWQNsdmxDxQWgvOJSO1ugeaPMjpzrgn/yURVGe17q6cKAi9j
/f8QneB+MoQL8/E+uh1C0SKmb0gjrEhsh89e57pBkcxuZezFulmcNY02EMGaWP1S
OWCoabHJjw3asWthXRyZ9lxkWLGSuLEd9tkbuQPlgcvC4xAB/lNoEiipAGi62E/z
Uy+XomflZTJK3dL5foxC5vKr0mqyfpHXnUCOL4A7StXlB2333WGUwQTUgY5ttQ2u
oD/fMu+xOYttRwAb6r65RFFJgZCoAbQBIRC4FlkjBvLLXFO7uu2PpNMQXfu1IwLz
FDG114O1f3b8dsXNxnBM8vYZ085cF5rVxMmVSOi+0lSkbEj3GXeQt87t/OEj9SaC
O6T6MBRp9OPGnfk7v3JHvrCaJXEbW914XzMKKAHta4Eiy9YjQjP5avrJ+xn+ZPAq
orRsKaZk4Tip
=32/q
-----END PGP PUBLIC KEY BLOCK-----
\ No newline at end of file
......@@ -208,7 +208,7 @@ APACHE TIKA SUBCOMPONENTS
Apache Tika includes a number of subcomponents with separate copyright notices
and license terms. Your use of these subcomponents is subject to the terms and
conditions of the following licenses.
-------------------------------
MIME type information from file-4.26.tar.gz (http://www.darwinsys.com/file/)
Copyright (c) Ian F. Darwin 1986, 1987, 1989, 1990, 1991, 1992, 1994, 1995.
......@@ -240,6 +240,7 @@ MIME type information from file-4.26.tar.gz (http://www.darwinsys.com/file/)
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
-------------------------------
Charset detection code from ICU4J (http://site.icu-project.org/)
Copyright (c) 1995-2009 International Business Machines Corporation
......@@ -272,7 +273,7 @@ Charset detection code from ICU4J (http://site.icu-project.org/)
dealings in this Software without prior written authorization of the
copyright holder.
-------------------------------
Parsing functionality provided by the NetCDF Java Library (http://www.unidata.ucar.edu/software/netcdf-java/)
Copyright 1993-2010 University Corporation for Atmospheric Research/Unidata
......@@ -301,7 +302,7 @@ Parsing functionality provided by the NetCDF Java Library (http://www.unidata.uc
OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE ACCESS,
USE OR PERFORMANCE OF THIS SOFTWARE.
-------------------------------
IPTC Photo Metadata descriptions are taken from the IPTC Photo Metadata
Standard, July 2010, Copyright 2010 International Press Telecommunications
Council.
......@@ -323,7 +324,7 @@ Council.
15. This Specifications License Agreement may only be modified in writing signed by an authorized representative of the IPTC.
16. This Specifications License Agreement is governed by the law of United Kingdom, as such law is applied to contracts made and fully performed in the United Kingdom. Any disputes arising from or relating to this Specifications License Agreement will be resolved in the courts of the United Kingdom. You consent to the jurisdiction of such courts over you and covenant not to assert before such courts any objection to proceeding in such forums.
-------------------------------
JUnRAR (https://github.com/edmund-wagner/junrar/)
JUnRAR is based on the UnRAR tool, and covered by the same license
......@@ -367,10 +368,12 @@ JUnRAR (https://github.com/edmund-wagner/junrar/)
Thank you for your interest in RAR and UnRAR. Alexander L. Roshal
Sqlite (bundled in org.xerial's sqlite-jdbc)
-------------------------------
Sqlite (optional) (bundled in org.xerial's sqlite-jdbc)
This product bundles Sqlite, which is in the Public Domain. For details
see: https://www.sqlite.org/copyright.html
-------------------------------
Sample DXF file testDXF.dxf (in tika-parsers/src/test/resources/test-documents)
Copyright 2012 Ho Thanh Tam, www.cadkit.net
......@@ -379,6 +382,7 @@ Sample DXF file testDXF.dxf (in tika-parsers/src/test/resources/test-documents)
that the above copyright notice, author statement appear in all copies
of this software and related documentation.
-------------------------------
H2 Database in tika-eval
This software contains unmodified binary redistributions for
H2 database engine (http://www.h2database.com/),
......@@ -387,6 +391,7 @@ H2 Database in tika-eval
An original copy of the license agreement can be found at:
http://www.h2database.com/html/license.html
-------------------------------
org.brotli.dec dependency of commons-compress (MIT License)
Copyright (c) 2009, 2010, 2013-2016 by the Brotli Authors.
......
tika (1.20-1) unstable; urgency=medium
* New upstream release
- Fixes CVE-2018-8017: Infinite loop in the IptcAnpaParser (Closes: #914643)
- Refreshed the patches
- New dependency on libgeronimo-annotation-1.3-spec-java
- Depend on libapache-poi-java (>= 4.0)
- New build dependency on libmaven-shade-plugin-java
-- Emmanuel Bourg <ebourg@apache.org> Tue, 22 Jan 2019 10:19:46 +0100
tika (1.18-1) unstable; urgency=medium
* New upstream release
......
......@@ -9,7 +9,7 @@ Build-Depends:
default-jdk,
libandroid-json-org-java,
libapache-mime4j-java (>= 0.8.1),
libapache-poi-java (>= 3.17),
libapache-poi-java (>= 4.0),
libasm-java (>= 5.0),
libbcmail-java,
libboilerpipe-java,
......@@ -17,6 +17,7 @@ Build-Depends:
libcommons-compress-java,
libcommons-csv-java,
libcommons-exec-java,
libgeronimo-annotation-1.3-spec-java,
libgoogle-gson-java,
libhttpmime-java,
libisoparser-java (>= 1.1.18),
......@@ -30,6 +31,7 @@ Build-Depends:
libjsoup-java,
libjuniversalchardet-java,
libmaven-bundle-plugin-java,
libmaven-shade-plugin-java,
libmetadata-extractor-java (>= 2.8~),
libpdfbox2-java (>= 2.0.13-2~),
librome-java (>= 1.6),
......
......@@ -14,6 +14,7 @@ com.github.luben zstd-jni * * * *
com.healthmarketscience.jackcess jackcess * * * *
com.healthmarketscience.jackcess jackcess-encrypt * * * *
com.levigo.jbig2 levigo-jbig2-imageio * * * *
com.epam parso * * * *
com.pff java-libpst * * * *
de.thetaphi forbiddenapis * * * *
edu.ucar cdm * * * *
......@@ -22,6 +23,7 @@ edu.ucar httpservices * * * *
edu.ucar netcdf4 * * * *
edu.usc.ir sentiment-analysis-parser * * * *
junit junit * * * *
net.java.dev.jna jna * * * *
org.tallison jmatio * * * *
org.apache.cxf cxf-rt-rs-client * * * *
org.apache.felix maven-scr-plugin * * * *
......@@ -43,7 +45,7 @@ org.apache.tika tika-server * * * *
org.apache.uima uimafit-core * * * *
org.codehaus.gmaven groovy-maven-plugin * * * *
org.codehaus.mojo clirr-maven-plugin * * * *
org.codehaus.mojo versions-maven-plugin * * * *
org.opengis geoapi * * * *
org.xerial sqlite-jdbc * * * *
org.mockito * * * * *
......@@ -11,3 +11,4 @@ org.bouncycastle s/bcprov-jdk15on/bcprov/ * s/.*/debian/ * *
s/biz.aQute/biz.aQute.bnd/ * * s/.*/debian/ * *
org.apache.pdfbox pdfbox * s/.*/2.x/ * *
org.apache.pdfbox pdfbox-tools * s/.*/2.x/ * *
s/javax.annotation/org.apache.geronimo.specs/ s/javax.annotation-api/geronimo-annotation_1.3_spec/ * s/.*/debian/ * *
......@@ -3,22 +3,18 @@ Author: Emmanuel Bourg <ebourg@apache.org>
Forwarded: not-needed
--- a/tika-parsers/src/main/java/org/apache/tika/parser/internal/Activator.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/internal/Activator.java
@@ -35,14 +35,14 @@
@@ -36,12 +36,12 @@
@Override
public void start(BundleContext context) throws Exception {
detectorService = context.registerService(
- Detector.class.getName(),
+ Detector.class,
new DefaultDetector(Activator.class.getClassLoader()),
- new Properties());
+ new java.util.Hashtable<String,String>());
new Hashtable<>());
Parser parser = new DefaultParser(Activator.class.getClassLoader());
parserService = context.registerService(
- Parser.class.getName(),
+ Parser.class,
parser,
- new Properties());
+ new java.util.Hashtable<String,String>());
new Hashtable<>());
}
@Override
......@@ -3,7 +3,7 @@ Author: Emmanuel Bourg <ebourg@apache.org>
Forwarded: no
--- a/tika-parsers/pom.xml
+++ b/tika-parsers/pom.xml
@@ -83,6 +83,7 @@
@@ -115,6 +115,7 @@
<groupId>org.gagravarr</groupId>
<artifactId>vorbis-java-tika</artifactId>
<version>${vorbis.version}</version>
......@@ -11,26 +11,10 @@ Forwarded: no
<exclusions>
<exclusion>
<groupId>org.apache.tika</groupId>
@@ -94,6 +95,7 @@
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess</artifactId>
<version>2.1.10</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -105,6 +107,7 @@
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess-encrypt</artifactId>
<version>2.1.4</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>org.bouncycastle</groupId>
@@ -131,31 +134,37 @@
@@ -135,26 +136,31 @@
<groupId>org.tallison</groupId>
<artifactId>jmatio</artifactId>
<version>1.2</version>
<version>1.5</version>
+ <optional>true</optional>
</dependency>
<dependency>
......@@ -58,14 +42,8 @@ Forwarded: no
+ <optional>true</optional>
</dependency>
<dependency>
<groupId>org.brotli</groupId>
<artifactId>dec</artifactId>
<version>${brotli.version}</version>
+ <optional>true</optional>
</dependency>
<dependency>
<groupId>com.github.luben</groupId>
@@ -168,11 +177,13 @@
<groupId>com.epam</groupId>
@@ -177,11 +183,13 @@
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>${codec.version}</version>
......@@ -79,7 +57,7 @@ Forwarded: no
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -184,6 +195,7 @@
@@ -193,6 +201,7 @@
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox-tools</artifactId>
<version>${pdfbox.version}</version>
......@@ -87,7 +65,7 @@ Forwarded: no
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -199,6 +211,7 @@
@@ -208,6 +217,7 @@
<groupId>org.apache.pdfbox</groupId>
<artifactId>jempbox</artifactId>
<version>${jempbox.version}</version>
......@@ -95,7 +73,7 @@ Forwarded: no
</dependency>
<!-- TIKA-370: PDFBox declares the Bouncy Castle dependencies
as optional, but we prefer to have them always to avoid
@@ -207,26 +220,31 @@
@@ -216,16 +226,19 @@
<groupId>org.bouncycastle</groupId>
<artifactId>bcmail-jdk15on</artifactId>
<version>${bouncycastle.version}</version>
......@@ -112,8 +90,10 @@ Forwarded: no
<artifactId>poi</artifactId>
<version>${poi.version}</version>
+ <optional>true</optional>
</dependency>
<dependency>
<exclusions>
<exclusion>
<groupId>commons-codec</groupId>
@@ -237,11 +250,13 @@
<groupId>org.apache.poi</groupId>
<artifactId>poi-scratchpad</artifactId>
<version>${poi.version}</version>
......@@ -127,7 +107,23 @@ Forwarded: no
<exclusions>
<exclusion>
<groupId>stax</groupId>
@@ -242,31 +260,37 @@
@@ -257,6 +272,7 @@
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess</artifactId>
<version>2.1.12</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -268,6 +284,7 @@
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess-encrypt</artifactId>
<version>2.1.4</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>org.bouncycastle</groupId>
@@ -285,31 +302,37 @@
<groupId>org.ccil.cowan.tagsoup</groupId>
<artifactId>tagsoup</artifactId>
<version>1.2.1</version>
......@@ -136,19 +132,19 @@ Forwarded: no
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
<version>5.0.4</version>
<version>7.0</version>
+ <optional>true</optional>
</dependency>
<dependency>
<groupId>com.googlecode.mp4parser</groupId>
<artifactId>isoparser</artifactId>
<version>1.1.18</version>
<version>1.1.22</version>
+ <optional>true</optional>
</dependency>
<dependency>
<groupId>com.drewnoakes</groupId>
<artifactId>metadata-extractor</artifactId>
<version>2.10.1</version>
<version>2.11.0</version>
+ <optional>true</optional>
</dependency>
<dependency>
......@@ -160,12 +156,12 @@ Forwarded: no
<dependency>
<groupId>com.rometools</groupId>
<artifactId>rome</artifactId>
<version>1.5.1</version>
<version>1.12.0</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>org.jdom</groupId>
@@ -278,16 +302,19 @@
@@ -321,16 +344,19 @@
<groupId>org.gagravarr</groupId>
<artifactId>vorbis-java-core</artifactId>
<version>${vorbis.version}</version>
......@@ -180,12 +176,12 @@ Forwarded: no
<dependency>
<groupId>org.codelibs</groupId>
<artifactId>jhighlight</artifactId>
<version>1.0.2</version>
<version>1.0.3</version>
+ <optional>true</optional>
</dependency>
<!-- can't upgrade to java-libpst 0.9.3 because it requires Java 8
and is buggy with OST TIKA-2415 -->
@@ -295,11 +322,13 @@
<exclusions>
<exclusion>
<groupId>commons-io</groupId>
@@ -344,11 +370,13 @@
<groupId>com.pff</groupId>
<artifactId>java-libpst</artifactId>
<version>0.8.1</version>
......@@ -194,20 +190,12 @@ Forwarded: no
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>0.7</version>
<version>2.0.0</version>
+ <optional>true</optional>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -320,6 +349,7 @@
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-rs-client</artifactId>
<version>${cxf.version}</version>
+ <optional>true</optional>
</dependency>
<!-- TIKA-2021: Tesseract OCR Parser dependencies,
used for executing image processing script -->
@@ -328,6 +358,7 @@
@@ -383,6 +411,7 @@
<artifactId>commons-exec</artifactId>
<version>${commonsexec.version}</version>
<scope>compile</scope>
......@@ -215,15 +203,15 @@ Forwarded: no
</dependency>
<!-- Provided dependencies -->
@@ -342,6 +373,7 @@
@@ -397,6 +426,7 @@
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.8.4</version>
<version>1.9.0</version>
+ <optional>true</optional>
</dependency>
<dependency>
@@ -354,6 +386,7 @@
@@ -409,6 +439,7 @@
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
......@@ -231,7 +219,7 @@ Forwarded: no
<exclusions>
<exclusion>
<groupId>junit</groupId>
@@ -371,11 +404,13 @@
@@ -426,11 +457,13 @@
<groupId>com.github.openjson</groupId>
<artifactId>openjson</artifactId>
<version>1.0.10</version>
......@@ -245,17 +233,17 @@ Forwarded: no
</dependency>
<!-- logging dependencies -->
@@ -520,6 +555,7 @@
@@ -594,6 +627,7 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.0</version>
<version>1.6</version>
+ <optional>true</optional>
</dependency>
<dependency>
--- a/tika-xmp/pom.xml
+++ b/tika-xmp/pom.xml
@@ -85,6 +85,7 @@
@@ -93,6 +93,7 @@
<groupId>com.adobe.xmp</groupId>
<artifactId>xmpcore</artifactId>
<version>5.1.3</version>
......
......@@ -3,7 +3,7 @@ Author: Emmanuel Bourg <ebourg@apache.org>
Forwarded: not-needed
--- a/tika-parsers/pom.xml
+++ b/tika-parsers/pom.xml
@@ -812,6 +812,40 @@
@@ -918,6 +918,40 @@
</execution>
</executions>
</plugin>
......@@ -25,8 +25,7 @@ Forwarded: not-needed
+ <exclude>**/journal/TEIParser.java</exclude>
+ <exclude>**/mat/MatParser.java</exclude>
+ <exclude>**/mbox/OutlookPSTParser.java</exclude>
+ <exclude>**/microsoft/JackcessExtractor.java</exclude>
+ <exclude>**/microsoft/JackcessParser.java</exclude>
+ <exclude>**/microsoft/Jackcess*</exclude>
+ <exclude>**/netcdf/NetCDFParser.java</exclude>
+ <exclude>**/ner/NamedEntityParser.java</exclude>
+ <exclude>**/ner/grobid/GrobidNERecogniser.java</exclude>
......@@ -37,6 +36,7 @@ Forwarded: not-needed
+ <exclude>**/jdbc/SQLite3DBParser.java</exclude>
+ <exclude>**/jdbc/SQLite3Parser.java</exclude>
+ <exclude>**/recognition/**</exclude>
+ <exclude>**/sas/SAS7BDATParser.java</exclude>
+ <exclude>**/sentiment/SentimentAnalysisParser.java</exclude>
+ </excludes>
+ </configuration>
......
......@@ -3,16 +3,11 @@ Author: Emmanuel Bourg <ebourg@apache.org>
Forwarded: not-needed
--- a/tika-parsers/pom.xml
+++ b/tika-parsers/pom.xml
@@ -55,6 +55,17 @@
@@ -55,6 +55,12 @@
</properties>
<dependencies>
+ <dependency>
+ <groupId>commons-lang</groupId>
+ <artifactId>commons-lang</artifactId>
+ <version>debian</version>
+ </dependency>
+ <dependency>
+ <groupId>com.google.guava</groupId>
+ <artifactId>guava</artifactId>
+ <version>debian</version>
......
......@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
- <version>1.18</version>
+ <version>1.20</version>
<relativePath>tika-parent/pom.xml</relativePath>
</parent>
......@@ -38,11 +38,11 @@
<module>tika-parent</module>
<module>tika-core</module>
<module>tika-parsers</module>
- <module>tika-bundle</module>
<module>tika-xmp</module>
<module>tika-serialization</module>
<module>tika-batch</module>
<module>tika-app</module>
+ <module>tika-bundle</module>
<module>tika-server</module>
<module>tika-translate</module>
<module>tika-langdetect</module>
......@@ -63,6 +63,7 @@
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
+ <version>${maven.assembly.version}</version>
<executions>
<execution>
<id>src</id>
......@@ -86,6 +87,7 @@
</plugin>
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
+ <version>${maven.antrun.version}</version>
<executions>
<execution>
<goals>
......@@ -173,6 +175,7 @@ least three +1 Tika PMC votes are cast.
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
+ <version>${rat.version}</version>
<configuration>
<excludes>
<exclude>CHANGES.txt</exclude>
......
......@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
- <version>1.18</version>
+ <version>1.20</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>
......@@ -104,6 +104,7 @@
<plugins>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
+ <version>${maven.shade.version}</version>
<executions>
<execution>
<phase>package</phase>
......@@ -164,6 +165,13 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
+ <configuration>
+ <archive>
+ <manifestEntries>
+ <Automatic-Module-Name>org.apache.tika.app</Automatic-Module-Name>
+ </manifestEntries>
+ </archive>
+ </configuration>
<executions>
<execution>
<goals>
......@@ -175,6 +183,7 @@
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
+ <version>${rat.version}</version>
<configuration>
<excludes>
<exclude>src/test/resources/test-data/**</exclude>
......@@ -196,6 +205,7 @@
<plugins>
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
+ <version>${maven.antrun.version}</version>
<executions>
<execution>
<phase>package</phase>
......@@ -232,7 +242,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
- <version>1.7</version>
+ <version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
......
......@@ -18,7 +18,7 @@
package org.apache.tika.cli;
- import org.apache.commons.lang.SystemUtils;
+ import org.apache.commons.lang3.SystemUtils;
import java.io.IOException;
import java.nio.file.Files;
......
......@@ -34,6 +34,7 @@ import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
+ import java.io.Serializable;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.lang.reflect.Field;
......@@ -56,6 +57,7 @@ import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeSet;
+ import java.util.UUID;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOUtils;
......@@ -80,6 +82,7 @@ import org.apache.tika.gui.TikaGUI;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.language.detect.LanguageHandler;
import org.apache.tika.metadata.Metadata;
+ import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.metadata.serialization.JsonMetadata;
import org.apache.tika.metadata.serialization.JsonMetadataList;
import org.apache.tika.mime.MediaType;
......@@ -103,6 +106,7 @@ import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.ContentHandlerFactory;
import org.apache.tika.sax.ExpandedTitleContentHandler;
+ import org.apache.tika.sax.RecursiveParserWrapperHandler;
import org.apache.tika.xmp.XMPMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
......@@ -443,7 +447,13 @@ public class TikaCLI {
} else if (arg.equals("-d") || arg.equals("--detect")) {
type = DETECT;
} else if (arg.startsWith("--extract-dir=")) {
- extractDir = new File(arg.substring("--extract-dir=".length()));
+ String dirPath = arg.substring("--extract-dir=".length());
+ //if the user accidentally doesn't include
+ //a directory, set the directory to the cwd
+ if (dirPath.length() == 0) {
+ dirPath = ".";
+ }
+ extractDir = new File(dirPath);
} else if (arg.equals("-z") || arg.equals("--extract")) {
extractInlineImagesFromPDFs();
type = NO_OUTPUT;
......@@ -502,14 +512,15 @@ public class TikaCLI {
private void handleRecursiveJson(URL url, OutputStream output) throws IOException, SAXException, TikaException {
Metadata metadata = new Metadata();
- RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser, getContentHandlerFactory(type));
+ RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser);
+ RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(getContentHandlerFactory(type), -1);
try (InputStream input = TikaInputStream.get(url, metadata)) {
- wrapper.parse(input, null, metadata, context);
+ wrapper.parse(input, handler, metadata, context);
}
JsonMetadataList.setPrettyPrinting(prettyPrint);
Writer writer = getOutputWriter(output, encoding);
try {
- JsonMetadataList.toJson(wrapper.getMetadata(), writer);
+ JsonMetadataList.toJson(handler.getMetadataList(), writer);
} finally {
writer.flush();
}
......@@ -694,11 +705,7 @@ public class TikaCLI {
}
detector = config.getDetector();
context.set(Parser.class, parser);
- context.set(PasswordProvider.class, new PasswordProvider() {
- public String getPassword(Metadata metadata) {
- return password;
- }
- });
+ context.set(PasswordProvider.class, new SimplePasswordProvider(password));
}
private void displayMetModels(){
......@@ -1046,21 +1053,13 @@ public class TikaCLI {
}
MediaType contentType = detector.detect(inputStream, metadata);
- if (name.indexOf('.')==-1 && contentType!=null) {
- try {
- name += config.getMimeRepository().forName(
- contentType.toString()).getExtension();
- } catch (MimeTypeException e) {
- e.printStackTrace();
- }
+ File outputFile = null;
+ if (name == null) {
+ name = "file" + count++;
}
+ outputFile = getOutputFile(name, metadata, contentType);
- String relID = metadata.get(Metadata.EMBEDDED_RELATIONSHIP_ID);
- if (relID != null && !name.startsWith(relID)) {
- name = relID + "_" + name;
- }
- File outputFile = new File(extractDir, FilenameUtils.normalize(name));
File parent = outputFile.getParentFile();
if (!parent.exists()) {
if (!parent.mkdirs()) {
......@@ -1097,6 +1096,58 @@ public class TikaCLI {
}
}
private File getOutputFile(String name, Metadata metadata, MediaType contentType) {
String ext = getExtension(contentType);
if (name.indexOf('.')==-1 && contentType!=null) {
name += ext;
}
String relID = metadata.get(Metadata.EMBEDDED_RELATIONSHIP_ID);
if (relID != null && !name.startsWith(relID)) {
name = relID + "_" + name;
}
//defensively do this so that we don't get an exception
//from FilenameUtils.normalize
name = name.replaceAll("\u0000", " ");
String normalizedName = FilenameUtils.normalize(name);
if (normalizedName == null) {
normalizedName = FilenameUtils.getName(name);
}
if (normalizedName == null) {
normalizedName = "file"+count++ +ext;
}
//strip off initial C:/ or ~/ or /
int prefixLength = FilenameUtils.getPrefixLength(normalizedName);
if (prefixLength > -1) {
normalizedName = normalizedName.substring(prefixLength);
}
File outputFile = new File(extractDir, normalizedName);
//if file already exists, prepend uuid
if (outputFile.exists()) {
String fileName = FilenameUtils.getName(normalizedName);
outputFile = new File(extractDir, UUID.randomUUID().toString()+"-"+fileName);
}
return outputFile;
}
private String getExtension(MediaType contentType) {
try {
String ext = config.getMimeRepository().forName(
contentType.toString()).getExtension();
if (ext == null) {
return ".bin";
} else {
return ext;
}
} catch (MimeTypeException e) {
e.printStackTrace();
}
return ".bin";
}
protected void copy(DirectoryEntry sourceDir, DirectoryEntry destDir)
throws IOException {
for (org.apache.poi.poifs.filesystem.Entry entry : sourceDir) {
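Review note: the new `getOutputFile` in this hunk keeps extracted names inside `extractDir` by leaning on commons-io (`FilenameUtils.normalize`, `FilenameUtils.getPrefixLength`) and prepends a UUID when the target already exists. For anyone who wants to experiment with the same idea without commons-io, a minimal stdlib-only sketch follows. It is not the patch's code: `SafeExtractPath`, `resolveInside`, and `isStrictlyInside` are hypothetical names, the resolve-then-`startsWith` containment check is a swapped-in technique, and the generated fallback name only approximates what the patch does.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of Tika.
class SafeExtractPath {

    // Maps an untrusted embedded-file name to a path that stays under extractDir.
    static Path resolveInside(Path extractDir, String embeddedName) {
        Path base = extractDir.toAbsolutePath().normalize();
        // NUL is illegal in file names; the patch also swaps it for a space.
        String cleaned = embeddedName.replace('\u0000', ' ');
        // Drop leading separators so "/etc/x" is treated as a relative name.
        while (cleaned.startsWith("/") || cleaned.startsWith("\\")) {
            cleaned = cleaned.substring(1);
        }
        Path candidate;
        try {
            candidate = base.resolve(cleaned).normalize();
            if (!isStrictlyInside(base, candidate)) {
                // "../" segments escaped the directory: keep only the last name element.
                Path last = Paths.get(cleaned).getFileName();
                candidate = (last == null) ? base : base.resolve(last).normalize();
            }
        } catch (InvalidPathException e) {
            candidate = base; // forces the generated-name fallback below
        }
        if (!isStrictlyInside(base, candidate)) {
            candidate = base.resolve("file-" + UUID.randomUUID() + ".bin");
        }
        if (Files.exists(candidate)) {
            // Same idea as the patch: prepend a UUID instead of overwriting.
            candidate = candidate.resolveSibling(
                    UUID.randomUUID() + "-" + candidate.getFileName());
        }
        return candidate;
    }

    static boolean isStrictlyInside(Path base, Path candidate) {
        return candidate.startsWith(base) && !candidate.equals(base);
    }

    public static void main(String[] args) {
        Path dir = Paths.get("extract-out");
        System.out.println(resolveInside(dir, "/dangerous/dont/touch.pl"));
        System.out.println(resolveInside(dir, "../../touch.pl"));
        System.out.println(resolveInside(dir, "dang\u0000erous.pl"));
    }
}
```

The three names in `main` mirror the cases the new `TikaCLITest` methods cover (absolute path, relative traversal, embedded zero byte). Resolving first and checking containment afterwards keeps the safety decision independent of how the name was spelled.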
......@@ -1268,4 +1319,16 @@ public class TikaCLI {
}
private static class SimplePasswordProvider
implements PasswordProvider, Serializable {
private final String password;
public SimplePasswordProvider(String password) {
this.password = password;
}
@Override
public String getPassword(Metadata metadata) {
return password;
}
}
}
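Review note: replacing the anonymous `PasswordProvider` with the static nested, `Serializable` `SimplePasswordProvider` reads like a serialization fix, plausibly for the forked-parser path (`-f`) that the new `testForkParser` exercises. That motive is an inference from the diff, not something it states. The sketch below shows the general Java behavior involved, using hypothetical stand-in types (`SerializationSketch`, `Provider`, `SimpleProvider`, `canSerialize`) rather than Tika's: an anonymous class that reads a field of its enclosing instance keeps a hidden reference to that instance, so it cannot be serialized unless the enclosing class is serializable too.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins for illustration; none of these are Tika types.
class SerializationSketch {

    interface Provider extends Serializable {
        String get();
    }

    // Set in the constructor so it is not a compile-time constant.
    private final String password;

    SerializationSketch(String password) {
        this.password = password;
    }

    // The anonymous class reads the outer field, so it carries a reference
    // to the enclosing SerializationSketch, which is not Serializable.
    Provider anonymousProvider() {
        return new Provider() {
            @Override
            public String get() {
                return password;
            }
        };
    }

    // Static nested class: no hidden outer reference, only its own field.
    static class SimpleProvider implements Provider {
        private final String password;

        SimpleProvider(String password) {
            this.password = password;
        }

        @Override
        public String get() {
            return password;
        }
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new SerializationSketch("pw").anonymousProvider()));
        System.out.println(canSerialize(new SimpleProvider("pw")));
    }
}
```

In the sketch, `Provider` extends `Serializable` only to isolate the captured-outer-instance effect; in the diff, the old anonymous class did not implement `Serializable` at all.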
......@@ -80,6 +80,7 @@ import org.apache.tika.parser.utils.CommonsDigester;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.ContentHandlerDecorator;
+ import org.apache.tika.sax.RecursiveParserWrapperHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.Attributes;
......@@ -395,13 +396,16 @@ public class TikaGUI extends JFrame
);
}
if (isReset) {
- RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser,
+ RecursiveParserWrapperHandler recursiveParserWrapperHandler =
+ new RecursiveParserWrapperHandler(
new BasicContentHandlerFactory(
- BasicContentHandlerFactory.HANDLER_TYPE.BODY, -1));
- wrapper.parse(input, null, new Metadata(), new ParseContext());
+ BasicContentHandlerFactory.HANDLER_TYPE.BODY, -1),
+ -1);
+ RecursiveParserWrapper wrapper = new RecursiveParserWrapper(parser);
+ wrapper.parse(input, recursiveParserWrapperHandler, new Metadata(), new ParseContext());
StringWriter jsonBuffer = new StringWriter();
JsonMetadataList.setPrettyPrinting(true);
- JsonMetadataList.toJson(wrapper.getMetadata(), jsonBuffer);
+ JsonMetadataList.toJson(recursiveParserWrapperHandler.getMetadataList(), jsonBuffer);
setText(json, jsonBuffer.toString());
}
layout.show(cards, "metadata");
......
......@@ -63,6 +63,7 @@
description="output directory for output"/> <!-- do we want to make this mandatory -->
<option opt="recursiveParserWrapper"
description="use the RecursiveParserWrapper or not (default = false)"/>
+ <option opt="streamOut" description="stream the output of the RecursiveParserWrapper (default = false)"/>
<option opt="handleExisting" hasArg="true"
description="if an output file already exists, do you want to: overwrite, rename or skip"/>
<option opt="basicHandlerType" hasArg="true"
......
......@@ -20,24 +20,41 @@ package org.apache.tika.cli;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.logging.Handler;
import org.apache.commons.io.FileUtils;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.serialization.JsonMetadataList;
import org.apache.tika.metadata.serialization.JsonStreamingSerializer;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.AbstractRecursiveParserWrapperHandler;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.ContentHandlerFactory;
import org.apache.tika.sax.RecursiveParserWrapperHandler;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
public class TikaCLIBatchIntegrationTest {
......@@ -108,7 +125,28 @@ public class TikaCLIBatchIntegrationTest {
try (Reader reader = Files.newBufferedReader(jsonFile, UTF_8)) {
List<Metadata> metadataList = JsonMetadataList.fromJson(reader);
assertEquals(12, metadataList.size());
- assertTrue(metadataList.get(6).get(RecursiveParserWrapper.TIKA_CONTENT).contains("human events"));
+ assertTrue(metadataList.get(6).get(AbstractRecursiveParserWrapperHandler.TIKA_CONTENT).contains("human events"));
}
}
@Test
public void testStreamingJsonRecursiveBatchIntegration() throws Exception {
String[] params = {"-i", testInputDirForCommandLine,
"-o", tempOutputDirForCommandLine,
"-numConsumers", "10",
"-J", //recursive Json
"-t", //plain text in content
"-streamOut"
};
TikaCLI.main(params);
Path jsonFile = tempOutputDir.resolve("test_recursive_embedded.docx.json");
try (Reader reader = Files.newBufferedReader(jsonFile, UTF_8)) {
List<Metadata> metadataList = JsonMetadataList.fromJson(reader);
assertEquals(12, metadataList.size());
assertTrue(metadataList.get(6).get(AbstractRecursiveParserWrapperHandler.TIKA_CONTENT).contains("human events"));
//test that the last written object has been bumped to the first by JsonMetadataList.fromJson()
assertNull( metadataList.get(0).get(AbstractRecursiveParserWrapperHandler.EMBEDDED_RESOURCE_PATH));
}
}
......@@ -170,5 +208,4 @@ public class TikaCLIBatchIntegrationTest {
Files.isRegularFile(path));
}
}
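Review note: the comment in `testStreamingJsonRecursiveBatchIntegration` says the last written object is bumped to the first position by `JsonMetadataList.fromJson()`. A rough sketch of that reordering follows, under assumptions the diff does not spell out: that streaming output writes embedded documents first and the container last, and that the container is the entry with no embedded resource path. `StreamOrderSketch`, `containerFirst`, `entry`, and the `PATH_KEY` value are all hypothetical, and plain maps stand in for Tika's `Metadata`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; plain maps stand in for Tika's Metadata objects.
class StreamOrderSketch {

    // Placeholder key name, not necessarily the constant Tika uses.
    static final String PATH_KEY = "embedded_resource_path";

    // Builds one stand-in metadata entry; a null path marks the container.
    static Map<String, String> entry(String embeddedPath) {
        Map<String, String> m = new HashMap<>();
        if (embeddedPath != null) {
            m.put(PATH_KEY, embeddedPath);
        }
        return m;
    }

    // If the last streamed entry looks like the container, move it to index 0.
    static List<Map<String, String>> containerFirst(List<Map<String, String>> streamed) {
        List<Map<String, String>> result = new ArrayList<>(streamed);
        int last = result.size() - 1;
        if (result.size() > 1 && result.get(last).get(PATH_KEY) == null) {
            // Rotating by one moves the final element to the front
            // and shifts every other element one position to the right.
            Collections.rotate(result, 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> streamed =
                Arrays.asList(entry("/a.txt"), entry("/b.txt"), entry(null));
        System.out.println(containerFirst(streamed));
    }
}
```

This matches what the test asserts: after reading the streamed file back, index 0 has no `EMBEDDED_RESOURCE_PATH`, and the embedded documents keep their relative order.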
......@@ -17,15 +17,25 @@
package org.apache.tika.cli;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;
import org.apache.tika.exception.TikaException;
import org.junit.After;
import org.junit.Before;
......@@ -126,6 +136,17 @@ public class TikaCLITest {
assertTrue(outContent.toString(UTF_8.name()).contains("finished off the cake"));
}
/**
* Tests -f option of the cli
*
* @throws Exception
*/
@Test
public void testForkParser() throws Exception{
String[] params = {"-f", resourcePrefix + "alice.cli.test"};
TikaCLI.main(params);
assertTrue(outContent.toString(UTF_8.name()).contains("finished off the cake"));
}
/**
* Tests -m option of the cli
* @throws Exception
......@@ -245,39 +266,74 @@ public class TikaCLITest {
}
@Test
- public void testExtract() throws Exception {
+ public void testExtractSimple() throws Exception {
String[] expectedChildren = new String[]{
"MBD002B040A.cdx",
"file4.png",
"MBD002B0FA6_file5.bin",
"MBD00262FE3.txt",
"file0.emf"
};
testExtract("/coffee.xls", expectedChildren, 8);
}
@Test
public void testExtractAbsolute() throws Exception {
String[] expectedChildren = new String[] {
"dangerous/dont/touch.pl",
};
testExtract("testZip_absolutePath.zip", expectedChildren, 2);
}
@Test
public void testExtractRelative() throws Exception {
String[] expectedChildren = new String[] {
"touch.pl",
};
testExtract("testZip_relative.zip", expectedChildren);
}
@Test
public void testExtractOverlapping() throws Exception {
//there should be two files, one with a prepended uuid-f1.txt
String[] expectedChildren = new String[] {
"f1.txt",
};
testExtract("testZip_overlappingNames.zip", expectedChildren, 2);
}
@Test
public void testExtract0x00() throws Exception {
String[] expectedChildren = new String[] {
"dang erous.pl",
};
testExtract("testZip_zeroByte.zip", expectedChildren);
}
private void testExtract(String targetFile, String[] expectedChildrenFileNames) throws Exception {
testExtract(targetFile, expectedChildrenFileNames, expectedChildrenFileNames.length);
}
private void testExtract(String targetFile, String[] expectedChildrenFileNames, int expectedLength) throws Exception {
File tempFile = File.createTempFile("tika-test-", "");
tempFile.delete();
- tempFile.mkdir(); // not really good method for production usage, but ok for tests
- // google guava library has better solution
+ tempFile.mkdir();
try {
- String[] params = {"--extract-dir="+tempFile.getAbsolutePath(),"-z", resourcePrefix + "/coffee.xls"};
+ String[] params = {"--extract-dir=" + tempFile.getAbsolutePath(), "-z", resourcePrefix + "/"+targetFile};
TikaCLI.main(params);
StringBuffer allFiles = new StringBuffer();
+ assertEquals(expectedLength, tempFile.list().length);
for (String f : tempFile.list()) {
if (allFiles.length() > 0) allFiles.append(" : ");
allFiles.append(f);
}
- // ChemDraw file
- File expectedCDX = new File(tempFile, "MBD002B040A.cdx");
- // Image of the ChemDraw molecule
- File expectedIMG = new File(tempFile, "file4.png");
- // OLE10Native
- File expectedOLE10 = new File(tempFile, "MBD002B0FA6_file5.bin");
- // Something that really isnt a text file... Not sure what it is???
- File expected262FE3 = new File(tempFile, "MBD00262FE3.txt");
- // Image of one of the embedded resources
- File expectedEMF = new File(tempFile, "file0.emf");
- assertExtracted(expectedCDX, allFiles.toString());
- assertExtracted(expectedIMG, allFiles.toString());
- assertExtracted(expectedOLE10, allFiles.toString());
- assertExtracted(expected262FE3, allFiles.toString());
- assertExtracted(expectedEMF, allFiles.toString());
+ for (String expectedChildName : expectedChildrenFileNames) {
+ assertExtracted(new File(tempFile, expectedChildName), allFiles.toString());
+ }
} finally {
FileUtils.deleteDirectory(tempFile);
}
......@@ -510,5 +566,4 @@ public class TikaCLITest {
assertFalse(content.contains("org.apache.tika.parser.executable.Executable"));
}
}