Commit b2a8f92b authored by Hilko Bengen

Imported Upstream version 4.6.0



Lucene Build Instructions
Basic steps:
0) Install JDK 1.6 (or greater), Ant 1.8.2+, Ivy 2.2.0
1) Download Lucene from Apache and unpack it
2) Connect to the top-level of your Lucene installation
3) Install JavaCC (optional)
4) Run ant
Step 0) Set up your development environment (JDK 1.6 or greater,
Ant 1.8.2+, Ivy 2.2.0)
We'll assume that you know how to get and set up the JDK - if you
don't, then we suggest starting at and learning
more about Java, before returning to this README. Lucene runs with
JDK 1.6 and later.
Like many open source Java projects, Lucene uses Apache Ant for build
control. Specifically, you MUST use Ant version 1.8.2+.
Ant is "kind of like make without make's wrinkles". Ant is
implemented in Java and uses XML-based configuration files. You can
get it at:
You'll need to download the Ant binary distribution. Install it
according to the instructions at:
Finally, you'll need to install Ivy into your Ant lib folder
(~/.ant/lib). You can get it from
If you skip this step, the Lucene build system will offer to do it
for you.
Step 1) Download Lucene from Apache
We'll assume you already did this, or you wouldn't be reading this
file. However, you might have received this file by some alternate
route, or you might have an incomplete copy of Lucene, so: Lucene
releases are available for download at:
Download either a zip or a tarred/gzipped version of the archive, and
uncompress it into a directory of your choice.
Step 2) From the command line, change (cd) into the top-level directory of your Lucene installation
Lucene's top-level directory contains the build.xml file. By default,
you do not need to change any of the settings in this file, but you do
need to run ant from this location so it knows where to find build.xml.
If you would like to change settings you can do so by creating one
or more of the following files and placing your own property settings
in there:
The first value found for a property, in the order in which the files
are loaded, becomes the setting that is used by the Ant build system.
NOTE: the ~ character represents your user account home directory.
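The "first property found wins" rule above can be illustrated with a minimal
sketch. This is not Ant's implementation; it only mimics the precedence rule
with plain java.util.Properties objects. The class name FirstPropertyWins, the
method resolve, and the property names are hypothetical, and the code assumes
Java 8 or later (newer than the JDK 1.6 minimum needed to build Lucene).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class FirstPropertyWins {

    /**
     * Merges property sets in load order. A name keeps the first value
     * seen; later definitions of the same name are ignored.
     */
    static Map<String, String> resolve(List<Properties> inLoadOrder) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Properties props : inLoadOrder) {
            for (String name : props.stringPropertyNames()) {
                // putIfAbsent keeps an earlier value if one already exists
                resolved.putIfAbsent(name, props.getProperty(name));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical settings, standing in for two property files
        Properties loadedFirst = new Properties();
        loadedFirst.setProperty("example.setting", "from-first-file");

        Properties loadedSecond = new Properties();
        loadedSecond.setProperty("example.setting", "from-second-file");
        loadedSecond.setProperty("other.setting", "second-file-only");

        List<Properties> order = new ArrayList<>();
        order.add(loadedFirst);
        order.add(loadedSecond);

        System.out.println(resolve(order));
    }
}
```

In this sketch, example.setting resolves to the value from the file loaded
first, while other.setting is taken from the second file because nothing
earlier defined it.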
Step 3) Run ant
Assuming you have ant in your PATH and have set ANT_HOME to the
location of your Ant installation, typing "ant" at the shell prompt
or command prompt should run Ant. By default, Ant looks for the
"build.xml" file in your current directory and compiles Lucene.
If you want to build the documentation, type "ant documentation".
For further information on Lucene, go to:
Please join the Lucene-User mailing list by visiting this site:
Please post suggestions, questions, corrections or additions to this
document to the lucene-user mailing list.
This file was originally written by Steven J. Owens.
This file was modified by Jon S. Stevens.
Copyright (c) 2001-2005 The Apache Software Foundation. All rights reserved.
# JRE Version Migration Guide
If possible, use the same JRE major version at both index and search time.
When upgrading to a different JRE major version, consider re-indexing.
Different JRE major versions may implement different versions of Unicode,
which will change the way some parts of Lucene treat your text.
For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,
but with Java 5 it will not.
This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.
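To see how the running JRE classifies this character, a small check like the following may help. It is a sketch that assumes the tokenizer's letter test boils down to `Character.isLetter`; the class name `ModifierLetterCheck` and its helper method are hypothetical and not part of Lucene.

```java
public class ModifierLetterCheck {

    // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT, the character named above
    static final int MODIFIER_CIRCUMFLEX = 0x02C6;

    /** Returns whether the running JRE's Unicode tables treat the code point as a letter. */
    static boolean isLetterOnThisJre(int codePoint) {
        return Character.isLetter(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("U+02C6 is a letter: " + isLetterOnThisJre(MODIFIER_CIRCUMFLEX));
        // The general category constant (see java.lang.Character) shows why the answer differs
        System.out.println("U+02C6 general category: " + Character.getType(MODIFIER_CIRCUMFLEX));
    }
}
```

Running this on the JRE used at index time and on the one used at search time shows whether the two agree for the characters you care about.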
For reference, JRE major versions with their corresponding Unicode versions:
* Java 1.4, Unicode 3.0
* Java 5, Unicode 4.0
* Java 6, Unicode 4.0
* Java 7, Unicode 6.0
In general, whether or not you need to re-index largely depends upon the data that
you are searching, and what was changed in any given Unicode version. For example,
if you are completely sure that your content is limited to the "Basic Latin" range
of Unicode, you can safely ignore this.
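One way to test the "Basic Latin only" assumption on sample content is sketched below. The class `BasicLatinCheck` and the method `isBasicLatinOnly` are hypothetical helpers, not Lucene APIs; the check uses `Character.UnicodeBlock`, which covers U+0000 through U+007F for `BASIC_LATIN`.

```java
public class BasicLatinCheck {

    /** Returns true if every code point in the text lies in the Basic Latin block. */
    static boolean isBasicLatinOnly(CharSequence text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            if (Character.UnicodeBlock.of(codePoint) != Character.UnicodeBlock.BASIC_LATIN) {
                return false;
            }
            // Advance by one or two chars, depending on whether this was a surrogate pair
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBasicLatinOnly("plain ASCII text"));
        System.out.println(isBasicLatinOnly("caf\u00e9"));
    }
}
```

A `false` result for any document means the content reaches beyond Basic Latin, so the Unicode differences between JRE versions may matter for that index.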
## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
* `StandardAnalyzer` will return the same results under Java 5 as it did under
Java 1.4. This is because it is largely independent of the runtime JRE for
Unicode support (with the exception of lowercasing). However, no changes to
casing have occurred in Unicode 4.0 that affect `StandardAnalyzer`, so if you are
using this Analyzer you are NOT affected.
* `SimpleAnalyzer`, `StopAnalyzer`, `LetterTokenizer`, `LowerCaseFilter`, and
`LowerCaseTokenizer` may return different results, along with many other `Analyzer`s
and `TokenStream`s in Lucene's analysis modules. If you are using one of these
components, you may be affected.
Apache Lucene
Copyright 2013 The Apache Software Foundation
This product includes software developed by
The Apache Software Foundation (
Includes software from other Apache Software Foundation projects,
including, but not limited to:
- Apache Ant
- Apache Jakarta Regexp
- Apache Commons
- Apache Xerces
ICU4J (under analysis/icu) is licensed under an MIT-style license
and Copyright (c) 1995-2008 International Business Machines Corporation and others
Some data files (under analysis/icu/src/data) are derived from Unicode data such
as the Unicode Character Database. See for more
Brics Automaton (under core/src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See
The levenshtein automata tables (under core/src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see and
The class org.apache.lucene.util.WeakIdentityMap was derived from
the Apache CXF project and is Apache License 2.0.
The Google Code Prettify is Apache License 2.0.
JUnit (junit-4.10) is licensed under the Common Public License v. 1.0
This product includes code (JaspellTernarySearchTrie) from Java Spelling Checking
Package (jaspell):
License: The BSD License (
The snowball stemmers in
were developed by Martin Porter and Richard Boulton.
The snowball stopword lists in
were developed by Martin Porter and Richard Boulton.
The full snowball package is available from
The KStem stemmer in
was developed by Bob Krovetz and Sergio Guzman-Lara (CIIR-UMass Amherst)
under the BSD-license.
The Arabic, Persian, Romanian, Bulgarian, and Hindi analyzers (common) come with a default
stopword list that is BSD-licensed created by Jacques Savoy. These files reside in:
The German, Spanish, Finnish, French, Hungarian, Italian, Portuguese, Russian, and Swedish light stemmers
(common) are based on BSD-licensed reference implementations created by Jacques Savoy and
Ljiljana Dolamic. These files reside in:
The Stempel analyzer (stempel) includes BSD-licensed software developed
by the Egothor project, created by Leo Galambos, Martin Kvapil,
and Edmond Nolan.
The Polish analyzer (stempel) comes with a default
stopword list that is BSD-licensed created by the Carrot2 project. The file resides
in stempel/src/resources/org/apache/lucene/analysis/pl/stopwords.txt.