Commit 4f1c7b14 authored by Ken Bloom

Merge branch 'upstream'

Conflicts:
	configure.in
parents b16f1abe a95f0f45
......@@ -19,3 +19,4 @@ Sampo Pyysalo <smp_at_is.s.u-tokyo.ac.jp>
Murilo Saraiva de Queiroz <muriloq_at_gmail.com>
Fridrich Strba <fridrich.strba_at_bluewin.ch>
Peter Szolovits
Vikas N Kumar <walburn_at_gmail.com>
Version 4.6.5 (3 November 2009)
* Fix: Superlatives without preceding determiners ("... likes you best")
* Fix: Take more care in distinguishing mass and count nouns.
* Fix: Old bug w/relative clauses: Rw+ is optional, not mandatory.
* Provide tags identifying relative, superlative adjectives.
* Remove BioLG NUMBER-AND-UNIT handling; it has been superseded.
* Fix handling of parenthetical phrases/clauses.
* Fix: handling of capitalized first words ending in letter "s".
* Fix: support "filler-it" SF link for "It was reasoned that..."
* Fix: certain WH-word constructions: "I did not know why until recently"
* Fix: go: "there goes the greatest guy ever"
* Fix: opening coordinating conjunctions: "And you can also ..."
* Configurable Hunspell spell-checker dictionary location.
* Fix: Misc ordinal usage.
* Add support for aspell spell-checker.
Version 4.6.4 (11 October 2009)
* Restore nouns starting w/letters x-z, elided in version 4.5.9 ff.
* Add support for single-word interjections/exclamations!
* Fix: sometimes command line client fails to show all valid linkages.
* Misc fixes: such_that, upon, acted.v
* Fix: impersonal "be" linking to passive participle.
* Fix: handling of capitalized first words.
* Fix: duplication of certain parses involving transitive verbs.
Version 4.6.3 (4 October 2009)
* Fix compilation bug on FreeBSD.
* Fix: allow MX link to post-nominal ", to be ..., "
* Fix: add idiom "time and again"
* Fix: another BioLG regression in handling of possessives.
* Fix: handling of period at end of number at end of sentence.
* Fix: Capitalized words ending in s at start of sentence.
* Use corpus-statistics-based ranking by default, if available.
* Fix difficulties in build of corpus statistics module.
Version 4.6.2 (21 September 2009)
* Fix: "come across as authoritative".
* Improve java location guessing in FreeBSD
* Fix for assert triggered by long sentences.
* Fix: long sequence of periods treated as unknown word.
* Add informational print showing dictionary location on startup.
* Remove duplicated {@MV+} in tend.v
* Automatically resize the display size to fit the current window size.
* Fix handling of punctuation at the end of a capitalized word.
* Fix misc verbs acting as adjectival modifiers: e.g. "given", "allied"
* Fix bug in BioLG code regarding the handling of possessives.
* Fix a (rare) crash in sentences with many conjunctions.
* Fix a crash involving long sequences of UTF8 punctuation marks.
Version 4.6.1 (31 August 2009)
* Stop printing annoying warning when !vars are used.
* Fix missing dict file units.2 problem
* Fix compilation bug on FreeBSD.
Version 4.6.0 (29 August 2009)
* Avoid use of bzero, add missing include directives (MacOSX problem)
* Reclassify a number of "medical" prepositions as adverbs.
* Add approx 100 adverbs & 300 adjectives.
* Add approx 250 verbs.
* Add approx 300 nouns.
* Add misc units.
* Add misc European connector words/patronymics.
* Reclassify 100's of transitive verbs as optionally-transitive.
* Add distinct tokenization step ("sentence_split") to public API.
This last change forces the minor-version-number bump.
Version 4.5.10 (25 August 2009)
* Be sure to link with -lm
Version 4.5.9 (25 August 2009)
* Modify error messages to indicate that they are from link-grammar.
* Add missing Java files that were forgotten last time around.
* Add greeting to command-line client startup.
* Print disjunct cost also, when requesting disjunct printing.
* Add missing color names as mass nouns.
* Fix: Reclassify musical instruments: "He plays piano"
* Add experimental word-clustering system.
* Add CMake build file
* Fix: "It takes longer than that."
* Fix: "He has done very well."
* Fix: a dozen optionally transitive verbs (swim, kill, etc.)
* Fix: "He's out running."
* Fix: "suddenly" is a "manner adverb", not a clausal adverb.
* Fix: Use Pg links to gerunds: "He feared hitting the wall."
* Fix: assorted numerical-range bugs.
* Fix: prep modifiers with distances: "It is a few miles out"
* Fix: Spelled-out dates: "It started in nineteen twelve"
* Fix: Misc date, time expression parsing e.g. "Zero hour is here."
* Fix: Misc words, "ordered list", "screened out"
* Fix: Post-fixed numbers can act as determiners.
* Fix: "We bought the last 50 ft. of cable."
* Fix: opening directives to imperatives: "Finally, move it back."
* Fix: Improved simple equation parsing support.
* Fix: Add misc fixes from BioLG that were previously overlooked.
* Fix: "favorite" can take determiner "a" ("a favorite place")
* Fix: assorted clausal complements: "The emperor ordered it done."
* Fix: ordinals: "First on our list is ..."
* Fix: verb modifier "some of the time", "most places"
* Fix: Sit, stand take modifiers: "he stood still"
Version 4.5.8 (2 July 2009)
* Fix: 'than anticipated', 'than was anticipated', etc.
* Fix: 'saw the wood'
......
# - Try to find the link-grammar library; Once done this will define
#
# LINK_GRAMMAR_FOUND - system has the link-grammar library
# LINK_GRAMMAR_INCLUDE_DIRS - the link-grammar include directory
# LINK_GRAMMAR_LIBRARIES - The libraries needed to use link-grammar
# LINK_GRAMMAR_DATA_DIR - the dir where you will find the dictionaries
# Copyright (c) 2008, OpenCog.org (http://opencog.org)
#
# Redistribution and use is allowed according to the terms of the BSD license.
# For details see the accompanying COPYING-CMAKE-SCRIPTS file.
# Look for the header file
FIND_PATH(LINK_GRAMMAR_INCLUDE_DIR link-grammar/link-includes.h)
FIND_PATH(LINK_GRAMMAR_DATA_DIR 4.0.dict
PATHS
/usr/share/link-grammar/en/
/usr/local/share/link-grammar/en/)
# Look for the library
FIND_LIBRARY(LINK_GRAMMAR_LIBRARY
NAMES
link-grammar
PATHS
/usr/lib
/usr/local/lib
/opt/local/lib)
# Copy the results to the output variables.
IF (LINK_GRAMMAR_INCLUDE_DIR AND LINK_GRAMMAR_LIBRARY AND LINK_GRAMMAR_DATA_DIR)
SET(LINK_GRAMMAR_FOUND 1)
SET(LINK_GRAMMAR_LIBRARIES ${LINK_GRAMMAR_LIBRARY})
SET(LINK_GRAMMAR_INCLUDE_DIRS ${LINK_GRAMMAR_INCLUDE_DIR})
ELSE (LINK_GRAMMAR_INCLUDE_DIR AND LINK_GRAMMAR_LIBRARY AND LINK_GRAMMAR_DATA_DIR)
SET(LINK_GRAMMAR_FOUND 0)
SET(LINK_GRAMMAR_LIBRARIES)
SET(LINK_GRAMMAR_INCLUDE_DIRS)
ENDIF (LINK_GRAMMAR_INCLUDE_DIR AND LINK_GRAMMAR_LIBRARY AND LINK_GRAMMAR_DATA_DIR)
# Report the results.
IF (NOT LINK_GRAMMAR_FOUND)
SET(LINK_GRAMMAR_DIR_MESSAGE
"link-grammar was not found. Make sure LINK_GRAMMAR_LIBRARY, LINK_GRAMMAR_INCLUDE_DIR and LINK_GRAMMAR_DATA_DIR are set.")
IF (NOT LINK_GRAMMAR_FIND_QUIETLY)
MESSAGE(STATUS "${LINK_GRAMMAR_DIR_MESSAGE}")
ELSE (NOT LINK_GRAMMAR_FIND_QUIETLY)
IF (LINK_GRAMMAR_FIND_REQUIRED)
MESSAGE(FATAL_ERROR "${LINK_GRAMMAR_DIR_MESSAGE}")
ENDIF (LINK_GRAMMAR_FIND_REQUIRED)
ENDIF (NOT LINK_GRAMMAR_FIND_QUIETLY)
ENDIF (NOT LINK_GRAMMAR_FOUND)
MARK_AS_ADVANCED(
LINK_GRAMMAR_INCLUDE_DIR
LINK_GRAMMAR_LIBRARY
)
......@@ -32,6 +32,7 @@ EXTRA_DIST = \
link-grammar.spec \
AUTHORS \
ChangeLog \
FindLinkGrammar.cmake\
LICENSE \
MAINTAINERS \
README \
......
......@@ -7,11 +7,11 @@ dnl 4a) Increment when removing or changing interfaces.
LINK_MAJOR_VERSION=4
dnl 4a) 5) Increment when adding interfaces.
dnl 6) Set to zero when removing or changing interfaces.
LINK_MINOR_VERSION=5
LINK_MINOR_VERSION=6
dnl 3) Increment when interfaces not changed at all,
dnl only bug fixes or internal changes made.
dnl 4b) Set to zero when adding, removing or changing interfaces.
LINK_MICRO_VERSION=8
LINK_MICRO_VERSION=5
dnl
dnl Set this too
MAJOR_VERSION_PLUS_MINOR_VERSION=`expr $LINK_MAJOR_VERSION + $LINK_MINOR_VERSION`
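Read as libtool-style rules, with each numbered `dnl` comment applying to the version variable that follows it (an interpretation; the hunk does not spell this out), the bump policy can be sketched in Python. The helper names `bump` and `major_plus_minor` and the change labels are hypothetical. One consistency check against NEWS: adding `sentence_split` to the public API took 4.5.10 to 4.6.0.

```python
from typing import Tuple

# (LINK_MAJOR_VERSION, LINK_MINOR_VERSION, LINK_MICRO_VERSION)
Version = Tuple[int, int, int]


def bump(version: Version, change: str) -> Version:
    """Hypothetical helper applying the numbered dnl rules in configure.in."""
    major, minor, micro = version
    if change == "bugfix":
        # 3) interfaces not changed at all: only the micro version moves
        return (major, minor, micro + 1)
    if change == "add":
        # 5) increment minor when adding interfaces; 4b) reset micro
        return (major, minor + 1, 0)
    if change == "remove_or_change":
        # 4a) increment major; 6) reset minor; 4b) reset micro
        return (major + 1, 0, 0)
    raise ValueError(f"unknown change kind: {change!r}")


def major_plus_minor(version: Version) -> int:
    """Mirrors MAJOR_VERSION_PLUS_MINOR_VERSION=`expr $MAJOR + $MINOR`."""
    return version[0] + version[1]


print(bump((4, 5, 10), "add"))      # (4, 6, 0)
print(bump((4, 6, 4), "bugfix"))    # (4, 6, 5)
print(major_plus_minor((4, 6, 5)))  # 10
```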
......@@ -67,6 +67,20 @@ AM_CONDITIONAL(WITH_BINRELOC, test "x$br_cv_binreloc" = "xyes")
dnl ====================================================================
# The std=c99 flag gets the proper float-pt math decls working,
# e.g. fmaxf. However, it also undefines _BSD_SOURCE, etc., which are
# needed to get fileno, strdup, etc., so these need to be manually
# enabled again.
# Setting -D_POSIX_SOURCE messes up compilation on FreeBSD by
# hiding strdup, etc. again.
# CFLAGS="${CFLAGS} -std=c99 -D_BSD_SOURCE -D_SVID_SOURCE -D_POSIX_C_SOURCE -D_GNU_SOURCE"
# Final solution: enable std=c99, explicitly turn on BSD and SVID and
# GNU, but do NOT turn on POSIX.
#
CFLAGS="${CFLAGS} -std=c99 -D_BSD_SOURCE -D_SVID_SOURCE -D_GNU_SOURCE"
AC_ARG_ENABLE( debug,
[ --enable-debug compile with debugging flags set],
CFLAGS="${CFLAGS} -g"
......@@ -113,6 +127,25 @@ AC_ARG_ENABLE( corpus-stats,
AM_CONDITIONAL(WITH_CORPUS, test x${buildcorpus} = xyes)
dnl ASpell Support is handled here
do_aspell=yes
AC_ARG_ENABLE([aspell], [AS_HELP_STRING([ --disable-aspell],
[Build without ASpell support (default is enabled)])],
do_aspell=no)
AM_CONDITIONAL(WITH_ASPELL, test x${do_aspell} = xyes)
dnl Hunspell Support is handled here
do_hunspell=yes
AC_ARG_ENABLE([hunspell], [AS_HELP_STRING([ --disable-hunspell],
[Build without HunSpell support (default is enabled)])],
do_hunspell=no)
AM_CONDITIONAL(WITH_HUNSPELL, test x${do_hunspell} = xyes)
AC_ARG_WITH([hunspell-dictdir], [AS_HELP_STRING([--with-hunspell-dictdir=DIR],
[Use DIR to find HunSpell files (default=/usr/share/myspell/dicts)])],
[], with_hunspell_dictdir=)
dnl ====================================================================
# If not asking for the statistics backend, then don't even
......@@ -132,17 +165,62 @@ else
AM_CONDITIONAL(HAVE_SQLITE, false)
fi
dnl Set Default Spell Checker settings
dnl ====================================================================
AC_CHECK_HEADER([hunspell/hunspell.h], [CPPFLAGS="${CPPFLAGS} -DHAVE_HUNSPELL=1" HunSpellFound=yes], HunSpellFound=no)
AM_CONDITIONAL(HAVE_HUNSPELL, test x${HunSpellFound} = xyes)
ASpellFound=no
if test "$do_aspell" = yes ; then
PKG_CHECK_MODULES([ASPELL], [aspell], [ASpellFound=yes], [ASpellFound=no])
save_cpp_flags=${CPPFLAGS}
CPPFLAGS="${CPPFLAGS} ${ASPELL_CFLAGS}"
AC_CHECK_HEADER([aspell.h], [ASpellFound=yes], ASpellFound=no)
AC_CHECK_LIB(aspell, new_aspell_config, [], [ASpellFound=no])
CPPFLAGS=$save_cpp_flags
if test "x${ASpellFound}" = "xyes"; then
AC_DEFINE(HAVE_ASPELL, 1, [Define for compilation])
AC_SUBST(ASPELL_LIBS)
AC_SUBST(ASPELL_CFLAGS)
# If aspell enabled and found, then do NOT do hunspell
do_hunspell=no
fi
fi
AM_CONDITIONAL(HAVE_ASPELL, test x${ASpellFound} = xyes)
HunSpellDictDir=
HunSpellFound=no
if test x"$do_hunspell" = xyes; then
HunSpellFound=no
# First, look for the libraries.
PKG_CHECK_MODULES([HUNSPELL], [hunspell], [HunSpellFound=yes], [HunSpellFound=no])
save_cpp_flags=${CPPFLAGS}
CPPFLAGS="${CPPFLAGS} ${HUNSPELL_CFLAGS}"
AC_CHECK_HEADER([hunspell.h], [HunSpellFound=yes], HunSpellFound=no)
CPPFLAGS=$save_cpp_flags
if test "x${HunSpellFound}" = "xyes"; then
AC_DEFINE(HAVE_HUNSPELL, 1, [Define for compilation])
AC_SUBST(HUNSPELL_LIBS)
AC_SUBST(HUNSPELL_CFLAGS)
# Now, look for the dictionaries.
HunSpellDictDir=/usr/share/myspell/dicts
if test -n "$with_hunspell_dictdir"; then
HunSpellDictDir=$with_hunspell_dictdir
fi
if ! test -d "$HunSpellDictDir" ; then
echo "WARN HunSpell Dictionaries do not exist at \"$HunSpellDictDir\""
fi
AC_DEFINE_UNQUOTED(HUNSPELL_DICT_DIR, "$HunSpellDictDir", [Defining the dictionary path])
fi
if test "x${HunSpellFound}" = "xyes"; then
PKG_CHECK_MODULES([HUNSPELL], [hunspell])
AC_SUBST(HUNSPELL_LIBS)
AC_SUBST(HUNSPELL_CFLAGS)
fi
AM_CONDITIONAL(HAVE_HUNSPELL, test x${HunSpellFound} = xyes)
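The ASpell/HunSpell checks above are spread over several conditionals. Their precedence can be summarized in a small Python model. This sketch covers only the shell logic, not autoconf itself. The function and field names are hypothetical, and each `*_found` flag stands in for the combined result of the `PKG_CHECK_MODULES` and `AC_CHECK_HEADER` tests.

```python
from dataclasses import dataclass

DEFAULT_HUNSPELL_DICT_DIR = "/usr/share/myspell/dicts"


@dataclass
class SpellConfig:
    aspell: bool             # ASpellFound
    hunspell: bool           # HunSpellFound
    hunspell_dict_dir: str   # HunSpellDictDir ("" when HunSpell is not used)


def choose_spell_checker(do_aspell: bool, aspell_found: bool,
                         do_hunspell: bool, hunspell_found: bool,
                         with_hunspell_dictdir: str = "") -> SpellConfig:
    use_aspell = do_aspell and aspell_found
    if use_aspell:
        # "If aspell enabled and found, then do NOT do hunspell"
        do_hunspell = False
    use_hunspell = do_hunspell and hunspell_found
    dict_dir = ""
    if use_hunspell:
        # --with-hunspell-dictdir overrides the built-in default
        dict_dir = with_hunspell_dictdir or DEFAULT_HUNSPELL_DICT_DIR
    return SpellConfig(use_aspell, use_hunspell, dict_dir)


print(choose_spell_checker(True, True, True, True))
# SpellConfig(aspell=True, hunspell=False, hunspell_dict_dir='')
print(choose_spell_checker(False, False, True, True, "/opt/dicts"))
# SpellConfig(aspell=False, hunspell=True, hunspell_dict_dir='/opt/dicts')
```

In this model, passing `do_aspell=False` corresponds to configuring with `--disable-aspell`. That is the only way to get HunSpell when ASpell is also installed.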
dnl ====================================================================
AC_DEFUN([GLIB_LC_MESSAGES],
......@@ -256,23 +334,28 @@ dnl Might be useful to look at env variables $JDK_HOME and $JAVA_HOME for these.
dnl
JNIfound=no
if test "x$do_java" == "xyes"; then
if test "x$do_java" = "xyes"; then
JNI_GUESS=" \
-I $JAVA_HOME/include \
-I $JAVA_HOME/include/linux \
-I $JAVA_HOME/Headers \
-I $JDK_HOME/include \
-I $JDK_HOME/include/freebsd \
-I $JDK_HOME/include/linux \
-I $JDK_HOME/Headers \
-I/usr/include/classpath/ \
-I/usr/lib/jvm/default-java/include \
-I/usr/lib/jvm/default-java/include/linux \
-I/usr/local/jdk1.6.0/include/\
-I/usr/local/jdk1.6.0/include/freebsd \
-I/usr/local/jdk1.6.0/include/linux \
-I/usr/lib/jvm/java-6-sun/include/ \
-I/usr/lib/jvm/java-6-sun/include/freebsd \
-I/usr/lib/jvm/java-6-sun/include/linux \
-I/opt/jdk1.5/include/ \
-I/opt/jdk1.5/include/freebsd \
-I/opt/jdk1.5/include/linux \
-I/usr/lib/jvm/java-1.5.0-sun-1.5.0.15/include \
-I/usr/lib/jvm/java-1.5.0-sun-1.5.0.15/include/freebsd \
-I/usr/lib/jvm/java-1.5.0-sun-1.5.0.15/include/linux \
-I/usr/lib/jvm/java-6-sun/include/ \
-I/usr/lib/jvm/java-6-sun/include/linux \
-I/c/java/jdk1.6.0/include/ \
-I/c/java/jdk1.6.0/include/win32/ \
-I/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks/JavaVM.framework/Headers/ \
......@@ -346,7 +429,9 @@ $PACKAGE-$VERSION
Posix threads: ${do_pth}
Editline command-line history: ${edlin}
Java interfaces: ${JNIfound}
ASpell spell checker: ${ASpellFound}
HunSpell spell checker: ${HunSpellFound}
HunSpell dictionary location: ${HunSpellDictDir}
Boolean SAT solver: ${buildSAT}
Corpus statistics database: ${buildcorpus}
"
SUBDIRS=de en lt
# include the script in the tarball, but do not install it.
EXTRA_DIST=insert.pl
# EXTRA_DIST=insert.pl
......@@ -19,6 +19,8 @@
† †† ‡ § ¶ © № "#": LPUNC+;
/en/words/units.1: UNITS+;
/en/words/units.2: UNITS+;
/en/words/units.1.dot: UNITS+;
/en/words/units.3: UNITS+;
/en/words/units.4: UNITS+;
/en/words/units.4.dot: UNITS+;
......@@ -29,6 +29,9 @@ He is the kind of person who would do that
An income tax increase may be necessary
*A tax on income increase may be necessary
Last week I saw a great movie
% currently finds a parse with roman numeral I
% i.e. "Last Dog the First saw a great movie"
*Last dog I saw a great movie
The party that night was a big success
*The party that dog was a big success
......
......@@ -75,6 +75,7 @@ The gene 1 to 7 mRNA synthesis was reduced
There are deviations of 2 to 3 A
% the range should link to the verb with MVp (~ "north"): too ambiguous?
These transcripts are located 5 to 3
These transcripts are located 5' to 3'
% Ranges with hyphens
......
!verbosity=1
!echo
!limit=1000
!batch
!short=20
!constituents=1
!spell=0
Hatem Mohamed Mersal (born 20 January 1975) is an Egyptian long jumper.
His personal best jump is 8.31 metres, achieved in June 1999 in Oslo.
Economists are nearly universal in their belief that the dollar is going to collapse; the only debate centers around when and for how long.
The only debate centers around when and for how long.
......@@ -78,11 +78,25 @@ MUST_FORM_A_CYCLE_LINKS: R#* TOt EXx HA SFsic Jr JQ Xca
; The creation of Rw necessitated making B#m a restricted link, to
; prevent the (e) domain, started by Ce, from extending around through
; the Rw link.
; Reverted.
; This breaks parsing of
; How fast a program does he think it is
; I wonder how fast a program he thinks it is
; I wonder how much money you earned
; I wonder how many people you saw
; I wonder how big a department it is
; I wonder how much oil they spilled
; This is the man whose dog I bought
; I wonder which dog he said you chased
; How efficient a program is it
; Meanwhile, I can't find the Ce problem mentioned ... this needs more
; documentation!
RESTRICTED_LINKS:
B#* D##w B#w B#d AFh MVt Xx HL SFsic AFd Bc CX EAh
H HA PFc B#j Wd PF Z B#m
H HA PFc B#j Wd PF Z
; H HA PFc B#j Wd PF Z B#m
; ----------------------------------------------------------------------
......@@ -259,11 +273,12 @@ CONTAINS_ONE_RULES:
; are prohibited from ever occurring. 4.0.batch covers this.
Ixd , ZZZ , "Can't use 'do' with that verb" ,
Oxn , ZZZ , "Bad use of pronoun66" ,
MVh , EExk EAxk D##k , "Incorrect use of that67"
MVh , EExk EAxk D##k , "Incorrect use of that67" ,
; The Rw link necessitated commenting out 68, because we had to make B#m
; a restricted link(see above)
; B#m , D##w H HA , "Bad use of gerund68"
; a restricted link(see above) xxx reverted .. this is needed ...
;
B#m , D##w H HA , "Bad use of gerund68"
CONTAINS_NONE_RULES:
S , Spxi , "Bad n-v agreement69" ,
......
......@@ -16,22 +16,30 @@
% Numbers.
% XXX, we need to add utf8 U+00A0 "no-break space"
% The ":" is included here so we allow "10:30" to be a number
%
% This one matches the original LG rule.
% NUMBERS: /^[0-9][0-9:.,]+$/
% One thing to be careful about here is to not match the period at the
% end of a sentence; for example: "It happened in 1942."
%
% Allows at most two colons
NUMBERS: /^[0-9]:?[0-9]*:?[0-9]*$/
% Allows any number of commas or periods
NUMBERS: /^[0-9,.]*[0-9]$/
% This allows more, e.g. "-5" and "5-10" and "9+/-6.5"
NUMBERS: /^[0-9:.,-]*[0-9]([0-9:.,-]|\+\/-[0-9:.,-])*$/
NUMBERS: /^[0-9.,-]*[0-9](\+\/-[0-9.,-]*[0-9])?$/
% "10(3)" exponent (used in PubMed)
NUMBERS: /^[0-9:.,-]*[0-9][0-9:.,-]*\([0-9:.,-]*[0-9][0-9:.,-]*\)$/
NUMBERS: /^[0-9.,-]*[0-9][0-9.,-]*\([0-9:.,-]*[0-9][0-9.,-]*\)$/
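The concern about "It happened in 1942." can be checked directly. The Python sketch below uses `re` as a stand-in for the dictionary's regex matcher. This is an assumption, although for these fully anchored patterns both engines should accept the same strings. The sketch contrasts the broader `+/-` pattern, which lets any of `:.,-` trail the last digit, with the tighter rules, which must end in a digit or a closing paren. The names `BROAD`, `NUMBER_RULES` and `is_number` are hypothetical.

```python
import re

# Broader rule: anything from [0-9:.,-] may follow the last digit.
BROAD = r'^[0-9:.,-]*[0-9]([0-9:.,-]|\+\/-[0-9:.,-])*$'

# Tighter rules: each must end in a digit (or in ")" for the exponent form).
NUMBER_RULES = [
    r'^[0-9]:?[0-9]*:?[0-9]*$',                                # at most two colons
    r'^[0-9,.]*[0-9]$',                                        # commas / periods
    r'^[0-9.,-]*[0-9](\+\/-[0-9.,-]*[0-9])?$',                 # "-5", "5-10", "9+/-6.5"
    r'^[0-9.,-]*[0-9][0-9.,-]*\([0-9:.,-]*[0-9][0-9.,-]*\)$',  # "10(3)"
]


def is_number(token: str) -> bool:
    return any(re.search(p, token) for p in NUMBER_RULES)


print(bool(re.search(BROAD, "1942.")))  # True: the period is swallowed
print(is_number("1942."))               # False: the period stays separate
print([is_number(t) for t in ["10:30", "1,000.50", "-5", "5-10", "9+/-6.5", "10(3)"]])
```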
% Roman numerals
% The first expr has the potential(?) problem that it matches an empty
% string; this should be fixed.
% string. Thus, the next three rules specify that at least one section
% is non-empty.
ROMAN-NUMERAL-WORDS: /^M*(CM|D?C{0,3}|CD)(XC|L?X{0,3}|XL)(IX|V?I{0,3}|IV)$/
% ROMAN-NUMERAL-WORDS: /^M*(CM|D?C{0,3}|CD){1}(XC|L?X{0,3}|XL)(IX|V?I{0,3}|IV)$/
% ROMAN-NUMERAL-WORDS: /^M*(CM|D?C{0,3}|CD)(XC|L?X{0,3}|XL){1}(IX|V?I{0,3}|IV)$/
% ROMAN-NUMERAL-WORDS: /^M*(CM|D?C{0,3}|CD)(XC|L?X{0,3}|XL)(IX|V?I{0,3}|IV){1}$/
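The empty-match problem noted above is easy to confirm. This Python sketch uses `re` as a stand-in for the dictionary's matcher, which is an assumption. The main pattern accepts the empty string and also the single letter "I". That second match fits the test-file note that "Last dog I saw a great movie" currently parses with the roman numeral I. The sketch also suggests that the commented-out `{1}` variants do not by themselves force a section to be non-empty, because `{1}` merely repeats once a group that can still match nothing.

```python
import re

ROMAN = r'^M*(CM|D?C{0,3}|CD)(XC|L?X{0,3}|XL)(IX|V?I{0,3}|IV)$'
# First commented-out variant: {1} on the hundreds group.
ROMAN_HUNDREDS_1 = r'^M*(CM|D?C{0,3}|CD){1}(XC|L?X{0,3}|XL)(IX|V?I{0,3}|IV)$'


def is_roman(word: str, pattern: str = ROMAN) -> bool:
    return re.search(pattern, word) is not None


for w in ["XIV", "MCMXCIX", "IIII", "I", ""]:
    print(repr(w), is_roman(w))
print(is_roman("", ROMAN_HUNDREDS_1))  # True: {1} does not rule out ""
```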
% Strings of initials. e.g "Dr. J.G.D. Smith lives on Main St."
INITIALS: /^([A-Z]\.)+$/
% Greek letters with numbers
GREEK-LETTER-AND-NUMBER: /^(alpha|beta|gamma|delta|epsilon|zeta|eta|theta|iota|kappa|lambda|mu|nu|xi|omicron|pi|rho|sigma|tau|upsilon|phi|chi|psi|omega)\-?[0-9]+$/
PL-GREEK-LETTER-AND-NUMBER: /^(alpha|beta|gamma|delta|epsilon|zeta|eta|theta|iota|kappa|lambda|mu|nu|xi|omicron|pi|rho|sigma|tau|upsilon|phi|chi|psi|omega)s\-?[0-9]+$/
......@@ -68,7 +76,10 @@ UNITS: /^[a-zA-Z\/.1-]+\.((m|micro)?[lLg]|kg|mol|min|day|h)(-1|\(-1\))$/
% One problem here is a failure to split up the expression ...
% e.g. "2hr" becomes 2 - ND - hr with the ND link. But 2-hr is treated
% as a single word ('It is a 2-hr wait')
NUMBER-AND-UNIT: /^[0-9.,-]+(msec|s|min|hour|h|hr|day|week|wk|month|year|yr|kDa|kilodalton|base|kilobase|base-pair|kD|kd|kDa|bp|nt|kb|mm|mg|cm|nm|g|Hz|ms|kg|ml|mL|km|microm|\%)$/
% NUMBER-AND-UNIT: /^[0-9.,-]+(msec|s|min|hour|h|hr|day|week|wk|month|year|yr|kDa|kilodalton|base|kilobase|base-pair|kD|kd|kDa|bp|nt|kb|mm|mg|cm|nm|g|Hz|ms|kg|ml|mL|km|microm|\%)$/
% Commented out above; it screws up handling of unit suffixes, for
% example: "Zangbert stock fell 30% to $2.50 yesterday."
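A quick check of the commented-out NUMBER-AND-UNIT pattern shows why it interferes with unit suffixes. This sketch uses Python `re` as a stand-in for the dictionary's matcher, which is an assumption. The pattern accepts "30%" as one token, so the "%" is presumably never split off for separate handling.

```python
import re

NUMBER_AND_UNIT = (
    r'^[0-9.,-]+(msec|s|min|hour|h|hr|day|week|wk|month|year|yr|kDa|kilodalton'
    r'|base|kilobase|base-pair|kD|kd|kDa|bp|nt|kb|mm|mg|cm|nm|g|Hz|ms|kg|ml|mL'
    r'|km|microm|\%)$'
)

for token in ["30%", "2hr", "2-hr", "30", "$2.50"]:
    print(token, re.search(NUMBER_AND_UNIT, token) is not None)
```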
% fold-words. Matches NUMBER-fold, where NUMBER can be either numeric
% or a spelled-out number, and the hyphen is optional. Note that for
......@@ -79,24 +90,42 @@ FOLD-WORDS: /^[0-9.,:-]*[0-9]([0-9.,:-]|\([0-9.,:-]*[0-9][0-9.,:-]*\)|\+\/-)*-?f
FOLD-WORDS: /^(one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fifteen|twenty|thirty|fifty|hundred|thousand|million).*fold$/
% Plural proper nouns.
PL-CAPITALIZED-WORDS: /^[[:upper:]].*[^iuoys]s$/
% Make sure that apostrophe-s is split out correctly.
PL-CAPITALIZED-WORDS: /^[[:upper:]].*[^iuoys'’]s$/
% Other proper nouns.
CAPITALIZED-WORDS: /^[[:upper:]]/
% Nouns ending -ation stubbed out in BioLG, stub out here ...
%ATION-WORDS: /..ation$/
ING-WORDS: /..ing$/
% We demand that these end with an alphanumeric, i.e. explicitly
% reject punctuation. We don't want this regex to "swallow" any trailing
% commas, or periods/question-marks at the end of sentences.
% In addition, this must not swallow words ending in 's 'll etc.
% (... any affix, for that matter ...)
CAPITALIZED-WORDS: /^[[:upper:]].*[^'’][^'’]\w$/
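The effect of excluding the apostrophes can be seen on a few tokens. This Python sketch substitutes `[A-Z]` for the POSIX class `[[:upper:]]`, which Python's `re` does not support, so it ignores non-ASCII capitals. It is an illustration, not the library's matcher. Read literally, the new CAPITALIZED-WORDS pattern needs at least four characters, so a short name such as "Bob" does not match it.

```python
import re

# [A-Z] approximates [[:upper:]] (assumption: ASCII capitals only).
PL_CAP_OLD = r"^[A-Z].*[^iuoys]s$"
PL_CAP_NEW = r"^[A-Z].*[^iuoys'’]s$"
CAP_NEW = r"^[A-Z].*[^'’][^'’]\w$"


def matches(pattern: str, word: str) -> bool:
    return re.search(pattern, word) is not None


print(matches(PL_CAP_OLD, "Pete's"), matches(PL_CAP_NEW, "Pete's"))  # True False
print(matches(PL_CAP_NEW, "Smiths"), matches(PL_CAP_NEW, "Paris"))   # True False
for w in ["Smith", "Smith,", "Smith's", "Bob"]:
    print(w, matches(CAP_NEW, w))
```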
% SUFFIX GUESSING
% For all suffix-guessing patterns, we insist that the pattern start
% with an alphanumeric. This is needed to guarantee that the
% prefix-stripping code works correctly, as otherwise, the regex will
% gobble the prefix. So for example: "We left (carrying the dog) and
% Fred followed." Since "(carrying" is not in the dict, we need to be
% sure to not match the leading paren so that it will get stripped.
%
ING-WORDS: /^\w.+ing$/
% plurals or verb-s.
S-WORDS: /[^iuoys]s$/
% Plurals or verb-s. Make sure that apostrophe-s is split out correctly.
% e.g. "The subject's name is John Doe." should be
% +--Ds--+---YS--+--Ds-+
% | | | |
% the subject.n 's.p name.n
S-WORDS: /^\w.+[^iuoys'’]s$/
% Verbs ending -ed.
ED-WORDS: /..ed$/
ED-WORDS: /^\w.+ed$/
% Adverbs ending -ly.
LY-WORDS: /..ly$/
LY-WORDS: /^\w.+ly$/
% Nouns ending -ation stubbed out in BioLG, stub out here ...
%ATION-WORDS: /^\w.+ation$/
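The `^\w.+` prefix can be illustrated with the example from the comment. Python's `re.search` finds `..ing$` inside "(carrying", as an unanchored POSIX match would. How the dictionary applies these patterns is an assumption here. Under the old pattern, the whole token, paren included, would be guessed as an -ing word. The anchored form rejects it, which leaves the paren to be stripped. The S-WORDS change similarly keeps "'s" from being read as a plural ending.

```python
import re

ING_OLD, ING_NEW = r'..ing$', r'^\w.+ing$'
S_OLD, S_NEW = r'[^iuoys]s$', r"^\w.+[^iuoys'’]s$"


def matches(pattern: str, word: str) -> bool:
    return re.search(pattern, word) is not None


print(matches(ING_OLD, "(carrying"), matches(ING_NEW, "(carrying"))  # True False
print(matches(ING_NEW, "carrying"))                                  # True
print(matches(S_OLD, "subject's"), matches(S_NEW, "subject's"))      # True False
print(matches(S_NEW, "subjects"))                                    # True
```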
% Extension by LIPN 11/10/2005
% nouns -- typically seen in (bio-)chemistry texts
......@@ -108,17 +137,17 @@ LY-WORDS: /..ly$/
% glycosylphosphatidylinositol
% iodide, oligodeoxynucleotide
% chronicity, hypochromicity
MC-NOUN-WORDS: /..ase$/
MC-NOUN-WORDS: /..ine?$/
MC-NOUN-WORDS: /..yl$/
MC-NOUN-WORDS: /..ion$/
MC-NOUN-WORDS: /..ose$/
MC-NOUN-WORDS: /..ol$/
MC-NOUN-WORDS: /..ide$/
MC-NOUN-WORDS: /..ity$/
MC-NOUN-WORDS: /^\w.+ase$/
MC-NOUN-WORDS: /^\w.+ine?$/
MC-NOUN-WORDS: /^\w.+yl$/
MC-NOUN-WORDS: /^\w.+ion$/
MC-NOUN-WORDS: /^\w.+ose$/
MC-NOUN-WORDS: /^\w.+ol$/
MC-NOUN-WORDS: /^\w.+ide$/
MC-NOUN-WORDS: /^\w.+ity$/
% replicon, intron
C-NOUN-WORDS: /..o[rn]$/
C-NOUN-WORDS: /^\w.+o[rn]$/
% adjectives
% exogenous, heterologous
......@@ -127,24 +156,24 @@ C-NOUN-WORDS: /..o[rn]$/
% ribosomal, ribsosomal
% nonpermissive, thermosensitive
% inducible, metastable
ADJ-WORDS: /..ous$/
ADJ-WORDS: /..ar$/
ADJ-WORDS: /..ic$/
ADJ-WORDS: /..al$/
ADJ-WORDS: /..ive$/
ADJ-WORDS: /..ble$/
ADJ-WORDS: /^\w.+ous$/
ADJ-WORDS: /^\w.+ar$/
ADJ-WORDS: /^\w.+ic$/
ADJ-WORDS: /^\w.+al$/
ADJ-WORDS: /^\w.+ive$/
ADJ-WORDS: /^\w.+ble$/
% latin (postposed) adjectives
% influenzae, tarentolae
% pentosaceus, luteus, carnosus
LATIN-ADJ-WORDS: /..ae$/
LATIN-ADJ-WORDS: /..us$/ % must appear after -ous in this file
LATIN-ADJ-WORDS: /^\w.+ae$/
LATIN-ADJ-WORDS: /^\w.+us$/ % must appear after -ous in this file
% latin (postposed) adjectives or latin plural noun
% brevis, israelensis
% japonicum, tabacum, xylinum
LATIN-ADJ-P-NOUN-WORDS: /..is?$/
LATIN-ADJ-S-NOUN-WORDS: /..um$/
LATIN-ADJ-P-NOUN-WORDS: /^\w.+is?$/
LATIN-ADJ-S-NOUN-WORDS: /^\w.+um$/
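The note that the -us rule "must appear after -ous in this file" implies that the suffix rules are tried in file order and that the first match wins. This diff does not show that behaviour, so it is an assumption. On that assumption, the ordering can be sketched as follows. The rule subset and the `classify` helper are illustrative only.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (class name, pattern)

RULES: List[Rule] = [
    ("ADJ-WORDS", r'^\w.+ous$'),
    ("ADJ-WORDS", r'^\w.+ive$'),
    ("LATIN-ADJ-WORDS", r'^\w.+ae$'),
    ("LATIN-ADJ-WORDS", r'^\w.+us$'),  # must come after -ous
]


def classify(word: str, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the class of the first rule whose pattern matches, else None."""
    for name, pattern in rules:
        if re.search(pattern, word):
            return name
    return None


print(classify("exogenous"))                         # ADJ-WORDS
print(classify("luteus"))                            # LATIN-ADJ-WORDS
print(classify("exogenous", list(reversed(RULES))))  # LATIN-ADJ-WORDS
```

Reversing the list shows the failure the comment guards against: "exogenous" would then be tagged as a Latin adjective.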
% Hyphenated words. In the original LG morpho-guessing system that
......@@ -158,3 +187,12 @@ HYPHENATED-WORDS: /^[[:alpha:][:digit:],.][[:alpha:][:digit:],.-]*-[[:alpha:][:d
% proteins often end "ase", so we'll assume those things are names.
% removed, too many false positives.
% NAME: /ase$/
% Sequence of punctuation marks. If some mark appears in the affix table
% such as a period, comma, dash or underscore, and there's a sequence of
% these, then treat it as a "fill-in-the-blank" placeholder.
% This matters only for punc. appearing in the affix table, since the
% tokenizer explicitly mangles based on these punctuation marks.
%
% Look for at least four in a row.
UNKNOWN-WORD: /^[.,-]{4}[.,-]*$/
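Here is a short check of the "at least four in a row" rule, with Python's `re` standing in for the dictionary's matcher (an assumption):

```python
import re

UNKNOWN_WORD = r'^[.,-]{4}[.,-]*$'

for token in ["...", "....", "----------", ".,-.", "...a"]:
    print(repr(token), re.search(UNKNOWN_WORD, token) is not None)
```

A three-dot ellipsis does not match, so it is presumably left to the ordinary punctuation handling.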
!verbosity=1
!echo
!limit=1000
!batch
!short=20
!constituents=1
!spell=0
Finally, cover the cut with a clean bandage while it heals.
As the wound heals, inspect for signs of infection including increased pain, redness and fluid around the cut.
To learn more about first aid, contact a hospital or local organization like a Red Cross or Red Crescent society.
King Nebuchadnezzar the Second probably built the gardens about two thousand six hundred years ago.
Also on this list is the Colossus of Rhodes.
Next on our list is the statue of the Greek God Zeus in a temple at Olympia, Greece.
The Temple of Artemis at Ephesus is another ancient wonder of the world.
Number six on our list was also built in what is now Turkey.
It was so famous that all large burial places, or tombs, became known as mausoleums.
The fifth Mughal emperor, Shah Jahan, ordered it built in Agra in sixteen thirty-one.
They were built to honor an ancient king of Egypt, Ramses the Second, and his wife, Nefertari.
In front of the main temple are four huge statues of Ramses the Second.
Nearby is another temple that honors his wife, Nefertari.
It too is beautifully carved out of solid rock.
It, too, is beautifully carved out of solid rock.
Today, we complete the story of the American Revolution against Britain in the late seventeen seventies.
Washington knows that if Howe attacks, the British will be able to go all the way to Philadelphia.
Washington wants to surprise the enemy early in the morning the day after the Christmas holiday, December twenty-sixth.
The crossing takes longer than Washington thought it would.
A few days later, he marches his captured prisoners through the streets of the city of Philadelphia.
And in August, seventeen seventy-seven, General Howe captured Philadelphia.
British general John Burgoyne surrenders at Saratoga, New York, in October 1777, as painted by Percy Moran.
He surrendered to George Washington on October seventeenth, seventeen eighty-one.
The Massachusetts Institute of Technology became an early leader with its OpenCourseWare project, first announced in two thousand one.
These range from physics and linear algebra to anthropology, political science -- even scuba diving.
Still, M.I.T. says the site has had forty million visits by thirty-one million visitors from almost every country.
This followed a money-raising effort linked to the two hundredth anniversary of the United States.
The first one airs on Sunday.
Oscar Peterson played piano and wrote music.
The International Monetary Fund estimated growth at five and two-tenths percent.
In October the I.M.F. cut its estimate for global growth this year by almost half a percentage point, to four and eight-tenths percent.
There are worries of an economic slowdown or possibly a recession in the United States.
Another major issue for two thousand eight is what effect energy prices will have on economic growth.
The violence this week in Kenya has thrown the usually peaceful country into crisis.
On Friday Kenya's main opposition party, the Orange Democratic Movement, called for a new election.
But Kikuyus have long ruled the country, both politically and financially.
Kenyans on the streets feel discriminated against the equal share of the national cake and they are determined to equal the playing field in a democratic manner.
After leaving the army, he worked for Playboy magazine for almost twenty years.
The young man did just that, and the tree was happy.
Shel Silverstein once said that he wanted to go everywhere, look at and listen to everything.
Arkansas is rich in natural resources and has become a favorite place for older people to retire.
Hawaii, far out in the Pacific Ocean, is the Aloha State.
And in Rwanda, Engineers Without Borders is rebuilding areas destroyed during the nineteen ninety-four genocide.
Her first project there was to photograph John Lennon, a member of the British band the Beatles.
Leibovitz had imagined photographing the couple without clothes.
Influenza is a common infection of the nose and throat, and sometimes the lungs.
The virus spreads through the air when an infected person expels air suddenly.
But the flu can kill.
But they did not really know why until recently.
American researchers say they now know why the influenza virus spreads in the winter and not in the summer.
Researchers in New York carried out twenty experiments with guinea pigs to investigate how the virus spreads.
A person who has suffered one kind of flu cannot develop that same kind again.
The World Health Organization holds meetings in which experts discuss what kinds of flu viruses to include in the next vaccine.
People in fifteenth century Italy thought sicknesses were caused by the influence of the stars.
The flu spread in Asia in eighteen twenty-nine, then again in eighteen thirty-six.
Experts say two hundred fifty thousand people died in Europe in that flu pandemic.
The deadliest spread of influenza ever reported involved a flu that first appeared in Spain.
Wild and farm birds often have a flu virus.
Two hundred eight of them died.
Two hundred and eight of them died.
People would become infected with a virus their bodies have never before experienced.
Bob Frydenlund says having a round barn means keeping alive part of the history of American farming.
Last week, we told about structures built hundreds or thousands of years ago.
A crown of white snow covers the top of the mountain most of the year.
Or you could climb the mountain to get an even better look.
In fifteen forty, Spanish explorer Garcia Lopez de Cardenas was searching this desert area for gold.
The sunlight made deep shadows and seemed to change the shape of things every minute.
Often, some areas of the deep canyon appeared bright red.
Next we travel across the Pacific Ocean.
Or doctors may misdiagnose it as asthma or another infection.
It was not until after Lincoln was murdered, however, that the states approved the Thirteenth Amendment to ban slavery everywhere in the country.
Another amendment proposed in the early nineteen hundreds was designed to change the method of electing United States Senators.
But not everyone used it.
If you look -- and smell -- carefully, you will realize that the wrapped forms contain several kilograms of spices such as cinnamon and turmeric.
Mister Neto often creates works like this one that visitors can step into and explore using different senses.
He shared an Emmy Award for outstanding movie made for television.
Some experts say it could slow down service at McDonald's restaurants.
But a year ago he warned that its fast growth had led to what he called the watering down of the Starbucks experience.
Some neighborhoods have a Starbucks on every block or two.
Testers from Consumer Reports thought it tasted better than Starbucks, and it cost less.
Senator Hillary Clinton won the Democratic vote in the first primary state, New Hampshire, on Tuesday.
The New York senator and former first lady hopes to become the country’s first female president.
New Mexico Governor Bill Richardson dropped out of the Democratic race this week.
Every week we tell about a person important in the history of the United States.
No end to the lines of soldiers marching across the land.
They have told of the soldiers’ fear and terror.
How they suffered and died.
And how they sang before and after battle.
A woman lay asleep in her hotel room.
It had just the right words for the great marching music.