Commit 05c5685a authored by Jeremy Bicha

New upstream version 5.5.1

parent 2d63e231
Version 5.5.1 (27 July 2018)
* Fix broken Java bindings build.
* English dict: Fix clause openers with questions.
* English dict: Various misc fixes.
* English dict: Various paraphrasing verbs.
* Bring the SQL-backed dict to production state.
* Convert MSVC build to MSVC15 (Visual Studio 2017).
* Restore the repeatability of the produced linkages.
Version 5.5.0 (29 April 2018) Version 5.5.0 (29 April 2018)
* Fix accidental API breakage that impacts OpenCog. * Fix accidental API breakage that impacts OpenCog.
* Fix memory leak when parsing with null links. * Fix memory leak when parsing with null links.
......
...@@ -27,29 +27,31 @@ EXTRA_DIST = \ ...@@ -27,29 +27,31 @@ EXTRA_DIST = \
MAINTAINERS \ MAINTAINERS \
NEWS \ NEWS \
README.md \ README.md \
debug/README.md \
docker/docker-build.sh \ docker/docker-build.sh \
docker/docker-base/Dockerfile \ docker/docker-base/Dockerfile \
docker/docker-parser/Dockerfile \ docker/docker-parser/Dockerfile \
docker/docker-python/Dockerfile \ docker/docker-python/Dockerfile \
docker/docker-server/Dockerfile \ docker/docker-server/Dockerfile \
m4/varcheckpoint.m4 \ m4/varcheckpoint.m4 \
msvc14/LGlib-features.props \ msvc/LGlib-features.props \
msvc14/LinkGrammarExe.vcxproj \ msvc/LinkGrammarExe.vcxproj \
msvc14/LinkGrammarExe.vcxproj.filters \ msvc/LinkGrammarExe.vcxproj.filters \
msvc14/LinkGrammarJava.vcxproj \ msvc/LinkGrammarJava.vcxproj \
msvc14/LinkGrammarJava.vcxproj.filters \ msvc/LinkGrammarJava.vcxproj.filters \
msvc14/LinkGrammar.sln \ msvc/LinkGrammar.sln \
msvc14/LinkGrammar.vcxproj \ msvc/LinkGrammar.vcxproj \
msvc14/LinkGrammar.vcxproj.filters \ msvc/LinkGrammar.vcxproj.filters \
msvc14/Local.props \ msvc/Local.props \
msvc14/confvar.bat \ msvc/confvar.bat \
msvc14/MSVC-common.props \ msvc/MSVC-common.props \
msvc14/post-build.bat \ msvc/post-build.bat \
msvc14/Python2.vcxproj \ msvc/Python2.vcxproj \
msvc14/Python2.vcxproj.filters \ msvc/Python3.vcxproj \
msvc14/Python3.vcxproj.filters \ msvc/Python2.vcxproj.filters \
msvc14/README.md \ msvc/Python3.vcxproj.filters \
msvc14/make-check.py \ msvc/README.md \
msvc/make-check.py \
mingw/README-Cygwin.md \ mingw/README-Cygwin.md \
mingw/README-MSYS.md \ mingw/README-MSYS.md \
mingw/README-MSYS2.md \ mingw/README-MSYS2.md \
......
...@@ -448,29 +448,31 @@ EXTRA_DIST = \ ...@@ -448,29 +448,31 @@ EXTRA_DIST = \
MAINTAINERS \ MAINTAINERS \
NEWS \ NEWS \
README.md \ README.md \
debug/README.md \
docker/docker-build.sh \ docker/docker-build.sh \
docker/docker-base/Dockerfile \ docker/docker-base/Dockerfile \
docker/docker-parser/Dockerfile \ docker/docker-parser/Dockerfile \
docker/docker-python/Dockerfile \ docker/docker-python/Dockerfile \
docker/docker-server/Dockerfile \ docker/docker-server/Dockerfile \
m4/varcheckpoint.m4 \ m4/varcheckpoint.m4 \
msvc14/LGlib-features.props \ msvc/LGlib-features.props \
msvc14/LinkGrammarExe.vcxproj \ msvc/LinkGrammarExe.vcxproj \
msvc14/LinkGrammarExe.vcxproj.filters \ msvc/LinkGrammarExe.vcxproj.filters \
msvc14/LinkGrammarJava.vcxproj \ msvc/LinkGrammarJava.vcxproj \
msvc14/LinkGrammarJava.vcxproj.filters \ msvc/LinkGrammarJava.vcxproj.filters \
msvc14/LinkGrammar.sln \ msvc/LinkGrammar.sln \
msvc14/LinkGrammar.vcxproj \ msvc/LinkGrammar.vcxproj \
msvc14/LinkGrammar.vcxproj.filters \ msvc/LinkGrammar.vcxproj.filters \
msvc14/Local.props \ msvc/Local.props \
msvc14/confvar.bat \ msvc/confvar.bat \
msvc14/MSVC-common.props \ msvc/MSVC-common.props \
msvc14/post-build.bat \ msvc/post-build.bat \
msvc14/Python2.vcxproj \ msvc/Python2.vcxproj \
msvc14/Python2.vcxproj.filters \ msvc/Python3.vcxproj \
msvc14/Python3.vcxproj.filters \ msvc/Python2.vcxproj.filters \
msvc14/README.md \ msvc/Python3.vcxproj.filters \
msvc14/make-check.py \ msvc/README.md \
msvc/make-check.py \
mingw/README-Cygwin.md \ mingw/README-Cygwin.md \
mingw/README-MSYS.md \ mingw/README-MSYS.md \
mingw/README-MSYS2.md \ mingw/README-MSYS2.md \
......
[ANNOUNCE] Link-Grammar Version 5.5.0 is now available.
Version 5.5.0 of link-grammar has been released. It contains several
important bug-fixes for opencog users.
* The previous version accidentally broke the opencog API. This version
fixes it.
* Linkages generated by the "ANY" random parser were not actually being
randomized. This is now fixed. (Bug reported by Andres.)
* Poorly-formatted dictionaries no longer report errors. (Bug reported
by Alexei/Anton)
The complete list of changes is:
* Fix accidental API breakage that impacts OpenCog.
* Fix memory leak when parsing with null links.
* Python bindings: Add an optional parse-option argument to parse().
* Add an extended version API and use it in "link-parser --version".
* Fix spurious errors if the last dict line is a comment.
* Fix garbage report if EOF encountered in a quoted dict word.
* Fix garbage report if whitespace encountered in a quoted dict word.
* Add a per-command help in link-parser.
* Add a command line completion in link-parser.
* Enable build of word-graph printing support by default.
* Add idiom lookup in link-parser's dict lookup command (!!idiom_here).
* Improve handling of quoted words (e.g. single words in "scare
quotes").
* Fix random selection of linkages so that it's actually random.
You can download link-grammar from
http://www.abisource.com/downloads/link-grammar/current/
The website is here:
https://www.abisource.com/projects/link-grammar/
WHAT IS LINK GRAMMAR?
The Link Grammar Parser is a syntactic parser of English (and other
languages as well), based on Link Grammar, an original theory of English
syntax. Given a sentence, the system assigns to it a syntactic structure,
which consists of a set of labelled links connecting pairs of words.
=================================================================
=================================================================
=================================================================
[ANNOUNCE] Link-Grammar Version 5.4.4 is now available. [ANNOUNCE] Link-Grammar Version 5.4.4 is now available.
I'm pleased to announce that version 5.4.4 is now available. I don't I'm pleased to announce that version 5.4.4 is now available. I don't
......
Link Grammar Parser Link Grammar Parser
=================== ===================
***Version 5.5.0*** ***Version 5.5.1***
The Link Grammar Parser implements the Sleator/Temperley/Lafferty The Link Grammar Parser implements the Sleator/Temperley/Lafferty
theory of natural language parsing. This version of the parser is theory of natural language parsing. This version of the parser is
...@@ -169,7 +169,7 @@ Contents ...@@ -169,7 +169,7 @@ Contents
| configure | The GNU configuration script | | configure | The GNU configuration script |
| autogen.sh | Developer's configure maintenance tool | | autogen.sh | Developer's configure maintenance tool |
| debug/ | Information for debugging the library | | debug/ | Information for debugging the library |
| msvc14/ | Microsoft Visual-C project files | | msvc/ | Microsoft Visual-C project files |
| mingw/ | Information on using MinGW under MSYS or Cygwin | | mingw/ | Information on using MinGW under MSYS or Cygwin |
UNPACKING and signature verification UNPACKING and signature verification
...@@ -186,7 +186,7 @@ corruption of the dataset during download, and to help ensure that ...@@ -186,7 +186,7 @@ corruption of the dataset during download, and to help ensure that
no malicious changes were made to the code internals by third no malicious changes were made to the code internals by third
parties. The signatures can be checked with the gpg command: parties. The signatures can be checked with the gpg command:
`gpg --verify link-grammar-5.5.0.tar.gz.asc` `gpg --verify link-grammar-5.5.1.tar.gz.asc`
which should generate output identical to (except for the date): which should generate output identical to (except for the date):
``` ```
...@@ -201,7 +201,7 @@ verify the check-sums, issue `md5sum -c MD5SUM` at the command line. ...@@ -201,7 +201,7 @@ verify the check-sums, issue `md5sum -c MD5SUM` at the command line.
Tags in `git` can be verified by performing the following: Tags in `git` can be verified by performing the following:
``` ```
gpg --recv-keys --keyserver keyserver.ubuntu.com EB6AA534E0C0651C gpg --recv-keys --keyserver keyserver.ubuntu.com EB6AA534E0C0651C
git tag -v link-grammar-5.5.0 git tag -v link-grammar-5.5.1
``` ```
...@@ -477,8 +477,8 @@ See [mingw/README-Cygwin.md](mingw/README-Cygwin.md). ...@@ -477,8 +477,8 @@ See [mingw/README-Cygwin.md](mingw/README-Cygwin.md).
BUILDING and RUNNING on Windows (MSVC) BUILDING and RUNNING on Windows (MSVC)
-------------------------------------- --------------------------------------
Microsoft Visual C/C++ project files can be found in the msvc14 directory. Microsoft Visual C/C++ project files can be found in the `msvc` directory.
For directions see the [README.md](msvc14/README.md) file there. For directions see the [README.md](msvc/README.md) file there.
RUNNING the program RUNNING the program
------------------- -------------------
...@@ -956,6 +956,9 @@ but this `Js` link crosses (clashes with - marked by xxx) the link ...@@ -956,6 +956,9 @@ but this `Js` link crosses (clashes with - marked by xxx) the link
to the conjunction. These two cases suggest that one should to the conjunction. These two cases suggest that one should
allow most links to cross over the down-links to conjunctions. allow most links to cross over the down-links to conjunctions.
This is currently worked around by splitting the Js link into two:
a Jj part and a Jk part; the two are used together to cross over
the conjunction.
Type Theory Type Theory
...@@ -1407,6 +1410,10 @@ http://www.corpus.bham.ac.uk/publications/index.shtml ...@@ -1407,6 +1410,10 @@ http://www.corpus.bham.ac.uk/publications/index.shtml
edited by Elena Tognini-Bonelli, volume 4), 2000<br> edited by Elena Tognini-Bonelli, volume 4), 2000<br>
[Book review](http://www.aclweb.org/anthology/J01-2013). [Book review](http://www.aclweb.org/anthology/J01-2013).
“The Molecular Level of Lexical Semantics”, EA Nida, (1997)
International Journal of Lexicography, 10(4): 265–274.
[Online](https://www.academia.edu/36534355/The_Molecular_Level_of_Lexical_Semantics_by_EA_Nida)
### "holes" in collocations (aka "set phrases" or "phrasemes"): ### "holes" in collocations (aka "set phrases" or "phrasemes"):
The link-grammar provides several mechanisms to support The link-grammar provides several mechanisms to support
circumpositions or even more complicated multi-word structures. circumpositions or even more complicated multi-word structures.
...@@ -1534,6 +1541,11 @@ http://www.phon.ucl.ac.uk/home/dick/enc2010/articles/relative-clause.htm ...@@ -1534,6 +1541,11 @@ http://www.phon.ucl.ac.uk/home/dick/enc2010/articles/relative-clause.htm
mutual information content, they can dominate the syntactic mutual information content, they can dominate the syntactic
structure of a sentence. structure of a sentence.
### Preposition linking:
The current parse of "he wanted to look at and listen to everything."
is inadequate: the link to "everything" needs to connect to "and", so
that "listen to" and "look at" are treated as atomic verb phrases.
### Lexical functions: ### Lexical functions:
MTT suggests that perhaps the correct way to understand the contents MTT suggests that perhaps the correct way to understand the contents
of the post-processing rules is as an implementation of 'lexical of the post-processing rules is as an implementation of 'lexical
...@@ -1574,11 +1586,13 @@ analysis. To quote Wikipedia: ...@@ -1574,11 +1586,13 @@ analysis. To quote Wikipedia:
> "tower". > "tower".
### Morphology printing: ### Morphology printing:
Instead of hard-coding LL, declare which links are morpho links Instead of hard-coding LL, declare which links are morpho links
in the dict. in the dict.
### UTF-8 cleanup: ### UTF-8 cleanup:
Hmm. Is this really needed? UTF-8 seems to work well, now. So maybe
leave it alone.
Replace the mbrtowc code with proper language support; it seems Replace the mbrtowc code with proper language support; it seems
that the correct solution is to use [ICU](http://site.icu-project.org/) that the correct solution is to use [ICU](http://site.icu-project.org/)
* ICU pros: runs on windows. * ICU pros: runs on windows.
......
...@@ -45,6 +45,11 @@ install-data-hook: ...@@ -45,6 +45,11 @@ install-data-hook:
uninstall-hook: uninstall-hook:
-rm ${DESTDIR}${javadir}/linkgrammar.jar -rm ${DESTDIR}${javadir}/linkgrammar.jar
dist-hook:
if HAVE_ANT
: # Validate that the bindings can actually be created
zipinfo $(java_DATA) org/linkgrammar/LinkGrammar.class >/dev/null
endif
EXTRA_DIST = \ EXTRA_DIST = \
README \ README \
......
...@@ -459,6 +459,9 @@ distdir: $(DISTFILES) ...@@ -459,6 +459,9 @@ distdir: $(DISTFILES)
|| exit 1; \ || exit 1; \
fi; \ fi; \
done done
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$(top_distdir)" distdir="$(distdir)" \
dist-hook
check-am: all-am check-am: all-am
check: check-am check: check-am
all-am: Makefile $(DATA) all-am: Makefile $(DATA)
...@@ -496,8 +499,8 @@ clean-generic: ...@@ -496,8 +499,8 @@ clean-generic:
maintainer-clean-generic: maintainer-clean-generic:
@echo "This command is intended for maintainers to use" @echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild." @echo "it deletes files that may require special tools to rebuild."
@HAVE_ANT_FALSE@clean-local:
@HAVE_ANT_FALSE@distclean-local: @HAVE_ANT_FALSE@distclean-local:
@HAVE_ANT_FALSE@clean-local:
clean: clean-am clean: clean-am
clean-am: clean-generic clean-libtool clean-local mostlyclean-am clean-am: clean-generic clean-libtool clean-local mostlyclean-am
...@@ -569,17 +572,17 @@ uninstall-am: uninstall-javaDATA ...@@ -569,17 +572,17 @@ uninstall-am: uninstall-javaDATA
.MAKE: install-am install-data-am install-strip uninstall-am .MAKE: install-am install-data-am install-strip uninstall-am
.PHONY: all all-am check check-am clean clean-generic clean-libtool \ .PHONY: all all-am check check-am clean clean-generic clean-libtool \
clean-local cscopelist-am ctags-am distclean distclean-generic \ clean-local cscopelist-am ctags-am dist-hook distclean \
distclean-libtool distclean-local distdir dvi dvi-am html \ distclean-generic distclean-libtool distclean-local distdir \
html-am info info-am install install-am install-data \ dvi dvi-am html html-am info info-am install install-am \
install-data-am install-data-hook install-dvi install-dvi-am \ install-data install-data-am install-data-hook install-dvi \
install-exec install-exec-am install-html install-html-am \ install-dvi-am install-exec install-exec-am install-html \
install-info install-info-am install-javaDATA install-man \ install-html-am install-info install-info-am install-javaDATA \
install-pdf install-pdf-am install-ps install-ps-am \ install-man install-pdf install-pdf-am install-ps \
install-strip installcheck installcheck-am installdirs \ install-ps-am install-strip installcheck installcheck-am \
maintainer-clean maintainer-clean-generic mostlyclean \ installdirs maintainer-clean maintainer-clean-generic \
mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ mostlyclean mostlyclean-generic mostlyclean-libtool pdf pdf-am \
tags-am uninstall uninstall-am uninstall-hook \ ps ps-am tags-am uninstall uninstall-am uninstall-hook \
uninstall-javaDATA uninstall-javaDATA
.PRECIOUS: Makefile .PRECIOUS: Makefile
...@@ -606,6 +609,10 @@ install-data-hook: ...@@ -606,6 +609,10 @@ install-data-hook:
uninstall-hook: uninstall-hook:
-rm ${DESTDIR}${javadir}/linkgrammar.jar -rm ${DESTDIR}${javadir}/linkgrammar.jar
dist-hook:
@HAVE_ANT_TRUE@ : # Validate that the bindings can actually be created
@HAVE_ANT_TRUE@ zipinfo $(java_DATA) org/linkgrammar/LinkGrammar.class >/dev/null
# Tell versions [3.59,3.63) of GNU make to not export all variables. # Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded. # Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT: .NOEXPORT:
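Reviewer's note: the new `dist-hook` rule above shells out to `zipinfo` to confirm that the jar contains `org/linkgrammar/LinkGrammar.class` before a distribution is made. The same check can be sketched in Python with the standard `zipfile` module. This is only an illustration and is not part of the commit; the function name `jar_contains` and the jar path in the usage note are hypothetical.

```python
import zipfile


def jar_contains(jar_path, member="org/linkgrammar/LinkGrammar.class"):
    """Return True if the archive at jar_path has an entry named member.

    Hypothetical helper: a jar is a zip archive, so listing its entries is
    enough to mimic `zipinfo JAR MEMBER >/dev/null`.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return member in jar.namelist()
```

For example, `jar_contains("linkgrammar.jar")` would return `False` for a jar that was built without the compiled class, which is the kind of broken Java bindings build the hook is meant to catch.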
...@@ -21,8 +21,13 @@ EXTRA_DIST = \ ...@@ -21,8 +21,13 @@ EXTRA_DIST = \
README.md \ README.md \
example.py \ example.py \
sentence-check.py \ sentence-check.py \
parses-demo-sql.txt \
parses-en.txt \ parses-en.txt \
parses-lt.txt \ parses-lt.txt \
parses-pos-en.txt \
parses-pos-he.txt \
parses-pos-ru.txt \
parses-pos-spell-en.txt \
parses-quotes-en.txt \ parses-quotes-en.txt \
parses-sat-en.txt \ parses-sat-en.txt \
lg_testutils.py \ lg_testutils.py \
......
...@@ -531,8 +531,13 @@ EXTRA_DIST = \ ...@@ -531,8 +531,13 @@ EXTRA_DIST = \
README.md \ README.md \
example.py \ example.py \
sentence-check.py \ sentence-check.py \
parses-demo-sql.txt \
parses-en.txt \ parses-en.txt \
parses-lt.txt \ parses-lt.txt \
parses-pos-en.txt \
parses-pos-he.txt \
parses-pos-ru.txt \
parses-pos-spell-en.txt \
parses-quotes-en.txt \ parses-quotes-en.txt \
parses-sat-en.txt \ parses-sat-en.txt \
lg_testutils.py \ lg_testutils.py \
......
...@@ -69,8 +69,8 @@ configured with the SAT solver (this is currently the case for native ...@@ -69,8 +69,8 @@ configured with the SAT solver (this is currently the case for native
Windows builds). Windows builds).
The test procedure is outlined below. For native Windows/MinGW, see The test procedure is outlined below. For native Windows/MinGW, see
the `msvc14/README.md` file: the `msvc/README.md` file:
[Running Python programs in Windows](/msvc14/README.md#running-python-programs). [Running Python programs in Windows](/msvc/README.md#running-python-programs).
### Testing the build directory ### Testing the build directory
The following is assumed: The following is assumed:
......
...@@ -7,7 +7,7 @@ def add_eqcost_linkage_order(original_class): ...@@ -7,7 +7,7 @@ def add_eqcost_linkage_order(original_class):
be in a deterministic order. be in a deterministic order.
Usage: lg_testutils.add_eqcost_linkage_order(Sentence) Usage: lg_testutils.add_eqcost_linkage_order(Sentence)
""" """
class eqcost_soretd_parse(original_class.sentence_parse): class eqcost_sorted_parse(original_class.sentence_parse):
""" """
Sort equal-cost linkages according to the alphabetic order of their Sort equal-cost linkages according to the alphabetic order of their
diagram string, on demand. We need it because the order of linkages diagram string, on demand. We need it because the order of linkages
...@@ -73,6 +73,6 @@ def add_eqcost_linkage_order(original_class): ...@@ -73,6 +73,6 @@ def add_eqcost_linkage_order(original_class):
# parse() has an optional single argument for parse options. If it is not given, # parse() has an optional single argument for parse options. If it is not given,
# call original_parse() also without arguments in order to test it that way. # call original_parse() also without arguments in order to test it that way.
linkages = self.original_parse() if parse_options is None else self.original_parse(parse_options) linkages = self.original_parse() if parse_options is None else self.original_parse(parse_options)
return eqcost_soretd_parse(linkages) return eqcost_sorted_parse(linkages)
original_class.parse = parse original_class.parse = parse
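Reviewer's note: the docstring above says that equal-cost linkages are sorted by the alphabetic order of their diagram string so that test output is deterministic. A minimal sketch of that ordering idea follows. It assumes plain `(cost, diagram)` tuples that are already ordered by cost, instead of real linkage objects, and the name `sort_equal_cost` is hypothetical; it does not reproduce the on-demand behaviour of `eqcost_sorted_parse`.

```python
from itertools import groupby


def sort_equal_cost(linkages):
    """Sort each run of equal-cost entries by diagram string.

    linkages: list of (cost, diagram) tuples, already ordered by cost.
    The relative order of different costs is left untouched.
    """
    result = []
    for _, run in groupby(linkages, key=lambda item: item[0]):
        # Only entries sharing the same cost are reordered among themselves.
        result.extend(sorted(run, key=lambda item: item[1]))
    return result
```

With this, `sort_equal_cost([(0.0, "b"), (0.0, "a"), (1.0, "c")])` yields `[(0.0, "a"), (0.0, "b"), (1.0, "c")]`, regardless of the order in which the two zero-cost entries arrived.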
% This file contains test sentences to verify that the SQL dict
% works. It contains more than one sentence to check that memory
% is freed properly (e.g. by using LSAN).
Ithis is a test
O
O +------WV------+--Osm--+
O +---Wd---+-Ss*b+ +-Ds-+
O | | | | |
OLEFT-WALL this.p is.v a test.n
O
Ithis is another test
O
O +------WV------+-----Osm-----+
O +---Wd---+-Ss*b+ +---Ds--+
O | | | | |
OLEFT-WALL this.p is.v another test.n
O
...@@ -40,8 +40,8 @@ C ...@@ -40,8 +40,8 @@ C
IY'gotta do it this way IY'gotta do it this way
O O
O +---->WV---->+ +------MVa-----+ O +------->WV------>+ +------MVa-----+
O +->Wd--+-Sp*i+--I*t--+Osm+ +-Dsu-+ O +-->Wd---+--Sp*i--+--I*t--+Osm+ +-Dsu-+
O | | | | | | | O | | | | | | |
OLEFT-WALL y' gotta.v-d do.v it this.d way.n OLEFT-WALL y'.#you gotta.v-d do.v it this.d way.n
O O
% This file contains test sentences, and the expected positions (start, end)
% of their words. The first P line is char position, and the second one is
% byte position.
Ithis is a test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(10, 14) RIGHT-WALL(14, 14)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(10, 14) RIGHT-WALL(14, 14)
P
% Middle extra whitespace.
Ithis is a test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) a(8, 9) test.n(11, 15) RIGHT-WALL(15, 15)
P
% Initial whitespace.
I this is a test
PLEFT-WALL(0, 0) this.p(1, 5) is.v(6, 8) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(1, 5) is.v(6, 8) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
P
% Various kinds of input splits.
Iit's a test.
PLEFT-WALL(0, 0) it(0, 2) 's.v(2, 4) a(5, 6) test.n(7, 11) .(11, 12) RIGHT-WALL(12, 12)
PLEFT-WALL(0, 0) it(0, 2) 's.v(2, 4) a(5, 6) test.n(7, 11) .(11, 12) RIGHT-WALL(12, 12)
P
Ithis is--a test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) --.r(7, 9) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) --.r(7, 9) a(9, 10) test.n(11, 15) RIGHT-WALL(15, 15)
P
% Different byte and char positions for non-ASCII.
II love going to the café.
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) the(16, 19) café.n(20, 24) .(24, 25) RIGHT-WALL(25, 25)
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) the(16, 19) café.n(20, 25) .(25, 26) RIGHT-WALL(26, 26)
P
% Test linkages w/null-linked words
-max_null_count=1
IThis is a the test
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) [a](8, 9) the(10, 13) test.n(14, 18) RIGHT-WALL(18, 18)
PLEFT-WALL(0, 0) this.p(0, 4) is.v(5, 7) [a](8, 9) the(10, 13) test.n(14, 18) RIGHT-WALL(18, 18)
P
% Here "As" gets split by the tokenizer into 2 alternatives:
% 1: As
% 2: A.u s.u
% There is no full linkage, and in order to show the result in a reasonable
% way (this is most beneficial if there are more than two such alternatives)
% the library combines back the failed splits and should recalculate
% the word position of these combined splits correctly.
IAs no linkage
PLEFT-WALL(0, 0) [as](0, 2) no.ij(3, 5) linkage.n-u(6, 13) RIGHT-WALL(13, 13)
PLEFT-WALL(0, 0) [as](0, 2) no.ij(3, 5) linkage.n-u(6, 13) RIGHT-WALL(13, 13)
P
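Reviewer's note: each test case in the file above carries two `P` lines, the first with character positions and the second with byte positions, and the two differ only when the sentence contains non-ASCII text. The sketch below shows how both kinds of span can be derived for a UTF-8 sentence. It is an illustration only: `word_positions` is a hypothetical helper, it assumes the words appear in the sentence verbatim and in order, and it does not call the link-grammar library.

```python
def word_positions(sentence, words):
    """Return [((char_start, char_end), (byte_start, byte_end)), ...].

    Words are located left to right; byte offsets are measured in the
    UTF-8 encoding of the sentence.
    """
    spans = []
    cursor = 0
    for word in words:
        start = sentence.index(word, cursor)
        end = start + len(word)
        # The byte offset of a char offset is the encoded length of the prefix.
        byte_start = len(sentence[:start].encode("utf-8"))
        byte_end = len(sentence[:end].encode("utf-8"))
        spans.append(((start, end), (byte_start, byte_end)))
        cursor = end
    return spans


sentence = "I love going to the caf\u00e9."
words = ["I", "love", "going", "to", "the", "caf\u00e9", "."]
for word, (chars, bytes_) in zip(words, word_positions(sentence, words)):
    print(word, chars, bytes_)
```

For the word "café" this gives the character span (20, 24) and the byte span (20, 25), matching the expected values in the non-ASCII test case, because "é" occupies two bytes in UTF-8.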
% For the file purpose and format see parses-pos-en.txt.
Iהכלב רץ מהחצר
PLEFT-WALL(0, 0) (0, 1) הכלב(0, 4) רץ(5, 7) (8, 9) (9, 10) מהחצר(8, 13)
PLEFT-WALL(0, 0) (0, 2) הכלב(0, 8) רץ(9, 13) (14, 16) (16, 18) מהחצר(14, 24)
P
-display_morphology = True
Iהכלב רץ מהחצר
PLEFT-WALL(0, 0) ה=(0, 1) כלב(1, 4) רץ(5, 7) מ=(8, 9) ה=(9, 10) חצר(10, 13)
PLEFT-WALL(0, 0) ה=(0, 2) כלב(2, 8) רץ(9, 13) מ=(14, 16) ה=(16, 18) חצר(18, 24)
P
% For the file purpose and format see parses-pos-en.txt.
Iэто тести
PLEFT-WALL(0, 0) это.msi(0, 3) тести.nlmpi(4, 9) RIGHT-WALL(9, 9)
PLEFT-WALL(0, 0) это.msi(0, 6) тести.nlmpi(7, 17) RIGHT-WALL(17, 17)
P
-display_morphology = True
Iэто тести
PLEFT-WALL(0, 0) это.msi(0, 3) тест.=(4, 8) =и.nlmpi(8, 9) RIGHT-WALL(9, 9)
PLEFT-WALL(0, 0) это.msi(0, 6) тест.=(7, 15) =и.nlmpi(15, 17) RIGHT-WALL(17, 17)
P
% This file contains test sentences, and the expected positions (start, end)
% of their words. The first P line is char position, and the second one is
% byte position.
% Validate that the linkage words that are a result of a spell guess have
% the position of the original sentence words.
% The word "seasand" gets broken in 2 possibilities.
Ithe seasand lakes are hot
PLEFT-WALL(0, 0) the(0, 3) seas[&].n(4, 8) and[&].j-n(8, 11) lakes.n(12, 17) are.v(18, 21) hot.a(22, 25) RIGHT-WALL(25, 25)
PLEFT-WALL(0, 0) the(0, 3) seas[&].n(4, 8) and[&].j-n(8, 11) lakes.n(12, 17) are.v(18, 21) hot.a(22, 25) RIGHT-WALL(25, 25)
P
N
PLEFT-WALL(0, 0) the(0, 3) sea[&].n-u(4, 7) sand[&].n-u(7, 11) lakes.n(12, 17) are.v(18, 21) hot.a(22, 25) RIGHT-WALL(25, 25)
PLEFT-WALL(0, 0) the(0, 3) sea[&].n-u(4, 7) sand[&].n-u(7, 11) lakes.n(12, 17) are.v(18, 21) hot.a(22, 25) RIGHT-WALL(25, 25)
P
% The following misspelled word has only one possible spell guess.
% (This is needed because we check here only the first linkage.)
II love going to a zooo.
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) a(16, 17) zoo[~].n(18, 22) .(22, 23) RIGHT-WALL(23, 23)
PLEFT-WALL(0, 0) I.p(0, 1) love.v(2, 6) going.v(7, 12) to.r(13, 15) a(16, 17) zoo[~].n(18, 22) .(22, 23) RIGHT-WALL(23, 23)
P
% This file contains test cases for the SAT solver, % This file contains test cases for the SAT solver,
% to validate that it works. % to validate that it works.
% Since the SAT solver doesn't order (for now) its solutions % Since the SAT solver doesn't order (for now) its solutions
% according to cost, this is just the solution it omits first % according to cost, this is just the solution it emits first
% for the given sentence. % for the given sentence.
Ithis is a test Ithis is a test
......
...@@ -18,15 +18,22 @@ Sentence has 1 unlinked word: ...@@ -18,15 +18,22 @@ Sentence has 1 unlinked word:
3: LEFT-WALL this.p is.v [a] the test.n of bfgiuing[!].g and.j-n xxxvfrg[?].a RIGHT-WALL 3: LEFT-WALL this.p is.v [a] the test.n of bfgiuing[!].g and.j-n xxxvfrg[?].a RIGHT-WALL
4: LEFT-WALL this.p is.v a [the] test.n of bfgiuing[!].g and.j-n xxxvfrg[?].a RIGHT-WALL 4: LEFT-WALL this.p is.v a [the] test.n of bfgiuing[!].g and.j-n xxxvfrg[?].a RIGHT-WALL
""" """
from __future__ import print_function from __future__ import print_function
import sys import sys
from sys import stdin
import re import re
import itertools
import argparse import argparse
import readline
from linkgrammar import (Sentence, ParseOptions, Dictionary, from linkgrammar import (Sentence, ParseOptions, Dictionary,
LG_Error, LG_TimerExhausted, Clinkgrammar as clg) LG_Error, LG_TimerExhausted, Clinkgrammar as clg)
def is_python2():
return sys.version_info[:1] == (2,)
get_input = raw_input if is_python2() else input
def nsuffix(q): def nsuffix(q):
return '' if q == 1 else 's' return '' if q == 1 else 's'
...@@ -38,7 +45,13 @@ class Formatter(argparse.HelpFormatter): ...@@ -38,7 +45,13 @@ class Formatter(argparse.HelpFormatter):
#-----------------------------------------------------------------------------# #-----------------------------------------------------------------------------#
is_stdin_atty = sys.stdin.isatty()
PROMPT = "sentence-check: " if is_stdin_atty else ""
DISPLAY_GUESSES = True # Display regex and POS guesses DISPLAY_GUESSES = True # Display regex and POS guesses
BATCH_LABELS = '*: '
print ("Version:", clg.linkgrammar_get_version())
args = argparse.ArgumentParser(formatter_class=Formatter) args = argparse.ArgumentParser(formatter_class=Formatter)
args.add_argument('lang', nargs='?', default='en', args.add_argument('lang', nargs='?', default='en',
...@@ -50,6 +63,8 @@ args.add_argument("-p", "--position", action="store_true", ...@@ -50,6 +63,8 @@ args.add_argument("-p", "--position", action="store_true",