0.34 - 2024-04-26
[Ko van der Sloot]
* require C++17
* require latest ticcutils and libfolia
* some code refactoring
* improved GitHub CI
* removed dependency on libtar

0.33 - 2024-04-26
[Ko van der Sloot]
* --KANON was not always honored
* after a change in ucto, the tokenizer needs to be reset more often
* in the lemmatizer, fuzzy matching of CGN tags is implemented

0.32 - 2023-12-05
[Ko van der Sloot]
* ignore but warn on empty derivations, a lame fix for
  https://github.com/LanguageMachines/frog/issues/103

0.31 - 2023-10-21
[Ko van der Sloot]
* use ticcutils > 0.34. NFC normalization is standard now
* use Tokenizer::config_prefix() instead of magic string 'tokconfig-'
* code cleanup and quality improvement (cppcheck is very useful)
[Maarten van Gompel]
* added frog demo gif

0.30 - 2023-05-08
[Ko van der Sloot]
* finally fixed a major memory leak in MBMA which bothered me for months
* also plugged some minor leaks

0.29 - 2023-05-05
[Ko van der Sloot]
* added a fix for https://github.com/LanguageMachines/frog/issues/100
  (where Frog created invalid FoLiA in a corner case)
* improved api_test
* small code refactorings
* require libfolia >= 2.15, for correct working of word correction
* improved MWU code. Using Unicode strings and detecting MWUs with a
  starting capital.
[Maarten van Gompel]
* .gitignore: added build dir

0.28 - 2023-02-28
[Ko van der Sloot]
* We no longer accept FoLiA paragraphs with both Words and Sentences.
[Maarten van Gompel]
* Software metadata fix

0.27.2 - 2023-02-23
[Maarten van Gompel]
* Minor software metadata fix only, no functional changes

0.27.1 - 2023-02-22
[Maarten van Gompel]
* Software metadata update only, no functional changes

0.27 - 2023-02-22
[Ko van der Sloot]
* Major Release.
* Internally we always perform a 'deep' morphological analysis.
* This information is used for XML and JSON output.
* For the 'classic' Tabbed output, we maintain backward compatibility.
* You need to specify '--deep-morph' to get the deep analysis in the output.
* You may also specify '--compounds' to get an extra column with compound
  information.

Other changes:
* C++ code quality
* adapted to more recent Timbl implementations (Unicode awareness)
* Tokenizer:
  - Better handling of --languages option.
  - 'und' is now also acceptable as a "language"
  - Better debugging possibility
* Mbma: Too many alternatives with Inverted Verbs were generated.
  As the Tagger doesn't help us directly, we filter on the person of the
  next word, and only return V/te2I when the next word is 2nd person

0.26 - 2023-01-02
[Ko van der Sloot]
* fix for https://github.com/LanguageMachines/frog/issues/96
* code improvements, readability and fixing CppCheck warnings
* needs recent ticcutils (>=0.30)
* needs newest Timbl (6.8) for more Unicode awareness
* updated GitHub action
[Maarten van Gompel]
* added MAINTAINERS file
* updated codemeta.json

0.25 - 2022-07-22
[Maarten van Gompel]
* updated metadata (codemeta.json) following new (proposed) CLARIAH
  requirements (CLARIAH/clariah-plus#38)
* added builds-deps.sh for automatically building and installing dependencies
* added Dockerfile and instructions
* added support for user-based configuration dirs ($XDG_CONFIG_HOME/frog),
  takes precedence over global data dirs
[Ko vd Sloot]
* updated Doxygen config file

0.24 - 2021-12-15
[Ko vd Sloot]
* start using the newest UTF8 aware Timbl and Mbt and Ucto
* use NFC normalized UnicodeString more generally internally
* added a fix in MBMA coding, to get better reproducible results on
  different OS/Compiler combinations
* lots of small refactoring
* bumped library version, because of some API changes
[Maarten van Gompel]
* merged a patch suggested by Helmut Grohne <helmut@subdivi.de>
  - configure.ac: Bug#993123: frog FTCBFS: hard codes the build
    architecture pkg-config

    Source: frog
    Version: 0.20-2
    Tags: patch upstream
    User: debian-cross@lists.debian.org
    Usertags: ftcbfs

    frog fails to cross build from
    source, because configure.ac hard codes the build architecture
    pkg-config in one place (after correctly detecting the host
    architecture one). Simply using the correct substitution variable
    makes frog cross buildable. Please consider applying the attached patch.

    Helmut

    Signed-off-by: Maarten van Gompel <proycon@anaproy.nl>

0.23 - 2021
* requires libfolia 2.9
* replaced TravisCI by GitHub actions
* fixed https://github.com/proycon/python-frog/issues/20
* some refactoring to avoid unneeded creation of files

0.22 - 2020-11-17
[Ko vd Sloot]
* start using the tmp_stream() class from ticcutils 0.25

0.21 - 2020-07-22
[Ko van der Sloot]
* Fixes a problem with temporary files not being cleaned up properly #92

0.20.1 - 2020-04-15
[Ko vd Sloot]
Bug fix release.
- added missing Doxygen.cfg to the tarball

0.20 - 2020-04-15
[Ko vd Sloot]
* added Doxygen to the build
* added a lot of comments in Doxygen format
* adapted to the newest ticcutils version
* adapted to latest libfolia
* adapted to latest ucto
* lots of code refactorings
* implemented --JSONin option (server only)
* implemented --JSONout option
* added a --allow-word-correction option which allows ucto to correct
  FoLiA Word nodes
[Iris Hendrix]
Documentation updates

0.19.1 - 2019-11-15
[Ko vd Sloot]
* fixed an overlooked incompatibility problem with the new libfolia.
  (https://github.com/proycon/tscan/issues/13)
* removed dependency on MbtServer
* Some documentation updates
* improved using Alpino, using unique filenames now.

0.19 - 2019-10-21
[Ko vd Sloot]
* added code to use a locally installed Alpino parser
* added code to use a remote Alpino Server
* added code to use (remote) timblservers and mbtservers for all modules
  using JSON calls. Still experimental.
* several code refactorings and small fixes:
  - memory leaks
  - using NER files in non-standard locations
  - bug fixes for some corner cases.
* frog.*.debug files are cleaned up after 1 day.
0.18.3 - 2019-07-22
[Ko vd Sloot]
Bug fix release:
* Fixes:
  - https://github.com/LanguageMachines/frog/issues/78

0.18.2 - 2019-07-15
[Ko vd Sloot]
Bug fix release:
* Fixes for:
  - https://github.com/LanguageMachines/frog/issues/75
  - https://github.com/LanguageMachines/frog/issues/77

0.18.1 - 2019-06-19
[Ko vd Sloot]
Bug fix release:
"tabbed" output contained one tab too many when --skip=a was specified

0.18 - 2019-06-19
[Ko vd Sloot]
Bug fixes and enhancements:
* provenance uses new 'generate_id' option in libfolia:processor
* solved problems when frogging partly tokenized FoLiA
* solved problems when processing with --skip=t
* small improvement in compound detection (still more to do...)

0.17 - 2019-05-29
[Ko vd Sloot]
This release supports FoLiA 2.0
* some bug fixes
  - trust the tokenizer to get the default language
  - don't stumble upon empty sentences introduced by a non-breaking-space
  - provenance data is added for all the modules

0.16 - 2019-05-15
[Ko vd Sloot]
This is the last release using pre FoLiA 2.0.
It includes a total rework of the Frog Internals, aiming at better
maintainability and hoping for a speedup and a smaller memory footprint.
This work will continue in the upcoming release for FoLiA 2.0

Major Changes:
* total rework. Not using a FoLiA document as the internal datastructure
  anymore but a FrogData structure.
* use folia::engine for all FoLiA processing
* -Q option is NOT supported anymore. It was unreliable anyway
* builds on the newest ucto versions only
* fix for https://github.com/proycon/LaMachine/issues/135
          https://github.com/LanguageMachines/frog/issues/66
* handles some corner cases in FoLiA better
* lots of code cleanup
* numerous small fixes ( e.g.
  in NER and MBMA results)
* improved working of --languages option
* avoid invalid FoLiA: https://github.com/LanguageMachines/frog/issues/60
* fixed memory leaks
* better handling of weird FoLiA
[Maarten van Gompel]
* added skeleton for new Frog documentation

0.15 - 2018-05-16
[Ko vd Sloot]
* ucto_tokenizer_mod: removed call of (useless)
  ucto:setSentenceDetection(true)
* fix to close the server when a socket fails
* when frogging a file, and the docID is NOT specified, use the filename
  as the docID (filtering out non-NCName characters)
* fix building the documentation from TeX files
* a lot of small code improvements
[Maarten van Gompel]
* added codemeta.json
* Fixed python-frog example in documentation (closes #48)

0.14 - 2018-02-19
[Ko vd Sloot]
* use TiCC::UniFilter now
* use TiCC::diacritics_filter now
* configuration modernized. OSX build supported too
* XML (FoLiA) files are autodetected
* some more logging and time stamps added
* added code to NER module to override original tags (e.g. from gazetteer)

0.13.9 - 2017-11-07
[Ko vd Sloot]
Bug fix release, to get all our releases into balance.
(Toad release requires 0.13.9)

0.13.8 - 2017-10-26
[Ko vd Sloot]
* Now with new and enhanced NER and IOB chunker. (needs Frogdata >0.15)
* added -t / --textredundancy option, which is passed to ucto
* set textclass attributes on entities (folia 1.5 feature)
* better textclass handling in general
* multiple types of entities (setnames) are stored in different layers
* some small provisions for 'multi word' words added. mblem may use them;
  other modules just ignore them (seeing a multiword as multi words)
* added --inputclass and --outputclass options. (prefer over textclass)
* added a --retry option, to redo complete directories, skipping what is done.
* added a --nostdout option to suppress the tabbed output to stdout
* refactoring and small fixes
[Maarten van Gompel]
* new --override option

0.13.7 - 2017-01-23
* Data files are now in share/ rather than etc/ (requires frogdata >= v0.13)

0.13.6 - 2017-01-05
[Ko van der Sloot]
* rework done on compounding in MBMA. (still work in progress)
* lots of improvement in MBMA rule handling. (but still work in progress)
  - support for 'glue' rules added.
  - support for 'hidden' morphemes added.
  - proper CELEX tags are outputted now in the XML
  - some structure labels have better names now
* removed exit() calls from library modules (issue #17)
* added languages option which is handed over to ucto too.
  - detect multiple languages
  - handle a selected language and ignore the rest

0.13.5 - 2016-09-13
* Added safeguards against faulty data
* Added manpage for ner tool (issue #8)
* Added some more compounding rules
* Read and display frogdata version

0.13.4 - 2016-07-11
[Ko van der Sloot]
- added long options --help and --version
- interactive use is limited to TTY's only, so pipes from stdin work
- added a --language='name' option. It tries to read the configuration
  from a subdirectory with 'name' in the configdir. The default is 'nl'
- tokenizer timing is fixed at last
- be robust against a missing clex tag
- better warning when OpenMP is not present
- adaptation in mbma
- added 2 convenience functions to FrogAPI:
  get_full_morph_analysis() and get_compound_analysis()
- CompoundType is now in its own namespace
- some code refactoring, as usual

0.13.3 - 2016-03-10
[Ko van der Sloot]
* Now based on libfolia 1.0
* lot of code refactoring
* minor bug fixes

0.13.0 - 2015-09-28
[Ko van der Sloot]
* moved repository to GitHub
* added Travis support
* First version without Python dependencies!
  The CSI parser is implemented in C++
* use more stuff from ticcutils (BasicServer)
* frog now runs on a minimal configuration too
* a lot more stuff is configurable (to become more language independent)
* added NER lookup from a file
* mbma is improved.
  - Doesn't have the "eer" and "ere" hacks anymore
  - handles C tags/inflections better
  - better 'daring' mode
  - adds some compound info
* fixed the mbma_prog and mblem_prog to run without a tagger or tokenizer
* added a 'ner' commandline tool
* a lot of smaller bug fixes and code refactoring

0.12.20 - 2015-01-29
[Ko van der Sloot]
* release
* fixed terrible bug in FrogServer (uninitialized MWU could happen)

0.12.19 - 2014-12-01
[Ko van der Sloot]
* release
* some bug fixes for FoLiA support

0.12.18 - 2014-09-16
* A true FrogAPI is added
* depends on ticcutils 0.6 or above (for CommandLine mostly)
* a lot of changes in the MBMA module. It now can produce nested morphemes
  using the --daring option of frog. Still experimental!
* Frog can now run interactively with readline support too.
* -t option is optional, multiple inputfiles are supported
* -o works for multiple files
* -d works better now (--debug even better)
* added xml:id to Entities and Chunks
* a lot of small bug fixes

0.12.17 - 2013-04-03
* the servermode can now handle multiline input (non XML only).
  Can be switched off with the -n option.
* A lot of refactoring regarding FoLiA stuff
* start using ticcutils
* the -Q option now works
* added a --uttmarker option
* added mbma and mblem programs
* updated man pages

0.12.16 - 2013-02-19
* bug fix release. Some stuff was moved from Timbl to libticcutils

0.12.15 - 2012-03-29
* using the new Mbt 3.2.8 API for Tagging. The code is much simpler and
  less error-prone now.
* depends on libfolia 0.9
* improved mblem module: <alt> tags for alternative readings, faster too
* refactoring all over the place
* We no longer ship the datafiles. Use frogdata package instead

0.12.14 - 2012-02-29
* NER was disabled.
  fixed

0.12.13 - 2012-02-27
* adapted to libfolia 0.8, which is more strict on set definitions
* added an IOB NER
* added a --max-parser-tokens option
* the mwu list is reduced a lot (by AntalB)
* fixed a threading problem
* code refactoring continues

0.12.12 - 2012-02-09
[Ko vd Sloot]
* fixed stupid error in frog-dp-update script
* added a manpage for that script

0.12.11 - 2012-02-09
[Ko vd Sloot]
* added a simple script: frog-dp-update.sh
  this installs the DP config and a fully functional frog.cfg
* some small cleanups in iob-chunker code.
* added newest chunker configuration, now also gives confidence values
* fixed some problems with iob chunker

0.12.10 - 2012-02-06
[Ko vd Sloot]
* now frog ships with small-frog.cfg. Larger config is to be distributed
  separately from the DPconfig directory.
* made debug handling better and the same for all modules
* added IOB chunker
* added -x option
* added --xmldir option
* fixed --skip=t. It was totally ignored!?
* removed the 0.12.2 patch. TimblServer solves it now.
* fixed problem with tags containing //
* update usage()
* updated man page

0.12.9 - 2012-01-12
[Ko vd Sloot]
* fixed threading problems.
* split very long function into 2 parts
[Maarten van Gompel]
* when in servermode, set_omp_num_threads(1). Otherwise every call to the
  server would start extra threads.

0.12.8 - 2012-01-10
[Ko vd Sloot]
* fixed argument escaping problem when calling libfolia
[ Maarten van Gompel ]
* fixed a typo in cgn_tagger_mod

0.12.7 - 2012-01-05
* fixed compilation on GNU/Hurd
* temporary parserfiles get a unique name now (using the pid).

0.12.6 - 2011-12-22
* merged with the foliabased branch. So now we use the folia based
  Tokenizer. folia XML is now the main interface between the modules in Frog.

0.12.5 - 2011-10-10
* is released

0.12.4 - ??
* missing in action

0.12.3 - 2011-08-23
[Ko vd Sloot]
* added a column for confidence. Needs the most recent Timbl and Mbt!
* changed the behaviour of the -Q option.
  (adapt to ucto 0.4.7)
[ Maarten van Gompel ]
* moved nasty sentence per line patch away, support now in ucto itself

0.12.2 - 2011-04-19
[ Maarten van Gompel ]
* fixed max read buffer (2048 byte) problem in server mode

0.12.1 - 2011-04-18
[ Ko vd Sloot ]
* added a fixed mbma.igtree file
* better reaction when startup fails. Try to bail out asap.

0.12.0 - 2011-03-21
[ Ko vd Sloot ]
* decapped progs and man

0.11.1 - 2011-03-20
[ Joost van Baal ]
* NEWS: record changes and releases
* docs/Frog.1, docs/Makefile.am: add Frog.1 .so link: consistent with
  name of binary

0.11.0 - 2011-03-17
[ Ko van der Sloot ]
* Reworked mblem and mbma: less dependent on tagger results
* minor fixes
* more stuff is handled in parallel (work in progress)
* docs/frog.1: added a man page
[ Antal van den Bosch ]
* config/mblem.tree: "weesten" issue
* config/mblem.tree: let's hope this tree file reverts to the correct
  encoding situation
* config/mblem.tree, config/mblem.tree.wgt: fixed "emmen" and "vrienden"
  errors

0.10.4 - 2011-03-01
[ Ko van der Sloot ]
* configure.ac: We need the most recent ucto!
* Makefile.am: now bootstrap works
[ Maarten van Gompel ]
* src/Frog-util.cxx: when using testdir, ignore hidden files (dotfiles)
* src/Frog.cxx: sentence per line on input side
* src/Frog.cxx: prettier help output

0.10.3 - 2011-02-27
[ Joost van Baal ]
* scripts/Makefile.am, scripts/pylet/{data,util}/Makefile.am: Minor
  changes to make life easier for software packagers. Install python
  code and compiled python in sane locations.
[ Ko van der Sloot ]
* src/Frog.cxx: added a '-n' option to do 'one sentence per line'
  tokenizing.

0.10.2 - 2011-02-13
[ Joost van Baal ]
* Minor changes to make life easier for software packagers.
0.10.1 - 2011-02-12
[ Joost van Baal ]
* configure.ac: merge frog-ng patch to deal with unavailable icu 4.6
[ Ko van der Sloot ]
* configure.ac: we need ucto >= 0.3.6
* src/Frog.cxx: added -e option to set the encoding
* configure.ac: bumped version, now uses ucto-icu.pc
[ Antal van den Bosch ]
* config/Frog.mwu.1.0, config/Makefile.am, config/frog.cfg,
  config/mwu.suspects5: moved to a more comprehensive MWU file based on
  the Lassy + Alpino treebanks

0.9.3 - 2010-01-26
[ Ko van der Sloot ]
* New release
[ Maarten van Gompel ]
* 2010-08-30 Added paragraph detection, added beginofsentence role,
  restyled view of usage options, implemented '--stok' mode for
  tokenisation (one sentence per line), roles are now shown explicitly
  in verbose tokeniser output. (0.6)
* 2010-08-17 Improved server mode, without intermediate files (new
  tokeniser only) (0.5)
* 2010-05-11 Integrated new tokeniser (from ucto) (0.3?)

2008-06-01 Ko vd Sloot
  Source is moved to SVN

2007-12-03 Ko vd Sloot
  Finished packaging.

2007-10-09
  Started packaging