0.34 - 2024-04-26
[Ko van der Sloot]
* require C++17
* require latest ticcutils and libfolia
* some code refactoring
* improved GitHub CI
* removed dependency on libtar

0.33 - 2024-04-26
[Ko van der Sloot]
* --KANON was not always honored
* after a change in ucto, the tokenizer needed to be reset more often
* in the lemmatizer, fuzzy matching of CGN tags is implemented

0.32 - 2023-12-05
[Ko van der Sloot]
* ignore but warn on empty derivations, a lame fix for https://github.com/LanguageMachines/frog/issues/103

0.31 - 2023-10-21
[Ko van der Sloot]
* use ticcutils > 0.34. NFC normalization is now the standard
* use Tokenizer::config_prefix() instead of magic string 'tokconfig-'
* code cleanup and quality improvement (cppcheck is very useful)

[Maarten van Gompel]
* added frog demo gif

0.30 - 2023-05-08
[Ko van der Sloot]
* finally fixed a major memory leak in MBMA which bothered me for months
* also some minor leaks are plugged

0.29 - 2023-05-05
[Ko van der Sloot]
* added a fix for https://github.com/LanguageMachines/frog/issues/100
  (where Frog created invalid FoLiA in a corner case)
* improved api_test
* small code refactorings
* require libfolia >= 2.15, for correct working of word correction
* improved MWU code: now using Unicode strings, and detecting MWUs that start
  with a capital letter.

[Maarten van Gompel]
* .gitignore: added build dir

0.28 - 2023-02-28
[Ko van der Sloot]
* We no longer accept FoLiA paragraphs with both Words and Sentences.

[Maarten van Gompel]
* Software metadata fix

0.27.2 - 2023-02-23
[Maarten van Gompel]
* Minor software metadata fix only, no functional changes

0.27.1 - 2023-02-22
[Maarten van Gompel]
* Software metadata update only, no functional changes

0.27 - 2023-02-22
[Ko van der Sloot]
* Major Release.
* Internally we always perform a 'deep' morphological analysis.
* This information is used for XML and JSON output.
* For the 'classic' tabbed output, we maintain backward compatibility.
* You need to specify '--deep-morph' to get the deep analysis in the output.
* You may also specify '--compounds' to get an extra column with compound
  information.

Other changes:
* C++ code quality
* adapted to more recent Timbl implementations (Unicode awareness)
* Tokenizer:
  - Better handling of --languages option.
  - 'und' is now also acceptable as a "language"
  - Better debugging possibility
* Mbma: Too many alternatives with inverted verbs were generated. As the
        tagger doesn't help us directly, we filter on the person of the next
        word, and only return V/te2I when the next word is 2nd person.

0.26 - 2023-01-02
[Ko van der Sloot]
* fix for https://github.com/LanguageMachines/frog/issues/96
* code improvements, readability and fixing CppCheck warnings
* needs recent ticcutils (>=0.30)
* needs newest Timbl (6.8) for more Unicode awareness
* updated GitHub action

[Maarten van Gompel]
* added MAINTAINERS file
* updated codemeta.json

0.25 - 2022-07-22

[Maarten van Gompel]
* updated metadata (codemeta.json) following new (proposed) CLARIAH requirements (CLARIAH/clariah-plus#38)
* added builds-deps.sh for automatically building and installing dependencies
* added Dockerfile and instructions
* added support for user-based configuration dirs ($XDG_CONFIG_HOME/frog), takes precedence over global data dirs

[Ko vd Sloot]
* updated Doxygen config file

0.24 - 2021-12-15
[Ko vd Sloot]
* start using the newest UTF8 aware Timbl and Mbt and Ucto
* use NFC normalized UnicodeString more generally internally
* added a fix in the MBMA code, to get more reproducible results on different
  OS/compiler combinations
* lots of small refactoring
* bumped library version, because of some API changes

[Maarten van Gompel]
* merged a patch suggested by Helmut Grohne <helmut@subdivi.de>
  - configure.ac: Debian Bug#993123, "frog FTCBFS: hard codes the build
    architecture pkg-config" (reported against frog 0.20-2). frog failed to
    cross build from source, because configure.ac hard coded the build
    architecture pkg-config in one place (after correctly detecting the host
    architecture one). Using the correct substitution variable makes frog
    cross buildable.

0.23 - 2021
* requires libfolia 2.9
* replaced TravisCI by GitHub actions
* fixed https://github.com/proycon/python-frog/issues/20
* some refactoring to avoid unneeded creation of files

0.22 - 2020-11-17
[Ko vd Sloot]
* start using the tmp_stream() class from ticcutils 0.25

0.21 - 2020-07-22
[Ko van der Sloot]
* Fixes a problem with temporary files not being cleaned up properly #92

0.20.1 - 2020-04-15
[Ko vd Sloot]
Bug fix release.
- added missing Doxygen.cfg to the tarball

0.20 - 2020-04-15
[Ko vd Sloot]
* added Doxygen to the build
* added a lot of comments in Doxygen format
* adapted to the newest ticcutils version
* adapted to latest libfolia
* adapted to latest ucto
* lots of code refactorings
* implemented --JSONin option (server only)
* implemented --JSONout option
* added a --allow-word-correction option which allows ucto to correct FoLiA
  Word nodes

[Iris Hendrix]
Documentation updates

0.19.1 - 2019-11-15
[Ko vd Sloot]
* fixed an overlooked incompatibility problem with the new libfolia.
  (https://github.com/proycon/tscan/issues/13)
* removed dependency on MbtServer
* Some documentation updates
* improved the use of Alpino, now using unique filenames.

0.19 - 2019-10-21
[Ko vd Sloot]
* added code to use a locally installed Alpino parser
* added code to use a remote Alpino Server
* added code to use (remote) timblservers and mbtservers for all modules
  using JSON calls. Still experimental.
* several code refactorings and small fixes:
  - memory leaks
  - using NER files in non-standard locations
  - bug fixes for some corner cases.

* frog.*.debug files are cleaned up after 1 day.

0.18.3 - 2019-07-22
[Ko vd Sloot]
Bug fix release:
* Fixes:
  - https://github.com/LanguageMachines/frog/issues/78

0.18.2 - 2019-07-15
[Ko vd Sloot]
Bug fix release:
* Fixes for:
  - https://github.com/LanguageMachines/frog/issues/75
  - https://github.com/LanguageMachines/frog/issues/77

0.18.1 - 2019-06-19
[Ko vd Sloot]
Bug fix release:
"tabbed" output contained one tab too many when --skip=a was specified

0.18 - 2019-06-19
[Ko vd Sloot]
Bug fixes and enhancements:
* provenance uses new 'generate_id' option in libfolia:processor
* solved problems when frogging partly tokenized FoLiA
* solved problems when processing with --skip=t
* small improvement in compound detection (still more to do...)

0.17 - 2019-05-29
[Ko vd Sloot]
This release supports FoLiA 2.0
* some bug fixes
  - trust the tokenizer to get the default language
  - don't stumble upon empty sentences introduced by a non-breaking space
  - provenance data is added for all the modules

0.16 - 2019-05-15
[Ko vd Sloot]
This is the last release using a pre-2.0 version of FoLiA.
It includes a total rework of the Frog Internals, aiming at better
maintainability and hoping for a speedup and a smaller memory footprint.
This work will continue in the upcoming release for FoLiA 2.0

Major Changes:
* total rework. Not using a FoLiA document as the internal datastructure anymore
  but a FrogData structure.
* use folia::engine for all FoLiA processing
* -Q option is NOT supported anymore. It was unreliable anyway
* builds on the newest ucto versions only
* fix for https://github.com/proycon/LaMachine/issues//135
          https://github.com/LanguageMachines/frog/issues/66
* handles some corner cases in FoLiA better
* lots of code cleanup
* numerous small fixes ( e.g. in NER and MBMA results)
* improved working of --languages option
* avoid invalid FoLiA: https://github.com/LanguageMachines/frog/issues/60
* fixed memory leaks
* better handling of weird FoLiA

[Maarten van Gompel]
* added skeleton for new Frog documentation


0.15 - 2018-05-16
[Ko vd Sloot]
* ucto_tokenizer_mod: removed call of (useless) ucto:setSentenceDetection(true)
* fix to close the server when a socket fails
* when frogging a file, and the docID is NOT specified, use the filename as
  the docID (filtering out non-NCName characters)
* fix building the documentation from TeX files
* a lot of small code improvements

[Maarten van Gompel]
* added codemeta.json
* Fixed python-frog example in documentation (closes #48)

0.14 - 2018-02-19
[Ko vd Sloot]
* use TiCC::UniFilter now
* use TiCC::diacritics_filter now
* configuration modernized. OSX build supported too
* XML (FoLiA) files are autodetected
* some more logging and time stamps added
* added code to NER module to override original tags (e.g. from gazetteer)

0.13.9 - 2017-11-07
[Ko vd Sloot]
Bug fix release, to get all our releases into balance. (Toad release
requires 0.13.9)

0.13.8 - 2017-10-26
[Ko vd Sloot]
* Now with new and enhanced NER and IOB chunker. (needs Frogdata >0.15)
* added -t / --textredundancy option, which is passed to ucto
* set textclass attributes on entities (folia 1.5 feature)
* better textclass handling in general
* multiple types of entities (setnames) are stored in different layers
* some small provisions for 'multi word' words added. mblem may use them;
   other modules just ignore them (treating a multiword as multiple words)
* added --inputclass and --outputclass options (preferred over textclass)
* added a --retry option, to redo complete directories, skipping what is done.
* added a --nostdout option to suppress the tabbed output to stdout
* refactoring and small fixes

[Maarten van Gompel]
* new --override option

0.13.7 - 2017-01-23
* Data files are now in share/ rather than etc/ (requires frogdata >= v0.13)

0.13.6 - 2017-01-05
[Ko van der Sloot]
* rework done on compounding in MBMA. (still work in progress)
* lots of improvements in MBMA rule handling. (but still work in progress)
  - support for 'glue' rules added.
  - support for 'hidden' morphemes added.
  - proper CELEX tags are outputted now in the XML
  - some structure labels have better names now
* removed exit() calls from library modules (issue #17)
* added a --languages option, which is also passed on to ucto.
  - detect multiple languages
  - handle a selected language and ignore the rest

0.13.5 - 2016-09-13
* Added safeguards against faulty data
* Added manpage for ner tool (issue #8)
* Added some more compounding rules
* Read and display frogdata version

0.13.4 - 2016-07-11
[Ko van der Sloot]
- added long options --help and --version
- interactive use is limited to TTYs only, so piping from stdin works
- added a --language='name' option. It tries to read the configuration from
  a subdirectory with 'name' in the configdir
  The default is 'nl'
- tokenizer timing is fixed at last
- be robust against a missing clex tag
- better warning when OpenMP is not present
- adaptation in mbma
- added 2 convenience functions to FrogAPI:
    get_full_morph_analysis() and
    get_compound_analysis()
- CompoundType is now in its own namespace
- some code refactoring, as usual

0.13.3 - 2016-03-10
[Ko van der Sloot]
* Now based on libfolia 1.0
* lots of code refactoring
* minor bug fixes

0.13.0 - 2015-09-28
[Ko van der Sloot]
* moved repository to GitHub
* added Travis support
* First version without Python dependencies!
  The CSI parser is implemented in C++
* use more stuff from ticcutils (BasicServer)
* frog now runs on a minimal configuration too
* a lot more stuff is configurable (to become more language independent)
* added NER lookup from a file
* mbma is improved.
   Doesn't have the "eer" and "ere" hacks anymore
   handles C tags/inflections better
   better 'daring' mode
   adds some compound info
* fixed the mbma_prog and mblem_prog to run without a tagger or tokenizer
* added a 'ner' commandline tool
* a lot of smaller bug fixes and code refactoring

0.12.20 - 2015-01-29
[Ko van der Sloot]
 * release
 * fixed terrible bug in FrogServer (uninitialized MWU could happen)

0.12.19 - 2014-12-01
[Ko van der Sloot]
 * release
 * some bug fixes for FoLiA support

0.12.18 - 2014-09-16
 * A true FrogAPI is added
 * depends on ticcutils 0.6 or above (for CommandLine mostly)
 * a lot of changes in the MBMA module. It now can produce nested morphemes
     using the --daring option of frog. Still experimental!
 * Frog can now run interactively with readline support too.
 * -t option is optional, multiple inputfiles are supported
 * -o works for multiple files
 * -d works better now (--debug even better)
 * added xml:id to Entities and Chunks
 * a lot of small bug fixes

0.12.17 - 2013-04-03
 * the servermode now can handle multiline input (non XML only).
   Can be switched off with the -n option.
 * A lot of refactoring regarding FoLiA stuff
 * start using ticcutils
 * the -Q option now works
 * added a --uttmarker option
 * added mbma and mblem programs
 * updated man pages

0.12.16 - 2013-02-19
 * bug fix release. Some stuff was moved from Timbl to libticcutils

0.12.15 - 2012-03-29
 * using the new Mbt 3.2.8 API for Tagging.
   The code is much simpler and less error-prone now.
 * depends on libfolia 0.9
 * improved mblem module: <alt> tags for alternative readings, faster too
 * refactoring all over the place
 * We no longer ship the datafiles. Use frogdata package instead

0.12.14 - 2012-02-29
 * NER was disabled. Fixed.

0.12.13 - 2012-02-27
 * adapted to libfolia 0.8, which is more strict on set definitions
 * added an IOB NER
 * added a --max-parser-tokens option
 * the mwu list is reduced a lot (by AntalB)
 * fixed a threading problem
 * code refactoring continues

0.12.12 - 2012-02-09
[Ko vd Sloot]
  * fixed stupid error in frog-dp-update script
  * added a manpage for that script

0.12.11 - 2012-02-09

[Ko vd Sloot]
  * added a simple script: frog-dp-update.sh
    this installs the DP config and a fully functional frog.cfg
  * some small cleanups in iob-chunker code.
  * added newest chunker configuration, now also gives confidence values
  * fixed some problems with iob chunker

0.12.10 - 2012-02-06
[Ko vd Sloot]
  * now frog ships with small-frog.cfg. Larger config is to be distributed
    separately from the DPconfig directory.
  * made debug handling better and the same for all modules
  * added IOB chunker
  * added -x option
  * added --xmldir option
  * fixed --skip=t. It was totally ignored!?
  * removed the 0.12.2 patch. TimblServer solves it now.
  * fixed problem with tags containing //
  * updated usage()
  * updated man page

0.12.9 - 2012-01-12
[Ko vd Sloot]
  * fixed threading problems.
  * split very long function into 2 parts

[Maarten van Gompel]
  * when in servermode, set_omp_num_threads(1). Otherwise every call
    to the server would start extra threads.

0.12.8 - 2012-01-10
[Ko vd Sloot]
  * fixed argument escaping problem when calling libfolia
[ Maarten van Gompel ]
  * fixed a typo in cgn_tagger_mod

0.12.7 - 2012-01-05
  * fixed compilation on GNU/Hurd
  * temporary parserfiles get a unique name now (using the pid).

0.12.6 - 2011-12-22
  * merged with the FoLiA-based branch. So now we use the FoLiA-based Tokenizer.
    FoLiA XML is now the main interface between the modules in Frog.

0.12.5 - 2011-10-10
  * is released

0.12.4 - ??
  * missing in action

0.12.3 - 2011-08-23
[Ko vd Sloot]
* added a column for confidence. Needs the most recent Timbl and Mbt!
* changed the behaviour of the -Q option. (adapt to ucto 0.4.7)

[ Maarten van Gompel ]
* moved nasty sentence per line patch away, support now in ucto itself

0.12.2 - 2011-04-19
[ Maarten van Gompel ]
* fixed max read buffer (2048 byte) problem in server mode

0.12.1 - 2011-04-18
[ Ko vd Sloot ]
* added a fixed mbma.igtree file
* better reaction when startup fails. Try to bail out asap.

0.12.0 - 2011-03-21
[ Ko vd Sloot ]
* decapped progs and man (program and man page names are now lowercase)

0.11.1 - 2011-03-20

[ Joost van Baal ]
* NEWS: record changes and releases
* docs/Frog.1, docs/Makefile.am: add Frog.1 .so link: consistent with name of
  binary


0.11.0 - 2011-03-17

[ Ko van der Sloot ]
* Reworked mblem and mbma: less dependent on tagger results
* minor fixes
* more stuff is handled in parallel (work in progress)
* docs/frog.1: added a man page

[ Antal van den Bosch ]
* config/mblem.tree: "weesten" issue
* config/mblem.tree: let's hope this tree file reverts to the correct encoding
  situation
* config/mblem.tree, config/mblem.tree.wgt: fixed "emmen" and "vrienden" errors


0.10.4 - 2011-03-01

[ Ko van der Sloot ]
* configure.ac: We need the most recent ucto!
* Makefile.am: now bootstrap works

[ Maarten van Gompel ]
* src/Frog-util.cxx: when using testdir, ignore hidden files (dotfiles)
* src/Frog.cxx: sentence per line on input side
* src/Frog.cxx: prettier help output


0.10.3 - 2011-02-27

[ Joost van Baal ]
* scripts/Makefile.am, scripts/pylet/{data,util}/Makefile.am: Minor changes
  to make life easier for software packagers.  Install python code and
  compiled python in sane locations.

[ Ko van der Sloot ]
* src/Frog.cxx: added a '-n' option to do 'one sentence per line' tokenizing.


0.10.2 - 2011-02-13

[ Joost van Baal ]
* Minor changes to make life easier for software packagers.


0.10.1 - 2011-02-12

[ Joost van Baal ]
* configure.ac: merge frog-ng patch to deal with unavailable icu 4.6

[ Ko van der Sloot ]
* configure.ac: we need ucto >= 0.3.6
* src/Frog.cxx: added -e option to set the encoding
* configure.ac: bumped version, now uses ucto-icu.pc

[ Antal van den Bosch ]
* config/Frog.mwu.1.0, config/Makefile.am, config/frog.cfg,
  config/mwu.suspects5: moved to a more comprehensive MWU file
  based on the Lassy + Alpino treebanks


0.9.3 - 2010-01-26

[ Ko van der Sloot ]
* New release

[ Maarten van Gompel ]
* 2010-08-30 Added paragraph detection, added beginofsentence role, restyled
  view of usage options, implemented '--stok' mode for tokenisation (one
  sentence per line), roles are now shown explicitly in verbose tokeniser
  output.  (0.6)
* 2010-08-17 Improved server mode, without intermediate files (new tokeniser
  only) (0.5)
* 2010-05-11 Integrated new tokeniser (from ucto) (0.3?)


2008-06-01 Ko vd Sloot
	   Source is moved to SVN
2007-12-03 Ko vd Sloot
	   Finished packaging.
2007-10-09 Started packaging