Commit 842a042c authored by Hideki Yamane 🐈

Imported Upstream version 0.6.3.b-20111013

Masayuki Asahara <masayu-a -@- is.naist.jp>
Copyright (c) 2009, Nara Institute of Science and Technology, Japan.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
Neither the name of the Nara Institute of Science and Technology
(NAIST) nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2010-07-30 Asahara Masayuki
* For the time being, I will try to see how much improvement can be achieved through corpus corrections alone.
Basic Installation
==================
These are generic installation instructions.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.
The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.
The simplest way to compile this package is:
1. `cd' to the directory containing the package's source code and type
`./configure' to configure the package for your system. If you're
using `csh' on an old version of System V, you might need to type
`sh ./configure' instead to prevent `csh' from trying to execute
`configure' itself.
Running `configure' takes a while. While running, it prints some
messages telling which features it is checking for.
2. Type `make' to compile the package.
3. Optionally, type `make check' to run any self-tests that come with
the package.
4. Type `make install' to install the programs and any data files and
documentation.
5. You can remove the program binaries and object files from the
source code directory by typing `make clean'. To also remove the
files that `configure' created (so you can compile the package for
a different kind of computer), type `make distclean'. There is
also a `make maintainer-clean' target, but that is intended mainly
for the package's developers. If you use it, you may have to get
all sorts of other programs in order to regenerate files that came
with the distribution.
Compilers and Options
=====================
Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure
Or on systems that have the `env' program, you can do it like this:
env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure
Compiling For Multiple Architectures
====================================
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.
Installation Names
==================
By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
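The Python sketch below makes the prefix rules concrete. The function `install_dirs` is hypothetical and not part of Autoconf, and it models only four of the directory variables. It assumes the traditional Autoconf defaults: `bindir` and `libdir` under the exec prefix, `datadir` and `mandir` under the regular prefix. Newer Autoconf releases derive `mandir` through `datarootdir`, so treat this as an approximation.

```python
from typing import Dict, Optional


def install_dirs(prefix: str = "/usr/local",
                 exec_prefix: Optional[str] = None,
                 **overrides: str) -> Dict[str, str]:
    """Simplified model of how configure derives installation directories."""
    # --exec-prefix defaults to --prefix when it is not given.
    if exec_prefix is None:
        exec_prefix = prefix
    dirs = {
        # Architecture-specific files (programs, libraries) follow exec_prefix...
        "bindir": f"{exec_prefix}/bin",
        "libdir": f"{exec_prefix}/lib",
        # ...while documentation and data files keep the regular prefix.
        "datadir": f"{prefix}/share",
        "mandir": f"{prefix}/man",
    }
    # Options such as --bindir=PATH override individual entries.
    dirs.update(overrides)
    return dirs


print(install_dirs(prefix="/opt/jdic", exec_prefix="/opt/jdic/x86_64"))
```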
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features
=================
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
==========================
There may be some features `configure' cannot figure out
automatically, but needs to determine from the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it cannot guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
CPU-COMPANY-SYSTEM
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.
If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.
Sharing Defaults
================
If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
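The lookup order described above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes `CONFIG_SITE` names a single file; some Autoconf versions split it into several paths, and the real check happens inside the generated shell script.

```python
import os
from typing import List, Mapping, Optional


def site_script_candidates(prefix: str,
                           environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Site scripts a configure script would consider, in lookup order."""
    env = os.environ if environ is None else environ
    # An explicit CONFIG_SITE replaces the default search entirely.
    if env.get("CONFIG_SITE"):
        return [env["CONFIG_SITE"]]
    return [f"{prefix}/share/config.site", f"{prefix}/etc/config.site"]


def site_scripts_to_load(prefix: str,
                         environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Keep only the candidates that exist and are readable."""
    return [path for path in site_script_candidates(prefix, environ)
            if os.path.isfile(path) and os.access(path, os.R_OK)]
```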
Operation Controls
==================
`configure' recognizes the following options to control how it
operates.
`--cache-file=FILE'
Use and save the results of the tests in FILE instead of
`./config.cache'. Set FILE to `/dev/null' to disable caching, for
debugging `configure'.
`--help'
Print a summary of the options to `configure', and exit.
`--quiet'
`--silent'
`-q'
Do not print messages saying which checks are being made. To
suppress all normal output, redirect it to `/dev/null' (any error
messages will still be shown).
`--srcdir=DIR'
Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically.
`--version'
Print the version of Autoconf used to generate the `configure'
script, and exit.
`configure' also accepts some other, not widely useful, options.
mecab_dict_index = @MECAB_DICT_INDEX@
dicdir = @MECAB_DICDIR@
# SUBDIRS = doc script
dic_DATA = @MECAB_GENDATA@ @MECAB_LEXICAL_DIC@ @MECAB_PREDATA@
EXTRA_DIST = @MECAB_LEXICAL_DIC@ @MECAB_PREDATA@ RESULT
CLEANFILES = @MECAB_GENDATA@
@MECAB_GENDATA@:
	$(mecab_dict_index) -d . -o . -f EUC-JP -t @CHARSET@
	@echo To enable dictionary, rewrite @MECAB_MECABRC@ as \"dicdir = @MECAB_DICDIR@\"
rpm: dist
	rpm -ta @PACKAGE@-@VERSION@.tar.gz
export-package:
	./upload.pl -p mecab -n @PACKAGE@ -r @VERSION@ -f @PACKAGE@-@VERSION@.tar.gz
install-exec-hook:
	if ! [ -d $(DESTDIR)/etc/mecab/dic/naist-jdic ]; \
	then mkdir -p $(DESTDIR)/etc/mecab/dic/naist-jdic; \
	fi
	if ! [ -f $(DESTDIR)/etc/mecab/dic/naist-jdic/dicrc ]; \
	then $(LN_S) @MECAB_DICDIR@/dicrc $(DESTDIR)/etc/mecab/dic/naist-jdic/dicrc; \
	fi
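The `install-exec-hook` rule creates the `etc` directory and the `dicrc` symlink only when they are missing. Reinstalling therefore does not replace a `dicrc` that is already in place. The Python sketch below mirrors that idempotent logic. The function name is hypothetical; the real build runs the shell recipe in the Makefile. It also assumes `$(LN_S)` expands to a symbolic-link command such as `ln -s`.

```python
import os


def install_exec_hook(destdir: str, dicdir: str) -> str:
    """Mirror of the install-exec-hook rule: idempotent mkdir plus symlink."""
    # Plain concatenation, like $(DESTDIR)/etc/... in the Makefile,
    # so an empty DESTDIR yields /etc/mecab/dic/naist-jdic.
    etc_dir = destdir + "/etc/mecab/dic/naist-jdic"
    if not os.path.isdir(etc_dir):            # if ! [ -d ... ]
        os.makedirs(etc_dir)                  # mkdir -p
    link = etc_dir + "/dicrc"
    if not os.path.isfile(link):              # if ! [ -f ... ]
        os.symlink(dicdir + "/dicrc", link)   # $(LN_S), typically ln -s
    return link
```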
2011-10-13: masayua -at- gmail.com
version 0.6.3b
patch from Osamu Aoki
2010-08-01: masayu-a -at- is.naist.jp
version 0.6.3
2010-01-31: masayu-a -at- is.naist.jp
version 0.6.2
2009-06-30: masayu-a -at- is.naist.jp
version 0.6.1
Fixed defects in the 0.6.0 package.
2009-06-16: masayu-a -at- is.naist.jp
version 0.6.0
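Starting with 0.6.0, compound-word information has been added to the 11th field of the
output format (the 15th field in the CSV dictionary files). It was annotated by hand, not assigned automatically.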
<w orth="くろみがかる" form="クロミガカル" pos="動詞-自立" ctype="五段・ラ行" cform="基本形" ><w orth="くろみ" form="クロミ" pos="名詞-一般" ctype="" cform="" ><w orth="くろ" form="クロ" pos="名詞-一般" ctype="" cform="" >くろ</w><w orth="み" form="ミ" pos="名詞-接尾-一般" ctype="" cform="" >み</w></w><w orth="がかる" form="ガカル" pos="動詞-接尾" ctype="五段・ラ行" cform="基本形" >がかる</w></w>
Note that, as a rule, compound-word information is not assigned to proper nouns.
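The example above nests `<w>` elements: a compound contains its components, which may themselves be compounds. A short Python sketch using the standard `xml.etree.ElementTree` module walks that tree. It treats the example line as a standalone XML fragment, and the helper names `walk` and `leaves` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

SAMPLE = (
    '<w orth="くろみがかる" form="クロミガカル" pos="動詞-自立" ctype="五段・ラ行" cform="基本形" >'
    '<w orth="くろみ" form="クロミ" pos="名詞-一般" ctype="" cform="" >'
    '<w orth="くろ" form="クロ" pos="名詞-一般" ctype="" cform="" >くろ</w>'
    '<w orth="み" form="ミ" pos="名詞-接尾-一般" ctype="" cform="" >み</w></w>'
    '<w orth="がかる" form="ガカル" pos="動詞-接尾" ctype="五段・ラ行" cform="基本形" >がかる</w></w>'
)


def walk(elem: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, orth, pos) for each <w> node, parents before children."""
    yield depth, elem.get("orth"), elem.get("pos")
    for child in elem.findall("w"):
        yield from walk(child, depth + 1)


def leaves(elem: ET.Element) -> List[str]:
    """Orthography of the smallest components, left to right."""
    children = elem.findall("w")
    if not children:
        return [elem.get("orth")]
    return [orth for child in children for orth in leaves(child)]


root = ET.fromstring(SAMPLE)
for depth, orth, pos in walk(root):
    print("  " * depth + f"{orth} ({pos})")
```

Running it prints each component indented by depth. くろみがかる is at the top, くろみ and がかる sit beneath it, and くろ and み sit beneath くろみ.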
2009-05-12: masayu-a -at- is.naist.jp
version 0.5.0
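Starting with 0.5.0, orthographic-variant (表記ゆれ) information has been added to the 10th field of the
output format (the 14th field in the CSV dictionary files). It was annotated by hand, not assigned automatically.
This information corresponds to the 5th field of the "表記統合辞書" (Notation Integration Dictionary) published by the National Institute for Japanese Language (国語研).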
http://www.kokken.go.jp/lrc/index.php?%A1%D8%C9%BD%B5%AD%C5%FD%B9%E7%BC%AD%BD%F1%A1%D9%2F%CD%F8%CD%D1%A5%DE%A5%CB%A5%E5%A5%A2%A5%EB
Note that orthographic-variant information is not assigned to proper nouns.
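The field numbers in the two notes differ by a constant four. This assumes the standard MeCab dictionary CSV layout, in which each row begins with four non-feature columns: surface form, left context ID, right context ID and cost. A minimal Python sketch of that arithmetic (helper names hypothetical):

```python
from typing import Sequence

# Assumed MeCab dictionary CSV layout: four leading non-feature columns
# (surface form, left context ID, right context ID, cost), then features.
LEADING_COLUMNS = 4


def csv_column(feature_field: int) -> int:
    """1-based CSV column holding the given 1-based output feature field."""
    return feature_field + LEADING_COLUMNS


def feature_from_row(row: Sequence[str], feature_field: int) -> str:
    """Fetch a feature field from an already-split dictionary CSV row."""
    return row[csv_column(feature_field) - 1]
```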
http://sourceforge.jp/projects/naist-jdic/
http://mecab.sourceforge.net/
#
# Japanese character category map
#
# $Id: char.def,v 1.4 2006/07/05 16:54:13 taku-ku Exp $;
#
###################################################################################
#
# CHARACTER CATEGORY DEFINITION
#
# CATEGORY_NAME INVOKE GROUP LENGTH
#
# - CATEGORY_NAME: Name of the category. The DEFAULT category must be defined.
# - INVOKE: 1/0: always invoke unknown word processing, even when the word can be found in the lexicon
# - GROUP: 1/0: make a new word by grouping consecutive characters of the same category
# - LENGTH: n: new words of length 1 to n are added
#
DEFAULT 0 1 0 # DEFAULT is a mandatory category!
SPACE 0 1 0
KANJI 0 0 2
SYMBOL 1 1 0
NUMERIC 1 1 0
ALPHA 1 1 0
HIRAGANA 0 1 2
KATAKANA 1 1 2
KANJINUMERIC 1 1 0
GREEK 1 1 0
CYRILLIC 1 1 0
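The Python sketch below shows how the INVOKE, GROUP and LENGTH columns can drive unknown-word candidate generation. It is a simplification, not MeCab's implementation. It leaves out details such as the maximum grouping size and the fallback single-character word. `toy_category` is a hypothetical two-way classifier used only for this example.

```python
from typing import Callable, Dict, List, Tuple

Category = Tuple[bool, bool, int]  # (invoke, group, length)


def parse_categories(lines: List[str]) -> Dict[str, Category]:
    """Parse 'NAME INVOKE GROUP LENGTH' lines, ignoring '#' comments."""
    cats: Dict[str, Category] = {}
    for line in lines:
        body = line.split("#", 1)[0].split()
        if not body:
            continue
        name, invoke, group, length = body
        cats[name] = (invoke == "1", group == "1", int(length))
    return cats


def unknown_candidates(text: str, start: int,
                       category_of: Callable[[str], str],
                       cats: Dict[str, Category],
                       in_lexicon: bool) -> List[str]:
    """Unknown-word surfaces proposed at text[start] (simplified)."""
    cat = category_of(text[start])
    invoke, group, length = cats[cat]
    # INVOKE=0: skip unknown-word processing if the lexicon already matched.
    if in_lexicon and not invoke:
        return []
    # Find the run of characters sharing this category.
    end = start
    while end < len(text) and category_of(text[end]) == cat:
        end += 1
    candidates: List[str] = []
    if group:                       # GROUP=1: the whole run as one word
        candidates.append(text[start:end])
    for n in range(1, length + 1):  # LENGTH=n: prefixes of length 1..n
        if start + n > end:
            break
        if text[start:start + n] not in candidates:
            candidates.append(text[start:start + n])
    return candidates


CATS = parse_categories(["KANJI 0 0 2", "KATAKANA 1 1 2  # comment"])


def toy_category(ch: str) -> str:
    # Hypothetical classifier for this example: katakana vs. everything else.
    return "KATAKANA" if 0x30A1 <= ord(ch) <= 0x30FF else "KANJI"
```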
###################################################################################
#
# CODE(UCS2) TO CATEGORY MAPPING
#
# SPACE
0x0020 SPACE # DO NOT REMOVE THIS LINE, 0x0020 is reserved for SPACE
0x000D SPACE # CR
0x0009 SPACE # HT
0x000B SPACE # VT
0x000A SPACE # LF
# ASCII
0x0021..0x002F SYMBOL
0x0030..0x0039 NUMERIC
0x003A..0x0040 SYMBOL
0x0041..0x005A ALPHA
0x005B..0x0060 SYMBOL
0x0061..0x007A ALPHA
0x007B..0x007E SYMBOL
# Latin
0x00A1..0x00BF SYMBOL # Latin 1
0x00C0..0x00FF ALPHA # Latin 1
0x0100..0x017F ALPHA # Latin Extended A
0x0180..0x0236 ALPHA # Latin Extended B
0x1E00..0x1EF9 ALPHA # Latin Extended Additional
# CYRILLIC
0x0400..0x04F9 CYRILLIC
0x0500..0x050F CYRILLIC # Cyrillic supplementary
# GREEK
0x0374..0x03FB GREEK # Greek and Coptic
# HIRAGANA
0x3041..0x309F HIRAGANA
# KATAKANA
0x30A1..0x30FF KATAKANA
0x31F0..0x31FF KATAKANA # Small KU .. Small RO
# 0x30FC KATAKANA HIRAGANA # ー
0x30FC KATAKANA
# Half KATAKANA
0xFF66..0xFF9D KATAKANA
0xFF9E..0xFF9F KATAKANA
# KANJI
0x2E80..0x2EF3 KANJI # CJK Radicals Supplement
0x2F00..0x2FD5 KANJI
0x3005 KANJI
0x3007 KANJI
0x3400..0x4DB5 KANJI # CJK Unified Ideographs Extension A
0x4E00..0x9FA5 KANJI
0xF900..0xFA2D KANJI
0xFA30..0xFA6A KANJI
# KANJI-NUMERIC (一 二 三 四 五 六 七 八 九 十 百 千 万 億 兆)
0x4E00 KANJINUMERIC KANJI
0x4E8C KANJINUMERIC KANJI
0x4E09 KANJINUMERIC KANJI
0x56DB KANJINUMERIC KANJI
0x4E94 KANJINUMERIC KANJI
0x516D KANJINUMERIC KANJI
0x4E03 KANJINUMERIC KANJI
0x516B KANJINUMERIC KANJI
0x4E5D KANJINUMERIC KANJI
0x5341 KANJINUMERIC KANJI
0x767E KANJINUMERIC KANJI
0x5343 KANJINUMERIC KANJI
0x4E07 KANJINUMERIC KANJI
0x5104 KANJINUMERIC KANJI
0x5146 KANJINUMERIC KANJI
# ZENKAKU
0xFF10..0xFF19 NUMERIC
0xFF21..0xFF3A ALPHA
0xFF41..0xFF5A ALPHA
0xFF01..0xFF0F SYMBOL
0xFF1A..0xFF1F SYMBOL
0xFF3B..0xFF40 SYMBOL
0xFF5B..0xFF65 SYMBOL
0xFFE0..0xFFEF SYMBOL # Halfwidth and Fullwidth Forms
# OTHER SYMBOLS
0x2000..0x206F SYMBOL # General Punctuation
0x2070..0x209F NUMERIC # Superscripts and Subscripts
0x20A0..0x20CF SYMBOL # Currency Symbols
0x20D0..0x20FF SYMBOL # Combining Diacritical Marks for Symbols
0x2100..0x214F SYMBOL # Letterlike Symbols
0x2150..0x218F NUMERIC # Number forms
0x2100..0x214B SYMBOL # Letterlike Symbols
0x2190..0x21FF SYMBOL # Arrow
0x2200..0x22FF SYMBOL # Mathematical Operators
0x2300..0x23FF SYMBOL # Miscellaneous Technical
0x2460..0x24FF SYMBOL # Enclosed Alphanumerics
0x2501..0x257F SYMBOL # Box Drawing
0x2580..0x259F SYMBOL # Block Elements
0x25A0..0x25FF SYMBOL # Geometric Shapes
0x2600..0x26FE SYMBOL # Miscellaneous Symbols
0x2700..0x27BF SYMBOL # Dingbats
0x27F0..0x27FF SYMBOL # Supplemental Arrows A
0x27C0..0x27EF SYMBOL # Miscellaneous Mathematical Symbols-A
0x2800..0x28FF SYMBOL # Braille Patterns
0x2900..0x297F SYMBOL # Supplemental Arrows B
0x2B00..0x2BFF SYMBOL # Miscellaneous Symbols and Arrows
0x2A00..0x2AFF SYMBOL # Supplemental Mathematical Operators
0x3300..0x33FF SYMBOL
0x3200..0x32FE SYMBOL # Enclosed CJK Letters and Months
0x3000..0x303F SYMBOL # CJK Symbols and Punctuation
0xFE30..0xFE4F SYMBOL # CJK Compatibility Forms
0xFE50..0xFE6B SYMBOL # Small Form Variants
# added 2006/3/13
0x3007 SYMBOL KANJINUMERIC
# END OF TABLE
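Each mapping line gives a code point or range, a primary category, and optionally further compatible categories. For example, 一 is KANJINUMERIC but also usable as KANJI. The Python sketch below looks up a character's categories, assuming that later lines override earlier ones. That reading matches how this table handles 0x3007: it is listed first as KANJI, then falls inside the SYMBOL range, and finally appears as SYMBOL KANJINUMERIC. The function names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, int, List[str]]


def parse_mapping(lines: List[str]) -> List[Entry]:
    """Parse 'CODE[..CODE] CATEGORY [COMPATIBLE...]' lines."""
    entries: List[Entry] = []
    for line in lines:
        body = line.split("#", 1)[0].split()
        if not body or not body[0].startswith("0x"):
            continue  # blank, comment-only, or category-definition line
        low, _, high = body[0].partition("..")
        entries.append((int(low, 16), int(high or low, 16), body[1:]))
    return entries


def categories_of(ch: str, entries: List[Entry]) -> List[str]:
    """Primary category first, then compatible ones; DEFAULT if unmapped."""
    result = ["DEFAULT"]
    code = ord(ch)
    for low, high, names in entries:
        if low <= code <= high:
            result = names  # assumption: later lines override earlier ones
    return result


MAPPING = parse_mapping([
    "0x3007 KANJI",
    "0x4E00..0x9FA5 KANJI",
    "0x4E00 KANJINUMERIC KANJI",
    "0x3000..0x303F SYMBOL # CJK Symbols and Punctuation",
    "0x3007 SYMBOL KANJINUMERIC",
])
```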
AC_INIT(matrix.def)
AM_INIT_AUTOMAKE(mecab-naist-jdic, 0.6.3b-20111013)
AC_PROG_INSTALL
AC_PROG_LN_S
AC_SUBST(datarootdir)
AC_ARG_WITH(
dicdir,
[ --with-dicdir=DIR set dicdir location ],
AC_MSG_RESULT(using $with_dicdir for dicdir)
MECAB_DICDIR=$with_dicdir, MECAB_DICDIR="no"
)
AC_ARG_WITH(
mecab-config,
[ --with-mecab-config=PATH set mecab-config location [search path]],
AC_MSG_RESULT(using $with_mecab_config for mecab-config)
MECAB_CONFIG=$with_mecab_config,
[AC_PATH_PROGS(MECAB_CONFIG, mecab-config, no)]
)
if test "$MECAB_CONFIG" = "no";
then
AC_MSG_ERROR(mecab-config is not found in your system)
fi
AC_SUBST(MECAB_CONFIG)
if test "$MECAB_DICDIR" = "no";
then
MECAB_DICDIR="`$MECAB_CONFIG --dicdir`/naist-jdic"
fi
MECAB_DICT_INDEX="`$MECAB_CONFIG --libexecdir`/mecab-dict-index"
MECAB_MECABRC="`$MECAB_CONFIG --sysconfdir`/mecabrc"
AC_SUBST(MECAB_DICDIR)
AC_SUBST(MECAB_DICT_INDEX)
AC_SUBST(MECAB_MECABRC)
AC_ARG_WITH(
charset,
[ --with-charset=charset set default charset (euc-jp/sjis/utf-8)],
[CHARSET=${withval}], [CHARSET='euc-jp']
)
CHARSET=$CHARSET
AC_SUBST(CHARSET)
MECAB_LEXICAL_DIC=`echo *.csv`
AC_SUBST(MECAB_LEXICAL_DIC)
MECAB_GENDATA="matrix.bin char.bin sys.dic unk.dic"
AC_SUBST(MECAB_GENDATA)
MECAB_PREDATA="`echo *.def` dicrc"
AC_SUBST(MECAB_PREDATA)
AC_OUTPUT([Makefile])
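In short, configure.in asks `mecab-config` for three directories and derives the dictionary, indexer and mecabrc paths from them, unless `--with-dicdir` overrides the dictionary location. The Python sketch below approximates those substitutions. The function names are hypothetical, and only the `mecab-config` flags that configure.in itself uses are called. The injected `mecab_config` callable lets the sketch run without MeCab installed.

```python
import subprocess
from typing import Callable, Dict, Optional


def query_mecab_config(flag: str) -> str:
    """Run the real mecab-config (must be on PATH) and return its output."""
    return subprocess.run(["mecab-config", flag], capture_output=True,
                          text=True, check=True).stdout.strip()


def resolve_settings(mecab_config: Callable[[str], str] = query_mecab_config,
                     with_dicdir: Optional[str] = None,
                     with_charset: Optional[str] = None) -> Dict[str, str]:
    """Approximate the values configure.in substitutes into the Makefile."""
    return {
        # --with-dicdir wins; otherwise `mecab-config --dicdir`/naist-jdic.
        "MECAB_DICDIR": with_dicdir or mecab_config("--dicdir") + "/naist-jdic",
        "MECAB_DICT_INDEX": mecab_config("--libexecdir") + "/mecab-dict-index",
        "MECAB_MECABRC": mecab_config("--sysconfdir") + "/mecabrc",
        # --with-charset defaults to euc-jp.
        "CHARSET": with_charset or "euc-jp",
    }
```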
;
; Configuration file of NAIST Japanese Dictionary
;
cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 8
unk-eval-size = 4
config-charset = EUC-JP
; yomi
node-format-yomi = %pS%f[7]
unk-format-yomi = %M
eos-format-yomi = \n
; simple
node-format-simple = %m\t%F-[0,1,2,3]\n
eos-format-simple = EOS\n
; ChaSen
node-format-chasen = %m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
unk-format-chasen = %m\t%m\t%m\t%F-[0,1,2,3]\t\t\n
eos-format-chasen = EOS\n
; ChaSen (include spaces)
node-format-chasen2 = %M\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
unk-format-chasen2 = %M\t%m\t%m\t%F-[0,1,2,3]\t\t\n
eos-format-chasen2 = EOS\n
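The `node-format-*` values are templates. `%m` is the surface form, `%f[N]` is a single feature field, and `%F-[0,1,2,3]` joins several fields with `-`. A rough Python renderer for that subset shows what the ChaSen format produces. It makes several simplifying assumptions:

- `%F` drops `*` placeholders, which is how ChaSen-style output shows 名詞-一般.
- The leading-whitespace behaviour of `%M` is ignored.
- Other directives such as `%pS` are copied through unchanged.
- The feature row is hypothetical.

```python
import re
from typing import Sequence

# Supported subset: %m/%M (surface), %f[N] (one feature),
# %F<sep>[i,j,...] (several features joined by <sep>), and \t / \n escapes.
TOKEN = re.compile(r"%[mM]|%f\[(\d+)\]|%F(.)\[([\d,]+)\]|\\t|\\n|.", re.S)


def render(fmt: str, surface: str, features: Sequence[str]) -> str:
    out = []
    for m in TOKEN.finditer(fmt):
        tok = m.group(0)
        if tok in ("%m", "%M"):
            # MeCab's %M also keeps leading white space; ignored here.
            out.append(surface)
        elif m.group(1) is not None:
            out.append(features[int(m.group(1))])
        elif m.group(2) is not None:
            picked = [features[int(i)] for i in m.group(3).split(",")]
            # Assumption: '*' placeholders are dropped, giving e.g. 名詞-一般.
            out.append(m.group(2).join(f for f in picked if f != "*"))
        elif tok == r"\t":
            out.append("\t")
        elif tok == r"\n":
            out.append("\n")
        else:
            out.append(tok)  # plain text and unsupported directives
    return "".join(out)


# Hypothetical feature row: POS1-4, conjugation type, conjugation form,
# base form, reading, pronunciation.
FEATURES = ["名詞", "一般", "*", "*", "*", "*", "くろ", "クロ", "クロ"]
CHASEN = r"%m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n"
print(repr(render(CHASEN, "くろ", FEATURES)))
```

With the sample row this prints 'くろ\tクロ\tくろ\t名詞-一般\t*\t*\n', i.e. surface, reading, base form, POS, conjugation type and conjugation form.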
#
# Unigram definitions start here
#
# %F[0..N] unigram context
# %F?: if the field is undefined, this template is not applied
# POS Unigram
UNIGRAM U1:%F[0]
UNIGRAM U2:%F[0],%F?[1]
UNIGRAM U3:%F[0],%F[1],%F?[2]
UNIGRAM U4:%F[0],%F[1],%F[2],%F?[3]
# Word-POS
UNIGRAM W0:%F[6]
UNIGRAM W1:%F[0]/%F[6]
UNIGRAM W2:%F[0],%F?[1]/%F[6]
UNIGRAM W3:%F[0],%F[1],%F?[2]/%F[6]
UNIGRAM W4:%F[0],%F[1],%F[2],%F?[3]/%F[6]
# Word-Read-POS
UNIGRAM R0:%F[7]
UNIGRAM R1:%F[6],%F[7]
UNIGRAM R2:%F[0],%F[6],%F[7]
UNIGRAM R3:%F[0],%F?[1],%F[6],%F[7]
UNIGRAM R4:%F[0],%F[1],%F?[2],%F[6],%F[7]
UNIGRAM R5:%F[0],%F[1],%F[2],%F?[3],%F[6],%F[7]
# char type
UNIGRAM T0:%t
UNIGRAM T1:%F[0]/%t
UNIGRAM T2:%F[0],%F?[1]/%t
UNIGRAM T3:%F[0],%F[1],%F?[2]/%t
UNIGRAM T4:%F[0],%F[1],%F[2],%F?[3]/%t
#
# Bigram definitions start here
#
# %L[0..N] left context
# %R[0..N] right context
#
# %R?: if the field is undefined, this template is not applied
# Part of speech
BIGRAM B00:%L[0]/%R[0]
BIGRAM B01:%L[0],%L?[1]/%R[0]
BIGRAM B02:%L[0]/%R[0],%R?[1]
BIGRAM B03:%L[0]/%R[0],%R[1],%R?[2]
BIGRAM B04:%L[0],%L?[1]/%R[0],%R[1],%R?[2]
BIGRAM B05:%L[0]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B06:%L[0],%L?[1]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B07:%L[0],%L[1],%L?[2]/%R[0]
BIGRAM B08:%L[0],%L[1],%L?[2]/%R[0],%R?[1]
BIGRAM B09:%L[0],%L[1],%L[2],%L?[3]/%R[0]
BIGRAM B10:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R?[1]
BIGRAM B11:%L[0],%L[1],%L?[2]/%R[0],%R[1],%R?[2]
BIGRAM B12:%L[0],%L[1],%L?[2]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B13:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R[1],%R?[2]
BIGRAM B14:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R[1],%R[2],%R?[3]
# Conjugation
BIGRAM B20:%L[0],%L?[4]/%R[0]
BIGRAM B21:%L[0],%L?[5]/%R[0]
BIGRAM B22:%L[0],%L?[4],%L?[5]/%R[0]
BIGRAM B23:%L[0]/%R[0],%R?[4]
BIGRAM B24:%L[0]/%R[0],%R?[5]
BIGRAM B25:%L[0]/%R[0],%R?[4],%R?[5]
BIGRAM B26:%L[0],%L?[4]/%R[0],%R?[4]
BIGRAM B27:%L[0],%L?[4]/%R[0],%R?[5]
BIGRAM B28:%L[0],%L?[5]/%R[0],%R?[4]
BIGRAM B29:%L[0],%L?[5]/%R[0],%R?[5]
BIGRAM B30:%L[0],%L?[4],%L?[5]/%R[0],%R?[4]
BIGRAM B31:%L[0],%L?[4],%L?[5]/%R[0],%R?[5]
BIGRAM B32:%L[0],%L?[4]/%R[0],%R?[4],%R?[5]
BIGRAM B33:%L[0],%L?[5]/%R[0],%R?[4],%R?[5]
BIGRAM B34:%L[0],%L?[4],%L?[5]/%R[0],%R?[4],%R?[5]
# POS leaf category
BIGRAM B40:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2]
BIGRAM B41:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2]
BIGRAM B42:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2]
BIGRAM B43:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B44:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B45:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[4],%R?[5]
BIGRAM B46:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B47:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B48:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B49:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B50:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B51:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B52:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[4],%R?[5]
BIGRAM B53:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[4],%R?[5]
BIGRAM B54:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[4],%R?[5]
# Lexicalization
BIGRAM B61:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3]
BIGRAM B61:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4]
BIGRAM B62:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[5]
BIGRAM B63:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5]
BIGRAM B64:%L[0],%L[1],%L[2],%L[3]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B65:%L[0],%L[1],%L[2],%L[3],%L[4]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B66:%L[0],%L[1],%L[2],%L[3],%L[5]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B67:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B68:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B70:%L?[6]/%R?[6]
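Each template expands into a feature string by substituting fields of the current token (%F) or of the left and right tokens (%L, %R). A minimal Python sketch of that expansion follows. It assumes `*` counts as "undefined" for the `?` forms. It does not handle `%t` (character type), and the function and sample rows are hypothetical.

```python
import re
from typing import Optional, Sequence

FIELD = re.compile(r"%([FLR])(\?)?\[(\d+)\]")


def expand(template: str,
           unigram: Sequence[str] = (),
           left: Sequence[str] = (),
           right: Sequence[str] = ()) -> Optional[str]:
    """Instantiate one feature.def template, or None if it does not apply."""
    sources = {"F": unigram, "L": left, "R": right}
    undefined = False

    def substitute(m: "re.Match[str]") -> str:
        nonlocal undefined
        fields = sources[m.group(1)]
        index = int(m.group(3))
        # Out-of-range indices are treated as '*' in this sketch.
        value = fields[index] if index < len(fields) else "*"
        # Assumption: '*' counts as "undefined" for the %X?[n] forms.
        if m.group(2) and value == "*":
            undefined = True
        return value

    result = FIELD.sub(substitute, template)
    return None if undefined else result
```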
#! /bin/sh
# mkinstalldirs --- make directory hierarchy
# Author: Noah Friedman <friedman@prep.ai.mit.edu>
# Created: 1993-05-16
# Public domain
# $Id: mkinstalldirs,v 1.13 1999/01/05 03:18:55 bje Exp $
errstatus=0
for file
do
  set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'`
  shift
  pathcomp=
  for d
  do
    pathcomp="$pathcomp$d"
    case "$pathcomp" in
      -* ) pathcomp=./$pathcomp ;;
    esac
    if test ! -d "$pathcomp"; then
      echo "mkdir $pathcomp"
      mkdir "$pathcomp" || lasterr=$?
      if test ! -d "$pathcomp"; then
        errstatus=$lasterr
      fi
    fi
    pathcomp="$pathcomp/"
  done
done
exit $errstatus
# mkinstalldirs ends here
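mkinstalldirs walks each path one component at a time and creates whatever is missing, much like `mkdir -p`. The following Python transliteration of that loop keeps its quirks; the function name is hypothetical.

- A leading `-` gets a `./` prefix so `mkdir` does not read it as an option.
- A failure is reported only if the directory still does not exist afterwards.

```python
import os


def mkinstalldirs(*paths: str) -> int:
    """Create every missing directory along each path, like mkinstalldirs."""
    status = 0
    for path in paths:
        # The sed pipeline splits on '/', keeping a leading '/' on the first
        # component; empty components (from '//') disappear.
        pathcomp = "/" if path.startswith("/") else ""
        for component in (c for c in path.split("/") if c):
            pathcomp += component
            if pathcomp.startswith("-"):
                # Keep mkdir from treating the name as an option.
                pathcomp = "./" + pathcomp
            if not os.path.isdir(pathcomp):
                print("mkdir " + pathcomp)
                try:
                    os.mkdir(pathcomp)
                except OSError:
                    pass
                if not os.path.isdir(pathcomp):
                    # The shell version propagates mkdir's non-zero status.
                    status = 1
            pathcomp += "/"
    return status
```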
その他,間投,*,* 0
フィラー,*,*,* 1
感動詞,*,*,* 2
記号,アルファベット,*,* 3
記号,一般,*,* 4
記号,括弧開,*,* 5
記号,括弧閉,*,* 6
記号,句点,*,* 7
記号,空白,*,* 8
記号,読点,*,* 9
形容詞,自立,*,* 10
形容詞,接尾,*,* 11
形容詞,非自立,*,* 12
助詞,格助詞,一般,* 13
助詞,格助詞,引用,* 14
助詞,格助詞,連語,* 15
助詞,係助詞,*,* 16
助詞,終助詞,*,* 17
助詞,接続助詞,*,* 18
助詞,特殊,*,* 19
助詞,副詞化,*,* 20
助詞,副助詞,*,* 21
助詞,副助詞/並立助詞/終助詞,*,* 22
助詞,並立助詞,*,* 23
助詞,連体化,*,* 24
助動詞,*,*,* 25
接続詞,*,*,* 26
接頭詞,形容詞接続,*,* 27
接頭詞,数接続,*,* 28
接頭詞,動詞接続,*,* 29
接頭詞,名詞接続,*,* 30
動詞,自立,*,* 31
動詞,接尾,*,* 32
動詞,非自立,*,* 33
副詞,一般,*,* 34
副詞,助詞類接続,*,* 35
名詞,サ変接続,*,* 36
名詞,ナイ形容詞語幹,*,* 37
名詞,一般,*,* 38
名詞,引用文字列,*,* 39
名詞,形容動詞語幹,*,* 40
名詞,固有名詞,一般,* 41
名詞,固有名詞,人名,一般 42
名詞,固有名詞,人名,姓 43
名詞,固有名詞,人名,名 44
名詞,固有名詞,組織,* 45
名詞,固有名詞,地域,一般 46
名詞,固有名詞,地域,国 47
名詞,数,*,* 48
名詞,接続詞的,*,* 49
名詞,接尾,サ変接続,* 50
名詞,接尾,一般,* 51
名詞,接尾,形容動詞語幹,* 52
名詞,接尾,助数詞,* 53
名詞,接尾,助動詞語幹,* 54
名詞,接尾,人名,* 55
名詞,接尾,地域,* 56
名詞,接尾,特殊,* 57
名詞,接尾,副詞可能,* 58
名詞,代名詞,一般,* 59
名詞,代名詞,縮約,* 60
名詞,動詞非自立的,*,* 61
名詞,特殊,助動詞語幹,* 62
名詞,非自立,一般,* 63
名詞,非自立,形容動詞語幹,* 64
名詞,非自立,助動詞語幹,* 65
名詞,非自立,副詞可能,* 66
名詞,副詞可能,*,* 67
連体詞,*,*,* 68
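Each line above pairs a four-level POS pattern with a numeric ID, with `*` as a wildcard. Below is a minimal Python lookup sketch. It assumes the first matching line wins and that a pattern constrains only its own leading fields, so trailing conjugation, base-form and reading fields are ignored. The function names, the -1 fallback and the sample feature strings are hypothetical.

```python
from typing import List, Tuple

PosPattern = Tuple[List[str], int]


def parse_pos_ids(lines: List[str]) -> List[PosPattern]:
    """Parse 'POS1,POS2,POS3,POS4 ID' lines."""
    table: List[PosPattern] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pattern, pos_id = line.rsplit(None, 1)
        table.append((pattern.split(","), int(pos_id)))
    return table


def lookup_pos_id(feature: str, table: List[PosPattern], default: int = -1) -> int:
    """Return the ID of the first pattern matching the leading fields."""
    fields = feature.split(",")
    for pattern, pos_id in table:
        if len(pattern) <= len(fields) and all(
                p == "*" or p == f for p, f in zip(pattern, fields)):
            return pos_id
    return default  # hypothetical fallback for unmatched features


TABLE = parse_pos_ids([
    "助詞,格助詞,一般,* 13",
    "動詞,自立,*,* 31",
    "名詞,一般,*,* 38",
    "名詞,固有名詞,人名,姓 43",
])
```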
#
# Feature(POS) to Internal State mapping
#
[unigram rewrite]
# Drop the reading and pronunciation; use POS 1, 2, 3, 4, conjugation form, conjugation type, base form and reading
*,*,*,*,*,*,*,* $1,$2,$3,$4,$5,$6,$7,$8
# Ignore the reading when the entry has none
*,*,*,*,*,*,* $1,$2,$3,$4,$5,$6,$7,*
[left rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い) $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ) $1,$2,$3,$4,$5,$6,よ
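The rewrite rules match a comma-separated feature against a pattern, where `*` matches anything and `(a|b)` matches either alternative. The output is built from `$N` references to the original fields. Below is a Python sketch under these assumptions:

- The first matching rule in a section wins.
- A pattern longer than the feature is a non-match. This is how the seven-field unigram rule catches entries without a reading.
- `$N` is supported only as a whole field.

The sample features are hypothetical.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (pattern, output)


def field_matches(pattern: str, value: str) -> bool:
    """'*' matches anything; '(a|b)' matches any listed alternative."""
    if pattern == "*":
        return True
    if pattern.startswith("(") and pattern.endswith(")"):
        return value in pattern[1:-1].split("|")
    return pattern == value


def rewrite(feature: str, rules: List[Rule]) -> Optional[str]:
    """Apply the first matching rule, or return None if none matches."""
    fields = feature.split(",")
    for pattern, output in rules:
        pats = pattern.split(",")
        # A pattern longer than the feature cannot match.
        if len(pats) > len(fields):
            continue
        if all(field_matches(p, v) for p, v in zip(pats, fields)):
            return ",".join(
                fields[int(item[1:]) - 1] if item.startswith("$") else item
                for item in output.split(","))
    return None


UNIGRAM_RULES = [
    ("*,*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,$8"),
    ("*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,*"),
]
LEFT_RULES = [
    ("(助詞|助動詞),*,*,*,*,*,(ない|無い)", "$1,$2,$3,$4,$5,$6,無い"),
    ("(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ)", "$1,$2,$3,$4,$5,$6,よ"),
]
```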