Commit ddb2bf41 authored by Matthew Pideil

Initial import of node-iconv version 2.1.0

*.node
*.o
.lock-wscript
.project
/build/
/node_modules/
deps/libiconv/tests
\ No newline at end of file
node-iconv license
==============================================================================
Copyright (c) 2013, Ben Noordhuis <info@bnoordhuis.nl>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
GNU libiconv license
==============================================================================
Copyright (C) 2000-2009, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
# node-iconv
Text recoding in JavaScript for fun and profit!
node-iconv may or may not work on Windows. Please try it and report any issues
you have.
## Installing with [npm](http://npmjs.org/)
$ npm install iconv
Note that you do not need to have a copy of libiconv installed to use this
module.
## Compiling from source
$ git clone git://github.com/bnoordhuis/node-iconv.git
$ node-gyp configure build
$ npm install .
## Usage
Encode from one character encoding to another:
// convert from UTF-8 to ISO-8859-1
var Buffer = require('buffer').Buffer;
var Iconv = require('iconv').Iconv;
var assert = require('assert');
var iconv = new Iconv('UTF-8', 'ISO-8859-1');
var buffer = iconv.convert('Hello, world!');
var buffer2 = iconv.convert(new Buffer('Hello, world!'));
assert.equal(buffer.inspect(), buffer2.inspect());
// do something useful with the buffers
A simple ISO-8859-1 to UTF-8 conversion TCP service:
var net = require('net');
var Iconv = require('iconv').Iconv;
var server = net.createServer(function(conn) {
var iconv = new Iconv('latin1', 'utf-8');
conn.pipe(iconv).pipe(conn);
});
server.listen(8000);
console.log('Listening on tcp://0.0.0.0:8000/');
Look at test/test-basic.js and test/test-stream.js for more examples
and node-iconv's behaviour under error conditions.
## Notes
Things to keep in mind when you work with node-iconv.
### Chunked data
Say you are reading data in chunks from an HTTP stream. The logical input is a
single document (the full POST request data) but the physical input will be
spread over several buffers (the request chunks).
You must accumulate the small buffers into a single large buffer before
performing the conversion. If you don't, you will get unexpected results with
multi-byte and stateful character sets like UTF-8 and ISO-2022-JP.
The above only applies when you are calling `Iconv#convert()` yourself.
If you use the streaming interface, node-iconv takes care of stitching
partial character sequences together again.
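The effect is easy to reproduce without node-iconv. The sketch below is an
illustration only: it uses Node's built-in `Buffer#toString('utf8')` as a
stand-in for a conversion step (an assumption made for the demonstration, not
node-iconv's API) and splits the UTF-8 bytes of "ça va" in the middle of the
two-byte `ç`.

```javascript
// UTF-8 bytes of "ça va"; the leading 'ç' (U+00E7) is encoded as 0xC3 0xA7.
const whole = Buffer.from('\u00e7a va', 'utf8');

// Simulate two network chunks that split 'ç' down the middle.
const chunks = [whole.subarray(0, 1), whole.subarray(1)];

// Wrong: decode each chunk on its own. Neither half of 'ç' is valid alone.
const piecewise = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Right: accumulate the chunks first, then decode once.
const joined = Buffer.concat(chunks).toString('utf8');

console.log(piecewise === '\u00e7a va'); // false
console.log(joined === '\u00e7a va');    // true
```

The same pattern applies when calling `Iconv#convert()` directly: collect the
chunks, `Buffer.concat()` them, and convert the result in one call.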
### Dealing with untranslatable characters
Characters are not always translatable to another encoding. The UTF-8 string
"ça va が", for example, cannot be represented in plain 7-bit ASCII without
some loss of fidelity.
By default, node-iconv throws EILSEQ when untranslatable characters are
encountered but this can be customized. Quoting the `iconv_open(3)` man page:
//TRANSLIT
When the string "//TRANSLIT" is appended to tocode, transliteration is
activated. This means that when a character cannot be represented in the
target character set, it can be approximated through one or several
similarly looking characters.
//IGNORE
When the string "//IGNORE" is appended to tocode, characters that cannot be
represented in the target character set will be silently discarded.
Example usage:
var iconv = new Iconv('UTF-8', 'ASCII');
iconv.convert('ça va'); // throws EILSEQ
var iconv = new Iconv('UTF-8', 'ASCII//IGNORE');
iconv.convert('ça va'); // returns "a va"
var iconv = new Iconv('UTF-8', 'ASCII//TRANSLIT');
iconv.convert('ça va'); // "ca va"
var iconv = new Iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE');
iconv.convert('ça va が'); // "ca va "
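To make the difference between the two suffixes concrete, here is a rough
pure-JavaScript approximation for an ASCII target. It is not how libiconv
implements `//IGNORE` or `//TRANSLIT` (libiconv has its own transliteration
rules, and its output can differ); the helper names `asciiIgnore` and
`asciiTranslit` are hypothetical and exist only for this sketch.

```javascript
// Approximates ASCII//IGNORE: silently drop every non-ASCII code point.
function asciiIgnore(text) {
  return text.replace(/[^\x00-\x7F]/gu, '');
}

// Approximates ASCII//TRANSLIT//IGNORE: decompose accented letters (NFD),
// strip the combining marks so the base letter survives, then drop whatever
// still is not ASCII.
function asciiTranslit(text) {
  const withoutMarks = text.normalize('NFD').replace(/\p{M}/gu, '');
  return asciiIgnore(withoutMarks);
}

// '\u00e7' is 'ç' and '\u304c' is 'が'.
console.log(JSON.stringify(asciiIgnore('\u00e7a va')));         // "a va"
console.log(JSON.stringify(asciiTranslit('\u00e7a va')));       // "ca va"
console.log(JSON.stringify(asciiTranslit('\u00e7a va \u304c'))); // "ca va "
```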
### EINVAL
EINVAL is raised when the input ends in a partial character sequence. This is a
feature, not a bug.
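If you call `Iconv#convert()` on UTF-8 input yourself and want to avoid EINVAL,
one option is to check whether a buffer ends in the middle of a character
before converting it. The helper below is a minimal sketch under that
assumption: `incompleteUtf8Tail` is a hypothetical name, it only handles UTF-8,
and it does not validate the rest of the buffer.

```javascript
// Returns how many bytes at the end of `buf` belong to a UTF-8 character
// whose remaining bytes have not arrived yet (0 if the buffer ends cleanly).
function incompleteUtf8Tail(buf) {
  const maxBack = Math.min(3, buf.length);
  for (let back = 1; back <= maxBack; back++) {
    const byte = buf[buf.length - back];
    if ((byte & 0xC0) === 0x80) continue; // continuation byte, keep looking

    // Found a lead byte (or ASCII); work out how long its sequence should be.
    let needed = 1;
    if (byte >= 0xF0) needed = 4;
    else if (byte >= 0xE0) needed = 3;
    else if (byte >= 0xC0) needed = 2;

    return needed > back ? back : 0;
  }
  return 0;
}

// 'が' (U+304C) is three bytes in UTF-8: 0xE3 0x81 0x8C.
const full = Buffer.from('\u304c', 'utf8');
console.log(incompleteUtf8Tail(full));                // 0
console.log(incompleteUtf8Tail(full.subarray(0, 2))); // 2
```

A caller could hold back the reported number of trailing bytes and prepend
them to the next chunk.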
{
'targets': [
{
'target_name': 'iconv',
'defines': [
'ICONV_CONST=const',
'USE_AIX=1',
'USE_DOS=1',
'USE_EXTRA=1',
'USE_OSF1=1',
'LIBDIR="."', # not actually used
],
'include_dirs': [
'deps/libiconv/srclib',
'support',
'<!(node -e "require(\'nan\')")',
],
'sources': [
'deps/libiconv/libcharset/lib/localcharset.c',
'deps/libiconv/lib/iconv.c',
'src/binding.cc',
],
'conditions': [
['OS == "win"', {
'defines': ['WIN32_NATIVE=1'],
}, {
'defines': ['HAVE_WORKING_O_NOFOLLOW=1'],
'cflags': [
# silence warnings from iconv.c
'-Wno-unused-function',
'-Wno-unused-parameter',
'-Wno-unused-variable',
],
}],
],
}
]
}
ChangeLog merge=merge-changelog
Makefile
*.l[ao]
*.[ao]
*.so
config.h
config.log
config.status
include/iconv.h
include/iconv.h.inst
lib/.libs/
lib/charset.alias
lib/config.h
lib/libcharset.h
lib/localcharset.h
lib/stamp-h2
libcharset/config.h
libcharset/config.log
libcharset/config.status
libcharset/include/libcharset.h
libcharset/include/localcharset.h
libcharset/include/localcharset.h.inst
libcharset/lib/.libs/
libcharset/lib/charset.alias
libcharset/lib/ref-add.sed
libcharset/lib/ref-del.sed
libcharset/libtool
libtool
po/Makefile.in
po/POTFILES
preload/.libs/
preload/config.log
preload/config.status
preload/libtool
src/iconv_no_i18n
srclib/alloca.h
srclib/arg-nonnull.h
srclib/c++defs.h
srclib/fcntl.h
srclib/signal.h
srclib/stdio.h
srclib/stdlib.h
srclib/string.h
srclib/sys/
srclib/time.h
srclib/unistd.h
srclib/unitypes.h
srclib/uniwidth.h
srclib/warn-on-use.h
stamp-h1
Bruno Haible <bruno@clisp.org>
No packages need to be installed before GNU libiconv is installed.
While some other iconv(3) implementations - like FreeBSD iconv(3) - choose
the "many small shared libraries" and dlopen(3) approach, this implementation
packs everything into a single shared library. Here is a comparison of the
two designs.
* Run-time efficiency
1. A dlopen() based approach needs a cache of loaded shared libraries.
Otherwise, every iconv_open() call will result in a call to dlopen()
and thus to file system related system calls - which is prohibitive
because some applications use the iconv_open/iconv/iconv_close sequence
for every single filename, string, or piece of text.
2. In terms of virtual memory use, both approaches are on par. Being shared
libraries, the tables are shared between any processes that use them.
And because of the demand loading used by Unix systems (and because libiconv
does not have initialization functions), only those parts of the tables
which are needed (typically very few kilobytes) will be read from disk and
paged into main memory.
3. Even with a cache of loaded shared libraries, the dlopen() based approach
makes more system calls, because it has to load one or two shared libraries
for every encoding in use.
* Total size
In the dlopen(3) approach, every shared library has a symbol table and
relocation offset. All together, FreeBSD iconv installs more than 200 shared
libraries with a total size of 2.3 MB, whereas libiconv installs 0.45 MB.
* Extensibility
The dlopen(3) approach is good for guaranteeing extensibility if the iconv
implementation is distributed without source. (Or when, as in glibc, you
cannot rebuild iconv without rebuilding your libc, thus possibly
destabilizing your system.)
The libiconv package achieves extensibility through the LGPL license:
Every user has access to the source of the package and can extend and
replace just libiconv.so.
The places which have to be modified when a new encoding is added are as
follows: add an #include statement in iconv.c, add an entry in the table in
iconv.c, and of course, update the README and iconv_open.3 manual page.
* Use within other packages
If you want to incorporate an iconv implementation into another package
(such as a mail user agent or web browser), the single library approach
is easier, because:
1. In the shared library approach you have to provide the right directory
prefix which will be used at run time.
2. Incorporating iconv as a static library into the executable is easy -
it won't need dynamic loading. (This assumes that your package is under
the LGPL or GPL license.)
All conversions go through Unicode. This is possible because most of the
world's characters have already been allocated in the Unicode standard.
Therefore we have for each encoding two functions:
- For conversion from the encoding to Unicode, a function called xxx_mbtowc.
- For conversion from Unicode to the encoding, a function called xxx_wctomb,
and for stateful encodings, a function called xxx_reset which returns to
the initial shift state.
All our functions operate on a single Unicode character at a time. This is
obviously less efficient than operating on an entire buffer of characters at
a time, but it makes the coding considerably easier and less bug-prone. Those
who want the best performance should install the Real Thing (TM): GNU libc 2.1
or newer.
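The pivot design described above can be sketched in a few lines. This is an
illustration in JavaScript, not libiconv's C code: the `encodings` table and
the `convert` function are hypothetical names, only three encodings are
covered, and the UTF-8 decoder does no validation. Each encoding supplies an
`mbtowc` (bytes to one Unicode code point) and a `wctomb` (one code point to
bytes), and every conversion goes one character at a time through Unicode.

```javascript
const encodings = {
  'ASCII': {
    mbtowc(bytes, pos) {
      if (bytes[pos] > 0x7F) throw new Error('EILSEQ');
      return { codePoint: bytes[pos], consumed: 1 };
    },
    wctomb(cp) { return cp <= 0x7F ? [cp] : null; },
  },
  'ISO-8859-1': {
    mbtowc(bytes, pos) { return { codePoint: bytes[pos], consumed: 1 }; },
    wctomb(cp) { return cp <= 0xFF ? [cp] : null; },
  },
  'UTF-8': {
    // Assumes well-formed, complete input; a real decoder must validate.
    mbtowc(bytes, pos) {
      const lead = bytes[pos];
      if (lead < 0x80) return { codePoint: lead, consumed: 1 };
      const len = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
      let cp = lead & (0xFF >> (len + 1));
      for (let k = 1; k < len; k++) cp = (cp << 6) | (bytes[pos + k] & 0x3F);
      return { codePoint: cp, consumed: len };
    },
    wctomb(cp) {
      if (cp < 0x80) return [cp];
      if (cp < 0x800) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
      if (cp < 0x10000) {
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
      }
      return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
              0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
    },
  },
};

// Convert `input` one character at a time, using Unicode as the pivot.
function convert(from, to, input) {
  const out = [];
  let pos = 0;
  while (pos < input.length) {
    const { codePoint, consumed } = encodings[from].mbtowc(input, pos);
    const bytes = encodings[to].wctomb(codePoint);
    if (bytes === null) throw new Error('EILSEQ'); // not representable
    out.push(...bytes);
    pos += consumed;
  }
  return Buffer.from(out);
}

// Latin-1 'ç' (0xE7) followed by 'a' becomes UTF-8 0xC3 0xA7 0x61.
console.log(convert('ISO-8859-1', 'UTF-8', Buffer.from([0xE7, 0x61])));
```

With this layout, adding an encoding means adding one pair of functions rather
than one converter per pair of encodings.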
All you need to know when hacking (modifying) GNU libiconv or when building
it off the CVS.
Requirements
============
You will need reasonably recent versions of the build tools:
* A C compiler, such as GNU GCC.
+ Homepage:
http://gcc.gnu.org/
* GNU automake
+ Homepage:
http://www.gnu.org/software/automake/
* GNU autoconf
+ Homepage:
http://www.gnu.org/software/autoconf/
* GNU m4
+ Homepage:
http://www.gnu.org/software/m4/
* GNU gperf
+ Homepage:
http://www.gnu.org/software/gperf/