Many improvements for dl10n-check
This work branch provides lots of improvements to dl10n-check:
- it now uses the dpkg API (via the Debian::Pkg::DebSrc module) to extract the source package to a temporary directory: while this takes disk space (and more time for the unpacking), it removes the need to parse the source package manually. This alone should fix most (if not all) of the problematic sources mentioned in IGNMATERIAL (in etc/dl10n.conf)
- it also uses the dpkg API to:
  - parse the control files
  - compare versions (instead of spawning dpkg commands)
  - parse the .dsc files
- improves the handling of languages:
  - switches to the core Locale::* modules, so it knows about way more languages than it does now
  - tweaks the language detection of .po files, so it properly recognizes more cases
  - allows 3-letter language codes, e.g. hsb, etc.
- uses more existing modules, instead of spawning external commands
- drops support for yada
- improves the convert_to_unicode subroutine:
  - requires the Text::Iconv module, caching its instances
  - moves the fallback code to a subroutine, using it only if needed (saves some s///)
- adds an optional value for --careful, so it is possible to batch DB saves every N changes (instead of after every change)
- simplifies checking of Priority and Section in sources
- ... and other minor changes
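
The batching behaviour described for --careful can be illustrated with a minimal, language-neutral sketch. This is not the code from the branch: it is written in Python for brevity, and the names BatchedSaver, changed and flush are hypothetical. It only shows the idea of saving once every N changes, plus a final flush so that trailing changes are not lost.

```python
class BatchedSaver:
    """Call `save` once every `every` changes instead of after each one.

    Hypothetical helper for illustration; dl10n-check's real logic may differ.
    """

    def __init__(self, save, every=1):
        if every < 1:
            raise ValueError("every must be >= 1")
        self._save = save      # callable that persists the DB
        self._every = every    # N: number of changes per save
        self._pending = 0      # changes recorded since the last save

    def changed(self):
        """Record one change and save if the batch is full."""
        self._pending += 1
        if self._pending >= self._every:
            self.flush()

    def flush(self):
        """Save now if there are unsaved changes (call once at the end)."""
        if self._pending:
            self._save()
            self._pending = 0


# Usage: 7 changes with N = 3 trigger saves after the 3rd and 6th change,
# and the final flush persists the remaining one.
saves = []
saver = BatchedSaver(save=lambda: saves.append("saved"), every=3)
for _ in range(7):
    saver.changed()
saver.flush()
print(len(saves))
```

With every=1 this degenerates to saving after every change, which matches the previous behaviour described above.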
Regarding memory usage: dl10n-check now takes more RAM (~500MB or so in my tests), but simply because the DB now contains way more files than before. Also, the memory usage is more or less constant, and it does not spike to absurd values when handling very big sources (e.g. 0ad-data, libreoffice, etc.). So it should now be fine to let it run on all the sources.
Edited by Pino Toscano