
Fix the ability to dump a Deb822 object in a different encoding in Python 2

Mihai Ibanescu requested to merge mibanescu-guest/python-debian:master into master

We ran across an older .deb file:

http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb

The name of one of its files, usr/lib/aspell/íslenska.alias, is not UTF-8-encoded in the control file.

This exposed what I think is a bug in deb822.Deb822: in Python 2, I cannot load a sequence (dictionary) in one encoding and dump it in a different encoding. This works fine in Python 3. The difference is that keys are stored internally as native strings (str) in both PY2 and PY3, but str means different things in each. In PY3, str is Unicode text, so the original encoding no longer matters once the data is loaded. In PY2, str is a byte string (bytes, in PY3 parlance), so the original encoding still matters when the data is dumped.
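The general remedy is to decode every key and value from the source encoding into text first, and only encode once, with the target encoding, on output. The sketch below illustrates that idea on plain byte-string fields; dump_reencoded is a hypothetical helper written for this description, not python-debian's API and not necessarily how this merge request implements the fix.

```python
def dump_reencoded(fields, source_encoding, target_encoding):
    """Hypothetical helper: render 'Key: value' lines from byte-string fields.

    Each key and value is decoded with source_encoding into text, and the
    complete output is then encoded with target_encoding.
    """
    lines = []
    for key, value in fields.items():
        # Decode explicitly instead of relying on an implicit ASCII default.
        text_key = key.decode(source_encoding)
        text_value = value.decode(source_encoding)
        lines.append(u"%s: %s\n" % (text_key, text_value))
    # Encode exactly once, at the output boundary.
    return u"".join(lines).encode(target_encoding)


# b'\xed' is "í" in ISO-8859-1; in UTF-8 the same character is b'\xc3\xad'.
print(dump_reencoded({b"\xed": b"i"}, "iso-8859-1", "utf-8"))
```

Keeping the data as text internally and treating encodings only at the input and output boundaries is what makes the original encoding irrelevant after loading, which is the PY3 behaviour described above.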

To simplify the problem, I will use only the first offending character of the problematic file name, í (\xed in ISO-8859-1). Here is my test script:

from debian import deb822

obj = deb822.Deb822({'\xed': 'i'}, encoding='iso-8859-1')
print(obj.dump(encoding='utf-8'))

Running it with Python 3:

python3.6 test.py
í: i

Running it with Python 2.7:

python2.7 test.py
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 0: ordinal not in range(128)
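The error message matches what an ASCII decode of the ISO-8859-1 byte produces, which is most likely Python 2 implicitly decoding a byte string with the ASCII codec when it is combined with unicode text somewhere in the dump path (the exact call site inside deb822 is an assumption here). The Python 3 snippet below performs the same decode explicitly, next to the correct one:

```python
# The ISO-8859-1 encoding of "í" is the single byte 0xed.
latin1_byte = u"\xed".encode("iso-8859-1")

# Python 2 falls back to the ASCII codec when mixing bytes and unicode;
# doing that decode explicitly reproduces the error from the traceback.
error_message = None
try:
    latin1_byte.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the encoding the data was actually written in works.
decoded = latin1_byte.decode("iso-8859-1")
print(decoded)
```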

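Another PY2 bug is in the implementation of __str__: in Python 2 it must return a byte string (str), but self.dump() returns a unicode object, which str() then implicitly encodes with the ASCII codec, failing on non-ASCII content.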
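One way to satisfy both interpreters is to build the output as text and only encode it in the PY2 branch of __str__. The class below is a hypothetical stand-in written for illustration (Paragraph, dump_text and the default "utf-8" encoding are all assumptions); it is not deb822.Deb822 and not necessarily the change made in this merge request.

```python
import sys


class Paragraph(object):
    """Hypothetical Deb822-like mapping, used only to illustrate __str__."""

    def __init__(self, fields, encoding="utf-8"):
        self.fields = fields      # text keys and values
        self.encoding = encoding  # used when a byte string is required

    def dump_text(self):
        # Always build the output as text (unicode on PY2, str on PY3).
        return u"".join(u"%s: %s\n" % (k, v) for k, v in self.fields.items())

    if sys.version_info[0] >= 3:
        def __str__(self):
            # On PY3, str is text, so the text form is returned directly.
            return self.dump_text()
    else:
        def __str__(self):
            # On PY2, __str__ must return a byte string, so encode
            # explicitly instead of letting str() fall back to ASCII.
            return self.dump_text().encode(self.encoding)

        def __unicode__(self):
            # unicode(obj) on PY2 gets the text form unchanged.
            return self.dump_text()


print(str(Paragraph({u"\xed": u"i"})))
```

Choosing the branch once, when the class body is executed, keeps the version check out of every call to __str__.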

Edited by Stuart Prescott
