Commit 42e1d237 authored by Ana Guerrero López's avatar Ana Guerrero López

Import Upstream version 4.5.1

parent 05e4f698
Metadata-Version: 1.0
Name: IMDbPY
Version: 4.4
Version: 4.5.1
Summary: Python package to access the IMDb's database
Home-page: http://imdbpy.sf.net/
Author: Davide Alberani
......@@ -32,6 +32,8 @@ Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Natural Language :: English
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Turkish
Classifier: Programming Language :: Python
Classifier: Programming Language :: C
Classifier: Operating System :: OS Independent
......
......@@ -69,13 +69,16 @@ docs/README.users
docs/README.utf8
docs/TODO.txt
docs/imdbpy.cfg
docs/imdbpy44.dtd
docs/imdbpy45.dtd
docs/imdbpyPowered.png
docs/imdbpyico.png
docs/imdbpyico.xpm
docs/imdbpyico16x16.ico
docs/imdbpyico32x32.ico
docs/imdbpywin.bmp
docs/goodies/README.txt
docs/goodies/applydiffs.sh
docs/goodies/reduce.sh
imdb/Character.py
imdb/Company.py
imdb/Movie.py
......@@ -88,6 +91,7 @@ imdb/articles.py
imdb/helpers.py
imdb/utils.py
imdb/locale/__init__.py
imdb/locale/__init__.pyc
imdb/locale/generatepot.py
imdb/locale/imdbpy-en.po
imdb/locale/imdbpy-it.po
......
Changelog for IMDbPY
====================
* What's new in release 4.5.1 "Dollhouse" (01 Mar 2010)
[general]
- reintroduced the ez_setup.py file.
- fixes for AKAs on 'release dates'.
- added the dtd.
* What's new in release 4.5 "Invictus" (28 Feb 2010)
[general]
- moved to setuptools 0.6c11.
- trying to make the SVN release versions work fine.
- http/mobile should work in GAE (Google App Engine).
- added some goodies scripts, useful for programmers (see the
docs/goodies directory).
[http/mobile]
- removed urllib-based User-Agent header.
- fixes for some minor changes to IMDb's html.
- fixes for garbage in movie quotes.
- improvements in the handling of AKAs.
[mobile]
- fix for AKAs in search results.
[sql]
- fixes for bugs restoring imdbIDs.
- first steps to split CSV creation/insertion.
* What's new in release 4.4 "Gandhi" (06 Jan 2010)
[general]
- introduced a logging facility; see README.logging.
......
......@@ -375,3 +375,20 @@ syntax:
The created files will be imported near the end of the imdbpy2sql.py
processing; notice that after that, you can safely delete these files.
CSV partial processing
======================
Since IMDbPY 4.5 it's possible to separate the two steps involved
in using CSV files: writing the files and loading them into the database.
With the --csv-only-write command line option the old database will
be zeroed and the CSV files saved (along with imdbIDs information).
Using the --csv-only-load option you can load these saved files into
an existing database (this database MUST be the one left almost empty
by the previous run).
Beware that right now the whole procedure is not very well tested.
With both options you still have to specify the full
"-u URI -d /path/plainTextDataFiles/ -c /path/CSVfiles/"
set of arguments on the command line.
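As a sketch, a two-step run looks like the following (the database URI
and the paths are placeholders, not real values):

```shell
# Step 1: zero the database and write the CSV files (plus imdbID info).
./imdbpy2sql.py -u URI -d /path/plainTextDataFiles/ -c /path/CSVfiles/ \
    --csv-only-write

# Step 2: load the saved CSV files into the database left by step 1.
./imdbpy2sql.py -u URI -d /path/plainTextDataFiles/ -c /path/CSVfiles/ \
    --csv-only-load
```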
IMDbPY's goodies
================
Useful shell scripts, especially for developers.
See the comments at the top of the files for usage and
configuration options.
applydiffs.sh: Bash script useful to apply patches to a set of
IMDb's plain text data files.
You can use this script to apply the diff files
distributed on a (more or less) weekly basis by IMDb.
reduce.sh: Bash script to create a "slimmed down" version of
the IMDb's plain text data files, useful to test
the imdbpy2sql.py script faster.
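As a quick sketch of how these scripts are meant to be invoked (the
diff archive names below are hypothetical examples):

```shell
# applydiffs.sh: run from the directory holding the plain text data
# files, passing the weekly diff archives sorted oldest-first.
./applydiffs.sh diffs-110210.tar.gz diffs-170210.tar.gz

# reduce.sh: edit the options at the top of the script first, then
# run it from the same directory; slimmed files go to DEST_DIR.
./reduce.sh
```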
#!/bin/bash
#
# applydiffs.sh: Bash script useful to apply patches to a set of
# IMDb's plain text data files.
#
# Usage: copy this script into the directory with the plain text
# data files and run it, passing a list of diffs-file(s) as
# arguments.
# It's possible that the plain text data files will be left
# in an inconsistent state, so a backup is probably a good idea.
#
# Copyright: 2009-2010 Davide Alberani <da@erlug.linux.it>
#
# This program is released under the terms of the GNU GPL 2 or later license.
#
if [ $# -lt 1 ] ; then
echo "USAGE: $0 diffs-file [diffs-file...]"
echo " Beware that diffs-file must be sorted from the older to the newer!"
exit 1
fi
COMPRESSION="1"
ALL_DIFFS="$@"
for DIFFS in "$@"
do
rm -rf diffs
echo -n "Unpacking $DIFFS..."
tar xfz "$DIFFS"
echo " done!"
for DF in diffs/*.list
do
fname="`basename $DF`"
if [ -f "$fname" ] ; then
wasUnpacked=1
applyTo="$fname"
elif [ -f "$fname.gz" ] ; then
wasUnpacked=0
applyTo="$fname.gz"
else
echo "NOT applying: $fname doesn't exist."
continue
fi
if [ $wasUnpacked -eq 0 ] ; then
echo -n "unzipping $applyTo..."
gunzip "$applyTo"
echo "done!"
fi
echo -n "patching $fname with $DF..."
patch -s "$fname" "$DF"
if [ $? -ne 0 ] ; then
echo "FAILED!"
continue
fi
echo "done!"
done
echo "finished with $DIFFS"
echo ""
done
rm -rf diffs
for lfile in *.list
do
echo -n "gzipping $lfile..."
gzip -$COMPRESSION "$lfile"
echo "done!"
done
#!/bin/bash
#
# reduce.sh: Bash script useful to create a "slimmed down" version of the
# IMDb's plain text data files.
#
# Usage: copy this script into the directory with the plain text data files;
# configure the options below and run it.
#
# Copyright: 2009-2010 Davide Alberani <da@erlug.linux.it>
#
# This program is released under the terms of the GNU GPL 2 or later license.
#
# Directory with the plain text data files.
ORIG_DIR="."
# Directory where "reduced" files will be stored; it will be created if needed.
# Beware that this directory is relative to ORIG_DIR.
DEST_DIR="./partial/"
# How much percentage of the original file to keep.
KEEP_X_PERCENT="1"
# The compression ratio of the created files.
COMPRESSION="1"
# -
# Nothing to configure below.
# -
cd "$ORIG_DIR"
mkdir -p "$DEST_DIR"
DIV_BY="`expr 100 / $KEEP_X_PERCENT`"
for file in *.gz
do
LINES="`zcat "$file" | wc -l`"
CONSIDER="`expr $LINES / $DIV_BY`"
FULL_CONS="$CONSIDER"
CONSIDER="`expr $CONSIDER / 2`"
NEWNAME="`echo "$file" | rev | cut -c 4- | rev`"
# Tries to keep enough lines from the top of the file.
MIN_TOP_LINES="`zgrep -m 1 "^-----------------------------------------" -n "$file" | cut -d : -f 1`"
if test -z "$MIN_TOP_LINES" ; then
MIN_TOP_LINES=0
fi
if test "$file" == "business.list.gz" -a $MIN_TOP_LINES -lt 260 ; then
MIN_TOP_LINES=260
elif test "$file" == "alternate-versions.list.gz" -a $MIN_TOP_LINES -lt 320 ; then
MIN_TOP_LINES=320
elif test "$file" == "cinematographers.list.gz" -a $MIN_TOP_LINES -lt 240 ; then
MIN_TOP_LINES=240
elif test "$file" == "complete-cast.list.gz" ; then
MIN_TOP_LINES=140
elif test "$file" == "complete-crew.list.gz" ; then
MIN_TOP_LINES=150
elif test "$file" == "composers.list.gz" -a $MIN_TOP_LINES -lt 160 ; then
MIN_TOP_LINES=160
elif test "$file" == "costume-designers.list.gz" -a $MIN_TOP_LINES -lt 240 ; then
MIN_TOP_LINES=240
elif test "$file" == "directors.list.gz" -a $MIN_TOP_LINES -lt 160 ; then
MIN_TOP_LINES=160
elif test "$file" == "genres.list.gz" -a $MIN_TOP_LINES -lt 400 ; then
MIN_TOP_LINES=400
elif test "$file" == "keywords.list.gz" -a $MIN_TOP_LINES -lt 36000 ; then
MIN_TOP_LINES=36000
elif test "$file" == "literature.list.gz" -a $MIN_TOP_LINES -lt 320 ; then
MIN_TOP_LINES=320
elif test "$file" == "mpaa-ratings-reasons.list.gz" -a $MIN_TOP_LINES -lt 400 ; then
MIN_TOP_LINES=400
elif test "$file" == "producers.list.gz" ; then
MIN_TOP_LINES=220
elif test "$file" == "production-companies.list.gz" -a $MIN_TOP_LINES -lt 270 ; then
MIN_TOP_LINES=270
elif test "$file" == "production-designers.list.gz" -a $MIN_TOP_LINES -lt 240 ; then
MIN_TOP_LINES=240
elif test "$file" == "ratings.list.gz" -a $MIN_TOP_LINES -lt 320 ; then
MIN_TOP_LINES=320
elif test "$file" == "special-effects-companies.list.gz" -a $MIN_TOP_LINES -lt 320 ; then
MIN_TOP_LINES=320
elif test "$file" == "sound-mix.list.gz" -a $MIN_TOP_LINES -lt 340 ; then
MIN_TOP_LINES=340
elif test "$file" == "writers.list.gz" ; then
MIN_TOP_LINES=400
else
MIN_TOP_LINES="`expr $MIN_TOP_LINES + 60`"
fi
if test $MIN_TOP_LINES -gt $CONSIDER ; then
TOP_CONSIDER=$MIN_TOP_LINES
else
TOP_CONSIDER=$CONSIDER
fi
HOW_MANY="`expr $TOP_CONSIDER + $CONSIDER`"
echo "Processing $file [$KEEP_X_PERCENT%: $HOW_MANY lines]"
zcat "$file" | head -$TOP_CONSIDER > "$DEST_DIR/$NEWNAME"
zcat "$file" | tail -$CONSIDER >> "$DEST_DIR/$NEWNAME"
gzip -f -$COMPRESSION "$DEST_DIR/$NEWNAME"
done
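The line budget computed above can be followed with a worked example
(a standalone sketch of reduce.sh's arithmetic, using made-up numbers:
a 100,000-line file kept at 1% with a 260-line header minimum):

```shell
#!/bin/bash
LINES=100000
KEEP_X_PERCENT=1
MIN_TOP_LINES=260
DIV_BY=$(expr 100 / $KEEP_X_PERCENT)    # 100: keep 1 line in 100
CONSIDER=$(expr $LINES / $DIV_BY)       # 1000: total line budget
CONSIDER=$(expr $CONSIDER / 2)          # 500: lines from the bottom
# The top keeps at least MIN_TOP_LINES so the file header survives.
if [ $MIN_TOP_LINES -gt $CONSIDER ] ; then
    TOP_CONSIDER=$MIN_TOP_LINES
else
    TOP_CONSIDER=$CONSIDER
fi
echo "$TOP_CONSIDER $CONSIDER"
```

Here the top budget (500) already exceeds the header minimum (260), so
500 lines are kept from each end, about 1% of the file in total.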
<!--
XML Document Type Definition for IMDbPY 4.4.
XML Document Type Definition for IMDbPY 4.5.
http://imdbpy.sf.net/dtd/imdbpy44.dtd
http://imdbpy.sf.net/dtd/imdbpy45.dtd
Copyright 2009 H. Turgut Uyar <uyar@tekir.org>
2009 Davide Alberani <da@erlug.linux.it>
2009-2010 Davide Alberani <da@erlug.linux.it>
-->
<!ELEMENT movie (
airing
| akas
| akas-from-release-info
| alternate-versions
| amazon-reviews
| animation-department
......@@ -337,6 +338,7 @@
<!ELEMENT agent-address (item)*>
<!ELEMENT akas (item)*>
<!ELEMENT akas-from-release-info (item)*>
<!ELEMENT alternate-versions (item)*>
<!ELEMENT article (item)*>
<!ELEMENT biography (item)*>
......
......@@ -14,7 +14,7 @@ the appropriate options to ``use_setuptools()``.
This file can also be run as a script to install or upgrade setuptools.
"""
import sys
DEFAULT_VERSION = "0.6c9"
DEFAULT_VERSION = "0.6c11"
DEFAULT_URL = "http://pypi.python.org/packages/%s/s/setuptools/" % sys.version[:3]
md5_data = {
......@@ -28,6 +28,14 @@ md5_data = {
'setuptools-0.6b4-py2.4.egg': '4cb2a185d228dacffb2d17f103b3b1c4',
'setuptools-0.6c1-py2.3.egg': 'b3f2b5539d65cb7f74ad79127f1a908c',
'setuptools-0.6c1-py2.4.egg': 'b45adeda0667d2d2ffe14009364f2a4b',
'setuptools-0.6c10-py2.3.egg': 'ce1e2ab5d3a0256456d9fc13800a7090',
'setuptools-0.6c10-py2.4.egg': '57d6d9d6e9b80772c59a53a8433a5dd4',
'setuptools-0.6c10-py2.5.egg': 'de46ac8b1c97c895572e5e8596aeb8c7',
'setuptools-0.6c10-py2.6.egg': '58ea40aef06da02ce641495523a0b7f5',
'setuptools-0.6c11-py2.3.egg': '2baeac6e13d414a9d28e7ba5b5a596de',
'setuptools-0.6c11-py2.4.egg': 'bd639f9b0eac4c42497034dec2ec0c2b',
'setuptools-0.6c11-py2.5.egg': '64c94f3bf7a72a13ec83e0b24f2749b2',
'setuptools-0.6c11-py2.6.egg': 'bfa92100bd772d5a213eedd356d64086',
'setuptools-0.6c2-py2.3.egg': 'f0064bf6aa2b7d0f3ba0b43f20817c27',
'setuptools-0.6c2-py2.4.egg': '616192eec35f47e8ea16cd6a122b7277',
'setuptools-0.6c3-py2.3.egg': 'f181fa125dfe85a259c9cd6f1d7b78fa',
......
......@@ -25,7 +25,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
__all__ = ['IMDb', 'IMDbError', 'Movie', 'Person', 'Character', 'Company',
'available_access_systems']
__version__ = VERSION = '4.4'
__version__ = VERSION = '4.5.1'
# Import compatibility module (importing it is enough).
import _compat
......
......@@ -7,7 +7,7 @@ the imdb.IMDb function will return an instance of this class when
called with the 'accessSystem' argument set to "http" or "web"
or "html" (this is the default).
Copyright 2004-2009 Davide Alberani <da@erlug.linux.it>
Copyright 2004-2010 Davide Alberani <da@erlug.linux.it>
2008 H. Turgut Uyar <uyar@tekir.org>
This program is free software; you can redistribute it and/or modify
......@@ -49,6 +49,17 @@ import characterParser
import companyParser
import topBottomParser
# Logger for miscellaneous functions.
_aux_logger = logging.getLogger('imdbpy.parser.http.aux')
IN_GAE = False
try:
import google.appengine
IN_GAE = True
_aux_logger.info('IMDbPY is running in the Google App Engine environment')
except ImportError:
pass
class _ModuleProxy:
"""A proxy to instantiate and access parsers."""
......@@ -123,7 +134,9 @@ class IMDbURLopener(FancyURLopener):
# XXX: IMDb's web server doesn't like urllib-based programs,
# so lets fake to be Mozilla.
# Wow! I'm shocked by my total lack of ethic! <g>
self.set_header('User-agent', 'Mozilla/5.0')
for header in ('User-Agent', 'User-agent', 'user-agent'):
self.del_header(header)
self.set_header('User-Agent', 'Mozilla/5.0')
# XXX: This class is used also to perform "Exact Primary
# [Title|Name]" searches, and so by default the cookie is set.
c_header = 'id=%s; uu=%s' % (_cookie_id, _cookie_uu)
......@@ -166,7 +179,7 @@ class IMDbURLopener(FancyURLopener):
self.set_header('Range', 'bytes=0-%d' % size)
uopener = self.open(url)
kwds = {}
if PY_VERSION > (2, 3):
if PY_VERSION > (2, 3) and not IN_GAE:
kwds['size'] = size
content = uopener.read(**kwds)
self._last_url = uopener.url
......@@ -485,7 +498,10 @@ class IMDbHTTPAccessSystem(IMDbBase):
def get_movie_release_dates(self, movieID):
cont = self._retrieve(imdbURL_movie_main % movieID + 'releaseinfo')
return self.mProxy.releasedates_parser.parse(cont)
ret = self.mProxy.releasedates_parser.parse(cont)
ret['info sets'] = ('release dates', 'akas')
return ret
get_movie_akas = get_movie_release_dates
def get_movie_vote_details(self, movieID):
cont = self._retrieve(imdbURL_movie_main % movieID + 'ratings')
......
......@@ -239,7 +239,7 @@ class DOMHTMLMovieParser(DOMParserBase):
# Collects akas not enclosed in <i> tags.
Attribute(key='other akas',
path="./h5[starts-with(text(), " \
"'Also Known As')]/../div/text()",
"'Also Known As')]/../div//text()",
postprocess=makeSplitter(sep='::')),
Attribute(key='runtimes',
path="./h5[starts-with(text(), " \
......@@ -382,6 +382,11 @@ class DOMHTMLMovieParser(DOMParserBase):
# Remove links to IMDbPro.
for proLink in self.xpath(dom, "//span[@class='pro-link']"):
proLink.drop_tree()
# Remove some 'more' links (keep others, like the one around
# the number of votes).
for tn15more in self.xpath(dom,
"//a[@class='tn15more'][starts-with(@href, '/title/')]"):
tn15more.drop_tree()
return dom
re_space = re.compile(r'\s+')
......@@ -404,12 +409,14 @@ class DOMHTMLMovieParser(DOMParserBase):
obj.accessSystem = self._as
obj.modFunct = self._modFunct
if 'akas' in data or 'other akas' in data:
other_akas = data.get('akas')
if not other_akas:
other_akas = []
data['akas'] = data.get('other akas', []) + other_akas
akas = data.get('akas') or []
akas += data.get('other akas') or []
if 'akas' in data:
del data['akas']
if 'other akas' in data:
del data['other akas']
if akas:
data['akas'] = akas
if 'runtimes' in data:
data['runtimes'] = [x.replace(' min', u'')
for x in data['runtimes']]
......@@ -826,6 +833,12 @@ class DOMHTMLQuotesParser(DOMParserBase):
(re.compile('<!-- sid: t-channel : MIDDLE_CENTER -->', re.I), '</div>')
]
def preprocess_dom(self, dom):
# Remove "link this quote" links.
for qLink in self.xpath(dom, "//p[@class='linksoda']"):
qLink.drop_tree()
return dom
def postprocess_data(self, data):
if 'quotes' not in data:
return {}
......@@ -849,12 +862,22 @@ class DOMHTMLReleaseinfoParser(DOMParserBase):
attrs=Attribute(key='release dates', multi=True,
path={'country': ".//td[1]//text()",
'date': ".//td[2]//text()",
'notes': ".//td[3]//text()"}))]
'notes': ".//td[3]//text()"})),
Extractor(label='akas',
path="//div[@class='_imdbpy_akas']/table/tr",
attrs=Attribute(key='akas', multi=True,
path={'title': "./td[1]/text()",
'countries': "./td[2]/text()"}))]
preprocessors = [
(re.compile('(<h5><a name="?akas"?.*</table>)', re.I | re.M | re.S),
r'<div class="_imdbpy_akas">\1</div>')]
def postprocess_data(self, data):
if not 'release dates' in data: return data
if not ('release dates' in data or 'akas' in data): return data
releases = data.get('release dates') or []
rl = []
for i in data['release dates']:
for i in releases:
country = i.get('country')
date = i.get('date')
if not (country and date): continue
......@@ -866,7 +889,26 @@ class DOMHTMLReleaseinfoParser(DOMParserBase):
if notes:
info += notes
rl.append(info)
data['release dates'] = rl
if releases:
del data['release dates']
if rl:
data['release dates'] = rl
akas = data.get('akas') or []
nakas = []
for aka in akas:
title = aka.get('title', '').strip()
if not title:
continue
countries = aka.get('countries', '').split('/')
if not countries:
nakas.append(title)
else:
for country in countries:
nakas.append('%s::%s' % (title, country.strip()))
if akas:
del data['akas']
if nakas:
data['akas from release info'] = nakas
return data
......@@ -896,7 +938,7 @@ class DOMHTMLRatingsParser(DOMParserBase):
attrs=Attribute(key='mean and median',
path="text()")),
Extractor(label='rating',
path="//a[starts-with(@href, '/List?ratings=')]",
path="//a[starts-with(@href, '/search/title?user_rating=')]",
attrs=Attribute(key='rating',
path="text()")),
Extractor(label='demographic voters',
......
......@@ -8,7 +8,7 @@ E.g., for "Mel Gibson" the referred pages would be:
biography: http://akas.imdb.com/name/nm0000154/bio
...and so on...
Copyright 2004-2009 Davide Alberani <da@erlug.linux.it>
Copyright 2004-2010 Davide Alberani <da@erlug.linux.it>
2008 H. Turgut Uyar <uyar@tekir.org>
This program is free software; you can redistribute it and/or modify
......@@ -64,19 +64,20 @@ class DOMHTMLMaindetailsParser(DOMParserBase):
_birth_attrs = [Attribute(key='birth date',
path={
'day': "./div/a[starts-with(@href, " \
"'/OnThisDay?')]/text()",
"'/date/')]/text()",
'year': "./div/a[starts-with(@href, " \
"'/BornInYear?')]/text()"
"'/search/name?birth_year=')]/text()"
},
postprocess=build_date),
Attribute(key='birth notes',
path="./div/a[starts-with(@href, '/BornWhere?')]/text()")]
path="./div/a[starts-with(@href, " \
"'/search/name?birth_place=')]/text()")]
_death_attrs = [Attribute(key='death date',
path={
'day': "./div/a[starts-with(@href, " \
"'/OnThisDay?')]/text()",
"'/date/')]/text()",
'year': "./div/a[starts-with(@href, " \
"'/DiedInYear?')]/text()"
"'/search/name?death_date=')]/text()"
},
postprocess=build_date),
Attribute(key='death notes',
......@@ -159,19 +160,20 @@ class DOMHTMLBioParser(DOMParserBase):
_birth_attrs = [Attribute(key='birth date',
path={
'day': "./a[starts-with(@href, " \
"'/OnThisDay?')]/text()",
"'/date/')]/text()",
'year': "./a[starts-with(@href, " \
"'/BornInYear?')]/text()"
"'/search/name?birth_year=')]/text()"
},
postprocess=build_date),
Attribute(key='birth notes',
path="./a[starts-with(@href, '/BornWhere?')]/text()")]
path="./a[starts-with(@href, " \
"'/search/name?birth_place=')]/text()")]
_death_attrs = [Attribute(key='death date',
path={
'day': "./a[starts-with(@href, " \
"'/OnThisDay?')]/text()",
"'/date/')]/text()",
'year': "./a[starts-with(@href, " \
"'/DiedInYear?')]/text()"
"'/search/name?death_date=')]/text()"
},
postprocess=build_date),
Attribute(key='death notes',
......
......@@ -8,7 +8,7 @@ E.g., for when searching for the title "the passion", the parsed
page would be:
http://akas.imdb.com/find?q=the+passion&tt=on&mx=20
Copyright 2004-2009 Davide Alberani <da@erlug.linux.it>
Copyright 2004-2010 Davide Alberani <da@erlug.linux.it>
2008 H. Turgut Uyar <uyar@tekir.org>
This program is free software; you can redistribute it and/or modify
......@@ -101,7 +101,8 @@ class DOMHTMLSearchMovieParser(DOMParserBase):
path={
'link': "./a[1]/@href",
'info': ".//text()",
'akas': ".//div[@class='_imdbpyAKA']//text()"
#'akas': ".//div[@class='_imdbpyAKA']//text()"
'akas': ".//p[@class='find-aka']//text()"
},
postprocess=lambda x: (
analyze_imdbid(x.get('link') or u''),
......@@ -122,8 +123,10 @@ class DOMHTMLSearchMovieParser(DOMParserBase):
if self._linkPrefix == '/title/tt':
# Only for movies.
html_string = html_string.replace('(TV mini-series)', '(mini)')
html_string = _reAKAStitles.sub(
r'<div class="_imdbpyAKA">\1::</div>\2', html_string)
html_string = html_string.replace('<p class="find-aka">',
'<p class="find-aka">::')
#html_string = _reAKAStitles.sub(
# r'<div class="_imdbpyAKA">\1::</div>\2', html_string)
return html_string
# Direct hit!
dbme = self._BaseParser(useModule=self._useModule)
......@@ -156,6 +159,7 @@ class DOMHTMLSearchMovieParser(DOMParserBase):
akas = filter(None, datum[2].split('::'))
if self._linkPrefix == '/title/tt':
akas = [a.replace('" - ', '::').rstrip() for a in akas]
akas = [a.replace('aka "', '', 1).lstrip() for a in akas]
datum[1]['akas'] = akas
data['data'][idx] = (datum[0], datum[1])
else:
......
......@@ -49,6 +49,9 @@ re_unhtmlsub = re_unhtml.sub
# imdb person or movie ids.
re_imdbID = re.compile(r'(?<=nm|tt|ch)([0-9]{7})\b')
# movie AKAs.
re_makas = re.compile('(<p class="find-aka">.*?</p>)')
def _unHtml(s):
"""Return a string without tags and no multiple spaces."""
......@@ -206,20 +209,17 @@ class IMDbMobileAccessSystem(IMDbHTTPAccessSystem):
lis = _findBetween(cont, 'td valign="top">', '</td>',
maxRes=results*3)
for li in lis:
akaIdx = li.find('aka <em>')
akas = []
if akaIdx != -1:
akas = [_unHtml(x) for x in li[akaIdx:].split('<br>')]
li = li[:akaIdx]
if akas:
for idx, aka in enumerate(akas):
aka = aka.replace('" - ', '::')
if aka.startswith('aka "'):
aka = aka[5:]
if aka[-1] == '"':
aka = aka[:-1]
akas[idx] = aka
akas = re_makas.findall(li)
for idx, aka in enumerate(akas):
aka = aka.replace('" - ', '::', 1)
aka = _unHtml(aka)
if aka.startswith('aka "'):
aka = aka[5:].strip()
if aka[-1] == '"':
aka = aka[:-1]
akas[idx] = aka
imdbid = re_imdbID.findall(li)
li = re_makas.sub('', li)
mtitle = _unHtml(li)
if not (imdbid and mtitle):
self._mobile_logger.debug('no title/movieID parsing' \
......@@ -428,13 +428,14 @@ class IMDbMobileAccessSystem(IMDbHTTPAccessSystem):
lang[:] = ['<a %s' % x for x in lang if x]
lang[:] = [_unHtml(x.replace(' <i>', '::')) for x in lang]
if lang: d['languages'] = lang
col = _findBetween(cont, '"/List?color-info=', '</div>')
col = _findBetween(cont, '"/search/title?colors=', '</div>')
if col:
col[:] = col[0].split(' | ')
col[:] = ['<a %s' % x for x in col if x]
col[:] = [_unHtml(x.replace(' <i>', '::')) for x in col]
if col: d['color info'] = col
sm = _findBetween(cont, '/List?sound-mix=', '</div>', maxRes=1)
sm = _findBetween(cont, '/search/title?sound_mixes=', '</div>',
maxRes=1)
if sm:
sm[:] = sm[0].split(' | ')
sm[:] = ['<a %s' % x for x in sm if x]
......
......@@ -1435,7 +1435,7 @@ class IMDbSqlAccessSystem(IMDbBase):
return returnl
def get_character_main(self, characterID, results=1000):
# Every person information is retrieved from here.
# Every character information is retrieved from here.
infosets = self.get_character_infoset()
try:
c = CharName.get(characterID)
......
"""
parser.sql.objectadapter module (imdb.parser.sql package).
This module adpts the SQLObject ORM to the internal mechanism.
This module adapts the SQLObject ORM to the internal mechanism.
Copyright 2008-2009 Davide Alberani <da@erlug.linux.it>
......
......@@ -3,7 +3,7 @@ utils module (imdb package).