Commit ccbaffd4 authored by Ana Guerrero López's avatar Ana Guerrero López

Import Upstream version 3.9

parent 87612cc0
......@@ -9,6 +9,9 @@
recursive-include docs *
global-exclude *~
prune CVS
prune .svn
global-exclude CVS
global-exclude .svn
# Uncomment the following line if you don't want to install the logo images.
# exclude docs/*.png docs/*.xpm docs/*.bmp
......
......@@ -97,13 +97,13 @@ def doMPAA():
mpaaFileGZ = gzip.open(os.path.join(IMDB_PTDF_DIR,
'mpaa-ratings-reasons.list.gz'))
print 'Creating the mpaa-ratings-reasons.data file...',
mpaaFileOut = open(os.path.join(LOCAL_DATA_DIR,
'mpaa-ratings-reasons.data'), 'w')
for aLine in mpaaFileGZ:
print 'Creating the mpaa-ratings-reasons.data file...',
mpaaFileOut.write(aLine)
print 'DONE!'
print 'DONE!'
mpaaFileOut.close()
mpaaFileGZ.close()
......
......@@ -5,7 +5,8 @@ share the copyright over some portions of the code:
NAME: H. Turgut Uyar
EMAIL: <uyar --> tekir.org>
CONTRIBUTION: the whole new "http" data access system (using a DOM and
XPath-based approach) is based on his work.
XPath-based approach) is based on his work. The imdbpykit interface
is wholly copyrighted by him.
NAME: Giuseppe "Cowo" Corbelli
......@@ -30,3 +31,13 @@ EMAIL: <jesper --> noehr.org>
CONTRIBUTION: provided extensive testing and some patches for
the 'http' data access system.
NAME: Joachim Selke
EMAIL: <j.selke --> tu-bs.de>
CONTRIBUTION: many tests on IBM DB2 and work on the CSV support.
NAME: Timo Schulz
EMAIL: <gnuknight --> users.sourceforge.net>
CONTRIBUTION: promised me some craziness. :-)
......@@ -21,6 +21,17 @@ I'd like to thank the following people for their help:
* Alen Ribic for some bug reports and hints.
* Joachim Selke for some bug reports with SQLAlchemy and DB2 and a lot
of testing and debugging of the ibm_db driver (plus a lot of hints
about how to improve the imdbpy2sql.py script).
* Indy (indyx) for a bug about series cast parsing using BeautifulSoup.
* Yoav Aviram for a bug report about tv mini-series.
* Arjan Gijsberts for a bug report and patch for a problem with
movies listed in the Bottom 100.
* Helio MC Pereira for a bug report about unicode.
* Michael Charclo for some bug reports performing 'http' queries.
......
Changelog for IMDbPY
====================
* What's new in release 3.9 "The Strangers" (06 Jan 2009)
[general]
- introduced the search_episode method, to search for episodes' titles.
- movie['year'] is now an integer, and no more a string.
- fixed a bug parsing company names.
- introduced the helpers.makeTextNotes function, useful to pretty-print
strings in the 'TEXT::NOTE' format.
[http]
- fixed a bug regarding movies listed in the Bottom 100.
- fixed bugs about tv mini-series.
- fixed a bug about 'series cast' using BeautifulSoup.
[sql]
- fixes for DB2 (with SQLAlchemy).
- improved support for movies' aka titles (for series).
- made imdbpy2sql.py more robust, catching exceptions even when huge
amounts of data are skipped due to errors.
- introduced CSV support in the imdbpy2sql.py script.
* What's new in release 3.8 "Quattro Carogne a Malopasso" (03 Nov 2008)
[http]
- fixed search system for direct hits.
......
......@@ -3,7 +3,8 @@
Since version 2.0 (shame on me! I've noticed this only after more
than a year of development!!!) by default adult movies are included
in the result of the search_movie() and search_person() methods.
in the result of the search_movie(), search_episode() and search_person()
methods.
If for some unintelligible reason you don't want classics
like "Debbie Does Dallas" to show up in your list of results,
......@@ -64,7 +65,8 @@ systems (i.e.: you can set the 'adultSearch' argument and use
the 'do_adult_search' method).
Notice that for the local and the sql data access systems only
results from the search_movie() method are filtered: there's no
easy (and fast) way to tell that an actor/actress is a porn-star.
results from the search_movie() and search_episode() methods are
filtered: there's no easy (and fast) way to tell that an actor/actress
is a porn-star.
......@@ -162,6 +162,8 @@ inside the "parser" package; this new package must provide a subclass
of the imdb.IMDb class which must define at least the following methods:
_search_movie(title) - to search for a given title; must return a
list of (movieID, {movieData}) tuples.
_search_episode(title) - to search for a given episode title; must return a
list of (movieID, {movieData}) tuples.
_search_person(name) - to search for a given name; must return a
list of (movieID, {personData}) tuples.
_search_character(name) - to search for a given character's name; must
......
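The contract above boils down to the shape of the returned value. A toy sketch (illustrative, not IMDbPY code; the title and movieID are taken from the example elsewhere in this document) of what a custom `_search_movie` is expected to return:

```python
def _search_movie(title):
    # A real subclass would query its own data source here; the result
    # is hard-coded for illustration. The required shape is a list of
    # (movieID, {movieData}) tuples.
    return [('0094226', {'title': 'The Untouchables', 'year': 1987})]

results = _search_movie('the untouchables')
movieID, movieData = results[0]
```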
......@@ -113,8 +113,8 @@ the plot of a movie).
OTHER TIPS
==========
Remember that, calling the search_movie() and search_person()
methods of the "IMDb" object, you can provide a "results"
Remember that, calling the search_movie(), search_episode() and
search_person() methods of the "IMDb" object, you can provide a "results"
parameter, to download only a limited amount of results (20,
by default).
......
......@@ -87,10 +87,10 @@ Obviously, if no imdbpy.cfg file is found (or is not readable or it can't
be parsed), 'http' is still considered the default.
The imdb_access object has nine main methods: search_movie(title),
The imdb_access object has ten main methods: search_movie(title),
get_movie(movieID), search_person(name), get_person(personID),
search_character(name), get_character(characterID), search_company(name),
get_company(companyID) and update(MovieOrPersonObject)
get_company(companyID), search_episode() and update(MovieOrPersonObject)
Methods description:
......@@ -115,6 +115,11 @@ movie title and year, and with a "movieID" instance variable:
if, for example, you've put IMDb's data into a local database, the
movieID can be an index in a given table of the database, and so on.
search_episode(title) is identical to search_movie(), except that it's
tailored to search for episodes' titles; best results are expected when
searching for just the title of the episode, _without_ the title of
the TV series.
get_movie(movieID) will fetch the needed data and return a Movie object
for the movie referenced by the given movieID; the Movie class can be
found in the Movie module; a Movie object presents basically the same
......@@ -315,14 +320,15 @@ The "currentRole" attribute is always None.
Since release 1.2, it's possible to retrieve almost every piece of
information about a given movie or person; this can be a problem, because
(at least for the 'http' data access system) it means that a lot of
web pages must be fetched and parsed, and this can be time consuming,
especially if you're interested only in a small set of information.
web pages must be fetched and parsed, and this can be time and
bandwidth consuming, especially if you're interested only in a small set
of information.
Now the get_person, get_movie, get_character, get_company and update
methods have an optional 'info' argument, which can be set to a list
of strings, each one representing an "information set".
Movie/Person/Character/Company objects have, respectively, their own list of
available "information sets".
Movie/Person/Character/Company objects have, respectively, their own list
of available "information sets".
E.g.: the Movie class has a set called 'taglines' for the taglines
of the movie, a set called 'vote details' for the number of votes for
rating [1-10], demographic breakdowns and top 250 rank; the Person
......@@ -354,6 +360,14 @@ and Character classes. Each object instance of Movie, Person or Character,
also has a current_info instance variable, to remember the information sets
already retrieved.
Beware that the information sets vary from one access system to another:
locally not all data is accessible, while - for example for sql -
accessing one set of data automatically grants access to a number
of other unrelated information sets (without major performance drawbacks).
You can get the list of available info sets with the methods:
i.get_movie_infoset(), i.get_person_infoset(), i.get_character_infoset()
and i.get_company_infoset().
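The info-set mechanism described above can be reduced to a toy sketch (hypothetical names, not IMDbPY code): each update() call fetches only the requested sets and records them in current_info, so already-retrieved sets are never fetched twice.

```python
class ToyMovie:
    """Minimal sketch of the info-set mechanism (names are illustrative)."""
    def __init__(self):
        self.data = {}
        self.current_info = []   # info sets already retrieved

    def update(self, info):
        for infoset in info:
            if infoset in self.current_info:
                continue          # skip sets we already have
            self.data[infoset] = 'fetched: %s' % infoset
            self.current_info.append(infoset)

m = ToyMovie()
m.update(['main', 'taglines'])
m.update(['taglines'])           # already retrieved: no extra fetch
```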
Person OBJECTS INSIDE A Movie CLASS AND Movie OBJECTS INSIDE A Person OBJECT
============================================================================
......@@ -461,6 +475,12 @@ It's easier to understand if you look at it; look at the output of:
m = i.get_movie('0094226')
print m['akas']
As a rule, there's at most one '::' separator inside a string,
splitting it into two logical pieces: "TEXT::NOTE".
In the helpers module there's the makeTextNotes function, that can
be used to create a custom function to pretty-print this kind of
information. See its documentation for more info.
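As an illustration of the "TEXT::NOTE" convention, here is a minimal stand-alone sketch of the splitting logic (the aka string below is just an example; the real, more general formatter is helpers.makeTextNotes):

```python
def split_text_notes(s):
    """Split a 'TEXT::NOTE' string into (text, notes); notes may be None."""
    parts = s.split('::', 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return parts[0], None

# An aka title in the "TEXT::NOTE" format.
text, notes = split_text_notes('Gli intoccabili (1987)::(Italy)')
```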
MOVIE TITLES AND PERSON/CHARACTER NAMES REFERENCES
==================================================
......
......@@ -49,7 +49,7 @@ like "Series, The" (2004) {An Episode (#2.5)})
SERIES
======
You can retrieve information about seasons and episodes for a tv series:
You can retrieve information about seasons and episodes for a tv (mini) series:
from imdb import IMDb
i = IMDb()
......@@ -104,7 +104,7 @@ Summary of keys of the Movie object for a series:
FULL CREDITS
============
Retrieving credits for a tv series, you may notice that many long lists
Retrieving credits for a tv (mini) series, you may notice that many long lists
(like "cast", "writers", ...) are incomplete.
You can fetch the complete list of cast and crew with the "full credits"
data set; e.g.:
......@@ -138,7 +138,7 @@ keyword 'episodes' (and not 'episodes cast' or 'guests').
RATINGS
=======
You can retrieve rating information about every episode in a tv series
You can retrieve rating information about every episode in a tv (mini) series
using the 'episodes rating' data set.
......
......@@ -26,6 +26,11 @@ maybe other database backends were added.
Since release 3.8, SQLAlchemy (version 0.4 and 0.5) is also supported
(this adds at least DB2/Informix IDS to the list of supported databases).
Since release 3.9, there's partial support for outputting large tables
as a set of CSV (Comma Separated Values) files, to be later imported
into a database. Currently only MySQL, PostgreSQL and IBM DB2 are
supported.
REQUIREMENTS
============
......@@ -321,3 +326,17 @@ To use transactions to speed-up SQLite, try:
Which is also the same thing the command line option '--sqlite-transactions'
does.
CSV files
=========
Keep in mind that only MySQL, PostgreSQL and IBM DB2 are
currently supported. Moreover, you may run into problems (e.g.: your
postgres _server_ process must have read access to the directory
where you're storing the CSV files).
To create (and import) a set of CSV files, run imdbpy2sql.py with the
syntax:
./imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u URI -c /directory/where/to/store/CSVfiles
......@@ -99,6 +99,9 @@ for more information.
Since release 3.8, IMDbPY supports both SQLObject and SQLAlchemy; see
README.sqldb for more information.
Since release 3.9, IMDbPY supports dumping the plain text data files to CSV
files; see README.sqldb for more information.
FEATURES
========
......
......@@ -51,9 +51,9 @@ E.g., you're writing on a terminal with iso-8859-1 charset (aka latin-1):
>>>
>>> results = ia.search_person(utf8_str)
If you pass a string to search_person() and search_movie() functions, IMDbPY
attempts to guess the encoding, using the sys.stdin.encoding or the value
returned from the sys.getdefaultencoding function.
If you pass a string to search_person(), search_movie() or search_episode()
functions, IMDbPY attempts to guess the encoding, using the sys.stdin.encoding
or the value returned from the sys.getdefaultencoding function.
Trust me: you want to provide a unicode string...
Maybe in a future release the IMDb() function can take a "defaultInputEncoding"
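The fallback described above can be sketched as follows (a simplification of what IMDbPY actually does, shown here stand-alone):

```python
import sys

def guess_encoding():
    """Guess the encoding of byte-string input: try sys.stdin.encoding
    first, then fall back to the interpreter default encoding."""
    encoding = getattr(sys.stdin, 'encoding', None)
    if not encoding:
        encoding = sys.getdefaultencoding()
    return encoding

enc = guess_encoding()
```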
......
......@@ -18,14 +18,8 @@ NOTE: it's always time to clean the code! <g>
at least for 'http' and 'mobile', since they are used by mobile devices.
* The analyze_title/build_title functions have grown too complex and
beyond their initial goals.
* the 'year' keyword can probably be an int, instead of a string;
the '????' case can be handled directly by the analyze_title/build_title
functions. But how much code will be broken?
* for local and sql data access systems: some episode titles are
marked as {{SUSPENDED}}; they should probably be ignored.
* the text data can be stored as instances of a hypothetical TextInfo
class, so that values and notes can be easily retrieved separately:
tinfo.txt and tinfo.notes (must provide __str__ and __unicode__).
[searches]
......@@ -37,7 +31,7 @@ NOTE: it's always time to clean the code! <g>
* Define invariable names for the sections (the keys you use to access
info stored in a Movie object).
* Should a movie object automatically build a Movie object, when
a 'episode of' dictionary is in the data?
an 'episode of' dictionary is in the data?
* Should isSameTitle() check first the accessSystem and the movieID,
and use 'title' only if movieID is None?
* For TV series the list of directors/writers returned by 'local'
......@@ -85,8 +79,6 @@ NOTE: it's always time to clean the code! <g>
or call the imdbObject.set_proxy(None) method.
* If the access through the proxy fails, is it possible to
automatically try without? It doesn't seem easy...
* Some (many?) HTML parsers can be interrupted as soon as they've
parsed all the needed information, as HTMLSearchMovieParser does.
* Access to the "my IMDb" functions for registered users would
be really cool.
* Gather more movies' data: user comments, laserdisc details, trailers,
......@@ -119,6 +111,9 @@ NOTE: it's always time to clean the code! <g>
with 'Surname, Name' (qv).
Names and titles with "'" are not handled properly and so on...
This is a problem of the Plain Text Data Files, and can't be fixed.
* Keywords are messed up, since the list in the plain text data file at
some point contains an out-of-order element, and this messes up the
mkdb program.
* You need the mkdb executable from the moviedb program to generate the
database/index files; moviedb is not open source software (although
it can be downloaded and used without paying money).
......@@ -154,4 +149,7 @@ Things to do:
SQL database, but I think this is a nearly impossible task.
* There are a lot of things to do to improve SQLAlchemy support (especially
in terms of performances); see FIXME/TODO/XXX notices in the code.
* The pysqlite2.dbapi2.OperationalError exception is raised when SQLite
is used with SQLAlchemy (but only if the --sqlite-transactions command
line argument is used).
......@@ -6,7 +6,7 @@ a person from the IMDb database.
It can fetch data through different media (e.g.: the IMDb web pages,
a local installation, a SQL database, etc.)
Copyright 2004-2008 Davide Alberani <da@erlug.linux.it>
Copyright 2004-2009 Davide Alberani <da@erlug.linux.it>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
......@@ -25,7 +25,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
__all__ = ['IMDb', 'IMDbError', 'Movie', 'Person', 'Character', 'Company',
'available_access_systems']
__version__ = VERSION = '3.8'
__version__ = VERSION = '3.9'
# Import compatibility module (importing it is enough).
import _compat
......@@ -335,13 +335,15 @@ class IMDbBase:
self.update(movie, info)
return movie
get_episode = get_movie
def _search_movie(self, title, results):
"""Return a list of tuples (movieID, {movieData})"""
# XXX: for the real implementation, see the method of the
# subclass, somewhere under the imdb.parser package.
raise NotImplementedError, 'override this method'
def search_movie(self, title, results=None):
def search_movie(self, title, results=None, _episodes=False):
"""Return a list of Movie objects for a query for the given title.
The results argument is the maximum number of results to return."""
if results is None:
......@@ -354,11 +356,26 @@ class IMDbBase:
# a unicode string... this is just a guess.
if not isinstance(title, UnicodeType):
title = unicode(title, encoding, 'replace')
res = self._search_movie(title, results)
if not _episodes:
res = self._search_movie(title, results)
else:
res = self._search_episode(title, results)
return [Movie.Movie(movieID=self._get_real_movieID(mi),
data=md, modFunct=self._defModFunct,
accessSystem=self.accessSystem) for mi, md in res][:results]
def _search_episode(self, title, results):
"""Return a list of tuples (movieID, {movieData})"""
# XXX: for the real implementation, see the method of the
# subclass, somewhere under the imdb.parser package.
raise NotImplementedError, 'override this method'
def search_episode(self, title, results=None):
"""Return a list of Movie objects for a query for the given title.
The results argument is the maximum number of results to return;
this method searches only for titles of tv (mini) series' episodes."""
return self.search_movie(title, results=results, _episodes=True)
def get_person(self, personID, info=Person.Person.default_info,
modFunct=None):
"""Return a Person object for the given personID.
......
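The search_movie()/search_episode() delegation shown above can be reduced to a toy sketch (illustrative names and return values, not IMDbPY code):

```python
class TinySearcher:
    """Sketch of the _episodes dispatch used by search_movie/search_episode."""
    def _search_movie(self, title, results):
        return [('m1', {'title': title, 'kind': 'movie'})]

    def _search_episode(self, title, results):
        return [('e1', {'title': title, 'kind': 'episode'})]

    def search_movie(self, title, results=20, _episodes=False):
        # Dispatch to the episode search only when the private flag is set.
        if not _episodes:
            res = self._search_movie(title, results)
        else:
            res = self._search_episode(title, results)
        return res[:results]

    def search_episode(self, title, results=20):
        # search_episode is just search_movie with the private flag set.
        return self.search_movie(title, results=results, _episodes=True)

s = TinySearcher()
kind = s.search_episode('some episode title')[0][1]['kind']
```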
......@@ -64,11 +64,48 @@ re_subst = re.compile(r'%\((.+?)\)s')
# Regular expression for <if condition>....</if condition> clauses.
re_conditional = re.compile(r'<if\s+(.+?)\s*>(.+?)</if\s+\1\s*>')
def makeTextNotes(replaceTxtNotes):
"""Create a function useful to handle text[::optional_note] values.
replaceTxtNotes is a format string, which can include the following
values: %(text)s and %(notes)s.
Portions of the text can be conditionally excluded, if one of the
values is absent. E.g.: <if notes>[%(notes)s]</if notes> will be replaced
with '[notes]' if notes exists, or by an empty string otherwise.
The returned function is suitable to be passed as the applyToValues
argument of the makeObject2Txt function."""
def _replacer(s):
outS = replaceTxtNotes
if not isinstance(s, (unicode, str)):
return s
ssplit = s.split('::', 1)
text = ssplit[0]
# Used to keep track of text and note existence.
keysDict = {}
if text:
keysDict['text'] = True
outS = outS.replace('%(text)s', text)
if len(ssplit) == 2:
keysDict['notes'] = True
outS = outS.replace('%(notes)s', ssplit[1])
else:
outS = outS.replace('%(notes)s', u'')
def _excludeFalseConditionals(matchobj):
# Return an empty string if the conditional is false/empty.
if matchobj.group(1) in keysDict:
return matchobj.group(2)
return u''
while re_conditional.search(outS):
outS = re_conditional.sub(_excludeFalseConditionals, outS)
return outS
return _replacer
def makeObject2Txt(movieTxt=None, personTxt=None, characterTxt=None,
companyTxt=None, joiner=' / ',
applyToValues=lambda x: x, _recurse=True):
""""Return a function useful to pretty-print Movie, Person and
Character instances.
""""Return a function useful to pretty-print Movie, Person,
Character and Company instances.
*movieTxt* -- how to format a Movie object.
*personTxt* -- how to format a Person object.
......
......@@ -83,15 +83,15 @@ char *articles[ART_COUNT] = {"the ", "la ", "a ", "die ", "der ", "le ", "el ",
"l'", "il ", "das ", "les ", "i ", "o ", "ein ", "un ", "de ", "los ",
"an ", "una ", "las ", "eine ", "den ", "het ", "gli ", "lo ", "os ",
"ang ", "oi ", "az ", "een ", "ha-", "det ", "ta ", "al-",
"mga ", "un'", "uno ", "ett ", "dem ", "egy ", "els ", "eines ", " ",
" ", " ", " "};
"mga ", "un'", "uno ", "ett ", "dem ", "egy ", "els ", "eines ", " ",
" ", " ", " "};
char *articlesNoSP[ART_COUNT] = {"the", "la", "a", "die", "der", "le", "el",
"l'", "il", "das", "les", "i", "o", "ein", "un", "de", "los",
"an", "una", "las", "eine", "den", "het", "gli", "lo", "os",
"ang", "oi", "az", "een", "ha-", "det", "ta", "al-",
"mga", "un'", "uno", "ett", "dem", "egy", "els", "eines", "",
"", "", ""};
"mga", "un'", "uno", "ett", "dem", "egy", "els", "eines", "",
"", "", ""};
//*****************************************
......@@ -224,6 +224,7 @@ pyratcliff(PyObject *self, PyObject *pArgs)
/*========== titles and names searches ==========*/
/* Search for the 'name1', 'name2' and 'name3' name variations
* in the key file keyFileName, returning at most nrResults results.
* If _scan_character is True, we're handling characters' names.
*
* See also the documentation of the _search_person() method of the
* parser.sql python module, and the _nameVariations() method of the
......@@ -355,6 +356,7 @@ search_name(PyObject *self, PyObject *pArgs, PyObject *pKwds)
/* Search for the 'title1', title2' and 'title3' title variations
* in the key file keyFileName, returning at most nrResults results.
* If _only_episodes is True, only the titles of episodes are considered.
*
* See also the documentation of the _search_movie() method of the
* parser.sql python module, and the _titleVariations() method of the
......@@ -371,7 +373,7 @@ search_title(PyObject *self, PyObject *pArgs, PyObject *pKwds)
FILE *keyFile;
char line[MXLINELEN+1];
char origLine[MXLINELEN+1];
char *cp;
char *cp, *cp2;
char *key;
unsigned short hasArt = 0;
unsigned short matchHasArt = 0;
......@@ -381,12 +383,15 @@ search_title(PyObject *self, PyObject *pArgs, PyObject *pKwds)
unsigned short searchingEpisode = 0;
unsigned int count = 0;
char noArt[MXLINELEN+1] = "";
PyObject *onlyEpisodes = NULL;
unsigned short onlyEps = 0;
static char *argnames[] = {"keyFile", "title1", "title2", "title3",
"results", NULL};
"results", "_only_episodes", NULL};
PyObject *result = PyList_New(0);
if (!PyArg_ParseTupleAndKeywords(pArgs, pKwds, "ss|ssi",
argnames, &keyFileName, &title1, &title2, &title3, &nrResults))
if (!PyArg_ParseTupleAndKeywords(pArgs, pKwds, "ss|ssiO",
argnames, &keyFileName, &title1, &title2, &title3, &nrResults,
&onlyEpisodes))
return NULL;
if (strlen(title1) > MXLINELEN)
......@@ -412,6 +417,9 @@ search_title(PyObject *self, PyObject *pArgs, PyObject *pKwds)
return NULL;
}
if (onlyEpisodes != NULL && PyObject_IsTrue(onlyEpisodes))
onlyEps = 1;
linelen = strlen(title1);
for (count = 0; count < ART_COUNT; count++) {
artlen = strlen(articlesNoSP[count]);
......@@ -433,6 +441,35 @@ search_title(PyObject *self, PyObject *pArgs, PyObject *pKwds)
strcpy(origLine, line);
} else { continue; }
/* We're interested only in the title of the episode, without
* considering the title of the series. */
if (onlyEps) {
if (line[strlen(line)-1] != '}')
continue;
cp = strrchr(line, '{');
if (cp == NULL)
continue;
line[strlen(line)-1] = '\0';
if (line[strlen(line)-1] == ')') {
line[strlen(line)-1] = '\0';
cp2 = strrchr(cp+1, '(');
if (cp2 != NULL) {
*cp2 = '\0';
if ((cp2-1)[0] == ' ')
*(cp2-1) = '\0';
}
}
cp++;
if (cp[0] == '\0')
continue;
strtolower(cp);
ratio = ratcliff(title1, cp);
if (ratio >= RO_THRESHOLD)
PyList_Append(result, Py_BuildValue("(dis)",
ratio, strtol(key, NULL, 16), origLine));
continue;
}
/* We're searching a tv series episode, and this is not one. */
if (searchingEpisode) {
if (line[strlen(line)-1] != '}')
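The episode-title extraction performed by the C code above can be approximated in Python (an illustrative sketch, not part of the diff): take the text inside the trailing {...} and strip an optional trailing (#season.episode) marker.

```python
def episode_title(long_title):
    """Approximate the C logic above: extract the bare episode title from
    a '"Series" (year) {Episode (#S.E)}' long imdb title, or return None."""
    if not long_title.endswith('}'):
        return None
    start = long_title.rfind('{')
    if start == -1:
        return None
    inner = long_title[:-1][start + 1:]     # text between '{' and '}'
    if inner.endswith(')'):
        mark = inner.rfind('(')             # drop a trailing '(#S.E)'
        if mark != -1:
            inner = inner[:mark].rstrip()
    return inner or None

t = episode_title('"The Series" (2004) {An Episode (#2.5)}')
```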
......@@ -547,7 +584,7 @@ search_company_name(PyObject *self, PyObject *pArgs, PyObject *pKwds)
}
if (line[strlen(line)-1] == ']')
withoutCountry = 0;
withoutCountry = 0;
while (fgets(line, MXLINELEN+1, keyFile) != NULL) {
/* Split a "origLine|key" line. */
......@@ -556,14 +593,14 @@ search_company_name(PyObject *self, PyObject *pArgs, PyObject *pKwds)
key = cp+1;
strcpy(origLine, line);
} else { continue; }
var = 0.0;
var = 0.0;
/* Strip the optional countryCode, if required. */
if (withoutCountry && (cp = strrchr(line, '[')) != NULL) {
*(cp-1) = '\0';
var = -0.05;
}
var = -0.05;
}
strtolower(line);
strtolower(line);
ratio = ratcliff(name1, line) + var;
......
......@@ -287,7 +287,7 @@ def scan_names(name_list, name1, name2, name3, results=0, ro_thresold=None,
def scan_titles(titles_list, title1, title2, title3, results=0,
searchingEpisode=0, ro_thresold=None):
searchingEpisode=0, onlyEpisodes=0, ro_thresold=None):
"""Scan a list of titles, searching for best matches against
the given variations."""
if ro_thresold is not None: RO_THRESHOLD = ro_thresold
......@@ -297,7 +297,6 @@ def scan_titles(titles_list, title1, title2, title3, results=0,
sm3 = SequenceMatcher()
sm1.set_seq1(title1.lower())
sm2.set_seq2(title2.lower())
#searchingEpisode = 0
if title3:
sm3.set_seq1(title3.lower())
if title3[-1] == '}': searchingEpisode = 1
......@@ -305,6 +304,20 @@ def scan_titles(titles_list, title1, title2, title3, results=0,
if title2 != title1: hasArt = 1
resd = {}
for i, t_data in titles_list:
if onlyEpisodes:
if t_data.get('kind') != 'episode':
continue
til = t_data['title']
if til[-1] == ')':
dateIdx = til.rfind('(')
if dateIdx != -1:
til = til[:dateIdx].rstrip()
if not til:
continue
ratio = ratcliff(title1, til, sm1)
if ratio >= RO_THRESHOLD:
resd[i] = (ratio, (i, t_data))
continue
if searchingEpisode:
if t_data.get('kind') != 'episode': continue
elif t_data.get('kind') == 'episode': continue
......
......@@ -32,6 +32,7 @@ from codecs import lookup
from imdb import IMDbBase, imdbURL_movie_main, imdbURL_person_main, \
imdbURL_character_main, imdbURL_company_main, imdbURL_find
from imdb.utils import analyze_title
from imdb._exceptions import IMDbDataAccessError, IMDbParserError
import searchMovieParser
......@@ -393,6 +394,8 @@ class IMDbHTTPAccessSystem(IMDbBase):
ton = ton.encode('utf-8')
##params = 'q=%s&%s=on&mx=%s' % (quote_plus(ton), kind, str(results))
params = 's=%s;mx=%s;q=%s' % (kind, str(results), quote_plus(ton))
if kind == 'ep':
params = params.replace('s=ep;', 's=tt;ttype=ep;', 1)
cont = self._retrieve(imdbURL_find % params)
#print 'URL:', imdbURL_find % params
if cont.find('more than 500 partial matches') == -1:
......@@ -413,6 +416,13 @@ class IMDbHTTPAccessSystem(IMDbBase):
cont = self._get_search_content('tt', title, results)
return self.smProxy.search_movie_parser.parse(cont, results=results)['data']
def _search_episode(self, title, results):
t_dict = analyze_title(title)
if t_dict['kind'] == 'episode':
title = t_dict['title']
cont = self._get_search_content('ep', title, results)
return self.smProxy.search_movie_parser.parse(cont, results=results)['data']
def get_movie_main(self, movieID):
if not self.isThin:
cont = self._retrieve(imdbURL_movie_main % movieID + 'combined')
......
......@@ -4,6 +4,7 @@ parser.http.bsoupadapter module (imdb.parser.http package).
This module adapts the beautifulsoup xpath support to the internal mechanism.
Copyright 2008 H. Turgut Uyar <uyar@tekir.org>
2008 Davide Alberani <da@erlug.linux.it>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
......@@ -40,6 +41,21 @@ def tostring(element):
return str(element)
def appendchild(parent, tagname, attrs=None, text=None, dom=None):
"""Append a child element to an existing element."""
if dom is None:
raise ValueError("A soup instance must be supplied")
child = BeautifulSoup.Tag(dom, tagname)
if attrs is not None:
for key in attrs:
setattribute(child, key, attrs[key])
if text is not None:
textnode = BeautifulSoup.NavigableString(text)
child.append(textnode)
parent.append(child)
return child
def getattribute(node, attrName):
"""Return an attribute value or None."""
return node.get(attrName)
......@@ -58,6 +74,13 @@ def getparent(node):
return node.parent
def droptree(node):
"""Remove a node and all its children."""
# XXX: catch the raised exception, if the node is already gone?
# i.e.: removing <p> contained in an already removed <p>.
node.extract()
def clone(node):
"""Return a clone of the given node."""
# XXX: test with deepcopy? Check if there are problems with
......
......@@ -254,7 +254,10 @@ class PathStep:
elif self.axis == AXIS_DESCENDANT:
found = node.findAll(recursive=True, **self.soup_args)
elif self.axis == AXIS_ATTRIBUTE:
found = [node[self.node_test]]
try:
found = [node[self.node_test]]
except KeyError:
found = []
elif self.axis == AXIS_FOLLOWING_SIBLING:
found = node.findNextSiblings(**self.soup_args)
elif self.axis == AXIS_PRECEDING_SIBLING:
......
......@@ -4,6 +4,7 @@ parser.http.lxmladapter module (imdb.parser.http package).
This module adapts the lxml xpath support to the internal mechanism.
Copyright 2008 H. Turgut Uyar <uyar@tekir.org>
2008 Davide Alberani <da@erlug.linux.it>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
......@@ -20,6 +21,7 @@ along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
"""
from lxml import etree
from lxml import html
......@@ -37,6 +39,18 @@ def tostring(element):
return html.tostring(element, encoding=unicode)
def appendchild(parent, tagname, attrs=None, text=None, dom=None):
"""Append a child element to an existing element."""
child = etree.Element(tagname)
if attrs is not None:
for key in attrs:
setattribute(child, key, attrs[key])
if text is not None:
child.text = text
parent.append(child)
return child
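For comparison, the same appendchild helper can be exercised with the standard-library xml.etree.ElementTree, whose Element API matches the small lxml subset used here (this stand-alone version is illustrative, not part of the diff):

```python
import xml.etree.ElementTree as etree

def appendchild(parent, tagname, attrs=None, text=None):
    """Append a child element to an existing element (ElementTree version)."""
    child = etree.Element(tagname)
    if attrs is not None:
        for key in attrs:
            child.set(key, attrs[key])   # lxml's setattribute equivalent
    if text is not None:
        child.text = text
    parent.append(child)
    return child

root = etree.Element('root')
appendchild(root, 'item', attrs={'id': '1'}, text='hello')
```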