#### Issue description
*write the description here*
#### Version of IMDbPY, Python and OS
- **Python:** `python3 -V` or, if you are using Python 2, `python -V`
- **IMDbPY:** `python3 -c 'import imdb ; print(imdb.VERSION)'` or, if you are using Python 2, `python -c 'import imdb ; print(imdb.VERSION)'`
- **OS:** `python -c 'import platform ; print(platform.uname())'`
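The three commands above can also be gathered in one go. The following is a minimal sketch, not part of the template itself: the helper name `collect_versions` is hypothetical, and it assumes that `imdb.VERSION` is available whenever the package imports (the template's own command relies on the same attribute).

```python
import platform


def collect_versions():
    """Return the version details the issue template asks for."""
    try:
        import imdb  # IMDbPY; may not be installed
        imdbpy_version = getattr(imdb, 'VERSION', None)
    except ImportError:
        imdbpy_version = None
    return {
        'python': platform.python_version(),
        'imdbpy': imdbpy_version,
        'os': ' '.join(platform.uname()),
    }


if __name__ == '__main__':
    for key, value in collect_versions().items():
        print('%s: %s' % (key, value))
```

The output can be pasted under this heading as-is; an `imdbpy` value of `None` means the package could not be imported.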
#### Steps to reproduce the issue
*if possible, provide a minimal code example that reproduces the problem*
```
#!python
# your code here
```
#### What's the expected result?
-
#### What's the actual result?
-
#### Additional details
-
*.pyc
*.pyo
*.egg-info
*.mo
*.so
.pytest_cache/
build/
_build/
__pycache__
dist/
.idea
.vscode
.cache
.tox
.coverage
prof
syntax: glob
.cache
.tox
__pycache__
build
dist
.cache
.tox
__pycache__
*.egg-info
*.mo
*.pyc
*.pyo
*.so
*.pyd
*~
*.swp
setuptools-*.egg
c3dba80881f0a810b3bf93051a56190b297e7a50 4.6
c8b07121469a2173a587b1a34beb4f1fecd640b6 4.7
ba221c9050599463b4b78c89a8bdada7d7aef173 4.8
e807ba790392d406018af0f98d5dad5117721a4d 4.8.1
b02c61369b27e0d5af0a755a8a2fc3355c08bb67 4.8.2
7f39f8ac4838b45fbf59f4167796dd17cd15c437 4.9
398c01b961076362958c27584d85fbdfa921ac63 5.0
cb1e19b508d03499e8f34bd066d8b930aca6aa2d 5.1
language: python
python:
- "2.7"
- "3.4"
- "3.5"
- "3.6"
install:
- python setup.py install
script:
- py.test
notifications:
email:
on_success: never
on_failure: always
Metadata-Version: 1.1
Name: IMDbPY
Version: 5.1
Summary: Python package to access the IMDb's database
Home-page: http://imdbpy.sf.net/
Author: Davide Alberani
Author-email: da@erlug.linux.it
License: GPL
Download-URL: http://imdbpy.sf.net/?page=download
Description: IMDbPY is a Python package useful to retrieve and
manage the data of the IMDb movie database about movies, people,
characters and companies.
Platform-independent and written in pure Python (and few C lines),
it can retrieve data from both the IMDb's web server and a local copy
of the whole database.
IMDbPY package can be very easily used by programmers and developers
to provide access to the IMDb's data to their programs.
Some simple example scripts - useful for the end users - are included
in this package; other IMDbPY-based programs are available at the
home page: http://imdbpy.sf.net/
Keywords: imdb,movie,people,database,cinema,film,person,cast,actor,actress,director,sql,character,company,package,plain text data files,keywords,top250,bottom100,xml
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Environment :: Handhelds/PDA's
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Natural Language :: English
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Turkish
Classifier: Programming Language :: Python
Classifier: Programming Language :: C
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content :: CGI Tools/Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
MANIFEST.in
ez_setup.py
setup.cfg
setup.py
./bin/get_character.py
./bin/get_company.py
./bin/get_first_character.py
./bin/get_first_company.py
./bin/get_first_movie.py
./bin/get_first_person.py
./bin/get_keyword.py
./bin/get_movie.py
./bin/get_person.py
./bin/get_top_bottom_movies.py
./bin/imdbpy2sql.py
./bin/search_character.py
./bin/search_company.py
./bin/search_keyword.py
./bin/search_movie.py
./bin/search_person.py
IMDbPY.egg-info/PKG-INFO
IMDbPY.egg-info/SOURCES.txt
IMDbPY.egg-info/dependency_links.txt
IMDbPY.egg-info/requires.txt
IMDbPY.egg-info/top_level.txt
docs/AUTHOR.txt
docs/CONTRIBUTORS.txt
docs/CREDITS.txt
docs/Changelog.txt
docs/DISCLAIMER.txt
docs/FAQS.txt
docs/GPL.txt
docs/INSTALL.txt
docs/LICENSE.txt
docs/README.adult
docs/README.companies
docs/README.currentRole
docs/README.devel
docs/README.http
docs/README.info2xml
docs/README.keywords
docs/README.local
docs/README.locale
docs/README.logging
docs/README.mobile
docs/README.newparsers
docs/README.package
docs/README.redesign
docs/README.series
docs/README.sqldb
docs/README.txt
docs/README.unicode
docs/README.users
docs/TODO.txt
docs/imdbpy.cfg
docs/imdbpy48.dtd
docs/imdbpyPowered.png
docs/imdbpy_new_logo.png
docs/imdbpyico.png
docs/imdbpyico.xpm
docs/imdbpyico16x16.ico
docs/imdbpyico32x32.ico
docs/imdbpywin.bmp
docs/goodies/README.txt
docs/goodies/applydiffs.sh
docs/goodies/download_applydiffs.py
docs/goodies/reduce.sh
imdb/Character.py
imdb/Company.py
imdb/Movie.py
imdb/Person.py
imdb/__init__.py
imdb/_compat.py
imdb/_exceptions.py
imdb/_logging.py
imdb/helpers.py
imdb/linguistics.py
imdb/utils.py
imdb/locale/__init__.py
imdb/locale/__init__.pyc
imdb/locale/generatepot.py
imdb/locale/imdbpy-ar.po
imdb/locale/imdbpy-bg.po
imdb/locale/imdbpy-de.po
imdb/locale/imdbpy-en.po
imdb/locale/imdbpy-es.po
imdb/locale/imdbpy-fr.po
imdb/locale/imdbpy-it.po
imdb/locale/imdbpy-pt_BR.po
imdb/locale/imdbpy-tr.po
imdb/locale/imdbpy.pot
imdb/locale/msgfmt.py
imdb/locale/msgfmt.pyc
imdb/locale/rebuildmo.py
imdb/locale/rebuildmo.pyc
imdb/locale/ar/LC_MESSAGES/imdbpy.mo
imdb/locale/bg/LC_MESSAGES/imdbpy.mo
imdb/locale/de/LC_MESSAGES/imdbpy.mo
imdb/locale/en/LC_MESSAGES/imdbpy.mo
imdb/locale/es/LC_MESSAGES/imdbpy.mo
imdb/locale/fr/LC_MESSAGES/imdbpy.mo
imdb/locale/it/LC_MESSAGES/imdbpy.mo
imdb/locale/pt_BR/LC_MESSAGES/imdbpy.mo
imdb/locale/tr/LC_MESSAGES/imdbpy.mo
imdb/parser/__init__.py
imdb/parser/http/__init__.py
imdb/parser/http/characterParser.py
imdb/parser/http/companyParser.py
imdb/parser/http/movieParser.py
imdb/parser/http/personParser.py
imdb/parser/http/searchCharacterParser.py
imdb/parser/http/searchCompanyParser.py
imdb/parser/http/searchKeywordParser.py
imdb/parser/http/searchMovieParser.py
imdb/parser/http/searchPersonParser.py
imdb/parser/http/topBottomParser.py
imdb/parser/http/utils.py
imdb/parser/http/bsouplxml/__init__.py
imdb/parser/http/bsouplxml/_bsoup.py
imdb/parser/http/bsouplxml/bsoupxpath.py
imdb/parser/http/bsouplxml/etree.py
imdb/parser/http/bsouplxml/html.py
imdb/parser/mobile/__init__.py
imdb/parser/sql/__init__.py
imdb/parser/sql/alchemyadapter.py
imdb/parser/sql/cutils.c
imdb/parser/sql/dbschema.py
imdb/parser/sql/objectadapter.py
\ No newline at end of file
SQLObject
FormEncode
SQLAlchemy
sqlalchemy-migrate
lxml
......@@ -4,21 +4,7 @@
# Manifest template for creating the Distutils source distribution.
#
# Comment out the "recursive-include docs" entry if you don't want
# to install the documentation.
recursive-include docs *
recursive-include imdb/locale *
global-exclude *~
prune CVS
prune .svn
prune .hg
global-exclude CVS
global-exclude .svn
# Try to force the inclusion of ez_setup.py.
include ez_setup.py
# Uncomment the following line if you don't want to install the logo images.
# exclude docs/*.png docs/*.xpm docs/*.bmp
global-exclude __pycache__
.PHONY: help clean clean-build clean-pyc clean-docs lint test test-all coverage docs dist
help:
@echo "clean - clean everything"
@echo "clean-build - remove build artifacts"
@echo "clean-pyc - remove Python file artifacts"
@echo "clean-docs - remove Sphinx documentation artifacts"
@echo "lint - check style with flake8"
@echo "test - run tests quickly with the default Python"
@echo "test-all - run tests on every Python version with tox"
@echo "coverage - check code coverage quickly with the default Python"
@echo "docs - generate Sphinx HTML documentation, including API docs"
@echo "dist - package"
clean: clean-build clean-pyc clean-docs
clean-build:
rm -fr build/
rm -fr dist/
rm -fr *.egg-info
clean-pyc:
find . -name '*.pyc' -exec rm -f {} +
find . -name '*.pyo' -exec rm -f {} +
find . -name '*~' -exec rm -f {} +
clean-docs:
make -C docs clean
lint:
python setup.py flake8
test:
pytest
test-all:
tox
coverage:
pytest --cov-report term-missing --cov=imdb tests
docs:
$(MAKE) -C docs clean
$(MAKE) -C docs html
dist: clean
python setup.py check -r -s
python setup.py sdist
python setup.py bdist_wheel
Metadata-Version: 1.1
Name: IMDbPY
Version: 5.1
Summary: Python package to access the IMDb's database
Home-page: http://imdbpy.sf.net/
Author: Davide Alberani
Author-email: da@erlug.linux.it
License: GPL
Download-URL: http://imdbpy.sf.net/?page=download
Description: IMDbPY is a Python package useful to retrieve and
manage the data of the IMDb movie database about movies, people,
characters and companies.
Platform-independent and written in pure Python (and few C lines),
it can retrieve data from both the IMDb's web server and a local copy
of the whole database.
IMDbPY package can be very easily used by programmers and developers
to provide access to the IMDb's data to their programs.
Some simple example scripts - useful for the end users - are included
in this package; other IMDbPY-based programs are available at the
home page: http://imdbpy.sf.net/
Keywords: imdb,movie,people,database,cinema,film,person,cast,actor,actress,director,sql,character,company,package,plain text data files,keywords,top250,bottom100,xml
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Environment :: Handhelds/PDA's
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Natural Language :: English
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Turkish
Classifier: Programming Language :: Python
Classifier: Programming Language :: C
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content :: CGI Tools/Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
.. image:: https://travis-ci.org/alberanid/imdbpy.svg?branch=master
:target: https://travis-ci.org/alberanid/imdbpy
**IMDbPY** is a Python package for retrieving and managing the data
of the `IMDb`_ movie database about movies, people and companies.
:Homepage: https://imdbpy.sourceforge.io/
:PyPI: https://pypi.org/project/IMDbPY/
:Repository: https://github.com/alberanid/imdbpy
:Documentation: https://imdbpy.readthedocs.io/
:Support: https://imdbpy.sourceforge.io/support.html
.. admonition:: Revamp notice
:class: note
Starting in November 2017, many things were improved and simplified:
- moved the package to Python 3 (compatible with Python 2.7)
- removed dependencies: SQLObject, C compiler, BeautifulSoup
- removed the "mobile" and "httpThin" parsers
- introduced a test suite (`please help with it!`_)
Main features
-------------
- written in Python 3 (compatible with Python 2.7)
- platform-independent
- can retrieve data from either the IMDb's web server or a local copy
  of the database
- simple and complete API
- released under the terms of the GPL 2 license
IMDbPY powers many other software projects and has been used in various research papers.
`Curious about that`_?
Installation
------------
Whenever possible, please use the latest version from the repository::
pip install git+https://github.com/alberanid/imdbpy
But if you want, you can also install the latest release from PyPI::
pip install imdbpy
Example
-------
Here's an example that demonstrates how to use IMDbPY:

.. code-block:: python

    from imdb import IMDb

    # create an instance of the IMDb class
    ia = IMDb()

    # get a movie
    movie = ia.get_movie('0133093')

    # print the names of the directors of the movie
    print('Directors:')
    for director in movie['directors']:
        print(director['name'])

    # print the genres of the movie
    print('Genres:')
    for genre in movie['genres']:
        print(genre)

    # search for a person name
    people = ia.search_person('Mel Gibson')
    for person in people:
        print(person.personID, person['name'])

.. _IMDb: https://www.imdb.com/
.. _please help with it!: http://imdbpy.readthedocs.io/en/latest/devel/test.html
.. _Curious about that: https://imdbpy.sourceforge.io/ecosystem.html
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_character.py
Usage: get_character "characterID"
Usage: get_character "character_id"
Show some info about the character with the given characterID (e.g. '0000001'
Show some info about the character with the given character_id (e.g. '0000001'
for "Jesse James", using 'http' or 'mobile').
Notice that characterID, using 'sql', are not the same IDs used on the web.
Notice that character_id, using 'sql', are not the same IDs used on the web.
"""
import sys
......@@ -15,33 +16,32 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "characterID"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "character_id"' % sys.argv[0])
sys.exit(2)
characterID = sys.argv[1]
character_id = sys.argv[1]
i = imdb.IMDb()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
try:
# Get a character object with the data about the character identified by
# the given characterID.
character = i.get_character(characterID)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
# the given character_id.
character = i.get_character(character_id)
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not character:
print 'It seems that there\'s no character with characterID "%s"' % characterID
print(("It seems that there's no character"
' with character_id "%s"' % character_id))
sys.exit(4)
# XXX: this is the easier way to print the main info about a character;
......@@ -51,6 +51,4 @@ if not character:
# to access the data stored in a character object, so look below; the
# commented lines show some ways to retrieve information from a
# character object.
print character.summary().encode(out_encoding, 'replace')
print(character.summary())
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_company.py
Usage: get_company "companyID"
Usage: get_company "company_id"
Show some info about the company with the given companyID (e.g. '0071509'
Show some info about the company with the given company_id (e.g. '0071509'
for "Columbia Pictures [us]", using 'http' or 'mobile').
Notice that companyID, using 'sql', are not the same IDs used on the web.
Notice that company_id, using 'sql', are not the same IDs used on the web.
"""
import sys
......@@ -15,33 +16,31 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "companyID"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "company_id"' % sys.argv[0])
sys.exit(2)
companyID = sys.argv[1]
company_id = sys.argv[1]
i = imdb.IMDb()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
try:
# Get a company object with the data about the company identified by
# the given companyID.
company = i.get_company(companyID)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
# the given company_id.
company = i.get_company(company_id)
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not company:
print 'It seems that there\'s no company with companyID "%s"' % companyID
print('It seems that there\'s no company with company_id "%s"' % company_id)
sys.exit(4)
# XXX: this is the easier way to print the main info about a company;
......@@ -51,6 +50,4 @@ if not company:
# to access the data stored in a company object, so look below; the
# commented lines show some ways to retrieve information from a
# company object.
print company.summary().encode(out_encoding, 'replace')
print(company.summary())
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_first_character.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "character name"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "character name"' % sys.argv[0])
sys.exit(2)
name = sys.argv[1]
......@@ -27,24 +28,20 @@ name = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
name = unicode(name, in_encoding, 'replace')
try:
# Do the search, and get the results (a list of character objects).
results = i.search_character(name)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not results:
print 'No matches for "%s", sorry.' % name.encode(out_encoding, 'replace')
print('No matches for "%s", sorry.' % name)
sys.exit(0)
# Print only the first result.
print ' Best match for "%s"' % name.encode(out_encoding, 'replace')
print(' Best match for "%s"' % name)
# This is a character instance.
character = results[0]
......@@ -53,7 +50,4 @@ character = results[0]
# name; retrieve main information:
i.update(character)
print character.summary().encode(out_encoding, 'replace')
print(character.summary())
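The hunks above drop `name = unicode(name, in_encoding, 'replace')` and the `.encode(out_encoding, 'replace')` calls: in Python 3, `sys.argv` entries are already `str`, and `print` takes care of the output encoding. For code that may still receive bytes from elsewhere, an explicit decode step could look like the sketch below. It assumes Python 3, and `to_text` is a hypothetical helper, not something this commit adds.

```python
import sys


def to_text(value, encoding=None):
    """Return value as str, decoding bytes with the 'replace' error handler."""
    if isinstance(value, bytes):
        encoding = encoding or sys.getdefaultencoding()
        return value.decode(encoding, 'replace')
    return value
```

As in the removed code, the `'replace'` handler substitutes a replacement character for undecodable bytes rather than raising `UnicodeDecodeError`.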
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_first_company.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "company name"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "company name"' % sys.argv[0])
sys.exit(2)
name = sys.argv[1]
......@@ -27,24 +28,20 @@ name = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
name = unicode(name, in_encoding, 'replace')
try:
# Do the search, and get the results (a list of company objects).
results = i.search_company(name)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not results:
print 'No matches for "%s", sorry.' % name.encode(out_encoding, 'replace')
print('No matches for "%s", sorry.' % name)
sys.exit(0)
# Print only the first result.
print ' Best match for "%s"' % name.encode(out_encoding, 'replace')
print(' Best match for "%s"' % name)
# This is a company instance.
company = results[0]
......@@ -53,7 +50,4 @@ company = results[0]
# name; retrieve main information:
i.update(company)
print company.summary().encode(out_encoding, 'replace')
print(company.summary())
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_first_movie.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "movie title"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "movie title"' % sys.argv[0])
sys.exit(2)
title = sys.argv[1]
......@@ -27,24 +28,20 @@ title = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
title = unicode(title, in_encoding, 'replace')
try:
# Do the search, and get the results (a list of Movie objects).
results = i.search_movie(title)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not results:
print 'No matches for "%s", sorry.' % title.encode(out_encoding, 'replace')
print('No matches for "%s", sorry.' % title)
sys.exit(0)
# Print only the first result.
print ' Best match for "%s"' % title.encode(out_encoding, 'replace')
print(' Best match for "%s"' % title)
# This is a Movie instance.
movie = results[0]
......@@ -53,7 +50,4 @@ movie = results[0]
# title and the year; retrieve main information:
i.update(movie)
print movie.summary().encode(out_encoding, 'replace')
print(movie.summary())
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_first_person.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "person name"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "person name"' % sys.argv[0])
sys.exit(2)
name = sys.argv[1]
......@@ -27,24 +28,20 @@ name = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
name = unicode(name, in_encoding, 'replace')
try:
# Do the search, and get the results (a list of Person objects).
results = i.search_person(name)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not results:
print 'No matches for "%s", sorry.' % name.encode(out_encoding, 'replace')
print('No matches for "%s", sorry.' % name)
sys.exit(0)
# Print only the first result.
print ' Best match for "%s"' % name.encode(out_encoding, 'replace')
print(' Best match for "%s"' % name)
# This is a Person instance.
person = results[0]
......@@ -53,7 +50,4 @@ person = results[0]
# name; retrieve main information:
i.update(person)
print person.summary().encode(out_encoding, 'replace')
print(person.summary())
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_keyword.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "keyword"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "keyword"' % sys.argv[0])
sys.exit(2)
name = sys.argv[1]
......@@ -27,27 +28,21 @@ name = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
name = unicode(name, in_encoding, 'replace')
try:
# Do the search, and get the results (a list of movies).
results = i.get_keyword(name, results=20)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
# Print the results.
print ' %s result%s for "%s":' % (len(results),
('', 's')[len(results) != 1],
name.encode(out_encoding, 'replace'))
print ' : movie title'
print(' %s result%s for "%s":' % (len(results),
('', 's')[len(results) != 1],
name))
print(' : movie title')
# Print the long imdb title for every movie.
for idx, movie in enumerate(results):
outp = u'%d: %s' % (idx+1, movie['long imdb title'])
print outp.encode(out_encoding, 'replace')
outp = '%d: %s' % (idx+1, movie['long imdb title'])
print(outp)
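The header line above selects the plural suffix with `('', 's')[len(results) != 1]`: the comparison yields a `bool`, which indexes the tuple as 0 or 1. A short sketch of the same idiom in isolation, using a hypothetical `results_header` helper:

```python
def results_header(count, keyword):
    """Build the header line, adding an 's' unless count is exactly 1."""
    suffix = ('', 's')[count != 1]  # False -> index 0, True -> index 1
    return '%s result%s for "%s":' % (count, suffix, keyword)
```

A conditional expression (`'' if count == 1 else 's'`) does the same thing and may read more clearly; the tuple form is kept here only because the script uses it.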
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_movie.py
Usage: get_movie "movieID"
Usage: get_movie "movie_id"
Show some info about the movie with the given movieID (e.g. '0133093'
Show some info about the movie with the given movie_id (e.g. '0133093'
for "The Matrix", using 'http' or 'mobile').
Notice that movieID, using 'sql', are not the same IDs used on the web.
Notice that movie_id, using 'sql', are not the same IDs used on the web.
"""
import sys
......@@ -15,33 +16,31 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "movieID"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "movie_id"' % sys.argv[0])
sys.exit(2)
movieID = sys.argv[1]
movie_id = sys.argv[1]
i = imdb.IMDb()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
try:
# Get a Movie object with the data about the movie identified by
# the given movieID.
movie = i.get_movie(movieID)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
# the given movie_id.
movie = i.get_movie(movie_id)
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not movie:
print 'It seems that there\'s no movie with movieID "%s"' % movieID
print('It seems that there\'s no movie with movie_id "%s"' % movie_id)
sys.exit(4)
# XXX: this is the easier way to print the main info about a movie;
......@@ -51,21 +50,22 @@ if not movie:
# to access the data stored in a Movie object, so look below; the
# commented lines show some ways to retrieve information from a
# Movie object.
print movie.summary().encode(out_encoding, 'replace')
print(movie.summary())
# Show some info about the movie.
# This is only a short example; you can get a longer summary using
# 'print movie.summary()' and the complete set of information looking for
# the output of the movie.keys() method.
#print '==== "%s" / movieID: %s ====' % (movie['title'], movieID)
#
# print '==== "%s" / movie_id: %s ====' % (movie['title'], movie_id)
# XXX: use the IMDb instance to get the IMDb web URL for the movie.
#imdbURL = i.get_imdbURL(movie)
#if imdbURL:
# imdbURL = i.get_imdbURL(movie)
# if imdbURL:
# print 'IMDb URL: %s' % imdbURL
#
# XXX: many keys return a list of values, like "genres".
#genres = movie.get('genres')
#if genres:
# genres = movie.get('genres')
# if genres:
# print 'Genres: %s' % ' '.join(genres)
#
# XXX: even when only one value is present (e.g.: movie with only one
......@@ -73,8 +73,8 @@ print movie.summary().encode(out_encoding, 'replace')
# Note that the 'name' variable is a Person object, but since its
# __str__() method returns a string with the name, we can use it
# directly, instead of name['name']
#director = movie.get('director')
#if director:
# director = movie.get('director')
# if director:
# print 'Director(s): ',
# for name in director:
# sys.stdout.write('%s ' % name)
......@@ -82,25 +82,23 @@ print movie.summary().encode(out_encoding, 'replace')
#
# XXX: notice that every name in the cast is a Person object, with a
# currentRole instance variable, which is a string for the played role.
#cast = movie.get('cast')
#if cast:
# cast = movie.get('cast')
# if cast:
# print 'Cast: '
# cast = cast[:5]
# for name in cast:
# print ' %s (%s)' % (name['name'], name.currentRole)
# XXX: some information are not lists of strings or Person objects, but simple
# strings, like 'rating'.
#rating = movie.get('rating')
#if rating:
# rating = movie.get('rating')
# if rating:
# print 'Rating: %s' % rating
# XXX: an example of how to use information sets; retrieve the "trivia"
# info set; check if it contains some data, select and print a
# random entry.
#import random
#i.update(movie, info=['trivia'])
#trivia = movie.get('trivia')
#if trivia:
# import random
# i.update(movie, info=['trivia'])
# trivia = movie.get('trivia')
# if trivia:
# rand_trivia = trivia[random.randrange(len(trivia))]
# print 'Random trivia: %s' % rand_trivia
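The commented examples above still use Python 2 `print` statements. A minimal Python 3 sketch of the genres and rating lookups follows. It assumes only that the movie object supports dict-style `.get()`, as the comments show. `describe` is a hypothetical helper, and the sample values are placeholders for illustration.

```python
def describe(movie):
    """Return printable lines for the 'genres' and 'rating' keys, if present."""
    lines = []
    genres = movie.get('genres')
    if genres:  # many keys hold a list of values
        lines.append('Genres: %s' % ' '.join(genres))
    rating = movie.get('rating')
    if rating:  # some keys hold a single value
        lines.append('Rating: %s' % rating)
    return lines


# A plain dict stands in for a Movie object here; the values are placeholders.
sample = {'genres': ['Action', 'Sci-Fi'], 'rating': 8.7}
for line in describe(sample):
    print(line)
```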
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_person.py
Usage: get_person "personID"
Usage: get_person "person_id"
Show some info about the person with the given personID (e.g. '0000210'
Show some info about the person with the given person_id (e.g. '0000210'
for "Julia Roberts").
Notice that personID, using 'sql', are not the same IDs used on the web.
Notice that person_id, using 'sql', are not the same IDs used on the web.
"""
import sys
......@@ -15,33 +16,31 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "personID"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "person_id"' % sys.argv[0])
sys.exit(2)
personID = sys.argv[1]
person_id = sys.argv[1]
i = imdb.IMDb()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
try:
# Get a Person object with the data about the person identified by
# the given personID.
person = i.get_person(personID)
except imdb.IMDbError, e:
print "Probably you're not connected to Internet. Complete error report:"
print e
# the given person_id.
person = i.get_person(person_id)
except imdb.IMDbError as e:
print("Probably you're not connected to Internet. Complete error report:")
print(e)
sys.exit(3)
if not person:
print 'It seems that there\'s no person with personID "%s"' % personID
print('It seems that there\'s no person with person_id "%s"' % person_id)
sys.exit(4)
# XXX: this is the easier way to print the main info about a person;
......@@ -51,40 +50,38 @@ if not person:
# to access the data stored in a Person object, so look below; the
# commented lines show some ways to retrieve information from a
# Person object.
print person.summary().encode(out_encoding, 'replace')
print(person.summary())
# Show some info about the person.
# This is only a short example; you can get a longer summary using
# 'print person.summary()' and the complete set of information looking for
# the output of the person.keys() method.
#print '==== "%s" / personID: %s ====' % (person['name'], personID)
# print '==== "%s" / person_id: %s ====' % (person['name'], person_id)
# XXX: use the IMDb instance to get the IMDb web URL for the person.
#imdbURL = i.get_imdbURL(person)
#if imdbURL:
# imdbURL = i.get_imdbURL(person)
# if imdbURL:
# print 'IMDb URL: %s' % imdbURL
# XXX: print the birth date and birth notes.
#d_date = person.get('birth date')
#if d_date:
# d_date = person.get('birth date')
# if d_date:
# print 'Birth date: %s' % d_date
# b_notes = person.get('birth notes')
# if b_notes:
# print 'Birth notes: %s' % b_notes
# XXX: print the last five movies he/she acted in, and the played role.
#movies_acted = person.get('actor') or person.get('actress')
#if movies_acted:
# movies_acted = person.get('actor') or person.get('actress')
# if movies_acted:
# print 'Last roles played: '
# for movie in movies_acted[:5]:
# print ' %s (in "%s")' % (movie.currentRole, movie['title'])
# XXX: example of the use of information sets.
#import random
#i.update(person, info=['awards'])
#awards = person.get('awards')
#if awards:
# import random
# i.update(person, info=['awards'])
# awards = person.get('awards')
# if awards:
# rand_award = awards[random.randrange(len(awards))]
# s = 'Random award: in year '
# s += rand_award.get('year', '')
# s += ' %s "%s"' % (rand_award.get('result', '').lower(),
# rand_award.get('award', ''))
# print s
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
get_top_bottom_movies.py
......@@ -13,12 +14,12 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 1:
print 'No arguments are required.'
print('No arguments are required.')
sys.exit(2)
i = imdb.IMDb()
......@@ -26,14 +27,11 @@ i = imdb.IMDb()
top250 = i.get_top250_movies()
bottom100 = i.get_bottom100_movies()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()
for label, ml in [('top 10', top250[:10]), ('bottom 10', bottom100[:10])]:
print ''
print '%s movies' % label
print 'rating\tvotes\ttitle'
print('')
print('%s movies' % label)
print('rating\tvotes\ttitle')
for movie in ml:
outl = u'%s\t%s\t%s' % (movie.get('rating'), movie.get('votes'),
movie['long imdb title'])
print outl.encode(out_encoding, 'replace')
outl = '%s\t%s\t%s' % (movie.get('rating'), movie.get('votes'),
movie['long imdb title'])
print(outl)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
s32imdbpy.py script.
This script imports the s3 dataset distributed by IMDb into a SQL database.
Copyright 2017-2018 Davide Alberani <da@erlug.linux.it>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
"""
import os
import glob
import gzip
import logging
import argparse

import sqlalchemy

from imdb.parser.s3.utils import DB_TRANSFORM, title_soundex, name_soundexes

TSV_EXT = '.tsv.gz'
# how many entries to write to the database at a time.
BLOCK_SIZE = 10000

logger = logging.getLogger()
logger.setLevel(logging.INFO)
metadata = sqlalchemy.MetaData()


def generate_content(fd, headers, table):
    """Generate blocks of rows to be written to the database.

    :param fd: a file object for the .tsv.gz file, opened in binary mode
    :type fd: :class:`gzip.GzipFile`
    :param headers: headers in the file
    :type headers: list
    :param table: the table that will be populated
    :type table: :class:`sqlalchemy.Table`
    :returns: block of data to insert
    :rtype: list
    """
    data = []
    headers_len = len(headers)
    data_transf = {}
    table_name = table.name
    for column, conf in DB_TRANSFORM.get(table_name, {}).items():
        if 'transform' in conf:
            data_transf[column] = conf['transform']
    for line in fd:
        s_line = line.decode('utf-8').strip().split('\t')
        if len(s_line) != headers_len:
            continue
        info = dict(zip(headers, [x if x != r'\N' else None for x in s_line]))
        for key, tranf in data_transf.items():
            if key not in info:
                continue
            info[key] = tranf(info[key])
        if table_name == 'title_basics':
            info['t_soundex'] = title_soundex(info['primaryTitle'])
        elif table_name == 'title_akas':
            info['t_soundex'] = title_soundex(info['title'])
        elif table_name == 'name_basics':
            info['ns_soundex'], info['sn_soundex'], info['s_soundex'] = name_soundexes(info['primaryName'])
        data.append(info)
        if len(data) >= BLOCK_SIZE:
            yield data
            data = []
    if data:
        yield data
        data = []


def build_table(fn, headers):
    """Build a Table object from a .tsv.gz file.

    :param fn: the .tsv.gz file
    :type fn: str
    :param headers: headers in the file
    :type headers: list
    """
    logging.debug('building table for file %s' % fn)
    table_name = fn.replace(TSV_EXT, '').replace('.', '_')
    table_map = DB_TRANSFORM.get(table_name) or {}
    columns = []
    all_headers = set(headers)
    all_headers.update(table_map.keys())
    for header in all_headers:
        col_info = table_map.get(header) or {}
        col_type = col_info.get('type') or sqlalchemy.UnicodeText
        if 'length' in col_info and col_type is sqlalchemy.String:
            col_type = sqlalchemy.String(length=col_info['length'])
        col_args = {
            'name': header,
            'type_': col_type,
            'index': col_info.get('index', False)
        }
        col_obj = sqlalchemy.Column(**col_args)
        columns.append(col_obj)
    return sqlalchemy.Table(table_name, metadata, *columns)


def import_file(fn, engine):
    """Import data from a .tsv.gz file.

    :param fn: the .tsv.gz file
    :type fn: str
    :param engine: SQLAlchemy engine
    :type engine: :class:`sqlalchemy.engine.base.Engine`
    """
    logging.info('begin processing file %s' % fn)
    connection = engine.connect()
    count = 0
    with gzip.GzipFile(fn, 'r') as gz_file:
        headers = gz_file.readline().decode('utf-8').strip().split('\t')
        logging.debug('headers of file %s: %s' % (fn, ','.join(headers)))
        table = build_table(os.path.basename(fn), headers)
        try:
            table.drop()
            logging.debug('table %s dropped' % table.name)
        except Exception:
            # the table may not exist yet; ignore the failure and go on.
            pass
        insert = table.insert()
        metadata.create_all(tables=[table])
        try:
            for block in generate_content(gz_file, headers, table):
                try:
                    connection.execute(insert, block)
                except Exception as e:
                    logging.error('error processing data: %d entries lost: %s' % (len(block), e))
                    continue
                count += len(block)
        except Exception as e:
            logging.error('error processing data on table %s: %s' % (table.name, e))
    logging.info('end processing file %s: %d entries' % (fn, count))


def import_dir(dir_name, engine):
    """Import data from a series of .tsv.gz files.

    :param dir_name: directory containing the .tsv.gz files
    :type dir_name: str
    :param engine: SQLAlchemy engine
    :type engine: :class:`sqlalchemy.engine.base.Engine`
    """
    for fn in glob.glob(os.path.join(dir_name, '*%s' % TSV_EXT)):
        if not os.path.isfile(fn):
            logging.debug('skipping file %s' % fn)
            continue
        import_file(fn, engine)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('tsv_files_dir')
    parser.add_argument('db_uri')
    parser.add_argument('--verbose', help='increase verbosity', action='store_true')
    args = parser.parse_args()
    dir_name = args.tsv_files_dir
    db_uri = args.db_uri
    if args.verbose:
        logger.setLevel(logging.DEBUG)
    engine = sqlalchemy.create_engine(db_uri, echo=False)
    metadata.bind = engine
    import_dir(dir_name, engine)
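
Review note on s32imdbpy.py: the batching done by generate_content() can be tried in isolation, without IMDbPY or a database. The following is a minimal sketch under stated assumptions, not the script's own code: `iter_blocks`, `make_sample` and `read_sample_blocks` are hypothetical names, the sample rows are invented, and the block size is lowered to 3 so the split is visible. The DB_TRANSFORM and soundex steps are left out; only the header check, the `\N` to None mapping and the block yielding are kept.

```python
import gzip
import io

# Deliberately tiny for the demo; the script itself uses 10000.
BLOCK_SIZE = 3


def iter_blocks(fd, headers, block_size=BLOCK_SIZE):
    """Yield lists of row dicts, each holding at most block_size rows."""
    block = []
    for raw in fd:
        fields = raw.decode('utf-8').strip().split('\t')
        if len(fields) != len(headers):
            # Skip rows whose column count does not match the header.
            continue
        # The dataset marks missing values with a literal backslash-N.
        block.append({h: (None if v == r'\N' else v)
                      for h, v in zip(headers, fields)})
        if len(block) >= block_size:
            yield block
            block = []
    if block:
        # Flush the last, possibly shorter, block.
        yield block


def make_sample():
    """Build an in-memory gzip stream with invented rows (hypothetical data)."""
    rows = ['tconst\tprimaryTitle\tstartYear',
            'tt0000001\tFirst\t1894',
            'tt0000002\tSecond\t\\N',
            'broken line',
            'tt0000003\tThird\t1900',
            'tt0000004\tFourth\t1901']
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write('\n'.join(rows).encode('utf-8') + b'\n')
    buf.seek(0)
    return buf


def read_sample_blocks():
    """Read the header line first, then collect the remaining rows in blocks."""
    with gzip.GzipFile(fileobj=make_sample(), mode='rb') as gz:
        headers = gz.readline().decode('utf-8').strip().split('\t')
        return headers, list(iter_blocks(gz, headers))


if __name__ == '__main__':
    headers, blocks = read_sample_blocks()
    for number, block in enumerate(blocks, 1):
        print('block %d: %d rows' % (number, len(block)))
```

Yielding whole blocks lets the caller pass each list to a single `connection.execute(insert, block)` call, so a failing block loses at most BLOCK_SIZE rows while the rest of the file is still imported.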
#!/usr/bin/env python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
search_character.py
......@@ -13,13 +14,13 @@ import sys
try:
import imdb
except ImportError:
print 'You bad boy! You need to install the IMDbPY package!'
print('You bad boy! You need to install the IMDbPY package!')
sys.exit(1)
if len(sys.argv) != 2:
print 'Only one argument is required:'
print ' %s "character name"' % sys.argv[0]
print('Only one argument is required:')
print(' %s "character name"' % sys.argv[0])
sys.exit(2)
name = sys.argv[1]
......@@ -27,28 +28,25 @@ name = sys.argv[1]
i = imdb.IMDb()
in_encoding = sys.stdin.encoding or sys.getdefaultencoding()
out_encoding = sys.stdout.encoding or sys.getdefaultencoding()