Commit eab817f7 authored by Edward Betts

New upstream version 1.0.2

parent cafce6c8
Metadata-Version: 1.1
Name: nameparser
Version: 0.5.6
Version: 1.0.2
Summary: A simple Python module for parsing human names into their individual components.
Home-page: https://github.com/derek73/python-nameparser
Author: Derek Gulbranson
Author-email: derek73@gmail.com
License: LGPL
Description-Content-Type: UNKNOWN
Description: Name Parser
===========
.. image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. image:: https://badge.fury.io/py/nameparser.svg
:target: http://badge.fury.io/py/nameparser
|Build Status| |PyPI| |PyPI version| |Documentation|
A simple Python (3.2+ & 2.6+) module for parsing human names into their
individual components.
......@@ -24,6 +20,7 @@ Description: Name Parser
* hn.last
* hn.suffix
* hn.nickname
* hn.surnames *(middle + last)*
Supported Name Structures
~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -61,9 +58,9 @@ Description: Name Parser
``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``
If you're looking for a web service, check out
`eyeseast's nameparse service <https://github.com/eyeseast/nameparse>`_, a
simple Heroku-friendly Flask wrapper for this module.
If you need to handle lists of names, check out
`namesparser <https://github.com/gwu-libraries/namesparser>`_, a
complement to this module that handles multiple names in a string.
Quick Start Example
......@@ -145,19 +142,25 @@ Description: Name Parser
.. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
.. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py
.. |Build Status| image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. |PyPI| image:: https://img.shields.io/pypi/v/nameparser.svg
:target: https://pypi.org/project/nameparser/
.. |Documentation| image:: https://readthedocs.org/projects/nameparser/badge/?version=latest
:target: http://nameparser.readthedocs.io/en/latest/?badge=latest
.. |PyPI version| image:: https://img.shields.io/pypi/pyversions/nameparser.svg
:target: https://pypi.org/project/nameparser/
Keywords: names,parser
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Development Status :: 5 - Production/Stable
Classifier: Natural Language :: English
Classifier: Topic :: Software Development :: Libraries :: Python Modules
......
Name Parser
===========
.. image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. image:: https://badge.fury.io/py/nameparser.svg
:target: http://badge.fury.io/py/nameparser
|Build Status| |PyPI| |PyPI version| |Documentation|
A simple Python (3.2+ & 2.6+) module for parsing human names into their
individual components.
......@@ -15,6 +12,7 @@ individual components.
* hn.last
* hn.suffix
* hn.nickname
* hn.surnames *(middle + last)*
Supported Name Structures
~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -52,9 +50,9 @@ install with pip using the command below.
``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``
If you're looking for a web service, check out
`eyeseast's nameparse service <https://github.com/eyeseast/nameparse>`_, a
simple Heroku-friendly Flask wrapper for this module.
If you need to handle lists of names, check out
`namesparser <https://github.com/gwu-libraries/namesparser>`_, a
complement to this module that handles multiple names in a string.
Quick Start Example
......@@ -135,4 +133,14 @@ https://github.com/derek73/python-nameparser
.. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
.. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py
\ No newline at end of file
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py
.. |Build Status| image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. |PyPI| image:: https://img.shields.io/pypi/v/nameparser.svg
:target: https://pypi.org/project/nameparser/
.. |Documentation| image:: https://readthedocs.org/projects/nameparser/badge/?version=latest
:target: http://nameparser.readthedocs.io/en/latest/?badge=latest
.. |PyPI version| image:: https://img.shields.io/pypi/pyversions/nameparser.svg
:target: https://pypi.org/project/nameparser/
Metadata-Version: 1.1
Name: nameparser
Version: 0.5.6
Version: 1.0.2
Summary: A simple Python module for parsing human names into their individual components.
Home-page: https://github.com/derek73/python-nameparser
Author: Derek Gulbranson
Author-email: derek73@gmail.com
License: LGPL
Description-Content-Type: UNKNOWN
Description: Name Parser
===========
.. image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. image:: https://badge.fury.io/py/nameparser.svg
:target: http://badge.fury.io/py/nameparser
|Build Status| |PyPI| |PyPI version| |Documentation|
A simple Python (3.2+ & 2.6+) module for parsing human names into their
individual components.
......@@ -24,6 +20,7 @@ Description: Name Parser
* hn.last
* hn.suffix
* hn.nickname
* hn.surnames *(middle + last)*
Supported Name Structures
~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -61,9 +58,9 @@ Description: Name Parser
``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``
If you're looking for a web service, check out
`eyeseast's nameparse service <https://github.com/eyeseast/nameparse>`_, a
simple Heroku-friendly Flask wrapper for this module.
If you need to handle lists of names, check out
`namesparser <https://github.com/gwu-libraries/namesparser>`_, a
complement to this module that handles multiple names in a string.
Quick Start Example
......@@ -145,19 +142,25 @@ Description: Name Parser
.. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
.. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
.. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py
.. |Build Status| image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
:target: https://travis-ci.org/derek73/python-nameparser
.. |PyPI| image:: https://img.shields.io/pypi/v/nameparser.svg
:target: https://pypi.org/project/nameparser/
.. |Documentation| image:: https://readthedocs.org/projects/nameparser/badge/?version=latest
:target: http://nameparser.readthedocs.io/en/latest/?badge=latest
.. |PyPI version| image:: https://img.shields.io/pypi/pyversions/nameparser.svg
:target: https://pypi.org/project/nameparser/
Keywords: names,parser
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Development Status :: 5 - Production/Stable
Classifier: Natural Language :: English
Classifier: Topic :: Software Development :: Libraries :: Python Modules
......
VERSION = (0, 5, 6)
VERSION = (1, 0, 2)
__version__ = '.'.join(map(str, VERSION))
__author__ = "Derek Gulbranson"
__author_email__ = 'derek73@gmail.com'
......
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
#: Name pieces that appear before a last name. They join to the piece that follows them to make one new piece.
#: Name pieces that appear before a last name. Prefixes join to the piece
#: that follows them to make one new piece. They can be chained together, e.g.
#: "von der" and "de la". Because they only appear in middle or last names,
#: they also signify that all following name pieces should be in the same name
#: part, for example, "von" will be joined to all following pieces that are not
#: prefixes or suffixes, allowing recognition of double last names when they
#: appear after a prefix. So in "pennie von bergen wessels MD", "von" will
#: join with all following name pieces until the suffix "MD", resulting in the
#: correct parsing of the last name "von bergen wessels".
PREFIXES = set([
'abu',
'bin',
......@@ -19,8 +27,10 @@ PREFIXES = set([
'dello',
'der',
'di',
'du',
'',
'do',
'dos',
'du',
'ibn',
'la',
'le',
......
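The prefix comment in this file describes a joining rule: chained prefixes stay together, and a prefix absorbs the pieces that follow it until the next prefix or suffix. The standalone sketch below illustrates that rule only. It is not the library's implementation; `join_prefixes`, `lc`, and the two small sets are hypothetical names and values chosen for the example.

```python
# Hypothetical, heavily reduced stand-ins for the library's configuration sets.
PREFIXES = {"von", "van", "de", "der", "la", "del"}
SUFFIXES = {"md", "jr", "phd"}


def lc(piece):
    """Lowercase a piece and drop periods before looking it up."""
    return piece.lower().replace(".", "")


def join_prefixes(pieces):
    """Join each run of prefixes to the pieces that follow it."""
    joined = []
    i = 0
    while i < len(pieces):
        if lc(pieces[i]) not in PREFIXES:
            joined.append(pieces[i])
            i += 1
            continue
        j = i
        # Chained prefixes such as "von der" or "de la" stay together.
        while j < len(pieces) and lc(pieces[j]) in PREFIXES:
            j += 1
        # Absorb following pieces until the next prefix or suffix.
        while j < len(pieces) and lc(pieces[j]) not in PREFIXES | SUFFIXES:
            j += 1
        joined.append(" ".join(pieces[i:j]))
        i = j
    return joined


print(join_prefixes("pennie von bergen wessels MD".split()))
print(join_prefixes("juan de la vega".split()))
```

With the example from the comment, "von" ends up in a single piece with "bergen wessels", and the suffix "MD" stays separate.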
......@@ -23,11 +23,14 @@ REGEXES = set([
("word", re.compile(r"(\w|\.)+", re.U)),
("mac", re.compile(r'^(ma?c)(\w{2,})', re.I | re.U)),
("initial", re.compile(r'^(\w\.|[A-Z])?$', re.U)),
("nickname", re.compile(r'\s*?[\("](.+?)[\)"]', re.U)),
("quoted_word", re.compile(r'\'([^\s]*?)\'', re.U)),
("double_quotes", re.compile(r'\"(.*?)\"', re.U)),
("parenthesis", re.compile(r'\((.*?)\)', re.U)),
("roman_numeral", re.compile(r'^(X|IX|IV|V?I{0,3})$', re.I | re.U)),
("no_vowels",re.compile(r'^[^aeyiuo]+$', re.I | re.U)),
("period_not_at_end",re.compile(r'.*\..+$', re.I | re.U)),
("emoji",re_emoji),
("phd", re.compile(r'\s(ph\.?\s+d\.?)', re.I | re.U)),
])
"""
All regular expressions used by the parser are precompiled and stored in the config.
......
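The single `nickname` pattern is replaced here by three narrower ones. The sketch below copies the `quoted_word`, `double_quotes` and `parenthesis` patterns from this hunk and applies them in sequence with the standard `re` module. `extract_nicknames` is a hypothetical helper written for illustration, not part of nameparser.

```python
import re

# Patterns copied from the hunk above.
QUOTED_WORD = re.compile(r'\'([^\s]*?)\'', re.U)
DOUBLE_QUOTES = re.compile(r'\"(.*?)\"', re.U)
PARENTHESIS = re.compile(r'\((.*?)\)', re.U)


def extract_nicknames(full_name):
    """Return (nicknames, remaining_text) after applying each pattern in turn."""
    nicknames = []
    for pattern in (QUOTED_WORD, DOUBLE_QUOTES, PARENTHESIS):
        nicknames += pattern.findall(full_name)
        full_name = pattern.sub("", full_name)
    return nicknames, full_name


print(extract_nicknames('Benjamin "Ben" Franklin'))
print(extract_nicknames("Benjamin (Big Ben) Franklin"))
print(extract_nicknames("Brian O'Connor"))
```

Because `quoted_word` cannot match across whitespace and needs both an opening and a closing quote, the lone apostrophe in "O'Connor" is left untouched.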
......@@ -7,6 +7,7 @@ SUFFIX_NOT_ACRONYMS = set([
'esquire',
'jr',
'jnr',
'junior',
'sr',
'snr',
'2',
......
......@@ -194,7 +194,6 @@ TITLES = FIRST_NAME_TITLES | set([
'comtesse',
'conductor',
'consultant',
'contessa',
'controller',
'corporal',
'corporate',
......@@ -245,6 +244,7 @@ TITLES = FIRST_NAME_TITLES | set([
'doyen',
'dpty',
'dr',
'dra',
'dramatist',
'druid',
'drummer',
......@@ -566,6 +566,7 @@ TITLES = FIRST_NAME_TITLES | set([
'special',
'sr',
'sra',
'srta',
'ssg',
'ssgt',
'staff',
......
......@@ -43,7 +43,8 @@ class HumanName(object):
* :py:attr:`last`
* :py:attr:`suffix`
* :py:attr:`nickname`
* :py:attr:`surnames`
:param str full_name: The name string to be parsed.
:param constants constants:
a :py:class:`~nameparser.config.Constants` instance. Pass ``None`` for
......@@ -243,7 +244,21 @@ class HumanName(object):
parenthesis (``()``)
"""
return " ".join(self.nickname_list) or self.C.empty_attribute_default
@property
def surnames_list(self):
"""
List of middle names followed by last name.
"""
return self.middle_list + self.last_list
@property
def surnames(self):
"""
A string of all middle names followed by the last name.
"""
return " ".join(self.surnames_list) or self.C.empty_attribute_default
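The new `surnames` property is the middle-name list concatenated with the last-name list and joined with spaces. The sketch below shows that behaviour on a small stand-in class. `NameParts` and its constructor are hypothetical; the real `HumanName` fills these lists by parsing, and its empty default comes from `self.C.empty_attribute_default`.

```python
class NameParts:
    """Hypothetical stand-in holding already-parsed middle and last lists."""

    # Assumed default for empty attributes in this sketch.
    empty_attribute_default = ""

    def __init__(self, middle_list, last_list):
        self.middle_list = middle_list
        self.last_list = last_list

    @property
    def surnames_list(self):
        # Middle names first, then last names, as in the diff.
        return self.middle_list + self.last_list

    @property
    def surnames(self):
        return " ".join(self.surnames_list) or self.empty_attribute_default


print(NameParts(["Q."], ["de la Vega"]).surnames)
```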
### setter methods
def _set_list(self, attr, value):
......@@ -296,7 +311,7 @@ class HumanName(object):
def is_prefix(self, piece):
"""
Lowercase and no periods version of piece is in the
`~nameparser.config.titles.PREFIXES` set.
:py:data:`~nameparser.config.prefixes.PREFIXES` set.
"""
return lc(piece) in self.C.prefixes
......@@ -319,7 +334,7 @@ class HumanName(object):
return ((lc(piece).replace('.','') in self.C.suffix_acronyms) \
or (lc(piece) in self.C.suffix_not_acronyms)) \
and not self.is_an_initial(piece)
def are_suffixes(self, pieces):
"""Return True if all pieces are suffixes."""
for piece in pieces:
......@@ -361,17 +376,21 @@ class HumanName(object):
def collapse_whitespace(self, string):
# collapse multiple spaces into single space
return self.C.regexes.spaces.sub(" ", string.strip())
string = self.C.regexes.spaces.sub(" ", string.strip())
if string.endswith(","):
string = string[:-1]
return string
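The revised `collapse_whitespace` now also drops a single trailing comma. A standalone sketch of the same steps follows. The `SPACES` pattern is an assumption standing in for `self.C.regexes.spaces`, whose definition is not shown in this diff.

```python
import re

# Assumed equivalent of the library's "spaces" regex (not shown in this diff).
SPACES = re.compile(r"\s+", re.U)


def collapse_whitespace(string):
    """Collapse runs of whitespace, then strip one trailing comma."""
    string = SPACES.sub(" ", string.strip())
    if string.endswith(","):
        string = string[:-1]
    return string


print(repr(collapse_whitespace("  Doe,   John,  ")))
```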
def pre_process(self):
"""
This method happens at the beginning of the :py:func:`parse_full_name`
before any other processing of the string aside from unicode
normalization, so it's a good place to do any custom handling in a
subclass. Runs :py:func:`parse_nicknames`.
subclass. Runs :py:func:`parse_nicknames` and :py:func:`squash_emoji`.
"""
self.fix_phd()
self.parse_nicknames()
self.squash_emoji()
......@@ -382,17 +401,34 @@ class HumanName(object):
"""
self.handle_firstnames()
def fix_phd(self):
_re = self.C.regexes.phd
match = _re.search(self._full_name)
if match:
self.suffix_list.append(match.group(1))
self._full_name = _re.sub('', self._full_name)
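`fix_phd` pulls a spaced-out "Ph. D." out of the name before it is split into pieces, so the two halves are not parsed separately. The sketch below reuses the `phd` pattern from the regexes hunk in a free function. `split_phd` is a hypothetical name, and it returns values where the real method mutates `self`.

```python
import re

# Pattern copied from the regexes hunk of this commit.
PHD = re.compile(r'\s(ph\.?\s+d\.?)', re.I | re.U)


def split_phd(full_name):
    """Return (name_without_phd, suffixes) for a spaced 'Ph. D.' suffix."""
    suffixes = []
    match = PHD.search(full_name)
    if match:
        suffixes.append(match.group(1))
        full_name = PHD.sub("", full_name)
    return full_name, suffixes


print(split_phd("John Smith Ph. D."))
print(split_phd("John Smith PhD"))
```

The pattern requires whitespace between "ph" and "d", so an unspaced "PhD" is left in the string.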
def parse_nicknames(self):
"""
The content of parenthesis or double quotes in the name will
be treated as nicknames. This happens before any other
processing of the name.
The content of parentheses or quotes in the name will be added to the
nicknames list. This happens before any other processing of the name.
Single quotes cannot span white space characters to allow for single
quotes in names like O'Connor. Double quotes and parentheses can span
white space.
Loops through 3 :py:data:`~nameparser.config.regexes.REGEXES`:
`quoted_word`, `double_quotes` and `parenthesis`.
"""
# https://code.google.com/p/python-nameparser/issues/detail?id=33
re_nickname = self.C.regexes.nickname
if re_nickname.search(self._full_name):
self.nickname_list = re_nickname.findall(self._full_name)
self._full_name = re_nickname.sub('', self._full_name)
re_quoted_word = self.C.regexes.quoted_word
re_double_quotes = self.C.regexes.double_quotes
re_parenthesis = self.C.regexes.parenthesis
for _re in (re_quoted_word, re_double_quotes, re_parenthesis):
if _re.search(self._full_name):
self.nickname_list += [x for x in _re.findall(self._full_name)]
self._full_name = _re.sub('', self._full_name)
def squash_emoji(self):
"""
......@@ -467,6 +503,9 @@ class HumanName(object):
self.title_list.append(piece)
continue
if not self.first:
if p_len == 1 and self.nickname:
self.last_list.append(piece)
continue
self.first_list.append(piece)
continue
if self.are_suffixes(pieces[i+1:]) or \
......@@ -534,7 +573,7 @@ class HumanName(object):
# lastname part may have suffixes in it
lastname_pieces = self.parse_pieces(parts[0].split(' '), 1)
for piece in lastname_pieces:
# the first one is always a last name, even if it look like
# the first one is always a last name, even if it looks like
# a suffix
if self.is_suffix(piece) and len(self.last_list) > 0:
self.suffix_list.append(piece)
......@@ -644,16 +683,16 @@ class HumanName(object):
# don't join on conjunctions if there's only 2 parts
if length < 3:
return pieces
rootname_pieces = [p for p in pieces if self.is_rootname(p)]
total_length = len(rootname_pieces) + additional_parts_count
# find all the conjunctions, join any conjunctions that are next to each
# other, then join those newly joined conjunctions and any single
# conjunctions to the piece before and after it
conj_index = [i for i, piece in enumerate(pieces)
conj_index = [i for i, piece in enumerate(pieces)
if self.is_conjunction(piece)]
contiguous_conj_i = []
for i, val in enumerate(conj_index):
try:
......@@ -661,10 +700,10 @@ class HumanName(object):
contiguous_conj_i += [val]
except IndexError:
pass
contiguous_conj_i = group_contiguous_integers(conj_index)
delete_i = []
delete_i = []
for i in contiguous_conj_i:
if type(i) == tuple:
new_piece = " ".join(pieces[ i[0] : i[1]+1] )
......@@ -676,7 +715,7 @@ class HumanName(object):
pieces[i] = new_piece
#add newly joined conjunctions to constants to be found later
self.C.conjunctions.add(new_piece)
for i in reversed(delete_i):
# delete pieces in reverse order or the index changes on each delete
del pieces[i]
......@@ -687,7 +726,7 @@ class HumanName(object):
# refresh conjunction index locations
conj_index = [i for i, piece in enumerate(pieces) if self.is_conjunction(piece)]
for i in conj_index:
if len(pieces[i]) == 1 and total_length < 4:
# if there are only 3 total parts (minus known titles, suffixes
......@@ -695,7 +734,7 @@ class HumanName(object):
# treating it as an initial rather than a conjunction.
# http://code.google.com/p/python-nameparser/issues/detail?id=11
continue
if i == 0:
new_piece = " ".join(pieces[i:i+2])
if self.is_title(pieces[i+1]):
......@@ -707,8 +746,8 @@ class HumanName(object):
for j,val in enumerate(conj_index):
if val > i:
conj_index[j]=val-1
else:
else:
new_piece = " ".join(pieces[i-1:i+2])
if self.is_title(pieces[i-1]):
# when joining to a title, make new_piece a title too
......@@ -726,30 +765,56 @@ class HumanName(object):
for j,val in enumerate(conj_index):
if val > i:
conj_index[j] = val - rm_count
# join prefixes to following lastnames: ['de la Vega'], ['van Buren']
prefixes = list(filter(self.is_prefix, pieces))
if prefixes:
i = pieces.index(prefixes[0])
# join everything after the prefix until the next suffix
next_suffix = list(filter(self.is_suffix, pieces[i:]))
if next_suffix:
j = pieces.index(next_suffix[0])
new_piece = ' '.join(pieces[i:j])
pieces = pieces[:i] + [new_piece] + pieces[j:]
else:
new_piece = ' '.join(pieces[i:])
pieces = pieces[:i] + [new_piece]
for prefix in prefixes:
try:
i = pieces.index(prefix)
except ValueError:
# If the prefix is no longer in pieces, it's because it has been
# combined with the prefix that appears right before (or before that when
# chained together) in the last loop, so the index of that newly created
# piece is the same as in the last loop, i==i still, and we want to join
# it to the next piece.
pass
new_piece = ''
# join everything after the prefix until the next prefix or suffix
try:
next_prefix = next(iter(filter(self.is_prefix, pieces[i + 1:])))
j = pieces.index(next_prefix)
if j == i + 1:
# if there are two prefixes in sequence, join to the following piece
j += 1
new_piece = ' '.join(pieces[i:j])
pieces = pieces[:i] + [new_piece] + pieces[j:]
except StopIteration:
try:
# if there are no more prefixes, look for a suffix to stop at
stop_at = next(iter(filter(self.is_suffix, pieces[i + 1:])))
j = pieces.index(stop_at)
new_piece = ' '.join(pieces[i:j])
pieces = pieces[:i] + [new_piece] + pieces[j:]
except StopIteration:
# if there were no suffixes, nothing to stop at so join all
# remaining pieces
new_piece = ' '.join(pieces[i:])
pieces = pieces[:i] + [new_piece]
log.debug("pieces: {0}".format(pieces))
return pieces
### Capitalization Support
def cap_word(self, word):
if self.is_prefix(word) or self.is_conjunction(word):
def cap_word(self, word, attribute):
if (self.is_prefix(word) and attribute in ('last','middle')) \
or self.is_conjunction(word):
return word.lower()
exceptions = self.C.capitalization_exceptions
if lc(word) in exceptions:
......@@ -762,10 +827,10 @@ class HumanName(object):
else:
return word.capitalize()
def cap_piece(self, piece):
def cap_piece(self, piece, attribute):
if not piece:
return ""
replacement = lambda m: self.cap_word(m.group(0))
replacement = lambda m: self.cap_word(m.group(0), attribute)
return self.C.regexes.word.sub(replacement, piece)
def capitalize(self, force=False):
......@@ -798,8 +863,8 @@ class HumanName(object):
name = u(self)
if not force and not (name == name.upper() or name == name.lower()):
return
self.title_list = self.cap_piece(self.title ).split(' ')
self.first_list = self.cap_piece(self.first ).split(' ')
self.middle_list = self.cap_piece(self.middle).split(' ')
self.last_list = self.cap_piece(self.last ).split(' ')
self.suffix_list = self.cap_piece(self.suffix).split(', ')
self.title_list = self.cap_piece(self.title , 'title').split(' ')
self.first_list = self.cap_piece(self.first , 'first').split(' ')
self.middle_list = self.cap_piece(self.middle, 'middle').split(' ')
self.last_list = self.cap_piece(self.last , 'last').split(' ')
self.suffix_list = self.cap_piece(self.suffix, 'suffix').split(', ')
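The capitalization change threads the attribute name through `cap_piece` into `cap_word`, so a prefix is lowercased only when it sits in a middle or last name. The sketch below isolates that condition. The two sets are hypothetical subsets, and the capitalization-exceptions and "Mac" handling of the real method are omitted.

```python
# Hypothetical subsets of the library's configuration sets.
PREFIXES = {"van", "von", "de", "la"}
CONJUNCTIONS = {"and", "y", "e"}


def cap_word(word, attribute):
    """Lowercase prefixes only in middle/last names; conjunctions always."""
    is_prefix = word.lower() in PREFIXES
    is_conjunction = word.lower() in CONJUNCTIONS
    if (is_prefix and attribute in ("last", "middle")) or is_conjunction:
        return word.lower()
    return word.capitalize()


print(cap_word("VAN", "first"))  # treated as an ordinary first name
print(cap_word("VAN", "last"))   # treated as a surname prefix
```

Under this rule a word such as "Van" keeps its capital when it is a first name but is lowercased when it is a surname prefix.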
......@@ -26,13 +26,8 @@ setup(name='nameparser',
'Operating System :: OS Independent',
"License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)",
'Programming Language :: Python',
'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.2',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Development Status :: 5 - Production/Stable',
'Natural Language :: English',
"Topic :: Software Development :: Libraries :: Python Modules",
......
This diff is collapsed.