learning-from-bidict.rst 19.3 KB
Newer Older
1 2 3
Learning from bidict
--------------------

4 5
Below is an outline of some of the more fascinating
and lesser-known Python corners I got to explore further
6 7 8
thanks to working on bidict.

If you are interested in learning more about any of the following,
9 10
I highly encourage you to
`read bidict's code <https://github.com/jab/bidict/blob/master/bidict/__init__.py#L10>`__.
11

12 13 14
I've sought to optimize the code not just for correctness and performance,
but also to make for a clear and enjoyable read,
illuminating anything that could otherwise be obscure or subtle.
15

16 17 18
I hope it brings you some of the
`joy <https://joy.recurse.com/posts/148-bidict>`__
it's brought me. 😊
19 20


21 22
Python syntax hacks
===================
23

24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
Bidict used to support (ab)using a specialized form of Python's :ref:`slice <slicings>` syntax
for getting and setting keys by value:

.. code-block:: python

   >>> element_by_symbol = bidict(H='hydrogen')
   >>> element_by_symbol['H']  # [normal] syntax for the forward mapping
   'hydrogen'
   >>> element_by_symbol[:'hydrogen']  # [:slice] syntax for the inverse (no longer supported)
   'H'

See `this code <https://github.com/jab/bidict/blob/356dbe3/bidict/_bidict.py#L25>`__
for how this was implemented,
and `#19 <https://github.com/jab/bidict/issues/19>`__ for why this was dropped.


Code structure
==============

Bidicts come in every combination of mutable, immutable, ordered, and unordered types,
implementing Python's various
:class:`relevant <collections.abc.Mapping>`
:class:`collections <collections.abc.MutableMapping>`
:class:`interfaces <collections.abc.Hashable>`
as appropriate.
49

50 51 52 53
Factoring the code to maximize reuse, modularity, and
adherence to `SOLID <https://en.wikipedia.org/wiki/SOLID>`__ design principles
(while not missing any chances for special-case optimizations)
has been one of the most fun parts of working on bidict.
54

55
To see how this is done, check out this code:
56

57 58 59 60 61 62 63 64
- `_base.py <https://github.com/jab/bidict/blob/master/bidict/_base.py#L10>`__
- `_delegating_mixins.py <https://github.com/jab/bidict/blob/master/bidict/_delegating_mixins.py#L10>`__
- `_frozenbidict.py <https://github.com/jab/bidict/blob/master/bidict/_frozenbidict.py#L10>`__
- `_mut.py <https://github.com/jab/bidict/blob/master/bidict/_mut.py#L10>`__
- `_bidict.py <https://github.com/jab/bidict/blob/master/bidict/_bidict.py#L10>`__
- `_orderedbase.py <https://github.com/jab/bidict/blob/master/bidict/_orderedbase.py#L10>`__
- `_frozenordered.py <https://github.com/jab/bidict/blob/master/bidict/_frozenordered.py#L10>`__
- `_orderedbidict.py <https://github.com/jab/bidict/blob/master/bidict/_orderedbidict.py#L10>`__
65

66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
Data structures are amazing
===========================

Data structures are one of the most fascinating and important
building blocks of programming and computer science.

It's all too easy to lose sight of the magic when having to implement them
for computer science courses or job interview questions.
Part of this is because many of the most interesting real-world details get left out,
and you miss all the value that comes from ongoing, direct practical application.

Bidict shows how fundamental data structures
can be implemented in Python for important real-world usage,
with practical concerns at top of mind.
Come to catch sight of a real, live, industrial-strength linked list in the wild.
Stay for the rare, exotic bidirectional mapping breeds you'll rarely see at home.
[#fn-data-struct]_

.. [#fn-data-struct] To give you a taste:

   A regular :class:`~bidict.bidict`
   encapsulates two regular dicts,
   keeping them in sync to preserve the bidirectional mapping invariants.
   Since dicts are unordered, regular bidicts are unordered too.
   How should we extend this to implement an ordered bidict?

   We'll still need two backing mappings to store the forward and inverse associations.
   To store the ordering, we use a (circular, doubly-) linked list.
   This allows us to e.g. delete an item in any position in O(1) time.

   Interestingly, the nodes of the linked list encode only the ordering of the items;
   the nodes themselves contain no key or value data.
   The two backing mappings associate the key and value data
   with the nodes, providing the final pieces of the puzzle.

   Can we use dicts for the backing mappings, as we did for the unordered bidict?
   It turns out that dicts aren't enough—the backing mappings must actually be
   (unordered) bidicts themselves!

Check out `_orderedbase.py <https://github.com/jab/bidict/blob/master/bidict/_orderedbase.py#L10>`__
to see this in action.


Property-based testing is revolutionary
=======================================

When your automated tests run,
are they only checking the test cases
you happened to hard-code into your test suite?
How do you know these test cases aren't missing
some important edge cases?

With property-based testing,
you describe the types of test case inputs your functions accept,
along with the properties that should hold for all inputs.
Rather than having to think up your test case inputs manually
and hard-code them into your test suite,
they get generated for you dynamically,
in much greater quantity and edge case-exercising diversity
than you could come up with by hand.
This dramatically increases test coverage
and confidence that your code is correct.

Bidict never would have survived so many refactorings with so few bugs
if it weren't for property-based testing, enabled by the amazing
`Hypothesis <https://hypothesis.readthedocs.io>`__ library.
It's game-changing.

Check out `bidict's property-based tests
<https://github.com/jab/bidict/blob/master/tests/properties/test_properties.py>`__
to see this in action.


Python surprises, gotchas, regrets
==================================

- See :ref:`addendum:nan as key`.

- See :ref:`addendum:Equivalent but distinct \:class\:\`~collections.abc.Hashable\`\\s`.

- What should happen when checking equality of several ordered mappings
  that contain the same items but in a different order?
  What about when comparing with an unordered mapping?

  Check out what Python's :class:`~collections.OrderedDict` does,
  and the surprising results:

  .. code-block:: python

     >>> from collections import OrderedDict
     >>> d = dict([(0, 1), (2, 3)])
     >>> od = OrderedDict([(0, 1), (2, 3)])
     >>> od2 = OrderedDict([(2, 3), (0, 1)])
     >>> d == od
     True
     >>> d == od2
     True
     >>> od == od2
     False

     >>> class MyDict(dict):
     ...   __hash__ = lambda self: 0
     ...

     >>> class MyOrderedDict(OrderedDict):
     ...   __hash__ = lambda self: 0
     ...

     >>> d = MyDict([(0, 1), (2, 3)])
     >>> od = MyOrderedDict([(0, 1), (2, 3)])
     >>> od2 = MyOrderedDict([(2, 3), (0, 1)])
     >>> len({d, od, od2})
     1
     >>> len({od, od2, d})
     2

  According to Raymond Hettinger
  (Python core developer responsible for much of Python's collections),
  this design was a mistake
  (e.g. it violates the `Liskov substitution principle
  <https://en.wikipedia.org/wiki/Liskov_substitution_principle>`__
  and the `transitive property of equality
  <https://en.wikipedia.org/wiki/Equality_(mathematics)#Basic_properties>`__),
  but it's too late now to fix.

  Fortunately, it wasn't too late for bidict to learn from this.
  Hence :ref:`eq-order-insensitive` for ordered bidicts,
  and their separate :meth:`~bidict.FrozenOrderedBidict.equals_order_sensitive` method.

- If you define a custom :meth:`~object.__eq__` on a class,
  it will *not* be used for ``!=`` comparisons on Python 2 automatically;
  you must explicitly add an :meth:`~object.__ne__` implementation
  that calls your :meth:`~object.__eq__` implementation.
  If you don't, :meth:`object.__ne__` will be used instead,
  which behaves like ``is not``. Python 3 thankfully fixes this.


Better memory usage through ``__slots__``
=========================================

Using :ref:`slots` dramatically reduces memory usage in CPython
and speeds up attribute access to boot.
Must be careful with pickling and weakrefs though!
See `BidictBase.__getstate__()
<https://github.com/jab/bidict/blob/master/bidict/_base.py>`__.


Better memory usage through :mod:`weakref`
==========================================

A bidict and its inverse use :mod:`weakref`
to avoid creating a strong reference cycle,
so that when you release your last reference to a bidict,
its memory is reclaimed immediately in CPython
rather than having to wait for the next garbage collection.
221 222
See :ref:`addendum:Bidict Avoids Reference Cycles`.

223 224
The (doubly) linked lists that back ordered bidicts also use weakrefs
to avoid creating strong reference cycles.
225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244


Subclassing :func:`~collections.namedtuple` classes
===================================================

To get the performance benefits, intrinsic sortability, etc.
of :func:`~collections.namedtuple`
while customizing behavior, state, API, etc.,
you can subclass a :func:`~collections.namedtuple` class.
Just make sure to include ``__slots__ = ()``,
or you'll lose a lot of the performance benefits.

``_marker.py`` contains a small example.
Here's a larger one:

.. doctest::

   >>> from collections import namedtuple
   >>> from itertools import count

245
   >>> class Node(namedtuple('_Node', 'cost tiebreaker data parent depth')):
246 247 248 249 250 251 252 253
   ...     """Represent nodes in a graph traversal. Suitable for use with e.g. heapq."""
   ...
   ...     __slots__ = ()
   ...     _counter = count()  # break ties between equal-cost nodes, avoid comparing data
   ...
   ...     # Give call sites a cleaner API for creating new Nodes
   ...     def __new__(cls, cost, data, parent=None):
   ...         tiebreaker = next(cls._counter)
254 255
   ...         depth = parent.depth + 1 if parent else 0
   ...         return super(Node, cls).__new__(cls, cost, tiebreaker, data, parent, depth)
256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272
   ...
   ...     def __repr__(self):
   ...         return 'Node(cost={cost}, data={data!r})'.format(**self._asdict())

   >>> start = Node(cost=0, data='foo')
   >>> child = Node(cost=5, data='bar', parent=start)
   >>> child
   Node(cost=5, data='bar')
   >>> child.parent
   Node(cost=0, data='foo')
   >>> child.depth
   1


:func:`~collections.namedtuple`-style dynamic class generation
==============================================================

273 274 275
See the `implementation
<https://github.com/jab/bidict/blob/master/bidict/_named.py>`__
of :func:`~bidict.namedbidict`.
276 277 278 279 280


API Design
==========

281
How to deeply integrate with Python's :mod:`collections` and other built-in APIs?
282

283 284 285 286
- Beyond implementing :class:`collections.abc.Mapping`,
  bidicts implement additional APIs
  that :class:`dict` and :class:`~collections.OrderedDict` implement
  (e.g. :func:`setdefault`, :func:`popitem`, etc.).
287

288 289 290 291
  - When creating a new API, making it familiar, memorable, and intuitive
    is hugely important to a good user experience.

- Thanks to :class:`~collections.abc.Hashable`'s
292
  implementing :meth:`abc.ABCMeta.__subclasshook__`,
293
  any class that implements the required methods of the
294 295 296 297 298 299 300
  :class:`~collections.abc.Hashable` interface
  (namely, :meth:`~collections.abc.Hashable.__hash__`)
  makes it a virtual subclass already, no need to explicitly extend.
  I.e. As long as ``Foo`` implements a ``__hash__()`` method,
  ``issubclass(Foo, Hashable)`` will always be True,
  no need to explicitly subclass via ``class Foo(Hashable): ...``

301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326
- How to make your own open ABC like :class:`~collections.abc.Hashable`,
  i.e. how does :class:`~bidict.BidirectionalMapping` work?

  - Override :meth:`~abc.ABCMeta.__subclasshook__`
    to check for the interface you require.
    See the `implementation
    <https://github.com/jab/bidict/blob/master/bidict/_abc.py#L10>`__
    of :class:`~bidict.BidirectionalMapping`.

  - Interesting consequence of the ``__subclasshook__()`` design:
    the "subclass" relation becomes intransitive.
    e.g. :class:`object` is a subclass of :class:`~collections.abc.Hashable`,
    :class:`list` is a subclass of :class:`object`,
    but :class:`list` is not a subclass of :class:`~collections.abc.Hashable`.

- What if we needed to add a second metaclass
  in addition to :class:`~bidict.BidirectionalMapping`
  (e.g. to conditionally implement an optimized version of some methods
  based on the type of ``_fwmd_cls``,
  as ``_delegating_mixins.py`` currently does without a metaclass)?
  Would have to be careful to avoid
  "TypeError: metaclass conflict: the metaclass of a derived class
  must be a (non-strict) subclass of the metaclasses of all its bases".
  See the great write-up in
  https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/.

327 328 329 330 331 332 333 334 335 336 337 338 339
- :class:`collections.abc.Mapping` and
  :class:`collections.abc.MutableMapping`
  don't implement :meth:`~abc.ABCMeta.__subclasshook__`,
  so must either explicitly subclass
  (if you want to inherit any of their implementations)
  or use :meth:`abc.ABCMeta.register`
  (to register as a virtual subclass without inheriting any implementation)

- Notice we have :class:`collections.abc.Reversible`
  but no ``collections.abc.Ordered`` or ``collections.abc.OrderedMapping``.
  Proposed in `bpo-28912 <https://bugs.python.org/issue28912>`__ but rejected.
  Would have been useful for bidict's ``__repr__()`` implementation (see ``_base.py``),
  and potentially for interop with other ordered mapping implementations
340
  such as `SortedDict <http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html>`__.
341

342
How to make APIs Pythonic?
343

344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412
- See the `Zen of Python <https://www.python.org/dev/peps/pep-0020/>`__.

- "Errors should never pass silently.

  Unless explicitly silenced.

  In the face of ambiguity, refuse the temptation to guess."

  Manifested in bidict's default duplication policies.

- "Readability counts."

  "There should be one – and preferably only one – obvious way to do it."

  An early version of bidict allowed using the ``~`` operator to access ``.inverse``
  and a special slice syntax like ``b[:val]`` to look up a key by value,
  but these were removed in preference to the more obvious and readable
  ``.inverse``-based spellings.


Python's data model
===================

- What happens when you implement a custom :meth:`~object.__eq__`?
  e.g. What's the difference between ``a == b`` and ``b == a``
  when only ``a`` is an instance of your class?
  See the great write-up in https://eev.ee/blog/2012/03/24/python-faq-equality/
  for the answer.

- If an instance of your special mapping type
  is being compared against a mapping of some foreign mapping type
  that contains the same items,
  should your ``__eq__()`` method return true?

  Bidict says yes, again based on the `Liskov substitution principle
  <https://en.wikipedia.org/wiki/Liskov_substitution_principle>`__.
  Only returning true when the types matched exactly would violate this.
  And returning :obj:`NotImplemented` would cause Python to fall back on
  using identity comparison, which is not what is being asked for.

  (Just for fun, suppose you did only return true when the types matched exactly,
  and suppose your special mapping type were also hashable.
  Would it be worth having your ``__hash__()`` method include your type
  as well as your items?
  The only point would be to reduce collisions when multiple instances of different
  types contained the same items
  and were going to be inserted into the same :class:`dict` or :class:`set`,
  since they'd now be unequal but would hash to the same value otherwise.)

- Making an immutable type hashable
  (so it can be inserted into :class:`dict`\s and :class:`set`\s):
  Must implement :meth:`~object.__hash__` such that
  ``a == b ⇒ hash(a) == hash(b)``.
  See the :meth:`object.__hash__` and :meth:`object.__eq__` docs, and
  the `implementation <https://github.com/jab/bidict/blob/master/bidict/_frozenbidict.py#L10>`__
  of :class:`~bidict.frozenbidict`.

  - Consider :class:`~bidict.FrozenOrderedBidict`:
    its :meth:`~bidict.FrozenOrderedBidict.__eq__`
    is :ref:`order-insensitive <eq-order-insensitive>`.
    So all contained items must participate in the hash order-insensitively.

  - Can use `collections.abc.Set._hash <https://github.com/python/cpython/blob/a0374d/Lib/_collections_abc.py#L521>`__
    which provides a pure Python implementation of the same hash algorithm
    used to hash :class:`frozenset`\s.
    (Since :class:`~collections.abc.ItemsView` extends
    :class:`~collections.abc.Set`,
    :meth:`bidict.frozenbidict.__hash__`
    just calls ``ItemsView(self)._hash()``.)
413

414
    - Does this argue for making :meth:`collections.abc.Set._hash` non-private?
415

416 417 418 419
    - Why isn't the C implementation of this algorithm directly exposed in
      CPython? The only way to use it is to call ``hash(frozenset(self.items()))``,
      which wastes memory allocating the ephemeral frozenset,
      and time copying all the items into it before they're hashed.
420

421 422 423 424 425 426 427 428
  - Unlike other attributes, if a class implements ``__hash__()``,
    any subclasses of that class will not inherit it.
    It's like Python implicitly adds ``__hash__ = None`` to the body
    of every class that doesn't explicitly define ``__hash__``.
    So if you do want a subclass to inherit a base class's ``__hash__()``
    implementation, you have to set that manually,
    e.g. by adding ``__hash__ = BaseClass.__hash__`` in the class body.
    See :class:`~bidict.FrozenOrderedBidict`.
429

430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449
    This is consistent with the fact that
    :class:`object` implements ``__hash__()``,
    but subclasses of :class:`object`
    that override :meth:`~object.__eq__`
    are not hashable by default.

- Using :meth:`~object.__new__` to bypass default object initialization,
  e.g. for better :meth:`~bidict.bidict.copy` performance.
  See `_base.py <https://github.com/jab/bidict/blob/master/bidict/_bidict.py#L10>`__.

- Overriding :meth:`object.__getattribute__` for custom attribute lookup.
  See :ref:`extending:Sorted Bidict Recipes`.

- Using
  :meth:`object.__getstate__`,
  :meth:`object.__setstate__`, and
  :meth:`object.__reduce__` to make an object pickleable
  that otherwise wouldn't be,
  due to e.g. using weakrefs,
  as bidicts do (covered further below).
450 451 452 453 454 455 456


Portability
===========

- Python 2 vs. Python 3

457
  - As affects bidict, mostly :class:`dict` API changes,
458 459
    but also functions like :func:`zip`, :func:`map`, :func:`filter`, etc.

460
  - See the :meth:`~object.__ne__` gotcha for Python 2 above.
461

462
  - Borrowing methods from other classes:
463 464 465 466 467

    In Python 2, must grab the ``.im_func`` / ``__func__``
    attribute off the borrowed method to avoid getting
    ``TypeError: unbound method ...() must be called with ... instance as first argument``

468 469
    See the `implementation <https://github.com/jab/bidict/blob/master/bidict/_frozenordered.py#L10>`__
    of :class:`~bidict.FrozenOrderedBidict`.
470 471 472 473 474 475 476 477 478

- CPython vs. PyPy

  - gc / weakref

  - primitives' identities, nan, etc.

    - https://bitbucket.org/pypy/pypy/src/dafacc4/pypy/doc/cpython_differences.rst?mode=view

479 480 481 482
    - Hence ``test_no_reference_cycles()``
      in `test_properties.py
      <https://github.com/jab/bidict/blob/master/tests/properties/test_properties.py>`__
      is skipped on PyPy.
483 484


485 486
Other interesting stuff in the standard library
===============================================
487

488 489 490 491 492
- :mod:`reprlib` and :func:`reprlib.recursive_repr`
  (but not needed for bidict because there's no way to insert a bidict into itself)
- :func:`operator.methodcaller`
- :attr:`platform.python_implementation`
- See :ref:`addendum:Missing bidicts in Stdlib!`
493 494 495 496 497


Tools
=====

498
See the :ref:`Thanks <thanks:Projects>` page for some of the fantastic tools
499
for software verification, performance, code quality, etc.
500
that bidict has provided an excuse to play with and learn.