Programming Python (182 page)

Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

BOOK: Programming Python
5.79Mb size Format: txt, pdf, ePub
Adding Relational Algebra to Sets (External)

If you are
interested in studying additional set-like operations
coded in Python, see the following files in this book’s examples
distribution:

PP4E\Dstruct\Basic\rset.py

RSet
implementation

PP4E\Dstruct\Basic\reltest.py

Test script for
RSet
; its
expected output is in
reltest.results.txt

The
RSet
subclass defined in
rset.py
adds basic relational algebra operations
for sets of dictionaries. It assumes the items in sets are mappings
(rows), with one entry per column (field).
RSet
inherits all the original
Set
operations (iteration, intersection,
union,
&
and
|
operators, uniqueness filtering, and so on),
and adds new operations as
methods
:

Select

Return a set of nodes that have a field equal to a given
value.

Bagof

Collect set nodes that satisfy an expression string.

Find

Select tuples according to a comparison, field, and
value.

Match

Find nodes in two sets with the same values for common
fields.

Product

Compute a Cartesian product: concatenate tuples from two
sets.

Join

Combine tuples from two sets that have the same value for a
field.

Project

Extract named fields from the tuples in a table.

Difference

Remove one set’s tuples from another.

These operations go beyond the tools provided by Python’s built-in
set object, and are a prime example of why you may wish to implement a
custom set type in the first place. Although I have ported this code to
run under Python 3.X, I have not revisited it in any sort of depth for
this edition, because today I would probably prefer to implement it as a
subclass of the built-in set type, rather than a part of a proprietary
set implementation. Coincidentally, that leads us to our next
topic.

Subclassing Built-in Types

Before we
move on to other classical data structures, there is one
more twist in the stack and set story. In recent Python releases, it is
also possible to subclass built-in datatypes such as lists and
dictionaries, in order to extend them. That is, because datatypes are now
themselves customizable classes, we can code unique datatypes that are
extensions of built-ins, with subclasses that inherit built-in tool sets.
This is especially true in Python 3.X, where “type” and “class” have
become veritable synonyms altogether.

To demonstrate,
Example 18-13
shows the main parts
of a module containing variants of our stack and set objects coded in the
prior sections, revised as customized lists. For variety, the set union
method has also been simplified slightly here to remove a redundant
loop.

Example 18-13. PP4E\Dstruct\Basic\typesubclass.py

"customize built-in types to extend, instead of starting from scratch"
class Stack(list):
"a list with extra methods"
def top(self):
return self[-1]
def push(self, item):
list.append(self, item)
def pop(self):
if not self:
return None # avoid exception
else:
return list.pop(self)
class Set(list):
" a list with extra methods and operators"
def __init__(self, value=[]): # on object creation
list.__init__(self)
self.concat(value)
def intersect(self, other): # other is any sequence type
res = [] # self is the instance subject
for x in self:
if x in other:
res.append(x)
return Set(res) # return a new Set
def union(self, other):
res = Set(self) # new set with a copy of my list
res.concat(other) # insert uniques from other
return res
def concat(self, value): # value: a list, string, Set...
for x in value: # filters out duplicates
if not x in self:
self.append(x)
# len, getitem, iter inherited, use list repr
def __and__(self, other): return self.intersect(other)
def __or__(self, other): return self.union(other)
def __str__(self): return ''
class FastSet(dict):
pass # this doesn't simplify much
...self-test code omitted: see examples package file...

The stack and set implemented in this code are essentially like
those we saw earlier, but instead of embedding and managing a list, these
objects really are customized lists. They add a few additional methods,
but they inherit all of the list object’s functionality.

This can reduce the amount of wrapper code required, but it can also
expose functionality that might not be appropriate in some cases. As
coded, for example, we’re able to sort and insert into stacks and reverse
a set, because we’ve inherited these methods from the built-in list
object. In most cases, such operations don’t make sense for these data
structures, and barring extra code that disables such nonfeatures, the
wrapper class approach of the prior sections may still be
preferred.

For more on the class subtype classes, see the remainder of their
implementation file in the examples package for self-test code and its
expected output. Because these objects are used in the same way as our
original stacks and sets, interacting with them is left as suggested
exercise here.

Subclassing built-in types has other applications, which may be more
useful than those demonstrated by the preceding code. Consider a queue, or
ordered dictionary, for example. The queue could take the form of a list
subclass with get and put methods to insert on one end and delete from the
other; the dictionary could be coded as a dictionary subclass with an
extra list of keys that is sorted on insertion or request. While this
scheme works well for types that resemble built-ins, though, type
subclasses may not address data structures of radically different
form—like those of the next two
sections
.

Binary Search Trees

Binary trees are a
data structure that impose an order on inserted nodes: items
less than a node are stored in the left subtree, and items greater than a
node are inserted in the right. At the bottom, the subtrees are empty.
Because of this structure, binary trees naturally support quick, recursive
traversals, and hence fast lookup and search in a wide variety of
applications—at least ideally, every time you follow a link to a subtree,
you divide the search space in half.

Built-in Options

Here too, Python
supports search operations with built-in tools.
Dictionaries, for example, already provide a highly optimized, C-coded
search table tool. In fact, indexing a dictionary by key directly is
likely to be faster than searching a Python-coded
equivalent
:

>>>
x = {}
# empty dict
>>>
for i in [3, 1, 9, 2, 7]: x[i] = None
# insert
...
>>>
x
{7: None, 1: None, 2: None, 3: None, 9: None}
>>>
>>>
for i in range(8): print((i, i in x), end=' ')
# lookup
...
(0, False) (1, True) (2, True) (3, True) (4, False) (5, False) (6, False) (7, True)

Because dictionaries are built into the language, they are always
available and will usually be faster than Python-based data structure
implementations. Built-in sets can often offer similar functionality—in
fact, it’s not too much of an abstraction to think of sets as valueless
dictionaries:

>>>
x = set()
# empty set
>>>
for i in [3, 1, 9, 2, 7]: x.add(i)
# insert
...
>>>
x
{7, 1, 2, 3, 9}
>>>
for i in range(8): print((i, i in x), end=' ')
# lookup
...
(0, False) (1, True) (2, True) (3, True) (4, False) (5, False) (6, False) (7, True)

In fact, there are a variety of ways to insert items into both
sets and dictionaries; both are useful for checking if a key is stored,
but dictionaries further allow search keys to have associated
values:

>>>
v = [3, 1, 9]
>>>
{k for k in v}
# set comprehension
{1, 3, 9}
>>>
set(v)
# set constructor
{1, 3, 9}
>>>
{k: k+100 for k in v}
# dict comprehension
{1: 101, 3: 103, 9: 109}
>>>
dict(zip(v, [99] * len(v)))
# dict constructor
{1: 99, 3: 99, 9: 99}
>>>
dict.fromkeys(v, 99)
# dict method
{1: 99, 3: 99, 9: 99}

So why bother with a custom search data structure implementation
here, given such flexible built-ins? In some applications, you might
not, but here especially, a custom implementation often makes sense to
allow for customized tree algorithms. For instance, custom tree
balancing can help speed lookups in pathological cases, and might
outperform the generalized hashing algorithms used in dictionaries and
sets. Moreover, the same motivations we gave for custom stacks and sets
apply here as well—by encapsulating tree access in class-based
interfaces, we support future extension and change in more manageable
ways.

Implementing Binary Trees

Binary trees are
named for the implied branch-like structure of their
subtree links. Typically, their nodes are implemented as a triple of
values:
(LeftSubtree, NodeValue,
RightSubtree)
. Beyond that, there is fairly wide latitude in
the tree implementation. Here we’ll use a class-based approach:

  • BinaryTree
    is a header
    object, which initializes and manages the actual tree.

  • EmptyNode
    is the empty
    object, shared at all empty subtrees (at the bottom).

  • BinaryNode
    objects are
    nonempty tree nodes with a value and two subtrees.

Instead of coding distinct search functions, binary trees are
constructed with “smart” objects—class instances that know how to handle
insert/lookup and printing requests and pass them to subtree objects. In
fact, this is another example of the object-oriented programming (OOP)
composition relationship in action: tree nodes embed other tree nodes
and pass search requests off to the embedded subtrees. A single empty
class instance is shared by all empty subtrees in a binary tree, and
inserts replace an
EmptyNode
with a
BinaryNode
at the bottom.
Example 18-14
shows what this
means in code.

Example 18-14. PP4E\Dstruct\Classics\btree.py

"a valueless binary search tree"
class BinaryTree:
def __init__(self): self.tree = EmptyNode()
def __repr__(self): return repr(self.tree)
def lookup(self, value): return self.tree.lookup(value)
def insert(self, value): self.tree = self.tree.insert(value)
class EmptyNode:
def __repr__(self):
return '*'
def lookup(self, value): # fail at the bottom
return False
def insert(self, value):
return BinaryNode(self, value, self) # add new node at bottom
class BinaryNode:
def __init__(self, left, value, right):
self.data, self.left, self.right = value, left, right
def lookup(self, value):
if self.data == value:
return True
elif self.data > value:
return self.left.lookup(value) # look in left
else:
return self.right.lookup(value) # look in right
def insert(self, value):
if self.data > value:
self.left = self.left.insert(value) # grow in left
elif self.data < value:
self.right = self.right.insert(value) # grow in right
return self
def __repr__(self):
return ('( %s, %s, %s )' %
(repr(self.left), repr(self.data), repr(self.right)))

As usual,
BinaryTree
can
contain objects of any type that support the expected interface
protocol—here,
>
and
<
comparisons. This includes class
instances with the
__lt__
and
__gt__
methods. Let’s experiment with
this module’s interfaces. The following code stuffs five integers into a
new tree, and then searches for values 0 . . . 7, as we did earlier for
dictionaries and sets:

C:\...\PP4E\Dstruct\Classics>
python
>>>
from btree import BinaryTree
>>>
x = BinaryTree()
>>>
for i in [3, 1, 9, 2, 7]: x.insert(i)
...
>>>
for i in range(8): print((i, x.lookup(i)), end=' ')
...
(0, False) (1, True) (2, True) (3, True) (4, False) (5, False) (6, False) (7, True)

To watch this tree grow, add a
print
call statement after each insert. Tree
nodes print themselves as triples, and empty nodes print as
*
. The result reflects tree nesting:

>>>
y = BinaryTree()
>>>
y
*
>>>
for i in [3, 1, 9, 2, 7]:
...
y.insert(i); print(y)
...
( *, 3, * )
( ( *, 1, * ), 3, * )
( ( *, 1, * ), 3, ( *, 9, * ) )
( ( *, 1, ( *, 2, * ) ), 3, ( *, 9, * ) )
( ( *, 1, ( *, 2, * ) ), 3, ( ( *, 7, * ), 9, * ) )

At the end of this chapter, we’ll see another way to visualize
such trees in a GUI named PyTree (you’re invited to flip ahead now if
you prefer). Node values in this tree object can be any comparable
Python object—for instance, here is a tree of strings:

>>>
z = BinaryTree()
>>>
for c in 'badce': z.insert(c)
...
>>>
z
( ( *, 'a', * ), 'b', ( ( *, 'c', * ), 'd', ( *, 'e', * ) ) )
>>>
z = BinaryTree()
>>>
for c in 'abcde': z.insert(c)
...
>>>
z
( *, 'a', ( *, 'b', ( *, 'c', ( *, 'd', ( *, 'e', * ) ) ) ) )
>>>
z = BinaryTree()
>>>
for c in 'edcba': z.insert(c)
...
>>>
z
( ( ( ( ( *, 'a', * ), 'b', * ), 'c', * ), 'd', * ), 'e', * )

Notice the last tests here: if items inserted into a binary tree
are already ordered, you wind up with a
linear
structure and lose the search-space partitioning magic of binary trees
(the tree grows in right or left branches only). This is a worst-case
scenario, and binary trees generally do a good job of dividing values in
practice. But if you are interested in pursuing this topic further, see
a data structures text for tree-balancing techniques that automatically
keep the tree as dense as possible but are beyond our scope here.

Trees with Both Keys and Values

Also note that to keep the code simple, these trees store a value
only and lookups return a true or false result. In practice, you
sometimes may want to store both a key and an associated value (or even
more) at each tree node.
Example 18-15
shows what such a
tree object looks like, for any prospective lumberjacks in the
audience.

Example 18-15. PP4E\Dstruct\Classics\btreevals.py

"a binary search tree with values for stored keys"
class KeyedBinaryTree:
def __init__(self): self.tree = EmptyNode()
def __repr__(self): return repr(self.tree)
def lookup(self, key): return self.tree.lookup(key)
def insert(self, key, val): self.tree = self.tree.insert(key, val)
class EmptyNode:
def __repr__(self):
return '*'
def lookup(self, key): # fail at the bottom
return None
def insert(self, key, val):
return BinaryNode(self, key, val, self) # add node at bottom
class BinaryNode:
def __init__(self, left, key, val, right):
self.key, self.val = key, val
self.left, self.right = left, right
def lookup(self, key):
if self.key == key:
return self.val
elif self.key > key:
return self.left.lookup(key) # look in left
else:
return self.right.lookup(key) # look in right
def insert(self, key, val):
if self.key == key:
self.val = val
elif self.key > key:
self.left = self.left.insert(key, val) # grow in left
elif self.key < key:
self.right = self.right.insert(key, val) # grow in right
return self
def __repr__(self):
return ('( %s, %s=%s, %s )' %
(repr(self.left), repr(self.key), repr(self.val), repr(self.right)))
if __name__ == '__main__':
t = KeyedBinaryTree()
for (key, val) in [('bbb', 1), ('aaa', 2), ('ccc', 3)]:
t.insert(key, val)
print(t)
print(t.lookup('aaa'), t.lookup('ccc'))
t.insert('ddd', 4)
t.insert('aaa', 5) # changes key's value
print(t)

The following shows this script’s self-test code at work; nodes
simply have more content this time around:

C:\...\PP4E\Dstruct\Classics>
python btreevals.py
( ( *, 'aaa'=2, * ), 'bbb'=1, ( *, 'ccc'=3, * ) )
2 3
( ( *, 'aaa'=5, * ), 'bbb'=1, ( *, 'ccc'=3, ( *, 'ddd'=4, * ) ) )

In fact, the effect is similar to the keys and values of a
built-in dictionary, but a custom tree structure like this might support
custom use cases and algorithms, as well as code evolution, more
robustly. To see a data structure that departs even further from the
built-in gang, though, we need to move on to the next
section.

Other books

The Complaints by Ian Rankin
Margaret Brownley by A Vision of Lucy
Stringer by Anjan Sundaram
The Nobodies Album by Carolyn Parkhurst
The King’s Sister by Anne O’Brien
Gold Diggers by Tasmina Perry
The Alien's Captive by Ruth Anne Scott
Primeras canciones by Federico García Lorca