Programming Python (180 page)

Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

BOOK: Programming Python
9.56Mb size Format: txt, pdf, ePub
Customization: Performance Monitors

So far we’ve seen how
classes support multiple instances and integrate better
with Python’s object model by defining operator methods. One of the
other main reasons for using classes is to allow for future extensions
and customizations. By implementing stacks with a class, we can later
add subclasses that specialize the implementation for new demands. In
fact, this is often the main reason for using a custom class instead of
a built-in alternative.

For instance, suppose we’ve started using the
Stack
class in
Example 18-2
, but we start running
into performance problems. One way to isolate bottlenecks is to
instrument data structures with logic that keeps track of usage
statistics, which we can analyze after running client applications.
Because
Stack
is a class, we can add
such logic in a new subclass without affecting the original stack module
(or its clients). The subclass in
Example 18-3
extends
Stack
to keep track of overall push/pop usage
frequencies and to record the maximum size of each instance.

Example 18-3. PP4E\Dstruct\Basic\stacklog.py

"customize stack for usage data"
from stack2 import Stack # extends imported Stack
class StackLog(Stack): # count pushes/pops, max-size
pushes = pops = 0 # shared/static class members
def __init__(self, start=[]): # could also be module vars
self.maxlen = 0
Stack.__init__(self, start)
def push(self, object):
Stack.push(self, object) # do real push
StackLog.pushes += 1 # overall stats
self.maxlen = max(self.maxlen, len(self)) # per-instance stats
def pop(self):
StackLog.pops += 1 # overall counts
return Stack.pop(self) # not 'self.pops': instance
def stats(self):
return self.maxlen, self.pushes, self.pops # get counts from instance

This subclass works the same as the original
Stack
; it just adds monitoring logic. The new
stats
method is used to get a
statistics tuple through an instance:

>>>
from stacklog import StackLog
>>>
x = StackLog()
>>>
y = StackLog()
# make two stack objects
>>>
for i in range(3): x.push(i)
# and push object on them
...
>>>
for c in 'spam': y.push(c)
...
>>>
x, y
# run inherited __repr__
([Stack:[2, 1, 0]], [Stack:['m', 'a', 'p', 's']])
>>>
x.stats(), y.stats()
((3, 7, 0), (4, 7, 0))
>>>
>>>
y.pop(), x.pop()
('m', 2)
>>>
x.stats(), y.stats()
# my maxlen, all pushes, all pops
((3, 7, 2), (4, 7, 2))

Notice the use of
class
attributes to record
overall pushes and pops, and
instance
attributes
for per-instance maximum length. By hanging attributes on different
objects, we can expand or narrow their scopes.

Optimization: Tuple Tree Stacks

One of the nice things
about wrapping objects up in classes is that you are free
to change the underlying implementation without breaking the rest of
your program. Optimizations can be added in the future, for instance,
with minimal impact; the interface is unchanged, even if the internals
are. There are a variety of ways to implement stacks, some more
efficient than others. So far, our stacks have used slicing and extended
sequence assignment to implement pushing and popping. This method is
relatively inefficient: both operations make copies of the wrapped list
object. For large stacks, this practice can add a significant time
penalty.

One way to speed up such code is to change the underlying data
structure completely. For example, we can store the stacked objects in a
binary tree of tuples: each item may be recorded as a pair,
(object, tree)
, where
object
is the stacked item and
tree
is either another tuple pair giving the
rest of the stack or
None
to
designate an empty stack. A stack of items
[1,2,3,4]
would be internally stored as a
tuple tree
(1,(2,(3,(4,None))))
.

This tuple-based representation is similar to the notion of
“cons-cells” in Lisp-family languages: the object on the left is the
car
, and the rest of the tree on the
right is the
cdr
. Because we add or
remove only a top tuple to push and pop items, this structure avoids
copying the entire stack. For large stacks, the benefit might be
significant. The next class, shown in
Example 18-4
, implements these
ideas.

Example 18-4. PP4E\Dstruct\Basic\stack3.py

"optimize with tuple pair trees"
class Stack:
def __init__(self, start=[]): # init from any sequence
self.stack = None # even other (fast)stacks
for i in range(-len(start), 0):
self.push(start[-i - 1]) # push in reverse order
def push(self, node): # grow tree 'up/left'
self.stack = node, self.stack # new root tuple: (node, tree)
def pop(self):
node, self.stack = self.stack # remove root tuple
return node # TypeError if empty
def empty(self):
return not self.stack # is it 'None'?
def __len__(self): # on: len, not
len, tree = 0, self.stack
while tree:
len, tree = len+1, tree[1] # visit right subtrees
return len
def __getitem__(self, index): # on: x[i], in, for
len, tree = 0, self.stack
while len < index and tree: # visit/count nodes
len, tree = len+1, tree[1]
if tree:
return tree[0] # IndexError if out-of-bounds
else:
raise IndexError() # so 'in' and 'for' stop
def __repr__(self):
return '[FastStack:' + repr(self.stack) + ']'

This class’s
__getitem__
method
handles indexing,
in
tests, and
for
loop iteration as before (when no
__iter__
is defined), but this
version has to traverse a tree to find a node by index. Notice that this
isn’t a subclass of the original
Stack
class. Since nearly every operation is
implemented differently here, inheritance won’t really help. But clients
that restrict themselves to the operations that are common to both
classes can still use them interchangeably—they just need to import a
stack class from a different module to switch implementations. Here’s a
session with this stack version; as long as we stick to pushing,
popping, indexing, and iterating, this version is essentially
indistinguishable from the
original:

>>>
from stack3 import Stack
>>>
x = Stack()
>>>
y = Stack()
>>>
for c in 'spam': x.push(c)
...
>>>
for i in range(3): y.push(i)
...
>>>
x
[FastStack:('m', ('a', ('p', ('s', None))))]
>>>
y
[FastStack:(2, (1, (0, None)))]
>>>
len(x), x[2], x[-1]
(4, 'p', 'm')
>>>
x.pop()
'm'
>>>
x
[FastStack:('a', ('p', ('s', None)))]
>>>
>>>
while y: print(y.pop(), end=' ')
...
2 1 0 >>>
Optimization: In-Place List Modifications

The last section
tried to speed up pushes and pops with a different data
structure, but we might also be able to speed up our stack object by
falling back on the mutability of Python’s list object. Because lists
can be changed in place, they can be modified more quickly than any of
the prior examples. In-place change operations such as
append
are prone to complications when a list
is referenced from more than one place. But because the list inside the
stack object isn’t meant to be used directly, we’re probably
safe.

The module in
Example 18-5
shows one way to
implement a stack with in-place changes; some operator overloading
methods have been dropped to keep this simple. The Python
pop
method it uses is equivalent to indexing
and deleting the item at
offset −1
(top is end-of-list here). Compared to using built-in lists directly,
this class incurs some performance degradation for the extra method
calls, but it supports future changes better by encapsulating stack
operations.

Example 18-5. PP4E\Dstruct\Basic\stack4.py

"optimize with in-place list operations"
class error(Exception): pass # when imported: local exception
class Stack:
def __init__(self, start=[]): # self is the instance object
self.stack = [] # start is any sequence: stack...
for x in start: self.push(x)
def push(self, obj): # methods: like module + self
self.stack.append(obj) # top is end of list
def pop(self):
if not self.stack: raise error('underflow')
return self.stack.pop() # like fetch and delete stack[-1]
def top(self):
if not self.stack: raise error('underflow')
return self.stack[-1]
def empty(self):
return not self.stack # instance.empty()
def __len__(self):
return len(self.stack) # len(instance), not instance
def __getitem__(self, offset):
return self.stack[offset] # instance[offset], in, for
def __repr__(self):
return '[Stack:%s]' % self.stack

This version works like the original in module
stack2
, too; just replace
stack2
with
stack4
in the previous interaction to get a
feel for its operation. The only obvious difference is that stack items
are in reverse when printed (i.e., the top is
the end):

>>>
from stack4 import Stack
>>>
x = Stack()
>>>
x.push('spam')
>>>
x.push(123)
>>>
x
[Stack:['spam', 123]]
>>>
>>>
y = Stack()
>>>
y.push(3.1415)
>>>
y.push(x.pop())
>>>
x, y
([Stack:['spam']], [Stack:[3.1415, 123]])
>>>
y.top()
123
Timing the Improvements

The prior
section’s in-place changes stack object probably runs
faster than both the original and the tuple-tree versions, but the only
way to really be sure is to time the alternative
implementations.
[
70
]
Since this could be something we’ll want to do more than
once, let’s first define a general module for timing functions in
Python. In
Example 18-6
, the
built-in
time
module provides a
clock
function that we can use to get
the current CPU time in floating-point seconds, and the function
timer.test
simply calls a function
reps
times and returns the number of
elapsed seconds by subtracting stop from start times.

Example 18-6. PP4E\Dstruct\Basic\timer.py

"generic code timer tool"
def test(reps, func, *args): # or best of N? see Learning Python
import time
start = time.clock() # current CPU time in float seconds
for i in range(reps): # call function reps times
func(*args) # discard any return value
return time.clock() - start # stop time - start time

There are other ways to time code, including a best-of-N approach
and Python’s own
timeit
module, but
this module suffices for our purpose here. If you’re interested in doing
better, see
Learning
Python
, Fourth Edition, for a larger case study on
this topic, or experiment on your own.

Next, we define a test driver script which deploys the timer, in
Example 18-7
. It expects
three command-line arguments: the number of pushes, pops, and indexing
operations to perform (we’ll vary these arguments to test different
scenarios). When run at the top level, the script creates 200 instances
of the original and optimized stack classes and performs the specified
number of operations on each stack. Pushes and pops change the stack;
indexing just accesses it.

Example 18-7. PP4E\Dstruct\Basic\stacktime.py

"compare performance of stack alternatives"
import stack2 # list-based stacks: [x]+y
import stack3 # tuple-tree stacks: (x,y)
import stack4 # in-place stacks: y.append(x)
import timer # general function timer utility
rept = 200
from sys import argv
pushes, pops, items = (int(arg) for arg in argv[1:])
def stackops(stackClass):
x = stackClass('spam') # make a stack object
for i in range(pushes): x.push(i) # exercise its methods
for i in range(items): t = x[i] # 3.X: range generator
for i in range(pops): x.pop()
# or mod = __import__(n)
for mod in (stack2, stack3, stack4): # rept*(push+pop+ix)
print('%s:' % mod.__name__, end=' ')
print(timer.test(rept, stackops, getattr(mod, 'Stack')))
Results under Python 3.1

The following are some of the timings reported by the test
driver script. The three outputs represent the measured run times in
seconds for the original, tuple, and in-place stacks. For each stack
type, the first test creates 200 stack objects and performs roughly
120,000 stack operations (200 repetitions × (200 pushes + 200 indexes
+ 200 pops)) in the test duration times listed. These results were
obtained on a fairly slow Windows 7 netbook laptop under Python 3.1;
as usual for benchmarks, your mileage will probably vary.

C:\...\PP4E\Dstruct\Basic>
python stacktime.py 200 200 200
stack2: 0.838853884098
stack3: 2.52424649244
stack4: 0.215801718938
C:\...\PP4E\Dstruct\Basic>
python stacktime.py 200 50 200
stack2: 0.775219065818
stack3: 2.539294115
stack4: 0.156989574341
C:\...\PP4E\Dstruct\Basic>
python stacktime.py 200 200 50
stack2: 0.743521212289
stack3: 0.286850521181
stack4: 0.156262000363
C:\...\PP4E\Dstruct\Basic>
python stacktime.py 200 200 0
stack2: 0.721035029026
stack3: 0.116366779208
stack4: 0.141471921584

If you look closely enough, you’ll notice that the results show
that the tuple-based stack (
stack3
)
performs better when we do more pushing and popping, but worse if we
do much indexing. Indexing lists is extremely fast for built-in lists
(
stack2
and
stack4
), but very slow for tuple trees—the
Python class must traverse the tree manually.

The in-place change stacks (
stack4
) are
almost
always fastest, unless no indexing is done at all—tuples (
stack3
) win by a hair in the last test case.
When there is no indexing, as in the last test, the tuple and in-place
change stacks are roughly six and five times quicker than the simple
list-based stack, respectively. Since pushes and pops are most of what
clients would normally do to a stack, tuples are a contender here,
despite their poor indexing performance.

Of course, we’re talking about fractions of a second after many
tens of thousands of operations; in many applications, your users
probably won’t care either way. If you access a stack millions of
times in your program, though, this difference may accumulate to a
significant amount of time.

More on performance analysis

Two last notes on
performance here. Although absolute times have changed
over the years with new Pythons and test machines, these results have
remained relatively the same. That is, tuple-based stacks win when no
indexing is performed. All performance measurements in a dynamic
language like Python are prone to change over time, though, so run
such tests on your own for more accurate results.

Second, there’s often more to performance measurement than
timing alternatives this way. For a more complete picture, read about
Python’s standard library
profile
module (and its optimized workalike,
cProfile
). The profilers run Python code,
collect performance data along the way, and provide it in a report on
code exit. It’s the most complete way to isolate your code’s
bottleneck, before you start working on optimizing with better coding,
algorithms, and data structures or moving portions to the
C
language
.

For simple performance analysis, though, our timing module
provides the data we need. In fact, we’ll reuse it to measure a more
dramatic improvement in relative speed for set implementation
alternatives—the topic of the next
section.

Other books

The Wives of Los Alamos by Nesbit, Tarashea
Jubilee by Eliza Graham
Let Down Your Hair by Fiona Price
A Vampire's Promise by Carla Susan Smith
The Fall: Victim Zero by Joshua Guess
Singled Out by Simon Brett