Programming Python (6 page)

Authors: Mark Lutz

Using Per-Record Pickle Files

As mentioned earlier, one potential disadvantage of this section’s
examples so far is that they may become slow for very large databases:
because the entire database must be loaded and rewritten to update a
single record, this approach can waste time. We could improve on this by
storing each record in the database in a separate flat file. The next
three examples show one way to do so; Example 1-8 stores each record in
its own flat file, using each record’s original key as its filename with
a .pkl extension appended (it creates the files bob.pkl, sue.pkl, and
tom.pkl in the current working directory).

Example 1-8. PP4E\Preview\make_db_pickle_recs.py

from initdata import bob, sue, tom
import pickle
for (key, record) in [('bob', bob), ('tom', tom), ('sue', sue)]:
    recfile = open(key + '.pkl', 'wb')
    pickle.dump(record, recfile)
    recfile.close()

Next, Example 1-9 dumps the entire database by using the standard
library’s glob module to do filename expansion and thus collect all the
files in this directory with a .pkl extension. To load a single record,
we open its file and deserialize with pickle; to fetch one record, we
need to load only that record’s file, not the entire database.

Example 1-9. PP4E\Preview\dump_db_pickle_recs.py

import pickle, glob
for filename in glob.glob('*.pkl'):         # for 'bob','sue','tom'
    recfile = open(filename, 'rb')
    record = pickle.load(recfile)
    print(filename, '=>\n ', record)
suefile = open('sue.pkl', 'rb')
print(pickle.load(suefile)['name'])         # fetch sue's name

Finally, Example 1-10 updates the database by fetching a record from its
file, changing it in memory, and then writing it back to its pickle file.
This time, we have to fetch and rewrite only a single record file, not
the full database, to update.

Example 1-10. PP4E\Preview\update_db_pickle_recs.py

import pickle
suefile = open('sue.pkl', 'rb')
sue = pickle.load(suefile)
suefile.close()
sue['pay'] *= 1.10
suefile = open('sue.pkl', 'wb')
pickle.dump(sue, suefile)
suefile.close()

Here are our file-per-record scripts in action; the results are
about the same as in the prior section, but database keys become real
filenames now. In a sense, the filesystem becomes our top-level
dictionary—filenames provide direct access to each
record.

...\PP4E\Preview> python make_db_pickle_recs.py
...\PP4E\Preview> python dump_db_pickle_recs.py
bob.pkl =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue.pkl =>
  {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom.pkl =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones
...\PP4E\Preview> python update_db_pickle_recs.py
...\PP4E\Preview> python dump_db_pickle_recs.py
bob.pkl =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue.pkl =>
  {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom.pkl =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones
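
The fetch-change-rewrite pattern of these three scripts can also be
wrapped in a pair of small helper functions. The following is only a
sketch, not code from the book's examples package: the names
store_record and load_record, the scratch directory, and the sample
record are all assumptions made for illustration. It uses with
statements so that each file is closed automatically.

```python
import os
import pickle
import tempfile

def store_record(dirname, key, record):
    """Pickle one record to its own file, named after its key."""
    path = os.path.join(dirname, key + '.pkl')
    with open(path, 'wb') as recfile:      # file is closed on block exit
        pickle.dump(record, recfile)

def load_record(dirname, key):
    """Load a single record without touching any other record's file."""
    path = os.path.join(dirname, key + '.pkl')
    with open(path, 'rb') as recfile:
        return pickle.load(recfile)

# Demo in a scratch directory (hypothetical data, not the book's initdata)
demo_dir = tempfile.mkdtemp()
store_record(demo_dir, 'sue', {'name': 'Sue Jones', 'pay': 40000})

sue = load_record(demo_dir, 'sue')         # fetch one record
sue['pay'] += 4000                         # change it in memory
store_record(demo_dir, 'sue', sue)         # rewrite only this record's file

print(load_record(demo_dir, 'sue')['pay'])
```

Only sue.pkl is read and written here, no matter how many other record
files the directory might hold.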
Using Shelves

Pickling objects to files, as shown in the preceding section, is an
optimal scheme in many applications. In fact, some applications use
pickling of Python objects across network sockets as a simpler
alternative to network protocols such as the SOAP and XML-RPC web
services architectures (also supported by Python, but much heavier than
pickle).

Moreover, assuming your filesystem can handle as many files as
you’ll need, pickling one record per file also obviates the need to load
and store the entire database for each update. If we really want keyed
access to records, though, the Python standard library offers an even
higher-level tool: shelves.

Shelves automatically pickle objects to and from a keyed-access
filesystem. They behave much like dictionaries that must be opened, and
they persist after each program exits. Because they give us key-based
access to stored records, there is no need to manually manage one flat
file per record—the shelve system automatically splits up stored records
and fetches and updates only those records that are accessed and
changed. In this way, shelves provide utility similar to per-record
pickle files, but they are usually easier to code.

The shelve interface is just as simple as pickle: it is nearly identical
to that of dictionaries, with extra open and close calls. In fact, to
your code, a shelve really does appear to be a persistent dictionary of
persistent objects; Python does all the work of mapping its content to
and from a file. For instance, Example 1-11 shows how to store our
in-memory dictionary objects in a shelve for permanent keeping.

Example 1-11. PP4E\Preview\make_db_shelve.py

from initdata import bob, sue
import shelve
db = shelve.open('people-shelve')
db['bob'] = bob
db['sue'] = sue
db.close()

This script creates one or more files in the current directory with the
name people-shelve as a prefix (in Python 3.1 on Windows,
people-shelve.bak, people-shelve.dat, and people-shelve.dir). You
shouldn’t delete these files (they are your database!), and you should
be sure to use the same base name in other scripts that access the
shelve. Example 1-12, for instance, reopens the shelve and indexes it by
key to fetch its stored records.

Example 1-12. PP4E\Preview\dump_db_shelve.py

import shelve
db = shelve.open('people-shelve')
for key in db:
    print(key, '=>\n ', db[key])
print(db['sue']['name'])
db.close()

We still have a dictionary of dictionaries here, but the top-level
dictionary is really a shelve mapped onto a file. Much happens when you
access a shelve’s keys: it uses pickle internally to serialize and
deserialize objects stored, and it interfaces with a keyed-access
filesystem. From your perspective, though, it’s just a persistent
dictionary. Example 1-13 shows how to code shelve updates.

Example 1-13. PP4E\Preview\update_db_shelve.py

from initdata import tom
import shelve
db = shelve.open('people-shelve')
sue = db['sue'] # fetch sue
sue['pay'] *= 1.50
db['sue'] = sue # update sue
db['tom'] = tom # add a new record
db.close()

Notice how this code fetches sue by key, updates in memory, and then
reassigns to the key to update the shelve; this is a requirement of
shelves by default, but not always of more advanced shelve-like systems
such as ZODB, covered in Chapter 17. As we’ll see later, shelve.open
also has a newer writeback keyword argument, which, if passed True,
causes all records loaded from the shelve to be cached in memory, and
automatically written back to the shelve when it is closed; this avoids
manual write backs on changes, but can consume memory and make closing
slow.
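
To make the difference concrete, here is a small sketch that contrasts
the two modes. It is not one of the book's examples: the scratch file
path and the sample record are assumptions for illustration, and it
relies on the with statement form of shelve.open, which Python supports
as of version 3.4.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo-shelve')   # scratch location

with shelve.open(path) as db:                  # default: writeback=False
    db['sue'] = {'name': 'Sue Jones', 'pay': 40000}
    db['sue']['pay'] = 50000                   # changes a temporary copy only

with shelve.open(path) as db:
    pay_without_writeback = db['sue']['pay']   # still the stored value

with shelve.open(path, writeback=True) as db:  # cache every fetched record
    db['sue']['pay'] = 50000                   # changes the cached object;
                                               # the cache is saved on close
with shelve.open(path) as db:
    pay_with_writeback = db['sue']['pay']

print(pay_without_writeback, pay_with_writeback)
```

Without writeback, the in-place change is silently lost because each
fetch unpickles a fresh copy of the record; with it, the change is
saved when the shelve is closed, at the cost of holding every fetched
record in memory until then.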

Also note how shelve files are explicitly closed. Although we don’t need
to pass mode flags to shelve.open (by default it creates the shelve if
needed, and opens it for reads and writes otherwise), some underlying
keyed-access filesystems may require a close call in order to flush
output buffers after changes.
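
If you want the close call to run even when an exception occurs, one
option is a try/finally statement or, in Python 3.4 and later, a with
statement. The sketch below is illustrative only; the scratch path and
record are made up, and the flag='r' argument simply reopens the
existing shelve for reading, following the flag conventions of the
standard library's dbm.open.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo-shelve')   # scratch location

db = shelve.open(path)            # default flag 'c': create if missing
try:
    db['bob'] = {'name': 'Bob Smith', 'pay': 30000}
finally:
    db.close()                    # flush and release even after an error

with shelve.open(path, flag='r') as db:   # reopen the existing shelve
    keys = sorted(db)                     # closed when the block exits

print(keys)
```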

Finally, here are the shelve-based scripts on the job, creating,
changing, and fetching records. The records are still dictionaries, but
the database is now a dictionary-like shelve which automatically retains
its state in a file between program runs:

...\PP4E\Preview> python make_db_shelve.py
...\PP4E\Preview> python dump_db_shelve.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
Sue Jones
...\PP4E\Preview> python update_db_shelve.py
...\PP4E\Preview> python dump_db_shelve.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 60000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

When we ran the update and dump scripts here, we added a new record for
key tom and increased Sue’s pay field by 50 percent. These changes are
permanent because the record dictionaries are mapped to an external file
by shelve. (In fact, this is a particularly good script for Sue,
something she might consider scheduling to run often, using a cron job
on Unix, or a Startup folder or msconfig entry on Windows…)

What’s in a Name?

Though it’s a surprisingly well-kept secret, Python gets its name from
the 1970s British TV comedy series Monty Python’s Flying Circus.
According to Python folklore, Guido van Rossum, Python’s creator, was
watching reruns of the show at about the same time he needed a name for
a new language he was developing. And as they say in show business, “the
rest is history.”

Because of this heritage, references to the comedy group’s work often
show up in examples and discussion. For instance, the name Brian appears
often in scripts; the words spam, lumberjack, and shrubbery have a
special connotation to Python users; and presentations are sometimes
referred to as The Spanish Inquisition. As a rule, if a Python user
starts using phrases that have no relation to reality, they’re probably
borrowed from the Monty Python series or movies. Some of these phrases
might even pop up in this book. You don’t have to run out and rent The
Meaning of Life or The Holy Grail to do useful work in Python, of
course, but it can’t hurt.

While “Python” turned out to be a distinctive name, it has also had some
interesting side effects. For instance, when the Python newsgroup,
comp.lang.python, came online in 1994, its first few weeks of activity
were almost entirely taken up by people wanting to discuss topics from
the TV show. More recently, a special Python supplement in the Linux
Journal magazine featured photos of Guido garbed in an obligatory “nice
red uniform.”

Python’s news list still receives an occasional post from fans
of the show. For instance, one early poster innocently offered to swap
Monty Python scripts with other fans. Had he known the nature of the
forum, he might have at least mentioned whether they were portable or
not.

Step 3: Stepping Up to OOP

Let’s step back for a moment and consider how far we’ve come. At this
point, we’ve created a database of records: both the shelve and the
per-record pickle file approaches of the prior sections suffice for
basic data storage tasks. As is, our records are represented as simple
dictionaries, which provide easier-to-understand access to fields than
do lists (by key, rather than by position). Dictionaries, however, still
have some limitations that may become more critical as our program grows
over time.

For one thing, there is no central place for us to collect record
processing logic. Extracting last names and giving raises, for instance,
can be accomplished with code like the following:

>>> import shelve
>>> db = shelve.open('people-shelve')
>>> bob = db['bob']
>>> bob['name'].split()[-1]             # get bob's last name
'Smith'
>>> sue = db['sue']
>>> sue['pay'] *= 1.25                  # give sue a raise
>>> sue['pay']
75000.0
>>> db['sue'] = sue
>>> db.close()

This works, and it might suffice for some short programs. But if we
ever need to change the way last names and raises are implemented, we
might have to update this kind of code in many places in our program. In
fact, even finding all such magical code snippets could be a challenge;
hardcoding or cutting and pasting bits of logic redundantly like this in
more than one place will almost always come back to haunt you
eventually.

It would be better to somehow hide (that is, encapsulate) such bits of
code. Functions in a module would allow us to implement such operations
in a single place and thus avoid code redundancy, but still wouldn’t
naturally associate them with the records themselves. What we’d like is
a way to bind processing logic with the data stored in the database in
order to make it easier to understand, debug, and reuse.
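
For comparison, the module-of-functions approach just described might
look something like the following sketch. The function names last_name
and give_raise and the sample records are hypothetical. Notice that the
logic now lives in one place, but nothing ties these functions to the
record dictionaries they expect to receive.

```python
def last_name(record):
    """Return the last word of a record's name field."""
    return record['name'].split()[-1]

def give_raise(record, percent):
    """Raise a record's pay in place; percent is a fraction such as 0.25."""
    record['pay'] *= (1.0 + percent)

# Records are still plain dictionaries; callers must pass the right kind
bob = {'name': 'Bob Smith', 'pay': 30000}
sue = {'name': 'Sue Jones', 'pay': 40000}

give_raise(sue, 0.25)
print(last_name(bob), sue['pay'])
```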

Another downside to using dictionaries for records is that they are
difficult to expand over time. For example, suppose that the set of data
fields or the procedure for giving raises is different for different kinds
of people (perhaps some people get a bonus each year and some do not). If
we ever need to extend our program, there is no natural way to customize
simple dictionaries. For future growth, we’d also like our software to
support extension and customization in a natural way.

If you’ve already studied Python in any sort of depth, you probably
already know that this is where its OOP support begins to become
attractive:

Structure

With OOP, we can naturally associate processing logic with
record data—classes provide both a program unit that combines logic
and data in a single package and a hierarchy that allows code to be
easily factored to avoid redundancy.

Encapsulation

With OOP, we can also wrap up details such as name processing
and pay increases behind method functions—i.e., we are free to
change method implementations without breaking their users.

Customization

And with OOP, we have a natural growth path. Classes can be
extended and customized by coding new subclasses, without changing
or breaking already working code.

That is, under OOP, we program by customizing and reusing, not by
rewriting. OOP is an option in Python and, frankly, is sometimes better
suited for strategic than for tactical tasks. It tends to work best when
you have time for upfront planning—something that might be a luxury if
your users have already begun storming the gates.

But especially for larger systems that change over time, its code reuse
and structuring advantages far outweigh its learning curve, and it can
substantially cut development time. Even in our simple case, the
customizability and reduced redundancy we gain from classes can be a
decided advantage.

Using Classes

OOP is easy to use in Python, thanks largely to Python’s dynamic typing
model. In fact, it’s so easy that we’ll jump right into an example:
Example 1-14 implements our database records as class instances rather
than as dictionaries.

Example 1-14. PP4E\Preview\person_start.py

class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job

if __name__ == '__main__':
    bob = Person('Bob Smith', 42, 30000, 'software')
    sue = Person('Sue Jones', 45, 40000, 'hardware')
    print(bob.name, sue.pay)
    print(bob.name.split()[-1])
    sue.pay *= 1.10
    print(sue.pay)

There is not much to this class—just a constructor method that
fills out the instance with data passed in as arguments to the class
name. It’s sufficient to represent a database record, though, and it can
already provide tools such as defaults for pay and job fields that
dictionaries cannot. The self-test code at the bottom of this file
creates two instances (records) and accesses their attributes (fields);
here is this file’s output when run under IDLE (a system command-line
works just as well):

Bob Smith 40000
Smith
44000.0

This isn’t a database yet, but we could stuff these objects into a
list or dictionary as before in order to collect them as a unit:

>>> from person_start import Person
>>> bob = Person('Bob Smith', 42)
>>> sue = Person('Sue Jones', 45, 40000)
>>> people = [bob, sue]                          # a "database" list
>>> for person in people:
        print(person.name, person.pay)

Bob Smith 0
Sue Jones 40000
>>> x = [(person.name, person.pay) for person in people]
>>> x
[('Bob Smith', 0), ('Sue Jones', 40000)]
>>> [rec.name for rec in people if rec.age >= 45]     # SQL-ish query
['Sue Jones']
>>> [(rec.age ** 2 if rec.age >= 45 else rec.age) for rec in people]
[42, 2025]

Notice that Bob’s pay defaulted to zero this time because we
didn’t pass in a value for that argument (maybe Sue is supporting him
now?). We might also implement a class that represents the database,
perhaps as a subclass of the built-in list or dictionary types, with
insert and delete methods that encapsulate the way the database is
implemented. We’ll abandon this path for now, though, because it will be
more useful to store these records persistently in a shelve, which
already encapsulates stores and fetches behind an interface for us.
Before we do, though, let’s add some
logic.
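
For the curious, the database class idea mentioned above could be
sketched roughly as follows. This is an illustration rather than code
used later: the PeopleDB name and its insert, delete, and names methods
are invented here, and Person is a trimmed copy of the class in Example
1-14 so that the listing runs on its own.

```python
class Person:                       # trimmed copy of Example 1-14's class
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job

class PeopleDB(dict):
    """Hypothetical in-memory database, keyed by a short name."""
    def insert(self, key, person):
        if key in self:                         # refuse silent overwrites
            raise KeyError('duplicate key: %r' % key)
        self[key] = person

    def delete(self, key):
        del self[key]

    def names(self):
        return sorted(person.name for person in self.values())

db = PeopleDB()
db.insert('bob', Person('Bob Smith', 42))
db.insert('sue', Person('Sue Jones', 45, 40000))
db.delete('bob')
print(db.names())
```

Because clients call insert and delete rather than indexing directly,
the storage behind those methods could later change without breaking
them.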

Adding Behavior

So far, our class is just data: it replaces dictionary keys with object
attributes, but it doesn’t add much to what we had before. To really
leverage the power of classes, we need to add some behavior. By wrapping
up bits of behavior in class method functions, we can insulate clients
from changes. And by packaging methods in classes along with data, we
provide a natural place for readers to look for code. In a sense,
classes combine records and the programs that process those records;
methods provide logic that interprets and updates the data (we say they
are object-oriented, because they always process an object’s data).

For instance, Example 1-15 adds the last-name and raise logic as class
methods; methods use the self argument to access or update the instance
(record) being processed.

Example 1-15. PP4E\Preview\person.py

class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

if __name__ == '__main__':
    bob = Person('Bob Smith', 42, 30000, 'software')
    sue = Person('Sue Jones', 45, 40000, 'hardware')
    print(bob.name, sue.pay)
    print(bob.lastName())
    sue.giveRaise(.10)
    print(sue.pay)

The output of this script is the same as the last, but the results
are being computed by methods now, not by hardcoded logic that appears
redundantly wherever it is required:

Bob Smith 40000
Smith
44000.0
Adding Inheritance

One last enhancement to our records before they become permanent:
because they are implemented as classes now, they naturally support
customization through the inheritance search mechanism in Python.
Example 1-16, for instance, customizes the last section’s Person class
in order to give a 10 percent bonus by default to managers whenever they
receive a raise (any relation to practice in the real world is purely
coincidental).

Example 1-16. PP4E\Preview\manager.py

from person import Person

class Manager(Person):
    def giveRaise(self, percent, bonus=0.1):
        self.pay *= (1.0 + percent + bonus)

if __name__ == '__main__':
    tom = Manager(name='Tom Doe', age=50, pay=50000)
    print(tom.lastName())
    tom.giveRaise(.20)
    print(tom.pay)

When run, this script’s self-test prints the following:

Doe
65000.0

Here, the Manager class appears in a module of its own, but it could
have been added to the person module instead (Python doesn’t require
just one class per file). It inherits the constructor and last-name
methods from its superclass, but it customizes just the giveRaise method
(there are a variety of ways to code this extension, as we’ll see
later). Because this change is being added as a new subclass, the
original Person class, and any objects generated from it, will continue
working unchanged. Bob and Sue, for example, inherit the original raise
logic, but Tom gets the custom version because of the class from which
he is created. In OOP, we program by customizing, not by changing.

In fact, code that uses our objects doesn’t need to be at all aware of
what the raise method does; it’s up to the object to do the right thing
based on the class from which it is created. As long as the object
supports the expected interface (here, a method called giveRaise), it
will be compatible with the calling code, regardless of its specific
type, and even if its method works differently than others.
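
As a sketch of that last point, the calling code below works with an
object whose class is not related to Person at all. The Contractor
class, its flat-rate raise, and the annual_review function are
hypothetical and appear nowhere else in this chapter; Person is cut
down to the parts the demo needs.

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Contractor:                   # hypothetical: not a Person subclass
    def __init__(self, name, rate):
        self.name = name
        self.rate = rate
    def giveRaise(self, percent):   # same method name, different meaning
        self.rate += 5              # flat hourly bump; percent is ignored

def annual_review(staff, percent):
    for obj in staff:
        obj.giveRaise(percent)      # only the interface matters here

sue = Person('Sue Jones', 40000)
ann = Contractor('Ann Lee', 80)
annual_review([sue, ann], 0.5)
print(sue.pay, ann.rate)
```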

If you’ve already studied Python, you may know this behavior as
polymorphism; it’s a core property of the language, and it accounts for
much of your code’s flexibility. When the following code calls the
giveRaise method, for example, what happens depends on the obj object
being processed; Tom gets a 20 percent raise instead of 10 percent
because of the Manager class’s customization:

>>> from person import Person
>>> from manager import Manager
>>> bob = Person(name='Bob Smith', age=42, pay=10000)
>>> sue = Person(name='Sue Jones', age=45, pay=20000)
>>> tom = Manager(name='Tom Doe', age=55, pay=30000)
>>> db = [bob, sue, tom]
>>> for obj in db:
        obj.giveRaise(.10)                  # default or custom

>>> for obj in db:
        print(obj.lastName(), '=>', obj.pay)

Smith => 11000.0
Jones => 22000.0
Doe => 36000.0
Refactoring Code

Before we
move on, there are a few coding alternatives worth noting
here. Most of these underscore the Python OOP model, and they serve as a
quick review.

Augmenting methods

As a first alternative, notice that we have introduced some redundancy
in Example 1-16: the raise calculation is now repeated in two places (in
the two classes). We could also have implemented the customized Manager
class by augmenting the inherited raise method instead of replacing it
completely:

class Manager(Person):
    def giveRaise(self, percent, bonus=0.1):
        Person.giveRaise(self, percent + bonus)

The trick here is to call back the superclass’s version of the method
directly, passing in the self argument explicitly. We still redefine the
method, but we simply run the general version after adding 10 percent
(by default) to the passed-in percentage. This coding pattern can help
reduce code redundancy (the original raise method’s logic appears in
only one place and so is easier to change) and is especially handy for
kicking off superclass constructor methods in practice.

If you’ve already studied Python OOP, you know that this coding
scheme works because we can always call methods through either an
instance or the class name. In general, the following are equivalent,
and both forms may be used explicitly:

instance.method(arg1, arg2)
class.method(instance, arg1, arg2)

In fact, the first form is mapped to the second: when calling through
the instance, Python determines the class by searching the inheritance
tree for the method name and passes in the instance automatically.
Either way, within giveRaise, self refers to the instance that is the
subject of the call.
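
The equivalence of the two call forms is easy to check for yourself.
The following sketch uses a cut-down Person with made-up pay values; it
also peeks at the bound method object that Python creates for the first
form, whose standard __self__ and __func__ attributes hold the instance
and the underlying function.

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

bob = Person('Bob Smith', 1000)
sue = Person('Sue Jones', 1000)

bob.giveRaise(0.5)                  # instance.method(args)
Person.giveRaise(sue, 0.5)          # class.method(instance, args)
print(bob.pay == sue.pay)           # both calls ran the same code

bound = bob.giveRaise               # a bound method remembers its instance
print(bound.__self__ is bob, bound.__func__ is Person.giveRaise)
```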

Display format

For more object-oriented fun, we could also add a few operator
overloading methods to our people classes. For example, a __str__
method, shown here, could return a string to give the display format for
our objects when they are printed as a whole, which is much better than
the default display we get for an instance:

class Person:
    def __str__(self):
        return '<%s => %s>' % (self.__class__.__name__, self.name)

tom = Manager('Tom Jones', 50)
print(tom)                              # prints: <Manager => Tom Jones>

Here __class__ gives the lowest class from which self was made, even
though __str__ may be inherited. The net effect is that __str__ allows
us to print instances directly instead of having to print specific
attributes. We could extend this __str__ to loop through the instance’s
__dict__ attribute dictionary to display all attributes generically; for
this preview we’ll leave this as a suggested exercise.
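
If you would like a head start on that exercise, here is one possible
sketch. The display format, and the choice to sort attributes by name,
are arbitrary decisions made for this illustration; Manager is reduced
to an empty subclass so that the listing stands alone.

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job

    def __str__(self):
        # Walk the instance's attribute dictionary instead of naming fields
        attrs = ', '.join('%s=%r' % (key, value)
                          for key, value in sorted(self.__dict__.items()))
        return '<%s: %s>' % (self.__class__.__name__, attrs)

class Manager(Person):              # inherits the generic __str__
    pass

tom = Manager('Tom Jones', 50)
print(tom)
```

Any attribute added to an instance later shows up in the display
automatically, with no change to __str__.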

We might even code an __add__ method to make + expressions automatically
call the giveRaise method. Whether we should is another question; the
fact that a + expression gives a person a raise might seem more magical
to the next person reading our code than it should.
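
Purely to show the mechanics, such an __add__ might be coded as in the
sketch below, again with a cut-down Person and made-up numbers. This
version changes the object in place and returns it, which is one of
several possible designs and arguably part of the problem: most readers
expect + to build a new value rather than modify its left operand.

```python
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def __add__(self, percent):     # run for: instance + percent
        self.giveRaise(percent)
        return self                 # same object, changed in place

sue = Person('Sue Jones', 40000)
sue = sue + 0.25                    # reads like arithmetic, acts like a raise
print(sue.pay)
```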

Constructor customization

Finally, notice that we didn’t pass the job argument when making a
manager in Example 1-16; if we had, it would look like this with keyword
arguments:

tom = Manager(name='Tom Doe', age=50, pay=50000, job='manager')

The reason we didn’t include a job in the example is that it’s
redundant with the class of the object: if someone is a manager, their
class should imply their job title. Instead of leaving this field
blank, though, it may make more sense to provide an explicit
constructor for managers, which fills in this field
automatically:

class Manager(Person):
    def __init__(self, name, age, pay):
        Person.__init__(self, name, age, pay, 'manager')

Now when a manager is created, its job is filled in automatically. The
trick here is to call the superclass’s version of the method explicitly,
just as we did for the giveRaise method earlier in this section; the
only difference here is the unusual name for the constructor method.
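
Python 3 also offers the built-in super function as another way to
reach the superclass's constructor without naming the class. The sketch
below shows that alternative form; it is not the coding style used in
this chapter's examples, and Person is repeated only so that the
listing runs by itself.

```python
class Person:
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job

class Manager(Person):
    def __init__(self, name, age, pay):
        # Same effect as Person.__init__(self, name, age, pay, 'manager')
        super().__init__(name, age, pay, 'manager')

tom = Manager('Tom Doe', 50, 50000)
print(tom.job)
```

In a single-inheritance tree like this one the two forms behave the
same; the explicit class-name call simply makes the target of the call
more obvious.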

Alternative classes

We won’t use any of this section’s three extensions in later examples,
but to demonstrate how they work, Example 1-17 collects these ideas in
an alternative implementation of our Person classes.

Example 1-17. PP4E\Preview\person_alternative.py

"""
Alternative implementation of person classes, with data, behavior,
and operator overloading (not used for objects stored persistently)
"""
class Person:
    """
    a general person: data+logic
    """
    def __init__(self, name, age, pay=0, job=None):
        self.name = name
        self.age = age
        self.pay = pay
        self.job = job
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)
    def __str__(self):
        return ('<%s => %s: %s, %s>' %
               (self.__class__.__name__, self.name, self.job, self.pay))

class Manager(Person):
    """
    a person with custom raise
    inherits general lastname, str
    """
    def __init__(self, name, age, pay):
        Person.__init__(self, name, age, pay, 'manager')
    def giveRaise(self, percent, bonus=0.1):
        Person.giveRaise(self, percent + bonus)

if __name__ == '__main__':
    bob = Person('Bob Smith', 44)
    sue = Person('Sue Jones', 47, 40000, 'hardware')
    tom = Manager(name='Tom Doe', age=50, pay=50000)
    print(sue, sue.pay, sue.lastName())
    for obj in (bob, sue, tom):
        obj.giveRaise(.10)                  # run this obj's giveRaise
        print(obj)                          # run common __str__ method

Notice the polymorphism in this module’s self-test loop: all three
objects share the constructor, last-name, and printing methods, but the
raise method called is dependent upon the class from which an instance
is created. When run, Example 1-17 prints the following to standard
output: the manager’s job is filled in at construction, we get the new
custom display format for our objects, and the new version of the
manager’s raise method works as before:

<Person => Sue Jones: hardware, 40000> 40000 Jones
<Person => Bob Smith: None, 0.0>
<Person => Sue Jones: hardware, 44000.0>
<Manager => Tom Doe: manager, 60000.0>

Such refactoring (restructuring) of code is common as class hierarchies
grow and evolve. In fact, as is, we still can’t give someone a raise if
his pay is zero (Bob is out of luck); we probably need a way to set pay,
too, but we’ll leave such extensions for the next release. The good news
is that Python’s flexibility and readability make refactoring easy: it’s
simple and quick to restructure your code. If you haven’t used the
language yet, you’ll find that Python development is largely an exercise
in rapid, incremental, and interactive programming, which is well suited
to the shifting needs of real-world projects.
