Programming Python (173 page)

Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

BOOK: Programming Python
10.25Mb size Format: txt, pdf, ePub
Shelve Files

Pickling allows you to
store arbitrary objects on files and file-like objects, but
it’s still a fairly unstructured medium; it doesn’t directly support easy
access to members of collections of pickled objects. Higher-level
structures can be added to pickling, but they are not inherent:

  • You can sometimes craft your own higher-level pickle file
    organizations with the underlying filesystem (e.g., you can store each
    pickled object in a file whose name uniquely identifies the object),
    but such an organization is not part of pickling itself and must be
    manually managed.

  • You can also store arbitrarily large dictionaries in a pickled
    file and index them by key after they are loaded back into memory, but
    this will load and store the entire dictionary all at once when
    unpickled and pickled, not just the entry you are interested
    in.

Shelves provide structure for collections of pickled objects that
removes some of these constraints. They are a type of file that stores
arbitrary Python objects by key for later retrieval, and they are a
standard part of the Python system. Really, they are not much of a new
topic—shelves are simply a combination of the DBM files and object
pickling we just met:

  • To
    store
    an in-memory object by key, the
    shelve
    module
    first serializes the object to a string with the
    pickle
    module, and then it stores
    that string in a DBM file by key with the
    dbm
    module.

  • To
    fetch
    an object back by key, the
    shelve
    module first loads the
    object’s serialized string by key from a DBM file with the
    dbm
    module, and then converts it back to the
    original in-memory object with the
    pickle
    module.

Because
shelve
uses
pickle
internally, it can store any object that
pickle
can: strings, numbers, lists,
dictionaries, cyclic objects, class instances, and more. Because shelve
uses
dbm
internally, it
inherits all of that module’s capabilities, as well as its portability
constraints.

Using Shelves

In other words,
shelve
is
just a go-between; it serializes and deserializes objects
so that they can be placed in string-based DBM files. The net effect is
that shelves let you store nearly arbitrary Python objects on a file by
key and fetch them back later with the
same
key
.

Your scripts never see all of this interfacing, though. Like DBM
files, shelves provide an interface that looks like a dictionary that
must be opened. In fact, a shelve is simply a persistent dictionary of
persistent Python objects—the shelve dictionary’s content is
automatically mapped to a file on your computer so that it is retained
between program runs. This is quite a feat, but it’s simpler to your
code than it may sound. To gain access to a shelve, import the module
and open your file:

import shelve
dbase = shelve.open("mydbase")

Internally, Python opens a DBM file with the name
mydbase
, or creates it if it does not yet exist (it
uses the DBM
'c'
input/output open
mode by default). Assigning to a shelve key stores an object:

dbase['key'] = object      # store object

Internally, this assignment converts the object to a serialized
byte stream with pickling and stores it by key on a DBM file. Indexing a
shelve fetches a stored object:

value = dbase['key']       # fetch object

Internally, this index operation loads a string by key from a DBM
file and unpickles it into an in-memory object that is the same as the
object originally stored. Most dictionary operations are supported here,
too:

len(dbase)                 # number of items stored
dbase.keys() # stored item key index iterable

And except for a few fine points, that’s really all there is to
using a shelve. Shelves are processed with normal Python dictionary
syntax, so there is no new database API to learn. Moreover, objects
stored and fetched from shelves are normal Python objects; they do not
need to be instances of special classes or types to be stored away. That
is, Python’s persistence system is external to the persistent objects
themselves.
Table 17-2
summarizes these and other
commonly used shelve
operations.

Table 17-2. Shelve file operations

Python
code

Action

Description

import shelve

Import

Get
bsddb
,
gdbm
, and so on…whatever is
installed

file=shelve.open('filename')

Open

Create or open an
existing shelve’s DBM file

file['key'] = anyvalue

Store

Create or change the
entry for
key

value = file['key']

Fetch

Load the value for the
entry
key

count = len(file)

Size

Return the number of
entries stored

index = file.keys()

Index

Fetch the stored keys
list (an iterable view)

found = 'key' in file

Query

See if there’s an entry
for
key

del file['key']

Delete

Remove the entry for
key

for key in file:

Iterate

Iterate over stored
keys

file.close()

Close

Manual close, not always
needed

Because shelves export a dictionary-like interface, too, this
table is almost identical to the DBM operation table. Here, though, the
module name
dbm
is replaced by
shelve
,
open
calls do not require a second
c
argument, and stored values can be nearly
arbitrary kinds of objects, not just strings. Keys are still strings,
though (technically, keys are always a
str
which is encoded to and from
bytes
automatically per UTF-8), and you still
should
close
shelves explicitly after
making changes to be safe: shelves use
dbm
internally, and
some underlying DBMs require closes to avoid data loss or
damage.

Note

Recent changes
: The
shelve
module now has an optional
writeback
argument; if passed
True
, all entries fetched are cached in
memory, and written back to disk automatically at close time. This
obviates the need to manually reassign changed mutable entries to
flush them to disk, but can perform poorly if many items are
fetched—it may require a large amount of memory for the cache, and it
can make the
close
operation slow
since all fetched entries must be written back to disk (Python cannot
tell which of the objects may have been changed).

Besides allowing values to be arbitrary objects instead of just
strings, in Python 3.X the shelve interface differs from the DBM
interface in two subtler ways. First, the
keys
method returns an iterable
view
object (not a physical list). Second, the
values of
keys
are always
str
in your code, not
bytes
—on fetches, stores, deletes, and other
contexts, the
str
keys you use are
encoded to the
bytes
expected by
DBM using the UTF-8 Unicode encoding. This means that unlike
dbm
, you cannot use
bytes
for
shelve
keys in your code to employ arbitrary
encodings.

Shelve keys are also decoded from
bytes
to
str
per UTF-8 whenever they are returned
from the shelve API (e.g., keys iteration). Stored
values
are always the
bytes
object produced by the pickler to
represent a serialized object. We’ll see these behaviors in action in
the examples of this
section
.

Storing Built-in Object Types in Shelves

Let’s run an interactive
session to experiment with shelve interfaces. As
mentioned, shelves are essentially just persistent dictionaries of
objects, which you open and close:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open("mydbase")
>>>
object1 = ['The', 'bright', ('side', 'of'), ['life']]
>>>
object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
>>>
dbase['brian'] = object2
>>>
dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
>>>
dbase.close()

Here, we open a shelve and store two fairly complex dictionary and
list data structures away permanently by simply assigning them to shelve
keys. Because
shelve
uses
pickle
internally, almost anything goes
here—the trees of nested objects are automatically serialized into
strings for storage. To fetch them back, just reopen the shelve and
index:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open("mydbase")
>>>
len(dbase)
# entries
2
>>>
dbase.keys()
# index
KeysView()
>>>
list(dbase.keys())
['brian', 'knight']
>>>
dbase['knight']
# fetch
{'motto': 'Ni!', 'name': 'Knight'}
>>>
for row in dbase.keys():
# .keys() is optional
...
print(row, '=>')
...
for field in dbase[row].keys():
...
print(' ', field, '=', dbase[row][field])
...
brian =>
motto = ['The', 'bright', ('side', 'of'), ['life']]
age = 33
name = Brian
knight =>
motto = Ni!
name = Knight

The nested loops at the end of this session step through nested
dictionaries—the outer scans the shelve and the inner scans the objects
stored in the shelve (both could use key iterators and omit their
.keys()
calls). The crucial point to
notice is that we’re using normal Python syntax, both to store and to
fetch these persistent objects, as well as to process them after
loading. It’s persistent Python data on disk.

Storing Class Instances in Shelves

One of the more useful kinds of
objects to store in a shelve is a class instance. Because
its attributes record state and its inherited methods define behavior,
persistent class objects effectively serve the roles of both database
records
and database-processing
programs
. We can also use the underlying
pickle
module to serialize instances to flat
files and other file-like objects (e.g., network sockets), but the
higher-level
shelve
module also gives
us a convenient keyed-access storage medium. For instance, consider the
simple class shown in
Example 17-2
, which is used to
model people in a hypothetical work scenario.

Example 17-2. PP4E\Dbase\person.py (version 1)

"a person object: fields + behavior"
class Person:
def __init__(self, name, job, pay=0):
self.name = name
self.job = job
self.pay = pay # real instance data
def tax(self):
return self.pay * 0.25 # computed on call
def info(self):
return self.name, self.job, self.pay, self.tax()

Nothing about this class suggests it will be used for database
records—it can be imported and used independent of external storage.
It’s easy to use it for a database’s records, though: we can make some
persistent objects from this class by simply creating instances as
usual, and then storing them by key on an opened shelve:

C:\...\PP4E\Dbase>
python
>>>
from person import Person
>>>
bob = Person('bob', 'psychologist', 70000)
>>>
emily = Person('emily', 'teacher', 40000)
>>>
>>>
import shelve
>>>
dbase = shelve.open('cast')
# make new shelve
>>>
for obj in (bob, emily):
# store objects
...
dbase[obj.name] = obj
# use name for key
...
>>>
dbase.close()
# need for bsddb

Here we used the instance objects’
name
attribute as their key in the shelve
database. When we come back and fetch these objects in a later Python
session or script, they are re-created in memory exactly as they were
when they were stored:

C:\...\PP4E\Dbase>
python
>>>
import shelve
>>>
dbase = shelve.open('cast')
# reopen shelve
>>>
>>>
list(dbase.keys())
# both objects are here
['bob', 'emily']
>>>
print(dbase['emily'])

>>>
>>>
print(dbase['bob'].tax())
# call: bob's tax
17500.0

Notice that calling Bob’s
tax
method works even though we didn’t import the
Person
class in this last session. Python is
smart enough to link this object back to its original class when
unpickled, such that all the original methods are available through
fetched objects.

Other books

All In by Molly Bryant
Brian Garfield by Tripwire
Craving by Sofia Grey
The Faerie Queene by Edmund Spenser
Cold Dawn by Carla Neggers
Carnival-SA by Elizabeth Bear
Absolutely Almost by Lisa Graff
Seamless by Griffin, R. L.