Programming Python (44 page)

Authors: Mark Lutz

Global Replacements in Directory Trees (Visitor)

But since I brought it up: given a general tree traversal class, it’s
easy to code a global search-and-replace subclass, too. The
ReplaceVisitor class in Example 6-20 is a SearchVisitor subclass that
customizes the visitmatch method to globally replace any appearances of
one string with another, in all text files at and below a root
directory. It also collects the names of all files that were changed in
a list, in case you wish to go through and verify the automatic edits
applied (a text editor could be automatically popped up on each changed
file, for instance).

Example 6-20. PP4E\Tools\visitor_replace.py

"""
Use: "python ...\Tools\visitor_replace.py rootdir fromStr toStr".
Does global search-and-replace in all files in a directory tree: replaces
fromStr with toStr in all text files; this is powerful but dangerous!!
visitor_edit.py runs an editor for you to verify and make changes, and so
is safer; use visitor_collect.py to simply collect matched files list;
listonly mode here is similar to both SearchVisitor and CollectVisitor;
"""
import sys
from visitor import SearchVisitor

class ReplaceVisitor(SearchVisitor):
    """
    Change fromStr to toStr in files at and below startDir;
    files changed available in obj.changed list after a run
    """
    def __init__(self, fromStr, toStr, listOnly=False, trace=0):
        self.changed  = []
        self.toStr    = toStr
        self.listOnly = listOnly
        SearchVisitor.__init__(self, fromStr, trace)

    def visitmatch(self, fname, text):
        self.changed.append(fname)
        if not self.listOnly:
            fromStr, toStr = self.context, self.toStr
            text = text.replace(fromStr, toStr)
            open(fname, 'w').write(text)

if __name__ == '__main__':
    listonly = input('List only?') == 'y'
    visitor  = ReplaceVisitor(sys.argv[2], sys.argv[3], listonly)
    if listonly or input('Proceed with changes?') == 'y':
        visitor.run(startDir=sys.argv[1])
        action = 'Changed' if not listonly else 'Found'
        print('Visited %d files'  % visitor.fcount)
        print(action, '%d files:' % len(visitor.changed))
        for fname in visitor.changed: print(fname)

To run this script over a directory tree, run the following sort
of command line with appropriate “from” and “to” strings. On my
shockingly underpowered netbook machine, doing this on a 1429-file tree
and changing 101 files along the way takes roughly three seconds of real
clock time when the system isn’t particularly busy.

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E
List only?y
Visited 1429 files
Found 101 files:
C:\temp\PP3E\Examples\README-root.txt
C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw
C:\temp\PP3E\Examples\PP3E\Launcher.py
...more matching filenames omitted...

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E
List only?n
Proceed with changes?y
Visited 1429 files
Changed 101 files:
C:\temp\PP3E\Examples\README-root.txt
C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw
C:\temp\PP3E\Examples\PP3E\Launcher.py
...more changed filenames omitted...

C:\...\PP4E\Tools> visitor_replace.py C:\temp\PP3E\Examples PP3E PP4E
List only?n
Proceed with changes?y
Visited 1429 files
Changed 0 files:

Naturally, we can also check our work by running the visitor script
(and SearchVisitor superclass):

C:\...\PP4E\Tools> visitor.py 2 C:\temp\PP3E\Examples PP3E
Found in 0 files, visited 1429

C:\...\PP4E\Tools> visitor.py 2 C:\temp\PP3E\Examples PP4E
C:\temp\PP3E\Examples\README-root.txt has PP4E
C:\temp\PP3E\Examples\PP3E\echoEnvironment.pyw has PP4E
C:\temp\PP3E\Examples\PP3E\Launcher.py has PP4E
...more matching filenames omitted...
Found in 101 files, visited 1429

This is both wildly powerful and dangerous. If the string to be
replaced can show up in places you didn’t anticipate, you might just
ruin an entire tree of files by running the ReplaceVisitor object
defined here. On the other hand, if the string is something very
specific, this object can obviate the need to manually edit suspicious
files. For instance, website addresses in HTML files are likely too
specific to show up in other places by chance.
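
One way to gauge that risk before committing to changes is to count how often the string occurs in each file. The following is only a minimal standalone sketch of such a preview, built directly on os.walk rather than on the book's visitor classes. The function name preview_replace, its exts default, and the UTF-8 assumption are all hypothetical choices made for illustration; SearchVisitor applies its own rules for deciding which files are text.

```python
import os

def preview_replace(root, from_str, exts=('.txt', '.py', '.html')):
    """
    Yield (path, count) for each file under root whose name ends with one
    of exts (a tuple) and whose text contains from_str; nothing is written.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):          # str.endswith accepts a tuple
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding='utf-8') as f:   # assumed encoding
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue                         # skip undecodable or unreadable files
            count = text.count(from_str)
            if count:
                yield path, count
```

A file that reports many more occurrences than expected is a hint that the search string is not as specific as it seemed, and that the list-only mode or a manual review is the safer route.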

Counting Source Code Lines (Visitor)

The two preceding visitor module clients were both search-oriented, but
it’s just as easy to extend the basic walker class for more specific
goals. Example 6-21, for instance, extends FileVisitor to count the
number of lines in program source code files of various types throughout
an entire tree. The effect is much like calling the visitfile method of
this class for each filename returned by the find tool we wrote earlier
in this chapter, but the OO structure here is arguably more flexible and
extensible.

Example 6-21. PP4E\Tools\visitor_sloc.py

"""
Count lines among all program source files in a tree named on the command
line, and report totals grouped by file types (extension). A simple SLOC
(source lines of code) metric: skip blank and comment lines if desired.
"""
import sys, pprint, os
from visitor import FileVisitor

class LinesByType(FileVisitor):
    srcExts = []                        # define in subclass

    def __init__(self, trace=1):
        FileVisitor.__init__(self, trace=trace)
        self.srcLines = self.srcFiles = 0
        self.extSums  = {ext: dict(files=0, lines=0) for ext in self.srcExts}

    def visitsource(self, fpath, ext):
        if self.trace > 0: print(os.path.basename(fpath))
        lines = len(open(fpath, 'rb').readlines())
        self.srcFiles += 1
        self.srcLines += lines
        self.extSums[ext]['files'] += 1
        self.extSums[ext]['lines'] += lines

    def visitfile(self, filepath):
        FileVisitor.visitfile(self, filepath)
        for ext in self.srcExts:
            if filepath.endswith(ext):
                self.visitsource(filepath, ext)
                break

class PyLines(LinesByType):
    srcExts = ['.py', '.pyw']           # just python files

class SourceLines(LinesByType):
    srcExts = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']

if __name__ == '__main__':
    walker = SourceLines()
    walker.run(sys.argv[1])
    print('Visited %d files and %d dirs' % (walker.fcount, walker.dcount))
    print('-'*80)
    print('Source files=>%d, lines=>%d' % (walker.srcFiles, walker.srcLines))
    print('By Types:')
    pprint.pprint(walker.extSums)

    print('\nCheck sums:', end=' ')
    print(sum(x['lines'] for x in walker.extSums.values()), end=' ')
    print(sum(x['files'] for x in walker.extSums.values()))

    print('\nPython only walk:')
    walker = PyLines(trace=0)
    walker.run(sys.argv[1])
    pprint.pprint(walker.extSums)

When run as a script, we get trace messages during the walk
(omitted here to save space), and a report with line counts grouped by
file type. Run this on trees of your own to watch its progress; my tree
has 907 source files and 48K source lines, including 783 files and 34K
lines of “.py” Python code:

C:\...\PP4E\Tools> visitor_sloc.py C:\temp\PP3E\Examples
Visited 1429 files and 186 dirs
--------------------------------------------------------------------------------
Source files=>907, lines=>48047
By Types:
{'.c': {'files': 45, 'lines': 7370},
 '.cgi': {'files': 5, 'lines': 122},
 '.cxx': {'files': 4, 'lines': 2278},
 '.h': {'files': 7, 'lines': 297},
 '.html': {'files': 48, 'lines': 2830},
 '.i': {'files': 4, 'lines': 49},
 '.py': {'files': 783, 'lines': 34601},
 '.pyw': {'files': 11, 'lines': 500}}

Check sums: 48047 907

Python only walk:
{'.py': {'files': 783, 'lines': 34601}, '.pyw': {'files': 11, 'lines': 500}}
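
The script's docstring mentions skipping blank and comment lines, but visitsource as listed counts every line. As a rough illustration of how that filter might look, the helper below counts only lines that are neither empty nor whole-line comments. The name count_sloc and its comment_prefix parameter are assumptions made for this sketch, not part of the book's code, and the test is deliberately naive: it knows nothing about block comments or docstrings.

```python
def count_sloc(lines, comment_prefix='#'):
    """
    Count lines that contain something other than whitespace or a
    whole-line comment; trailing comments after code still count as code.
    """
    total = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            total += 1
    return total

sample = ['import os\n', '\n', '# a comment\n', '    x = 1  # trailing\n']
print(count_sloc(sample))
```

A subclass could call a helper like this from visitsource on text-mode lines in place of the raw len() of readlines(), with a different prefix per file type, such as '//' for some C++ comments.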

Recoding Copies with Classes (Visitor)

Let’s peek at one more visitor use case. When I first wrote the cpall.py
script earlier in this chapter, I couldn’t see a way that the visitor
class hierarchy we met earlier would help. Two directories needed to be
traversed in parallel (the original and the copy), and visitor is based
on walking just one tree with os.walk. There seemed no easy way to keep
track of where the script was in the copy directory.

The trick I eventually stumbled onto is not to keep track at all.
Instead, the script in Example 6-22 simply replaces the “from” directory
path string with the “to” directory path string, at the front of all
directory names and pathnames passed in from os.walk. The results of the
string replacements are the paths to which the original files and
directories are to be copied.
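
The prefix swap can be illustrated in isolation. The function below is a hypothetical helper written only for this illustration (the name map_to_copy does not appear in the book's script); it uses the same slicing idea, skipping the “from” prefix plus one path separator and joining the remainder onto the “to” directory. It assumes from_dir is given without a trailing separator; otherwise the extra character skipped would be the first letter of the next path component.

```python
import os

def map_to_copy(path, from_dir, to_dir):
    """Return the path under to_dir that mirrors path under from_dir."""
    relative = path[len(from_dir) + 1:]      # drop from_dir and one separator
    return os.path.join(to_dir, relative)

src = os.path.join('Examples', 'PP3E', 'Launcher.py')
print(map_to_copy(src, 'Examples', 'copytemp'))

# The root itself maps to to_dir plus a trailing separator
print(map_to_copy('Examples', 'Examples', 'copytemp'))
```

Because the mapping is a pure string operation, the walker never needs to track its position in the copy tree: every path that os.walk produces can be translated on its own.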

Example 6-22. PP4E\Tools\visitor_cpall.py

"""
Use: "python ...\Tools\visitor_cpall.py fromDir toDir trace?"
Like System\Filetools\cpall.py, but with the visitor classes and os.walk;
does string replacement of fromDir with toDir at the front of all the names
that the walker passes in; assumes that the toDir does not exist initially;
"""
import os
from visitor import FileVisitor # visitor is in '.'
from PP4E.System.Filetools.cpall import copyfile     # PP4E is in a dir on path

class CpallVisitor(FileVisitor):
    def __init__(self, fromDir, toDir, trace=True):
        self.fromDirLen = len(fromDir) + 1            # skip fromDir and one separator
        self.toDir      = toDir
        FileVisitor.__init__(self, trace=trace)

    def visitdir(self, dirpath):
        toPath = os.path.join(self.toDir, dirpath[self.fromDirLen:])
        if self.trace: print('d', dirpath, '=>', toPath)
        os.mkdir(toPath)
        self.dcount += 1

    def visitfile(self, filepath):
        toPath = os.path.join(self.toDir, filepath[self.fromDirLen:])
        if self.trace: print('f', filepath, '=>', toPath)
        copyfile(filepath, toPath)
        self.fcount += 1

if __name__ == '__main__':
    import sys, time
    fromDir, toDir = sys.argv[1:3]
    trace = len(sys.argv) > 3
    print('Copying...')
    start = time.perf_counter()     # time.clock() was removed in Python 3.8
    walker = CpallVisitor(fromDir, toDir, trace)
    walker.run(startDir=fromDir)
    print('Copied', walker.fcount, 'files,', walker.dcount, 'directories', end=' ')
    print('in', time.perf_counter() - start, 'seconds')

This version accomplishes roughly the same goal as the original,
but it has made a few assumptions to keep the code simple. The “to”
directory is assumed not to exist initially, and exceptions are not
ignored along the way. Here it is copying the book examples tree from
the prior edition again on Windows:

C:\...\PP4E\Tools> set PYTHONPATH
PYTHONPATH=C:\Users\Mark\Stuff\Books\4E\PP4E\dev\Examples

C:\...\PP4E\Tools> rmdir /S copytemp
copytemp, Are you sure (Y/N)? y

C:\...\PP4E\Tools> visitor_cpall.py C:\temp\PP3E\Examples copytemp
Copying...
Copied 1429 files, 186 directories in 11.1722033777 seconds

C:\...\PP4E\Tools> fc /B copytemp\PP3E\Launcher.py C:\temp\PP3E\Examples\PP3E\Launcher.py
Comparing files COPYTEMP\PP3E\Launcher.py and C:\TEMP\PP3E\EXAMPLES\PP3E\LAUNCHER.PY
FC: no differences encountered

Despite the extra string slicing going on, this version seems to run
just as fast as the original (the actual difference can be chalked up to
system load variations). For tracing purposes, this version also prints
all the “from” and “to” copy paths during the traversal if you pass in a
third argument on the command line:

C:\...\PP4E\Tools> rmdir /S copytemp
copytemp, Are you sure (Y/N)? y

C:\...\PP4E\Tools> visitor_cpall.py C:\temp\PP3E\Examples copytemp 1
Copying...
d C:\temp\PP3E\Examples => copytemp\
f C:\temp\PP3E\Examples\README-root.txt => copytemp\README-root.txt
d C:\temp\PP3E\Examples\PP3E => copytemp\PP3E
...more lines omitted: try this on your own for the full output...

Other Visitor Examples (External)

Although the visitor is widely applicable, we don’t have space to
explore additional subclasses in this book. For more example clients and
use cases, see the following examples in the book’s examples
distribution package described in the Preface:

  • Tools\visitor_collect.py collects and/or prints files containing a
    search string

  • Tools\visitor_poundbang.py replaces directory paths in “#!” lines at
    the top of Unix scripts

  • Tools\visitor_cleanpyc.py is a visitor-based recoding of our earlier
    bytecode cleanup scripts

  • Tools\visitor_bigpy.py is a visitor-based version of the “biggest
    file” example at the start of this chapter

Most of these are almost as trivial as the visitor_edit.py code in
Example 6-19, because the visitor framework handles walking details
automatically. The collector, for instance, simply appends to a list as
a search visitor detects matched files and allows the default list of
text filename extensions in the search visitor to be overridden per
instance; it’s roughly like a combination of find and grep on Unix:

>>> from visitor_collect import CollectVisitor
>>> V = CollectVisitor('mimetypes', testexts=['.py', '.pyw'], trace=0)
>>> V.run(r'C:\temp\PP3E\Examples')
>>> for name in V.matches: print(name)     # .py and .pyw files with 'mimetypes'
...
C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailParser.py
C:\temp\PP3E\Examples\PP3E\Internet\Email\mailtools\mailSender.py
C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat.py
C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\downloadflat_modular.py
C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\ftptools.py
C:\temp\PP3E\Examples\PP3E\Internet\Ftp\mirror\uploadflat.py
C:\temp\PP3E\Examples\PP3E\System\Media\playfile.py

C:\...\PP4E\Tools> visitor_collect.py mimetypes C:\temp\PP3E\Examples     # as script

The core logic of the biggest-file visitor is similarly
straightforward, and harkens back to the start of this chapter:

class BigPy(FileVisitor):
    def __init__(self, trace=0):
        FileVisitor.__init__(self, context=[], trace=trace)

    def visitfile(self, filepath):
        FileVisitor.visitfile(self, filepath)
        if filepath.endswith('.py'):
            self.context.append((os.path.getsize(filepath), filepath))
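
Once a walk like this finishes, the context list holds (size, path) tuples, and finding the extremes is a matter of sorting. The short sketch below shows the idea on made-up data; the tuples are invented for illustration and are not results from the book's tree.

```python
# Hypothetical (size, path) tuples standing in for a finished walk's context list
found = [(1200, 'a.py'), (48000, 'b.py'), (300, 'c.py')]

found.sort()                       # tuples compare by size first, then by path
smallest, biggest = found[0], found[-1]
print('smallest:', smallest)
print('biggest: ', biggest)
```

Putting the size first in each tuple is what lets a plain sort order the files by size without a key function.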

And the bytecode-removal visitor brings us back full circle, showing an
additional alternative to those we met earlier in this chapter. It’s
essentially the same code, but it runs os.remove on “.pyc” file visits.

In the end, while the visitor classes are really just simple wrappers
for os.walk, they further automate walking chores and provide a general
framework and alternative class-based structure which may seem more
natural to some than simple unstructured loops. They’re also
representative of how Python’s OOP support maps well to real-world
structures like file systems. Although os.walk works well for one-off
scripts, the better extensibility, reduced redundancy, and greater
encapsulation possible with OOP can be a major asset in real work as our
needs change and evolve over time.
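
To make the “simple wrapper” idea concrete, here is a minimal sketch of what such a class structure might look like. It is a stripped-down stand-in written for illustration, not the book's visitor.py: the class names MiniVisitor and PycLister are hypothetical, and the real FileVisitor has additional features (such as its context argument) that are omitted here. The subclass lists bytecode files rather than removing them, to keep the example harmless.

```python
import os

class MiniVisitor:
    """Walk a tree with os.walk, calling overridable methods along the way."""
    def __init__(self, trace=0):
        self.trace = trace
        self.fcount = self.dcount = 0

    def run(self, startDir=os.curdir):
        # os.walk yields (dirpath, subdirnames, filenames) per directory
        for dirpath, subdirs, files in os.walk(startDir):
            self.visitdir(dirpath)
            for name in files:
                self.visitfile(os.path.join(dirpath, name))

    def visitdir(self, dirpath):            # override or extend in subclasses
        self.dcount += 1
        if self.trace: print(dirpath, '...')

    def visitfile(self, filepath):          # override or extend in subclasses
        self.fcount += 1
        if self.trace: print(self.fcount, '=>', filepath)

class PycLister(MiniVisitor):
    """Collect, rather than remove, bytecode files found during the walk."""
    def __init__(self, trace=0):
        MiniVisitor.__init__(self, trace)
        self.found = []

    def visitfile(self, filepath):
        MiniVisitor.visitfile(self, filepath)
        if filepath.endswith('.pyc'):
            self.found.append(filepath)
```

A client would create a PycLister, call its run method with a root directory, and then inspect the found list along with the fcount and dcount totals. The subclass contains only what differs from the default walk, which is the redundancy reduction described above.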

Note

In fact, those needs have changed over time. Between the third and
fourth editions of this book, the original os.path.walk call was removed
in Python 3.X, and os.walk became the only automated way to perform tree
walks in the standard library. Examples from the prior edition that used
os.path.walk were effectively broken. By contrast, although the visitor
classes used this call, too, their clients did not. Because updating the
visitor classes to use os.walk internally did not alter those classes’
interfaces, visitor-based tools continued to work unchanged.

This seems a prime example of the benefits of OOP’s support for
encapsulation. Although the future is never completely predictable, in
practice, user-defined tools like visitor tend to give you more control
over changes than standard library tools like os.walk. Trust me on that;
as someone who has had to update three Python books over the last 15
years, I can say with some certainty that Python change is a constant!
