Finally, let’s run a few more experiments with these Python system
utilities to demonstrate other usage modes. When run without full
command-line arguments, both split and join are smart enough to input
their parameters interactively. Here they are chopping
and gluing the Python self-installer file on Windows again, with
parameters typed in the DOS console window:
C:\temp>python C:\...\PP4E\System\Filetools\split.py
File to be split? python-3.1.msi
Directory to store part files? splitout
Splitting C:\temp\python-3.1.msi to C:\temp\splitout by 1433600
Split finished: 10 parts are in C:\temp\splitout
Press Enter key
C:\temp>python C:\...\PP4E\System\Filetools\join.py
Directory containing part files? splitout
Name of file to be recreated? newpy31.msi
Joining C:\temp\splitout to make C:\temp\newpy31.msi
Join complete: see C:\temp\newpy31.msi
Press Enter key
C:\temp>fc /B python-3.1.msi newpy31.msi
Comparing files python-3.1.msi and NEWPY31.MSI
FC: no differences encountered
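As a rough illustration of this fallback, here is a minimal sketch of how a script might take its parameters from the command line when they are present and prompt for them otherwise. The function name get_params and its ask parameter are hypothetical, introduced only for this example; the actual logic in split.py and join.py may well differ.

```python
def get_params(argv, ask=input):
    """Return (file to split, parts directory) from argv, else from prompts."""
    if len(argv) >= 3:
        # Enough command-line arguments were given: use them directly.
        return argv[1], argv[2]
    # Otherwise fall back to interactive prompts, as in the session above.
    fromfile = ask('File to be split? ')
    todir = ask('Directory to store part files? ')
    return fromfile, todir

# Simulate both modes; canned answers stand in for text typed at the console.
answers = iter(['python-3.1.msi', 'splitout'])
interactive = get_params(['split.py'], ask=lambda prompt: next(answers))
commandline = get_params(['split.py', 'python-3.1.msi', 'tempsplit'])
print(interactive)
print(commandline)
```

Passing the prompt function as a parameter is just a convenience here: it lets the interactive path be exercised without a real console.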
When these program files are double-clicked
in a Windows file explorer GUI, they work the same way (there are
usually no command-line arguments when they are launched this way). In
this mode, absolute path displays help clarify where files really are.
Remember, the current working directory is the script’s home directory
when clicked like this, so a simple name actually maps to a source code
directory; type a full path to make the split files show up somewhere
else:
[in a pop-up DOS console box when split.py is clicked]
File to be split? c:\temp\python-3.1.msi
Directory to store part files? c:\temp\parts
Splitting c:\temp\python-3.1.msi to c:\temp\parts by 1433600
Split finished: 10 parts are in c:\temp\parts
Press Enter key
[in a pop-up DOS console box when join.py is clicked]
Directory containing part files? c:\temp\parts
Name of file to be recreated? c:\temp\morepy31.msi
Joining c:\temp\parts to make c:\temp\morepy31.msi
Join complete: see c:\temp\morepy31.msi
Press Enter key
Because these scripts package their core logic in functions,
though, it’s just as easy to reuse their code by importing
and calling them from another Python component
(make sure your module import search path includes the directory
containing the PP4E root first; the first abbreviated line here is one
way to do so):
C:\temp>set PYTHONPATH=C:\...\dev\Examples
C:\temp>python
>>> from PP4E.System.Filetools.split import split
>>> from PP4E.System.Filetools.join import join
>>>
>>> numparts = split('python-3.1.msi', 'calldir')
>>> numparts
10
>>> join('calldir', 'callpy31.msi')
>>>
>>> import os
>>> os.system('fc /B python-3.1.msi callpy31.msi')
Comparing files python-3.1.msi and CALLPY31.msi
FC: no differences encountered
0
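The fc command used for verification here is Windows-only. As a portable alternative, the standard library's filecmp module can do a byte-for-byte comparison from Python itself. The sketch below is self-contained, so it uses two simplified stand-in functions, split_file and join_parts, which are hypothetical names written for this example and are not the split and join functions imported above.

```python
import filecmp
import os
import tempfile

def split_file(fromfile, todir, chunksize):
    """Write fromfile's bytes to todir as part0001, part0002, ...; return count."""
    os.makedirs(todir, exist_ok=True)
    partnum = 0
    with open(fromfile, 'rb') as source:
        while True:
            chunk = source.read(chunksize)
            if not chunk:                      # empty bytes means end of file
                break
            partnum += 1
            partname = os.path.join(todir, 'part%04d' % partnum)
            with open(partname, 'wb') as part:
                part.write(chunk)
    return partnum

def join_parts(fromdir, tofile):
    """Concatenate the part files in fromdir, in sorted name order, into tofile."""
    with open(tofile, 'wb') as output:
        for name in sorted(os.listdir(fromdir)):
            with open(os.path.join(fromdir, name), 'rb') as part:
                output.write(part.read())

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, 'original.bin')
    rebuilt = os.path.join(workdir, 'rebuilt.bin')
    partsdir = os.path.join(workdir, 'parts')
    with open(original, 'wb') as f:
        f.write(os.urandom(25000))             # 25,000 bytes of test data
    numparts = split_file(original, partsdir, 10000)
    join_parts(partsdir, rebuilt)
    # shallow=False forces a comparison of contents, not just os.stat() data
    same = filecmp.cmp(original, rebuilt, shallow=False)
    print(numparts, same)
```

The zero-padded part names matter in this sketch: they make a plain sort return the parts in the order they were written.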
A word about performance: all the split and join
tests shown so far process a 13 MB file,
but they take less than one second of real wall-clock time to finish on
my Windows 7 2GHz Atom processor laptop computer—plenty fast for just
about any use I could imagine. Both scripts run just as fast for other
reasonable part file sizes, too; here is the
splitter chopping up the file into 4MB and 500KB parts:
C:\temp>C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 4000000
Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 4000000
Split finished: 4 parts are in C:\temp\tempsplit
C:\temp>dir tempsplit
...more...
Directory of C:\temp\tempsplit
02/21/2010 01:27 PM <DIR> .
02/21/2010 01:27 PM <DIR> ..
02/21/2010 01:27 PM 4,000,000 part0001
02/21/2010 01:27 PM 4,000,000 part0002
02/21/2010 01:27 PM 4,000,000 part0003
02/21/2010 01:27 PM 1,814,272 part0004
4 File(s) 13,814,272 bytes
2 Dir(s) 188,671,983,616 bytes free
C:\temp>C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 500000
Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 500000
Split finished: 28 parts are in C:\temp\tempsplit
C:\temp>dir tempsplit
...more...
Directory of C:\temp\tempsplit
02/21/2010 01:27 PM <DIR> .
02/21/2010 01:27 PM <DIR> ..
02/21/2010 01:27 PM 500,000 part0001
02/21/2010 01:27 PM 500,000 part0002
02/21/2010 01:27 PM 500,000 part0003
02/21/2010 01:27 PM 500,000 part0004
02/21/2010 01:27 PM 500,000 part0005
...more lines omitted...
02/21/2010 01:27 PM 500,000 part0024
02/21/2010 01:27 PM 500,000 part0025
02/21/2010 01:27 PM 500,000 part0026
02/21/2010 01:27 PM 500,000 part0027
02/21/2010 01:27 PM 314,272 part0028
28 File(s) 13,814,272 bytes
2 Dir(s) 188,671,946,752 bytes free
The split can take noticeably longer to finish, but only if the
part file’s size is set small enough to generate thousands of part
files—splitting into 1,382 parts works but runs slower (though some
machines today are quick enough that you might not notice):
C:\temp>C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 10000
Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 10000
Split finished: 1382 parts are in C:\temp\tempsplit
C:\temp>C:\...\PP4E\System\Filetools\join.py tempsplit manypy31.msi
Joining C:\temp\tempsplit to make C:\temp\manypy31.msi
Join complete: see C:\temp\manypy31.msi
C:\temp>fc /B python-3.1.msi manypy31.msi
Comparing files python-3.1.msi and MANYPY31.MSI
FC: no differences encountered
C:\temp>dir tempsplit
...more...
Directory of C:\temp\tempsplit
02/21/2010 01:40 PM <DIR> .
02/21/2010 01:40 PM <DIR> ..
02/21/2010 01:39 PM 10,000 part0001
02/21/2010 01:39 PM 10,000 part0002
02/21/2010 01:39 PM 10,000 part0003
02/21/2010 01:39 PM 10,000 part0004
02/21/2010 01:39 PM 10,000 part0005
...over 1,000 lines deleted...
02/21/2010 01:40 PM 10,000 part1378
02/21/2010 01:40 PM 10,000 part1379
02/21/2010 01:40 PM 10,000 part1380
02/21/2010 01:40 PM 10,000 part1381
02/21/2010 01:40 PM 4,272 part1382
1382 File(s) 13,814,272 bytes
2 Dir(s) 188,651,008,000 bytes free
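The part counts and final part sizes in these listings follow directly from integer division of the file size by the chunk size. The short sketch below works through that arithmetic for the 13,814,272-byte installer; part_layout is a hypothetical helper written for this illustration, not part of the split script.

```python
def part_layout(filesize, chunksize):
    """Return (number of parts, size of the last part) for a given chunk size."""
    fullparts, remainder = divmod(filesize, chunksize)
    if remainder:
        # A final, shorter part holds whatever is left over.
        return fullparts + 1, remainder
    # The file divides evenly (or is empty, in which case there are no parts).
    return fullparts, (chunksize if fullparts else 0)

FILESIZE = 13814272                            # size of python-3.1.msi in bytes
layouts = {size: part_layout(FILESIZE, size)
           for size in (1433600, 4000000, 500000, 10000)}
for size, (count, last) in layouts.items():
    print(size, '=>', count, 'parts, last part', last, 'bytes')
```

For chunk sizes of 4,000,000, 500,000, and 10,000 bytes this gives 4, 28, and 1,382 parts, with final parts of 1,814,272, 314,272, and 4,272 bytes, which agrees with the directory listings shown.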
Finally, the splitter is also smart enough to create the output
directory if it doesn’t yet exist and to clear out any old files there
if it does exist—the following, for example, leaves only new files in
the output directory. Because the joiner combines whatever files exist
in the output directory, this is a nice ergonomic touch. If the output
directory were not cleared before each split, it would be too easy to
forget that a prior run’s files are still there. Given the target
audience for these scripts, they needed to be as forgiving as possible;
your user base may vary (though you often shouldn’t assume so).
C:\temp>C:\...\PP4E\System\Filetools\split.py python-3.1.msi tempsplit 5000000
Splitting C:\temp\python-3.1.msi to C:\temp\tempsplit by 5000000
Split finished: 3 parts are in C:\temp\tempsplit
C:\temp>dir tempsplit
...more...
Directory of C:\temp\tempsplit
02/21/2010 01:47 PM <DIR> .
02/21/2010 01:47 PM <DIR> ..
02/21/2010 01:47 PM 5,000,000 part0001
02/21/2010 01:47 PM 5,000,000 part0002
02/21/2010 01:47 PM 3,814,272 part0003
3 File(s) 13,814,272 bytes
2 Dir(s) 188,654,452,736 bytes free
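The create-or-clear behavior described above takes only a few lines to code. Here is one way it might be written, as a sketch under stated assumptions: prepare_parts_dir is a hypothetical name, and this version removes only plain files and leaves any subdirectories alone, which may or may not match what the splitter really does.

```python
import os
import tempfile

def prepare_parts_dir(todir):
    """Create todir if it is missing; otherwise delete the plain files in it."""
    if not os.path.exists(todir):
        os.mkdir(todir)                        # first run: make the directory
    else:
        for name in os.listdir(todir):         # later runs: drop old part files
            path = os.path.join(todir, name)
            if os.path.isfile(path):
                os.remove(path)

with tempfile.TemporaryDirectory() as workdir:
    partsdir = os.path.join(workdir, 'tempsplit')
    prepare_parts_dir(partsdir)                # first call creates it
    created = os.path.isdir(partsdir)
    with open(os.path.join(partsdir, 'part0001'), 'w') as stale:
        stale.write('left over from an earlier run')
    prepare_parts_dir(partsdir)                # second call empties it
    leftover = os.listdir(partsdir)
    print(created, leftover)
```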
Of course, the dilemma that these scripts address might today be
more easily addressed by simply buying a bigger memory stick or giving
kids their own Internet access. Still, once you catch the scripting bug,
you’ll find the ease and flexibility of Python to be powerful and
enabling tools, especially for writing custom automation scripts like
these. When used well, Python may well become your Swiss Army knife
of computing.
[19] I should note that this background story stems from the second
edition of this book, written in 2000. Some ten years later, floppies
have largely gone the way of the parallel port and the dinosaur.
Moreover, burning a CD or DVD is no longer as painful as it once was;
there are new options today such as large flash memory cards, wireless
home networks, and simple email; and naturally, my home computers’
configuration isn’t what it once was. For that matter, some of my kids
are no longer kids (though they’ve retained some backward
compatibility with their former selves).
[20] It turns out that the zip, gzip, and tar commands can all be
replaced with pure Python code today, too. The gzip module in the Python
standard library provides tools for reading and writing compressed gzip
files, usually named with a .gz filename extension. It can serve as an
all-Python equivalent of the standard gzip and gunzip command-line
utility programs. This built-in module uses another module called zlib
that implements gzip-compatible data compression. In recent Python
releases, the zipfile module can be imported to make and use ZIP format
archives (zip is an archive and compression format, gzip is a
compression scheme), and the tarfile module allows scripts to
read and write tar archives. See the Python library manual for
details.
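To give a feel for these modules, the following sketch round-trips the same bytes through gzip, zipfile, and tarfile, working entirely in memory so that it leaves no files behind. The member name spam.txt and the sample data are made up for the example, and the gzip.compress and gzip.decompress convenience functions assume Python 3.2 or later.

```python
import gzip
import io
import tarfile
import zipfile

data = b'spam and eggs ' * 1000                # repetitive, so it compresses well

# gzip: compress and decompress a byte string directly
packed = gzip.compress(data)
unpacked = gzip.decompress(packed)

# zipfile: build an archive in a memory buffer, then read a member back
zipbuffer = io.BytesIO()
with zipfile.ZipFile(zipbuffer, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr('spam.txt', data)
with zipfile.ZipFile(zipbuffer) as archive:
    zipnames = archive.namelist()
    zipdata = archive.read('spam.txt')

# tarfile: same idea, with gzip compression applied to the whole archive
tarbuffer = io.BytesIO()
with tarfile.open(fileobj=tarbuffer, mode='w:gz') as archive:
    info = tarfile.TarInfo(name='spam.txt')
    info.size = len(data)                      # addfile reads exactly this many bytes
    archive.addfile(info, io.BytesIO(data))
tarbuffer.seek(0)                              # rewind before reading it back
with tarfile.open(fileobj=tarbuffer, mode='r:gz') as archive:
    tardata = archive.extractfile('spam.txt').read()

print(len(packed) < len(data), unpacked == data, zipnames,
      zipdata == data, tardata == data)
```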
Moving is rarely
painless, even in cyberspace. Changing your website’s
Internet address can lead to all sorts of confusion. You need to ask known
contacts to use the new address and hope that others will eventually
stumble onto it themselves. But if you rely on the Internet, moves are
bound to generate at least as much confusion as an address change in the
real world.
Unfortunately, such site relocations are often unavoidable. Both
Internet Service Providers (ISPs)
and server machines can come and go over the years.
Moreover, some ISPs let their service fall to intolerably low levels; if
you are unlucky enough to have signed up with such an ISP, there is not
much recourse but to change providers, and that often implies a change of
web addresses.[21]
Imagine, though, that you are an O’Reilly author and have published
your website’s address in multiple books sold widely all over the world.
What do you do when your ISP’s service level requires a site change?
Notifying each of the hundreds of thousands of readers out there isn’t
exactly a practical solution.
Probably the best you can do is to leave forwarding instructions at
the old site for some reasonably long period of time—the virtual
equivalent of a “We’ve Moved” sign in a storefront window. On the Web,
such a sign can also send visitors to the new site automatically: simply
leave a page at the old site containing a hyperlink to the page’s address
at the new site, along with timed auto-relocation specifications. With
such forward-link files in place, visitors to the old
addresses will be only one click or a few seconds away from reaching the
new ones.
That sounds simple enough. But because visitors might try to
directly access the address of any file at your old
site, you generally need to leave one forward-link file for every old
file—HTML pages, images, and so on. Unless your prior server supports
auto-redirection (and mine did not), this represents a dilemma. If you
happen to enjoy doing lots of mindless typing, you could create each
forward-link file by hand. But given that my home site contained over 100
HTML files at the time I wrote this paragraph, the prospect of running one
editor session per file was more than enough motivation for an automated
solution.
Here’s what I came up with. First of all, I create a general page
template text file, shown in Example 6-7, to describe how all
the forward-link files should look, with parts to be filled in
later.
Example 6-7. PP4E\System\Filetools\template.html
Site Redirection Page: $file$ This page has moved
This page now lives at this address:
Please click on the new address to jump to this page, and
update any links accordingly. You will be redirectly shortly.
To fully understand this template, you have to know something
about HTML, a web page description language that we’ll explore in
Part IV. But for the purposes of this example,
you can ignore most of this file and focus on just the parts surrounded
by dollar signs: the strings $server$, $home$, and $file$
are targets to be replaced with real
values by global text substitutions. They represent items that vary per
site relocation and file.
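To make the idea of substitution targets concrete, here is a small sketch that codes a template as a Python string and fills in its dollar-sign targets with the string replace method. The HTML markup in TEMPLATE is an assumption made for illustration (a page title, a timed refresh, and a hyperlink); it is not a copy of template.html, and fill_template is a hypothetical helper name.

```python
# Hypothetical stand-in for a page template; the markup is an assumption.
TEMPLATE = """<html>
<head>
<meta http-equiv="Refresh" content="10; URL=http://$server$/$home$/$file$">
<title>Site Redirection Page: $file$</title>
</head>
<body>
<h1>This page has moved</h1>
<p>This page now lives at this address:</p>
<p><a href="http://$server$/$home$/$file$">http://$server$/$home$/$file$</a></p>
</body>
</html>
"""

def fill_template(text, server, home, filename):
    """Replace every $server$, $home$, and $file$ target in text."""
    text = text.replace('$server$', server)    # replace changes all occurrences
    text = text.replace('$home$', home)
    return text.replace('$file$', filename)

page = fill_template(TEMPLATE, 'learning-python.com', 'books', 'about-lp.html')
print(page)
```

Because replace substitutes every occurrence, each target can appear as many times as the page needs: here, in the refresh tag, the link address, and the link text.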
Now, given a page template file, the Python script in Example 6-8
generates all the required forward-link files automatically.
Example 6-8. PP4E\System\Filetools\site-forward.py
"""
################################################################################
Create forward-link pages for relocating a web site.
Generates one page for every existing site html file; upload the generated
files to your old web site. See ftplib later in the book for ways to run
uploads in scripts either after or during page file creation.
################################################################################
"""
import os
servername   = 'learning-python.com'     # where site is relocating to
homedir      = 'books'                   # where site will be rooted
sitefilesdir = r'C:\temp\public_html'    # where site files live locally
uploaddir    = r'C:\temp\isp-forward'    # where to store forward files
templatename = 'template.html'           # template for generated pages

try:
    os.mkdir(uploaddir)                  # make upload dir if needed
except OSError: pass

template  = open(templatename).read()    # load or import template text
sitefiles = os.listdir(sitefilesdir)     # filenames, no directory prefix

count = 0
for filename in sitefiles:
    if filename.endswith('.html') or filename.endswith('.htm'):
        fwdname = os.path.join(uploaddir, filename)
        print('creating', filename, 'as', fwdname)

        filetext = template.replace('$server$', servername)   # insert text
        filetext = filetext.replace('$home$', homedir)        # and write
        filetext = filetext.replace('$file$', filename)       # file varies
        open(fwdname, 'w').write(filetext)
        count += 1

print('Last file =>\n', filetext, sep='')
print('Done:', count, 'forward files created.')
Notice that the template’s text is loaded by reading a file; it would
work just as well to code it as an imported Python string variable
(e.g., a triple-quoted string in a module file). Also observe that all
configuration options are assignments at the top of the script, not
command-line arguments; since they change so seldom, it’s convenient to
type them just once in the script itself.
But the main thing worth noticing here is that this script doesn’t
care what the template file looks like at all; it simply performs global
substitutions blindly in its text, with a different filename value for
each generated file. In fact, we can change the template file any way we
like without having to touch the script. Though a fairly simple
technique, such a division of labor can be used in all sorts of
contexts—generating “makefiles,” form letters, HTML replies from CGI
scripts on web servers, and so on. In terms of library tools, the
generator script:
- Uses os.listdir to step through all the filenames in the site’s
  directory (glob.glob would work too, but may require stripping
  directory prefixes from file names)
- Uses the string object’s replace method to perform global
  search-and-replace operations that fill in the $-delimited targets in
  the template file’s text, and endswith to skip non-HTML files (e.g.,
  images; most browsers won’t know what to do with HTML text in a “.jpg”
  file)
- Uses os.path.join and built-in file objects to write the resulting
  text out to a forward-link file of the same name in an output
  directory
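The difference between the two directory tools mentioned in this list is easy to see in a quick experiment. The sketch below builds a throwaway directory with made-up filenames and selects the HTML files both ways: os.listdir returns bare names that can be filtered with endswith, while glob.glob returns paths that include the directory, so os.path.basename is used to strip it off. Note that the *.htm* pattern chosen here is slightly looser than the endswith test, since it would also match a name such as page.htmx.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as sitedir:
    # Hypothetical site contents: two pages and an image.
    for name in ('index.html', 'old.htm', 'logo.jpg'):
        open(os.path.join(sitedir, name), 'w').close()

    # os.listdir returns bare names, so filter them with endswith
    bylistdir = sorted(name for name in os.listdir(sitedir)
                       if name.endswith(('.html', '.htm')))

    # glob.glob returns paths that include the directory, so strip it off
    matches = glob.glob(os.path.join(sitedir, '*.htm*'))
    byglob = sorted(os.path.basename(path) for path in matches)

print(bylistdir)
print(byglob)
```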
The end result is a mirror image of the original website
directory, containing only forward-link files generated from the page
template. As an added bonus, the generator script can be run on just
about any Python platform—I can run it on my Windows laptop (where I’m
writing this book), as well as on a Linux server (where my
http://learning-python.com domain is hosted). Here it is
in action on Windows:
C:\...\PP4E\System\Filetools>python site-forward.py
creating about-lp.html as C:\temp\isp-forward\about-lp.html
creating about-lp1e.html as C:\temp\isp-forward\about-lp1e.html
creating about-lp2e.html as C:\temp\isp-forward\about-lp2e.html
creating about-lp3e.html as C:\temp\isp-forward\about-lp3e.html
creating about-lp4e.html as C:\temp\isp-forward\about-lp4e.html
...many more lines deleted...
creating training.html as C:\temp\isp-forward\training.html
creating whatsnew.html as C:\temp\isp-forward\whatsnew.html
creating whatsold.html as C:\temp\isp-forward\whatsold.html
creating xlate-lp.html as C:\temp\isp-forward\xlate-lp.html
creating zopeoutline.htm as C:\temp\isp-forward\zopeoutline.htm
Last file =>Site Redirection Page: zopeoutline.htm This page has moved
This page now lives at this address:
http://learning-python.com/books/zopeoutline.htmPlease click on the new address to jump to this page, and
update any links accordingly. You will be redirectly shortly.
Done: 124 forward files created.
To verify this script’s output, double-click on any of the output
files to see what they look like in a web browser (or run a start
command in a DOS console on Windows, e.g.,
start isp-forward\about-lp4e.html). Figure 6-1 shows what one generated
page looks like on my machine.
Figure 6-1. Site-forward output file page
To complete the process, you still need to install the forward
links: upload all the generated files in the output directory to your
old site’s web directory. If that’s too much to do by hand, too, be sure
to see the FTP site upload scripts in Chapter 13 for an automatic way
to do that step with Python as well (PP4E\Internet\Ftp\uploadflat.py
will do the job).
Once you’ve started scripting in earnest, you’ll be amazed at how much
manual labor Python can automate. The next section provides another
prime example.
[21] It happens. In fact, most people who spend any substantial
amount of time in cyberspace could probably tell a horror story or
two. Mine goes like this: a number of years ago, I had an account with
an ISP that went completely offline for a few weeks in response to a
security breach by an ex-employee. Worse, not only was personal email
disabled, but queued up messages were permanently lost. If your
livelihood depends on email and the Web as much as mine does, you’ll
appreciate the havoc such an outage can wreak.