Programming Python

Authors: Mark Lutz

This may take a few moments to complete, depending on your site’s
size and your connection speed (it’s bound by network speed constraints,
and it usually takes roughly two to three minutes for my site on my
current laptop and wireless broadband connection). It is much more
accurate and easier than downloading files by hand, though. The script
simply iterates over all the remote files returned by the nlst method,
and downloads each in turn with the FTP protocol (i.e., over sockets).
It uses text transfer mode for names that imply text data, and binary
mode for others.
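
The loop just described can be sketched as a reusable function. This is a
minimal sketch, not the book's downloadflat.py listing: it assumes
connection is an ftplib.FTP object that is already logged in and positioned
in the remote directory, and the function name download_flat is
hypothetical.

```python
import os
from mimetypes import guess_type


def download_flat(connection, localdir='.'):
    """Fetch every file in the current remote directory into localdir.

    Assumes connection is a logged-in ftplib.FTP object that has already
    cd'ed to the remote directory. Returns the number of files fetched.
    """
    count = 0
    for remotename in connection.nlst():             # bare names, no paths
        if remotename in ('.', '..'):                # some servers list these
            continue
        mimetype, encoding = guess_type(remotename)  # e.g. ('text/html', None)
        maintype = (mimetype or '?/?').split('/')[0]
        localpath = os.path.join(localdir, remotename)
        print('downloading', remotename, 'to', localpath, 'as', maintype)
        if maintype == 'text' and encoding is None:
            # text mode: retrlines passes each line as a str without its
            # end-of-line marker, so add '\n' back; text-mode 'w' then
            # writes the local platform's line ending
            with open(localpath, 'w', encoding=connection.encoding) as localfile:
                connection.retrlines('RETR ' + remotename,
                                     lambda line: localfile.write(line + '\n'))
        else:
            # binary mode: store the received bytes unchanged
            with open(localpath, 'wb') as localfile:
                connection.retrbinary('RETR ' + remotename, localfile.write)
        count += 1
    return count
```

After connection = ftplib.FTP(site), connection.login(user, password), and
connection.cwd(remotedir), a call such as download_flat(connection, 'test')
would copy that remote directory into the local test directory.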

To run the script, I make sure the initial assignments in it reflect
the machines involved, and then run it from the local directory where
I want the site copy to be stored. Because the target download directory
is often not where the script lives, I may need to give Python the full
path to the script file. When run on a server in a Telnet or SSH session
window, for instance, the execution and script directory paths are
different, but the script works the same way.

If you elect to delete local files in the download directory, you
may also see a batch of “deleting local…” messages scroll by on the
screen before any “downloading…” lines appear: this automatically cleans
out any garbage lingering from a prior download. And if you botch the
input of the remote site password, a Python exception is raised; I
sometimes need to run it again (and type more slowly):

C:\...\PP4E\Internet\Ftp\Mirror> downloadflat.py test
Password for lutz on home.rmi.net:
Clean local directory first?
connecting...
Traceback (most recent call last):
  File "C:\...\PP4E\Internet\Ftp\Mirror\downloadflat.py", line 29, in <module>
    connection.login(remoteuser, remotepass)       # login as user/password
  File "C:\Python31\lib\ftplib.py", line 375, in login
    if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Python31\lib\ftplib.py", line 245, in sendcmd
    return self.getresp()
  File "C:\Python31\lib\ftplib.py", line 220, in getresp
    raise error_perm(resp)
ftplib.error_perm: 530 Login incorrect.
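
Rather than rerunning the whole script after a mistyped password, a script
could catch the ftplib.error_perm exception shown in this traceback and
prompt again. This is a hedged sketch that is not part of the book's
script: the helper name login_with_retries and its prompt parameter are
hypothetical, and it assumes the server keeps the control connection open
after a rejected login (many servers do, but some disconnect).

```python
import ftplib
from getpass import getpass


def login_with_retries(connection, user, site, tries=3, prompt=getpass):
    """Prompt for a password and log in, retrying after a rejected login.

    Returns the server's reply to a successful login; re-raises the last
    ftplib.error_perm if every attempt is rejected.
    """
    for attempt in range(1, tries + 1):
        password = prompt('Password for %s on %s: ' % (user, site))
        try:
            return connection.login(user, password)
        except ftplib.error_perm as exc:           # e.g. "530 Login incorrect."
            print('login failed (attempt %d of %d): %s' % (attempt, tries, exc))
            if attempt == tries:
                raise
```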

It’s worth noting that this script is at least partially configured by
assignments near the top of the file. In addition, the password and
deletion options are given by interactive inputs, and one command-line
argument is allowed: the local directory name to store the downloaded
files (it defaults to “.”, the directory where the script is run).
Command-line arguments could be employed to universally configure all
the other download parameters and options, too, but because of Python’s
simplicity and lack of compile/link steps, changing settings in the text
of Python scripts is usually just as easy as typing words on a command
line.
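
If you do prefer command-line configuration, Python's standard argparse
module handles it with little code. The following is a minimal sketch: the
function name and option names are hypothetical (they are not flags the
book's scripts accept), and the defaults simply echo values from this
section's transcript.

```python
import argparse


def parse_options(argv=None):
    """Parse download settings from the command line (hypothetical options)."""
    parser = argparse.ArgumentParser(description='Mirror a flat FTP directory.')
    parser.add_argument('localdir', nargs='?', default='.',
                        help='local directory for downloaded files (default: .)')
    parser.add_argument('--site', default='home.rmi.net', help='remote FTP site')
    parser.add_argument('--remotedir', default='.', help='remote directory to copy')
    parser.add_argument('--user', default='lutz', help='remote user name')
    parser.add_argument('--active', action='store_true',
                        help='force active-mode FTP (passive is the default)')
    parser.add_argument('--clean', action='store_true',
                        help='delete local files before downloading')
    return parser.parse_args(argv)    # argv=None means use sys.argv[1:]
```

A hypothetical script built on this could then be run as, for example,
"python mirror.py test --clean --site ftp.example.com", while the password
would still be read interactively so it never appears on a command line.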

Note

To check for version skew after a batch of downloads and uploads, you
can run the diffall script we wrote in Chapter 6, Example 6-12. For
instance, I find files that have diverged over time due to updates on
multiple platforms by comparing the download to a local copy of my
website using a shell command line such as:

C:\...\PP4E\Internet\Ftp> ..\..\System\Filetools\diffall.py Mirror\test C:\...\Websites\public_html

See Chapter 6 for more details on this tool, and file diffall.out.txt in
the diffs subdirectory of the examples distribution for a sample run;
its text file differences stem from either final line newline characters
or newline differences reflecting binary transfers that Windows fc
commands and FTP servers do not notice.

Uploading Site Directories

Uploading a full directory is symmetric to downloading: it’s mostly a
matter of swapping the local and remote machines and operations in the
program we just met. The script in Example 13-11 uses FTP to copy all
files in a directory on the local machine on which it runs up to a
directory on a remote machine.

I really use this script, too, most often to upload all of the
files maintained on my laptop PC to my ISP account in one fell swoop. I
also sometimes use it to copy my site from my PC to a mirror machine or
from the mirror machine back to my ISP. Because this script runs on any
computer with Python and sockets, it happily transfers a directory from
any machine on the Net to any machine running an FTP server. Simply
change the initial settings in this module as appropriate for the
transfer you have in mind.

Example 13-11. PP4E\Internet\Ftp\Mirror\uploadflat.py

#!/usr/bin/env python
"""
##############################################################################
use FTP to upload all files from one local dir to a remote site/directory;
e.g., run me to copy a web/FTP site's files from your PC to your ISP;
assumes a flat directory upload: uploadall.py does nested directories.
see downloadflat.py comments for more notes: this script is symmetric.
##############################################################################
"""

import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type

nonpassive = False                         # passive FTP by default
remotesite = 'learning-python.com'         # upload to this site
remotedir  = 'books'                       # from machine running on
remoteuser = 'lutz'
remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
cleanall   = input('Clean remote directory first? ')[:1] in ['y', 'Y']

print('connecting...')
connection = ftplib.FTP(remotesite)        # connect to FTP site
connection.login(remoteuser, remotepass)   # log in as user/password
connection.cwd(remotedir)                  # cd to directory to copy
if nonpassive:                             # force active mode FTP
    connection.set_pasv(False)             # most servers do passive

if cleanall:
    for remotename in connection.nlst():   # try to delete all remotes
        try:                               # first, to remove old files
            print('deleting remote', remotename)
            connection.delete(remotename)  # skips . and .. if attempted
        except ftplib.all_errors:
            print('cannot delete remote', remotename)

count = 0                                  # upload all local files
localfiles = os.listdir(localdir)          # listdir() strips dir path
                                           # any failure ends script
for localname in localfiles:
    mimetype, encoding = guess_type(localname)   # e.g., ('text/plain', 'gzip')
    mimetype = mimetype or '?/?'                 # may be (None, None)
    maintype = mimetype.split('/')[0]            # .jpg ('image/jpeg', None)

    localpath = os.path.join(localdir, localname)
    print('uploading', localpath, 'to', localname, end=' ')
    print('as', maintype, encoding or '')

    if maintype == 'text' and encoding is None:
        # use ascii mode xfer and bytes file
        # need rb mode for ftplib's crlf logic
        localfile = open(localpath, 'rb')
        connection.storlines('STOR ' + localname, localfile)
    else:
        # use binary mode xfer and bytes file
        localfile = open(localpath, 'rb')
        connection.storbinary('STOR ' + localname, localfile)
    localfile.close()
    count += 1

connection.quit()
print('Done:', count, 'files uploaded.')

Similar to the mirror download script, this program illustrates a
handful of new FTP interfaces and a set of FTP scripting
techniques:

Deleting all remote files

Just like the mirror script, the upload begins by asking whether we
want to delete all the files in the remote target directory before
copying any files there. This cleanall option is useful if we’ve deleted
files in the local copy of the directory in the client: the deleted
files would remain on the server-side copy unless we delete all files
there first.

To implement the remote cleanup, this script simply gets a listing of
all the files in the remote directory with the FTP nlst method, and
deletes each in turn with the FTP delete method. Assuming we have delete
permission, the directory will be emptied (file permissions depend on
the account we logged into when connecting to the server). We’ve already
moved to the target remote directory when deletions occur, so no
directory paths need to be prepended to filenames here. Note that nlst
may raise an exception for some servers if the remote directory is
empty; we don’t catch that exception here, but you can simply decline
the cleaning option if it fails for you. We do catch deletion
exceptions, because directory names like “.” and “..” may be returned
in the listing by some servers.
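
A slightly more defensive cleanup loop skips “.” and “..” explicitly and
treats an nlst failure on an empty directory as having nothing to delete.
This is a sketch under assumptions: the name clean_remote is hypothetical,
and it assumes an empty-directory nlst failure arrives as a 550 reply,
which is common but server-dependent.

```python
import ftplib


def clean_remote(connection):
    """Try to delete every file in the current remote directory.

    Returns the names actually deleted. Assumes an empty directory shows up
    as a 550 error reply from nlst, which varies by server.
    """
    try:
        names = connection.nlst()
    except ftplib.error_perm as exc:
        if not str(exc).startswith('550'):       # some other permanent error
            raise
        names = []                                # treat as an empty directory
    deleted = []
    for name in names:
        if name in ('.', '..'):                   # directory entries, not files
            continue
        try:
            connection.delete(name)
            deleted.append(name)
        except ftplib.all_errors as exc:          # no permission, a subdir, etc.
            print('cannot delete remote', name, '-', exc)
    return deleted
```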

Storing all local files

To apply the upload operation to each file in the local directory, we
get a list of local filenames with the standard os.listdir call, and
take care to prepend the local source directory path to each filename
with the os.path.join call. Recall that os.listdir returns filenames
without directory paths, and the source directory may not be the same
as the script’s execution directory if passed on the command line.
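
The listdir-plus-join step can be isolated as follows. This minimal sketch
uses a hypothetical function name and adds one assumption of its own: it
skips subdirectories with os.path.isfile, whereas uploadflat.py simply
assumes a flat directory and leaves nested ones to uploadall.py.

```python
import os


def local_upload_list(localdir='.'):
    """Return (localpath, remotename) pairs for the plain files in localdir.

    os.listdir gives bare names, so each is joined to localdir to get a
    path that works from any current directory; the remote name stays bare.
    """
    pairs = []
    for name in sorted(os.listdir(localdir)):
        localpath = os.path.join(localdir, name)
        if os.path.isfile(localpath):          # skip subdirectories (assumption)
            pairs.append((localpath, name))
    return pairs
```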

Uploading: Text versus binary

This script may also be run on both Windows and Unix-like clients, so
we need to handle text files specially. Like the mirror download, this
script picks text or binary transfer modes by using Python’s mimetypes
module to guess a file’s type from its filename extension; HTML and text
files are moved in FTP text mode, for instance. We already met the
storbinary FTP object method used to upload files in binary mode: an
exact, byte-for-byte copy appears at the remote site.
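
The mode choice in the listing can be pulled into a small function to see
how particular names are classified. This sketch mirrors the script's own
test (text main type and no compression encoding means text mode); the
function name transfer_mode is hypothetical.

```python
from mimetypes import guess_type


def transfer_mode(filename):
    """Return 'text' or 'binary' using the same rule as uploadflat.py."""
    mimetype, encoding = guess_type(filename)   # may be (None, None)
    maintype = (mimetype or '?/?').split('/')[0]
    if maintype == 'text' and encoding is None:
        return 'text'
    return 'binary'                             # images, gzip'd text, unknowns


for name in ['index.html', 'zaurus0.jpg', 'notes.txt.gz', 'README']:
    print(name, '->', transfer_mode(name))
```

Note that a compressed text file such as notes.txt.gz goes in binary mode:
guess_type reports it as ('text/plain', 'gzip'), and newline translation
would corrupt the compressed bytes.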

Text-mode transfers work almost identically: the storlines method
accepts an FTP command string and a local file (or file-like) object,
and simply copies each line read from the local file to a same-named
file on the remote machine.
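
To see what copying each line involves, here is a simplified sketch of the
line normalization storlines applies before sending: every line read from
the binary file goes out with a \r\n ending, whatever ending it had
locally. This is an illustration written for this discussion, not ftplib's
source, and crlf_lines is a hypothetical name.

```python
import io

CRLF = b'\r\n'


def crlf_lines(fp):
    """Yield each line of a binary file with its ending normalized to CRLF."""
    for line in iter(fp.readline, b''):         # bytes lines until EOF
        if line[-2:] != CRLF:
            if line[-1:] in (b'\r', b'\n'):     # drop a lone CR or LF ending
                line = line[:-1]
            line += CRLF                        # the FTP text-mode line ending
        yield line


print(list(crlf_lines(io.BytesIO(b'<p>unix\n<p>dos\r\n<p>last'))))
```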

Notice, though, that the local text input file must be opened in rb
binary mode in Python 3.X. Text input files are normally opened in r
text mode to perform Unicode decoding and to convert any \r\n
end-of-line sequences on Windows to the \n platform-neutral character
as lines are read. However, ftplib in Python 3.1 requires that the text
file be opened in rb binary mode, because it converts all end-lines to
the \r\n sequence for transmission; to do so, it must read lines as raw
bytes with readline and perform bytes string processing, which implies
binary mode files.
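
The difference is easy to observe. This short sketch writes a temporary
file with Windows-style line endings (created only for the demonstration)
and reads its first line in both modes: rb returns bytes with \r\n intact,
while r returns a str with the ending translated to \n. Since a str cannot
be concatenated with bytes such as b'\r\n' in Python 3, bytes-oriented line
processing like ftplib's needs binary-mode input.

```python
import os
import tempfile

# a small file with \r\n line endings, written as raw bytes
path = os.path.join(tempfile.mkdtemp(), 'page.html')
with open(path, 'wb') as f:
    f.write(b'<html>\r\n</html>\r\n')

with open(path, 'rb') as f:
    raw = f.readline()               # bytes, ending intact
with open(path, 'r', encoding='ascii') as f:
    text = f.readline()              # str, ending translated to '\n'

print(type(raw).__name__, raw)
print(type(text).__name__, repr(text))
try:
    text + b'\r\n'                   # mixing str and bytes
except TypeError as exc:
    print('text-mode line fails:', exc)
```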

This ftplib string processing worked with text-mode files in Python
2.X, but only because there was no separate bytes type; \n was expanded
to \r\n. Opening the local file in binary mode for ftplib to read also
means no Unicode decoding will occur: the text is sent over sockets as
a byte string in already encoded form. All of which is, of course, a
prime lesson on the impacts of Unicode encodings; consult the module
ftplib.py in the Python source library directory for more details.

For binary mode transfers, things are simpler: we open the local file
in rb binary mode to suppress Unicode decoding and automatic mapping
everywhere, and return the bytes strings expected by ftplib on read.
Binary data is not Unicode text, and we don’t want bytes in an audio
file that happen to have the same value as \r to magically disappear
when read on Windows.
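
A quick experiment shows why binary mode matters for non-text data. In
this sketch the sample bytes are arbitrary stand-ins for audio or image
content, and latin-1 is used for the text-mode read only so that decoding
cannot fail (with UTF-8, the 0xFF byte would raise UnicodeDecodeError).
Even so, text mode's newline translation turns the lone \r byte into \n;
in Python 3 that translation happens by default on every platform, not
just Windows. The rb read returns the data unchanged.

```python
import os
import tempfile

# a few bytes that might appear in binary data, including 0x0D (\r)
data = bytes([0x00, 0xFF, 0x0D, 0x10])
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, 'wb') as f:
    f.write(data)

with open(path, 'rb') as f:
    exact = f.read()                         # byte-for-byte copy
with open(path, 'r', encoding='latin-1') as f:
    altered = f.read().encode('latin-1')     # decode, translate, re-encode

print(exact == data, altered == data)
print(altered)
```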

As for the mirror download script, this program simply iterates
over all files to be transferred (files in the local directory listing
this time), and transfers each in turn—in either text or binary mode,
depending on the files’ names. Here is the command I use to upload my
entire website from my laptop Windows PC to a remote Linux server at my
ISP, in a single step:

C:\...\PP4E\Internet\Ftp\Mirror> uploadflat.py test
Password for lutz on learning-python.com:
Clean remote directory first? y
connecting...
deleting remote .
cannot delete remote .
deleting remote ..
cannot delete remote ..
deleting remote 2004-longmont-classes.html
deleting remote 2005-longmont-classes.html
deleting remote 2006-longmont-classes.html
deleting remote about-lp1e.html
deleting remote about-lp2e.html
deleting remote about-lp3e.html
deleting remote about-lp4e.html
...lines omitted...
uploading test\2004-longmont-classes.html to 2004-longmont-classes.html as text
uploading test\2005-longmont-classes.html to 2005-longmont-classes.html as text
uploading test\2006-longmont-classes.html to 2006-longmont-classes.html as text
uploading test\about-lp1e.html to about-lp1e.html as text
uploading test\about-lp2e.html to about-lp2e.html as text
uploading test\about-lp3e.html to about-lp3e.html as text
uploading test\about-lp4e.html to about-lp4e.html as text
uploading test\about-pp-japan.html to about-pp-japan.html as text
...lines omitted...
uploading test\whatsnew.html to whatsnew.html as text
uploading test\whatsold.html to whatsold.html as text
uploading test\wxPython.doc.tgz to wxPython.doc.tgz as application gzip
uploading test\xlate-lp.html to xlate-lp.html as text
uploading test\zaurus0.jpg to zaurus0.jpg as image
uploading test\zaurus1.jpg to zaurus1.jpg as image
uploading test\zaurus2.jpg to zaurus2.jpg as image
uploading test\zoo-jan-03.jpg to zoo-jan-03.jpg as image
uploading test\zopeoutline.htm to zopeoutline.htm as text
Done: 297 files uploaded.

For my site and on my current laptop and wireless broadband
connection, this process typically takes six minutes, depending on
server load. As with the download script, I often run this command from
the local directory where my web files are kept, and I pass Python the
full path to the script. When I run this on a Linux server, it works in
the same way, but the paths to the script and my web files directory
differ.[50]
