Programming Python

Authors: Mark Lutz
Transferring Files to Clients and Servers

It’s time to explain a bit of HTML code that’s been lurking in the shadows. Did you notice those hyperlinks on the language selector examples’ main pages for showing the CGI script’s source code (the links I told you to ignore)? Normally, we can’t see such script source code, because accessing a CGI script makes it execute; we can see only its HTML output, generated to make the new page. The script in Example 15-26, referenced by a hyperlink in the main language.html page, works around that by opening the source file and sending its text as part of the HTML response. The text is marked with <PRE> as preformatted text and is escaped for transmission inside HTML with cgi.escape.

Example 15-26. PP4E\Internet\Web\cgi-bin\languages-src.py

#!/usr/bin/python
"Display languages.py script code without running it."

import cgi
filename = 'cgi-bin/languages.py'

print('Content-type: text/html\n')              # wrap up in HTML
print('<TITLE>Languages</TITLE>')
print("<H1>Source code: '%s'</H1>" % filename)
print('<HR><PRE>')
print(cgi.escape(open(filename).read()))        # decode per platform default
print('</PRE><HR>')

Here again, the filename is relative to the server’s directory for our web server on Windows (see the prior discussion of this, and delete the cgi-bin portion of its path on other platforms). When we visit this script on the Web via the first source hyperlink in Example 15-17 or a manually typed URL, the script delivers a response to the client that includes the text of the CGI script source file. It’s captured in Figure 15-27.

Figure 15-27. Source code viewer page

Note that here, too, it’s crucial to format the text of the file with cgi.escape, because it is embedded in the HTML code of the reply. If we don’t, any characters in the text that mean something in HTML code are interpreted as HTML tags. For example, the C++ < operator character within this file’s text may yield bizarre results if not properly escaped. The cgi.escape utility converts it to the standard sequence &lt; for safe embedding.
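If you are using a current Python, note that cgi.escape was deprecated in Python 3.2 and removed in 3.8. The standard library’s html.escape does the same job. The minimal sketch below escapes a hypothetical C++ source line before embedding it in <pre> markup. Passing quote=False mirrors cgi.escape’s default behavior, which left quote characters alone.

```python
import html

# A hypothetical line of C++ source containing an HTML-significant character
source = 'cout << "Hello World" << endl;'

# quote=False escapes only &, < and >, as cgi.escape did by default
escaped = html.escape(source, quote=False)
print(escaped)              # cout &lt;&lt; "Hello World" &lt;&lt; endl;

# The default quote=True also escapes double and single quotes
print(html.escape(source))  # cout &lt;&lt; &quot;Hello World&quot; &lt;&lt; endl;

# Now safe to embed as preformatted text in the reply page
page_fragment = '<pre>' + escaped + '</pre>'
print(page_fragment)

# html.unescape reverses the transformation
assert html.unescape(escaped) == source
```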

Displaying Arbitrary Server Files on the Client

Almost immediately after writing the languages source code viewer script in the preceding example, it occurred to me that it wouldn’t be much more work, and would be much more useful, to write a generic version: one that could use a passed-in filename to display any file on the site. It’s a straightforward mutation on the server side; we merely need to allow a filename to be passed in as an input. The getfile.py Python script in Example 15-27 implements this generalization. It assumes the filename is either typed into a web page form or appended to the end of the URL as a parameter. Remember that Python’s cgi module handles both cases transparently, so there is no code in this script that notices any difference.
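No special-case code is needed because a GET form submission and a hand-typed URL parameter both arrive as the same urlencoded name=value text in the QUERY_STRING environment variable. A POST form sends that same text in the request body instead. The sketch below uses a hypothetical helper, get_filename, which is not part of the book’s code. It approximates, for this one field, what cgi.FieldStorage unifies, using urllib.parse.parse_qs. It may also serve as a starting point on Python 3.13 and later, where the cgi module has been removed from the standard library.

```python
from urllib.parse import parse_qs

def get_filename(environ, body=b'', default=r'cgi-bin\getfile.py'):
    """Hypothetical helper: fetch the 'filename' input from a GET query string
    or a urlencoded POST body, falling back on a default like getfile.py."""
    if environ.get('REQUEST_METHOD', 'GET').upper() == 'POST':
        query = body.decode('ascii')           # urlencoded bodies are percent-escaped ASCII
    else:
        query = environ.get('QUERY_STRING', '')
    values = parse_qs(query).get('filename')   # parse_qs maps each name to a list
    return values[0] if values else default

# Same result whether the name came from a form field or was typed in a URL
print(get_filename({'REQUEST_METHOD': 'GET',
                    'QUERY_STRING': 'filename=cgi-bin%5Clanguages.py'}))
print(get_filename({'REQUEST_METHOD': 'POST'}, b'filename=cgi-bin%2Flanguages.py'))
```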

Example 15-27. PP4E\Internet\Web\cgi-bin\getfile.py

#!/usr/bin/python
r"""
##################################################################################
Display any CGI (or other) server-side file without running it. The filename can
be passed in a URL param or form field (use "localhost" as the server if local):
http://servername/cgi-bin/getfile.py?filename=somefile.html
http://servername/cgi-bin/getfile.py?filename=cgi-bin\somefile.py
http://servername/cgi-bin/getfile.py?filename=cgi-bin%2Fsomefile.py
Users can cut-and-paste or "View Source" to save file locally. On IE, running the
text/plain version (formatted=False) sometimes pops up Notepad, but end-lines are
not always in DOS format; Netscape shows the text correctly in the browser page
instead. Sending the file in text/HTML mode works on both browsers--text is
displayed in the browser response page correctly. We also check the filename here
to try to avoid showing private files; this may or may not prevent access to such
files in general: don't install this script if you can't otherwise secure source!
##################################################################################
"""

import cgi, os, sys
formatted = True                                  # True=wrap text in HTML
privates  = ['PyMailCgi/cgi-bin/secret.py']       # don't show these

try:
    samefile = os.path.samefile                   # checks device, inode numbers
except AttributeError:
    def samefile(path1, path2):                   # not available on Windows (3.1)
        apath1 = os.path.abspath(path1).lower()   # do close approximation
        apath2 = os.path.abspath(path2).lower()   # normalizes path, same case
        return apath1 == apath2

html = """
<html><title>Getfile response</title>
<h1>Source code for: '%s'</h1>
<hr>
<pre>%s</pre>
<hr></html>"""

def restricted(filename):
    for path in privates:
        if samefile(path, filename):              # unify all paths by os.stat
            return True                           # else returns None=false

try:
    form = cgi.FieldStorage()
    filename = form['filename'].value             # URL param or form field
except:
    filename = r'cgi-bin\getfile.py'              # else default filename

try:
    assert not restricted(filename)               # load unless private
    filetext = open(filename).read()              # platform unicode encoding
except AssertionError:                            # note: python -O skips asserts
    filetext = '(File access denied)'
except:
    filetext = '(Error opening file: %s)' % sys.exc_info()[1]

if not formatted:
    print('Content-type: text/plain\n')           # send plain text
    print(filetext)                               # works on NS, not IE?
else:
    print('Content-type: text/html\n')            # wrap up in HTML
    print(html % (cgi.escape(filename), cgi.escape(filetext)))   # name is user input too

This Python server-side script simply extracts the filename from the parsed CGI inputs object and reads and prints the text of the file to send it to the client browser. Depending on the formatted global variable setting, it sends the file in either plain text mode (using text/plain in the response header) or wrapped up in an HTML page definition (text/html).
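The two reply formats differ only in the Content-type header and in whether the text is escaped and wrapped in markup. A CGI reply is just header lines, a blank line, and the body. Each print('Content-type: ...\n') call in the script emits the header plus that blank line, because print adds a second newline. As a hedged sketch, the hypothetical build_response function below assembles both forms as strings, which also makes them easy to test. It uses html.escape in place of cgi.escape and escapes the filename as well as the file text, since the name comes straight from the user.

```python
import html

PAGE = """<html><title>Getfile response</title>
<h1>Source code for: '%s'</h1>
<hr>
<pre>%s</pre>
<hr></html>"""

def build_response(filename, filetext, formatted=True):
    """Return a complete CGI reply: header line, blank line, then body."""
    if not formatted:
        # Plain text mode: the browser decides how to show raw text
        return 'Content-type: text/plain\n\n' + filetext
    # HTML mode: escape both the user-supplied name and the file's text
    body = PAGE % (html.escape(filename), html.escape(filetext, quote=False))
    return 'Content-type: text/html\n\n' + body

print(build_response('demo.py', 'if a < b: print("less")'))
```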

Both modes (and others) work in general under most browsers, but Internet Explorer doesn’t handle the plain text mode as gracefully as Netscape does. During testing, it popped up the Notepad text editor to view the downloaded text, but end-of-line characters in Unix format made the file appear as one long line. (Netscape instead displays the text correctly in the body of the response web page itself.) HTML display mode works more portably with current browsers. More on this script’s restricted file logic in a moment.

Let’s launch this script by typing its URL at the top of a browser, along with a desired filename appended after the script’s name. Figure 15-28 shows the page we get by visiting the following URL (the second source link in the language selector page of Example 15-17 has a similar effect but a different file):

http://localhost/cgi-bin/getfile.py?filename=cgi-bin\languages-src.py

Figure 15-28. Generic source code viewer page

The body of this page shows the text of the server-side file whose name we passed at the end of the URL; once it arrives, we can view its text, cut-and-paste to save it in a file on the client, and so on. In fact, now that we have this generalized source code viewer, we could replace the hyperlink to the script languages-src.py in language.html with a URL of this form (I included both for illustration):

http://localhost/cgi-bin/getfile.py?filename=cgi-bin\languages.py

One subtle point: notice that the query parameter in this URL and others in this book use a backslash as the Windows directory separator. On Windows, using both the local Python web server of Example 15-1 and Internet Explorer, we can also use either of the first two URL-escaped forms in the following list, but the literal forward slash in the last one fails (in URL escapes, %5C is \ and %2F is /):

http://localhost/cgi-bin/getfile.py?filename=cgi-bin%5Clanguages.py     OK too
http://localhost/cgi-bin/getfile.py?filename=cgi-bin%2Flanguages.py     OK too
http://localhost/cgi-bin/getfile.py?filename=cgi-bin/languages.py       fails

This reflects a change since the prior edition of this book (which used the last of these for portability), and may or may not be ideal behavior (though like working directory contexts, this is one of a set of server and platform differences you’re likely to encounter when working on the Web). It seems to stem from the fact that the urllib.parse module’s quote treats / as safe by default, but quote_plus does not. If you care about URL portability in this context, the second of the preceding forms may be better, though arguably cryptic to remember if you have to type it manually (escaping tools can automate this). If not, you may have to double up on backslashes to avoid clashes with other string escapes, because of the way URL parameter data is handled; see the links to this script in Example 15-20 for an example involving \f.
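You can check the escaping difference described above interactively. This minimal sketch runs both urllib.parse functions on the two path spellings used above, then lets urlencode, which calls quote_plus by default, build a complete query URL. The server name localhost is assumed, as in the URLs above. Tools like this can generate the cryptic but portable %5C and %2F forms for you.

```python
from urllib.parse import quote, quote_plus, urlencode

unix_style = 'cgi-bin/languages.py'
dos_style = r'cgi-bin\languages.py'       # raw string keeps the backslash literal

print(quote(unix_style))                  # cgi-bin/languages.py    ('/' is in quote's default safe set)
print(quote_plus(unix_style))             # cgi-bin%2Flanguages.py  ('/' is not safe for quote_plus)
print(quote(unix_style, safe=''))         # cgi-bin%2Flanguages.py  (same effect, made explicit)
print(quote_plus(dos_style))              # cgi-bin%5Clanguages.py  (backslash is escaped by both)

# urlencode builds the whole query string, using quote_plus by default
url = 'http://localhost/cgi-bin/getfile.py?' + urlencode({'filename': dos_style})
print(url)   # http://localhost/cgi-bin/getfile.py?filename=cgi-bin%5Clanguages.py
```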

From a higher perspective, URLs like these are really direct calls (albeit across the Web) to our Python script, with filename parameters passed explicitly; we’re using the script much like a subroutine located elsewhere in cyberspace that returns the text of a file we wish to view. As we’ve seen, parameters passed in URLs are treated the same as field inputs in forms; for convenience, let’s also write a simple web page that allows the desired file to be typed directly into a form, as shown in Example 15-28.
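If we take the subroutine analogy literally, a Python client can call the script the same way a browser does. The sketch below is an assumption-laden illustration, not code from this book. getfile_url and fetch_source are hypothetical names. fetch_source assumes a server is running on the named host and that the reply is UTF-8 text. With formatted set to True, the reply is the HTML page, not the raw file.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def getfile_url(server, filename):
    """Build the URL that asks getfile.py on server for filename."""
    query = urlencode({'filename': filename})    # escapes '\' and '/' for us
    return 'http://%s/cgi-bin/getfile.py?%s' % (server, query)

def fetch_source(server, filename, encoding='utf-8'):
    """Call the remote script like a subroutine and return its reply text."""
    with urlopen(getfile_url(server, filename)) as reply:
        return reply.read().decode(encoding)

print(getfile_url('localhost', r'cgi-bin\languages.py'))
# With a local server running: print(fetch_source('localhost', r'cgi-bin\languages.py'))
```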

Example 15-28. PP4E\Internet\Web\getfile.html

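<html><title>Getfile: download page</title>
<body>
<form method=get action="cgi-bin/getfile.py">
  <h1>Type name of server file to be viewed</h1>
  <p><input type=text name=filename>
  <p><input type=submit value=Download>
</form>
<hr><a href="cgi-bin/getfile.py?filename=cgi-bin\getfile.py">View script code</a>
</body></html>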

Figure 15-29 shows the page we receive when we visit this file’s URL. We need to type only the filename in this page, not the full CGI script address; notice that I can use forward slashes here because the browser will escape them on transmission and Python’s open allows either type of slash on Windows (in query parameters created manually, it’s up to coders or generators to do the right thing).

Figure 15-29. Source code viewer selection page

When we press this page’s Download button to submit the form, the filename is transmitted to the server, and we get back the same page as before, when the filename was appended to the URL (it’s the same as Figure 15-28, albeit with a different directory separator slash). In fact, the filename will be appended to the URL here, too; the get method in the form’s HTML instructs the browser to append the filename to the URL, exactly as if we had done so manually. It shows up at the end of the URL in the response page’s address field, even though we really typed it into a form. Clicking the link at the bottom of Figure 15-29 opens the file-getter script’s source in the same way, though the URL is explicit.[65]
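The following rough model of what the browser does on a GET submission may help. It assumes the form page was loaded from http://localhost/getfile.html and that the form’s action is the relative URL cgi-bin/getfile.py. The browser resolves the action against the page’s own address and appends the form’s fields, urlencoded, after a ?. Real browsers follow the HTML form-submission rules, and this sketch only approximates them.

```python
from urllib.parse import urlencode, urljoin

page_url = 'http://localhost/getfile.html'      # assumed address of the form page
action = 'cgi-bin/getfile.py'                    # assumed relative form action
fields = {'filename': 'cgi-bin/languages.py'}   # what the user typed

# Resolve the action relative to the page, then append the encoded fields
submit_url = urljoin(page_url, action) + '?' + urlencode(fields)
print(submit_url)   # http://localhost/cgi-bin/getfile.py?filename=cgi-bin%2Flanguages.py
```

The typed forward slash travels as %2F, which is why the response page’s address shows a different separator than a hand-typed backslash URL.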

Handling private files and errors

As long as CGI scripts have permission to open the desired server-side file, this script can be used to view and locally save any file on the server. For instance, Figure 15-30 shows the page we’re served after asking for the file path PyMailCgi/pymailcgi.html, an HTML text file in another application’s subdirectory, nested within the parent directory of this script (we explore PyMailCGI in the next chapter). Users can specify both relative and absolute paths to reach a file; any path syntax the server understands will do.

Figure 15-30. Viewing files with relative paths

More generally, this script will display any file path for which the username under which the CGI script runs has read access. On some servers, this is often the user “nobody,” a predefined username with limited permissions. Just about every server-side file used in web applications will be accessible, though, or else they couldn’t be referenced from browsers in the first place. When running our local web server, every file our own user account can read can be inspected: C:\Users\mark\Stuff\Websites\public_html\index.html works fine when entered in the form of Figure 15-29 on my laptop, for example.

That makes for a flexible tool, but it’s also potentially dangerous if you are running a server on a remote machine. What if we don’t want users to be able to view some files on the server? For example, in the next chapter, we will implement an encryption module for email account passwords. On our server, it is in fact addressable as PyMailCgi/cgi-bin/secret.py. Allowing users to view that module’s source code would make encrypted passwords shipped over the Net much more vulnerable to cracking.

To minimize this potential, the getfile script keeps a list, privates, of restricted filenames, and uses the os.path.samefile library call to check whether a requested filename path points to one of the names on privates. The samefile call checks to see whether the os.stat call returns the same identifying information (device and inode numbers) for both file paths. As a result, pathnames that look different syntactically but reference the same file are treated as identical. For example, on the server used for this book’s second edition, the following paths to the encryptor module were different strings, but yielded a true result from os.path.samefile:

../PyMailCgi/secret.py
/home/crew/lutz/public_html/PyMailCgi/secret.py
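You can watch the device and inode test in action. The following self-contained sketch creates a throwaway file in a temporary directory. The cgi-bin and secret.py names are only for flavor. It then compares two different spellings of the file’s path: first as strings, then with os.path.samefile, and finally by comparing the st_dev and st_ino fields of two os.stat results by hand. It assumes a Python and platform where os.path.samefile is available.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    subdir = os.path.join(root, 'cgi-bin')
    os.mkdir(subdir)
    target = os.path.join(subdir, 'secret.py')
    with open(target, 'w') as f:
        f.write('# placeholder\n')

    # Two spellings of the same file: one direct, one detouring through '..'
    direct = target
    roundabout = os.path.join(subdir, '..', 'cgi-bin', 'secret.py')

    same_spelling = (direct == roundabout)                     # plain string test
    same_via_samefile = os.path.samefile(direct, roundabout)   # os.stat based
    s1, s2 = os.stat(direct), os.stat(roundabout)
    same_by_stat = (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)

print(same_spelling, same_via_samefile, same_by_stat)   # False True True
```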

Unfortunately, in Python 3.1 and earlier, the os.path.samefile call is supported on Unix, Linux, and Macs, but not on Windows (Python 3.2 added Windows support). To emulate its behavior on Windows, we expand file paths to be absolute, convert them to a common case, and compare (I shortened paths in the following with ... for display here):

>>> import os
>>> os.path.samefile
AttributeError: 'module' object has no attribute 'samefile'
>>> os.getcwd()
'C:\\...\\PP4E\\dev\\Examples\\PP4E\\Internet\\Web'

>>> x = os.path.abspath('../Web/PYMailCgi/cgi-bin/secret.py').lower()
>>> y = os.path.abspath('PyMailCgi/cgi-bin/secret.py').lower()
>>> z = os.path.abspath('./PYMailCGI/cgi-bin/../cgi-bin/SECRET.py').lower()
>>> x
'c:\\...\\pp4e\\dev\\examples\\pp4e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'
>>> y
'c:\\...\\pp4e\\dev\\examples\\pp4e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'
>>> z
'c:\\...\\pp4e\\dev\\examples\\pp4e\\internet\\web\\pymailcgi\\cgi-bin\\secret.py'

>>> x == y, y == z
(True, True)
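As an alternative to calling .lower() directly, os.path.normcase folds case only where the platform’s filesystem calls for it. On Windows it lowercases the path and converts forward slashes to backslashes, and on POSIX systems it returns the path unchanged. The hypothetical approx_samefile below swaps normcase in for the script’s fallback. Like the original, it compares strings only, so it neither follows symbolic links nor needs the files to exist.

```python
import os

def approx_samefile(path1, path2):
    """String-based stand-in for os.path.samefile (hypothetical helper)."""
    def canon(path):
        # abspath joins relative paths to the current directory and collapses
        # '.' and '..' segments; normcase folds case only on Windows
        return os.path.normcase(os.path.abspath(path))
    return canon(path1) == canon(path2)

print(approx_samefile('PyMailCgi/cgi-bin/secret.py',
                      './PyMailCgi/cgi-bin/../cgi-bin/secret.py'))   # True
print(approx_samefile('PyMailCgi/cgi-bin/secret.py',
                      'PyMailCgi/pymailcgi.html'))                    # False
```

On a case-sensitive POSIX filesystem, this version does not treat secret.py and SECRET.py as the same file, which matches that filesystem’s behavior. The script’s unconditional .lower() is more conservative for a block list, because it errs toward denying access. Where symbolic links matter, os.path.realpath could replace abspath.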

Accessing any of the three paths expanded here generates an error page like that in Figure 15-31. Notice how the names of secret files are global data in this module, on the assumption that they pertain to files viewable across an entire site; though we could allow for customization per site, changing the script’s globals per site is likely just as convenient as changing per-site customization files.

Also notice that bona fide file errors are handled differently. Permission problems and attempts to access nonexistent files, for example, are trapped by a different exception handler clause, and they display the exception’s message (fetched using Python’s sys.exc_info) to give additional context. Figure 15-32 shows one such error page.
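The two-tier error handling can be sketched on its own. The hypothetical load_text function below mirrors the script’s structure but signals a restricted file by raising a small custom exception, AccessDenied, a name invented here, instead of using assert. Python skips assert statements entirely when run with the -O switch, which would silently disable the privacy check. Genuine file-system errors land in a separate OSError clause, whose message comes from sys.exc_info as in the script. For brevity, the privates test here is a plain string match, not the samefile comparison the script uses.

```python
import sys

class AccessDenied(Exception):
    """Raised for names on the private list (hypothetical exception)."""

def load_text(filename, privates=()):
    try:
        if filename in privates:          # simplified check, no samefile
            raise AccessDenied(filename)
        with open(filename) as f:
            return f.read()
    except AccessDenied:
        return '(File access denied)'
    except OSError:                       # missing file, permissions, and so on
        return '(Error opening file: %s)' % sys.exc_info()[1]

print(load_text('secret.py', privates=('secret.py',)))   # (File access denied)
print(load_text('no-such-file.txt'))   # (Error opening file: [Errno 2] ...)
```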

Figure 15-31. Accessing private files

Figure 15-32. File errors display

As a general rule of thumb, file-processing exceptions should always be reported in detail, especially during script debugging. If we catch such exceptions in our scripts, it’s up to us to display the details (assigning sys.stderr to sys.stdout won’t help if Python doesn’t print an error message). While an exception is being handled, its type, value, and traceback objects are available from sys.exc_info for manual display.
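For instance, the standard traceback module can format what sys.exc_info returns into the same text Python would normally print. The sketch below uses a hypothetical error_report helper that turns the exception currently being handled into escaped, preformatted HTML suitable for a CGI reply page. Escaping matters here because tracebacks can contain text such as <module>.

```python
import html
import sys
import traceback

def error_report():
    """Format the exception being handled as escaped HTML (hypothetical helper)."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    text = ''.join(traceback.format_exception(exc_type, exc_value, exc_tb))
    return '<pre>%s</pre>' % html.escape(text, quote=False)

try:
    open('no-such-file.txt')
except OSError:
    report = error_report()    # must be called while the exception is handled

print(report)
```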

Warning

Do not install the getfile.py script if you truly wish to keep your files private! The private files list check it uses attempts to prevent the encryption module from being viewed directly with this script, but it may or may not handle all possible attempts, especially on Windows. This book isn’t about security, so we won’t go into further details here, except to say that on the Internet, a little paranoia is often a good thing. Especially for systems installed on the general Internet at large, you should generally assume that the worst-case scenario might eventually happen.
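In that spirit, one common hardening step, not part of this book’s script, is to invert the check. Rather than blocking a list of known private files, the script would serve only files that resolve to a location inside a single designated directory. The hypothetical is_within_root below sketches that test with os.path.realpath, which also resolves symbolic links, and os.path.commonpath. It illustrates the idea under those assumptions and is not a complete security solution.

```python
import os
import tempfile

def is_within_root(root, requested):
    """True if requested, taken relative to root, resolves inside root."""
    root_real = os.path.realpath(root)
    # An absolute 'requested' makes join discard root; it is still checked below
    target = os.path.realpath(os.path.join(root_real, requested))
    try:
        return os.path.commonpath([root_real, target]) == root_real
    except ValueError:          # for example, paths on different Windows drives
        return False

with tempfile.TemporaryDirectory() as docroot:
    inside = is_within_root(docroot, 'PyMailCgi/pymailcgi.html')
    outside = is_within_root(docroot, '../outside.txt')

print(inside, outside)   # True False
```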
