Programming Python

Authors: Mark Lutz

class MailSenderAuthConsole(MailSenderAuth):
    def askSmtpPassword(self):
        import getpass
        prompt = 'Password for %s on %s?' % (self.smtpUser, self.smtpServerName)
        return getpass.getpass(prompt)

class SilentMailSender(SilentMailTool, MailSender):
    pass   # replaces trace

MailFetcher Class

The class defined in Example 13-24 does the work of interfacing with
a POP email server: loading, deleting, and synchronizing. This class
merits a few additional words of explanation.

General usage

This module deals strictly in email text; parsing email after it
has been fetched is delegated to a different module in the package.
Moreover, this module doesn’t cache already loaded information;
clients must add their own mail-retention tools if desired. Clients
must also provide a password input method or pass a password in if
they cannot use the console input subclass here (e.g., GUIs and
web-based programs).

The loading and deleting tasks use the standard library poplib
module in ways we saw earlier in this chapter, but notice that there
are interfaces for fetching just message header text with the TOP
action in POP if the mail server supports it. This can save
substantial time if clients need to fetch only basic details for an
email index. In addition, the header and full-text fetchers are
equipped to load just mails newer than a particular number (useful
once an initial load is run), and to restrict fetches to a fixed-size
set of the most recently arrived emails (useful for large inboxes
with slow Internet access or servers).
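
The interplay of the two options above can be sketched as a small
pure function. This is only an illustration: plan_fetches is a
hypothetical helper, not part of the package, and it assumes the same
skip test that Example 13-24 applies (msgnum <= msgCount - fetchlimit).

```python
def plan_fetches(msg_count, loadfrom=1, fetchlimit=None):
    """Split message numbers loadfrom..msg_count into (skipped, fetched)."""
    skipped, fetched = [], []
    for msgnum in range(loadfrom, msg_count + 1):     # POP numbers start at 1
        if fetchlimit and msgnum <= msg_count - fetchlimit:
            skipped.append(msgnum)                    # would get a dummy entry
        else:
            fetched.append(msgnum)                    # would be fetched for real
    return skipped, fetched

# initial load of a 10-message inbox, limited to the 3 newest mails
print(plan_fetches(10, loadfrom=1, fetchlimit=3))

# later load of newly arrived mails only (numbers 9 and 10)
print(plan_fetches(10, loadfrom=9, fetchlimit=3))
```

Skipped numbers still occupy a slot in the result, which is why the
fetchers in the example return dummy text for them: list positions
keep matching POP message numbers.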

This module also supports the notion of progress indicators—for
methods that perform multiple downloads or deletions, callers may pass
in a function that will be called as each mail is processed. This
function will receive the current and total step numbers. It’s left up
to the caller to render this in a GUI, console, or other user
interface.
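
As a rough sketch of this callback protocol, the following shows a
worker that reports (count, total) for each step and a console
renderer a caller might pass in. The names process_all and
console_progress are assumptions made for illustration; they do not
appear in the package.

```python
def process_all(items, action, progress=None):
    """Run action on each item, reporting progress if a callback is given."""
    total = len(items)
    for count, item in enumerate(items, start=1):
        if progress:
            progress(count, total)        # current and total step numbers
        action(item)

def console_progress(count, total):
    """One possible renderer; a GUI might update a progress bar instead."""
    print('processing %d of %d' % (count, total))

process_all(['mail 1', 'mail 2', 'mail 3'], action=print,
            progress=console_progress)
```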

Unicode decoding for full mail text on fetches

Additionally, this module is where we apply the session-wide message
bytes Unicode decoding policy required for parsing, as discussed
earlier in this chapter. This decoding uses an encoding name user
setting in the mailconfig module, followed by heuristics. Because
this decoding is performed immediately when a mail is fetched, all
clients of this package can assume message text is str Unicode
strings, including in any later parsing, display, or save operations.
In addition to the mailconfig setting, we also apply a few guesses
with common encoding types, though this may lead to problems if mails
decoded by guessing cannot be written to mail save files using the
mailconfig setting.

As described, this session-wide approach to encodings is not ideal,
but it can be adjusted per client session and reflects the current
limitations of email in Python 3.1: its parser requires already
decoded Unicode strings, but fetches return bytes. If this decoding
fails, as a last resort we attempt to decode headers only, as either
ASCII (or other common format) text or the platform default, and
insert an error message in the email body. This heuristic attempts to
avoid killing clients with exceptions if possible (see file
_test-decoding.py in the examples package for a test of this logic).
In practice, an 8-bit encoding such as Latin-1, which is a superset
of ASCII, will probably suffice in most cases, because ASCII was the
original requirement of email standards.
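
The try-in-order policy can be sketched in a few lines. This is a
simplified illustration rather than the package's method: decode_lines
and its preferred argument are hypothetical stand-ins for the
mailconfig setting, and the headers-only last resort is left out.

```python
import sys

def decode_lines(lines, preferred='utf8'):
    """Decode a list of bytes lines with the first encoding that works.

    Returns (decoded lines, encoding name), or (None, None) on failure.
    """
    kinds = [preferred, 'ascii', 'latin1', 'utf8', sys.getdefaultencoding()]
    for kind in kinds:
        try:
            return [line.decode(kind) for line in lines], kind
        except (UnicodeError, LookupError):    # LookupError: unknown name
            continue
    return None, None

raw = [b'Subject: caf\xe9', b'', b'body text']   # Latin-1 bytes, not UTF-8
text, used = decode_lines(raw, preferred='utf8')
print(used, text[0])
```

Note that Latin-1 assigns a character to every byte value, so in this
sketch any candidate listed after it is never reached; a successful
decode also does not prove that the guess matched the sender's intent.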

In principle, we could try to search for encoding information in
message headers if it’s present, by parsing mails partially ourselves.
We might then take a per-message instead of per-session approach to
decoding full text, and associate an encoding type with each mail for
later processing such as saves, though this raises further
complications, as a save file can have just one (compatible) encoding,
not one per message. Moreover, character sets in email headers may
refer to individual components, not the entire email’s text. Since
most mails will conform to 7- or 8-bit standards, and since a future
email release will likely address this issue, extra complexity is
probably not warranted for this case in this book.

Also keep in mind that the Unicode decoding performed here is
for the entire mail text fetched from a server. Really, this is just
one part of the email encoding story in the Unicode-aware world of
today. In addition:

  • Payloads of parsed message parts may still be returned as
    bytes and require special handling or further Unicode decoding
    (see the parser module ahead).

  • Text parts and attachments in composed mails impose encoding
    choices as well (see the sender module earlier).

  • Message headers have their own encoding conventions, and may
    be both MIME and Unicode encoded if internationalized (see both
    the parser and sender modules).
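
To give a feel for the last point, here is a minimal sketch of
decoding an internationalized header with the standard library’s
email.header module. The helper name decode_header_text is an
assumption for illustration, and this is not necessarily how the
package’s parser module handles headers.

```python
from email.header import decode_header, make_header

def decode_header_text(raw):
    """Turn a possibly MIME-encoded header value into a plain str."""
    parts = decode_header(raw)          # list of (text or bytes, charset)
    return str(make_header(parts))      # apply each part's charset

# a Subject value that is both MIME (quoted-printable) and UTF-8 encoded
print(decode_header_text('=?utf-8?q?caf=C3=A9?='))

# unencoded header text passes through unchanged
print(decode_header_text('Plain subject'))
```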

Inbox synchronization tools

When you start studying this example, you’ll also notice that
Example 13-24 devotes substantial code to detecting synchronization
errors between an email list held by a client and the current state
of the inbox at the POP email server. Normally, POP assigns relative
message numbers to email in the inbox, and only adds newly arrived
emails to the end of the inbox. As a result, relative message numbers
from an earlier fetch may usually be used to delete and fetch in the
future.

However, although rare, it is not impossible for the server’s
inbox to change in ways that invalidate previously fetched message
numbers. For instance, emails may be deleted in another client, and
the server itself may move mails from the inbox to an undeliverable
state on download errors (this may vary per ISP). In both cases, email
may be removed from the middle of the inbox, throwing some prior
relative message numbers out of sync with the server.

This situation can result in fetching the wrong message in an
email client—users receive a different message than the one they
thought they had selected. Worse, this can make deletions
inaccurate—if a mail client uses a relative message number in a delete
request, the wrong mail may be deleted if the inbox has changed since
the index was fetched.
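
The failure mode is easy to reproduce with a plain list standing in
for the server’s inbox. This is a toy model only; message_at is a
hypothetical helper and no POP server is involved.

```python
def message_at(inbox, msgnum):
    """Return the mail a server would address as msgnum (numbers start at 1)."""
    return inbox[msgnum - 1]

inbox = ['mail A', 'mail B', 'mail C', 'mail D']   # server-side inbox
selected = 3                                       # client index: 3 is 'mail C'
before = message_at(inbox, selected)

del inbox[1]                                       # 'mail B' deleted elsewhere
after = message_at(inbox, selected)                # 3 now names another mail

print(before, '->', after)
```

A fetch or delete issued with the stale number 3 would now act on
'mail D', which is exactly the mismatch the synchronization tools are
meant to catch.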

To assist clients, Example 13-24 includes tools that match message
headers on deletions to ensure accuracy and perform general inbox
synchronization tests on demand. These tools are useful only to
clients that retain the fetched email list as state information.
We’ll use these in the PyMailGUI client in Chapter 14. There,
deletions use the safe interface, and loads run the on-demand
synchronization test; on detection of synchronization errors, the
inbox index is automatically reloaded. For now, see the source code
and comments of Example 13-24 for more details.

Note that the synchronization tests try a variety of matching
techniques, but require the complete headers text and, in the worst
case, must parse headers and match many header fields. In many cases,
the single previously fetched message-id header field would be
sufficient for matching against messages in the server’s inbox.
However, because this field is optional and can be forged to have any
value, it might not always be a reliable way to identify messages. In
other words, a same-valued message-id may not suffice to guarantee a
match, although it can be used to identify a mismatch; in
Example 13-24, the message-id is used to rule out a match if either
message has one, and they differ in value. This test is performed
before falling back on slower parsing and multiple header matches.
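
The rule-out logic for message-id can be isolated in a short sketch.
It uses the standard library’s email parser rather than the package’s
own classes, and ids_rule_out_match is a hypothetical name; the full
matching method in the example applies several other tests as well.

```python
from email import message_from_string

def ids_rule_out_match(hdrtext1, hdrtext2):
    """True if message-id headers prove the two header texts differ.

    A False result proves nothing: ids may be absent, or equal but forged.
    """
    id1 = message_from_string(hdrtext1).get('Message-ID')   # None if absent
    id2 = message_from_string(hdrtext2).get('Message-ID')
    return (id1 is not None or id2 is not None) and id1 != id2

first  = 'Message-ID: <1@example.com>\nSubject: hi\n\n'
second = 'Message-ID: <2@example.com>\nSubject: hi\n\n'
noid   = 'Subject: hi\n\n'

print(ids_rule_out_match(first, second))   # different ids
print(ids_rule_out_match(first, first))    # same ids: not ruled out
print(ids_rule_out_match(noid, noid))      # no ids: not ruled out
```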

Example 13-24. PP4E\Internet\Email\mailtools\mailFetcher.py

"""
###############################################################################
retrieve, delete, match mail from a POP server (see __init__ for docs, test)
###############################################################################
"""

import poplib, mailconfig, sys               # client's mailconfig on sys.path
print('user:', mailconfig.popusername)       # script dir, pythonpath, changes

from .mailParser import MailParser                  # for headers matching (4E: .)
from .mailTool   import MailTool, SilentMailTool    # trace control supers (4E: .)

# index/server msgnum out of synch tests
class DeleteSynchError(Exception): pass             # msg out of synch in del
class TopNotSupported(Exception): pass              # can't run synch test
class MessageSynchError(Exception): pass            # index list out of sync

class MailFetcher(MailTool):
    """
    fetch mail: connect, fetch headers+mails, delete mails
    works on any machine with Python+Inet; subclass me to cache
    implemented with the POP protocol; IMAP requires new class;
    4E: handles decoding of full mail text on fetch for parser;
    """
    def __init__(self, popserver=None, popuser=None, poppswd=None, hastop=True):
        self.popServer   = popserver or mailconfig.popservername
        self.popUser     = popuser   or mailconfig.popusername
        self.srvrHasTop  = hastop
        self.popPassword = poppswd                  # ask later if None

    def connect(self):
        self.trace('Connecting...')
        self.getPassword()                          # file, GUI, or console
        server = poplib.POP3(self.popServer)
        server.user(self.popUser)                   # connect,login POP server
        server.pass_(self.popPassword)              # pass is a reserved word
        self.trace(server.getwelcome())             # print returned greeting
        return server

    # use setting in client's mailconfig on import search path;
    # to tailor, this can be changed in class or per instance;
    fetchEncoding = mailconfig.fetchEncoding

    def decodeFullText(self, messageBytes):
        """
        4E, Py3.1: decode full fetched mail text bytes to str Unicode string;
        done at fetch, for later display or parsing (full mail text is always
        Unicode thereafter); decode with per-class or per-instance setting, or
        common types; could also try headers inspection, or intelligent guess
        from structure; in Python 3.2/3.3, this step may not be required: if so,
        change to return message line list intact; for more details see Chapter 13;
        an 8-bit encoding such as latin-1 will likely suffice for most emails, as
        ASCII is the original standard; this method applies to entire/full message
        text, which is really just one part of the email encoding story: Message
        payloads and Message headers may also be encoded per email, MIME, and
        Unicode standards; see Chapter 13 and mailParser and mailSender for more;
        """
        text = None
        kinds  = [self.fetchEncoding]               # try user setting first
        kinds += ['ascii', 'latin1', 'utf8']        # then try common types
        kinds += [sys.getdefaultencoding()]         # and platform dflt (may differ)
        for kind in kinds:                          # may cause mail saves to fail
            try:
                text = [line.decode(kind) for line in messageBytes]
                break
            except (UnicodeError, LookupError):     # LookupError: bad name
                pass

        if text is None:
            # try returning headers + error msg, else except may kill client;
            # still try to decode headers per ascii, other, platform default;
            blankline = messageBytes.index(b'')
            hdrsonly  = messageBytes[:blankline]
            commons   = ['ascii', 'latin1', 'utf8']
            for common in commons:
                try:
                    text = [line.decode(common) for line in hdrsonly]
                    break
                except UnicodeError:
                    pass
            else:                                                  # none worked
                try:
                    text = [line.decode() for line in hdrsonly]    # platform dflt?
                except UnicodeError:
                    text = ['From: (sender of unknown Unicode format headers)']
            text += ['', '--Sorry: mailtools cannot decode this mail content!--']
        return text

    def downloadMessage(self, msgnum):
        """
        load full raw text of one mail msg, given its
        POP relative msgnum; caller must parse content
        """
        self.trace('load ' + str(msgnum))
        server = self.connect()
        try:
            resp, msglines, respsz = server.retr(msgnum)
        finally:
            server.quit()
        msglines = self.decodeFullText(msglines)    # raw bytes to Unicode str
        return '\n'.join(msglines)                  # concat lines for parsing

    def downloadAllHeaders(self, progress=None, loadfrom=1):
        """
        get sizes, raw header text only, for all or new msgs
        begins loading headers from message number loadfrom
        use loadfrom to load newly arrived mails only
        use downloadMessage to get a full msg text later
        progress is a function called with (count, total);
        returns: [headers text], [mail sizes], loadedfull?
        4E: add mailconfig.fetchlimit to support large email
        inboxes: if not None, only fetches that many headers,
        and returns others as dummy/empty mail; else inboxes
        like one of mine (4K emails) are not practical to use;
        4E: pass loadfrom along to downloadAllMessages (a buglet);
        """
        if not self.srvrHasTop:                     # not all servers support TOP
            # naively load full msg text
            return self.downloadAllMessages(progress, loadfrom)
        else:
            self.trace('loading headers')
            fetchlimit = mailconfig.fetchlimit
            server = self.connect()                 # mbox now locked until quit
            try:
                resp, msginfos, respsz = server.list()      # 'num size' lines list
                msgCount = len(msginfos)                    # alt to srvr.stat[0]
                msginfos = msginfos[loadfrom-1:]            # drop already loadeds
                allsizes = [int(x.split()[1]) for x in msginfos]
                allhdrs  = []
                for msgnum in range(loadfrom, msgCount+1):          # poss empty
                    if progress: progress(msgnum, msgCount)         # run callback
                    if fetchlimit and (msgnum <= msgCount - fetchlimit):
                        # skip, add dummy hdrs
                        hdrtext = 'Subject: --mail skipped--\n\n'
                        allhdrs.append(hdrtext)
                    else:
                        # fetch, retr hdrs only
                        resp, hdrlines, respsz = server.top(msgnum, 0)
                        hdrlines = self.decodeFullText(hdrlines)
                        allhdrs.append('\n'.join(hdrlines))
            finally:
                server.quit()                       # make sure unlock mbox
            assert len(allhdrs) == len(allsizes)
            self.trace('load headers exit')
            return allhdrs, allsizes, False

    def downloadAllMessages(self, progress=None, loadfrom=1):
        """
        load full message text for all msgs from loadfrom..N,
        despite any caching that may be being done in the caller;
        much slower than downloadAllHeaders, if just need hdrs;
        4E: support mailconfig.fetchlimit: see downloadAllHeaders;
        could use server.list() to get sizes of skipped emails here
        too, but clients probably don't care about these anyhow;
        """
        self.trace('loading full messages')
        fetchlimit = mailconfig.fetchlimit
        server = self.connect()
        try:
            (msgCount, msgBytes) = server.stat()            # inbox on server
            allmsgs  = []
            allsizes = []
            for i in range(loadfrom, msgCount+1):           # empty if low >= high
                if progress: progress(i, msgCount)
                if fetchlimit and (i <= msgCount - fetchlimit):
                    # skip, add dummy mail
                    mailtext = 'Subject: --mail skipped--\n\nMail skipped.\n'
                    allmsgs.append(mailtext)
                    allsizes.append(len(mailtext))
                else:
                    # fetch, retr full mail
                    (resp, message, respsz) = server.retr(i)    # save text on list
                    message = self.decodeFullText(message)
                    allmsgs.append('\n'.join(message))          # leave mail on server
                    allsizes.append(respsz)                     # diff from len(msg)
        finally:
            server.quit()                                       # unlock the mail box
        assert len(allmsgs) == (msgCount - loadfrom) + 1        # msg nums start at 1
        #assert sum(allsizes) == msgBytes                       # not if loadfrom > 1
        return allmsgs, allsizes, True                          # not if fetchlimit

    def deleteMessages(self, msgnums, progress=None):
        """
        delete multiple msgs off server; assumes email inbox
        unchanged since msgnums were last determined/loaded;
        use if msg headers not available as state information;
        fast, but poss dangerous: see deleteMessagesSafely
        """
        self.trace('deleting mails')
        server = self.connect()
        try:
            for (ix, msgnum) in enumerate(msgnums):     # don't reconnect for each
                if progress: progress(ix+1, len(msgnums))
                server.dele(msgnum)
        finally:                                        # changes msgnums: reload
            server.quit()

    def deleteMessagesSafely(self, msgnums, synchHeaders, progress=None):
        """
        delete multiple msgs off server, but use TOP fetches to
        check for a match on each msg's header part before deleting;
        assumes the email server supports the TOP interface of POP,
        else raises TopNotSupported - client may call deleteMessages;
        use if the mail server might change the inbox since the email
        index was last fetched, thereby changing POP relative message
        numbers; this can happen if email is deleted in a different
        client; some ISPs may also move a mail from inbox to the
        undeliverable box in response to a failed download;
        synchHeaders must be a list of already loaded mail hdrs text,
        corresponding to selected msgnums (requires state); raises
        exception if any out of synch with the email server; inbox is
        locked until quit, so it should not change between TOP check
        and actual delete: synch check must occur here, not in caller;
        may be enough to call checkSynchError+deleteMessages, but check
        each msg here in case deletes and inserts in middle of inbox;
        """
        if not self.srvrHasTop:
            raise TopNotSupported('Safe delete cancelled')

        self.trace('deleting mails safely')
        errmsg  = 'Message %s out of synch with server.\n'
        errmsg += 'Delete terminated at this message.\n'
        errmsg += 'Mail client may require restart or reload.'

        server = self.connect()                         # locks inbox till quit
        try:                                            # don't reconnect for each
            (msgCount, msgBytes) = server.stat()        # inbox size on server
            for (ix, msgnum) in enumerate(msgnums):
                if progress: progress(ix+1, len(msgnums))
                if msgnum > msgCount:                               # msgs deleted
                    raise DeleteSynchError(errmsg % msgnum)
                resp, hdrlines, respsz = server.top(msgnum, 0)      # hdrs only
                hdrlines = self.decodeFullText(hdrlines)
                msghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(msghdrs, synchHeaders[msgnum-1]):
                    raise DeleteSynchError(errmsg % msgnum)
                else:
                    server.dele(msgnum)                 # safe to delete this msg
        finally:                                        # changes msgnums: reload
            server.quit()                               # unlock inbox on way out

    def checkSynchError(self, synchHeaders):
        """
        check to see if already loaded hdrs text in synchHeaders
        list matches what is on the server, using the TOP command in
        POP to fetch headers text; use if inbox can change due to
        deletes in other client, or automatic action by email server;
        raises except if out of synch, or error while talking to server;
        for speed, only checks last in last: this catches inbox deletes,
        but assumes server won't insert before last (true for incoming
        mails); check inbox size first: smaller if just deletes; else
        top will differ if deletes and newly arrived messages added at
        end; result valid only when run: inbox may change after return;
        """
        self.trace('synch check')
        errormsg  = 'Message index out of synch with mail server.\n'
        errormsg += 'Mail client may require restart or reload.'
        server = self.connect()
        try:
            lastmsgnum = len(synchHeaders)                      # 1..N
            (msgCount, msgBytes) = server.stat()                # inbox size
            if lastmsgnum > msgCount:                           # fewer now?
                raise MessageSynchError(errormsg)               # none to cmp
            if self.srvrHasTop:
                resp, hdrlines, respsz = server.top(lastmsgnum, 0)  # hdrs only
                hdrlines = self.decodeFullText(hdrlines)
                lastmsghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(lastmsghdrs, synchHeaders[-1]):
                    raise MessageSynchError(errormsg)
        finally:
            server.quit()

    def headersMatch(self, hdrtext1, hdrtext2):
        """
        may not be as simple as a string compare: some servers add
        a "Status:" header that changes over time; on one ISP, it
        begins as "Status: U" (unread), and changes to "Status: RO"
        (read, old) after fetched once - throws off synch tests if
        new when index fetched, but have been fetched once before
        delete or last-message check; "Message-id:" line is unique
        per message in theory, but optional, and can be anything if
        forged; match more common: try first; parsing costly: try last
        """
        # try match by simple string compare
        if hdrtext1 == hdrtext2:
            self.trace('Same headers text')
            return True

        # try match without status lines
        split1 = hdrtext1.splitlines()       # s.split('\n'), but no final ''
        split2 = hdrtext2.splitlines()
        strip1 = [line for line in split1 if not line.startswith('Status:')]
        strip2 = [line for line in split2 if not line.startswith('Status:')]
        if strip1 == strip2:
            self.trace('Same without Status')
            return True

        # try mismatch by message-id headers if either has one
        msgid1 = [line for line in split1 if line[:11].lower() == 'message-id:']
        msgid2 = [line for line in split2 if line[:11].lower() == 'message-id:']
        if (msgid1 or msgid2) and (msgid1 != msgid2):
            self.trace('Different Message-Id')
            return False

        # try full hdr parse and common headers if msgid missing or trash
        tryheaders  = ('From', 'To', 'Subject', 'Date')
        tryheaders += ('Cc', 'Return-Path', 'Received')
        msg1 = MailParser().parseHeaders(hdrtext1)
        msg2 = MailParser().parseHeaders(hdrtext2)
        for hdr in tryheaders:                          # poss multiple Received
            if msg1.get_all(hdr) != msg2.get_all(hdr):  # case insens, dflt None
                self.trace('Diff common headers')
                return False

        # all common hdrs match and don't have a diff message-id
        self.trace('Same common headers')
        return True

    def getPassword(self):
        """
        get POP password if not yet known
        not required until go to server
        from client-side file or subclass method
        """
        if not self.popPassword:
            try:
                localfile = open(mailconfig.poppasswdfile)
                self.popPassword = localfile.readline()[:-1]
                self.trace('local file password' + repr(self.popPassword))
            except Exception:                           # no file or setting: ask
                self.popPassword = self.askPopPassword()

    def askPopPassword(self):
        assert False, 'Subclass must define method'

################################################################################
# specialized subclasses
################################################################################
