[Zope] Email address validator?

Barry A. Warsaw barry@zope.com
Sat, 24 Nov 2001 12:49:34 -0500


>>>>> "BE" == Bruce Eckel <Bruce@EckelObjects.com> writes:

    BE> Worked like a charm! It turns out I was handing it an invalid
    BE> "from" email address.

    BE> Which brings up another question -- is there a piece of code
    BE> that will validate email addresses somewhere? Seems like it
    BE> would be a common tool (I'll bet the spammers have it!).

Do you mean you want to find out if a string looks like a valid email
address, or that it actually successfully delivers to some end
destination?  The former is doable, the latter is exceedingly
difficult!

Mailman has the following bit of code, which accomplishes the goal set
out in its docstring <wink>:

-------------------- snip snip --------------------
import re

def ParseEmail(email):
    user = None
    domain = None
    email = email.lower()
    at_sign = email.find('@')
    if at_sign < 1:
        return email, None
    user = email[:at_sign]
    rest = email[at_sign+1:]
    domain = rest.split('.')
    return user, domain

_badchars = re.compile('[][()<>|;^,/]')

def ValidateEmail(s):
    """Verify that an email address isn't grossly evil."""
    # Pretty minimal, cheesy check.  We could do better...
    if not s:
        raise Errors.MMBadEmailError
    if _badchars.search(s) or s[0] == '-':
        raise Errors.MMHostileAddress
    user, domain_parts = ParseEmail(s)
    # This means local, unqualified addresses are not allowed
    if not domain_parts:
        raise Errors.MMBadEmailError
    if len(domain_parts) < 2:
        raise Errors.MMBadEmailError
-------------------- snip snip --------------------

It seems to be fairly successful in keeping out really bad addresses.
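
If you want to try the same checks outside of Mailman, here is a
minimal standalone sketch.  The exception classes and the
looks_like_email() helper are made-up stand-ins (Mailman raises its
own Errors.MMBadEmailError and Errors.MMHostileAddress), and the
character class is spelled with explicit escapes, which I believe
matches the same characters:

```python
import re

# Hypothetical stand-ins for Mailman's Errors.MMBadEmailError and
# Errors.MMHostileAddress, so that this sketch runs on its own.
class BadEmailError(Exception):
    pass

class HostileAddressError(Exception):
    pass

# Same characters as _badchars above, with the brackets escaped.
_badchars = re.compile(r'[\[\]()<>|;^,/]')

def validate_email(s):
    """Raise if s is empty, contains hostile characters, or lacks a
    user part and a domain with at least two dot-separated parts."""
    if not s:
        raise BadEmailError(s)
    if _badchars.search(s) or s[0] == '-':
        raise HostileAddressError(s)
    at_sign = s.find('@')
    if at_sign < 1:
        # No '@' at all, or nothing in front of it.
        raise BadEmailError(s)
    domain_parts = s[at_sign + 1:].lower().split('.')
    if len(domain_parts) < 2:
        # Local, unqualified addresses are not allowed.
        raise BadEmailError(s)

def looks_like_email(s):
    """Hypothetical convenience wrapper: True if validate_email passes."""
    try:
        validate_email(s)
    except (BadEmailError, HostileAddressError):
        return False
    return True
```

Something like looks_like_email("user@localhost") comes back false
because the domain has only one part.  As with the original, this is
a cheesy syntactic check and says nothing about deliverability.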

There are several things you can try to find out whether an email
address is undeliverable.  Whether a message actually reaches the
intended individual, and whether they actually read it, is a whole
'nutha story.

First, you'd need to look up the MX for the hostname part of the email
address.  Then you'd have to try each MX in the order of priority.
Connect to port 25 for each machine and issue an EHLO command.  See if
the remote server supports the VRFY command.  If so, you can try
sending the address in a VRFY and see what response you get back.  If
you get an error back, it's possible <wink> that the email address
isn't valid.
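
A rough sketch of that VRFY probe, using Python's smtplib, might look
like the following.  The standard library has no MX lookup, so this
assumes the caller has already fetched (preference, hostname) pairs
with some DNS library.  The function names and the
"accepted"/"rejected"/"unknown" verdicts are invented for
illustration, and plenty of servers answer VRFY with a noncommittal
252 regardless:

```python
import smtplib

def sort_mx(mx_records):
    """Order (preference, hostname) pairs lowest preference first."""
    return [host for _pref, host in sorted(mx_records)]

def interpret_vrfy(code):
    """Map an SMTP VRFY reply code to a rough verdict."""
    if code in (250, 251):
        return "accepted"
    if code in (550, 551, 553):
        return "rejected"
    # 252 (cannot verify), 502 (not implemented), and anything else
    # tell us very little.
    return "unknown"

def probe_address(address, mx_records, timeout=10):
    """Try VRFY against each MX in priority order.

    mx_records is assumed to come from a DNS lookup done elsewhere.
    """
    for host in sort_mx(mx_records):
        try:
            with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
                smtp.ehlo()
                if not smtp.has_extn("vrfy"):
                    # Server doesn't advertise VRFY; nothing to learn.
                    return "unknown"
                code, _msg = smtp.verify(address)
                return interpret_vrfy(code)
        except (OSError, smtplib.SMTPException):
            # This MX is unreachable or misbehaving; try the next one.
            continue
    return "unknown"
```

Note that a server may support VRFY without advertising it in its
EHLO response, so the has_extn() test here is a conservative choice.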

It's much more likely that VRFY isn't supported.  So now you're left
with attempting to deliver the message and waiting asynchronously for
a bounce, and there you've got roughly two choices.
You could route bounces to a mail robot that attempts to decipher the
bazillion or so bounce formats dreamed up by mailer authors (and
remember, one hacker's dream is another's nightmare), or you can
attempt to encode the recipient's address in the envelope sender
address of the original delivery.  This is called VERP (technically
VERP-like if your app, and not your mailer, does it), and can be
pretty successful in unambiguously discovering bouncing addresses,
but you need cooperation from your local mailer (the one receiving
the bounce messages).
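
To make the VERP idea concrete, here is a small sketch of one common
encoding: the recipient's "@" becomes "=" and the result is tacked
onto the bounce address after a "+".  The function names and the
example addresses are made up, and the exact format is an assumption;
your mailer has to deliver "+"-extended addresses to the bounce
handler for this to work at all:

```python
def verp_encode(bounce_address, recipient):
    """Fold the recipient into the envelope sender address.

    Assumes the local part of bounce_address contains no '+'.
    """
    local, host = bounce_address.rsplit("@", 1)
    r_local, r_domain = recipient.rsplit("@", 1)
    return "%s+%s=%s@%s" % (local, r_local, r_domain, host)

def verp_decode(envelope_to):
    """Recover the bouncing recipient from the address a bounce was
    sent to, or return None if it isn't VERP-encoded."""
    local, _sep, _host = envelope_to.rpartition("@")
    if "+" not in local:
        return None
    # Split on the first '+' so recipients containing '+' survive.
    _base, tag = local.split("+", 1)
    if "=" not in tag:
        return None
    # Domains can't contain '=', so the last '=' marks the old '@'.
    r_local, r_domain = tag.rsplit("=", 1)
    return "%s@%s" % (r_local, r_domain)
```

With this scheme, a message to bruce@example.com sent on behalf of
list-bounces@lists.example.org would carry the envelope sender
list-bounces+bruce=example.com@lists.example.org, and any bounce
arriving at that address names the failing recipient unambiguously.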

All the above only helps you to find obviously bogus email addresses,
and as you can guess there are tons of ways the above recipes can
miss invalid addresses or unattended mailboxes.  Trying to use headers
like Return-Receipt-To: and friends really doesn't help, since they're
just hints, not requirements (and who follows the specs anyway?
Judging by email standards, not many ;).

are-we-having-fun-yet?-ly y'rs,
-Barry