[Zope] Re: Non-caching version of POPMail ?

Chris McDonough chrism@digicool.com
Tue, 3 Apr 2001 19:25:45 -0400


Here's a fun regex for matching email addresses from that book:

#
# Program to build a regex to match an internet email address,
# from Chapter 7 of _Mastering Regular Expressions_ (Friedl / O'Reilly)
# (http://www.ora.com/catalog/regexp/)
#
# Optimized version.
#
# Copyright 1997 O'Reilly & Associates, Inc.
#



# Some things for avoiding backslashitis later on.
$esc        = '\\\\';               $Period      = '\.';
$space      = '\040';               $tab         = '\t';
$OpenBR     = '\[';                 $CloseBR     = '\]';
$OpenParen  = '\(';                 $CloseParen  = '\)';
$NonASCII   = '\x80-\xff';          $ctrl        = '\000-\037';
$CRlist     = '\n\015';  # note: this should really be only \015.

# Items 19, 20, 21
$qtext = qq/[^$esc$NonASCII$CRlist\"]/;               # for within "..."
$dtext = qq/[^$esc$NonASCII$CRlist$OpenBR$CloseBR]/;  # for within [...]
$quoted_pair = qq< $esc [^$NonASCII] >; # an escaped character

##############################################################################
# Items 22 and 23, comment.
# Impossible to do properly with a regex, I make do by allowing at most one
# level of nesting.
$ctext   = qq< [^$esc$NonASCII$CRlist()] >;

# $Cnested matches one non-nested comment.
# It is unrolled, with normal of $ctext, special of $quoted_pair.
$Cnested = qq<
   $OpenParen                            #  (
      $ctext*                            #     normal*
      (?: $quoted_pair $ctext* )*        #     (special normal*)*
   $CloseParen                           #                       )
>;

# $comment allows one level of nested parentheses
# It is unrolled, with normal of $ctext, special of ($quoted_pair|$Cnested)
$comment = qq<
   $OpenParen                              #  (
       $ctext*                             #     normal*
       (?:                                 #       (
          (?: $quoted_pair | $Cnested )    #         special
           $ctext*                         #         normal*
       )*                                  #            )*
   $CloseParen                             #                )
>;

##############################################################################

# $X is optional whitespace/comments.
$X = qq<
   [$space$tab]*                    # Nab whitespace.
   (?: $comment [$space$tab]* )*    # If comment found, allow more spaces.
>;



# Item 10: atom
$atom_char   = qq/[^($space)<>\@,;:\".$esc$OpenBR$CloseBR$ctrl$NonASCII]/;
$atom = qq<
  $atom_char+    # some number of atom characters...
  (?!$atom_char) # ..not followed by something that could be part of an atom
>;

# Item 11: doublequoted string, unrolled.
$quoted_str = qq<
    \"                                     # "
       $qtext *                            #   normal
       (?: $quoted_pair $qtext * )*        #   ( special normal* )*
    \"                                     #        "
>;

# Item 7: word is an atom or quoted string
$word = qq<
    (?:
       $atom                 # Atom
       |                       #  or
       $quoted_str           # Quoted string
     )
>;

# Item 12: domain-ref is just an atom
$domain_ref  = $atom;

# Item 13: domain-literal is like a quoted string, but [...] instead of "..."
$domain_lit  = qq<
    $OpenBR                            # [
    (?: $dtext | $quoted_pair )*     #    stuff
    $CloseBR                           #           ]
>;

# Item 9: sub-domain is a domain-ref or domain-literal
$sub_domain  = qq<
  (?:
    $domain_ref
    |
    $domain_lit
   )
   $X # optional trailing comments
>;

# Item 6: domain is a list of subdomains separated by dots.
$domain = qq<
     $sub_domain
     (?:
        $Period $X $sub_domain
     )*
>;

# Item 8: a route. A bunch of "@ $domain" separated by commas, followed by a
# colon.
$route = qq<
    \@ $X $domain
    (?: , $X \@ $X $domain )*  # additional domains
    :
    $X # optional trailing comments
>;

# Item 5: local-part is a bunch of $word separated by periods
$local_part = qq<
    $word $X
    (?:
        $Period $X $word $X # additional words
    )*
>;

# Item 2: addr-spec is local@domain
$addr_spec  = qq<
  $local_part \@ $X $domain
>;

# Item 4: route-addr is <route? addr-spec>
$route_addr = qq[
    < $X                 # <
       (?: $route )?     #       optional route
       $addr_spec        #       address spec
    >                    #                 >
];


# Item 3: phrase........
$phrase_ctrl = '\000-\010\012-\037'; # like ctrl, but without tab

# Like atom-char, but without listing space, and uses phrase_ctrl.
# Since the class is negated, this matches the same as atom-char plus space
# and tab
$phrase_char =
   qq/[^()<>\@,;:\".$esc$OpenBR$CloseBR$NonASCII$phrase_ctrl]/;

# We've worked it so that $word, $comment, and $quoted_str do not consume
# trailing $X because we take care of it manually.
$phrase = qq<
   $word                        # leading word
   $phrase_char *               # "normal" atoms and/or spaces
   (?:
      (?: $comment | $quoted_str ) # "special" comment or quoted string
      $phrase_char *            #  more "normal"
   )*
>;

## Item #1: mailbox is an addr_spec or a phrase/route_addr
$mailbox = qq<
    $X                                  # optional leading comment
    (?:
            $addr_spec                  # address
            |                             #  or
            $phrase  $route_addr      # name and address
     )
>;



###########################################################################
# Here's a little snippet to test it.
# Addresses given on the commandline are described.
#

my $error = 0;
my $valid;
foreach my $address (@ARGV) {
    $valid = $address =~ m/^$mailbox$/xo;
    # Pass the address as an argument so a '%' in it is not read as a format.
    printf "`%s' is syntactically %s.\n",
           $address, $valid ? "valid" : "invalid";
    $error = 1 if not $valid;
}
exit $error;
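
Since POPMail itself is Python, here is a minimal sketch of how the "unrolled" shape used throughout the program, normal* (special normal*)*, might look with Python's re module. It covers only the quoted string (Item 11), not the whole address. The names QTEXT, QUOTED_PAIR, QUOTED_STR and is_quoted_string are mine, and the non-ASCII exclusion from the Perl version is left out for brevity, so treat it as an illustration rather than a port.

```python
import re

# Building blocks loosely mirroring $qtext and $quoted_pair above.
# Assumption: the non-ASCII exclusion in $qtext is omitted in this sketch.
QTEXT = r'[^\\\r\n"]'            # "normal": not a backslash, CR, LF or quote
QUOTED_PAIR = r'\\[\x00-\x7f]'   # "special": backslash plus one ASCII char

# Unrolled form: quote, normal*, (special normal*)*, quote.
QUOTED_STR = re.compile(
    rf'''
    "                                     # opening quote
        {QTEXT}*                          #   normal*
        (?: {QUOTED_PAIR} {QTEXT}* )*     #   (special normal*)*
    "                                     # closing quote
    ''',
    re.VERBOSE,
)


def is_quoted_string(text):
    """Return True if the whole of text is one double-quoted string."""
    return QUOTED_STR.fullmatch(text) is not None
```

Calling is_quoted_string(r'"say \"hi\""') accepts the escaped quotes, while a string with a bare quote in the middle is rejected because qtext cannot consume it.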



----- Original Message -----
From: "David Shaw" <david.shaw@zapmedia.com>
To: "Loren Stafford" <lstafford@morphics.com>
Cc: <zope@zope.org>
Sent: Tuesday, April 03, 2001 6:37 PM
Subject: [Zope] Re: Non-caching version of POPMail ?


> I actually solved this on my working version in a different way.  I simply
> call UIDL on a refresh and if any of the UIDs I have are not in the UID list
> from the server, I delete the message from the MessageDict.  It accomplishes
> the same thing without the large performance hit of not caching.
>
> I've started working on this product again.  I'd be happy to send you my
> current working revision if you want to take a look.  It's not anything
> significant enough yet to warrant a new release, but I am making it better.
> My next adventure is to do better message parsing to make URLs and email
> addresses clickable.  I just got the O'Reilly Regular Expression book and
> plan on delving into it when I get a chance.
>
>
> Loren Stafford said:
>
> > David,
> >
> > I'm planning to modify POPMail (or more likely, make a derived product
> > POPMailNc) so that there is no persistent message cache. I just wanted to
> > pass the idea by you, so you could tell me if I'm doing something stupid or
> > if you've already solved my problem for me in a newer version of POPMail.
> >
> > I'm using POPMail to couple my Zope server to another product
> > (PerfectTracker). Zope initiates Tracker incidents via sendmail and receives
> > responses from the Tracker at a dedicated POP3 mailbox, which it polls
> > (using Xron) every 5-10 minutes. I've discovered that if I ever delete
> > messages from the mailbox manually (i.e. using a POP client other than
> > POPMail) POPMail's persistent message cache gets hopelessly out of sync with
> > the mailbox, and it begins to deliver messages from its cache when there are
> > new messages with the same UID in the mailbox. That's bad.
> >
> > While I don't need to delete messages behind POPMail's back, I can't really
> > prevent someone from doing so, due to the nature of our mail system. So I
> > propose to make POPMail's cache non-persistent (I guess it really wouldn't
> > be a cache then, would it?). I don't expect a performance problem, because,
> > if I delete old messages regularly I will never have more than a few hundred
> > messages in the mailbox. (Messages correspond to new employees.)
> >
> > I think I can make uidDict and MessageDict nonpersistent simply by changing
> > their names to _v_uidDict and _v_MessageDict. Is that correct?
> >
> > -- Thanks for your input
> > -- Loren
> >
>
> --
> David Shaw -- Senior Software Developer -- ZapMedia -- 678.420.2715
>
>
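
The UIDL approach David describes in the quoted message above can be sketched roughly as follows. This is not POPMail's actual code: parse_uidl and prune_cache are hypothetical helpers, and the cache is assumed to be a plain dict keyed by UID. The listing lines are assumed to have the "msgnum uid" form that poplib.POP3.uidl() returns as the second element of its result.

```python
def parse_uidl(lines):
    """Collect the UIDs from UIDL listing lines such as b'1 a1'."""
    uids = set()
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("ascii", "replace")
        # Each line is "<message number> <unique id>"; only the id matters here.
        _msgnum, uid = line.split(None, 1)
        uids.add(uid.strip())
    return uids


def prune_cache(message_dict, server_uids):
    """Delete cached messages the server no longer lists; return their UIDs."""
    stale = [uid for uid in message_dict if uid not in server_uids]
    for uid in stale:
        del message_dict[uid]
    return stale


# Example with made-up data standing in for a real server response.
cache = {"a1": "message A", "b2": "message B", "c3": "message C"}
removed = prune_cache(cache, parse_uidl([b"1 a1", b"2 c3"]))
```

After the call, removed holds the UIDs that vanished from the server and cache keeps only the messages still listed. Note that this only catches UIDs that disappear; it would not notice the case Loren mentions where a new message shows up under a previously used UID.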
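
On Loren's _v_ question quoted above: as far as I know, ZODB's Persistent base class does leave attributes whose names start with _v_ out of the saved state, so the rename should stop the dictionaries from being stored. The catch is that such attributes can simply be absent when the object is loaded again, so the code has to recreate them on demand. Below is a plain-Python imitation of that rule, with no ZODB involved; VolatileAware and MailboxSketch are invented names, and the host value is a placeholder.

```python
class VolatileAware:
    """Plain-Python imitation of the _v_ rule; not the real ZODB base class."""

    def __getstate__(self):
        # Save every attribute except those whose name starts with "_v_".
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith("_v_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class MailboxSketch(VolatileAware):
    def __init__(self):
        self.host = "pop.example.com"   # hypothetical persistent setting
        self._v_uidDict = {}
        self._v_MessageDict = {}

    def messages(self):
        # The volatile dict may be missing after a reload; recreate it lazily.
        if not hasattr(self, "_v_MessageDict"):
            self._v_MessageDict = {}
        return self._v_MessageDict
```

Every access to the volatile dictionaries goes through an accessor like messages(), which is the part a bare rename would miss.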