[Zope] Using RegEx in Script(Python)
Joel Burton
joel@joelburton.com
Mon, 30 Jun 2003 19:21:07 -0400
On Fri, Jun 27, 2003 at 01:09:27AM +0200, Andreas Pakulat wrote:
> Hi,
>
> I wanted to know if the above can be done? What I need is a function
> that replaces every character of a string, that is not in [a-zA-Z1-9]
> with an underscore. I want to use this to automatically create an
> Object-Id from a title, to create a new Object.
>
> If this is not possible directly within a Script(Python), can it be done
> using an ExternalMethod? I suppose yes.
>
> Andreas
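Taken literally, the replacement asked about above can be sketched with re.sub. This is only a minimal sketch in current Python: the name replace_unsafe is made up for illustration, and the character class assumes [a-zA-Z0-9] was meant (the 1-9 in the question would also turn "0" into an underscore). Whether re can be imported inside a Script (Python) depends on how the site's restricted-Python security is configured, so this may have to live in an External Method.

```python
import re

# Anything outside ASCII letters and digits is treated as unsafe.
# Assumption: digits 0-9 are all allowed, not just 1-9.
_UNSAFE = re.compile(r'[^a-zA-Z0-9]')

def replace_unsafe(title):
    """Replace every character outside [a-zA-Z0-9] with an underscore."""
    return _UNSAFE.sub('_', title)

print(replace_unsafe("Bill's House"))  # Bill_s_House
```

Note that this keeps case and leaves runs of underscores in place, which is what the fuller method below tries to clean up.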
If you're looking for a "clean-zope-id" method, we use the
following. A simple regex solution can easily forget details such as
stripping leading underscores or collapsing double underscores.
I actually do this without regexes, using translate(), but regexes might be
faster. Feel free to benchmark and say so. ;)
#!/usr/bin/env python2.1
"""ConvertStringToID

Converts a string into a Zope-safe ID.

This removes all non-identifier-safe characters. It replaces
most with underscores, while trying to make the ID match a
sensible choice (e.g. "Bill's House" -> "bills_house", not "bill_s_house").

The output is always lowercase, and any leading underscores are
removed (as they would be illegal in Zope).
"""

import string

# 256-entry translation table: '.', digits and letters are kept (uppercase
# letters, including 'Z', map to lowercase); every other byte becomes '_'.
tt = ['_'] * 256
for c in '.0123456789abcdefghijklmnopqrstuvwxyz':
    tt[ord(c)] = c
for c in 'abcdefghijklmnopqrstuvwxyz':
    tt[ord(c) - 32] = c    # slot of the uppercase letter gets the lowercase one
tt = ''.join(tt)

def ConvertStringToID(s, maxlen=None):
    """
    Convert String to ID

    s = string to convert
    maxlen = maximum length of ID

    returns string.
    """
    # translate most things to underscore. remove punctuation below w/o translating
    s = string.translate(s, tt, '!@#$%^&*()-=+,\'"')

    # remove ALL double-underscores
    while s.find("__") > -1:
        s = s.replace('__', '_')

    # when we use py2.2.2, this and below can simply be s = s.strip("_"). yeah!
    # trim underscores off front
    while s.startswith("_"):
        s = s[1:]

    # trim underscores off end
    while s.endswith("_"):
        s = s[:-1]

    # trim to maxlength
    if maxlen and len(s) > maxlen:
        s = s[:maxlen]

    return s

if __name__ == '__main__':
    assert ConvertStringToID("____A Lover's % Tale (Of 2 Cities).doc_") == "a_lovers_tale_of_2_cities.doc"
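
For anyone who wants to try the benchmark, here is a rough regex-based equivalent of the function above. It is a sketch for current Python rather than 2.1, and it has not been timed against the translate() version. The name convert_string_to_id_re and the two compiled patterns are assumptions made for illustration, not part of our code.

```python
import re

# Punctuation that is removed outright rather than turned into "_"
# (the same characters passed as deletechars to translate() above).
_DROP = re.compile(r"""[!@#$%^&*()\-=+,'"]""")

# Any run of characters other than ASCII letters, digits and "." becomes
# a single "_", so double underscores never appear in the first place.
_UNSAFE = re.compile(r"[^A-Za-z0-9.]+")

def convert_string_to_id_re(s, maxlen=None):
    """Regex-based sketch of ConvertStringToID (hypothetical name)."""
    s = _DROP.sub("", s)
    s = _UNSAFE.sub("_", s)
    # Only ASCII letters remain at this point, so lower() is unambiguous.
    s = s.lower().strip("_")
    if maxlen and len(s) > maxlen:
        s = s[:maxlen]
    return s

print(convert_string_to_id_re("____A Lover's % Tale (Of 2 Cities).doc_"))
# a_lovers_tale_of_2_cities.doc
```

As in the original, trimming to maxlen happens last, so a trimmed ID can still end in an underscore.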
HTH.
--
Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Independent Knowledge Management Consultant