[Zope] DTML, Zope and Regex
Duncan Booth
duncan@rcp.co.uk
Fri, 12 Jul 2002 14:12:09 +0100
[My first attempt to send this failed to post it to the list]
Kirk Lowery <klowery@wts.edu> wrote in news:3D2B0727.1030401@wts.edu:
> Ben, would you mind expanding on this? What dangers are there? Regexes
> are so handy, and if I turn them on I'd like to know what the risks
> are...
Regular expressions can easily lead to unbounded CPU usage. Provided you
don't let any untrusted users edit anything that can call regular
expressions, and provided that you fully understand the issues involved,
then you may be able to use regular expressions.
If the regular expression comes from an untrusted source, or if you don't
think it through carefully enough, then you may find that you can
effectively block one thread of your Zope server. For the sort of use you
proposed, this is unlikely to happen. The problem is that it is almost
impossible to work out in advance which regular expressions will cause
problems.
Try this at a Python interactive prompt:
>>> import re
>>> s = 'a'+'b'*1024+'c'
>>> re.match('a.*.*b.*c',s)
>>> re.match('a.*.*b.*d',s)
The first match will complete instantly, but the second one will probably
take a few seconds to fail.
(My memory may be a bit rusty on this next bit)
There are traditionally two ways to match regular expressions.
Deterministic Finite-state Automata (DFA), and Non-deterministic Finite-
state Automata (NFA). Most software uses NFA matching, which requires a
small amount of memory to compile the regex, but requires a potentially
unbounded time to do the match. DFA matching can match a regex in linear
time (i.e. linear on the search string), but may require a very large
amount of memory (and CPU) to compile the regular expression. If someone
produced a DFA engine for Python which had the option of limiting the size
of the compiled expression (and throwing an exception for any that were too
complex), then you could probably expose this safely in Zope. There may be
other problems with DFAs though, e.g. I'm not sure if you can match groups
easily with a DFA.
DFA matches are best used when you have a fixed pattern so that the
overhead for compilation is only hit once (e.g. parsers). With a DFA engine
you might implement a persistent regex object in the ZODB. Not all regexes
would be compilable into the object, but once you managed to compile one
you would know it would work.
--
Duncan Booth duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
--
Duncan Booth duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?