[My first attempt to send this failed to post it to the list] Kirk Lowery <klowery@wts.edu> wrote in news:3D2B0727.1030401@wts.edu:
Ben, would you mind expanding on this? What dangers are there? Regexes are so handy, and if I turn them on I'd like to know what the risks are...
Regular expressions can easily lead to unbounded CPU usage. Provided you don't let any untrusted users edit anything that can call regular expressions, and provided that you fully understand the issues involved, then you may be able to use regular expressions. If the regular expression comes from an untrusted source, or if you don't think it through carefully enough, then you may find that you can effectively block one thread of your Zope server. For the sort of use you proposed, this is unlikely to happen. The problem is that it is almost impossible to work out in advance which regular expressions will cause problems. Try this at a Python interactive prompt:
import re s = 'a'+'b'*1024+'c' re.match('a.*.*b.*c',s) re.match('a.*.*b.*d',s)
The first match will complete instantly, but the second one will probably take a few seconds to fail. (My memory may be a bit rusty on this next bit) There are traditionally two ways to match regular expressions. Deterministic Finite-state Automata (DFA), and Non-deterministic Finite- state Automata (NFA). Most software uses NFA matching, which requires a small amount of memory to compile the regex, but requires a potentially unbounded time to do the match. DFA matching can match a regex in linear time (i.e. linear on the search string), but may require a very large amount of memory (and CPU) to compile the regular expression. If someone produced a DFA engine for Python which had the option of limiting the size of the compiled expression (and throwing an exception for any that were too complex), then you could probably expose this safely in Zope. There may be other problems with DFAs though, e.g. I'm not sure if you can match groups easily with a DFA. DFA matches are best used when you have a fixed pattern so that the overhead for compilation is only hit once (e.g. parsers). With a DFA engine you might implement a persistent regex object in the ZODB. Not all regexes would be compilable into the object, but once you managed to compile one you would know it would work. -- Duncan Booth duncan@rcp.co.uk int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3" "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure? -- Duncan Booth duncan@rcp.co.uk int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3" "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?