[Zope-CMF] script crashing zope

Tres Seaver tseaver@palladion.com
Fri, 21 Dec 2001 10:42:23 -0500


Dan Keshet wrote:

> I've written an external python script to convert plain old html documents
> (stored as DTML documents) to CMF Documents.  It works on small documents,
> but anything moderately large and it crashes the server.  I'm not quite
> sure what moderately large is b/c I wasn't keen to keep crashing the
> server, but it's between 1709 bytes and 5094 bytes.
> 
> So...
> 
> 1) Is this a general Zope bug (well, clearly it shouldn't be crashing) or
> a CMF-specific bug?
> 
> 2) Does anybody have a workaround or a script that they've written for the
> same purpose?
> 
> Thanks,
> 
> Dan
> 
> 
> Setup: CMF1.1, Zope 2.4.3, python 2.1.1, freebsd4.1:
> 
> ---Begin script----
> def convert(self):
>         from Products.CMFDefault.Document import addDocument
> 
>         text = self.document_src()
>         title = self.title_or_id()
>         id = self.getId()
>         self.manage_renameObject(id, id + '.dtml')
>         self.manage_addProduct['CMFDefault'].addDocument( id, title, '',
> "html", text)
> ---- End Script----

Dan,

There was a bug in CMF 1.1 which had similar symptoms.  Can you
try either CMF 1.2 beta1 or a CVS checkout and let us know if the
problem persists?

You could also apply the fix yourself.  Here is the diff between
CMF-1_1-release and the fix for the bug::

--- CMF/CMFDefault/Document.py	2001/06/05 18:23:53	1.24
+++ CMF/CMFDefault/Document.py	2001/08/13 21:00:18	1.28
@@ -239,10 +243,8 @@
     security.declarePrivate('guessFormat')
     def guessFormat(self, text):
         """ Simple stab at guessing the inner format of the text """
-        if bodyfinder.search(text) is not None:
-            return 'html'
-        else:
-            return 'structured-text'
+        if utils.html_headcheck(text): return 'html'
+        else: return 'structured-text'

     security.declarePrivate('handleText')
     def handleText(self, text, format=None, stx_level=None):
@@ -260,9 +262,9 @@
             headers.update(parser.metatags)
             if parser.title:
                 headers['Title'] = parser.title
-            bodyfound = bodyfinder.search(text)
+            bodyfound = bodyfinder(text)
             if bodyfound:
-                cooked = body = bodyfound.group('bodycontent')
+                cooked = body = bodyfound
         else:
             headers, body = parseHeadersBody(text, headers)
             cooked = _format_stx(text=body, level=level)


and here is the diff to CMFDefault.utils::

--- CMF/CMFDefault/utils.py	2001/06/05 23:01:12	1.6
+++ CMF/CMFDefault/utils.py	2001/08/13 21:08:00	1.8
@@ -141,10 +141,18 @@
         self.setliteral()


-bodyfinder = re.compile(r'<body.*?>(?P<bodycontent>.*?)</body>',
-                        re.DOTALL|re.I)
-htfinder = re.compile(r'<html', re.DOTALL|re.I)
+_bodyre = re.compile(r'<body.*?>', re.DOTALL|re.I)
+_endbodyre = re.compile(r'</body', re.DOTALL|re.I)
+
+def bodyfinder(text):
+    bod = _bodyre.search(text)
+    if not bod: return text

+    end = _endbodyre.search(text)
+    if not end: return text
+    else: return text[bod.end():end.start()]
+
+htfinder = re.compile(r'<html', re.DOTALL|re.I)
 def html_headcheck(html):
     """ Returns 'true' if document looks HTML-ish enough """
     if not htfinder.search(html):
@@ -156,5 +164,5 @@
             continue
         elif lower(line[:5]) == '<html':
             return 1
-        elif line[:2] not in ('<!', '<?'):
+        elif line[0] != '<':
             return 0
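
For reference, here is the new bodyfinder from the utils.py diff as a standalone sketch.  The function and the two patterns are copied from the diff; the sample strings are made up for illustration, and it is written to run on a current Python rather than the 2.1.1 mentioned above.  It finds the opening and closing body tags with two small searches and slices the text between them, so no single pattern has to span the whole body::

```python
import re

# Patterns copied from the utils.py diff.
_bodyre = re.compile(r'<body.*?>', re.DOTALL | re.I)
_endbodyre = re.compile(r'</body', re.DOTALL | re.I)


def bodyfinder(text):
    """Return the text between <body ...> and </body.

    Falls back to returning the whole text when either tag is missing.
    """
    bod = _bodyre.search(text)
    if not bod:
        return text

    end = _endbodyre.search(text)
    if not end:
        return text
    return text[bod.end():end.start()]


# Hypothetical sample inputs, not taken from the original report.
small = '<html><body class="x">hello</body></html>'
large = '<html><BODY>' + 'x' * 100000 + '</BODY></html>'
fragment = 'no body tags here'

print(bodyfinder(small))
print(len(bodyfinder(large)))
print(bodyfinder(fragment))
```

Note that bodyfinder now returns a string rather than a match object, which is why the Document.py diff drops the .group('bodycontent') call.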


Tres.
-- 
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com