[Zope] Indexing and plaintext display gives PDF errors

sean.upton@uniontrib.com
Mon, 04 Jun 2001 16:08:47 -0700


I don't have a solution, but if you want to do some more digging...

I think I had the same problem with the DocumentLibrary PDF converter. You
may want to comment out what I think is the last line in
lib/python/Products/DocumentLibrary/FileConverters/pdf.py (the line that
deletes the temporary PDF file). That keeps the PDF it saves in /tmp from
being deleted, so you can look at that file to see whether it is corrupt
and compare it to the one you uploaded. I did, and that is what I found:
it is not saving the entire file out. Fixing it is a different story
(someone else might have an answer).
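If you want to make that comparison less by eye, here is a minimal sketch.
It assumes you can reach both the uploaded original and the temp copy left
in /tmp, and that you run it as a standalone Python 3 script rather than
inside Zope's Python 1.5.2. The function names are made up for
illustration. It checks sizes first and then a SHA-1 digest of each file.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(original, temp_copy):
    """Compare the uploaded PDF with the converter's temp copy.

    Returns a short verdict string: "truncated: ...", "differs: ...",
    or "identical".
    """
    orig_size = os.path.getsize(original)
    temp_size = os.path.getsize(temp_copy)
    if temp_size < orig_size:
        # A shorter temp copy means the converter did not write it all out.
        return "truncated: temp copy has %d of %d bytes" % (temp_size, orig_size)
    if file_digest(original) != file_digest(temp_copy):
        return "differs: temp copy is not shorter but its contents differ"
    return "identical"
```

Call it with the two paths you have on hand, for example
compare_files("uploaded.pdf", "/tmp/saved-copy.pdf") (hypothetical
names). A "truncated" verdict would be consistent with the converter not
saving the entire file.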

Sean

-----Original Message-----
From: Leigh Ann Hildebrand [mailto:leighann@onebox.com]
Sent: Monday, June 04, 2001 2:19 PM
To: zope@zope.org
Cc: cduncan@kaivo.com
Subject: [Zope] Indexing and plaintext display gives PDF errors


I'm using Zope 2.3.2 with Python 1.5.2 running on Red Hat. I don't use
Python; I work in DTML. I'm cataloging technical documents. I do not
use the Document Library or the CMF, in part because of compatibility
restrictions (the site must support NetPositive, a browser with no
JavaScript or CSS support). The documents I'm indexing are HTML, text,
Word, PowerPoint, and PDF files.

I have the CMF and the Document Library product installed; I also
installed wvWare, though I'm not sure I installed it correctly (the
instructions were vague).

This is my problem. When I update my Catalog, I get a number of errors
related to PDF files on the Linux box that runs my Zope installation:

Error (0): PDF file is damaged - attempting to construct xref table ...
Error: Top level pages is wrong type (null)
Error: Couldn't read page catalog
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table


These repeat a few times, filling about two screens, before the index
update completes. I can think of at least one thing that might be going
on here: I think some PDF documents were added as type "DocumentFile",
which belongs to the DocumentLibrary product.
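The xref and trailer errors above are the kind a PDF reader reports when
the end of a file is missing. As a rough heuristic (an assumption on my
part, not something Zope or DocumentLibrary does), a complete PDF
normally has a "startxref" keyword and a "%%EOF" marker near its end, so
a small standalone Python 3 script can flag files that lack them. The
name looks_truncated and the 1024-byte tail window are illustrative
choices.

```python
import os
import sys


def looks_truncated(path, tail_size=1024):
    """Heuristic: True if a PDF seems to be cut off before its trailer.

    A complete PDF normally ends with 'startxref', a byte offset, and
    '%%EOF'. If either keyword is missing from the last tail_size bytes,
    the xref table and trailer are probably gone.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(max(0, size - tail_size))  # read only the end of the file
        tail = f.read()
    return b"%%EOF" not in tail or b"startxref" not in tail


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for p in sys.argv[1:]:
        print(p, "possibly truncated" if looks_truncated(p) else "trailer present")
```

Running it over the uploaded PDFs and any temp copies would show whether
the damaged ones are simply cut short. Files with trailing junk after
%%EOF can still pass, so treat a clean result as a hint, not proof.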

Anyway, I'm trying to get rid of the errors and to be able to index the
text of PDF and Word files. Suggestions? I'm forwarding this to the
DocumentLibrary product engineer, too.

Leigh Ann

-- 
Leigh Ann Hildebrand
leighann@onebox.com - email
(650) 223-2199 x2231 - voicemail/fax





_______________________________________________
Zope maillist  -  Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )