Re: [Zope] indexing pdf files

1 Sep 2000

      Terry Kerr wrote:
...
Hi,
I need to be able to index the text within pdf files.  I assume I will
somehow use PrincipiaSearchSource, but I need to know how to get the
text out of the pdf when it is uploaded to the ZODB.  Has anyone done
this before?  Are there any packages around that I can use that run in
python or at least on a linux box that I can pipe to and from?
terry
For going from XML to PDF there are a multitude of ways in Python:

XSLT - check out the XML Zone at ibm.com/developer; they have an article in
the education library on transforming XML to PDF.

the platypus package from
http://www.reportlab.com/

These might give you some help in going the other way (PDF to text).

As for implementation:

Looking at a PDF in a text viewer, it appears to consist of formatting
commands and encoded display strings.
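Here is a rough sketch of that idea in Python. It assumes the text is drawn with the Tj operator on literal (parenthesised) strings, and that each content stream is either uncompressed or FlateDecode-compressed, which zlib can inflate. It skips TJ arrays, hex strings and font encodings, so treat the output as approximate; the function and regex names are made up for illustration.

```python
import re
import zlib

# Content streams sit between "stream" + EOL and EOL + "endstream"; the
# lookbehind stops the "stream" inside "endstream" from starting a match.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\nendstream", re.S)

# A literal string "(...)" followed by the Tj (show text) operator.
# Escaped parentheses are allowed; unescaped nested ones are not handled.
SHOW_TEXT_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj", re.S)

OCTAL_RE = re.compile(rb"[0-7]{1,3}")
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
           b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}


def decode_literal(raw: bytes) -> bytes:
    """Undo the backslash escapes used inside PDF literal strings."""
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i:i + 1]
        if ch != b"\\":
            out += ch
            i += 1
            continue
        nxt = raw[i + 1:i + 2]
        octal = OCTAL_RE.match(raw, i + 1)
        if nxt in ESCAPES:
            out += ESCAPES[nxt]
            i += 2
        elif octal:
            # \ddd octal escape; overflow beyond one byte is discarded.
            out.append(int(octal.group(), 8) & 0xFF)
            i = octal.end()
        elif nxt in (b"\r", b"\n"):
            # Backslash + end-of-line is a line continuation: drop both.
            i += 3 if raw[i + 1:i + 3] == b"\r\n" else 2
        else:
            # Unknown escape: the backslash itself is ignored.
            i += 1
    return bytes(out)


def extract_text(pdf_bytes: bytes) -> str:
    """Collect the Tj display strings from every content stream."""
    pieces = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        data = stream.group(1)
        try:
            data = zlib.decompress(data)  # /FlateDecode streams
        except zlib.error:
            pass  # not Flate-compressed: scan the raw bytes as-is
        for s in SHOW_TEXT_RE.finditer(data):
            # latin-1 maps each byte to one character; real fonts may differ.
            pieces.append(decode_literal(s.group(1)).decode("latin-1"))
    return " ".join(pieces)
```

Running `extract_text` over the uploaded bytes gives a plain string that could be stored and indexed; anything cleverer (kerned TJ arrays, CID fonts) needs a real PDF parser.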

You could write a subclass of File that reads its content on upload,
strips the formatting strings, decodes the display strings, and stores
the result as a property to be indexed.
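A minimal sketch of that subclass, runnable outside Zope. `File` below is only a stand-in for Zope's own File class, and `PDFFile`, `upload`, `to_text` and `text` are hypothetical names (the real File class has its own upload methods you would hook instead). The extractor is a deliberately crude placeholder that only sees uncompressed `(...) Tj` strings.

```python
import re

# Crude placeholder extractor: grabs "(...) Tj" strings from uncompressed
# content only. Swap in a real PDF-to-text routine in practice.
_SHOW_TEXT = re.compile(rb"\(([^()\\]*)\)\s*Tj")


class File:
    """Stand-in for Zope's File class so this sketch runs on its own.
    The `upload` method is hypothetical, not Zope's API."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.data = b""

    def upload(self, data: bytes) -> None:
        self.data = data


class PDFFile(File):
    """File that extracts and stores its text whenever content is uploaded."""

    def __init__(self, id: str) -> None:
        super().__init__(id)
        self.text = ""  # plain-text property for the catalog to index

    def upload(self, data: bytes) -> None:
        super().upload(data)
        # Extract once at upload time instead of on every catalog query.
        self.text = self.to_text(data)

    def to_text(self, data: bytes) -> str:
        return " ".join(m.group(1).decode("latin-1")
                        for m in _SHOW_TEXT.finditer(data))

    def PrincipiaSearchSource(self) -> str:
        # A ZCatalog text index named PrincipiaSearchSource reads this value.
        return self.text
```

For example, after `f = PDFFile("doc.pdf")` and `f.upload(b"BT (Hello) Tj (Zope) Tj ET")`, `f.PrincipiaSearchSource()` returns `"Hello Zope"`. Doing the work on upload means indexing only ever reads a stored string rather than re-parsing the PDF.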

Kapil

Kapil Thangavelu