RE: [Zope-dev] cataloging binary files (pdf's, word docs...)

18 Feb 2000

      From: Roman Milner [mailto:roman@speeder.com]
...
I'm trying to come up with a way to catalog PDFs and Word docs.  It
is easy to write Python methods to pull the text out of these.  The
problem is that we already have tons of them in our ZODB as File
objects.
The only thing I can think of is to make a ZClass for each type
(i.e. a PDFFile type) that has a method that knows how to extract the
text from the PDF, and have ZCatalog catalog that property.  But this
means re-creating all the binary files currently in our ZODB.
Can anyone offer any better suggestions?  I could write a Python
method that extracts the text based on MIME type, but I can't go back
and add that method to each File object.
Thanks for any help.

You could write an External Method, then acquire that method onto the
File object. Let's call it FileToText:

  def FileToText(self):
      # Do whatever you want with self;
      # it is the File object.
      # Return some text.
      return ''  # placeholder: return the extracted text here

Then add a TextIndex on FileToText, and you can start cataloguing your
binary File objects.
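
A slightly fuller sketch of such a method, dispatching on MIME type as
the original question suggests. The extractor functions and the
EXTRACTORS table are hypothetical placeholders, and the sketch assumes
the File object exposes its MIME type as content_type and its raw
contents as data (large files may be stored in chunks and need
converting to a single string first):

```python
# Hypothetical extractors: replace the bodies with real PDF / Word
# text extraction. They are placeholders, not part of Zope.
def pdf_to_text(raw):
    return ''


def word_to_text(raw):
    return ''


# Maps a MIME type to the function that turns raw file data into text.
EXTRACTORS = {
    'application/pdf': pdf_to_text,
    'application/msword': word_to_text,
}


def FileToText(self):
    # self is assumed to be the File object the method was acquired onto.
    content_type = getattr(self, 'content_type', '')
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        # Unknown type: give the index nothing to work with.
        return ''
    return extractor(self.data)
```

Returning an empty string for unknown types means other File objects
can still be catalogued without the index picking up binary noise.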

-- 
Martijn Pieters, Software Engineer 
| Digital Creations http://www.digicool.com 
| Creators of Zope      http://www.zope.org 
| mailto:mj@digicool.com       ICQ: 4532236
| PGP: http://wwwkeys.nl.pgp.net:11371/pks/lookup?op=get&search=0xA8A32149
-------------------------------------------