From: Roman Milner [mailto:roman@speeder.com]
I'm trying to come up with a way to catalog PDFs and Word docs. It is easy to write Python methods to pull the text out of these. The problem is that we already have tons of them in our ZODB as File objects.
The only thing I can think of is to make a ZClass for each type (i.e. a PDFFile type) that has a method that knows how to extract the text from the PDF, and have ZCatalog catalog that property. But this means re-creating all the binary files currently in our ZODB.
Can anyone offer any better suggestions? I could write a Python method that extracts the text based on MIME type, but I can't go back and add that method to each File object.
Thanks for any help.
You could write an External Method, then let the File objects acquire that method. Let's call it FileToText:

    def FileToText(self):
        # Do whatever you want with self;
        # it is the File object.
        return ''  # return some text

Then add a TextIndex on FileToText, and you can start cataloguing your binary File objects.

-- 
Martijn Pieters, Software Engineer
| Digital Creations http://www.digicool.com
| Creators of Zope http://www.zope.org
| mailto:mj@digicool.com ICQ: 4532236
| PGP: http://wwwkeys.nl.pgp.net:11371/pks/lookup?op=get&search=0xA8A32149
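To make the FileToText suggestion a little more concrete, here is a minimal sketch of what such an External Method might look like if it dispatches on MIME type, as the original question proposed. It assumes the File object exposes `content_type` and `data` attributes and that `data` is either bytes or something `bytes()` can convert. The `EXTRACTORS` table and the `_pdf_to_text` and `_word_to_text` helpers are hypothetical placeholders, not part of Zope; the actual extraction code would go there.

```python
# Sketch only: the helper names and the EXTRACTORS table are hypothetical.

def _pdf_to_text(raw):
    # Placeholder: call your own PDF text extraction code here.
    raise NotImplementedError("plug in a PDF text extractor")


def _word_to_text(raw):
    # Placeholder: call your own Word text extraction code here.
    raise NotImplementedError("plug in a Word text extractor")


def _plain_to_text(raw):
    # Plain text needs no extraction, only decoding.
    return raw.decode("utf-8", "replace")


# Maps a MIME type to the function that turns raw bytes into text.
EXTRACTORS = {
    "application/pdf": _pdf_to_text,
    "application/msword": _word_to_text,
    "text/plain": _plain_to_text,
}


def FileToText(self):
    """Return indexable text for the File object passed in as self."""
    # Assumption: the object carries its MIME type in content_type.
    content_type = getattr(self, "content_type", "")
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        # Unknown types contribute nothing to the index.
        return ""
    # Assumption: data is bytes, or an object bytes() can convert.
    data = getattr(self, "data", b"")
    raw = data if isinstance(data, bytes) else bytes(data)
    return extractor(raw)
```

Because the method is acquired rather than stored on each object, the existing File objects would not need to be re-created; only the index on FileToText has to be added and the objects catalogued.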