[Zope-dev] cataloging binary files (pdf's, word docs...)
Martijn Pieters
mj@digicool.com
Fri, 18 Feb 2000 04:06:06 -0500
From: Roman Milner [mailto:roman@speeder.com]
>
> I'm trying to come up with a way to catalog PDF's and Word docs. It
> is easy to write python methods to pull the text out of these. The
> problem is that we already have tons of them in our ZODB as file
> objects.
>
> The only thing I can think of is to make a ZClass for each type
> (i.e. a PDFFile type) that has a method that knows how to extract the
> text from the PDF, and have ZCatalog catalog that property. But this
> means re-creating all the binary files currently in our ZODB.
>
> Can anyone offer any better suggestions? I could write a Python
> method that extracted the text based on MIME type, but I can't go back
> and add that method to each file object.
>
> Thanks for any help.
>
You could write an External Method, then acquire that method onto the
File object. Let's call it FileToText:
    def FileToText(self):
        # Do whatever you want with self;
        # it is the File object.
        # Return some text (placeholder below).
        return ''
Then add a TextIndex on FileToText, and you can start cataloguing your
binary File objects.
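
As a rough sketch of what such a FileToText might look like when it
dispatches on MIME type: the EXTRACTORS table and the helper names are
made up for illustration, and the "extractor" here only pulls runs of
printable ASCII out of the raw bytes (much like the Unix strings tool),
so it is a stand-in for a real PDF or Word parser, not a replacement for
one. It also assumes the File object exposes content_type and data, and
that data is either a plain byte string or a linked chain of chunks with
.data and .next attributes.

```python
import re

# Runs of at least four printable ASCII characters.
_PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def _raw_bytes(data):
    """Return file data as one byte string.

    Assumption: data is either bytes or a linked chain of chunk
    objects, each with a .data attribute and an optional .next.
    """
    if isinstance(data, bytes):
        return data
    chunks = []
    while data is not None:
        chunks.append(data.data)
        data = getattr(data, "next", None)
    return b"".join(chunks)


def _printable_runs(raw):
    """Crude stand-in extractor: join runs of printable ASCII."""
    return " ".join(run.decode("ascii") for run in _PRINTABLE.findall(raw))


# Hypothetical dispatch table; swap in real per-format extractors.
EXTRACTORS = {
    "application/pdf": _printable_runs,
    "application/msword": _printable_runs,
}


def FileToText(self):
    """Return indexable text for a File-like object, or '' if unsupported."""
    extractor = EXTRACTORS.get(getattr(self, "content_type", ""))
    if extractor is None:
        return ""
    return extractor(_raw_bytes(self.data))
```

Returning an empty string for unknown types means objects that acquire
the method but are not PDFs or Word docs simply contribute no text to
the index.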
--
Martijn Pieters, Software Engineer
| Digital Creations http://www.digicool.com
| Creators of Zope http://www.zope.org
| mailto:mj@digicool.com ICQ: 4532236
| PGP:
http://wwwkeys.nl.pgp.net:11371/pks/lookup?op=get&search=0xA8A32149
-------------------------------------------