Simon Coles writes:
We have binary files stored in Zope, for example Word documents (but could be any of a variety of document types).
We would like to be able to index and search the contents of these files using ZCatalog. So if a Word file contains the word "Fred", then any search for "Fred" would include that file in the list of documents returned.
I have done something similar. I created a ZClass subclassing CatalogAware and File, and added a property called text, which is indexed as a text index in a catalog. When a Word document is added, a method I wrote uses the wvHtml utility to convert the Word document to text and stores the result in the text property. The implementation is kind of kludgey at the moment, mostly because I want to create a Python wrapper around the wv library, but its documentation is sketchy and I have other priorities right now. It does work though, and lets you search Word documents using a ZCatalog quite effectively (although it only works for Word docs). Check out wv at http://www.wvware.com/
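
For anyone who wants to try the same approach, here is a rough sketch of the conversion step in Python. It is not my actual method, just an illustration under some assumptions: that wvHtml is on the PATH and can be run as "wvHtml input.doc output.html" from a working directory, and that its output can be read as UTF-8. Since wvHtml emits HTML, the sketch strips the tags before the text goes into the indexed property. The names html_to_text and word_to_text are made up for this example.

```python
import os
import subprocess
import tempfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so words in adjacent elements stay separate,
        # then collapse runs of whitespace. Good enough for a text index.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def word_to_text(doc_bytes, converter="wvHtml"):
    """Convert Word document bytes to plain text via an external converter.

    Assumption: the converter accepts "input.doc output.html" as arguments
    and writes the HTML file relative to the current working directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "input.doc"), "wb") as f:
            f.write(doc_bytes)
        subprocess.run(
            [converter, "input.doc", "output.html"],
            cwd=tmp,
            check=True,  # raise CalledProcessError if the converter fails
        )
        out_path = os.path.join(tmp, "output.html")
        # Assumption: UTF-8 output; undecodable bytes are replaced, not fatal.
        with open(out_path, encoding="utf-8", errors="replace") as f:
            return html_to_text(f.read())
```

In the ZClass, the add/upload method would then do something along the lines of setting the text property to word_to_text(data) and reindexing the object, so that a search for "Fred" finds any document whose extracted text contains it.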