RE: [Zope] Advice on searching/indexing Word documents?
I've been thinking about doing this. I wonder whether there are any C filter libraries that read Word documents. Word 2000 documents are supposedly non-binary, so you could probably write a parser of sorts in Python or C/Lex. I used to write text filters in C and Lex for my previous employer; one of these days I will figure out how to extend Python with C and do this. I'm thinking about doing this type of thing in order to make PDFs searchable (as well as IPTC caption data in JPG files).

In the meantime, perhaps one could set up a macro in the Normal.dot template file that FTPs the document to Zope on every save and updates properties containing the full text of the document. Sort of kludgy, but I assume it would work if you were familiar with VBA coding and had access to an HTTP client component. Doing it this way, you would likely have to reindex the catalog manually. There might be a way around that, though, to automate it...

Sean

=========================
Sean Upton
Senior Programmer/Analyst
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
sean.upton@uniontrib.com
=========================

-----Original Message-----
From: Bowyer, Alex [mailto:BowyerA@logica.com]
Sent: Tuesday, January 02, 2001 2:45 PM
To: 'zope@zope.org'
Subject: [Zope] Advice on searching/indexing Word documents?

Our company has a repository of staff CVs (Resumes) as Word documents, and I am about to embark on creating a new feature for our Zope intranet to allow project managers to search those documents for keywords such as particular skills or projects.

I am thinking about several possibilities, such as a skills/CVs database linked in via ODBC, or some task that converts the Word documents to text files which can then be searched by Zope (I think Zope can do this, and I assume it can't search Word format directly?).

Has anyone ever approached a similar problem? Does anyone have any tips on how to index/search a load of documents in Zope?
Any tips/suggestions/comments would be most welcome.

Thanks,
Alex

==================================
Alex Bowyer
IT Consultant, Logica Australasia
Tel : +61 2 9202 8130
Fax : +61 2 9922 7466
E-mail : bowyera@logica.com
WWW : http://www.logica.com.au/
==================================

_______________________________________________
Zope maillist - Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope-dev )
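Both messages above describe the same pipeline: filter each Word document down to plain text, then search that text for keywords. The following is a minimal Python sketch of that idea, not a Zope API. It assumes the CVs have already been exported from Word as HTML (Word 2000 can save a document as a web page), and it uses a plain dict as an inverted index standing in for a Zope catalog. The names `html_to_text`, `build_index`, and `search` are hypothetical helpers introduced only for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Hypothetical filter: HTML exported from Word -> plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements doesn't run together.
    return " ".join(parser.chunks)


# Keep '+' and '#' so skills such as "C++" or "C#" survive as single words.
WORD_RE = re.compile(r"[a-z0-9+#]+")


def build_index(docs):
    """Map each lower-cased word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in WORD_RE.findall(text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def search(index, *keywords):
    """Return the sorted names of documents that contain ALL the keywords."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    cvs = {  # made-up sample documents
        "alex.html": "<html><body><p>Zope and <b>Python</b> consultant</p></body></html>",
        "bob.html": "<html><body><p>C++ and Python developer</p></body></html>",
    }
    index = build_index({name: html_to_text(html) for name, html in cvs.items()})
    print(search(index, "python"))          # ['alex.html', 'bob.html']
    print(search(index, "python", "zope"))  # ['alex.html']
```

The index has to be rebuilt (or updated per document) whenever a CV changes, which is the same reindexing concern raised above for the macro-based approach.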
I used to write text filters in C and Lex for my previous employer - one of these days I will figure out how to extend python with C and do this.
Here's a Lex-like scanner written entirely in Python: http://www.cosc.canterbury.ac.nz/~greg/python/Plex/ I've seen a couple of other implementations out there.

--jfarr
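For a sense of what a Lex-style text filter looks like without any C, here is a minimal sketch using only Python's standard `re` module. It does not use Plex's API; it is a hand-rolled stand-in that mimics the rules section of a Lex file by combining named patterns into one regular expression. The token names and patterns are hypothetical examples chosen for CV-like text.

```python
import re

# Token rules in priority order, like the rules section of a Lex file.
TOKEN_SPEC = [
    ("EMAIL", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z][A-Za-z'+#-]*"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),  # any other single character
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(text):
    """Yield (token_type, lexeme) pairs, discarding whitespace."""
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup  # name of the rule that matched
        if kind != "SPACE":
            yield kind, match.group()


if __name__ == "__main__":
    for token in scan("Contact alex@example.com, ext 5241."):
        print(token)
```

One design difference from Lex is worth noting: Lex picks the longest match across all rules, whereas Python's alternation takes the first alternative that succeeds at a position. That is why the more specific `EMAIL` rule is listed before `WORD`.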
Participants (2): Jonothan Farr, sean.upton@uniontrib.com