[Zope] Implementing full text search

Dieter Maurer dieter@handshake.de
Wed, 19 Jul 2000 23:23:54 +0200 (CEST)


Erich Seifert writes:
 > I tried to create a full text search for my existing site. All went fine:
 > I can search my whole site's document contents (via PrincipiaSearchSource)
 > and titles (using 'or' as in advanced zcatalog searching how-to).
 > 
 > The problem I have at the moment is that Zope only searches in unrendered
 > content and all html and dtml code is found also when searching.
 > How can I search my documents with all dtml-vars inserted and without all html
 > code or at least dtml code?
It is difficult to have the dtml-vars inserted, at least in the general
case.
Many DTML methods and documents need a context (client, namespace, REQUEST)
to be rendered. Without this context, which ZCatalog currently does not
provide, rendering would raise exceptions and the document would not
be cataloged.

You can probably filter out all DTML and HTML tags instead.
You would give "DTMLMethod" a new method, say "filteredSource".
"filteredSource" could use sgmllib (or, more efficiently, sgmlop)
to keep just the plain text and resolve simple character entities.
This does need a Zope extension, and it requires programming in
Python.
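
A rough sketch of such a filter, under several assumptions: it is
plain Python rather than Zope code, and because sgmllib is no longer
part of Python 3 it uses the standard library's html.parser as a
stand-in. The names TextOnly and filtered_source are made up for
illustration, and hooking the function into "DTMLMethod" is left out.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop every tag, HTML or DTML alike."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags; the tags themselves are ignored
        self.chunks.append(data)


def filtered_source(raw):
    """Return the plain text of a DTML/HTML source string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(raw)
    parser.close()
    # Collapse the whitespace runs left behind by the removed tags
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    print(filtered_source("<h1>Hello &amp; <dtml-var title></h1><p>world</p>"))
```

Tags such as <dtml-var title> are dropped like any HTML tag, so
only the literal text reaches the catalog. DTML expressions that
contain ">" inside quotes may confuse the parser and would need
extra handling.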


Dieter