[ZWeb] REVISIT: Switch zope.org search to google

Andreas Jung andreas@andreas-jung.com
Wed, 16 Oct 2002 17:48:24 +0200


--On Wednesday, 16 October 2002 11:39 -0400 Shane Hathaway <shane@zope.com>
wrote:

> Andreas Jung wrote:
>> On Wed, Oct 16, 2002 at 10:26:49AM -0400, Shane Hathaway wrote:
>>
>>> Olivier DECKMYN wrote:
>>>
>>>> How does ZCTextIndex compare to TextIndexNG?
>>>>
>>>> ( http://www.zope.org/Members/ajung/TextIndexNG )
>>>
>>> IMHO ZCTextIndex is a text index that Just Works.  You don't have to
>>> learn much about it or fiddle with options.  TextIndexNG has a lot of
>>> features but doesn't match the simplicity and accuracy.
>>
>>                                                 ^^^^^^^^
>>                                               What do you mean here?
>>
>> TextIndexNG is a text index that also just works. It is currently
>> in use by some larger websites, and I use it for a large internal
>> document database. So what are the differences?
>
> For a customer project we had to get Zope to generate good text ranking.
> TextIndexNG didn't quite cut it (sorry), so PythonLabs used some
> well-researched algorithms to create a new index from scratch.
> ZCTextIndex has none of the cruft from the old TextIndex, and it can be
> used independently of Zope or ZCatalog.

There is also no cruft in TextIndexNG. Although it was based on the old
TextIndex, little of the original code remains. Relevance ranking
is important, but since TextIndexNG supports quite different options
such as stemming and similarity search, it is very hard to find a single
measure for ranking. Word frequencies are fine for most cases, but they
do not cover all the aspects TextIndexNG provides.
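
To illustrate why a plain word-frequency measure and stemming interact,
here is a minimal sketch. It is not TextIndexNG or ZCTextIndex code:
`tf_score` and `toy_stem` are hypothetical names made up for this
example, and the "stemmer" only strips a trailing "es" or "s".

```python
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer (assumption, not a library API)."""
    if word.endswith("es") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def tf_score(query_terms, document_words):
    """Score a document by summing the raw counts of the query terms."""
    counts = Counter(document_words)
    return sum(counts[term] for term in query_terms)


doc = "index indexes text texts ranking".split()

# Exact matching counts only the literal word "index".
exact = tf_score(["index"], doc)

# With stemming, "indexes" collapses onto "index" and is counted as well.
stemmed = tf_score([toy_stem("index")], [toy_stem(w) for w in doc])
```

The same query gets a different score depending on whether stemming is
switched on, so a ranking tuned for one setting does not carry over to
the other unchanged.
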
>
>> Stemming is the hardest part, but all extensions are now freely
>> available as a dedicated TextIndexNG-Extensions package
>> published under the ZPL.
>
> Your work is definitely valuable, and converting your extensions to
> lexicon pipeline elements would be a big win.


You can easily put a Python wrapper around them. There is no need
for the extensions themselves to support the pipeline API, since the
wrapper is just one or two lines of Python code ;-)
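
A rough sketch of such a wrapper, assuming a pipeline element is simply
an object with a `process` method that takes a list of words and returns
a list of words. `StemmerElement` and `toy_stem` are hypothetical names;
`toy_stem` stands in for whatever stemming function an extension
package would actually provide.

```python
def toy_stem(word):
    """Hypothetical stand-in for an extension's stemming function."""
    if word.endswith("es") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


class StemmerElement:
    """Adapter exposing a word-level function as a pipeline element."""

    def __init__(self, stem):
        # Any callable that maps one word to one word will do.
        self.stem = stem

    def process(self, words):
        # The actual wrapper: apply the function to every word in the list.
        return [self.stem(word) for word in words]


element = StemmerElement(toy_stem)
result = element.process(["indexes", "texts", "ranking"])
```

The wrapper body really is a single line; everything else is
scaffolding for the example.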

Andreas

    ---------------------------------------------------------------------
   -    Andreas Jung                     http://www.andreas-jung.com   -
  -   EMail: andreas at andreas-jung.com                              -
   -            "Life is too short to (re)write parsers"               -
    ---------------------------------------------------------------------