Performance of TextIndexNG2 vs. ZCTextIndex
I recently installed TextIndexNG2 2.1.1 on a system running Zope 2.7.6 on Fedora Core 3. I've been running some comparison tests with ZCTextIndex, which is what our site currently uses. We're indexing around 50,000 objects at the moment.

For TextIndexNG2, this is the configuration:

  Indexed attributes                          keywordSearchSource
  Default encoding                            utf-8
  Storage                                     StandardStorage
  Stemmer                                     english
  Splitter: casefolding                       enabled
  Splitter: index single characters           disabled
  Splitter: max. length of splitted words     64
  Splitter: separator characters              .+-_@
  Default query parser                        PyQueryParser
  Autoexpansion                               disabled
  Stopwords                                   english
  Normalizer                                  European
  Use converters                              disabled
  Near distance
  Left truncation                             disabled

I've been struck that if the number of search hits is high, TextIndexNG2 is much slower than ZCTextIndex. For example, if I do a search on 'podcast' (our site deals w/ podcasting) I get about 14,000 hits. ZCTextIndex returns the results in about 0.1 seconds; TextIndexNG2 takes 31 seconds or 300 times longer. In general, the more hits there are, the bigger the difference between the two search indexes.

TextIndexNG2 is great: it has many features that we really want and perhaps the cost of those features is performance vis-a-vis ZCTextIndex. But I'm hoping that maybe I've overlooked an obvious or not-so-obvious configuration issue that will enable me to speed up TextIndexNG2.

Thanks for any advice.

Francis Kelly
www.loomia.com
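
A comparison like the one described in this message can be repeated with a small timing harness. The following is only a sketch: `time_query`, `slowdown`, and `fake_search` are hypothetical names, `fake_search` is a stand-in for whatever call actually runs the catalog query, and the code is written for current Python 3, so it would need adapting for the Python 2 interpreter that Zope 2.7 runs on.

```python
import time


def time_query(search, query, repeats=3):
    """Return (best wall-clock seconds, hit count) for search(query).

    `search` is any callable that takes a query string and returns a
    sized collection of results.
    """
    best = None
    hits = 0
    for _ in range(repeats):
        start = time.perf_counter()
        results = search(query)
        hits = len(results)  # counting the hits is part of the timed work
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best, hits


def slowdown(slow_seconds, fast_seconds):
    """How many times longer the slow measurement took than the fast one."""
    return slow_seconds / fast_seconds


def fake_search(query):
    """Hypothetical stand-in for a real index query; returns 14,000 dummy hits."""
    return list(range(14000))


best, hits = time_query(fake_search, "podcast")
print("hits:", hits, "best time (s):", best)

# The two timings reported in the message, 31 s versus 0.1 s:
print("reported slowdown factor:", round(slowdown(31.0, 0.1)))
```

Taking the best of several runs reduces the influence of cold caches, which matters when the two indexes are measured one after the other on the same machine.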
--On 19 July 2005 17:15:25 -0700 Francis Kelly <zope@crubellier.com> wrote:
> I recently installed TextIndexNG2 2.1.1
which is *pretty old*. Take a look at v2.2.0, which has been optimized in several ways over time. Consider using StupidStorage, as documented in the release notes.
> I've been struck that if the number of search hits is high, TextIndexNG2 is much slower than ZCTextIndex. For example, if I do a search on 'podcast' (our site deals w/ podcasting) I get about 14,000 hits. ZCTextIndex returns the results in about 0.1 seconds; TextIndexNG2 takes 31 seconds or 300 times longer. In general, the more hits there are, the bigger the difference between the two search indexes.
Query speed depends on several things: the query itself, the implementation, and the operations that have to be performed during the query. Because of some of its functionality, TXNG needs to store much more information than ZCTextIndex. As mentioned above, older releases sometimes did this in a not-so-efficient way. You might also look at TextIndexNG V3.

-aj
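
The reply does not spell out what the extra stored information is. One plausible case, assumed here purely for illustration, is word positions of the kind a near search needs. The toy sketch below is not the internals of TextIndexNG or ZCTextIndex; all function names and the sample documents are hypothetical. It only shows why an index that keeps positions has more stored values to read per matching term than one that keeps document ids alone.

```python
from collections import defaultdict


def build_docid_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} so near searches are possible."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def entries_touched(posting):
    """Count the stored values a query must read for one term."""
    if isinstance(posting, dict):
        # Positional posting: one value per occurrence of the word.
        return sum(len(positions) for positions in posting.values())
    # Plain posting: one value per matching document.
    return len(posting)


# Hypothetical sample documents.
docs = {
    1: "podcast about podcast tools",
    2: "a podcast directory",
    3: "weather report",
}

plain = build_docid_index(docs)
positional = build_positional_index(docs)

print("doc-id index reads:", entries_touched(plain["podcast"]))
print("positional index reads:", entries_touched(positional["podcast"]))
```

With a term that matches thousands of documents, each containing the word several times, the gap between the two counts grows with the number of hits, which is consistent with the pattern described in the original question.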
participants (2)
- Andreas Jung
- Francis Kelly