[Zope] Knowledge Base type of product
Chris McDonough
chrism@zope.com
06 Jul 2002 11:57:44 -0400
Hi,
You should try Zope 2.6, which has ZCTextIndex, a much-improved text
index. Sounds like it would work very well for this application.
From the ZCTextIndex readme:
- A new query language, supporting both explicit and implicit Boolean
operators, parentheses, globbing, and phrase searching. Apart from
explicit operators and globbing, the syntax is roughly the same as
that popularized by Google.
- A more refined scoring algorithm, resulting in better selectiveness:
it's much more likely that you'll find the document you are looking
for among the first few highest-ranked results.
- Actually, ZCTextIndex gives you a choice of two scoring algorithms
from recent literature: the Cosine ranking from the Managing
Gigabytes book, and Okapi from more recent research papers. Okapi
usually does better, so it is the default (but your mileage may
vary).
- A redesigned Lexicon, using a pipeline architecture to split the
input text into words. This makes it possible to mix and match
pipeline components, e.g. you can choose between an HTML-aware
splitter and a plain text splitter, and additional components can be
added to the pipeline for case folding, stopword removal, and other
features. Enough example pipeline components are provided to get
you started, and it is very easy to write new components.
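
To make the pipeline idea concrete, here is a minimal standalone sketch in
plain Python. It does not import ZCTextIndex; the class names, the
process() method, and the run_pipeline() helper are all assumptions made
up for illustration and may not match the real components. Each stage
takes a list of strings and returns a list of strings, so stages can be
swapped freely.

```python
import re


class PlainTextSplitter:
    """Hypothetical splitter: breaks each input string into word tokens."""

    _word = re.compile(r"\w+")

    def process(self, items):
        words = []
        for text in items:
            words.extend(self._word.findall(text))
        return words


class HTMLAwareSplitter(PlainTextSplitter):
    """Hypothetical splitter that drops anything that looks like a tag first."""

    _tag = re.compile(r"<[^>]*>")

    def process(self, items):
        stripped = [self._tag.sub(" ", text) for text in items]
        return super().process(stripped)


class CaseFolder:
    """Hypothetical case-folding stage: lowercases every word."""

    def process(self, items):
        return [word.lower() for word in items]


class StopWordRemover:
    """Hypothetical stopword stage with a tiny, made-up stopword list."""

    def __init__(self, stopwords=("a", "and", "of", "the")):
        self._stop = frozenset(stopwords)

    def process(self, items):
        return [word for word in items if word not in self._stop]


def run_pipeline(text, *components):
    """Feed the text through each component in order."""
    items = [text]
    for component in components:
        items = component.process(items)
    return items


words = run_pipeline(
    "<p>The Correlation Matrix of the returns</p>",
    HTMLAwareSplitter(),
    CaseFolder(),
    StopWordRemover(),
)
print(words)
```

Swapping HTMLAwareSplitter() for PlainTextSplitter() in the call above is
the "mix and match" step: the later stages do not need to change.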
Performance is roughly the same as for TextIndex, and we're expecting
to make tweaks to the code that will make it faster.
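
For readers curious about what Okapi-style ranking looks like, here is a
rough sketch of the textbook Okapi BM25 formula in plain Python. It is not
taken from the ZCTextIndex source, so the function name bm25_scores, the
k1 and b defaults, and the exact idf variant are assumptions; the real
index may compute its scores differently.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms (textbook BM25).

    docs is a list of word lists. k1 and b are common textbook defaults,
    assumed here for illustration only.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(
                1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Longer-than-average documents are penalized via b.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


docs = [
    ["correlation", "matrix", "of", "returns"],
    ["matrix", "algebra", "notes"],
    ["gardening", "tips"],
]
scores = bm25_scores(["correlation", "matrix"], docs)
print(scores)
```

In this toy corpus the first document matches both query words and should
rank highest, the second matches only one, and the third matches none and
scores zero.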
(Try it out on this maillist archive: http://saints.homeunix.com:8080/)
On Sat, 2002-07-06 at 10:44, Hung Jung Lu wrote:
> Hi,
>
> I am looking for some product that can store a
> "knowledge database". Open-source or commercial (the
> cheaper, the better), Zope or otherwise.
>
> I simply need to store text files, and make them
> searchable. I know that ZCatalog can kind of do the
> job, I used it a few years ago, but back then the
> search features were kind of limited (for instance,
> two-word search was hard to implement, like when
> searching for "correlation matrix": you don't want
> files that contain "correlation" and/or "matrix", you
> want files that contain the two words consecutively.
> Also, back then, ZCatalog did not have "and", "or"
> logical operators.) I don't know whether it's been
> improved recently. (I know search engine is no easy
> matter.)
>
> Ideally the product should allow some sort of failure
> report (when some user searches for certain keywords
> and can't find anything), and also some basic
> statistics, so that a human editor could improve the
> hit scores, say, once a day or once a week. Anyway, I
> am looking for something that is not 100% automated:
> it would be great if some human editor assistance can
> be incorporated to make the knowledge database's
> output more reasonable.
>
> I'd appreciate any pointers.
>
> regards,
>
> Hung Jung