Knowledge Base type of product
Hi,

I am looking for some product that can store a "knowledge database". Open-source or commercial (the cheaper, the better), Zope or otherwise.

I simply need to store text files and make them searchable. I know that ZCatalog can kind of do the job; I used it a few years ago, but back then its search features were rather limited. For instance, two-word search was hard to implement: when searching for "correlation matrix", you don't want files that contain "correlation" and/or "matrix", you want files that contain the two words consecutively. Also, back then ZCatalog did not have "and"/"or" logical operators. I don't know whether it has been improved recently. (I know a search engine is no easy matter.)

Ideally the product should provide some sort of failure report (when a user looks up certain keywords and finds nothing), and also some basic statistics, so that a human editor could improve the hit scores, say, once a day or once a week. Anyway, I am looking for something that is not 100% automated: it would be great if some human editorial assistance could be incorporated to make the knowledge database's output more reasonable.

I'd appreciate any pointers.

regards,

Hung Jung

__________________________________________________
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com
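The two search needs described in this post, matching a phrase rather than a plain AND of its words, and logging queries that return nothing so an editor can review them, can be sketched with a toy positional index in Python. Everything below (`PhraseIndex`, `all_words`, `phrase`, `search`, `failed_queries`) is a hypothetical illustration, not ZCatalog's or ZCTextIndex's API. The idea: store each word's positions per document, so a phrase matches only where its words sit at consecutive positions.

```python
import re
from collections import Counter, defaultdict


class PhraseIndex:
    """Toy positional inverted index with a zero-hit query log.

    Hypothetical illustration only; not a ZCatalog or Zope API.
    """

    def __init__(self):
        # word -> {doc_id: [positions of the word in that document]}
        self.postings = defaultdict(lambda: defaultdict(list))
        self.query_counts = Counter()    # every query, for basic statistics
        self.failed_queries = Counter()  # queries that found nothing

    @staticmethod
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        for position, word in enumerate(self.tokenize(text)):
            self.postings[word][doc_id].append(position)

    def all_words(self, words):
        """Boolean AND: documents containing every word, anywhere."""
        docs = None
        for word in words:
            found = set(self.postings.get(word, {}))
            docs = found if docs is None else docs & found
        return docs or set()

    def phrase(self, text):
        """Documents containing the words consecutively, in order."""
        words = self.tokenize(text)
        hits = set()
        for doc in self.all_words(words):
            positions = [set(self.postings[word][doc]) for word in words]
            # Match if some start position s has word i at position s + i.
            if any(all(start + i in positions[i] for i in range(1, len(words)))
                   for start in positions[0]):
                hits.add(doc)
        return hits

    def search(self, text):
        """Phrase search that also records statistics for a human editor."""
        key = " ".join(self.tokenize(text))
        hits = self.phrase(text)
        self.query_counts[key] += 1
        if not hits:
            self.failed_queries[key] += 1
        return hits


idx = PhraseIndex()
idx.add("a", "The correlation matrix is symmetric.")
idx.add("b", "A matrix of values; correlation is computed later.")
print(sorted(idx.all_words(["correlation", "matrix"])))  # ['a', 'b']
print(sorted(idx.search("correlation matrix")))           # ['a']
idx.search("eigenvalue decomposition")
print(idx.failed_queries.most_common())  # [('eigenvalue decomposition', 1)]
```

A human editor could periodically read `failed_queries.most_common()` to see which searches came up empty, and `query_counts` to see what people look for most, which is roughly the failure report and statistics asked for here.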
Hi,

You should try Zope 2.6, which has ZCTextIndex, a much-improved text index. Sounds like it would work very well for this application.
From the ZCTextIndex readme:
- A new query language, supporting both explicit and implicit Boolean operators, parentheses, globbing, and phrase searching. Apart from explicit operators and globbing, the syntax is roughly the same as that popularized by Google.
- A more refined scoring algorithm, resulting in better selectiveness: it's much more likely that you'll find the document you are looking for among the first few highest-ranked results.
- Actually, ZCTextIndex gives you a choice of two scoring algorithms from recent literature: the Cosine ranking from the Managing Gigabytes book, and Okapi from more recent research papers. Okapi usually does better, so it is the default (but your mileage may vary).
- A redesigned Lexicon, using a pipeline architecture to split the input text into words. This makes it possible to mix and match pipeline components, e.g. you can choose between an HTML-aware splitter and a plain text splitter, and additional components can be added to the pipeline for case folding, stopword removal, and other features. Enough example pipeline components are provided to get you started, and it is very easy to write new components.

Performance is roughly the same as for TextIndex, and we're expecting to make tweaks to the code that will make it faster.

(Try it out on this maillist archive: http://saints.homeunix.com:8080/)

(In reply to Hung Jung Lu's message of Sat, 2002-07-06 at 10:44.)
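The readme's Lexicon pipeline and Okapi ranking can be made concrete with a small standalone sketch. The components here are plain Python functions that each take and return a list of strings; they are loosely modeled on the readme's description and are not ZCTextIndex's actual classes or interfaces. The scoring is textbook Okapi BM25 with the common defaults `k1=1.2` and `b=0.75` and a non-negative idf variant; those constants and formula details are assumptions and may differ from what ZCTextIndex's Okapi index actually uses.

```python
import math
import re
from collections import Counter


# Lexicon pipeline: every component takes a list of strings and returns a
# new list, so components can be mixed and matched in any order.
def split_plain(texts):
    """Plain-text splitter: break each string into word tokens."""
    return [word for text in texts for word in re.findall(r"\w+", text)]


def split_html(texts):
    """Crude HTML-aware splitter: blank out tags, then split as plain text."""
    return split_plain([re.sub(r"<[^>]*>", " ", text) for text in texts])


def case_fold(words):
    """Case folding so 'Matrix' and 'matrix' index as the same word."""
    return [word.lower() for word in words]


# Illustrative stopword list only; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "is", "of", "or", "the"}


def remove_stopwords(words):
    """Drop very common words that carry little meaning for ranking."""
    return [word for word in words if word not in STOPWORDS]


def run_pipeline(text, pipeline):
    """Feed one document (or query) through the components in order."""
    words = [text]
    for component in pipeline:
        words = component(words)
    return words


PIPELINE = [split_html, case_fold, remove_stopwords]


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Rank docs (id -> list of words) against query words with Okapi BM25.

    Uses the non-negative idf variant log(1 + (N - n + 0.5) / (n + 0.5)).
    Returns (doc_id, score) pairs, best match first.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(words) for words in docs.values()) / n_docs or 1.0
    doc_freq = Counter(term for words in docs.values() for term in set(words))
    scores = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(words) / avg_len)
        score = 0.0
        for term in query:
            if tf[term]:
                n = doc_freq[term]
                idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
                score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


raw = {
    "faq1": "<p>The correlation matrix of the returns</p>",
    "faq2": "<p>Matrix algebra basics</p>",
    "faq3": "<p>Correlation is not causation</p>",
}
docs = {doc_id: run_pipeline(text, PIPELINE) for doc_id, text in raw.items()}
query = run_pipeline("Correlation matrix", PIPELINE)
# faq1 ranks first because it contains both query words.
for doc_id, score in bm25_rank(docs, query):
    print(doc_id, round(score, 3))
```

Because each component has the same list-in, list-out shape, swapping `split_html` for `split_plain`, or dropping `remove_stopwords`, is just a change to the `PIPELINE` list, which is the mix-and-match property the readme describes.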
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
participants (3)
- Chris McDonough
- Dieter Maurer
- Hung Jung Lu