[Zope-CVS] CVS: Products/ZCTextIndex/help - Lexicon_Add.stx:1.1 ZCTextIndex_Add.stx:1.1
Casey Duncan
casey@zope.com
Tue, 4 Jun 2002 15:56:33 -0400
Update of /cvs-repository/Products/ZCTextIndex/help
In directory cvs.zope.org:/tmp/cvs-serv24572/help
Added Files:
Lexicon_Add.stx ZCTextIndex_Add.stx
Log Message:
Added online help for ZCTextIndex and Lexicon add forms
=== Added File Products/ZCTextIndex/help/Lexicon_Add.stx ===
ZCTextIndex Lexicon - Add: Create a new ZCTextIndex Lexicon
Description
This view allows you to create a new ZCTextIndex Lexicon object.
ZCTextIndex Lexicons store the words indexed by ZCTextIndexes in a
ZCatalog.
Controls
'Id' -- Allows you to specify the id of the ZCTextIndex Lexicon.
'Title' -- Allows you to specify the title of the ZCTextIndex Lexicon.
Pipeline Stages
The remaining controls let you choose how text is processed before
it is indexed by selecting pipeline stages.
The stages available by default are:
- **Word Splitter** This is the only mandatory stage. The word
splitter breaks the text up into a list of words. Two splitters
are included: a simple whitespace splitter and a splitter that
removes HTML tags. The HTML-aware splitter gives the best results
when all of the incoming content to index is HTML.
- **Stop Words** To conserve space in the vocabulary, and possibly increase
performance, you can select a stop word remover, which removes
very common or single-letter words from the Lexicon. Bear in
mind that you will not be able to search for removed stop words,
and that they will also be removed from queries passed to
ZCTextIndexes that use the Lexicon.
- **Case Normalizer** The case normalizer removes case information from the words in
the Lexicon. If case-sensitive searching is desired, then omit
this element from the pipeline.
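The way the stages chain together can be illustrated with a minimal
Python sketch. The class and function names below are hypothetical and
are not taken from the ZCTextIndex source; the stop word list is a
placeholder. The sketch assumes that each stage exposes a 'process'
method that takes a list and returns a list, and that case normalization
runs before stop word removal so that capitalized stop words are caught.

```python
import re


class WhitespaceSplitter:
    """Hypothetical mandatory first stage: split raw text on whitespace."""

    def process(self, texts):
        words = []
        for text in texts:
            words.extend(text.split())
        return words


class HTMLSplitter:
    """Hypothetical splitter that drops HTML tags before splitting."""

    _tag = re.compile(r"<[^>]*>")

    def process(self, texts):
        words = []
        for text in texts:
            # Replace each tag with a space so adjacent words stay separate.
            words.extend(self._tag.sub(" ", text).split())
        return words


class CaseNormalizer:
    """Hypothetical stage: lowercase every word."""

    def process(self, words):
        return [word.lower() for word in words]


class StopWordRemover:
    """Hypothetical stage: drop very common and single-letter words."""

    def __init__(self, stop_words=("a", "and", "the", "of", "is")):
        # Placeholder list, assumed for illustration only.
        self.stop_words = frozenset(stop_words)

    def process(self, words):
        return [w for w in words if len(w) > 1 and w not in self.stop_words]


def run_pipeline(text, stages):
    """Feed the text through each stage in order and return the words."""
    items = [text]
    for stage in stages:
        items = stage.process(items)
    return items
```

Under these assumptions, the same function would be applied to both the
indexed text and the query string, which is why removed stop words
cannot be searched for. Leaving 'CaseNormalizer' out of the stage list
keeps "Zope" and "zope" as distinct words.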
=== Added File Products/ZCTextIndex/help/ZCTextIndex_Add.stx ===
ZCTextIndex Add: Create a new ZCTextIndex
Description
A ZCTextIndex is an index for performing full text searches over
bodies of text. It includes the following features:
- Boolean query operators with parenthetical grouping
- Globbing (partial word) and phrase matching
- Two selectable relevance scoring algorithms
ZCTextIndex is designed as a replacement for the standard TextIndex
and has several advantages over it.
Controls
'Id' -- The id of the ZCTextIndex. It must be unique within this ZCatalog.
'Field Name' -- The name of the field (object attribute) to be indexed.
'Ranking Strategy'
- **Okapi BM25 Rank** A relevance scoring technique that seems to
work well when the document text is considerably longer than the
query string, which is often the case with user-specified query
strings.
- **Cosine Measure** A relevance scoring technique derived from the
"*Managing Gigabytes*":http://www.cs.mu.oz.au/mg/ book. It seems
to work best when the queries are similar in size and content to
the text they are searching.
'Lexicon' -- The ZCTextIndex Lexicon to be used by this ZCTextIndex.
Lexicons process and store the words from the text and
help in processing queries. You must define a ZCTextIndex
Lexicon before you can create a ZCTextIndex. Several
ZCTextIndexes can share the same Lexicon if desired.
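To give a feel for the two 'Ranking Strategy' options, here is a rough
Python sketch of both scoring ideas. It uses a textbook form of Okapi
BM25 and a simplified cosine measure over raw term counts; neither is
claimed to match the exact formulas or weights used inside ZCTextIndex.
The function names and the parameter defaults 'k1=1.2' and 'b=0.75' are
assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of words) against the query terms.

    Textbook-style BM25 sketch; k1 and b are assumed defaults.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Number of documents in which each word appears at least once.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            # Rare words weigh more than common ones.
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def cosine_score(query_terms, doc_terms):
    """Cosine of the angle between raw term-count vectors (simplified)."""
    q = Counter(query_terms)
    d = Counter(doc_terms)
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0
```

In this sketch the cosine score is highest when the query and the
document have similar word distributions, while BM25 rewards documents
that mention the query words often relative to their length.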