[Checkins] SVN: zope2book/trunk/ Restify SearchingZCatalog chapter.
Tres Seaver
tseaver at palladion.com
Tue Feb 10 18:26:53 EST 2009
Log message for revision 96431:
Restify SearchingZCatalog chapter.
Changed:
D zope2book/trunk/SearchingZCatalog.stx
A zope2book/trunk/source/SearchingZCatalog.rst
U zope2book/trunk/source/index.rst
-=-
Deleted: zope2book/trunk/SearchingZCatalog.stx
===================================================================
--- zope2book/trunk/SearchingZCatalog.stx 2009-02-10 23:18:36 UTC (rev 96430)
+++ zope2book/trunk/SearchingZCatalog.stx 2009-02-10 23:26:53 UTC (rev 96431)
@@ -1,2189 +0,0 @@
-Searching and Categorizing Content
-
- The ZCatalog is Zope's built-in search engine. It allows you to
- categorize and search all kinds of Zope objects. You can also use it
- to search external data such as relational data, files, and remote
- web pages. In addition to searching you can use the ZCatalog to
- organize collections of objects.
-
- The ZCatalog supports a rich query interface. You can perform full text
- searching, can search multiple indexes at once, and can even specify
- weighting for different fields in your results. In addition, the
- ZCatalog keeps track of meta-data about indexed objects.
-
- The two most common ZCatalog usage patterns are:
-
- Mass Cataloging -- Cataloging a large collection of objects all at once.
-
- Automatic Cataloging -- Cataloging objects as they are created and
- tracking changes made to them.
-
- Getting started with Mass Cataloging
-
- Let's take a look at how to use the ZCatalog to search documents.
- Cataloging a bunch of objects all at once is called *mass cataloging*.
- Mass cataloging involves four steps:
-
- o Creating a ZCatalog
-
- o Creating indexes
-
- o Finding objects and cataloging them
-
- o Creating a web interface to search the ZCatalog.
-
- Creating a ZCatalog
-
- Choose *ZCatalog* from the product add list to create a ZCatalog
- object within a subfolder named 'Zoo'. This takes you to the
- ZCatalog add form, as shown in the figure below.
-
- "ZCatalog add form":img:16-1:Figures/creatingzcatalog.png
-
- The Add form asks you for an *Id* and a *Title*. Give your
- ZCatalog the Id 'AnimalCatalog' and click *Add* to create your new
- ZCatalog. The ZCatalog icon looks like a folder with a small
- magnifying glass on it. Select the *AnimalCatalog* icon to see
- the *Contents* view of the ZCatalog.
-
- A ZCatalog looks a lot like a folder, but it has a few more
- tabs. Six tabs on the ZCatalog are the same six tabs you
- find on a standard folder. A ZCatalog has the following views:
- *Contents*, *Catalog*, *Properties*, *Indexes*, *Metadata*,
- *Find Objects*, *Advanced*, *Undo*, *Security*, and *Ownership*.
- When you click on a ZCatalog, you are on the *Contents*
- view. Here, you can add new objects and the ZCatalog will
- contain them just as any folder does. Although a ZCatalog is
- like a normal Zope folder, this does not imply that the objects
- contained within it are automatically searchable. A ZCatalog
- can catalog objects at any level of your site, and it needs to
- be told exactly which ones to index.
-
- Creating Indexes
-
- In order to tell Zope what to catalog and where to store the
- information, we need to create a *Lexicon* and an *Index*. A
- *Lexicon* is necessary to provide word storage services for
- full-text searching, and an *Index* is the object which stores
- the data necessary to perform fast searching.
-
- In the contents view of the *AnimalCatalog* ZCatalog, choose
- *ZCTextIndex Lexicon*, and give it an id of *zooLexicon*.
-
- "ZCTextIndex Lexicon add form":img:16-2:Figures/creatinglexicon.png
-
- Now we can create an index that will record the information we
- want to have in the ZCatalog. Click on the *Indexes* tab of the
- ZCatalog. A drop down menu lists the available indexes. Choose
- *ZCTextIndex*; in the add form fill in the id *zooTextIdx*.
- Fill in *PrincipiaSearchSource* in the "Field name" input. This
- tells the ZCTextIndex to index the body text of the DTML
- Documents (*PrincipiaSearchSource* is an API method of all DTML
- Document and Method objects). Note that *zooLexicon* is
- preselected in the *Lexicon* menu.
-
- "ZCTextIndex add form":img:16-3:Figures/creatingtextindex.png
-
- % Anonymous User - Feb. 10, 2004 12:43 pm:
- When you want the textindex to work on other types of objects, they have to provide a method named
- "PrincipiaSearchSource" which returns the data of the object which has to be searched.
-
- To keep this example short we will skip over some of the options
- presented here. In the section on indexes below, we will
- discuss this more thoroughly.
-
- Additionally, we will have to tell the ZCatalog which attributes
- of each cataloged object it should store directly. These
- attributes are called *Metadata*; however, they should not be
- confused with the idea of metadata in Zope CMF, Plone, or other
- content management systems--here, this just means that these are
- attributes that will be stored directly in the catalog for
- performance benefits. For now, just go to the
- *Metadata* tab of the ZCatalog and add *id* and *title*.
-
- Finding and Cataloging Objects
-
- Now that you have created a ZCatalog and an Index, you can move
- on to the next step: finding objects and cataloging them.
- Suppose you have a zoo site with information about animals. To
- work with these examples, create two DTML Documents alongside
- the *AnimalCatalog* object (within the same folder that contains
- the *AnimalCatalog* ZCatalog) that contain information about
- reptiles and amphibians.
-
- The first should have an Id of "chilean_frog", a title "Chilean
- four-eyed frog" and its body text should read something like
- this::
-
- The Chilean four-eyed frog has a bright
- pair of spots on its rump that look like enormous eyes. When
- seated, the frog's thighs conceal these eyespots. When
- predators approach, the frog lowers its head and lifts its
- rump, creating a much larger and more intimidating head.
- Frogs are amphibians.
-
- For the second, fill in an id of "carpet_python" and a title of
- "Carpet Python"; its body text could be::
-
- *Morelia spilotes variegata* averages 2.4 meters in length. It
- is a medium-sized python with black-to-gray patterns of
- blotches, crossbands, stripes, or a combination of these
- markings on a light yellowish-to-dark brown background. Snakes
- are reptiles.
-
- Visitors to your Zoo want to be able to search for information on
- the Zoo's animals. Eager herpetologists want to know if you have
- their favorite snake, so you should provide them with the ability
- to search for certain words and show all the documents that
- contain those words. Searching is one of the most useful and
- common web activities.
-
- The *AnimalCatalog* ZCatalog you created can catalog all of the
- documents in your Zope site and let your users search for specific
- words. To catalog your documents, go to the *AnimalCatalog*
- ZCatalog and click on the *Find Objects* tab.
-
- In this view, you tell the ZCatalog what kind of objects you are
- interested in. You want to catalog all DTML Documents so select
- *DTML Document* from the *Find objects of type* multiple selection
- and click *Find and Catalog*.
-
- The ZCatalog will now start from the folder where it is located
- and search for all DTML Documents. It will search the folder and
- then descend down into all of the sub-folders and their
- sub-folders. For example, if your ZCatalog is located at
- '/Zoo/AnimalCatalog', then the '/Zoo' folder and all its
- subfolders will get searched.
-
- If you have lots and lots of objects, this may take a long time
- to complete, so be patient.
-
- After a period of time, the ZCatalog will take you to the *Catalog*
- view automatically, with a status message telling you what it just
- did.
-
- Below the status information is a list of objects that are
- cataloged; they are all DTML Documents. To confirm that these are
- the objects you are interested in, you can click on them to visit
- them. Viewing an object in the catalog shows you what was indexed
- for the object, and what metadata items are stored for it.
-
- You have completed the third step of mass cataloging:
- cataloging your objects into a ZCatalog. Your documents are now in
- the ZCatalog's database, and you can move on to the fourth step,
- creating a web page and result form to query the ZCatalog.
-
- Search and Report Forms
-
- To create search and report forms, make sure you are inside the
- *AnimalCatalog* ZCatalog and select *Z Search Interface* from the
- add list. Select the *AnimalCatalog* ZCatalog as the searchable
- object, as shown in the figure below.
-
- "Creating a search form for a ZCatalog":img:16-4:Figures/creatingsearchinterface.png
-
- Name the *Report Id* "SearchResults", the *Search Input Id*
- "SearchForm", select "Generate Page Templates" and click *Add*.
- This will create two new Page Templates in the *AnimalCatalog*
- ZCatalog named *SearchForm* and *SearchResults*.
-
- These objects are *contained in* the ZCatalog, but they are not
- *cataloged by* the ZCatalog. The *AnimalCatalog* has only
- cataloged DTML Documents. The search Form and Report templates
- are just a user interface to search the animal documents in the
- ZCatalog. You can verify this by noting that the search and
- report forms are not listed in the *Catalog* tab.
-
- To search the *AnimalCatalog* ZCatalog, select the *SearchForm*
- template and click on its *Test* tab.
-
- By typing words into the *zooTextIdx* form element you can
- search all of the documents cataloged by the *AnimalCatalog*
- ZCatalog. For example, type in the word "Reptiles". The
- *AnimalCatalog* ZCatalog will be searched and return a simple
- table of objects that have the word "Reptiles" in them. The
- search results should include the carpet python. You can also
- try specifying multiple search terms like "reptiles OR
- amphibians". Search results for this query should include both
- the Chilean four-eyed Frog and the carpet python.
- Congratulations, you have successfully created a ZCatalog,
- cataloged content into it and searched it through the web.
-
- Configuring ZCatalogs
-
- The ZCatalog is capable of much more powerful and complex searches
- than the one you just performed. Let's take a look at how the
- ZCatalog stores information. This will help you tailor your
- ZCatalogs to provide the sort of searching you want.
-
- Defining Indexes
-
- ZCatalogs store information about objects and their contents in
- fast databases called *indexes*. Indexes can store and retrieve
- large volumes of information very quickly. You can create
- different kinds of indexes that remember different kinds of
- information about your objects. For example, you could have one
- index that remembers the text content of DTML Documents, and
- another index that remembers any objects that have a specific
- property.
-
- When you search a ZCatalog you are not searching through your
- objects one by one. That would take far too much time if you had
- a lot of objects. Before you search a ZCatalog, it looks at
- your objects and remembers whatever you tell it to remember
- about them. This process is called *indexing*. From then on,
- you can search for certain criteria and the ZCatalog will return
- objects that match the criteria you provide.
-
- A good way to think of an index in a ZCatalog is just like an
- index in a book. For example, in a book's index you can look up
- the word *Python*::
-
- Python: 23, 67, 227
-
- The word *Python* appears on three pages. Zope indexes work
- like this except that they map the search term, in this case the
- word *Python*, to a list of all the objects that contain it,
- instead of a list of pages in a book.
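-
- The book-index analogy can be sketched in a few lines of plain
- Python. This is only a toy model for illustration, not the
- ZCatalog's actual implementation; the 'build_index' function and
- the shortened sample texts are made up for this sketch.
-
```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip trailing punctuation so "reptiles." is stored as "reptiles".
            index[word.strip(".,;:!?")].add(doc_id)
    return index


# Shortened stand-ins for the two example documents.
documents = {
    "carpet_python": "Snakes are reptiles.",
    "chilean_frog": "Frogs are amphibians.",
}

index = build_index(documents)

# A lookup is a dictionary access rather than a scan over every document.
print(sorted(index["reptiles"]))
print(sorted(index["are"]))
```
-
- Looking up a word costs one dictionary access no matter how many
- documents were indexed, which is why the cataloging work is done
- up front.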
-
- In Zope 2.6, indexes can be added and removed from a ZCatalog
- using the "pluggable" index interface as shown in the figure below:
-
- "Managing indexes":img:16-5:Figures/managingindexes.png
-
- Each index has a name, like *PrincipiaSearchSource*,
- and a type, like *ZCTextIndex*.
-
- When you catalog an object the ZCatalog uses each index to
- examine the object. The ZCatalog consults attributes and methods
- to find an object's value for each index. For example, in the
- case of the DTML Documents cataloged with a
- 'PrincipiaSearchSource' index, the ZCatalog calls each document's
- 'PrincipiaSearchSource' method and records the results in its
- 'PrincipiaSearchSource' index. If the ZCatalog cannot find an
- attribute or method for an index, then it ignores it. In other
- words it's fine if an object does not support a given
- index. There are eight kinds of indexes that come standard with
- Zope 2.6, and others that can be added. The standard eight are:
-
- ZCTextIndex -- Searches text. Use this kind of index when you
- want a full-text search.
-
- FieldIndex -- Searches objects for specific values. Use this
- kind of index when you want to search objects, numbers, or
- specific strings.
-
- KeywordIndex -- Searches collections of specific values. This
- index is like a FieldIndex, but it allows you to search
- collections rather than single values.
-
- PathIndex -- Searches for all objects that contain certain URL
- path elements. For example, you could search for all the
- objects whose paths begin with '/Zoo/Animals'.
-
- TopicIndex -- Searches among FilteredSets; each set contains
- the document IDs of documents which match the set's filter
- expression. Use this kind of index to optimize
- frequently-accessed searches.
-
- DateIndex -- A subclass of FieldIndex, optimized for date-time
- values. Use this index for any field known to be a date or a
- date-time.
-
- DateRangeIndex -- Searches objects based on a pair of dates /
- date-times. Use this index to search for objects which are
- "current" or "in effect" at a given time.
-
- TextIndex -- Old version of a full-text index. Only provided
- for backward compatibility; use ZCTextIndex instead.
-
- We'll examine these different indexes more closely later in the
- chapter. New indexes can be created from the *Indexes* view of a
- ZCatalog. There, you can enter the *name* and select a *type*
- for your new index. This creates a new empty index in the
- ZCatalog. To populate this index with information, you need to
- go to the *Advanced* view and click the *Update Catalog*
- button. Recataloging your content may take a while if you have
- lots of cataloged objects. For a ZCTextIndex, you will also
- need a *ZCTextIndex Lexicon* object in your ZCatalog - see below
- for details.
-
- To remove an index from a ZCatalog, select the index and click
- on the *Delete* button. This will delete the index and all of
- its indexed content. As usual, this operation can be undone.
-
- Defining Meta Data
-
- The ZCatalog can not only index information about your object,
- but it can also store information about your object in a
- *tabular database* called the *Metadata Table*. The *Metadata
- Table* works similarly to a relational database table: it
- consists of one or more *columns* that define the *schema* of
- the table. The table is filled with *rows* of information about
- cataloged objects. These rows can contain information about
- cataloged objects that you want to store in the table. Your
- metadata columns don't need to match your ZCatalog's indexes. Indexes
- allow you to search; meta-data allows you to report search
- results.
-
- The Metadata Table is useful for generating search reports. It
- keeps track of information about objects that goes on your
- report forms. For example, if you create a Metadata Table
- column called *Title*, then your report forms can use this
- information to show the titles of your objects that are returned
- in search results instead of requiring that you actually obtain
- the object to show its title.
-
- To add a new Metadata Table column, type in the name of the column
- on the *Metadata Table* view and click *Add*. To remove a column
- from the Metadata Table, select the column check box and click on
- the *Delete* button. This will delete the column and all of its
- content for each row. As usual, this operation can be undone. Next
- let's look more closely at how to search a ZCatalog.
-
- While metadata columns are useful, there are performance tradeoffs
- from using too many. As more metadata columns are added, the
- catalog itself becomes larger (and slower), and getting the
- result objects becomes more memory- and performance-intensive.
- Therefore, you should choose metadata columns only for those
- fields that you'll want to show on common search results.
- Consider carefully before adding a field that returns a large
- result (like the full text of a document) to metadata.
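-
- The idea of copying a few attributes into a table at cataloging
- time can be sketched as follows. This is a simplified model, not
- the ZCatalog's real storage; 'Document' and 'ToyMetadataTable' are
- hypothetical names used only here.
-
```python
from collections import namedtuple


class Document:
    """Stand-in for a cataloged object (hypothetical, not a Zope class)."""

    def __init__(self, id, title, body):
        self.id = id
        self.title = title
        self.body = body


class ToyMetadataTable:
    """Stores one row of copied attribute values per cataloged object."""

    def __init__(self, columns):
        # The column names define the schema of every returned record.
        self.Record = namedtuple("Record", columns)
        self.rows = {}

    def catalog_object(self, path, obj):
        # Values are copied at cataloging time; a missing attribute is
        # stored as None in this sketch.
        values = [getattr(obj, name, None) for name in self.Record._fields]
        self.rows[path] = self.Record(*values)

    def records(self):
        return [self.rows[path] for path in sorted(self.rows)]


table = ToyMetadataTable(["id", "title"])
table.catalog_object(
    "/Zoo/carpet_python",
    Document("carpet_python", "Carpet Python", "Snakes are reptiles."),
)
table.catalog_object(
    "/Zoo/chilean_frog",
    Document("chilean_frog", "Chilean four-eyed frog", "Frogs are amphibians."),
)

for record in table.records():
    # Only the metadata columns are available; the body was never copied.
    print(record.id, "-", record.title)
```
-
- Every extra column is copied for every cataloged object, which is
- the cost the paragraph above warns about.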
-
- Searching ZCatalogs
-
- You can search a ZCatalog by passing it search terms. These search
- terms describe what you are looking for in one or more indexes. The
- ZCatalog can glean this information from the web request, or you
- can pass this information explicitly from DTML or Python. In
- response to a search request, a ZCatalog will return a list of
- records corresponding to the cataloged objects that match the
- search terms.
-
- Searching with Forms
-
- In this chapter you used the *Z Search Interface* to
- automatically build a Form/Action pair to query a ZCatalog (the
- Form/Action pattern is discussed in the chapter entitled
- "Advanced Page Templates":AdvZPT.stx ). The *Z Search
- Interface* builds a very simple form and a very simple
- report. These two methods are a good place to start
- understanding how ZCatalogs are queried and how you can
- customize and extend your search interface.
-
- Suppose you have a ZCatalog that holds news items named
- 'NewsCatalog'. Each news item has 'content', an 'author' and a
- 'date' attribute. Your ZCatalog has three indexes that
- correspond to these attributes, namely "contentTextIdx",
- "author" and "date". The contents index is a ZCTextIndex, and
- the author and date indexes are a FieldIndex and a DateIndex.
- For the ZCTextIndex you will need a ZCTextIndex Lexicon, and to
- display the search results in the 'Report' template, you should
- add the 'author', 'date' and 'absolute_url' attributes as
- Metadata. Here is a search form that would allow you to query
- such a ZCatalog::
-
- <html><body>
- <form action="Report" method="get">
- <h2 tal:content="template/title_or_id">Title</h2>
- Enter query parameters:<br><table>
- <tr><th>Author</th>
- <td><input name="author" width=30 value=""></td></tr>
- <tr><th>Content</th>
- <td><input name="contentTextIdx" width=30 value=""></td></tr>
- <tr><th>Date</th>
- <td><input name="date" width=30 value=""></td></tr>
- <tr><td colspan=2 align=center>
- <input type="SUBMIT" name="SUBMIT" value="Submit Query">
- </td></tr>
- </table>
- </form>
- </body></html>
-
- This form consists of three input boxes named 'contentTextIdx',
- 'author', and 'date'. These names must match the names of the
- ZCatalog's indexes for the ZCatalog to find the search terms.
- Here is a report form that works with the search form::
-
- <html>
- <body tal:define="searchResults here/NewsCatalog;">
- <table border>
- <tr>
- <th>Item no.</th>
- <th>Author</th>
- <th>Date</th>
- </tr>
- <div tal:repeat="item searchResults">
- <tr>
- <td>
- <a href="link to object" tal:attributes="href item/absolute_url">
- #<span tal:replace="repeat/item/number">
- search item number goes here
- </span>
- </a>
- </td>
- <td><span tal:replace="item/author">author goes here</span></td>
- <td><span tal:replace="item/date">date goes here</span></td>
- </tr>
- </div>
- </table>
- </body></html>
-
- There are a few things going on here which merit closer
- examination. The heart of the whole thing is in the definition
- of the 'searchResults' variable::
-
- <body tal:define="searchResults here/NewsCatalog;">
-
- This calls the 'NewsCatalog' ZCatalog. Notice how the form
- parameters from the search form ( 'contentTextIdx' ,
- 'author', 'date' ) are not mentioned here at all.
- Zope automatically makes sure that the query parameters from the
- search form are given to the ZCatalog. All you have to do is
- make sure the report form calls the ZCatalog. Zope locates the
- search terms in the web request and passes them to the ZCatalog.
-
- The ZCatalog returns a sequence of *Record Objects* (just like
- ZSQL Methods). These record objects correspond to *search
- hits*, which are objects that match the search criteria you
- typed in. For a record to match a search, it must match all
- criteria for each specified index. So if you enter an author and
- some search terms for the contents, the ZCatalog will only return
- records that match both the author and the contents.
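-
- The "match all criteria" rule amounts to intersecting the hits
- from each queried index. The sketch below models that with plain
- sets; the 'search' function, the author names and the record ids
- are invented for illustration and do not reflect ZCatalog
- internals.
-
```python
def search(indexes, query):
    """Return the ids that match every index named in the query."""
    result = None
    for index_name, term in query.items():
        hits = indexes[index_name].get(term, set())
        # Intersect with the hits collected so far.
        result = hits if result is None else result & hits
    # In this sketch an empty query simply matches nothing.
    return result if result is not None else set()


# Each toy index maps a search term to the ids of matching records.
indexes = {
    "author": {"alice": {1, 2}, "bob": {3}},
    "contentTextIdx": {"python": {2, 3}, "frog": {1}},
}

print(search(indexes, {"author": "alice"}))
print(search(indexes, {"author": "alice", "contentTextIdx": "python"}))
print(search(indexes, {"author": "bob", "contentTextIdx": "frog"}))
```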
-
- ZSQL Record objects have an attribute for every column in the
- database table. Record objects for ZCatalogs work very
- similarly, except that a ZCatalog Record object has an attribute
- for every column in the Metadata Table. In fact, the purpose of
- the Metadata Table is to define the schema for the Record
- objects that ZCatalog queries return.
-
- Searching from Python
-
- Page Templates make querying a ZCatalog from a form very simple.
- For the most part, Page Templates will automatically make sure
- your search parameters are passed properly to the ZCatalog.
-
- Sometimes though you may not want to search a ZCatalog from a web
- form; some other part of your application may want to query a
- ZCatalog. For example, suppose you want to add a sidebar to the
- Zope Zoo that shows news items that only relate to the animals
- in the section of the site that you are currently looking at.
- As you've seen, the Zope Zoo site is built up from Folders that
- organize all the sections according to animal. Each Folder's id
- is a name that specifies the group or animal the folder
- contains. Suppose you want your sidebar to show you all the
- news items that contain the id of the current section. Here is
- a Script called 'relevantSectionNews' that queries the news
- ZCatalog with the current folder's id::
-
- ## Script (Python) "relevantSectionNews"
- ##
- """ Returns news relevant to the current folder's id """
- id=context.getId()
- return context.NewsCatalog({'contentTextIdx' : id})
-
- This script queries the 'NewsCatalog' by calling it like a
- method. ZCatalogs expect a *mapping* as the first argument when
- they are called. The argument maps the name of an index to the
- search terms you are looking for. In this case, the
- 'contentTextIdx' index will be queried for all news items that
- contain the name of the current Folder. To use this in your
- sidebar, you could insert this snippet where appropriate in
- the main ZopeZoo Page Template::
-
- ...
- <ul>
- <li tal:repeat="item here/relevantSectionNews">
- <a href="news link" tal:attributes="href item/absolute_url">
- <span tal:replace="item/title">news title</span>
- </a>
- </li>
- </ul>
- ...
-
- This template assumes that you have defined 'absolute_url' and
- 'title' as Metadata columns in the 'NewsCatalog'. Now, when you
- are in a particular section, the sidebar will show a simple list
- of links to news items that contain the id of the current animal
- section you are viewing. (Note: in reality, you shouldn't use
- a metadata column called 'absolute_url', but should rely instead on
- the getURL() method described below, as that works even in virtual
- hosting settings.)
-
- Methods of Search Results
-
- The list of results you get for a catalog search is actually
- a list of Catalog Brain objects. In addition to having an
- attribute for each item of your metadata, they also have
- several useful methods:
-
- has_key(key)
- Returns true if the result object has a meta-data element
- named key.
-
- getPath()
- Returns the physical path of the result object. This can be
- used to uniquely identify each object if some kind of
- post-processing is performed.
-
- getURL()
- Returns the URL of the result object. You should use this
- instead of creating a metadata element for 'absolute_url'.
- This can differ from getPath() if you are using virtual hosting.
-
- getObject()
- Returns the actual Zope object from the result object. This
- is useful if you want to examine or show an attribute or
- method of the object that isn't in the metadata--once we have
- the actual object, we can get any normal attribute or method
- of it. However, be careful not to use this instead of defining
- metadata. Metadata, being stored in the catalog, is
- pre-calculated and quickly accessed; getting the same type of
- information by using 'getObject().attribute_name' requires
- actually pulling your real object from the ZODB and may be
- a good deal slower. On the other hand, stuffing everything
- you might ever need into metadata will slow down all querying
- of your catalog, so you'll want to strike a balance. A good
- idea is to list in metadata those things that would normally
- appear on a tabular search results form; other things that
- might be needed less commonly (and for fewer result objects
- at a time) can be retrieved with getObject.
-
- getRID()
- Returns the Catalog's record id for the result object. This
- is an implementation detail, and is not useful except for
- advanced uses.
-
- Searching and Indexing Details
-
- Earlier you saw that the ZCatalog includes eight types of
- indexes. Let's examine these indexes more closely, and look
- at some of the additional available indexes, to understand
- what they are good for and how to search them.
-
- Searching ZCTextIndexes
-
- A ZCTextIndex is used to index text. After indexing, you can
- search the index for objects that contain certain words.
- ZCTextIndexes support a rich search grammar for doing more
- advanced searches than just looking for a word.
-
- Boolean expressions
-
- Search for Boolean expressions
- like::
-
- word1 AND word2
-
- This will search for all objects that contain *both* "word1"
- and "word2". Valid Boolean operators include AND, OR, and
- NOT. A synonym for NOT is a leading hyphen::
-
- word1 -word2
-
- which would search for occurrences of "word1" but would
- exclude documents which contain "word2". A sequence of words
- without operators implies AND. A search for "carpet python
- snakes" translates to "carpet AND python AND snakes".
-
- Parentheses
-
- Control search order with parenthetical
- expressions::
-
- (word1 AND word2) OR word3
-
- This will return objects containing "word1" and "word2" *or*
- just objects that contain the term "word3".
-
- Wild cards
-
- Search for wild cards
- like::
-
- Z*
-
- which returns all words that begin with "Z",
- or::
-
- Zop?
-
- which returns all words that begin with "Zop" and have one
- more character - just like in a Un*x shell. Note though that
- wild cards cannot be at the beginning of a search phrase.
- "?ope" is an illegal search term and will be ignored.
-
- Phrase search
-
- Double-quoted text implies phrase search,
- for example::
-
- "carpet python" OR frogs
-
- will search for all occurrences of the phrase "carpet python"
- or of the word "frogs".
-
- All of these advanced features can be mixed together. For
- example::
-
- ((bob AND uncle) AND NOT Zoo*)
-
- will return all objects that contain the terms "bob" and "uncle"
- but will not include any objects that contain words that start
- with "Zoo" like "Zoologist", "Zoology", or "Zoo" itself.
-
- Similarly, a search
- for::
-
- snakes OR frogs -"carpet python"
-
- will return all objects which contain the word "snakes" or
- "frogs" but do not contain the phrase "carpet python".
-
- Querying a ZCTextIndex with these advanced features works just
- like querying it with the original simple features. In the HTML
- search form for DTML Documents, for example, you could enter
- "Koala AND Lion" and get all documents about Koalas and Lions.
- Querying a ZCTextIndex from Python with advanced features works
- much the same; suppose you want to change your
- 'relevantSectionNews' Script to not include any news items that
- contain the word "catastrophic"::
-
- ## Script (Python) "relevantSectionNews"
- ##
- """ Returns relevant, non-catastrophic news """
- id=context.getId()
- return context.NewsCatalog(
- {'contentTextIdx' : id + ' -catastrophic'}
- )
-
- ZCTextIndexes are very powerful. When mixed with the Automatic
- Cataloging pattern described later in the chapter, they give you
- the ability to automatically full-text search all of your
- objects as you create and edit them.
-
- In addition, below, we'll talk about TextIndexNG indexes, which
- are a competing index type that can be added to Zope and that offer
- even more features for full-text indexing.
-
- Lexicons
-
- Lexicons are used by ZCTextIndexes. Lexicons process and store
- the words from the text and help in processing queries.
-
- Lexicons can:
-
- Normalize Case -- Often you want search terms to be case
- insensitive, e.g. a search for "python", "Python" and "pYTHON"
- should return the same results. The lexicon's *Case
- Normalizer* does exactly that.
-
- Remove stop words -- Stop words are words that are very common
- in a given language and should be removed from the index. They
- would only cause bloat in the index and add little
- information. In addition, stop words, being common words,
- would appear in almost every page. Without this option turned
- on, a user searching for "the python house" would get back
- practically every single document on the site (since they would
- all likely contain "the"), taking longer and adding no quality
- to their results.
-
- Split text into words -- A splitter parses text into words.
- Different texts have different needs of word splitting - if you
- are going to process HTML documents, you might want to use the
- HTML aware splitter which effectively removes HTML tags. On
- the other hand, if you are going to index plain text documents
- *about* HTML, you don't want to remove HTML tags - people might
- want to look them up. Also, a document in Chinese, for example, has a
- different concept of words, so you might want to use a
- different splitter.
-
- The Lexicon uses a pipeline architecture. This makes it possible
- to mix and match pipeline components. For instance, you could
- implement a different splitting strategy for your language and
- use this pipeline element in conjunction with the standard text
- processing elements. Implementing a pipeline element is out of
- the scope of this book; for examples of implementing and
- registering a pipeline element see
- e.g. 'lib/python/Products/ZCTextIndex/Lexicon.py'. A pipeline
- element should conform to the 'IPipelineElement' interface.
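-
- The pipeline idea can be sketched in standalone Python. The
- classes below are toy stand-ins, not the ZCTextIndex classes, and
- the assumption that each element exposes a 'process' method taking
- and returning a list is made for this sketch; check
- 'IPipelineElement' for the real contract.
-
```python
import re


class Splitter:
    """Turns a list of text chunks into a flat list of words."""

    def process(self, texts):
        words = []
        for text in texts:
            words.extend(re.findall(r"\w+", text))
        return words


class CaseNormalizer:
    """Lowercases every word so that searches are case insensitive."""

    def process(self, words):
        return [word.lower() for word in words]


class StopWordRemover:
    """Drops very common words that add little information."""

    def __init__(self, stop_words):
        self.stop_words = frozenset(stop_words)

    def process(self, words):
        return [word for word in words if word not in self.stop_words]


def run_pipeline(pipeline, text):
    """Feed the text through each element in order."""
    items = [text]
    for element in pipeline:
        items = element.process(items)
    return items


# The stop-word list here is a tiny made-up sample.
pipeline = [Splitter(), CaseNormalizer(), StopWordRemover({"the", "a", "of"})]
print(run_pipeline(pipeline, "The Python House"))
```
-
- Because every element has the same shape, swapping in a different
- splitter only means replacing the first item in the list.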
-
- To create a ZCTextIndex, you first have to create a Lexicon
- object. Multiple ZCTextIndexes can share the same lexicon.
-
- Searching Field Indexes
-
- *FieldIndexes* have a different aim than ZCTextIndexes. A ZCTextIndex
- will treat the value it finds in your object, for example the
- contents of a News Item, like text. This means that it breaks
- the text up into words and indexes all the individual words.
-
- A FieldIndex does not break up the value it finds. Instead, it
- indexes the entire value it finds. This is very useful for
- tracking object attributes that contain simple values, such as
- numbers or short string identifiers.
-
- In the news item example, you created a FieldIndex
- 'author'. With the existing search form, this field is
- not very useful. Unless you know exactly the name of the author
- you are looking for, you will not get any results. It would be
- better to be able to select from a list of all the *unique*
- authors indexed by the author index.
-
- There is a special method on the ZCatalog that does exactly this
- called 'uniqueValuesFor'. The 'uniqueValuesFor' method returns
- a list of unique values for a certain index. Let's change your
- search form and replace the original 'author' input box
- with something a little more useful::
-
- <html><body>
- <form action="Report" method="get">
- <h2 tal:content="template/title_or_id">Title</h2>
- Enter query parameters:<br><table>
- <tr><th>Author</th>
- <td>
- <select name="author:list" size="6" multiple>
- <option
- tal:repeat="item python:here.NewsCatalog.uniqueValuesFor('author')"
- tal:content="item"
- tal:attributes="value item"
- value="opt value">
- </option>
- </select>
- </td></tr>
- <tr><th>Content</th>
- <td><input name="content_index" width=30 value=""></td></tr>
- <tr><th>Date</th>
- <td><input name="date_index" width=30 value=""></td></tr>
- <tr><td colspan=2 align=center>
- <input type="SUBMIT" name="SUBMIT" value="Submit Query">
- </td></tr>
- </table>
- </form>
- </body></html>
-
- The new, important bit of code added to the search form
- is::
-
- <select name="author:list" size="6" multiple>
- <option
- tal:repeat="item python:here.NewsCatalog.uniqueValuesFor('author')"
- tal:content="item"
- tal:attributes="value item"
- value="opt value">
- </option>
- </select>
-
- In this example, you are changing the form element 'author' from
- a simple text box to an HTML multiple select box. This box
- contains a unique list of all the authors that are indexed in
- the 'author' FieldIndex. When the form is submitted, the
- select box will contain the exact value of an author's name, and
- thus match against one or more of the news objects. Your search
- form should now look like the figure below.
-
- "Range searching and unique Authors":img:16-6:Figures/uniqueauthorsform.png
-
- Be careful if you catalog objects with many different values; you
- can easily end up with a form with a thousand items in the drop-down
- menu. Also, items must match *exactly*, so strings that differ
- in capitalization will be considered different.
-
- That's it. You can continue to extend this search form using HTML
- form elements to be as complex as you'd like. In the next section,
- we'll show you how to use the next kind of index, keyword indexes.
-
- Searching Keyword Indexes
-
- A *KeywordIndex* indexes a sequence of keywords for objects and
- can be queried for any objects that have one or more of those
- keywords.
-
- Suppose that you have a number of Image objects that have a
- 'keywords' property. The 'keywords' property is a lines property
- that lists the relevant keywords for a given Image, for example,
- "Portraits", "19th Century", and "Women" for a picture of Queen
- Victoria.
-
- The keywords provide a way of categorizing Images. Each Image can
- belong in one or more categories depending on its 'keywords'
- property. For example, the portrait of Queen Victoria belongs to
- three categories and can thus be found by searching for any of the
- three terms.
-
- You can use a *Keyword* index to search the 'keywords' property. Define
- a *Keyword* index with the name 'keywords' on your ZCatalog. Then
- catalog your Images. Now you should be able to find all the Images
- that are portraits by creating a search form and searching for
- "Portraits" in the 'keywords' field. You can also find all pictures
- that represent 19th Century subjects by searching for "19th
- Century".
-
- It's important to realize that the same Image can be in more
- than one category. This gives you much more flexibility in
- searching and categorizing your objects than you get with a
- FieldIndex. Using a FieldIndex, your portrait of Queen Victoria
- can only be categorized one way. Using a KeywordIndex, it can be
- categorized in several different ways.
-
- Often you will use a small list of terms with KeywordIndexes.
- In this case you may want to use the 'uniqueValuesFor' method to
- create a custom search form. For example here's a snippet of a
- Page Template that will create a multiple select box for all the
- values in the 'keywords' index::
-
- <select name="keywords:list" multiple>
- <option
- tal:repeat="item python:here.uniqueValuesFor('keywords')"
- tal:content="item">
- opt value goes here
- </option>
- </select>
-
- Using this search form you can provide users with a range of
- valid search terms. You can select as many keywords as you want and
- Zope will find all the Images that match one or more of your
- selected keywords. Not only can each object have several indexed
- terms, but you can provide several search terms and find all
- objects that have one or more of those values.
-
- Searching Path Indexes
-
- Path indexes allow you to search for objects based on their
- location in Zope. Suppose you have an object whose path is
- '/zoo/animals/Africa/tiger.doc'. You can find this object with
- the path queries: '/zoo', or '/zoo/animals', or
- '/zoo/animals/Africa'. In other words, a path index allows you
- to find objects within a given folder (and below).
-
- If you place related objects within the same folders, you can
- use path indexes to quickly locate these objects. For example::
-
- <h2>Lizard Pictures</h2>
- <p tal:repeat="item
- python:here.AnimalCatalog(pathindex='/Zoo/Lizards',
- meta_type='Image')">
- <a href="url" tal:attributes="href item/getURL" tal:content="item/title">
- document title
- </a>
- </p>
-
- This query searches a ZCatalog for all images that are located
- within the '/Zoo/Lizards' folder and below. It creates a link to
- each image. To make this work, you will have to create a
- FieldIndex 'meta_type' and a Metadata entry for 'title'.
-
- Depending on how you choose to arrange objects in your site, you
- may find that path indexes are more or less effective. If you
- locate objects without regard to their subject (for example, if
- objects are mostly located in user "home" folders) then path
- indexes may be of limited value. In these cases, keyword and
- field indexes will be more useful.
-
- Searching DateIndexes
-
- DateIndexes work like FieldIndexes, but are optimised for
- DateTime values. To minimize resource usage, DateIndexes have a
- resolution of one minute, which is considerably lower than the
- resolution of DateTime values.
-
- DateIndexes are used just like FieldIndexes; below in the
- section on "Advanced Searching with Records" we present an
- example of searching them.
-
- Searching DateRangeIndexes
-
- DateRangeIndexes are specialised for searching for ranges of
- DateTime values. An example application would be NewsItems
- which have two DateTime attributes 'effective' and 'expiration',
- and which should only be published if the current date would
- fall somewhere in between these two date values. Like
- DateIndexes, DateRangeIndexes have a resolution of one minute.
-
- DateRangeIndexes are widely used in CMF and Plone, where
- content is compared to an effective date and an expiration
- date.
-
- DateRangeIndexes also allow one or both of the boundary dates of
- the indexed objects to be left open which greatly simplifies
- application logic when querying for "active" content where expiration
- and effective dates are optional.
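-
- The matching rule can be sketched as follows. This is a plain
- Python illustration, not the DateRangeIndex code; the function
- name 'is_active' is hypothetical, 'None' stands for an open
- boundary, and treating both boundaries as inclusive is an
- assumption of the sketch:
-
```python
from datetime import datetime


def is_active(effective, expires, when):
    """Return True if `when` lies between `effective` and `expires`.

    Either boundary may be None, meaning that side of the range is open.
    Both boundaries are treated as inclusive (an assumption of this sketch).
    """
    if effective is not None and when < effective:
        return False
    if expires is not None and when > expires:
        return False
    return True


now = datetime(2009, 2, 10, 18, 26)

# No expiration date: active from the effective date onwards.
open_ended = is_active(datetime(2009, 1, 1), None, now)
# Expired at the end of January: no longer active.
expired = is_active(None, datetime(2009, 1, 31), now)
# Neither date set: always active.
unbounded = is_active(None, None, now)
```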
-
- Searching TopicIndexes
-
- A TopicIndex is a container for so-called FilteredSets. A
- FilteredSet consists of an expression and a set of internal
- ZCatalog document identifiers that represent a pre-calculated
- result list for performance reasons. Instead of executing the
- same query on a ZCatalog multiple times it is much faster to use
- a TopicIndex instead.
-
- TopicIndexes are also useful for indexing boolean attributes or
- attributes where only one value is queried for. They can do this more
- efficiently than a field index.
-
- Building up FilteredSets happens on the fly when objects are
- cataloged and uncatalogued. Every indexed object is evaluated
- against the expressions of every FilteredSet. An object is added
- to a FilteredSet if the expression with the object evaluates to
- True. Uncatalogued objects are removed from the FilteredSet.
-
- A built-in type of FilteredSet is the PythonFilteredSet - it
- would be possible to construct custom types though.
-
- A PythonFilteredSet evaluates its expression using the eval() function
- inside the context of the FilteredSet class. The object to be indexed
- must be referenced inside the expression using "o.". Below are some
- examples of expressions.
-
- This would index all DTML
- Methods::
-
- o.meta_type=='DTML Method'
-
- This would index all folderish objects which have a non-empty
- title::
-
- o.isPrincipiaFolderish and o.title
-
- Querying of TopicIndexes is done in much the same way as with
- other Indexes. For example, if we named the last FilteredSet above
- 'folders_with_titles', we could query our TopicIndex with a
- Python snippet like::
-
- zcat = context.AnimalCatalog
- results = zcat(topicindex='folders_with_titles')
-
- Provided our 'AnimalCatalog' contains a TopicIndex 'topicindex',
- this would return all folderish objects in 'AnimalCatalog' which
- had a non-empty title.
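-
- To make the idea of a pre-calculated result set more concrete,
- here is a minimal sketch in plain Python. It is not the
- TopicIndex implementation; the class 'SketchFilteredSet' and its
- method names are hypothetical, and the sample objects are simple
- stand-ins for Zope objects:
-
```python
from types import SimpleNamespace


class SketchFilteredSet:
    """Keeps the ids of all objects for which an expression is true."""

    def __init__(self, set_id, expression):
        self.id = set_id
        self.expression = expression
        self.doc_ids = set()

    def index_object(self, doc_id, obj):
        # The expression refers to the object being indexed as "o".
        if eval(self.expression, {}, {"o": obj}):
            self.doc_ids.add(doc_id)
        else:
            self.doc_ids.discard(doc_id)

    def unindex_object(self, doc_id):
        self.doc_ids.discard(doc_id)


folders_with_titles = SketchFilteredSet(
    "folders_with_titles", "o.isPrincipiaFolderish and o.title")

# Stand-in objects: a titled folder, an untitled folder, a document.
folders_with_titles.index_object(
    1, SimpleNamespace(isPrincipiaFolderish=1, title="Zoo"))
folders_with_titles.index_object(
    2, SimpleNamespace(isPrincipiaFolderish=1, title=""))
folders_with_titles.index_object(
    3, SimpleNamespace(isPrincipiaFolderish=0, title="Tiger"))

# Answering the query is now a lookup; nothing is evaluated at query time.
matching_ids = set(folders_with_titles.doc_ids)
```
-
- The expression is evaluated once per object at cataloging time,
- so a later query only has to return the stored set of identifiers.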
-
- TopicIndexes also support the 'operator' parameter with Records.
- More on Records below.
-
- Advanced Searching with Records
-
- A more advanced feature is the ability to query indexes more
- precisely using record objects. Record objects contain
- information about how to query an index. Records are Python
- objects with attributes, or mappings. Different indexes support
- different record attributes.
-
- Note that you don't have to use record-style queries unless you
- need the features introduced by them: you can continue to use
- traditional queries, as demonstrated above.
-
- A record style query involves passing a record (or dictionary)
- to the catalog instead of a simple query string.
-
- Keyword Index Record Attributes
-
- 'query' -- Either a sequence of words or a single word.
- (mandatory)
-
- 'operator' -- Specifies whether all keywords or only one need
- to match. Allowed values: 'and', 'or'. (optional, default:
- 'or')
-
- For example::
-
- # big or shiny
- results=ZCatalog(categories=['big', 'shiny'])
-
- # big and shiny
- results=ZCatalog(categories={'query':['big','shiny'],
- 'operator':'and'})
-
- The second query matches objects that have both the keywords
- "big" and "shiny". Without using the record syntax you can
- only match objects that are big or shiny.
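-
- The two operators correspond to a set union and a set
- intersection. The following plain Python sketch illustrates the
- semantics only; 'keyword_query' and the sample index are
- hypothetical and not part of the ZCatalog API:
-
```python
def keyword_query(index, keywords, operator="or"):
    """Combine the id sets of several keywords.

    `index` maps each keyword to the set of ids carrying that keyword.
    """
    sets = [index.get(keyword, set()) for keyword in keywords]
    if not sets:
        return set()
    if operator == "and":
        return set.intersection(*sets)
    return set.union(*sets)


# Hypothetical keyword index: keyword -> ids of matching objects.
categories = {"big": {1, 2}, "shiny": {2, 3}}

big_or_shiny = keyword_query(categories, ["big", "shiny"])
big_and_shiny = keyword_query(categories, ["big", "shiny"], operator="and")
```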
-
- FieldIndex Record Attributes
-
- 'query' -- Either a sequence of objects or a single value to be
- passed as query to the index (mandatory)
-
- 'range' -- Defines a range search on a Field Index (optional,
- default: not set).
-
- Allowed values:
-
- 'min' -- Searches for all objects with values greater than or
- equal to the minimum of the values passed in the 'query' parameter.
-
- 'max' -- Searches for all objects with values less than or
- equal to the maximum of the values passed in the 'query' parameter.
-
- 'minmax' -- Searches for all objects with values that lie
- between the minimum and the maximum of the values passed in
- the 'query' parameter, including both boundary values.
-
- For example, here is a PythonScript snippet using a range
- search::
-
- # animals with a population count of 5 or more
- zcat = context.AnimalCatalog
- results=zcat(population_count={
- 'query' : 5,
- 'range': 'min'}
- )
-
- This query matches all objects in the AnimalCatalog which have a
- population count of 5 or more (provided that there is a
- FieldIndex 'population_count' and an attribute 'population_count'
- present).
-
- Or::
-
- # animals with population count between 5 and 10
- zcat = context.AnimalCatalog
- results=zcat(population_count={
- 'query': [ 5, 10 ],
- 'range': 'minmax'}
- )
-
- This query matches all animals with a population count
- between 5 and 10 (provided that the same FieldIndex
- 'population_count' indexes the attribute 'population_count').
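-
- How the 'range' values select a lower and an upper bound can be
- sketched in plain Python. This is an illustration only, not the
- FieldIndex code; 'range_query' and the sample data are
- hypothetical, and the sketch treats both bounds as inclusive:
-
```python
def range_query(values, query, range_mode=None):
    """Return the ids whose value matches `query` under `range_mode`.

    `values` maps ids to indexed values; `range_mode` is None, 'min',
    'max' or 'minmax'. Bounds are inclusive in this sketch.
    """
    keys = list(query) if isinstance(query, (list, tuple)) else [query]
    if range_mode is None:
        # Plain query: the value must equal one of the query values.
        return {doc_id for doc_id, value in values.items() if value in keys}
    low = min(keys) if "min" in range_mode else None
    high = max(keys) if "max" in range_mode else None
    return {
        doc_id
        for doc_id, value in values.items()
        if (low is None or value >= low) and (high is None or value <= high)
    }


# Hypothetical data: animal id -> population count.
population_count = {"lion": 3, "koala": 5, "lizard": 8, "tiger": 12}

at_least_five = range_query(population_count, 5, "min")
five_to_ten = range_query(population_count, [5, 10], "minmax")
exactly_five = range_query(population_count, 5)
```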
-
- Path Index Record Attributes
-
- 'query' -- Path to search for either as a string
- (e.g. "/Zoo/Birds") or list (e.g. ["Zoo",
- "Birds"]). (mandatory)
-
- 'level' -- The path level to begin searching at. Level defaults
- to 0, which means searching from the root. A level of -1 means
- start from anywhere in the path.
-
- Suppose you have a collection of objects with these paths:
-
- 1. '/aa/bb/aa'
-
- 2. '/aa/bb/bb'
-
- 3. '/aa/bb/cc'
-
- 4. '/bb/bb/aa'
-
- 5. '/bb/bb/bb'
-
- 6. '/bb/bb/cc'
-
- 7. '/cc/bb/aa'
-
- 8. '/cc/bb/bb'
-
- 9. '/cc/bb/cc'
-
- Here are some example queries and their results to show how the
- 'level' attribute works:
-
- 'query="/aa/bb", level=0' -- This gives the same behaviour as our
- previous examples, i.e. searching absolutely from the root, and
- results in:
-
- * '/aa/bb/aa'
-
- * '/aa/bb/bb'
-
- * '/aa/bb/cc'
-
- 'query="/bb/bb", level=0' -- Again, this returns the
- default:
-
- * '/bb/bb/aa'
-
- * '/bb/bb/bb'
-
- * '/bb/bb/cc'
-
- 'query="/bb/bb", level=1' -- This searches for all objects which
- have '/bb/bb' one level down from the root:
-
- * '/aa/bb/bb'
-
- * '/bb/bb/bb'
-
- * '/cc/bb/bb'
-
- 'query="/bb/bb", level=-1' -- Gives all objects which have
- '/bb/bb' anywhere in their path:
-
- * '/aa/bb/bb'
-
- * '/bb/bb/aa'
-
- * '/bb/bb/bb'
-
- * '/bb/bb/cc'
-
- * '/cc/bb/bb'
-
- 'query="/xx", level=-1' -- Returns no results, because no path contains '/xx'.
-
- You can use the level attribute to flexibly search different
- parts of the path.
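-
- The behaviour shown in the examples above can be reproduced with
- a short plain Python sketch. It is not the PathIndex
- implementation; 'path_matches' is a hypothetical helper that
- compares path components starting at the given level, or at any
- position when the level is -1:
-
```python
def path_matches(path, query, level=0):
    """Return True if `query` occurs in `path` at the given level."""
    components = [part for part in path.split("/") if part]
    wanted = [part for part in query.split("/") if part]
    if level >= 0:
        # The query must start exactly `level` components below the root.
        return components[level:level + len(wanted)] == wanted
    # level == -1: the query may start at any position in the path.
    return any(
        components[start:start + len(wanted)] == wanted
        for start in range(len(components) - len(wanted) + 1)
    )


paths = [
    "/aa/bb/aa", "/aa/bb/bb", "/aa/bb/cc",
    "/bb/bb/aa", "/bb/bb/bb", "/bb/bb/cc",
    "/cc/bb/aa", "/cc/bb/bb", "/cc/bb/cc",
]

from_root = [p for p in paths if path_matches(p, "/bb/bb", 0)]
one_level_down = [p for p in paths if path_matches(p, "/bb/bb", 1)]
anywhere = [p for p in paths if path_matches(p, "/bb/bb", -1)]
```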
-
- As of Zope 2.4.1, you can also include level information in a
- search without using a record. Simply use a tuple containing the
- query and the level. Here's an example tuple: '("/aa/bb", 1)'.
-
- DateIndex Record Attributes
-
- The supported Record Attributes are the same as those of the
- FieldIndex:
-
- 'query' -- Either a sequence of objects or a single value to be
- passed as query to the index (mandatory)
-
- 'range' -- Defines a range search on a DateIndex (optional,
- default: not set).
-
- Allowed values:
-
- 'min' -- Searches for all objects with values greater than or
- equal to the minimum of the values passed in the 'query' parameter.
-
- 'max' -- Searches for all objects with values less than or
- equal to the maximum of the values passed in the 'query' parameter.
-
- 'minmax' -- Searches for all objects with values that lie
- between the minimum and the maximum of the values passed in
- the 'query' parameter, including both boundary values.
-
- As an example, we go back to the NewsItems we created in the
- Section *Searching with Forms*. For this example, we created
- news items with attributes 'content', 'author', and 'date'.
- Additionally, we created a search form and a report template for
- viewing search results.
-
- Searching for dates of NewsItems was not very comfortable
- though - we had to type in exact dates to match a document.
-
- With a 'range' query we are now able to search for ranges of
- dates. Take a look at this PythonScript snippet::
-
- # return NewsItems newer than a week
- zcat = context.NewsCatalog
- results = zcat( date={'query' : context.ZopeTime() - 7,
- 'range' : 'min'
- })
-
- DateRangeIndex Record Attributes
-
- DateRangeIndexes only support the 'query' attribute on Record
- objects. The 'query' attribute results in the same
- functionality as querying directly; returning matches where
- the date supplied to the query falls between the start and
- end dates from the indexed object.
-
- TopicIndex Record Attributes
-
- Like KeywordIndexes, TopicIndexes support the 'operator'
- attribute:
-
- 'operator' -- Specifies whether all FilteredSets or only one need
- to match. Allowed values: 'and', 'or'. (optional, default:
- 'or')
-
- ZCTextIndex Record Attributes
-
- Because ZCTextIndex operators are embedded in the query string,
- there are no additional Record Attributes for ZCTextIndexes.
-
- Creating Records in HTML
-
- You can also perform record queries using HTML forms. Here's an
- example showing how to create a search form using records::
-
- <form action="Report" method="get">
- <table>
- <tr><th>Search Terms (must match all terms)</th>
- <td><input name="content.query:record" width=30 value=""></td></tr>
- <input type="hidden" name="content.operator:record" value="and">
- <tr><td colspan=2 align=center>
- <input type="SUBMIT" value="Submit Query">
- </td></tr>
- </table>
- </form>
-
- For more information on creating records in HTML see the section
- "Passing Parameters to Scripts" in Chapter 14, Advanced Zope
- Scripting.
-
- Automatic Cataloging
-
- Automatic Cataloging is an advanced ZCatalog usage pattern that
- keeps objects up to date as they are changed. It requires that as
- objects are created, changed, and destroyed, they are
- automatically tracked by a ZCatalog. This usually involves the
- objects notifying the ZCatalog when they are created, changed, or
- deleted.
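-
- The notification pattern can be sketched in plain Python. The
- classes below are toy stand-ins, not Zope classes; their names
- and the stored data are hypothetical, and only the idea that an
- object tells its catalog about every change is taken from the
- text above:
-
```python
class SketchCatalog:
    """Toy catalog: remembers a snapshot of each object's indexed data."""

    def __init__(self):
        self.entries = {}

    def catalog_object(self, obj, uid):
        self.entries[uid] = {"author": obj.author}

    def uncatalog_object(self, uid):
        self.entries.pop(uid, None)


class SelfCatalogingItem:
    """Toy object that notifies its catalog whenever it changes."""

    def __init__(self, uid, catalog, author):
        self.uid = uid
        self.catalog = catalog
        self.author = author
        self.catalog.catalog_object(self, self.uid)    # created

    def edit(self, author):
        self.author = author
        self.catalog.catalog_object(self, self.uid)    # changed

    def delete(self):
        self.catalog.uncatalog_object(self.uid)        # deleted


catalog = SketchCatalog()
item = SelfCatalogingItem("KoalaGivesBirth", catalog, "Bob")
after_create = dict(catalog.entries)
item.edit("Alice")
after_edit = dict(catalog.entries)
item.delete()
after_delete = dict(catalog.entries)
```
-
- Because every change goes through the object itself, the catalog
- never has to be rebuilt in one large pass.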
-
- This usage pattern has a number of advantages in comparison to
- mass cataloging. Mass cataloging is simple but has drawbacks. The
- total amount of content you can index in one transaction is
- equivalent to the amount of free virtual memory available to the
- Zope process, plus the amount of temporary storage the system has.
- In other words, the more content you want to index all at once,
- the better your computer hardware has to be. Mass cataloging
- works well for indexing up to a few thousand objects, but beyond
- that automatic indexing works much better.
-
- If you can trade off memory for time, you can enable
- 'Subtransactions' in the 'Advanced' tab of the catalog. This
- commits the work in chunks, reducing memory requirements, but
- taking longer. It is a good solution for mass cataloging with a
- very large number of records.
-
- Another major advantage of automatic cataloging is that it can
- handle objects that change. As objects evolve and change, the
- index information is always current, even for rapidly changing
- information sources like message boards.
-
- On the other hand, cataloging a complex object every time it changes
- can be expensive (especially if the catalog index attempts to
- translate the information, as TextIndexNG, described below, can do
- with PDF files or Microsoft Office files). Some sites may benefit
- from mass cataloging, and having a cron job or other scheduled
- job initiate the mass cataloging every night.
-
- In standard (non-CMF, non-Plone) Zope, none of the built-in
- object types attempt to automatically catalog themselves. In
- CMF and Plone, the "contentish" objects (Documents, News Items,
- Events, etc.) all use automatic cataloging to add themselves
- to the standard CMF catalog, 'portal_catalog'. The CMF
- and especially Plone offer many advantages; if you're interested
- in building a content-oriented site, you should consider
- these technologies. However, to help you understand the
- process of creating a simple, non-CMF, non-Plone object,
- we'll demonstrate another technique below.
-
- In this section, we'll show you an example that creates "news"
- items that people can add to your site. These items will get
- automatically cataloged. This example consists of two steps:
-
- o Creating a new type of object to catalog.
-
- o Creating a ZCatalog to catalog the newly created objects.
-
- As mentioned, none of the "out-of-the-box" non-CMF Zope objects
- support automatic cataloging. This is for backwards compatibility
- reasons. For now, you have to define your own kind of objects
- (or use CMF or Plone and one of the contentish types in these
- systems that automatically catalog themselves.)
- One of the ways you can create your own objects that catalog
- themselves is by defining a *ZClass*.
-
- A ZClass is a Zope object that defines new types of Zope objects.
- In a way, a ZClass is like a blueprint that describes how new Zope
- objects are built. Consider a news item as discussed in examples
- earlier in the chapter. News items not only have content, but
- they also have specific properties that make them news items.
- Often these Items come in collections that have their own
- properties. You want to build a News site that collects News
- Items, reviews them, and posts them online to a website where
- readers can read them.
-
- In this kind of system, you may want to create a new type of
- object called a *NewsItem*. This way, when you want to add a new
- *NewsItem* to your site, you just select it from the product add
- list. If you design this object to be automatically cataloged,
- then you can search your news content very powerfully. In this
- example, you will just skim a little over ZClasses, which are
- described in much more detail in Chapter 22, "Extending Zope."
-
- New types of objects are defined in the *Products* section of the
- Control Panel. This is reached by clicking on the Control Panel and
- then clicking on *Product Management*. Products contain new kinds of
- ZClasses. On this screen, click "Add" to add a New product. You will
- be taken to the Add form for new Products.
-
- Name the new Product *NewsItem* and click "Generate". This will take you
- back to the Products Management view and you will see your new Product.
-
- Select the *NewsItem* Product by clicking on it. This new Product looks a lot
- like a Folder. It contains one object called *Help* and has an Add
- menu, as well as the usual Folder "tabs" across the top. To add a new
- ZClass, pull down the Add menu and select *ZClass*. This will take you
- to the ZClass add form, as shown in the figure below.
-
- "ZClass add form":img:16-7:Figures/creatingzclass.png
-
- This is a complicated form which will be explained in much more
- detail in Chapter 14, "Extending Zope". For now, you only need to
- do three things to create your ZClass:
-
- o Specify the Id "NewsItem". This is the name of the new ZClass.
-
- o Specify the meta_type "News Item". This will be used to create the
- Add menu entry for your new type of object.
-
- o Select 'ZCatalog:CatalogPathAware' from the left hand *Base Classes*
- box, and click the button with the arrow pointing to the right hand
- *Base Classes* box. This should cause 'ZCatalog:CatalogPathAware' to
- show up in the right hand window. Note that if you are
- inheriting from more than one base class, 'CatalogPathAware'
- should be the first (specifically, it should come before
- 'ObjectManager').
-
- When you're done, don't change any of the other settings in the Form.
- To create your new ZClass, click *Add*. This will take you back to
- your *NewsItem* Product. Notice that there is now a new object called
- *NewsItem* as well as several other objects. The *NewsItem* object is
- your new ZClass. The other objects are "helpers" that you will examine
- more in Chapter 14, "Extending Zope".
-
- Select the *NewsItem* ZClass object. Your view should now look like
- the figure below.
-
- "A ZClass Methods View":img:16-8:Figures/zclassmethods.png
-
- This is the *Methods* View of a ZClass. Here, you can add Zope objects
- that will act as *methods on your new type of object*. Here, for
- example, you can create Page Templates or Scripts and these
- objects will become methods on any new *News Items* that are created.
- Before creating any methods however, let's review the needs of this new
- "News Item" object:
-
- News Content -- The News Item contains news content; this is its
- primary purpose. This content can be any kind of plain text or
- marked up content like HTML or XML.
-
- Author Credit -- The News Item should provide some kind of credit to
- the author or organization that created it.
-
- Date -- News Items are timely, so the date that the item was created
- is important.
-
- You may want your new News Item object to have other properties; these
- are just suggestions. To add new properties to your News Item, click on
- the *Property Sheets* tab. This takes you to the *Property Sheets*
- view.
-
- Properties are added to new types of objects in groups called *Property
- Sheets*. Since your object has no property sheets defined, this view
- is empty. To add a New Property Sheet, click *Add Common Instance
- Property Sheet*, and give the sheet the name "News". Now click *Add*.
- This will add a new Property Sheet called *News* to your object.
- Clicking on the new Property Sheet will take you to the *Properties*
- view of the *News* Property Sheet, as shown in the figure below.
-
- "The properties screen for a Property Sheet":img:16-9:Figures/propertysheet.png
-
- This view is almost identical to the *Properties* view found on Folders
- and other objects. Here, you can create the properties of your News
- Item object. Create three new properties in this form:
-
- content -- This property's type should be *text*. Each newly
- created News Item will contain its own unique content property.
-
- author -- This property's type should be *string*. This will
- contain the name of the news author.
-
- date -- This property's type should be *date*. This will contain
- the time and date the news item was last updated. A *date* property
- requires a value, so for now you can enter the string "01/01/2000".
-
- That's it! Now you have created a Property Sheet that describes your
- News Items and what kind of information they contain. Properties can
- be thought of as the *data* that an object contains. Now that we have
- the data all set, you need to create an *interface* to your new kind of
- objects. This is done by creating a new Form/Action pair to
- change the data and assigning it to a new *View* for your object.
-
- The Form/Action pair will give you the ability to edit the data
- defined in the propertysheet, while the View binds the form to a
- tab of the Zope Management Interface.
-
- Propertysheets come with built-in forms for editing their data;
- however we need to build our own so we can signal changes to the
- ZCatalog.
-
- First we are going to create a form to display and edit
- properties. Click on the *Methods* tab. Select "Page Template"
- from the add drop-down menu, name it 'editPropertiesForm' and fill
- it with::
-
- <html><head>
- <title tal:content="here/title_or_id">title</title>
- <link rel="stylesheet" type="text/css" href="/manage_page_style.css">
- </head>
- <body bgcolor="#FFFFFF" link="#000099" vlink="#555555">
- <span
- tal:define="manage_tabs_message options/manage_tabs_message | nothing"
- tal:replace="structure here/manage_tabs">
- prefab management tabs
- </span>
- <form action="manage_editNewsProps" method="get">
- <table>
- <tr>
- <th valign="top">Content</th>
- <td>
- <textarea
- name="content:text" rows="6" cols="35"
- tal:content="here/content">content text</textarea>
- </td>
- </tr>
- <tr>
- <th>Author</th>
- <td>
- <input name="author:string"
- value="author string"
- tal:attributes="value here/author">
- </td>
- </tr>
- <tr>
- <th>Date</th>
- <td>
- <input name="date:date"
- value="the date"
- tal:attributes="value here/date">
- </td>
- </tr>
- <tr><td></td><td>
- <input type="submit">
- </td></tr>
- </table>
- </form>
- </body>
- </html>
-
- This is the Form part of the Form/Action pair. Note the call of
- 'manage_tabs' at the top of the form - this will give your form
- the standard ZMI tabs.
-
- We will add the Action part now. Add a 'Script (Python)' object
- and fill in the id 'manage_editNewsProps' and the following code::
-
- # first get the request
- req = context.REQUEST
- # change the properties in the zclass' propertysheet
- context.propertysheets.News.manage_editProperties(req)
- # signal the change to the zcatalog
- context.reindex_object()
- # now return a message
- form = context.editPropertiesForm
- return form(REQUEST=req,
- manage_tabs_message="Saved changes.",
- )
-
- Done. The next step will be to define the View. Click on the
- *Views* tab. This will take you to the *Views* view.
-
- Here, you can see that Zope has created three default Views for
- you. These views will be described in much more detail in Chapter
- 14, "Extending Zope", but for now, it suffices to say that these
- views define the tabs that your objects will eventually have.
-
- To create a new view, use the form at the bottom of the Views
- view. Create a new View with the name "News" and select
- "editPropertiesForm" from the select box and click *Add*. This
- will create a new View on this screen under the original three
- Views, as shown in the figure below.
-
- "The Views view":img:16-10:Figures/zclassviews.png
-
- We want to make our View the first view that you see when you
- select a News Item object. To change the order of the views,
- select the newly created *News* view and click the *First* button.
- This should move the new view from the bottom to the top of the
- list.
-
- The final step in creating a ZClass is defining a method for
- displaying the class. Click on the *Methods* tab, select 'Page
- Template' from the add list and add a new Page Template with the
- id "index_html". This will be the default view of your news item.
- Add the following to the new template::
-
- <html><head>
- <title tal:content="template/title">The title</title>
- </head><body>
- <h1>News Flash</h1>
- <p tal:content="here/date">
- date goes here
- </p>
- <p tal:content="here/author">
- author goes here
- </p>
- <p tal:content="here/content">
- content goes here
- </p>
- </body></html>
-
- Finally, we will add a new management tab for the display method.
- Once again, click on the *Views* tab, and create a *View* named
- "View", and assign the 'index_html' to it. Reorder the views so
- that the 'News' view comes first, followed by the 'View' method.
-
- That's it! You've created your own kind of object called a *News
- Item*. When you go to the root folder, you will now see a new entry in
- your add list.
-
- But don't add any new News Items yet, because the second step in
- this exercise is to create a ZCatalog that will catalog your new
- News Items. Go to the root folder and create a new ZCatalog with
- the id 'Catalog'. The ZClass finds the ZCatalog by looking for a
- catalog named 'Catalog' through acquisition, so this ZCatalog
- should be where it can be acquired by all NewsItems you plan to
- create.
-
- Like the previous two examples of using a ZCatalog, you need to
- create Indexes and a Metadata Table that make sense for your
- objects. Create the following indexes:
-
- content -- This should be a ZCTextIndex. This will index the content
- of your News Items.
-
- author -- This should be a FieldIndex. This will index the author of
- the News Item.
-
- date -- This should be a DateIndex. This will index the date of the
- News Item.
-
- After creating these Indexes, add these Metadata
- columns:
-
- o author
-
- o date
-
- o absolute_url
-
- After creating the Indexes and Metadata Table columns, the
- automatic cataloguing is basically working. The last step is
- creating a search interface for the ZCatalog using the Z Search
- Interface tool described previously in this chapter.
-
- Now you are ready to go. Start by adding some new News Items to your
- Zope. Go anywhere in Zope and select *News Item* from the add list.
- This will take you to the add Form for News items.
-
- Give your new News Item the id "KoalaGivesBirth" and click *Add*.
- This will create a new News Item. Select the new News Item.
-
- Notice how it has five tabs that match the five Views that were in the
- ZClass. The first View is *News*; this view corresponds to the *News*
- Property Sheet you created in the News Item ZClass.
-
- Enter your news in the *Content* box::
-
- Today, Bob the Koala bear gave birth to little baby Jimbo.
-
- Enter your name in the *Author* box, and today's date in the *Date*
- box.
-
- Click *Change* and your News Item should now contain some news.
- Because the News Item object is *CatalogPathAware*, it is automatically
- cataloged when it is changed or added. Verify this by looking at the
- *Cataloged Objects* tab of the ZCatalog you created for this example.
-
- The News Item you added is the only object that is cataloged. As you
- add more News Items to your site, they will automatically get cataloged
- here. Add a few more items, and then experiment with searching the
- ZCatalog. For example, if you search for "Koala" you should get back
- the *KoalaGivesBirth* News Item.
-
- At this point you may want to use some of the more advanced search
- forms that you created earlier in the chapter. You can see for
- example that as you add new News Items with new authors, the
- authors select list on the search form changes to include the new
- information.
-
- Advanced Catalog Topics
-
- Sorting
-
- When you execute a ZCatalog call, your result set may or may not
- be returned in a particular order:
-
- - If your query contains no text index fields, your results will
- not be sorted in any particular order. For example, with a
- query based on a KeywordIndex, or a query based on both
- a KeywordIndex and a DateIndex, you will get an indeterminate
- ordering.
-
- - For results that include a text index, your results will be
- returned in order of relevance of the text search. That is,
- the result set will be sorted based on how often
- search words appear in the indexes. A search for the word
- 'frog' against a text index will give priority to an object
- that uses that word many times over
- an object that uses it fewer times. This is
- a simplified version of the way that many web search engines
- work: the more "relevant" your keywords are to an item, the
- higher its ordering in the results. In particular, with
- the ZCTextIndex, you have a choice between two algorithms
- for how to weight the sorting:
-
- - Okapi: is the best general choice. It does very well
- when comparing an ordinary "human query" against a longer
- text field. For example, querying a long description field
- for a short query like 'indoor OR mammal' would work very
- well.
-
- - Cosine: is better suited for when the length of the
- query comes close to matching the length of the field
- itself.
-
- You, of course, may want to force a particular order onto your
- results. You can do this after you get a result set using
- normal Python syntax::
-
- # get ordered results from search
- zcat=context.AnimalCatalog
- results=zcat({'title':'frog'})
- results=[(row.title, row) for row in results]
- results.sort()
-
- This can be, however, very inefficient.
-
- When results are returned by the ZCatalog, they are in a special
- form called a `LazyResults` set. This means that Zope hasn't
- gone to the trouble of actually creating the entire list, but
- has just sketched out the list and will fill it in at the exact
- point that you ask for each item. This is helpful, since it lets
- you query the catalog for a result set with 10,000 items without
- Zope having to really construct a 10,000 item long list of results.
- However, when we try to sort this, Zope will have to actually
- create this list since it can't rely on its lazy, just-in-time
- method.
-
- Normally, you'll only show the first 20 or 50 or so of a result
- set, so sorting 10,000 items just to show the first 20 is a waste
- of time and memory. Instead, we can ask the catalog to do the
- sorting for us, saving both time and space.
-
- To do this, we'll pass along several additional keywords in our
- search method call or query:
-
- sort_on -- The field name to sort the results on
-
- sort_order -- 'ascending' or 'descending', with the default
- being 'ascending'. Note that you can also use 'reverse'
- as a synonym for 'descending'.
-
- sort_limit -- Since you're likely to only want to use the
- first 20 or 50 or so items, we can give a hint to the
- ZCatalog not to bother to sort beyond this by passing along
- a 'sort_limit' parameter, which is the number of records
- to sort.
-
- For example, assuming we have a 'latin_name' FieldIndex on our
- animals, we can sort them by name in a PythonScript with::
-
- zcat=context.AnimalCatalog
- zcat({'sort_on':'latin_name'})
-
- or::
-
- zcat=context.AnimalCatalog
- zcat({'sort_on':'latin_name', 'sort_order':'descending'})
-
- or, if we know we'll only want to show the first 20 records::
-
- zcat=context.AnimalCatalog
- zcat({'sort_on':'latin_name',
- 'sort_order':'descending',
- 'sort_limit':20})
-
- or, combining this with a query restriction::
-
- zcat=context.AnimalCatalog
- zcat({'title':'frog',
- 'sort_on':'latin_name',
- 'sort_order':'descending',
- 'sort_limit':20})
-
- This gives us all records with the 'title' "frog", sorted
- by 'latin_name', and doesn't bother to sort after the first
- 20 records.
-
- Note that using 'sort_limit' does not guarantee that we'll get
- exactly that number of records--we may get fewer if there
- aren't that many matching our query, and we may get more.
- 'sort_limit' is merely a request for optimization. To
- ensure that we get no more than 20 records, we'll want to
- truncate our result set::
-
- zcat=context.AnimalCatalog
- zcat({'sort_on':'latin_name',
- 'sort_order':'descending',
- 'sort_limit':20})[:20]
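-
- To see why a limit helps, here is a minimal plain-Python sketch. It is
- not the ZCatalog's own code: 'first_n_sorted' is a hypothetical helper
- that uses the standard 'heapq' module to pick the first few rows by a
- sort key without ordering the whole list, and the dictionaries merely
- stand in for catalog result rows.
-
```python
import heapq


def first_n_sorted(rows, key, limit, reverse=False):
    """Return at most `limit` rows in sorted order without a full sort.

    Hypothetical helper for illustration; it mimics the effect of
    combining sort_on, sort_order and sort_limit with a final slice.
    """
    if reverse:
        return heapq.nlargest(limit, rows, key=key)
    return heapq.nsmallest(limit, rows, key=key)


# Plain dictionaries stand in for catalog result rows.
rows = [{'latin_name': name} for name in
        ['Panthera leo', 'Ailurus fulgens', 'Morelia spilota', 'Bufo bufo']]

ascending = [r['latin_name'] for r in
             first_n_sorted(rows, key=lambda r: r['latin_name'], limit=2)]
descending = [r['latin_name'] for r in
              first_n_sorted(rows, key=lambda r: r['latin_name'], limit=2,
                             reverse=True)]
```
-
- Because only the best 'limit' rows are tracked, the cost grows with the
- size of the limit rather than with a full sort of every match.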
-
- Unsortable Fields
-
- In order to sort on an index, we have to actually keep the
- full attribute or method value in that index. For many
- index types, such as DateIndex or FieldIndex, this is
- normally done. However, for text indexes, such as
- ZCTextIndex, TextIndex (deprecated), and TextIndexNG
- (described below), the index doesn't keep the actual
- attribute or method results in the index. Instead, it
- cleans up the input (often removing "stop words",
- normalizing input, lowercasing it, removing duplicates,
- etc., depending on the options chosen). So a term paper
- with an attribute value of::
-
- "A Critique of 'Tora! Tora! Tora!'"
-
- could actually be indexed as::
-
- ( 'critique', 'tora' )
-
- once the common stop words ("a", "of") are removed and
- it is lowercased and deduplicated. (In reality,
- the indexed information is much richer, as it keeps
- track of things like how often words appear, and which
- words appear earlier in the stream, but this gives
- you an idea of what is stored.)
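-
- The cleanup described above can be sketched in a few lines of plain
- Python. This is only an illustration under stated assumptions: the
- 'STOP_WORDS' set and the 'normalize' function are hypothetical, and a
- real lexicon's pipeline is configurable and stores much more.
-
```python
import re

# Hypothetical stop-word list; a real lexicon's list is configurable.
STOP_WORDS = {"a", "of", "the"}


def normalize(text):
    """Lowercase, split into words, drop stop words, remove duplicates."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = []
    for word in words:
        if word not in STOP_WORDS and word not in kept:
            kept.append(word)
    return tuple(kept)


terms = normalize("A Critique of 'Tora! Tora! Tora!'")
```
-
- Here 'terms' ends up holding only 'critique' and 'tora', so the
- original wording can no longer be recovered from what was indexed.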
-
- This is a necessary and positive step to make the index
- use less storage and less memory, and it improves search
- results, as your site user doesn't have to worry about
- getting incidental words ("the", "a", etc.) correct,
- nor about capitalization, etc.
-
- **Note:** As we'll see, TextIndexNG indexes can even
- do advanced tricks, such as normalizing a word and
- stemming it, so that a search for "vehicles" could
- find "vehicle" or even "car".
-
- However, this process means that the index no longer knows
- the actual value, and, therefore, can't sort on it.
- Due to this, it is not possible to use the 'sort_on'
- feature with text index types.
-
- To work around this, you can either sort the results of
- the query using the normal python 'sort()' feature
- (shown above), or you can create an additional non-text
- index on the field, described below, in the section
- 'Indexing a Field with Two Index Types'.
-
- Similarly, the API call 'uniqueValuesFor', described above,
- cannot be used on text-type indexes, since the exact
- values are not kept.
-
- Searching in More Than One Index Using "OR"
-
- As mentioned, if you search in more than one index,
- you must meet your criteria for each index you search
- in, i.e., there is an implied 'AND' between each of the
- searches::
-
- # find sunset art by Van Gogh
- zcat=context.ArtCatalog
- results=zcat({'keyword':'sunsets', 'artist':'Van Gogh'})
-
- This query finds all sunset art by Van Gogh: both of
- these conditions must be true.
-
- There is no way to directly search in more than one
- index without this 'AND' condition; instead, you can
- perform two catalog searches and concatenate their
- results. For example::
-
- # find sunset art OR art by Van Gogh
- zcat=context.ArtCatalog
- results=zcat({'keyword':'sunsets'}) + \
- zcat({'artist':'Van Gogh'})
-
- This method, however, does not remove duplicates, so
- a painting of a sunset by Van Gogh would appear twice.
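-
- One possible way to drop the duplicates is to remember which objects
- have already been seen while walking both result sets. The sketch
- below is plain Python and makes assumptions: 'FakeBrain' is a
- hypothetical stand-in for a catalog result row, 'merge_unique' is an
- illustrative helper, and each row is assumed to offer a 'getPath()'
- method that identifies its object. Note that walking the results
- this way gives up their lazy evaluation.
-
```python
class FakeBrain:
    """Hypothetical stand-in for a catalog result row."""

    def __init__(self, path):
        self._path = path

    def getPath(self):
        return self._path


def merge_unique(*result_sets):
    """Concatenate result sets, keeping the first row seen for each path."""
    seen = set()
    merged = []
    for results in result_sets:
        for brain in results:
            path = brain.getPath()
            if path not in seen:
                seen.add(path)
                merged.append(brain)
    return merged


# Two pretend searches that share one painting.
sunsets = [FakeBrain('/art/willows_at_sunset'), FakeBrain('/art/monet_sunset')]
van_gogh = [FakeBrain('/art/willows_at_sunset'), FakeBrain('/art/sunflowers')]

paths = [b.getPath() for b in merge_unique(sunsets, van_gogh)]
```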
-
- For alternate strategies about searching in two places,
- see 'PrincipiaSearchSource' and 'FieldedTextIndex', below,
- both of which can be used as possible workarounds.
-
- Indexing a Field With Two Index Types
-
- Since the different indexes act differently, it can be advantageous
- to have the same attribute indexed by more than one index. For
- example, our animals have a 'latin_name' attribute that gives their
- formal genus/species Latin name. A user should be able to search
- that attribute for an *exact* match, and we should be able to
- sort results based on it, both of which suggest a FieldIndex. In
- addition, though, users may want to search it like a text field,
- where they can match parts of words, in which case we would want a
- ZCTextIndex (or TextIndexNG, described below).
-
- In a case like this, a good strategy is to create one index for the
- FieldIndex on 'latin_name'. Let's call that index 'latin_name'.
- Then, you can create a ZCTextIndex that uses a new feature: the
- ability to have the indexed attribute be different than the index
- name itself.
-
- When you create the second index, the ZCTextIndex, you can give it
- the Id 'latin_name_text', and have the 'Indexed attributes' field
- be 'latin_name'. Now, when we catalog our animals, their
- 'latin_name' attribute is indexed in two ways: once, as a
- FieldIndex, that we can sort against and match exactly, and once as
- a ZCTextIndex, that we can search like a text field with full text
- search.
-
- The second index has a different name, so when we make our catalog
- call, we'll need to be sure to use that name if we want to search
- it like a text field::
-
- # search latin_name
- zcat=context.AnimalCatalog
- exact_results=zcat({'latin_name':'homo sapiens'})
- fuzzy=zcat({'latin_name_text':'sap*'})
-
- Note that a good strategy is to have the search be against the
- ZCTextIndex, but sort it by the FieldIndex::
-
- # free text search, sorted
- zcat=context.AnimalCatalog
- results=zcat({'latin_name_text':'sap*',
- 'sort_on':'latin_name'})
-
- PrincipiaSearchSource
-
- You can choose to create indexes on any attribute or method that
- you would find useful to search on; however, one that is
- generally helpful is 'PrincipiaSearchSource'. Several of the
- built-in Zope objects, such as DTMLDocuments, and many add-on
- objects to Zope have a 'PrincipiaSearchSource' attribute or
- method that returns a value that is meant to be used for general
- purpose searching. Traditionally, 'PrincipiaSearchSource'
- would include the text in an object's title, its body, and
- anywhere else you'd want to be able to search.
-
- For example, if you downloaded a Zope product that managed
- our zoo, and it had an Animal type that you could add to your
- site, this animal type would probably expose a
- PrincipiaSearchSource that looked something like this::
-
- def PrincipiaSearchSource(self):
- "used for general searching for animal"
- return self.title + ' ' + self.latin_name + ' ' \
- + self.description + ' ' + self.environment
-
- That way, if you create a 'PrincipiaSearchSource' index and
- search against that, you can find this animal by using words
- that are in its 'title', 'latin_name', 'description', or
- 'environment', without having to worry about which field,
- exactly, they're in. This is similar to searching with a
- web search engine, in that you can use a single text string
- to find the "right" information, without needing to know about
- the type of object you're looking for. It is especially
- helpful in allowing you to create a site-wide search: searching
- animals specifically by their 'latin_name' or 'environment'
- might be useful for a biologist in the right section of your
- site, but for a general purpose visitor, they might like
- to search using the phrase "jungle" and find results without
- having to know to search for that in the 'environment' field
- of a search form.
-
- If you create custom types, either using ZClasses, as shown
- above, or by using more advanced techniques described
- elsewhere, you should create a PrincipiaSearchSource method
- that returns appropriate object-wide text searching capabilities.
-
- ZCatalogs and CMF/Plone
-
- The CMF was built from the ground up to understand the
- difference between things that are "content", such as a news item
- or press release, and those things that are not, such as
- a DTMLMethod used to show a press release, or a ZCatalog
- object. In addition, the CMF includes several stock items
- that are intended to be used for content, including:
- Document, Event, NewsItem, and others. These content items
- are already set up for autocataloging, so that any changes
- made will appear in the catalog.
-
- In non-CMF Zope, the traditional name for a general-purpose
- catalog is 'Catalog' (though you can always create your own
- catalog with any id you want; we've used the example
- 'AnimalCatalog' in this chapter for a special-purpose catalog
- for searching animal-specific info in our zoo). Even though
- 'Catalog' is the traditional name, Zope does not come with
- such a catalog in the ZODB already; you have to create it.
-
- In CMF (and Plone, an out-of-the-box portal system built
- on top of the CMF), there is always a catalog created, called
- 'portal_catalog', at the root of the CMF site. All of the
- built-in content objects (and almost every add-on content
- object for the CMF/Plone) are set to autocatalog to this
- 'portal_catalog'. This is required, since many of the features
- of the CMF and Plone, such as listing current content, finding
- content of correct types, etc., rely on the 'portal_catalog'
- and the searching techniques shown here to function.
-
- In CMF and Plone, the index name 'PrincipiaSearchSource' is
- not traditionally used. Instead, an index is created called
- 'SearchableText', and used in the same manner as
- 'PrincipiaSearchSource'. All of the standard contentish
- objects have a 'SearchableText' method that returns things
- like title, description, body, etc., so that they can be
- general-text searched.
-
- Keeping Non-ZODB Content in ZCatalog
-
- The ZCatalog is such a useful and powerful tool for searching,
- it's possible that you may want to use it to search data that
- is stored in places other than in the ZODB. Later in this book,
- you'll learn about storing data in relational databases and
- being able to access and view that data from Zope. While Zope
- excels at working with relational databases, many databases
- have poor full-text-indexing capabilities. In addition, site
- visitors may want to search your site, as described above, for
- a single phrase, like "jungle", and not know or care if the
- information they're looking for is in the ZODB or in a
- relational database.
-
- To help with this, you can store information about relational
- database information in the ZCatalog, too. It's an advanced
- technique, and will require that you understand ZSQLMethods
- (described in the relational database chapter) and Python
- scripting. You can learn about this technique in
- "Cataloging SQL Data and Almost Anything Else":http://zope.org/Members/rbickers/cataloganything
-
- Add-On Index Types
-
- TextIndexNG
-
- TextIndexNG is a new text index that competes with ZCTextIndex.
- Unlike ZCTextIndex, TextIndexNG is an add-on product that must be
- separately installed. It offers a large number of features:
-
- - Document Converters
-
- If your attribute value isn't plain text, TextIndexNG can convert
- it to text to index it. This will allow you to store, for
- instance, a PDF file in Zope
- and be able to search the text of that PDF file. Current
- formats it can convert are: HTML, PDF, Postscript, Word,
- Powerpoint, and OpenOffice.
-
- - Stemmer Support
-
- Reduces words to a stem (removes verb endings and
- plural-endings), so a user can search for "car" and get "car"
- and "cars", without having to try the search twice. It
- knows how to perform stemming in 13 different languages.
-
- - Similarity Search
-
- Can find words that are "similar" to your words, based on
- the Levenshtein algorithm. Essentially, this measures the
- distance between two terms using indicators such as how
- many letters differ from one to another.
-
- - Near Search
-
- Can look for words that are near each other. For example,
- a search for "Zope near Book" would find results where
- these words were close to each other in the document.
-
- - Customizable Parsers
-
- Rather than having only one way to express a query, TextIndexNG
- uses a "pluggable" architecture where Python programmers can
- create new parsers. For example, to find a document that
- includes the word "snake" but not the word "python", you'd
- search for "snake andnot python" in the default parser.
- However, given your users' expectations (and native language),
- they might prefer to say "snake and not python" or "snake
- -python" or such. TextIndexNG comes with three different
- parsers: a rich, default one, a simple one that is suitable for
- more general searching, and a German one that uses
- German-language words ("nicht" for "not", for example).
- Although writing a new parser is an advanced task, it would be
- possible for you to do so if you wanted to let users express
- the question in a different form.
-
- - Stop Words
-
- You can customize the list of "stop words" that are too common
- to bother indexing or searching for.
-
- - Wildcard Search
-
- You can use a "wildcard" to search for part of a word, such as
- "doc*" to find all words starting with "doc". Unlike
- ZCTextIndex, you can also use wildcards at the start of a
- word, such as "*doc" to find all words ending with "doc", as
- well.
-
- - Normalization Support
-
- Removing accented characters so that users can search for an
- accented word without getting the accents exactly right.
-
- - Auto-Expansion
-
- This optional feature allows you to get better search results
- when some of the query terms could not be found. In this
- case, it uses a similarity matching to "expand" the query
- term to find more matches.
-
- - Ranking Support
-
- Sorting of results based on their word frequencies,
- similar to the sorting capabilities of ZCTextIndex.
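-
- The Levenshtein distance mentioned under Similarity Search counts the
- single-character insertions, deletions, and substitutions needed to
- turn one word into another. Below is a minimal textbook sketch in plain
- Python; it is not TextIndexNG's own implementation, and the function
- name 'levenshtein' is chosen here only for illustration.
-
```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```
-
- A similarity search could then treat words within a small distance of
- the query term, such as "frog" and "frogs", as candidate matches.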
-
- TextIndexNG is an excellent replacement for ZCTextIndex,
- especially if you have non-English language documents or expect to
- have users that will want to use a rich query syntax.
-
- Full information on TextIndexNG is available at
- "http://www.zope.org/Members/ajung/TextIndexNG":http://www.zope.org/Members/ajung/TextIndexNG.
-
- FieldedTextIndex
-
- FieldedTextIndex is a new index type that is not (yet) a standard
- part of Zope, but is a separate product that can be installed
- and used with a standard catalog.
-
- Often, a site will have a combined field (normally
- 'PrincipiaSearchSource' or 'SearchableText', as described above)
- for site-wide searching, and individual fields for more
- content-aware searching, such as the indexes on 'latin_name',
- 'environment', etc.
-
- Since it slows down performance to concatenate catalog result
- sets directly, the best strategy for searching across many fields
- is often to use the 'PrincipiaSearchSource'/'SearchableText'
- strategy of a single text index. However, this can be *too*
- limiting, as sometimes users want to search in several fields at
- once, rather than in all.
-
- FieldedTextIndex solves these problems by extending the standard
- ZCTextIndex so that it can receive and index the textual data of an
- object's field attributes as a mapping of field names to field
- text. The index itself performs the aggregation of the fielded
- data and allows queries to be performed across all fields (like a
- standard text index) or any subset of the fields which have been
- encountered in the objects indexed.
-
- In other words, a normal 'PrincipiaSearchSource' method would
- look something like this::
-
- # concatenate all fields user might want to search
- def PrincipiaSearchSource(self):
- return self.title + ' ' + self.description + ' ' \
- + self.latin_name + ' ' + self.environment
-
- However, you have to search this all at once--you can't opt to
- search just 'title' and 'latin_name', unless you created separate
- indexes for these fields. Creating separate indexes for these
- fields is a waste of space and memory, though, as the same
- information is indexed several times.
-
- With FieldedTextIndex, your 'PrincipiaSearchSource' method would
- look like this::
-
- # return all fields user might want to search
- def PrincipiaSearchSource(self):
- return { 'title':self.title,
- 'description':self.description,
- 'latin_name':self.latin_name,
- 'environment':self.environment }
-
- This index can be searched with the normal methods::
-
- # search like a normal index
- zcat=context.AnimalCatalog
- results=zcat({'PrincipiaSearchSource':'jungle'})
-
- In addition, it can be searched indicating which fields you want
- to search::
-
- # search only specific fields
- zcat=context.AnimalCatalog
- results=zcat(
- {'PrincipiaSearchSource':{'query':'jungle',
- 'fields':['title','latin_name']}})
-
- In this second example, only 'title' and 'latin_name' will be
- searched.
-
- In addition, FieldedTextIndexes support *weighing*, so that
- some fields "weigh" more than others in the query, and a match in
- a heavily weighted field moves the object earlier in the
- result list. For example, in our zoo, matching part of an animal's
- 'latin_name' should count very highly, matching part of the
- 'title' should count highly, and matching part of the description
- should count less so.
-
- We can specify the weighing like this::
-
- # search with weighing
- zcat=context.AnimalCatalog
- results=zcat(
- {'PrincipiaSearchSource':{'query':'jungle',
- 'field_weights':{
- 'latin_name':3,
- 'title':2,
- 'description':1 }}})
-
- This is a *very* powerful feature for building a comprehensive
- search strategy for a site, since it lets us control the results to
- better give the user what they probably want, rather than returning
- documents based solely on how many times their search word appears.
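-
- The effect of field weights can be illustrated with a toy score in
- plain Python. This is a deliberately simplified sketch and not the
- ranking that FieldedTextIndex actually computes: 'weighted_score' is a
- hypothetical function that multiplies the number of whole-word matches
- in each field by that field's weight.
-
```python
def weighted_score(fields, term, field_weights):
    """Sum of (matches of term in a field) * (weight of that field)."""
    term = term.lower()
    score = 0
    for name, text in fields.items():
        hits = text.lower().split().count(term)
        # Fields without an explicit weight count once.
        score += hits * field_weights.get(name, 1)
    return score


weights = {'latin_name': 3, 'title': 2, 'description': 1}

tiger = {'title': 'Jungle tiger',
         'latin_name': 'Panthera tigris',
         'description': 'A large striped cat'}
parrot = {'title': 'Parrot',
          'latin_name': 'Psittacus erithacus',
          'description': 'Lives high in the jungle canopy'}

tiger_score = weighted_score(tiger, 'jungle', weights)
parrot_score = weighted_score(parrot, 'jungle', weights)
```
-
- Both animals mention "jungle" exactly once, but the match in the
- tiger's 'title' counts double, so the tiger would be listed first.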
-
- The examples given here are for searching a FieldedTextIndex using
- PythonScripts; however, they can be searched directly from the
- REQUEST in a form like other fields.
-
- Since a FieldedTextIndex can act just like a normal ZCTextIndex if
- queried with just a search string, yet offer additional features
- above and beyond the normal ZCTextIndex, it's a good idea to use
- this for any text index where you'd concatenate more than one
- attribute or method result together, such as for 'SearchableText'
- or 'PrincipiaSearchSource'.
-
- FieldedTextIndex can be downloaded at
- "http://zope.org/Members/Caseman/FieldedTextIndex":http://zope.org/Members/Caseman/FieldedTextIndex.
- Full documentation on how to create this type of index, and further
- information on how to search it, including how to search it from
- web forms, is available in the README file that comes with this
- product.
-
- Conclusion
-
- The cataloging features of ZCatalog allow you to search your objects
- for certain attributes very quickly. This can be very useful for sites
- with lots of content that many people need to be able to search in an
- efficient manner.
-
- Searching the ZCatalog works a lot like searching a relational
- database, except that the searching is more object-oriented. Not all
- data models are object-oriented however, so in some cases you will want
- to use the ZCatalog, but in other cases you may want to use a
- relational database. The next chapter goes into more details about how
- Zope works with relational databases, and how you can use relational
- data as objects in Zope.
Copied: zope2book/trunk/source/SearchingZCatalog.rst (from rev 96426, zope2book/trunk/SearchingZCatalog.stx)
===================================================================
--- zope2book/trunk/source/SearchingZCatalog.rst (rev 0)
+++ zope2book/trunk/source/SearchingZCatalog.rst 2009-02-10 23:26:53 UTC (rev 96431)
@@ -0,0 +1,2268 @@
+Searching and Categorizing Content
+==================================
+
+The ZCatalog is Zope's built in search engine. It allows you to
+categorize and search all kinds of Zope objects. You can also use it
+to search external data such as relational data, files, and remote
+web pages. In addition to searching you can use the ZCatalog to
+organize collections of objects.
+
+The ZCatalog supports a rich query interface. You can perform full text
+searching, can search multiple indexes at once, and can even specify
+weighing for different fields in your results. In addition, the
+ZCatalog keeps track of meta-data about indexed objects.
+
+The two most common ZCatalog usage patterns are:
+
+Mass Cataloging
+ Cataloging a large collection of objects all at once.
+
+Automatic Cataloging
+ Cataloging objects as they are created and tracking changes made to them.
+
+Getting started with Mass Cataloging
+------------------------------------
+
+Let's take a look at how to use the ZCatalog to search documents.
+Cataloging a bunch of objects all at once is called *mass cataloging*.
+Mass cataloging involves four steps:
+
+- Creating a ZCatalog
+
+- Creating indexes
+
+- Finding objects and cataloging them
+
+- Creating a web interface to search the ZCatalog.
+
+Creating a ZCatalog
+-------------------
+
+Choose *ZCatalog* from the product add list to create a ZCatalog
+object within a subfolder named 'Zoo'. This takes you to the
+ZCatalog add form, as shown in the figure below.
+
+.. figure:: ../Figures/creatingzcatalog.png
+
+ ZCatalog add form
+
+The Add form asks you for an *Id* and a *Title*. Give your
+ZCatalog the Id 'AnimalCatalog' and click *Add* to create your new
+ZCatalog. The ZCatalog icon looks like a folder with a small
+magnifying glass on it. Select the *AnimalCatalog* icon to see
+the *Contents* view of the ZCatalog.
+
+A ZCatalog looks a lot like a folder, but it has a few more
+tabs. Six tabs on the ZCatalog are the exact same six tabs you
+find on a standard folder. ZCatalogs have the following views:
+*Contents*, *Catalog*, *Properties*, *Indexes*, *Metadata*,
+*Find Objects*, *Advanced*, *Undo*, *Security*, and *Ownership*.
+When you click on a ZCatalog, you are on the *Contents*
+view. Here, you can add new objects and the ZCatalog will
+contain them just as any folder does. Although a ZCatalog is
+like a normal Zope folder, this does not imply that the objects
+contained within it are automatically searchable. A ZCatalog
+can catalog objects at any level of your site, and it needs to
+be told exactly which ones to index.
+
+Creating Indexes
+~~~~~~~~~~~~~~~~
+
+In order to tell Zope what to catalog and where to store the
+information, we need to create a *Lexicon* and an *Index*. A
+*Lexicon* is necessary to provide word storage services for
+full-text searching, and an *Index* is the object which stores
+the data necessary to perform fast searching.
+
+In the contents view of the *AnimalCatalog* ZCatalog, choose
+*ZCTextIndex Lexicon*, and give it an id of *zooLexicon*.
+
+.. figure:: ../Figures/creatinglexicon.png
+
+ ZCTextIndex Lexicon add form
+
+Now we can create an index that will record the information we
+want to have in the ZCatalog. Click on the *Indexes* tab of the
+ZCatalog. A drop down menu lists the available indexes. Choose
+*ZCTextIndex*; in the add form fill in the id *zooTextIdx*.
+Fill in *PrincipiaSearchSource* in the "Field name" input. This
+tells the ZCTextIndex to index the body text of the DTML
+Documents (*PrincipiaSearchSource* is an API method of all DTML
+Document and Method objects). Note that *zooLexicon* is
+preselected in the *Lexicon* menu.
+
+.. figure:: ../Figures/creatingtextindex.png
+
+ ZCTextIndex add form
+
+.. note::
+
+ When you want the textindex to work on other types of objects,
+ they have to provide a method named "PrincipiaSearchSource" which
+ returns the data of the object which has to be searched.
+
+To keep this example short we will skip over some of the options
+presented here. In the section on indexes below, we will
+discuss this more thoroughly.
+
+Additionally, we will have to tell the ZCatalog which attributes
+of each cataloged object that it should store directly. These
+attributes are called *Metadata*; however, they should not be
+confused with the idea of metadata in Zope CMF, Plone, or other
+content management systems--here, this just means that these are
+attributes that will be stored directly in the catalog for
+performance benefits. For now, just go to the
+*Metadata* tab of the ZCatalog and add *id* and *title*.
+
+Finding and Cataloging Objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Now that you have created a ZCatalog and an Index, you can move
+onto the next step: finding objects and cataloging them.
+Suppose you have a zoo site with information about animals. To
+work with these examples, create two DTML Documents along-side
+the *AnimalCatalog* object (within the same folder that contains
+the *AnimalCatalog* ZCatalog) that contain information about
+reptiles and amphibians.
+
+The first should have an Id of "chilean_frog", a title "Chilean
+four-eyed frog" and its body text should read something like
+this::
+
+ The Chilean four-eyed frog has a bright
+ pair of spots on its rump that look like enormous eyes. When
+ seated, the frog's thighs conceal these eyespots. When
+ predators approach, the frog lowers its head and lifts its
+ rump, creating a much larger and more intimidating head.
+ Frogs are amphibians.
+
+For the second, fill in an id of "carpet_python" and a title of
+"Carpet Python"; its body text could be::
+
+ *Morelia spilotes variegata* averages 2.4 meters in length. It
+ is a medium-sized python with black-to-gray patterns of
+ blotches, crossbands, stripes, or a combination of these
+ markings on a light yellowish-to-dark brown background. Snakes
+ are reptiles.
+
+Visitors to your Zoo want to be able to search for information on
+the Zoo's animals. Eager herpetologists want to know if you have
+their favorite snake, so you should provide them with the ability
+to search for certain words and show all the documents that
+contain those words. Searching is one of the most useful and
+common web activities.
+
+The *AnimalCatalog* ZCatalog you created can catalog all of the
+documents in your Zope site and let your users search for specific
+words. To catalog your documents, go to the *AnimalCatalog*
+ZCatalog and click on the *Find Objects* tab.
+
+In this view, you tell the ZCatalog what kind of objects you are
+interested in. You want to catalog all DTML Documents so select
+*DTML Document* from the *Find objects of type* multiple selection
+and click *Find and Catalog*.
+
+The ZCatalog will now start from the folder where it is located
+and search for all DTML Documents. It will search the folder and
+then descend down into all of the sub-folders and their
+sub-folders. For example, if your ZCatalog is located at
+'/Zoo/AnimalCatalog', then the '/Zoo' folder and all its
+subfolders will get searched.
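+
+The recursive search described above can be pictured with a small
+plain-Python sketch. It makes no use of Zope: nested dictionaries stand
+in for folders, and `find_objects` is a hypothetical helper, not the
+ZCatalog's own method.
+
```python
def find_objects(folder, path, meta_type):
    """Collect paths of objects of `meta_type` under `folder`, recursively."""
    found = []
    for name, obj in sorted(folder['children'].items()):
        child_path = path + '/' + name
        if obj['meta_type'] == meta_type:
            found.append(child_path)
        # Descend into anything that contains further objects.
        if obj.get('children'):
            found.extend(find_objects(obj, child_path, meta_type))
    return found


# A pretend '/Zoo' folder holding the catalog, one document, and a subfolder.
zoo = {
    'meta_type': 'Folder',
    'children': {
        'AnimalCatalog': {'meta_type': 'ZCatalog'},
        'chilean_frog': {'meta_type': 'DTML Document'},
        'Reptiles': {
            'meta_type': 'Folder',
            'children': {
                'carpet_python': {'meta_type': 'DTML Document'},
            },
        },
    },
}

documents = find_objects(zoo, '/Zoo', 'DTML Document')
```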
+
+If you have lots and lots of objects, this may take a long time
+to complete, so be patient.
+
+After a period of time, the ZCatalog will take you to the *Catalog*
+view automatically, with a status message telling you what it just
+did.
+
+Below the status information is a list of objects that are
+cataloged; they are all DTML Documents. To confirm that these are
+the objects you are interested in, you can click on them to visit
+them. Viewing an object in the catalog shows you what was indexed
+for the object, and what metadata items are stored for it.
+
+You have now completed the third step, cataloging your objects
+into a ZCatalog, and your documents are in the ZCatalog's
+database. You can move on to the fourth step, creating a web
+page and result form to query the ZCatalog.
+
+Search and Report Forms
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To create search and report forms, make sure you are inside the
+*AnimalCatalog* ZCatalog and select *Z Search Interface* from the
+add list. Select the *AnimalCatalog* ZCatalog as the searchable
+object, as shown in the figure below.
+
+.. figure:: ../Figures/creatingsearchinterface.png
+
+ Creating a search form for a ZCatalog
+
+Name the *Report Id* "SearchResults", the *Search Input Id*
+"SearchForm", select "Generate Page Templates" and click *Add*.
+This will create two new Page Templates in the *AnimalCatalog*
+ZCatalog named *SearchForm* and *SearchResults*.
+
+These objects are *contained in* the ZCatalog, but they are not
+*cataloged by* the ZCatalog. The *AnimalCatalog* has only
+cataloged DTML Documents. The search Form and Report templates
+are just a user interface to search the animal documents in the
+ZCatalog. You can verify this by noting that the search and
+report forms are not listed in the *Cataloged Objects* tab.
+
+To search the *AnimalCatalog* ZCatalog, select the *SearchForm*
+template and click on its *Test* tab.
+
+By typing words into the *ZooTextIdx* form element you can
+search all of the documents cataloged by the *AnimalCatalog*
+ZCatalog. For example, type in the word "Reptiles". The
+*AnimalCatalog* ZCatalog will be searched and return a simple
+table of objects that have the word "Reptiles" in them. The
+search results should include the carpet python. You can also
+try specifying multiple search terms like "reptiles OR
+amphibians". Search results for this query should include both
+the Chilean four-eyed Frog and the carpet python.
+Congratulations, you have successfully created a ZCatalog,
+cataloged content into it and searched it through the web.
+
+Configuring ZCatalogs
+---------------------
+
+The ZCatalog is capable of much more powerful and complex searches
+than the one you just performed. Let's take a look at how the
+ZCatalog stores information. This will help you tailor your
+ZCatalogs to provide the sort of searching you want.
+
+Defining Indexes
+~~~~~~~~~~~~~~~~
+
+ZCatalogs store information about objects and their contents in
+fast databases called *indexes*. Indexes can store and retrieve
+large volumes of information very quickly. You can create
+different kinds of indexes that remember different kinds of
+information about your objects. For example, you could have one
+index that remembers the text content of DTML Documents, and
+another index that remembers any objects that have a specific
+property.
+
+When you search a ZCatalog you are not searching through your
+objects one by one. That would take far too much time if you had
+a lot of objects. Before you search a ZCatalog, it looks at
+your objects and remembers whatever you tell it to remember
+about them. This process is called *indexing*. From then on,
+you can search for certain criteria and the ZCatalog will return
+objects that match the criteria you provide.
+
+An index in a ZCatalog works much like the index in a book.
+For example, in a book's index you can look up
+the word *Python*::
+
+ Python: 23, 67, 227
+
+The word *Python* appears on three pages. Zope indexes work
+like this except that they map the search term, in this case the
+word *Python*, to a list of all the objects that contain it,
+instead of a list of pages in a book.
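+
+The book-index analogy can be sketched in a few lines of plain
+Python. This is only an illustration of the idea, not ZCatalog's
+implementation; the 'documents' data and the 'build_index' helper
+are hypothetical::
+
+  from collections import defaultdict
+
+  # Hypothetical documents, keyed by id.
+  documents = {
+      "carpet_python": "Snakes are reptiles",
+      "four_eyed_frog": "Frogs are amphibians",
+  }
+
+  def build_index(docs):
+      """Map each lower-cased word to the set of ids that contain it."""
+      index = defaultdict(set)
+      for doc_id, text in docs.items():
+          for word in text.lower().split():
+              index[word].add(doc_id)
+      return index
+
+  index = build_index(documents)
+  # Looking up a word is a dictionary access, not a scan of every document.
+  hits = sorted(index["reptiles"])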
+
+In Zope 2.6, indexes can be added and removed from a ZCatalog
+using the "pluggable" index interface as shown in the figure below:
+
+.. figure:: ../Figures/managingindexes.png
+
+ Managing indexes
+
+Each index has a name, like *PrincipiaSearchSource*,
+and a type, like *ZCTextIndex*.
+
+When you catalog an object the ZCatalog uses each index to
+examine the object. The ZCatalog consults attributes and methods
+to find an object's value for each index. For example, in the
+case of the DTML Documents cataloged with a
+'PrincipiaSearchSource' index, the ZCatalog calls each document's
+'PrincipiaSearchSource' method and records the results in its
+'PrincipiaSearchSource' index. If the ZCatalog cannot find an
+attribute or method for an index, then it ignores it. In other
+words it's fine if an object does not support a given
+index. There are eight kinds of indexes that come standard with
+Zope 2.6, and others that can be added. The standard eight are:
+
+ZCTextIndex
+ Searches text. Use this kind of index when you
+ want a full-text search.
+
+FieldIndex
+ Searches objects for specific values. Use this
+ kind of index when you want to search objects, numbers, or
+ specific strings.
+
+KeywordIndex
+ Searches collections of specific values. This
+ index is like a FieldIndex, but it allows you to search
+ collections rather than single values.
+
+PathIndex
+ Searches for all objects that contain certain URL
+ path elements. For example, you could search for all the
+ objects whose paths begin with '/Zoo/Animals'.
+
+TopicIndex
+ Searches among FilteredSets; each set contains
+ the document IDs of documents which match the set's filter
+ expression. Use this kind of index to optimize
+ frequently-accessed searches.
+
+DateIndex
+ A subclass of FieldIndex, optimized for date-time
+ values. Use this index for any field known to be a date or a
+ date-time.
+
+DateRangeIndex
+ Searches objects based on a pair of dates /
+ date-times. Use this index to search for objects which are
+ "current" or "in effect" at a given time.
+
+TextIndex
+ Old version of a full-text index. Provided only
+ for backward compatibility; use ZCTextIndex instead.
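+
+The attribute-or-method lookup described before the list of index
+types can be pictured roughly as follows. This is a plain-Python
+sketch under the assumption that the lookup behaves like 'getattr'
+followed by a call when the value is callable; 'value_for_index',
+'Doc' and '_MISSING' are hypothetical names, not ZCatalog API::
+
+  _MISSING = object()  # sentinel meaning "this object has no such value"
+
+  def value_for_index(obj, index_name):
+      """Return the value an index named index_name would record for obj."""
+      value = getattr(obj, index_name, _MISSING)
+      if value is _MISSING:
+          return _MISSING      # the object is simply skipped for this index
+      if callable(value):
+          value = value()      # methods are called, attributes used as-is
+      return value
+
+  class Doc:
+      """Hypothetical stand-in for a cataloged document."""
+      title = "Carpet Python"
+
+      def PrincipiaSearchSource(self):
+          return "Snakes are reptiles."
+
+  doc = Doc()
+  title_value = value_for_index(doc, "title")
+  text_value = value_for_index(doc, "PrincipiaSearchSource")
+  missing_value = value_for_index(doc, "keywords")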
+
+We'll examine these different indexes more closely later in the
+chapter. New indexes can be created from the *Indexes* view of a
+ZCatalog. There, you can enter the *name* and select a *type*
+for your new index. This creates a new empty index in the
+ZCatalog. To populate this index with information, you need to
+go to the *Advanced* view and click the *Update Catalog*
+button. Recataloging your content may take a while if you have
+lots of cataloged objects. For a ZCTextIndex, you will also
+need a *ZCTextIndex Lexicon* object in your ZCatalog - see below
+for details.
+
+To remove an index from a ZCatalog, select the index in the
+*Indexes* view and click on the *Delete* button. This will delete
+the index and all of its indexed content. As usual, this
+operation is undoable.
+
+Defining Meta Data
+~~~~~~~~~~~~~~~~~~
+
+The ZCatalog can not only index information about your objects,
+but can also store information about them in a *tabular
+database* called the *Metadata Table*. The *Metadata Table*
+works much like a relational database table: it consists of one
+or more *columns* that define the *schema* of the table, and it
+is filled with *rows* holding the information about cataloged
+objects that you want to store. Your metadata columns don't need
+to match your ZCatalog's indexes. Indexes allow you to search;
+metadata allows you to report search results.
+
+The Metadata Table is useful for generating search reports. It
+keeps track of information about objects that goes on your
+report forms. For example, if you create a Metadata Table
+column called *Title*, then your report forms can use this
+information to show the titles of your objects that are returned
+in search results instead of requiring that you actually obtain
+the object to show its title.
+
+To add a new Metadata Table column, type in the name of the column
+on the *Metadata Table* view and click *Add*. To remove a column
+from the Metadata Table, select the column check box and click on
+the *Delete* button. This will delete the column and all of its
+content for each row. As usual, this operation is undoable.
+
+While metadata columns are useful, there are performance tradeoffs
+from using too many. As more metadata columns are added, the
+catalog itself becomes larger (and slower), and getting the
+result objects becomes more memory- and performance-intensive.
+Therefore, you should choose metadata columns only for those
+fields that you'll want to show on common search results.
+Consider carefully before adding a field that returns a large
+result (like the full text of a document) to metadata.
+
+Next, let's look more closely at how to search a ZCatalog.
+
+Searching ZCatalogs
+-------------------
+
+You can search a ZCatalog by passing it search terms. These search
+terms describe what you are looking for in one or more indexes. The
+ZCatalog can glean this information from the web request, or you
+can pass this information explicitly from DTML or Python. In
+response to a search request, a ZCatalog will return a list of
+records corresponding to the cataloged objects that match the
+search terms.
+
+Searching with Forms
+~~~~~~~~~~~~~~~~~~~~
+
+In this chapter you used the *Z Search Interface* to
+automatically build a Form/Action pair to query a ZCatalog (the
+Form/Action pattern is discussed in the chapter entitled
+`Advanced Page Templates <AdvZPT.html>`_ ). The *Z Search
+Interface* builds a very simple form and a very simple
+report. These two templates are a good place to start
+understanding how ZCatalogs are queried and how you can
+customize and extend your search interface.
+
+Suppose you have a ZCatalog that holds news items named
+'NewsCatalog'. Each news item has 'content', an 'author' and a
+'date' attribute. Your ZCatalog has three indexes that
+correspond to these attributes, namely "contentTextIdx",
+"author" and "date". The contents index is a ZCTextIndex, and
+the author and date indexes are a FieldIndex and a DateIndex.
+For the ZCTextIndex you will need a ZCTextIndex Lexicon, and to
+display the search results in the 'Report' template, you should
+add the 'author', 'date' and 'absolute_url' attributes as
+Metadata. Here is a search form that would allow you to query
+such a ZCatalog::
+
+ <html><body>
+ <form action="Report" method="get">
+ <h2 tal:content="template/title_or_id">Title</h2>
+ Enter query parameters:<br><table>
+ <tr><th>Author</th>
+ <td><input name="author" width=30 value=""></td></tr>
+ <tr><th>Content</th>
+ <td><input name="contentTextIdx" width=30 value=""></td></tr>
+ <tr><th>Date</th>
+ <td><input name="date" width=30 value=""></td></tr>
+ <tr><td colspan=2 align=center>
+ <input type="SUBMIT" name="SUBMIT" value="Submit Query">
+ </td></tr>
+ </table>
+ </form>
+ </body></html>
+
+This form consists of three input boxes named 'contentTextIdx',
+'author', and 'date'. These names must match the names of the
+ZCatalog's indexes for the ZCatalog to find the search terms.
+Here is a report form that works with the search form::
+
+  <html>
+  <body tal:define="searchResults here/NewsCatalog;">
+  <table border>
+    <tr>
+      <th>Item no.</th>
+      <th>Author</th>
+      <th>Date</th>
+    </tr>
+    <tr tal:repeat="item searchResults">
+      <td>
+        <a href="link to object" tal:attributes="href item/absolute_url">
+          #<span tal:replace="repeat/item/number">
+            search item number goes here
+          </span>
+        </a>
+      </td>
+      <td><span tal:replace="item/author">author goes here</span></td>
+      <td><span tal:replace="item/date">date goes here</span></td>
+    </tr>
+  </table>
+  </body></html>
+
+There are a few things going on here which merit closer
+examination. The heart of the whole thing is in the definition
+of the 'searchResults' variable::
+
+ <body tal:define="searchResults here/NewsCatalog;">
+
+This calls the 'NewsCatalog' ZCatalog. Notice how the form
+parameters from the search form ( 'contentTextIdx' ,
+'author', 'date' ) are not mentioned here at all.
+Zope automatically makes sure that the query parameters from the
+search form are given to the ZCatalog. All you have to do is
+make sure the report form calls the ZCatalog. Zope locates the
+search terms in the web request and passes them to the ZCatalog.
+
+The ZCatalog returns a sequence of *Record Objects* (just like
+ZSQL Methods). These record objects correspond to *search
+hits*, which are objects that match the search criteria you
+typed in. For a record to match a search, it must match all
+criteria for each specified index. So if you enter an author and
+some search terms for the contents, the ZCatalog will only return
+records that match both the author and the contents.
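+
+The rule that a record must match every index you specify amounts
+to intersecting the hits from each index. A minimal plain-Python
+sketch, assuming each index has already produced a set of matching
+record ids (the ids and the 'search' helper are hypothetical)::
+
+  # Hypothetical per-index results: ids of records matching each term.
+  author_hits = {"news1", "news3"}
+  content_hits = {"news2", "news3"}
+
+  def search(*per_index_hits):
+      """Keep only the records that appear in every index's hits."""
+      results = set(per_index_hits[0])
+      for hits in per_index_hits[1:]:
+          results &= hits
+      return results
+
+  both = search(author_hits, content_hits)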
+
+ZSQL Record objects have an attribute for every column in the
+database table. Record objects for ZCatalogs work very
+similarly, except that a ZCatalog Record object has an attribute
+for every column in the Metadata Table. In fact, the purpose of
+the Metadata Table is to define the schema for the Record
+objects that ZCatalog queries return.
+
+Searching from Python
+~~~~~~~~~~~~~~~~~~~~~
+
+Page Templates make querying a ZCatalog from a form very simple.
+For the most part, Page Templates will automatically make sure
+your search parameters are passed properly to the ZCatalog.
+
+Sometimes, though, you may not want to search a ZCatalog from a web
+form; some other part of your application may want to query a
+ZCatalog. For example, suppose you want to add a sidebar to the
+Zope Zoo that shows news items that only relate to the animals
+in the section of the site that you are currently looking at.
+As you've seen, the Zope Zoo site is built up from Folders that
+organize all the sections according to animal. Each Folder's id
+is a name that specifies the group or animal the folder
+contains. Suppose you want your sidebar to show you all the
+news items that contain the id of the current section. Here is
+a Script called 'relevantSectionNews' that queries the news
+ZCatalog with the current folder's id::
+
+ ## Script (Python) "relevantSectionNews"
+ ##
+ """ Returns news relevant to the current folder's id """
+ id=context.getId()
+ return context.NewsCatalog({'contentTextIdx' : id})
+
+This script queries the 'NewsCatalog' by calling it like a
+method. ZCatalogs expect a *mapping* as the first argument when
+they are called. The argument maps the name of an index to the
+search terms you are looking for. In this case, the
+'contentTextIdx' index will be queried for all news items that
+contain the name of the current Folder. To use this in your
+sidebar, you could insert this snippet where appropriate in
+the main ZopeZoo Page Template::
+
+ ...
+ <ul>
+ <li tal:repeat="item here/relevantSectionNews">
+ <a href="news link" tal:attributes="href item/absolute_url">
+ <span tal:replace="item/title">news title</span>
+ </a>
+ </li>
+ </ul>
+ ...
+
+This template assumes that you have defined 'absolute_url' and
+'title' as Metadata columns in the 'NewsCatalog'. Now, when you
+are in a particular section, the sidebar will show a simple list
+of links to news items that contain the id of the current animal
+section you are viewing. (Note: in practice, you shouldn't
+store 'absolute_url' as a metadata column, but should rely
+instead on the getURL() method described below, as that works
+even in virtual hosting settings.)
+
+Methods of Search Results
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The list of results you get for a catalog search is actually
+a list of Catalog Brain objects. In addition to having an
+attribute for each item of your metadata, they also have
+several useful methods:
+
+has_key(key)
+ Returns true if the result object has a meta-data element
+ named key.
+
+getPath()
+ Returns the physical path of the result object. This can be
+ used to uniquely identify each object if some kind of
+ post-processing is performed.
+
+getURL()
+ Returns the URL of the result object. You should use this
+ instead of creating a metadata element for 'absolute_url'.
+ This can differ from getPath() if you are using virtual hosting.
+
+getObject()
+ Returns the actual Zope object from the result object. This
+ is useful if you want to examine or show an attribute or
+ method of the object that isn't in the metadata--once we have
+ the actual object, we can get any normal attribute or method
+ of it. However, be careful not to use this instead of defining
+ metadata. Metadata, being stored in the catalog, is
+ pre-calculated and quickly accessed; getting the same type of
+ information by using 'getObject().attribute_name' requires
+ actually pulling your real object from the ZODB and may be
+ a good deal slower. On the other hand, stuffing everything
+ you might ever need into metadata will slow down all querying
+ of your catalog, so you'll want to strike a balance. A good
+ idea is to list in metadata those things that would normally
+ appear on a tabular search results form; other things that
+ might be needed less commonly (and for fewer result objects
+ at a time) can be retrieved with getObject().
+
+getRID()
+ Returns the Catalog's record id for the result object. This
+ is an implementation detail, and is not useful except for
+ advanced uses.
+
+Searching and Indexing Details
+------------------------------
+
+Earlier you saw that the ZCatalog includes eight types of
+indexes. Let's examine these indexes more closely, and look
+at some of the additional available indexes, to understand
+what they are good for and how to search them.
+
+Searching ZCTextIndexes
+~~~~~~~~~~~~~~~~~~~~~~~
+
+A ZCTextIndex is used to index text. After indexing, you can
+search the index for objects that contain certain words.
+ZCTextIndexes support a rich search grammar for doing more
+advanced searches than just looking for a word.
+
+Boolean expressions
+%%%%%%%%%%%%%%%%%%%
+
+ Search for Boolean expressions
+ like::
+
+ word1 AND word2
+
+ This will search for all objects that contain *both* "word1"
+ and "word2". Valid Boolean operators include AND, OR, and
+ NOT. A synonym for NOT is a leading hyphen::
+
+ word1 -word2
+
+ which would search for occurrences of "word1" but would
+ exclude documents which contain "word2". A sequence of words
+ without operators implies AND. A search for "carpet python
+ snakes" translates to "carpet AND python AND snakes".
+
+Parentheses
+%%%%%%%%%%%
+
+ Control search order with parenthetical
+ expressions::
+
+ (word1 AND word2) OR word3
+
+ This will return objects containing "word1" and "word2" *or*
+ just objects that contain the term "word3".
+
+Wild cards
+%%%%%%%%%%
+
+ Search for wild cards
+ like::
+
+ Z*
+
+ which returns all words that begin with "Z",
+ or::
+
+ Zop?
+
+ which returns all words that begin with "Zop" and have one
+ more character - just like in a Un*x shell. Note though that
+ wild cards cannot be at the beginning of a search phrase.
+ "?ope" is an illegal search term and will be ignored.
+
+Phrase search
+%%%%%%%%%%%%%
+
+ Double-quoted text implies phrase search,
+ for example::
+
+ "carpet python" OR frogs
+
+ will search for all occurrences of the phrase "carpet python"
+ or of the word "frogs".
+
+All of these advanced features can be mixed together. For
+example::
+
+ ((bob AND uncle) AND NOT Zoo*)
+
+will return all objects that contain the terms "bob" and "uncle"
+but will not include any objects that contain words that start
+with "Zoo" like "Zoologist", "Zoology", or "Zoo" itself.
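+
+One way to picture how such a combined query is resolved is as set
+arithmetic over the documents matching each term: AND is an
+intersection and NOT is a difference. The sketch below is plain
+Python over hypothetical data ('words_by_doc' and both helpers are
+made up for illustration) and does not reflect how ZCTextIndex
+parses or scores queries::
+
+  # Hypothetical documents, each reduced to its set of lower-cased words.
+  words_by_doc = {
+      "doc1": {"bob", "uncle"},
+      "doc2": {"bob", "uncle", "zoology"},
+      "doc3": {"bob"},
+  }
+
+  def containing(word):
+      """Ids of documents that contain the exact word."""
+      return {d for d, words in words_by_doc.items() if word in words}
+
+  def containing_prefix(prefix):
+      """Ids of documents with any word starting with prefix (like Zoo*)."""
+      return {d for d, words in words_by_doc.items()
+              if any(w.startswith(prefix) for w in words)}
+
+  # ((bob AND uncle) AND NOT Zoo*)
+  result = (containing("bob") & containing("uncle")) - containing_prefix("zoo")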
+
+Similarly, a search
+for::
+
+ snakes OR frogs -"carpet python"
+
+will return all objects which contain the word "snakes" or
+"frogs" but do not contain the phrase "carpet python".
+
+Querying a ZCTextIndex with these advanced features works just
+like querying it with the original simple features. In the HTML
+search form for DTML Documents, for example, you could enter
+"Koala AND Lion" and get all documents about Koalas and Lions.
+Querying a ZCTextIndex from Python with advanced features works
+much the same; suppose you want to change your
+'relevantSectionNews' Script to not include any news items that
+contain the word "catastrophic"::
+
+ ## Script (Python) "relevantSectionNews"
+ ##
+ """ Returns relevant, non-catastrophic news """
+ id=context.getId()
+ return context.NewsCatalog(
+     {'contentTextIdx' : id + ' -catastrophic'}
+     )
+
+ZCTextIndexes are very powerful. When mixed with the Automatic
+Cataloging pattern described later in the chapter, they give you
+the ability to automatically full-text search all of your
+objects as you create and edit them.
+
+In addition, below we'll talk about TextIndexNG, a competing
+index type that can be added to Zope and that offers even more
+features for full-text indexing.
+
+Lexicons
+~~~~~~~~
+
+Lexicons are used by ZCTextIndexes. Lexicons process and store
+the words from the text and help in processing queries.
+
+Lexicons can:
+
+Normalize Case
+ Often you want search terms to be case insensitive, e.g., a search for
+ "python", "Python" or "pYTHON" should return the same results. The
+ lexicon's *Case Normalizer* does exactly that.
+
+Remove stop words
+ Stop words are words that are very common in a given language and should
+ be removed from the index. They would only cause bloat in the index and
+ add little information. In addition, stop words, being common words,
+ would appear in almost every page. Without this option turned on, a user
+ searching for "the python house" would get back practically every single
+ document on the site (since they would all likely contain "the"), taking
+ longer and adding no quality to their results.
+
+Split text into words
+ A splitter parses text into words. Different texts have different needs
+ of word splitting - if you are going to process HTML documents, you might
+ want to use the HTML aware splitter which effectively removes HTML tags.
+ On the other hand, if you are going to index plain text documents *about*
+ HTML, you don't want to remove HTML tags - people might want to look them
+ up. Also, a Chinese-language document, for example, has a different
+ concept of words and you might want to use a different splitter.
+
+The Lexicon uses a pipeline architecture. This makes it possible
+to mix and match pipeline components. For instance, you could
+implement a different splitting strategy for your language and
+use this pipeline element in conjunction with the standard text
+processing elements. Implementing a pipeline element is out of
+the scope of this book; for examples of implementing and
+registering a pipeline element, see, e.g.,
+'lib/python/Products/ZCTextIndex/Lexicon.py'. A pipeline
+element should conform to the 'IPipelineElement' interface.
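+
+To make the pipeline idea concrete, here is a small plain-Python
+sketch. The three classes are simplified stand-ins written for this
+example, not the elements shipped with ZCTextIndex, and the
+'process' method taking and returning a list is an assumption about
+the shape of the interface; check 'IPipelineElement' before relying
+on it. 'run_pipeline' is a hypothetical helper::
+
+  import re
+
+  class Splitter:
+      """Split each input string into words."""
+      def process(self, texts):
+          words = []
+          for text in texts:
+              words.extend(re.findall(r"\w+", text))
+          return words
+
+  class CaseNormalizer:
+      """Lower-case every word so searches are case insensitive."""
+      def process(self, words):
+          return [w.lower() for w in words]
+
+  class StopWordRemover:
+      """Drop very common words; this tiny stop list is made up."""
+      stop_words = frozenset(["the", "a", "of"])
+
+      def process(self, words):
+          return [w for w in words if w not in self.stop_words]
+
+  def run_pipeline(elements, text):
+      """Feed the output of each element into the next one."""
+      items = [text]
+      for element in elements:
+          items = element.process(items)
+      return items
+
+  pipeline = [Splitter(), CaseNormalizer(), StopWordRemover()]
+  terms = run_pipeline(pipeline, "The Python House")
+
+Because each element only sees a list and returns a list, swapping
+in a different splitter leaves the other elements untouched.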
+
+To create a ZCTextIndex, you first have to create a Lexicon
+object. Multiple ZCTextIndexes can share the same lexicon.
+
+Searching Field Indexes
+~~~~~~~~~~~~~~~~~~~~~~~
+
+*FieldIndexes* have different aims than ZCTextIndexes. A ZCTextIndex
+will treat the value it finds in your object, for example the
+contents of a News Item, like text. This means that it breaks
+the text up into words and indexes all the individual words.
+
+A FieldIndex does not break up the value it finds. Instead, it
+indexes the entire value it finds. This is very useful for
+tracking object attributes that contain simple values, such as
+numbers or short string identifiers.
+
+In the news item example, you created a FieldIndex
+'author'. With the existing search form, this field is
+not very useful. Unless you know exactly the name of the author
+you are looking for, you will not get any results. It would be
+better to be able to select from a list of all the *unique*
+authors indexed by the author index.
+
+There is a special method on the ZCatalog that does exactly this
+called 'uniqueValuesFor'. The 'uniqueValuesFor' method returns
+a list of unique values for a certain index. Let's change your
+search form and replace the original 'author' input box
+with something a little more useful::
+
+ <html><body>
+ <form action="Report" method="get">
+ <h2 tal:content="template/title_or_id">Title</h2>
+ Enter query parameters:<br><table>
+ <tr><th>Author</th>
+ <td>
+ <select name="author:list" size="6" multiple>
+ <option
+ tal:repeat="item python:here.NewsCatalog.uniqueValuesFor('author')"
+ tal:content="item">
+ opt value goes here
+ </option>
+ </select>
+ </td></tr>
+ <tr><th>Content</th>
+ <td><input name="contentTextIdx" width=30 value=""></td></tr>
+ <tr><th>Date</th>
+ <td><input name="date" width=30 value=""></td></tr>
+ <tr><td colspan=2 align=center>
+ <input type="SUBMIT" name="SUBMIT" value="Submit Query">
+ </td></tr>
+ </table>
+ </form>
+ </body></html>
+
+The new, important bit of code added to the search form
+is::
+
+ <select name="author:list" size="6" multiple>
+ <option
+ tal:repeat="item python:here.NewsCatalog.uniqueValuesFor('author')"
+ tal:content="item">
+ opt value goes here
+ </option>
+ </select>
+
+In this example, you are changing the form element 'author' from
+just a simple text box to an HTML multiple select box. This box
+contains a unique list of all the authors that are indexed in
+the 'author' FieldIndex. When the form gets submitted, the
+select box will contain the exact value of an author's name, and
+thus match against one or more of the news objects. Your search
+form should now look like the figure below.
+
+.. figure:: ../Figures/uniqueauthorsform.png
+
+ Range searching and unique Authors
+
+Be careful if you catalog objects with many different values; you
+can easily end up with a form with a thousand items in the drop-down
+menu. Also, items must match *exactly*, so strings that differ
+in capitalization will be considered different.
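+
+The exact-match caveat is easy to see in plain Python. The sketch
+below only imitates the idea of collecting unique values; the
+'authors' list and the 'unique_values' helper are hypothetical and
+this is not the 'uniqueValuesFor' implementation::
+
+  # Hypothetical author values as they might have been indexed.
+  authors = ["Jane Doe", "jane doe", "Jane Doe", "Bob Smith"]
+
+  def unique_values(values):
+      """Return the distinct values, sorted; comparison is exact."""
+      return sorted(set(values))
+
+  choices = unique_values(authors)
+  # "Jane Doe" and "jane doe" stay separate entries because they
+  # differ in capitalization.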
+
+That's it. You can continue to extend this search form using HTML
+form elements to be as complex as you'd like. In the next section,
+we'll show you how to use the next kind of index, keyword indexes.
+
+Searching Keyword Indexes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A *KeywordIndex* indexes a sequence of keywords for objects and
+can be queried for any objects that have one or more of those
+keywords.
+
+Suppose that you have a number of Image objects that have a
+'keywords' property. The 'keywords' property is a lines property
+that lists the relevant keywords for a given Image, for example,
+"Portraits", "19th Century", and "Women" for a picture of Queen
+Victoria.
+
+The keywords provide a way of categorizing Images. Each Image can
+belong in one or more categories depending on its 'keywords'
+property. For example, the portrait of Queen Victoria belongs to
+three categories and can thus be found by searching for any of the
+three terms.
+
+You can use a *Keyword* index to search the 'keywords' property. Define
+a *Keyword* index with the name 'keywords' on your ZCatalog. Then
+catalog your Images. Now you should be able to find all the Images
+that are portraits by creating a search form and searching for
+"Portraits" in the 'keywords' field. You can also find all pictures
+that represent 19th Century subjects by searching for "19th
+Century".
+
+It's important to realize that the same Image can be in more
+than one category. This gives you much more flexibility in
+searching and categorizing your objects than you get with a
+FieldIndex. Using a FieldIndex your portrait of Queen Victoria
+can only be categorized one way. Using a KeywordIndex it can be
+categorized several different ways.
+
+Often you will use a small list of terms with KeywordIndexes.
+In this case you may want to use the 'uniqueValuesFor' method to
+create a custom search form. For example here's a snippet of a
+Page Template that will create a multiple select box for all the
+values in the 'keywords' index::
+
+ <select name="keywords:list" multiple>
+ <option
+ tal:repeat="item python:here.uniqueValuesFor('keywords')"
+ tal:content="item">
+ opt value goes here
+ </option>
+ </select>
+
+Using this search form you can provide users with a range of
+valid search terms. You can select as many keywords as you want and
+Zope will find all the Images that match one or more of your
+selected keywords. Not only can each object have several indexed
+terms, but you can provide several search terms and find all
+objects that have one or more of those values.
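+
+The matching rule just described, where an object is found if it
+carries at least one of the selected keywords, can be sketched in
+plain Python. The 'images' data and both helpers are hypothetical
+and only illustrate the idea, not the KeywordIndex code::
+
+  from collections import defaultdict
+
+  # Hypothetical images and their 'keywords' values.
+  images = {
+      "victoria.jpg": ["Portraits", "19th Century", "Women"],
+      "lincoln.jpg": ["Portraits", "19th Century"],
+      "lizard.jpg": ["Reptiles"],
+  }
+
+  def build_keyword_index(objects):
+      """Index every keyword of every object separately."""
+      index = defaultdict(set)
+      for object_id, keywords in objects.items():
+          for keyword in keywords:
+              index[keyword].add(object_id)
+      return index
+
+  def search_any(index, keywords):
+      """Return objects that have one or more of the given keywords."""
+      hits = set()
+      for keyword in keywords:
+          hits |= index.get(keyword, set())
+      return hits
+
+  keyword_index = build_keyword_index(images)
+  found = search_any(keyword_index, ["Women", "Reptiles"])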
+
+Searching Path Indexes
+~~~~~~~~~~~~~~~~~~~~~~
+
+Path indexes allow you to search for objects based on their
+location in Zope. Suppose you have an object whose path is
+'/zoo/animals/Africa/tiger.doc'. You can find this object with
+the path queries: '/zoo', or '/zoo/animals', or
+'/zoo/animals/Africa'. In other words, a path index allows you
+to find objects within a given folder (and below).
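+
+A simple way to picture this is that an object matches a path query
+when the query is one of the leading portions of the object's path.
+The sketch below is plain Python; 'path_prefixes' and 'matches' are
+hypothetical helpers, and the real PathIndex stores its data
+differently::
+
+  def path_prefixes(path):
+      """Return every leading portion of a slash-separated path."""
+      parts = [p for p in path.split("/") if p]
+      return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
+
+  def matches(object_path, query_path):
+      """True if the object lives at or below query_path."""
+      return query_path.rstrip("/") in path_prefixes(object_path)
+
+  tiger = "/zoo/animals/Africa/tiger.doc"
+  in_animals = matches(tiger, "/zoo/animals")
+  # Only whole path elements count: '/zoo/ani' is not a folder.
+  partial = matches(tiger, "/zoo/ani")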
+
+If you place related objects within the same folders, you can
+use path indexes to quickly locate these objects. For example::
+
+ <h2>Lizard Pictures</h2>
+ <p tal:repeat="item
+ python:here.AnimalCatalog(pathindex='/Zoo/Lizards',
+ meta_type='Image')">
+ <a href="url" tal:attributes="href item/getURL" tal:content="item/title">
+ document title
+ </a>
+ </p>
+
+This query searches a ZCatalog for all images that are located
+within the '/Zoo/Lizards' folder and below. It creates a link to
+each image. To make this work, you will need a PathIndex named
+'pathindex', a FieldIndex named 'meta_type', and a Metadata entry
+for 'title'.
+
+Depending on how you choose to arrange objects in your site, you
+may find that path indexes are more or less effective. If you
+locate objects without regard to their subject (for example, if
+objects are mostly located in user "home" folders) then path
+indexes may be of limited value. In these cases, keyword and
+field indexes will be more useful.
+
+Searching DateIndexes
+~~~~~~~~~~~~~~~~~~~~~
+
+DateIndexes work like FieldIndexes, but are optimised for
+DateTime values. To minimize resource usage, DateIndexes have a
+resolution of one minute, which is considerably lower than the
+resolution of DateTime values.
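+
+In practical terms, a one-minute resolution means that two values
+within the same minute are indistinguishable to the index. The
+sketch below uses Python's standard 'datetime' as a stand-in for
+Zope's DateTime and assumes the effect is equivalent to dropping
+the seconds; 'to_index_value' is a hypothetical helper and not the
+way DateIndex actually encodes values::
+
+  from datetime import datetime
+
+  def to_index_value(moment):
+      """Discard everything finer than one minute."""
+      return moment.replace(second=0, microsecond=0)
+
+  first = datetime(2009, 2, 10, 18, 26, 5)
+  second = datetime(2009, 2, 10, 18, 26, 53)
+  same_minute = to_index_value(first) == to_index_value(second)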
+
+DateIndexes are used just like FieldIndexes; below in the
+section on "Advanced Searching with Records" we present an
+example of searching them.
+
+Searching DateRangeIndexes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DateRangeIndexes are specialised for searching for ranges of
+DateTime values. An example application would be NewsItems
+which have two DateTime attributes 'effective' and 'expiration',
+and which should only be published if the current date would
+fall somewhere in between these two date values. Like
+DateIndexes, DateRangeIndexes have a resolution of one minute.
+
+DateRangeIndexes are widely used in CMF and Plone, where
+content is compared to an effective date and an expiration
+date.
+
+DateRangeIndexes also allow one or both of the boundary dates of
+the indexed objects to be left open which greatly simplifies
+application logic when querying for "active" content where expiration
+and effective dates are optional.
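+
+The "active content" test with optional boundaries can be written
+out in plain Python as follows. This is a sketch of the logic only:
+'is_active' is a hypothetical helper, 'None' stands for an open
+boundary, treating both boundaries as inclusive is an assumption,
+and standard 'datetime' values stand in for Zope's DateTime::
+
+  from datetime import datetime
+
+  def is_active(now, effective=None, expiration=None):
+      """True if now falls inside the (possibly open-ended) range."""
+      if effective is not None and now < effective:
+          return False   # not yet in effect
+      if expiration is not None and now > expiration:
+          return False   # already expired
+      return True
+
+  now = datetime(2009, 2, 10, 18, 26)
+  open_ended = is_active(now, effective=datetime(2009, 1, 1))
+  expired = is_active(now, expiration=datetime(2009, 1, 31))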
+
+Searching TopicIndexes
+~~~~~~~~~~~~~~~~~~~~~~
+
+A TopicIndex is a container for so-called FilteredSets. A
+FilteredSet consists of an expression and a set of internal
+ZCatalog document identifiers that represent a pre-calculated
+result list for performance reasons. Instead of executing the
+same query on a ZCatalog multiple times it is much faster to use
+a TopicIndex instead.
+
+TopicIndexes are also useful for indexing boolean attributes or
+attributes where only one value is queried for. They can do this more
+efficiently than a field index.
+
+Building up FilteredSets happens on the fly when objects are
+cataloged and uncataloged. Every indexed object is evaluated
+against the expressions of every FilteredSet. An object is added
+to a FilteredSet if the expression with the object evaluates to
+True. Uncataloged objects are removed from the FilteredSet.
+
+A built-in type of FilteredSet is the PythonFilteredSet - it
+would be possible to construct custom types though.
+
+A PythonFilteredSet evaluates using the eval() function inside the
+context of the FilteredSet class. The object to be indexes must
+be referenced inside the expression using "o.". Below are some
+examples of expressions.
+
+This would index all DTML
+Methods::
+
+ o.meta_type=='DTML Method'
+
+This would index all folderish objects which have a non-empty
+title::
+
+ o.isPrincipiaFolderish and o.title
+
+Querying of TopicIndexes is done in much the same way as with
+other indexes. For example, if we named the last FilteredSet above
+'folders_with_titles', we could query our TopicIndex with a
+Python snippet like::
+
+ zcat = context.AnimalCatalog
+ results = zcat(topicindex='folders_with_titles')
+
+Provided our 'AnimalCatalog' contains a TopicIndex 'topicindex',
+this would return all folderish objects in 'AnimalCatalog' which
+have a non-empty title.
+
+TopicIndexes also support the 'operator' parameter with Records.
+More on Records below.
+
+Advanced Searching with Records
+-------------------------------
+
+A more advanced feature is the ability to query indexes more
+precisely using record objects. Record objects contain
+information about how to query an index. Records are Python
+objects with attributes, or mappings. Different indexes support
+different record attributes.
+
+Note that you don't have to use record-style queries unless you
+need the features introduced by them: you can continue to use
+traditional queries, as demonstrated above.
+
+A record style query involves passing a record (or dictionary)
+to the catalog instead of a simple query string.
+
+Keyword Index Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'query'
+ Either a sequence of words or a single word.
+ (mandatory)
+
+'operator'
+ Specifies whether all keywords or only one need
+ to match. Allowed values: 'and', 'or'. (optional, default:
+ 'or')
+
+For example::
+
+ # big or shiny
+ results=ZCatalog(categories=['big', 'shiny'])
+
+ # big and shiny
+ results=ZCatalog(categories={'query':['big','shiny'],
+ 'operator':'and'})
+
+The second query matches objects that have both the keywords
+"big" and "shiny". Without using the record syntax you can
+only match objects that are big or shiny.
+
+FieldIndex Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'query'
+ Either a sequence of objects or a single value to be
+ passed as query to the index (mandatory)
+
+'range'
+ Defines a range search on a Field Index (optional, default: not set).
+
+ Allowed values:
+
+ 'min'
+ Searches for all objects with values larger than
+ the minimum of the values passed in the 'query' parameter.
+
+ 'max'
+ Searches for all objects with values smaller than
+ the maximum of the values passed in the 'query' parameter.
+
+ 'minmax'
+ Searches for all objects with values smaller than the maximum of the
+ values passed in the 'query' parameter and larger than the minimum of
+ the values passed in the 'query' parameter.
+
+For example, here is a PythonScript snippet using a range
+search::
+
+ # animals with population count greater than 5
+ zcat = context.AnimalCatalog
+ results=zcat(population_count={
+ 'query' : 5,
+ 'range': 'min'}
+ )
+
+This query matches all objects in the AnimalCatalog which have a
+population count greater than 5 (provided that there is a
+FieldIndex 'population_count' and an attribute 'population_count'
+present).
+
+Or::
+
+ # animals with population count between 5 and 10
+ zcat = context.AnimalCatalog
+ results=zcat(population_count={
+ 'query': [ 5, 10 ],
+ 'range': 'minmax'}
+ )
+
+This query matches all animals with a population count
+between 5 and 10 (provided that the same FieldIndex
+'population_count' indexes the attribute 'population_count').
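+
+A one-sided upper bound works the same way. As a minimal sketch,
+assuming the same 'AnimalCatalog' and 'population_count' FieldIndex
+as in the examples above, a 'max' range uses the largest value in
+'query' as the upper bound::
+
+ # animals whose population count is bounded above by 5
+ zcat = context.AnimalCatalog
+ results = zcat(population_count={
+     'query': 5,
+     'range': 'max'})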
+
+Path Index Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'query'
+ Path to search for either as a string (e.g. "/Zoo/Birds") or list (e.g.
+ ["Zoo", "Birds"]). (mandatory)
+
+'level'
+ The path level to begin searching at. Level defaults to 0, which means
+ searching from the root. A level of -1 means start from anywhere in the
+ path.
+
+Suppose you have a collection of objects with these paths:
+
+- '/aa/bb/aa'
+
+- '/aa/bb/bb'
+
+- '/aa/bb/cc'
+
+- '/bb/bb/aa'
+
+- '/bb/bb/bb'
+
+- '/bb/bb/cc'
+
+- '/cc/bb/aa'
+
+- '/cc/bb/bb'
+
+- '/cc/bb/cc'
+
+Here are some example queries and their results to show how the
+'level' attribute works:
+
+'query="/aa/bb", level=0'
+ This gives the same behaviour as our previous examples, i.e., searching
+ from the root, and results in:
+
+ - '/aa/bb/aa'
+
+ - '/aa/bb/bb'
+
+ - '/aa/bb/cc'
+
+'query="/bb/bb", level=0'
+ Again, this searches from the root and returns:
+
+ - '/bb/bb/aa'
+
+ - '/bb/bb/bb'
+
+ - '/bb/bb/cc'
+
+'query="/bb/bb", level=1'
+ This searches for all objects which have '/bb/bb' one level down from
+ the root:
+
+ - '/aa/bb/bb'
+
+ - '/bb/bb/bb'
+
+ - '/cc/bb/bb'
+
+'query="/bb/bb", level=-1'
+ Gives all objects which have '/bb/bb' anywhere in their path:
+
+ - '/aa/bb/bb'
+
+ - '/bb/bb/aa'
+
+ - '/bb/bb/bb'
+
+ - '/bb/bb/cc'
+
+ - '/cc/bb/bb'
+
+'query="/xx", level=-1'
+ Returns no results, since none of the paths contains '/xx'.
+
+You can use the level attribute to flexibly search different
+parts of the path.
+
+As of Zope 2.4.1, you can also include level information in a
+search without using a record. Simply use a tuple containing the
+query and the level. Here's an example tuple: '("/aa/bb", 1)'.
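+
+As a sketch of how such a record query looks in a PythonScript,
+assume (hypothetically) that 'AnimalCatalog' has a PathIndex named
+'path'; the index name is an assumption made for this example::
+
+ # objects with '/bb/bb' starting one level below the root
+ zcat = context.AnimalCatalog
+ results = zcat(path={'query': '/bb/bb',
+     'level': 1})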
+
+DateIndex Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The supported Record Attributes are the same as those of the
+FieldIndex:
+
+'query'
+ Either a sequence of objects or a single value to be
+ passed as query to the index (mandatory)
+
+'range'
+ Defines a range search on a DateIndex (optional,
+ default: not set).
+
+ Allowed values:
+
+ 'min'
+ Searches for all objects with values larger than
+ the minimum of the values passed in the 'query' parameter.
+
+ 'max'
+ Searches for all objects with values smaller than
+ the maximum of the values passed in the 'query' parameter.
+
+ 'minmax'
+ Searches for all objects with values smaller
+ than the maximum of the values passed in the 'query'
+ parameter and larger than the minimum of the values passed
+ in the 'query' parameter.
+
+As an example, we go back to the NewsItems we created in the
+Section *Searching with Forms*. For this example, we created
+news items with attributes 'content', 'author', and 'date'.
+Additionally, we created a search form and a report template for
+viewing search results.
+
+Searching for dates of NewsItems was not very convenient,
+though: we had to type in exact dates to match a document.
+
+With a 'range' query we are now able to search for ranges of
+dates. Take a look at this PythonScript snippet::
+
+ # return NewsItems newer than a week
+ zcat = context.NewsCatalog
+ results = zcat( date={'query' : context.ZopeTime() - 7,
+ 'range' : 'min'
+ })
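+
+The 'minmax' range can be sketched in the same way. This example
+assumes the same 'NewsCatalog' and 'date' index, and relies on
+DateTime arithmetic, where subtracting a number subtracts that many
+days::
+
+ # return NewsItems dated between two weeks ago and one week ago
+ zcat = context.NewsCatalog
+ now = context.ZopeTime()
+ results = zcat(date={'query': [now - 14, now - 7],
+     'range': 'minmax'})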
+
+DateRangeIndex Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DateRangeIndexes only support the 'query' attribute on Record
+objects. The 'query' attribute results in the same
+functionality as querying directly; returning matches where
+the date supplied to the query falls between the start and
+end dates from the indexed object.
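+
+As a minimal sketch, assume a 'NewsCatalog' with a DateRangeIndex
+named 'effectiveRange' (a hypothetical name chosen for this example)
+built from the 'effective' and 'expiration' attributes. Both calls
+below ask for the items that are active right now::
+
+ zcat = context.NewsCatalog
+ now = context.ZopeTime()
+ # direct query
+ results = zcat(effectiveRange=now)
+ # equivalent record-style query
+ results = zcat(effectiveRange={'query': now})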
+
+TopicIndex Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Like KeywordIndexes, TopicIndexes support the 'operator'
+attribute:
+
+'operator'
+ Specifies whether all FilteredSets or only one need to match.
+ Allowed values: 'and', 'or'. (optional, default: 'or')
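+
+For example, assuming the 'topicindex' TopicIndex from the earlier
+section also holds a second FilteredSet with the hypothetical id
+'published_items', a record query can require membership in both
+sets::
+
+ # objects that are members of both FilteredSets
+ zcat = context.AnimalCatalog
+ results = zcat(topicindex={
+     'query': ['folders_with_titles', 'published_items'],
+     'operator': 'and'})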
+
+ZCTextIndex Record Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Because ZCTextIndex operators are embedded in the query string,
+there are no additional Record Attributes for ZCTextIndexes.
+
+Creating Records in HTML
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can also perform record queries using HTML forms. Here's an
+example showing how to create a search form using records::
+
+ <form action="Report" method="get">
+ <table>
+ <tr><th>Search Terms (must match all terms)</th>
+ <td><input name="content.query:record" size="30" value=""></td></tr>
+ <input type="hidden" name="content.operator:record" value="and">
+ <tr><td colspan=2 align=center>
+ <input type="SUBMIT" value="Submit Query">
+ </td></tr>
+ </table>
+ </form>
+
+For more information on creating records in HTML see the section
+"Passing Parameters to Scripts" in Chapter 14, Advanced Zope
+Scripting.
+
+Automatic Cataloging
+--------------------
+
+Automatic Cataloging is an advanced ZCatalog usage pattern that
+keeps objects up to date as they are changed. It requires that as
+objects are created, changed, and destroyed, they are
+automatically tracked by a ZCatalog. This usually involves the
+objects notifying the ZCatalog when they are created, changed, or
+deleted.
+
+This usage pattern has a number of advantages in comparison to
+mass cataloging. Mass cataloging is simple but has drawbacks. The
+total amount of content you can index in one transaction is
+equivalent to the amount of free virtual memory available to the
+Zope process, plus the amount of temporary storage the system has.
+In other words, the more content you want to index all at once,
+the better your computer hardware has to be. Mass cataloging
+works well for indexing up to a few thousand objects, but beyond
+that automatic indexing works much better.
+
+If you can trade off memory for time, you can enable
+'Subtransactions' in the 'Advanced' tab of the catalog. This
+commits the work in chunks, reducing memory requirements, but
+taking longer. It is a good solution for mass cataloging with a
+very large number of records.
+
+Another major advantage of automatic cataloging is that it can
+handle objects that change. As objects evolve and change, the
+index information is always current, even for rapidly changing
+information sources like message boards.
+
+On the other hand, cataloging a complex object every time it changes
+can be expensive (especially if the catalog index attempts to translate
+the information, as TextIndexNG, described below, can do with
+PDF files or Microsoft Office files). Some sites may benefit
+from mass cataloging, and having a cron job or other scheduled
+job initiate the mass cataloging every night.
+
+In standard (non-CMF, non-Plone) Zope, none of the built-in
+object types attempt to automatically catalog themselves. In
+CMF and Plone, the "contentish" objects (Documents, News Items,
+Events, etc.) all use automatic cataloging to add themselves
+to the standard CMF catalog, 'portal_catalog'. The CMF
+and especially Plone offer many advantages; if you're interested
+in building a content-oriented site, you should consider
+these technologies. However, to help you understand the
+process of creating a simple, non-CMF, non-Plone object,
+we'll demonstrate another technique below.
+
+In this section, we'll show you an example that creates "news"
+items that people can add to your site. These items will get
+automatically cataloged. This example consists of two steps:
+
+- Creating a new type of object to catalog.
+
+- Creating a ZCatalog to catalog the newly created objects.
+
+As mentioned, none of the "out-of-the-box" non-CMF Zope objects
+support automatic cataloging. This is for backwards compatibility
+reasons. For now, you have to define your own kind of objects
+(or use CMF or Plone and one of the contentish types in these
+systems that automatically catalog themselves).
+One of the ways you can create your own objects that catalog
+themselves is by defining a *ZClass*.
+
+A ZClass is a Zope object that defines new types of Zope objects.
+In a way, a ZClass is like a blueprint that describes how new Zope
+objects are built. Consider a news item as discussed in examples
+earlier in the chapter. News items not only have content, but
+they also have specific properties that make them news items.
+Often these Items come in collections that have their own
+properties. You want to build a News site that collects News
+Items, reviews them, and posts them online to a website where
+readers can read them.
+
+In this kind of system, you may want to create a new type of
+object called a *NewsItem*. This way, when you want to add a new
+*NewsItem* to your site, you just select it from the product add
+list. If you design this object to be automatically cataloged,
+then you can search your news content very powerfully. In this
+example, you will just skim a little over ZClasses, which are
+described in much more detail in Chapter 22, "Extending Zope."
+
+New types of objects are defined in the *Products* section of the
+Control Panel. This is reached by clicking on the Control Panel and
+then clicking on *Product Management*. Products contain new kinds of
+ZClasses. On this screen, click "Add" to add a new Product. You will
+be taken to the Add form for new Products.
+
+Name the new Product *NewsItem* and click "Generate". This will take you
+back to the Products Management view and you will see your new Product.
+
+Select the *NewsItem* Product by clicking on it. This new Product looks a lot
+like a Folder. It contains one object called *Help* and has an Add
+menu, as well as the usual Folder "tabs" across the top. To add a new
+ZClass, pull down the Add menu and select *ZClass*. This will take you
+to the ZClass add form, as shown in the figure below.
+
+.. figure:: ../Figures/creatingzclass.png
+
+ ZClass add form
+
+This is a complicated form which will be explained in much more
+detail in Chapter 14, "Extending Zope". For now, you only need to
+do three things to create your ZClass:
+
+- Specify the Id "NewsItem". This is the name of the new ZClass.
+
+- Specify the meta_type "News Item". This will be used to create the
+ Add menu entry for your new type of object.
+
+- Select 'ZCatalog:CatalogPathAware' from the left hand *Base Classes*
+ box, and click the button with the arrow pointing to the right hand
+ *Base Classes* box. This should cause 'ZCatalog:CatalogPathAware' to
+ show up in the right hand window. Note that if you are inheriting from
+ more than one base class, 'CatalogPathAware' should be the first
+ (specifically, it should come before 'ObjectManager').
+
+When you're done, don't change any of the other settings in the Form.
+To create your new ZClass, click *Add*. This will take you back to
+your *NewsItem* Product. Notice that there is now a new object called
+*NewsItem* as well as several other objects. The *NewsItem* object is
+your new ZClass. The other objects are "helpers" that you will examine
+more in Chapter 14, "Extending Zope".
+
+Select the *NewsItem* ZClass object. Your view should now look like
+the figure below.
+
+.. figure:: ../Figures/zclassmethods.png
+
+ A ZClass Methods View
+
+This is the *Methods* View of a ZClass. Here, you can add Zope objects
+that will act as *methods on your new type of object*. Here, for
+example, you can create Page Templates or Scripts and these
+objects will become methods on any new *News Items* that are created.
+Before creating any methods however, let's review the needs of this new
+"News Item" object:
+
+News Content
+ The News Item contains news content; this is its
+ primary purpose. This content should be any kind of plain text or
+ marked up content like HTML or XML.
+
+Author Credit
+ The News Item should provide some kind of credit to
+ the author or organization that created it.
+
+Date
+ News Items are timely, so the date that the item was created
+ is important.
+
+You may want your new News Item object to have other properties; these
+are just suggestions. To add new properties to your News Item, click on
+the *Property Sheets* tab. This takes you to the *Property Sheets*
+view.
+
+Properties are added to new types of objects in groups called *Property
+Sheets*. Since your object has no property sheets defined, this view
+is empty. To add a New Property Sheet, click *Add Common Instance
+Property Sheet*, and give the sheet the name "News". Now click *Add*.
+This will add a new Property Sheet called *News* to your object.
+Clicking on the new Property Sheet will take you to the *Properties*
+view of the *News* Property Sheet, as shown in the figure below.
+
+.. figure:: ../Figures/propertysheet.png
+
+ The properties screen for a Property Sheet
+
+This view is almost identical to the *Properties* view found on Folders
+and other objects. Here, you can create the properties of your News
+Item object. Create three new properties in this form:
+
+content
+ This property's type should be *text*. Each newly
+ created News Item will contain its own unique content property.
+
+author
+ This property's type should be *string*. This will
+ contain the name of the news author.
+
+date
+ This property's type should be *date*. This will contain
+ the time and date the news item was last updated. A *date* property
+ requires a value, so for now you can enter the string "01/01/2000".
+
+That's it! Now you have created a Property Sheet that describes your
+News Items and what kind of information they contain. Properties can
+be thought of as the *data* that an object contains. Now that we have
+the data all set, you need to create an *interface* to your new kind of
+objects. This is done by creating a new Form/Action pair to
+change the data and assigning it to a new *View* for your object.
+
+The Form/Action pair will give you the ability to edit the data
+defined in the propertysheet, while the View binds the form to a
+tab of the Zope Management Interface.
+
+Propertysheets come with built-in forms for editing their data;
+however we need to build our own so we can signal changes to the
+ZCatalog.
+
+First we are going to create a form to display and edit
+properties. Click on the *Methods* tab. Select "Page Template"
+from the add drop-down menu, name it 'editPropertiesForm' and fill
+it with::
+
+ <html><head>
+ <title tal:content="here/title_or_id">title</title>
+ <link rel="stylesheet" type="text/css" href="/manage_page_style.css">
+ </head>
+ <body bgcolor="#FFFFFF" link="#000099" vlink="#555555">
+ <span
+ tal:define="manage_tabs_message options/manage_tabs_message | nothing"
+ tal:replace="structure here/manage_tabs">
+ prefab management tabs
+ </span>
+ <form action="manage_editNewsProps" method="get">
+ <table>
+ <tr>
+ <th valign="top">Content</th>
+ <td>
+ <textarea
+ name="content:text" rows="6" cols="35"
+ tal:content="here/content">content text</textarea>
+ </td>
+ </tr>
+ <tr>
+ <th>Author</th>
+ <td>
+ <input name="author:string"
+ value="author string"
+ tal:attributes="value here/author">
+ </td>
+ </tr>
+ <tr>
+ <th>Date</th>
+ <td>
+ <input name="date:date"
+ value="the date"
+ tal:attributes="value here/date">
+ </td>
+ </tr>
+ <tr><td></td><td>
+ <input type="submit">
+ </td></tr>
+ </table>
+ </form>
+ </body>
+ </html>
+
+This is the Form part of the Form/Action pair. Note the call of
+'manage_tabs' at the top of the form - this will give your form
+the standard ZMI tabs.
+
+We will add the Action part now. Add a 'Script (Python)' object
+and fill in the id 'manage_editNewsProps' and the following code::
+
+ # first get the request
+ req = context.REQUEST
+ # change the properties in the zclass' propertysheet
+ context.propertysheets.News.manage_editProperties(req)
+ # signal the change to the zcatalog
+ context.reindex_object()
+ # now return a message
+ form = context.editPropertiesForm
+ return form(REQUEST=req,
+ manage_tabs_message="Saved changes.",
+ )
+
+Done. The next step will be to define the View. Click on the
+*Views* tab. This will take you to the *Views* view.
+
+Here, you can see that Zope has created three default Views for
+you. These views will be described in much more detail in Chapter
+14, "Extending Zope", but for now, it suffices to say that these
+views define the tabs that your objects will eventually have.
+
+To create a new view, use the form at the bottom of the Views
+view. Create a new View with the name "News" and select
+"editPropertiesForm" from the select box and click *Add*. This
+will create a new View on this screen under the original three
+Views, as shown in the figure below.
+
+.. figure:: ../Figures/zclassviews.png
+
+ The Views view
+
+We want to make our View the first view that you see when you
+select a News Item object. To change the order of the views,
+select the newly created *News* view and click the *First* button.
+This should move the new view from the bottom to the top of the
+list.
+
+The final step in creating a ZClass is defining a method for
+displaying the class. Click on the *Methods* tab, select 'Page
+Template' from the add list and add a new Page Template with the
+id "index_html". This will be the default view of your news item.
+Add the following to the new template::
+
+ <html><head>
+ <title tal:content="template/title">The title</title>
+ </head><body>
+ <h1>News Flash</h1>
+ <p tal:content="here/date">
+ date goes here
+ </p>
+ <p tal:content="here/author">
+ author goes here
+ </p>
+ <p tal:content="here/content">
+ content goes here
+ </p>
+ </body></html>
+
+Finally, we will add a new management tab for the display method.
+Once again, click on the *Views* tab, and create a *View* named
+"View", and assign the 'index_html' to it. Reorder the views so
+that the 'News' view comes first, followed by the 'View' method.
+
+That's it! You've created your own kind of object called a *News
+Item*. When you go to the root folder, you will now see a new entry in
+your add list.
+
+But don't add any new News Items yet, because the second step in
+this exercise is to create a ZCatalog that will catalog your new
+News Items. Go to the root folder and create a new ZCatalog with
+the id 'Catalog'. The ZClass finds the ZCatalog by looking for a
+catalog named 'Catalog' through acquisition, so this ZCatalog
+should be where it can be acquired by all NewsItems you plan to
+create.
+
+Like the previous two examples of using a ZCatalog, you need to
+create Indexes and a Metadata Table that make sense for your
+objects. Create the following indexes:
+
+content
+ This should be a ZCTextIndex. This will index the content
+ of your News Items.
+
+author
+ This should be a FieldIndex. This will index the author of
+ the News Item.
+
+date
+ This should be a DateIndex. This will index the date of the
+ News Item.
+
+After creating these Indexes, add these Metadata
+columns:
+
+- author
+
+- date
+
+- absolute_url
+
+After creating the Indexes and Metadata Table columns, the
+automatic cataloging is basically working. The last step is
+creating a search interface for the ZCatalog using the Z Search
+Interface tool described previously in this chapter.
+
+Now you are ready to go. Start by adding some new News Items to your
+Zope. Go anywhere in Zope and select *News Item* from the add list.
+This will take you to the add Form for News items.
+
+Give your new News Item the id "KoalaGivesBirth" and click *Add*.
+This will create a new News Item. Select the new News Item.
+
+Notice how it has five tabs that match the five Views that were in the
+ZClass. The first View is *News*; this view corresponds to the *News*
+Property Sheet you created in the News Item ZClass.
+
+Enter your news in the *Content* box::
+
+ Today, Bob the Koala bear gave birth to little baby Jimbo.
+
+Enter your name in the *Author* box, and today's date in the *Date*
+box.
+
+Click *Change* and your News Item should now contain some news.
+Because the News Item object is *CatalogPathAware*, it is automatically
+cataloged when it is changed or added. Verify this by looking at the
+*Cataloged Objects* tab of the ZCatalog you created for this example.
+
+The News Item you added is the only object that is cataloged. As you
+add more News Items to your site, they will automatically get cataloged
+here. Add a few more items, and then experiment with searching the
+ZCatalog. For example, if you search for "Koala" you should get back
+the *KoalaGivesBirth* News Item.
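+
+The same search can be sketched in a PythonScript. This assumes the
+'Catalog' ZCatalog, the 'content' index, and the 'author' and 'date'
+metadata columns created above; the output format is only an
+illustration::
+
+ # list the path, author and date of every News Item mentioning "Koala"
+ zcat = context.Catalog
+ lines = []
+ for item in zcat(content='Koala'):
+     lines.append('%s by %s on %s' % (
+         item.getPath(), item.author, item.date))
+ return '\n'.join(lines)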
+
+At this point you may want to use some of the more advanced search
+forms that you created earlier in the chapter. You can see for
+example that as you add new News Items with new authors, the
+authors select list on the search form changes to include the new
+information.
+
+Advanced Catalog Topics
+-----------------------
+
+Sorting
+~~~~~~~
+
+When you execute a ZCatalog call, your result set may or may not
+be returned in a particular order:
+
+- If your query contains no text index fields, your results will
+ not be sorted in any particular order. For example, with a
+ query based off a KeywordIndex, or query based off both
+ a KeywordIndex and a DateIndex, you will get an indeterminate
+ ordering.
+
+- For results that include a text index, your results will be
+ returned in order of relevance of the text search. That is,
+ the result set will be sorted based on how often
+ search words appear in the indexes. A search for the word
+ 'frog' against a text index will give priority to an object
+ that uses that word many times over
+ an object that uses it fewer times. This is
+ a simplified version of the way that many web search engines
+ work: the more "relevant" your keywords are to an item, the
+ higher its ordering in the results. In particular, with
+ the ZCTextIndex, you have a choice between two algorithms
+ for how to weight the sorting:
+
+ - Okapi: is the best general choice. It does very well
+ when comparing an ordinary "human query" against a longer
+ text field. For example, querying a long description field
+ for a short query like 'indoor OR mammal' would work very
+ well.
+
+ - Cosine: is better suited for when the length of the
+ query comes close to matching the length of the field
+ itself.
+
+You, of course, may want to force a particular order onto your
+results. You can do this after you get a result set using
+normal Python syntax::
+
+ # get ordered results from search
+ zcat=context.AnimalCatalog
+ results=zcat({'title':'frog'})
+ results=[(row.title, row) for row in results]
+ results.sort()
+
+This can be, however, very inefficient.
+
+When results are returned by the ZCatalog, they are in a special
+form called a `LazyResults` set. This means that Zope hasn't
+gone to the trouble of actually creating the entire list, but
+has just sketched out the list and will fill it in at the exact
+point that you ask for each item. This is helpful, since it lets
+you query the catalog for a result set with 10,000 items without
+Zope having to really construct a 10,000 item long list of results.
+However, when we try to sort this, Zope will have to actually
+create this list since it can't rely on its lazy, just-in-time
+method.
+
+Normally, you'll only show the first 20 or 50 or so of a result
+set, so sorting 10,000 items just to show the first 20 is a waste
+of time and memory. Instead, we can ask the catalog to do the
+sorting for us, saving both time and space.
+
+To do this, we'll pass along several additional keywords in our
+search method call or query:
+
+sort_on
+ The field name to sort the results on
+
+sort_order
+ 'ascending' or 'descending', with the default
+ being 'ascending'. Note that you can also use 'reverse'
+ as a synonym for 'descending'
+
+sort_limit
+ Since you're likely to only want to use the
+ first 20 or 50 or so items, we can give a hint to the
+ ZCatalog not to bother to sort beyond this by passing along
+ a 'sort_limit' parameter, which is the number of records
+ to sort.
+
+For example, assuming we have a 'latin_name' FieldIndex on our
+animals, we can sort them by name in a PythonScript with::
+
+ zcat=context.AnimalCatalog
+ zcat({'sort_on':'latin_name'})
+
+or::
+
+ zcat=context.AnimalCatalog
+ zcat({'sort_on':'latin_name', 'sort_order':'descending'})
+
+or, if we know we'll only want to show the first 20 records::
+
+ zcat=context.AnimalCatalog
+ zcat({'sort_on':'latin_name',
+ 'sort_order':'descending',
+ 'sort_limit':20})
+
+or, combining this with a query restriction::
+
+ zcat=context.AnimalCatalog
+ zcat({'title':'frog',
+ 'sort_on':'latin_name',
+ 'sort_order':'descending',
+ 'sort_limit':20})
+
+This gives us all records with the 'title' "frog", sorted
+by 'latin_name', and doesn't bother to sort after the first
+20 records.
+
+Note that using 'sort_limit' does not guarantee that we'll get
+exactly that number of records: we may get fewer if there
+aren't that many matching our query, and we may get more.
+'sort_limit' is merely a request for optimization. To
+ensure that we get no more than 20 records, we'll want to
+truncate our result set::
+
+ zcat=context.AnimalCatalog
+ zcat({'sort_on':'latin_name',
+ 'sort_order':'descending',
+ 'sort_limit':20})[:20]
+
+Unsortable Fields
+%%%%%%%%%%%%%%%%%
+
+In order to sort on an index, we have to actually keep the
+full attribute or method value in that index. For many
+index types, such as DateIndex or FieldIndex, this is
+normally done. However, for text indexes, such as
+ZCTextIndex, TextIndex (deprecated), and TextIndexNG
+(described below), the index doesn't keep the actual
+attribute or method results in the index. Instead, it
+cleans up the input (often removing "stop words",
+normalizing input, lowercasing it, removing duplicates,
+etc., depending on the options chosen). So a term paper
+with an attribute value of::
+
+ "A Critique of 'Tora! Tora! Tora!'"
+
+could actually be indexed as::
+
+ ( 'critique', 'tora' )
+
+once the common stop words ("a", "of") are removed and
+the text is lowercased and deduplicated. (In reality,
+the indexed information is much richer, as it keeps
+track of things like how often words appear, and which
+words appear earlier in the stream, but this gives
+you an idea of what is stored.)
+
+This is a necessary and positive step: it makes the index
+use less storage and less memory, and improves search
+results, as your site user doesn't have to worry about
+getting incidental words ("the", "a", etc.) correct,
+nor about capitalization, etc.
+
+**Note:** As we'll see, TextIndexNG indexes can even
+do advanced tricks, such as normalizing a word and
+stemming it, so that a search for "vehicles" could
+find "vehicle" or even "car".
+
+However, this process means that the index no longer knows
+the actual value, and, therefore, can't sort on it.
+Due to this, it is not possible to use the 'sort_on'
+feature with text index types.
+
+To work around this, you can either sort the results of
+the query using the normal Python 'sort()' feature
+(shown above), or you can create an additional non-text
+index on the field, described below, in the section
+'Indexing a Field with Two Index Types'.
+
+Similarly, the API call 'uniqueValuesFor', described above,
+cannot be used on text-type indexes, since the exact
+values are not kept.
+
+Searching in More Than One Index Using "OR"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As mentioned, if you search in more than one index,
+you must meet your criteria for each index you search
+in, i.e., there is an implied 'AND' between each of the
+searches::
+
+ # find sunset art by Van Gogh
+ zcat=context.ArtCatalog
+ results=zcat({'keyword':'sunsets', 'artist':'Van Gogh'})
+
+This query finds all sunset art by Van Gogh: both of
+these conditions must be true.
+
+There is no way to directly search in more than one
+index without this 'AND' condition; instead, you can
+perform two catalog searches and concatenate their
+results. For example::
+
+ # find sunset art OR art by Van Gogh
+ zcat=context.ArtCatalog
+ results=zcat({'keyword':'sunsets'}) + \
+ zcat({'artist':'Van Gogh'})
+
+This method, however, does not remove duplicates, so
+a painting of a sunset by Van Gogh would appear twice.
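+
+One way to drop the duplicates is to remember which objects have
+already been seen. The following is only a sketch: it uses the
+path returned by each result's 'getPath()' method as the key, and
+it walks the whole combined result set, so it gives up the lazy
+evaluation of catalog results::
+
+ # find sunset art OR art by Van Gogh, without duplicates
+ zcat = context.ArtCatalog
+ combined = zcat({'keyword': 'sunsets'}) + \
+     zcat({'artist': 'Van Gogh'})
+ seen = {}
+ results = []
+ for item in combined:
+     path = item.getPath()
+     if path not in seen:
+         # first time we meet this object: keep it
+         seen[path] = 1
+         results.append(item)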
+
+For alternate strategies about searching in two places,
+see 'PrincipiaSearchSource' and 'FieldedTextIndex', below,
+both of which can be used as possible workarounds.
+
+Indexing a Field With Two Index Types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the different indexes act differently, it can be advantageous
+to have the same attribute indexed by more than one index. For
+example, our animals have a 'latin_name' attribute that gives their
+formal genus/species Latin name. A user should be able to search
+that attribute for an *exact* match on a name, and we should be able
+to sort results by it, both of which suggest a FieldIndex. In
+addition, though, users may want to search it like a text field,
+where they can match parts of words, in which case we would want a
+ZCTextIndex (or TextIndexNG, described below).
+
+In a case like this, a good strategy is to first create a
+FieldIndex on 'latin_name'. Let's call that index 'latin_name'.
+Then, you can create a ZCTextIndex that uses a new feature: the
+ability to have the indexed attribute be different from the index
+name itself.
+
+When you create the second index, the ZCTextIndex, you can give it
+the Id 'latin_name_text', and have the 'Indexed attributes' field
+be 'latin_name'. Now, when we catalog our animals, their
+'latin_name' attribute is indexed in two ways: once, as a
+FieldIndex, that we can sort against and match exactly, and once as
+a ZCTextIndex, that we can search like a text field with full text
+search.
+
+The second index has a different name, so when we make our catalog
+call, we'll need to be sure to use that name if we want to search
+it like a text field::
+
+ # search latin_name
+ zcat=context.AnimalCatalog
+ exact_results=zcat({'latin_name':'homo sapiens'})
+ fuzzy=zcat({'latin_name_text':'sap*'})
+
+Note that a good strategy is to have the search be against the
+ZCTextIndex, but sort it by the FieldIndex::
+
+ # free text search, sorted
+ zcat=context.AnimalCatalog
+ results=zcat({'latin_name_text':'sap*',
+ 'sort_on':'latin_name'})
+
+PrincipiaSearchSource
+~~~~~~~~~~~~~~~~~~~~~
+
+You can choose to create indexes on any attribute or method that
+you would find useful to search on; however, one that is
+generally helpful is 'PrincipiaSearchSource'. Several of the
+built-in Zope objects, such as DTMLDocuments, and many add-on
+objects to Zope have a 'PrincipiaSearchSource' attribute or
+method that returns a value that is meant to be used for general
+purpose searching. Traditionally, 'PrincipiaSearchSource'
+would include the text in an object's title, its body, and
+anywhere else you'd want to be able to search.
+
+For example, if you downloaded a Zope product that managed
+our zoo, and it had an Animal type that you could add to your
+site, this animal type would probably expose a
+PrincipiaSearchSource that looked something like this::
+
+ def PrincipiaSearchSource(self):
+     "used for general searching for animal"
+     return self.title + ' ' + self.latin_name + ' ' \
+         + self.description + ' ' + self.environment
+
+If you create a 'PrincipiaSearchSource' index and
+search against it, you can find this animal by using words
+that are in its 'title', 'latin_name', 'description', or
+'environment', without having to worry about which field,
+exactly, they're in. This is similar to searching with a
+web search engine, in that you can use a single text string
+to find the "right" information, without needing to know about
+the type of object you're looking for. It is especially
+helpful in allowing you to create a site-wide search: searching
+animals specifically by their 'latin_name' or 'environment'
+might be useful for a biologist in the right section of your
+site, but a general-purpose visitor might like
+to search using the word "jungle" and find results without
+having to know to search for that in the 'environment' field
+of a search form.
+
+If you create custom types, either using ZClasses, as shown
+above, or by using more advanced techniques described
+elsewhere, you should create a PrincipiaSearchSource method
+that returns all of the text by which the object should be searchable.
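+
+As a rough sketch, such a method might skip empty fields so that
+a missing description does not leave stray spaces in the indexed
+text. The 'Animal' class here is hypothetical and written as plain
+Python; its attribute names simply follow the zoo example.
+

```python
class Animal:
    """Hypothetical content class for the zoo example."""

    def __init__(self, title, latin_name='', description='',
                 environment=''):
        self.title = title
        self.latin_name = latin_name
        self.description = description
        self.environment = environment

    def PrincipiaSearchSource(self):
        """Return all searchable text as one space-separated string."""
        parts = [self.title, self.latin_name,
                 self.description, self.environment]
        # Skip empty values so they add no extra separators.
        return ' '.join(part for part in parts if part)


tiger = Animal('Tiger', latin_name='Panthera tigris',
               environment='jungle')
text = tiger.PrincipiaSearchSource()
```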
+
+ZCatalogs and CMF/Plone
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The CMF was built from the ground up to understand the
+difference between things that are "content", such as a news item
+or press release, and those things that are not, such as
+a DTMLMethod used to show a press release, or a ZCatalog
+object. In addition, the CMF includes several stock items
+that are intended to be used for content, including:
+Document, Event, NewsItem, and others. These content items
+are already set up for autocataloging, so that any changes
+made will appear in the catalog.
+
+In non-CMF Zope, the traditional name for a general-purpose
+catalog is 'Catalog' (though you can always create your own
+catalog with any id you want; we've used the example
+'AnimalCatalog' in this chapter for a special-purpose catalog
+for searching animal-specific info in our zoo). Even though
+'Catalog' is the traditional name, Zope does not come with
+such a catalog in the ZODB already; you have to create it.
+
+In CMF (and Plone, an out-of-the-box portal system built
+on top of the CMF), there is always a catalog created, called
+'portal_catalog', at the root of the CMF site. All of the
+built-in content objects (and almost every add-on content
+object for the CMF/Plone) are set to autocatalog to this
+'portal_catalog'. This is required, since many of the features
+of the CMF and Plone, such as listing current content, finding
+content of correct types, etc., rely on the 'portal_catalog'
+and the searching techniques shown here to function.
+
+In CMF and Plone, the index name 'PrincipiaSearchSource' is
+not traditionally used. Instead, an index is created called
+'SearchableText', and used in the same manner as
+'PrincipiaSearchSource'. All of the standard contentish
+objects have a 'SearchableText' method that returns things
+like title, description, body, etc., so that they can be
+general-text searched.
+
+Keeping Non-ZODB Content in ZCatalog
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ZCatalog is such a useful and powerful tool for searching
+that you may want to use it to search data that
+is stored in places other than the ZODB. Later in this book,
+you'll learn about storing data in relational databases and
+being able to access and view that data from Zope. While Zope
+excels at working with relational databases, many databases
+have poor full-text-indexing capabilities. In addition, site
+visitors may want to search your site, as described above, for
+a single phrase, like "jungle", and not know or care if the
+information they're looking for is in the ZODB or in a
+relational database.
+
+To help with this, you can catalog information held in a
+relational database in the ZCatalog, too. It's an advanced
+technique, and will require that you understand ZSQLMethods
+(described in the relational database chapter) and Python
+scripting. You can learn about this technique in
+`Cataloging SQL Data and Almost Anything Else
+<http://zope.org/Members/rbickers/cataloganything>`_.
+
+Add-On Index Types
+------------------
+
+TextIndexNG
+~~~~~~~~~~~
+
+TextIndexNG is a new text index that competes with ZCTextIndex.
+Unlike ZCTextIndex, TextIndexNG is an add-on product that must be
+separately installed. It offers a large number of features:
+
+- Document Converters
+
+ If your attribute value isn't plain text, TextIndexNG can convert
+ it to text to index it. This will allow you to store, for
+ instance, a PDF file in Zope
+ and be able to search the text of that PDF file. Current
+ formats it can convert are: HTML, PDF, Postscript, Word,
+ Powerpoint, and OpenOffice.
+
+- Stemmer Support
+
+ Reduces words to a stem (removes verb endings and
+ plural-endings), so a user can search for "car" and get "car"
+ and "cars", without having to try the search twice. It
+ knows how to perform stemming in 13 different languages.
+
+- Similarity Search
+
+ Can find words that are "similar" to your words, based on
+ the Levenshtein algorithm. Essentially, this measures the
+ distance between two terms using indicators such as how
+ many letters differ from one to another.
+
+- Near Search
+
+ Can look for words that are near each other. For example,
+ a search for "Zope near Book" would find results where
+ these words were close to each other in the document.
+
+- Customizable Parsers
+
+ Rather than having only one way to express a query, TextIndexNG
+ uses a "pluggable" architecture where Python programmers can
+ create new parsers. For example, to find a document that
+ includes the word "snake" but not the word "python", you'd
+ search for "snake andnot python" in the default parser.
+ However, given your users' expectations (and native language),
+ they might prefer to say "snake and not python" or "snake
+ -python" or such. TextIndexNG comes with three different
+ parsers: a rich, default one, a simple one that is suitable for
+ more general searching, and a German one that uses
+ German-language words ("nicht" for "not", for example).
+ Although writing a new parser is an advanced task, it would be
+ possible for you to do so if you wanted to let users express
+ the question in a different form.
+
+- Stop Words
+
+ You can customize the list of "stop words" that are too common
+ to be worth indexing or searching for.
+
+- Wildcard Search
+
+ You can use a "wildcard" to search for part of a word, such as
+ "doc*" to find all words starting with "doc". Unlike
+ ZCTextIndex, you can also use wildcards at the start of a
+ word, such as "\*doc" to find all words ending with "doc".
+
+- Normalization Support
+
+ Removes accents from characters so that users can search for an
+ accented word without getting the accents exactly right.
+
+- Auto-Expansion
+
+ This optional feature allows you to get better search results
+ when some of the query terms could not be found. In this
+ case, it uses similarity matching to "expand" the query
+ term to find more matches.
+
+- Ranking Support
+
+ Sorting of results based on their word frequencies,
+ similar to the sorting capabilities of ZCTextIndex.
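+
+To make the distance measure behind the Similarity Search feature
+listed above more concrete, here is a minimal sketch of the classic
+Levenshtein edit distance. It illustrates the metric only and is not
+taken from TextIndexNG itself; the function name is an assumption.
+

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits
    (insertions, deletions, substitutions) that turn a into b."""
    # previous[j] holds the distance between the part of a handled so
    # far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


distance = levenshtein('kitten', 'sitting')
```

+A similarity search can then accept any indexed word whose distance
+from the query term falls below some chosen threshold.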
+
+TextIndexNG is an excellent replacement for ZCTextIndex,
+especially if you have non-English language documents or expect to
+have users that will want to use a rich query syntax.
+
+Full information on TextIndexNG is available at
+http://www.zope.org/Members/ajung/TextIndexNG.
+
+FieldedTextIndex
+~~~~~~~~~~~~~~~~
+
+FieldedTextIndex is a new index type that is not (yet) a standard
+part of Zope, but is a separate product that can be installed
+and used with a standard catalog.
+
+Often, a site will have a combined field (normally
+'PrincipiaSearchSource' or 'SearchableText', as described above)
+for site-wide searching, and individual fields for more
+content-aware searching, such as the indexes on 'latin_name',
+'environment', etc.
+
+Since it slows down performance to concatenate catalog result
+sets directly, the best strategy for searching across many fields
+is often to use the 'PrincipiaSearchSource'/'SearchableText'
+strategy of a single text index. However, this can be *too*
+limiting, as sometimes users want to search in several fields at
+once, rather than in all.
+
+FieldedTextIndex solves these problems by extending the standard
+ZCTextIndex so that it can receive and index the textual data of an
+object's field attributes as a mapping of field names to field
+text. The index itself performs the aggregation of the fielded
+data and allows queries to be performed across all fields (like a
+standard text index) or any subset of the fields which have been
+encountered in the objects indexed.
+
+In other words, a normal 'PrincipiaSearchSource' method would
+look something like this::
+
+ # concatenate all fields user might want to search
+ def PrincipiaSearchSource(self):
+     return self.title + ' ' + self.description + ' ' \
+         + self.latin_name + ' ' + self.environment
+
+However, you have to search this all at once--you can't opt to
+search just 'title' and 'latin_name', unless you created separate
+indexes for these fields. Creating separate indexes for these
+fields is a waste of space and memory, though, as the same
+information is indexed several times.
+
+With FieldedTextIndex, your 'PrincipiaSearchSource' method would
+look like this::
+
+ # return all fields user might want to search
+ def PrincipiaSearchSource(self):
+     return { 'title':self.title,
+              'description':self.description,
+              'latin_name':self.latin_name,
+              'environment':self.environment }
+
+This index can be searched with the normal methods::
+
+ # search like a normal index
+ zcat=context.AnimalCatalog
+ results=zcat({'PrincipiaSearchSource':'jungle'})
+
+In addition, it can be searched indicating which fields you want
+to search::
+
+ # search only specific fields
+ zcat=context.AnimalCatalog
+ results=zcat(
+     {'PrincipiaSearchSource':{'query':'jungle',
+         'fields':['title','latin_name']}})
+
+In this second example, only 'title' and 'latin_name' will be
+searched.
+
+In addition, FieldedTextIndexes support *weighing*, so that
+some fields count for more than others in a query: a match in a
+heavily weighed field moves an object earlier in the
+result list. For example, in our zoo, matching part of an animal's
+'latin_name' should count very highly, matching part of the
+'title' should count highly, and matching part of the description
+should count less so.
+
+We can specify the weighing like this::
+
+ # search with weighing
+ zcat=context.AnimalCatalog
+ results=zcat(
+     {'PrincipiaSearchSource':{'query':'jungle',
+         'field_weights':{
+             'latin_name':3,
+             'title':2,
+             'description':1 }}})
+
+This is a *very* powerful feature for building a comprehensive
+search strategy for a site, since it lets us control the results to
+better give the user what they probably want, rather than returning
+documents based solely on how many times their search word appears.
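+
+If you build such queries in several places, a small helper can keep
+the nesting straight. The sketch below is hypothetical: only the
+'query', 'fields' and 'field_weights' keys come from the examples
+above, and the helper name and signature are assumptions.
+

```python
def fielded_query(index_id, text, fields=None, field_weights=None):
    """Build a catalog query mapping for a FieldedTextIndex.

    Hypothetical helper: it only assembles the nested dictionary and
    does not talk to a catalog itself.
    """
    options = {'query': text}
    if fields is not None:
        options['fields'] = list(fields)
    if field_weights is not None:
        options['field_weights'] = dict(field_weights)
    return {index_id: options}


query = fielded_query(
    'PrincipiaSearchSource', 'jungle',
    field_weights={'latin_name': 3, 'title': 2, 'description': 1})
```

+The resulting mapping would then be passed to the catalog in the same
+way as the literal dictionaries shown earlier.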
+
+The examples given here are for searching a FieldedTextIndex using
+PythonScripts; however, it can also be searched directly from the
+REQUEST in a form, like other indexes.
+
+Since a FieldedTextIndex can act just like a normal ZCTextIndex if
+queried with just a search string, yet offer additional features
+above and beyond the normal ZCTextIndex, it's a good idea to use
+this for any text index where you'd concatenate more than one
+attribute or method result together, such as for 'SearchableText'
+or 'PrincipiaSearchSource'.
+
+FieldedTextIndex can be downloaded at
+http://zope.org/Members/Caseman/FieldedTextIndex.
+Full documentation on how to create this type of index, and further
+information on how to search it, including how to search it from
+web forms, is available in the README file that comes with this
+product.
+
+Conclusion
+----------
+
+The cataloging features of ZCatalog allow you to search your objects
+for certain attributes very quickly. This can be very useful for sites
+with lots of content that many people need to be able to search in an
+efficient manner.
+
+Searching the ZCatalog works a lot like searching a relational
+database, except that the searching is more object-oriented. Not all
+data models are object-oriented however, so in some cases you will want
+to use the ZCatalog, but in other cases you may want to use a
+relational database. The next chapter goes into more details about how
+Zope works with relational databases, and how you can use relational
+data as objects in Zope.
Modified: zope2book/trunk/source/index.rst
===================================================================
--- zope2book/trunk/source/index.rst 2009-02-10 23:18:36 UTC (rev 96430)
+++ zope2book/trunk/source/index.rst 2009-02-10 23:26:53 UTC (rev 96431)
@@ -27,6 +27,7 @@
AdvZPT.rst
ScriptingZope.rst
ZopeServices.rst
+ SearchingZCatalog.rst
AppendixA.rst
Contributions.rst