Brian,
Some code you can use as a starting point to produce KWIC (key word in context) can be found here: http://zope.org/Members/Ioan/SiteSearch. It's old but works for me under Zope 2.7.3. This implementation requires the searchable text to be in Catalog metadata, which seems to be a bad thing, but I have never really understood just how bad.. The recent discussions about memory use have got me thinking again about this; I would appreciate any pointers to discussion/explanation of catalog metadata and memory.
Best, Ken
----- Original Message ----- From: "Ken Ara" <feedreader@yahoo.com>
Some code you can use as a starting point to produce KWIC (key word in context) can be found here: http://zope.org/Members/Ioan/SiteSearch. It's old but works for me under Zope 2.7.3.
This implementation requires the searchable text to be in Catalog metadata, which seems to be a bad thing, but I have never really understood just how bad..
We have experimented with storing compressed and uncompressed metadata (up to 20k bytes per zcatalog record). This worked fine for us (in terms of retrieval speed and zodb size) until we hit about 500k records (we used our own KWIC scripts, not SiteSearch). At that point retrieval time started to increase to unacceptable levels (over 2 seconds per search) and the zodb started to become unwieldy (5gb+). At that time (slow retrieval speed) we had about 5 million objects in the zodb (after packing).
I would suggest that storing lots of metadata is workable if:
1) you don't have a lot of records
2) you don't store too much metadata/record
3) you have lots (1gb+) of RAM
4) you have fast disks
5) you have a fast cpu
We currently have a zcatalog with a single ZCTextIndex which holds about 1 million records (zodb size is under 3gb). Our retrieval speed, including KWIC processing, is under 1.5 seconds per search. We have very little metadata (less than 100 bytes per record), and access the final result set objects to get the data we need for KWIC processing and result set display.
HTH
Jonathan
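As a rough back-of-the-envelope check on the figures above (an editorial sketch, not part of Jonathan's setup): the helper name `metadata_bytes` is hypothetical, "20k" is taken as 20,000 bytes, and the calculation ignores compression, index size and ZODB overhead, so it only gives an upper bound on the raw metadata volume.

```python
def metadata_bytes(records, bytes_per_record):
    """Raw metadata volume, ignoring compression and ZODB overhead."""
    return records * bytes_per_record


GB = 10 ** 9
MB = 10 ** 6

# Upper bound for the first setup: 500k records, up to 20k bytes each.
heavy = metadata_bytes(500_000, 20_000)

# Upper bound for the current setup: 1 million records, under 100 bytes each.
light = metadata_bytes(1_000_000, 100)

print("heavy metadata: up to %.1f GB" % (heavy / GB))
print("light metadata: under %.1f MB" % (light / MB))
```

Even as a loose bound, the two setups differ by about two orders of magnitude in metadata volume, which fits the difference in behaviour described above.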
--On Thursday, 20 January 2005 12:24 -0500 Jonathan Hobbs <toolkit@magma.ca> wrote:
----- Original Message ----- From: "Ken Ara" <feedreader@yahoo.com>
Some code you can use as a starting point to produce KWIC (key word in context) can be found here: http://zope.org/Members/Ioan/SiteSearch. It's old but works for me under Zope 2.7.3.
This implementation requires the searchable text to be in Catalog metadata, which seems to be a bad thing, but I have never really understood just how bad..
We have experimented with storing compressed and uncompressed metadata (up to 20k bytes per zcatalog record). This worked fine for us (in terms of retrieval speed and zodb size) until we hit about 500k records (we used our own KWIC scripts, not SiteSearch). At that point retrieval time started to increase to unacceptable levels (over 2 seconds per search) and the zodb started to become unwieldy (5gb+). At that time (slow retrieval speed) we had about 5 million objects in the zodb (after packing).
I would suggest that storing lots of metadata is workable if:
1) you don't have a lot of records
2) you don't store too much metadata/record
3) you have lots (1gb+) of RAM
4) you have fast disks
5) you have a fast cpu
We currently have a zcatalog with a single ZCTextIndex which holds about 1 million records (zodb size is under 3gb). Our retrieval speed, including KWIC processing, is under 1.5 seconds per search. We have very little metadata (less than 100 bytes per record), and access the final result set objects to get the data we need for KWIC processing and result set display.
Dieter's AdvancedQuery and ManagableIndex products might help you all to optimize your catalogs. -aj
On Thu, Jan 20, 2005 at 06:48:57PM +0100, Andreas Jung wrote:
Dieter's AdvancedQuery and ManagableIndex products might help you all to optimize your catalogs.
+1 on AdvancedQuery. It's let me do some filtering that I have no idea how I could have done otherwise. -- Paul Winkler http://www.slinkp.com
Andreas Jung wrote at 2005-1-20 18:48 +0100:
...
--On Thursday, 20 January 2005 12:24 -0500 Jonathan Hobbs
----- Original Message ----- From: "Ken Ara" <feedreader@yahoo.com>
Some code you can use as a starting point to produce KWIC (key word in context) can be found here: http://zope.org/Members/Ioan/SiteSearch. It's old but works for me under Zope 2.7.3.
This implementation requires the searchable text to be in Catalog metadata, which seems to be a bad thing, but I have never really understood just how bad..
...
Dieter's AdvancedQuery and ManagableIndex products might help you all to optimize your catalogs.
Thank you Andreas! But neither of these products can help with KWIC or large data in the Catalog metadata. -- Dieter
Ken Ara wrote at 2005-1-20 09:06 -0800:
... This implementation requires the searchable text to be in Catalog metadata, which seems to be a bad thing, but I have never really understood just how bad..
You should *NOT* put fields into the metadata table if they could be large. We had put the "description" into it and observed huge transactions (a change of a few bytes to a news item resulted in a 500 kB transaction) with correspondingly long modification times. Searches, too, were very slow because such huge blocks had to be loaded from the ZEO server.
Read large volume data from the object directly. This is efficient when you need it only for individual objects or for the objects in a moderately sized batch.
-- Dieter
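A minimal sketch of what "read from the object directly, per batch" could look like for KWIC. This is an illustration, not code from SiteSearch or from anyone in this thread. The function names `kwic` and `kwic_for_batch` are hypothetical, and the sketch assumes each cataloged object exposes its text through a `SearchableText()` method (a common CMF convention, but an assumption here). `getObject()` and `getPath()` are the usual catalog brain methods.

```python
import re


def kwic(text, term, width=40):
    """Return snippets showing each hit of `term` with `width` chars of context."""
    snippets = []
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Mark snippets that were cut out of a longer text.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
    return snippets


def kwic_for_batch(brains, term, start=0, size=10, width=40):
    """Build KWIC snippets only for the batch that will be displayed."""
    results = []
    for brain in brains[start:start + size]:
        obj = brain.getObject()      # loads the real object, for this batch only
        text = obj.SearchableText()  # ASSUMPTION: object provides this method
        results.append((brain.getPath(), kwic(text, term, width)))
    return results
```

With this shape the catalog metadata stays small, and the cost of loading full text is paid only for the `size` objects on the current result page rather than for every record touched by a search or a reindex.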
participants (5)
- Andreas Jung
- Dieter Maurer
- Jonathan Hobbs
- Ken Ara
- Paul Winkler