[Zope-dev] Shared lexicons for ZCTextIndex (was: Re: [Zope-Checkins] CVS: Zope/lib/python/Products/ZCTextIndex - ZCTextIndex.py:1.32)

Casey Duncan casey@zope.com
Thu, 15 Aug 2002 09:52:21 -0400


On Thursday 15 August 2002 09:21 am, Jim Fulton wrote:
> The original reason to share vocabularies was that multiple fields
> often came from the same human "vocabularies". The idea was that
> vocabularies would encompass a number of features including:
>
> - Words (or n-grams) used
>
> - Synonyms
>
> - Stemming rules
>
> - Stop words
>
> - Splitting rules
>
> There was, potentially, a lot of information to be shared and it would
> often be important, for consistency, to share the same rules for
> different fields that contained the same sort of content. Sharing had
> as much to do with using consistent rules as it did with optimization.
>
> Unfortunately, the old text index never implemented a lot of these
> ideas. :(
>
> The pipe-lining model used by ZCTextIndex moves some of this
> functionality out of the lexicon and leaves some of these ideas
> unimplemented, as did TextIndex.

I'm not sure what you mean. The pipelining is defined and executed in the
lexicon.
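To make that concrete, here is a minimal sketch of a lexicon that owns and
runs its own pipeline. The class and method names (ToyLexicon,
source_to_word_ids and so on) are made up for illustration and are not the
real ZCTextIndex API:

```python
import re


class Splitter:
    """Pipeline element: split each string into words."""
    def process(self, items):
        words = []
        for text in items:
            words.extend(re.findall(r"\w+", text))
        return words


class CaseNormalizer:
    """Pipeline element: lowercase every word."""
    def process(self, items):
        return [w.lower() for w in items]


class StopWordRemover:
    """Pipeline element: drop words found in a stop list."""
    def __init__(self, stop_words):
        self.stop_words = set(stop_words)

    def process(self, items):
        return [w for w in items if w not in self.stop_words]


class ToyLexicon:
    """Hypothetical lexicon: the pipeline is stored and executed here."""
    def __init__(self, *pipeline):
        self._pipeline = pipeline
        self._wids = {}  # word -> integer word id

    def source_to_word_ids(self, text):
        items = [text]
        for element in self._pipeline:
            items = element.process(items)
        # Assign the next free id to words not seen before.
        return [self._wids.setdefault(w, len(self._wids) + 1) for w in items]


lexicon = ToyLexicon(Splitter(), CaseNormalizer(), StopWordRemover(["the", "a"]))
ids = lexicon.source_to_word_ids("The cat saw a Cat")
```

Every index that shares this lexicon gets the same splitting, case and
stop-word rules, because the index never sees the raw text, only the word
ids the lexicon hands back.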
> I think that there is at least potential value in sharing lexicons.
> Of course, a down side is that it complicates set up.

I guess the main complaint was that, given a set of indexes sharing a
lexicon, deleting the lexicon and replacing it with another one had no
effect on the indexes and in fact removed your ability to manage their
lexicon at all. So you must replace all of the indexes by hand to use the
new lexicon.
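
The effect is easy to reproduce with plain Python objects. This is only a
toy model of the situation (a dict stands in for the container, and the
Lexicon and Index classes are placeholders, not the Zope ones):

```python
class Lexicon:
    def __init__(self, name):
        self.name = name


class Index:
    def __init__(self, lexicon):
        # Direct reference, fixed when the index is created.
        self.lexicon = lexicon


container = {"lexicon": Lexicon("old")}
indexes = [Index(container["lexicon"]) for _ in range(3)]

# "Delete" the lexicon and add a replacement under the same id.
del container["lexicon"]
container["lexicon"] = Lexicon("new")

# The indexes still hold the old object, which is no longer reachable
# from the container and so cannot be managed there.
still_old = [ix.lexicon.name for ix in indexes]
```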

Admittedly this is really more of a user interface and management issue
than anything. Zope is just not very good at managing one-to-many
relationships unless the one is the container of the many. 8^(
> On the subject of referencing lexicons by path rather than using direct
> references, I'm inclined to agree that direct references are better for
> simplicity and speed. It's easy enough to add a new index when you
> want to change a lexicon. (Well, there are some complications having to
> do with making sure that you get all the needed data into the new
> index...)

The current fix is a compromise that does a traversal as seldom as
possible. Unfortunately, that makes it more complex than either a simple
direct reference or a path reference would be.
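
The general shape of such a compromise might look like the sketch below.
This is a guess for illustration, not the checked-in code: the path is the
stored link, it is resolved on first use, and the result is cached so that
later lookups skip the traversal. LazyPathIndex, get_lexicon and invalidate
are hypothetical names.

```python
class LazyPathIndex:
    """Hypothetical index that resolves its lexicon path lazily."""

    def __init__(self, root, lexicon_path):
        self._root = root
        self._lexicon_path = lexicon_path  # tuple of ids, e.g. ("catalog", "lexicon")
        self._cached = None                # resolved lexicon, or None
        self.traversals = 0                # counter kept only for the demo

    def _traverse(self):
        obj = self._root
        for name in self._lexicon_path:
            obj = obj[name]
        self.traversals += 1
        return obj

    def get_lexicon(self):
        # Traverse only when nothing is cached.
        if self._cached is None:
            self._cached = self._traverse()
        return self._cached

    def invalidate(self):
        # Something has to call this when the lexicon is replaced.
        self._cached = None


root = {"catalog": {"lexicon": object()}}
index = LazyPathIndex(root, ("catalog", "lexicon"))
first = index.get_lexicon()
second = index.get_lexicon()  # served from the cache, no traversal
```

The extra complexity is in knowing when to drop the cached object: until
invalidate() is called, the index keeps using whatever it resolved first.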

I'm thinking about adopting an alternative fix, which keeps both the direct
reference and the path to the lexicon and gives you a management interface
to select a new lexicon or simply connect to a replacement (which would
clear the index). It could also tell you if the lexicon used by the index
is the actual one referenced from the path.
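
Roughly, in code (a sketch with invented names, assuming nested dicts in
place of the Zope object tree; lexicon_is_current and reconnect are
hypothetical methods):

```python
class Index:
    """Hypothetical index holding both a direct reference and a path."""

    def __init__(self, root, lexicon_path):
        self._root = root
        self.lexicon_path = lexicon_path
        self.lexicon = self._resolve()  # direct reference used for all work
        self.postings = {}              # word id -> document ids

    def _resolve(self):
        obj = self._root
        for name in self.lexicon_path:
            obj = obj[name]
        return obj

    def lexicon_is_current(self):
        # True only if the path still leads to the object we hold.
        try:
            return self._resolve() is self.lexicon
        except KeyError:
            return False

    def reconnect(self, lexicon_path=None):
        # Attach to whatever lives at the (possibly new) path. The index is
        # cleared on the assumption that stored word ids only make sense
        # against the lexicon that issued them.
        if lexicon_path is not None:
            self.lexicon_path = lexicon_path
        self.lexicon = self._resolve()
        self.postings.clear()


root = {"catalog": {"lexicon": object()}}
index = Index(root, ("catalog", "lexicon"))
index.postings[1] = [42]
root["catalog"]["lexicon"] = object()  # replaced behind the index's back
stale = not index.lexicon_is_current()
index.reconnect()
```

Normal indexing and querying never traverse; the path is consulted only
from the management side, to report a mismatch or to reconnect.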

I dunno though, maybe we would be better off as before and just document
how you go about the replacement procedure by hand. The management
interface could still be improved though, perhaps allowing you to manage
the lexicon through the index in the case that the original lexicon
reference was removed. Before, there was no disclosure and no way to get to
the "deleted" lexicon.

-Casey