Hi there

Are the current indexing solutions that we have in Zope 2 and 3 the best that we can have? Is there any way to improve the amount of concurrent indexing a field or text index can handle? The Zope 3 implementation doesn't look significantly different from the Zope 2 one in that it still uses a BTree for the forward and reverse index.

I have a basic stress test where 10 concurrent threads index 10 random words in a FieldIndex, and I get too many conflict errors to even consider this a usable, scalable indexing solution. I don't really want to index objects in another backend. I really would like to make the ZODB work for me here.

Are there solutions here? Can one employ some of the QueueCatalog conflict resolution strategies to make indexes more resilient? Or should one use some locking strategy instead?

--
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote:
> Hi there
>
> Are the current indexing solutions that we have in Zope 2 and 3 the best that we can have? Is there any way to improve the amount of concurrent indexing a field or text index can handle. The Zope 3 implementation doesn't look significantly different to the Zope 2 one in that it still uses a BTree for forward and reverse index.
>
> I have a basic stress test where 10 concurrent threads index 10 random words in a FieldIndex and get too many conflict errors to even consider this a usable scalable indexing solution. I don't really want to index objects in another backend. I really would like to make the ZODB work for me here.
>
> Are there solutions here? Can one employ some of the QueueCatalog conflict resolution strategies to make indexes more resilient? Or should one use some locking strategy instead?
Check out QueueCatalog, which batches up indexing changes for processing within a single thread.

svn://svn.zope.org/repos/main/Products.QueueCatalog

Tres.
--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
On Mon, 2007-09-17 at 19:57 -0400, Tres Seaver wrote:
> Roché Compaan wrote:
>> Hi there
>>
>> Are the current indexing solutions that we have in Zope 2 and 3 the best that we can have? Is there any way to improve the amount of concurrent indexing a field or text index can handle. The Zope 3 implementation doesn't look significantly different to the Zope 2 one in that it still uses a BTree for forward and reverse index.
>>
>> I have a basic stress test where 10 concurrent threads index 10 random words in a FieldIndex and get too many conflict errors to even consider this a usable scalable indexing solution. I don't really want to index objects in another backend. I really would like to make the ZODB work for me here.
>>
>> Are there solutions here? Can one employ some of the QueueCatalog conflict resolution strategies to make indexes more resilient? Or should one use some locking strategy instead?
>
> Check out QueueCatalog, which batches up indexing changes for processing within a single thread.
>
> svn://svn.zope.org/repos/main/Products.QueueCatalog
I use QueueCatalog often and I know how it works. But if an application requires immediate indexing then QueueCatalog is not a solution.

Sorry if I was unclear, but what I'm really asking is whether it is possible to improve the conflict handling of the current indexes that we have in Zope. I am also asking whether concurrent indexing in the ZODB is a realistic goal.

The reason I mentioned QueueCatalog is not because of its batch indexing in one thread, but because it has a lot of conflict handling on the queue itself. Sessions might be another good example of a product that tries hard to handle conflicts. Do these products have strategies that can be made to work for indexes?

--
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote at 2007-9-18 08:55 +0200:
> ... Sorry if I was unclear but what I'm really asking is if it is possible to improve the conflict handling of the current indexes that we have in Zope. I am also asking if concurrent indexing in the ZODB is a realistic goal.
I have implemented "Conflict Reduced Indexes".

They essentially work as follows:

For efficiency reasons, standard indexes perform a complex dance with quite a high conflict potential: the document list for a term can have 3 implementations: missing, represented as an integer, or represented as an IITreeSet.

Whenever the implementation type changes, a conflict will occur when a concurrent request accesses the same document list.

The conflict reduced indexes use only 2 implementation types, missing and IITreeSet, and once the list uses an IITreeSet, it remains this way. This can leverage the conflict resolution built into "OOBTree" and "IITreeSet" quite well.

Nevertheless, it turned out that these separate indexes were not worth the effort (meanwhile, they have been replaced by "ManagableIndex").

Keep in mind that the conflict behaviour improves when you have lots of indexed data, because the modifications then spread over a large tree, significantly reducing the conflict probability.

--
Dieter
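The difference between the three-state and two-state document lists can be sketched in plain Python. This is only an illustrative model, not the Zope index code or the BTrees conflict resolution: `frozenset` stands in for IITreeSet, `None` stands for a missing entry, and `Conflict`, `resolve`, `insert_standard` and `insert_reduced` are hypothetical names. The merge rule is deliberately simplified (the real rules are likely stricter in some cases), but it shows why a change of representation defeats a three-way merge while two set updates can be combined.

```python
class Conflict(Exception):
    """Stand-in for a write conflict that cannot be resolved."""


def insert_standard(row, doc_id):
    # Three representations: missing (None) -> single int -> set.
    if row is None:
        return doc_id
    if isinstance(row, int):
        return frozenset((row, doc_id))  # representation changes here
    return row | {doc_id}


def insert_reduced(row, doc_id):
    # Two representations: missing (None) -> set, and it stays a set.
    if row is None:
        return frozenset((doc_id,))
    return row | {doc_id}


def resolve(old, committed, new):
    """Simplified three-way merge of one document list.

    old:       state both writers started from
    committed: state written by the writer that committed first
    new:       state the second writer wants to commit
    """
    states = (old, committed, new)
    if not all(isinstance(s, frozenset) for s in states):
        # A missing entry or a bare integer gives nothing to merge.
        raise Conflict("document list changed representation")
    added = (committed - old) | (new - old)
    removed = (old - committed) | (old - new)
    return (old | added) - removed


# Two writers add different documents to a term that already has one entry.
base = insert_reduced(None, 1)
merged = resolve(base, insert_reduced(base, 2), insert_reduced(base, 3))
print(sorted(merged))

base = insert_standard(None, 1)  # stored as the bare integer 1
try:
    resolve(base, insert_standard(base, 2), insert_standard(base, 3))
except Conflict as exc:
    print("conflict:", exc)
```

In this model both variants still conflict when two writers create the same missing entry at the same time, which matches the description above: the reduced variant removes the integer step, not the missing step.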
On Tue, 2007-09-18 at 20:01 +0200, Dieter Maurer wrote:
> Roché Compaan wrote at 2007-9-18 08:55 +0200:
>> ... Sorry if I was unclear but what I'm really asking is if it is possible to improve the conflict handling of the current indexes that we have in Zope. I am also asking if concurrent indexing in the ZODB is a realistic goal.
>
> I have implemented "Conflict Reduced Indexes".
>
> They essentially work as follows:
>
> For efficiency reasons, standard indexes perform a complex dance with quite a high conflict potential: the document list for a term can have 3 implementations: missing, represented as an integer, or represented as an IITreeSet.
>
> Whenever the implementation type changes, a conflict will occur when a concurrent request accesses the same document list.
>
> The conflict reduced indexes use only 2 implementation types, missing and IITreeSet, and once the list uses an IITreeSet, it remains this way. This can leverage the conflict resolution built into "OOBTree" and "IITreeSet" quite well.
>
> Nevertheless, it turned out that these separate indexes were not worth the effort (meanwhile, they have been replaced by "ManagableIndex").
Thanks for your feedback. I refactored things a little bit so that I don't require immediate indexing, which makes QueueCatalog a good solution.

--
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote: <snip />
> I use QueueCatalog often and I know how it works. But if an application requires immediate indexing then QueueCatalog is not a solution.
>
> Sorry if I was unclear but what I'm really asking is if it is possible to improve the conflict handling of the current indexes that we have in Zope. I am also asking if concurrent indexing in the ZODB is a realistic goal.
>
> The reason I mentioned QueueCatalog is not because of its batch indexing in one thread, but because it has a lot of conflict handling on the queue itself. Sessions might be another good example of a product that tries hard to handle conflicts. Do these products have strategies that can be made to work for indexes?
By definition, if you're not using QueueCatalog then you don't have a queue to perform conflict resolution on. QueueCatalog does not perform conflict resolution as such, but optimises the number of cataloguing operations by applying rules to the queue to remove duplicate operations.

Indexes are inherently difficult to perform conflict resolution on. As Dieter mentioned, their implementation is designed for efficient reading, not efficient writing.

Do you really need to have concurrent indexing? Would concurrently updating the metadata records (there is an option in QueueCatalog for this) be sufficient?

Laurence
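The idea of applying rules to a queue so that duplicate operations drop out can be sketched as follows. The event names and the rule table here are assumptions made for illustration; they are not taken from the QueueCatalog source, and `collapse` is a hypothetical helper.

```python
# Hypothetical event kinds; not QueueCatalog's actual constants.
ADDED, CHANGED, REMOVED = "added", "changed", "removed"


def collapse(events):
    """Reduce (uid, event) pairs to at most one pending operation per uid."""
    pending = {}
    for uid, event in events:
        previous = pending.get(uid)
        if previous is None:
            pending[uid] = event
        elif previous == ADDED and event == CHANGED:
            # Still a pending add; the object is indexed once, in its latest state.
            pass
        elif previous == ADDED and event == REMOVED:
            # The object never reached the catalog, so nothing needs doing.
            del pending[uid]
        elif previous == REMOVED and event == ADDED:
            # The catalog still holds the old entry; a reindex is enough.
            pending[uid] = CHANGED
        else:
            # Otherwise the most recent event wins.
            pending[uid] = event
    return pending


events = [
    ("a", ADDED), ("a", CHANGED),
    ("b", CHANGED), ("b", CHANGED),
    ("c", ADDED), ("c", REMOVED),
    ("d", REMOVED), ("d", ADDED),
]
print(collapse(events))
```

Eight queued events collapse to three catalog operations in this example, which is the kind of saving described above; it reduces work in the single processing thread rather than resolving write conflicts inside the indexes.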
Laurence Rowe wrote at 2007-9-19 10:03 +0100:
> ... Indexes are inherently difficult to perform conflict resolution on. As Dieter mentioned their implementation is designed for efficient reading, not efficient writing.
You did not mean me?

I have implemented the "Conflict Reduced Indexes" because my colleague thought he would need to import mass data in parallel threads. Thus, the reason was to support parallel writing.

Fortunately, the Zope based system was much faster than its C++ implemented predecessor. Thus, the need for mass import parallelization did not arise to the expected extent.

We also use the QueueCatalog (in fact a two level queue: one for fast indexes and one for text indexes) and have only the most important workflow indexes updated inline. The workflow indexes, of course, need to be updated inline, as otherwise the workflows would not work reliably.

--
Dieter
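A setup like the one described (a few indexes updated inline, the rest deferred to two queues) might be routed roughly as below. Everything here is a hypothetical sketch: the index names, the `TwoLevelIndexer` class and the per-value queue entries are assumptions, and the plain dict and deques stand in for persistent catalog structures.

```python
from collections import deque

# Hypothetical index names, chosen only for illustration.
INLINE_INDEXES = {"review_state"}    # workflow indexes, updated immediately
TEXT_INDEXES = {"SearchableText"}    # expensive text indexes, slowest queue


class TwoLevelIndexer:
    def __init__(self):
        self.inline = {}             # index name -> {uid: value}
        self.fast_queue = deque()    # cheap indexes, processed soon
        self.text_queue = deque()    # text indexes, processed later

    def index_object(self, uid, values):
        """Route each indexed value to the inline index or to a queue."""
        for name, value in values.items():
            if name in INLINE_INDEXES:
                self.inline.setdefault(name, {})[uid] = value
            elif name in TEXT_INDEXES:
                self.text_queue.append((uid, name, value))
            else:
                self.fast_queue.append((uid, name, value))

    def process(self, queue, limit):
        """Take up to `limit` entries off a queue, as one worker would."""
        done = []
        while queue and len(done) < limit:
            done.append(queue.popleft())
        return done


indexer = TwoLevelIndexer()
indexer.index_object("doc1", {
    "review_state": "published",
    "Title": "Report",
    "SearchableText": "long body text",
})
print(indexer.inline)
print(list(indexer.fast_queue), list(indexer.text_queue))
```

Only the inline part is written by the request that changed the object, so concurrent requests contend on far fewer index structures than with fully inline indexing.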
participants (4)
- Dieter Maurer
- Laurence Rowe
- Roché Compaan
- Tres Seaver