Hi all, Giovanni Maruzzelli wrote:
We think that Abel is absolutely right:
if, in the same almost empty folder, we add and catalog an object with one word (we have now optimized and reduced the number of indexes to 11), it makes a transaction of 73K; if the object contains 300 words with the same other indexes and properties, the transaction is 224K; and if everything else is the same but the object contains 535 words, the transaction is 331K.
And we are now using a catalog with only some 3000 documents indexed, with an average length of around 1K per document.
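For a rough sense of scale, the three transaction sizes quoted above can be turned into a cost per additional word. This is only a back-of-the-envelope sketch: it assumes the growth between two measurements is linear, which the figures suggest but do not prove, and the function name is made up for illustration.

```python
# Transaction sizes reported above: (words in the object, transaction size in K).
measurements = [(1, 73), (300, 224), (535, 331)]


def marginal_cost_per_word(points):
    """Return the extra K written per extra word between consecutive measurements.

    Assumes linear growth between two neighbouring measurements.
    """
    costs = []
    for (words_a, size_a), (words_b, size_b) in zip(points, points[1:]):
        costs.append((size_b - size_a) / (words_b - words_a))
    return costs


costs = marginal_cost_per_word(measurements)
for (words_a, _), (words_b, _), cost in zip(measurements, measurements[1:], costs):
    print(f"{words_a:>3} -> {words_b:>3} words: about {cost:.2f}K per additional word")
```

On these figures every additional word costs somewhere around half a K of transaction data, on top of the roughly 73K already written for the one-word object.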
Well, Chris certainly knows more about the internals of ZCatalog than I do, so we should not ignore his comments on my mail :)

Chris McDonough wrote:
If you now add a new document containing 5 of these frequent words, 5 larger BTrees will be updated. [Chris, let me know if I'm talking nonsense now...] I assume that the entire updated BTrees = 120000 bytes will be appended to the ZODB (ignoring the less frequent words) -- even if the document contains only 1 kB of text.
Nah... I don't think so. At least I hope not! Each bucket in a BTree is a separate persistent object. So only the sum of the data in the updated buckets will be appended to the ZODB. So if you add an item to a BTree, you don't add 24000+ bytes for each update. You just add the amount of space taken up by the bucket... unfortunately I don't know exactly how much this is, but I'd imagine it's pretty close to the datasize with only a little overhead.
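The point about buckets can be illustrated with a toy model. The sketch below is not the BTrees implementation: `ToyBucketTree`, its bucket size of 50 keys and the use of `pickle` to estimate record sizes are all assumptions made purely for illustration. It only shows why a store that appends changed objects would write one bucket per insert rather than the whole tree.

```python
import bisect
import pickle


class ToyBucketTree:
    """Hypothetical stand-in for a BTree: sorted keys split into buckets.

    Each bucket plays the role of a separate persistent object, so an
    append-only store would only have to write the buckets that changed.
    Splitting of over-full buckets is deliberately left out.
    """

    def __init__(self, keys, bucket_size=50):
        keys = sorted(keys)
        self.buckets = [keys[i:i + bucket_size]
                        for i in range(0, len(keys), bucket_size)] or [[]]

    def insert(self, key):
        """Insert key and return the index of the single bucket that changed."""
        first_keys = [bucket[0] for bucket in self.buckets if bucket]
        # The last bucket whose first key is <= key, or bucket 0 for small keys.
        index = max(bisect.bisect_right(first_keys, key) - 1, 0)
        bisect.insort(self.buckets[index], key)
        return index

    def record_size(self, index):
        """Pickled size of one bucket, as a rough proxy for one stored record."""
        return len(pickle.dumps(self.buckets[index]))

    def total_size(self):
        """Pickled size of all buckets together."""
        return sum(self.record_size(i) for i in range(len(self.buckets)))


tree = ToyBucketTree(range(0, 6000, 2))  # 3000 integer keys in 60 buckets
touched = tree.insert(2501)
written = tree.record_size(touched)
total = tree.total_size()
print(f"bucket {touched} rewritten: {written} bytes, whole tree: {total} bytes")
```

In this toy model a single insert touches one bucket out of sixty, so the amount written is a small fraction of the whole structure. How large real buckets are, and how much per-record overhead the storage adds, is not something this sketch can tell.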
OK, this made me curious, so I ran a test similar to Giovanni's. I started with a ZCatalog containing 21616 records; the catalog has only one text index, no keyword index and no field index. I copied one of the indexed documents; its text is 2645 bytes long, and wc tells me that it has 313 words.

Next, I packed the database in order to have a "clean starting point". After packing, Data.fs had a size of 233661963 bytes. Then I cataloged the new object using my "lazy catalog". Since I have only one new document, this is basically the same as using CatalogAwareness. After indexing, the database had grown to 233851090 bytes -- an increase of 189127 bytes. Then I packed the database again, resulting in a size of 233666237 bytes.

So the "net increase" is indeed 233666237-233661963 = 4274 bytes, as you expected, but obviously a few more database records need to be updated in the process.

Abel
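For anyone who wants to check the figures from the test above, the arithmetic is short enough to script. The variable names are made up for illustration, and reading the difference between gross and net growth as old record revisions removed by packing is an interpretation, not something that was measured directly.

```python
# Data.fs sizes in bytes, as reported in the test above.
size_packed_before = 233661963    # after the first pack
size_after_indexing = 233851090   # after cataloging the one new document
size_packed_after = 233666237     # after the second pack

# Bytes appended by the indexing transaction.
gross_growth = size_after_indexing - size_packed_before
# Bytes that are still there once the database is packed again.
net_growth = size_packed_after - size_packed_before
# Bytes removed by the second pack (assumed to be superseded record revisions).
reclaimed = gross_growth - net_growth

print(f"gross growth:         {gross_growth} bytes")
print(f"net growth:           {net_growth} bytes")
print(f"reclaimed by packing: {reclaimed} bytes")
print(f"gross / net ratio:    {gross_growth / net_growth:.1f}")
```

The ratio makes the gap explicit: the transaction appended far more than the 2645 bytes of text, or the 4274 bytes that remain after packing, would suggest.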