Chris McDonough wrote:
Yikes. I wonder if this overhead comes from Vocabulary updates... thanks very much for doing this test.
No, this should definitely _not_ be related to the vocabulary: I simply copied an already indexed document and let ZCatalog.catalog_object munge the copy. So all words appearing in this copy already have an entry in the Vocabulary. I also checked this during a test without meta data: the vocabulary did not grow.
Clearly we need to pin it down. This is very disappointing. :-( Any further info you dig up is appreciated.
Well, I don't have any at present. But allow me to make a guess :) If a new record is added to a BTree, it can be necessary to move a few other records around in order to keep the tree balanced. And some of the BTrees affected by my test are definitely somewhat larger, because I did not use German stop words during the test, so words like "und", "der", "die", which appear in _every_ document, are indexed. (Well, at least in _nearly_ every document.)
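To illustrate the guess above, here is a toy sketch of a single sorted leaf that splits when it overflows. It is not the code of the BTree implementation ZCatalog uses; the function name insert_record, the capacity of 4 and the split rule are all assumptions made up for this example. It only shows how adding one record can relocate records that were already stored.

```python
import bisect

def insert_record(leaf, key, capacity=4):
    """Insert key into a sorted leaf, splitting the leaf on overflow.

    Hypothetical helper for illustration only; keys are assumed unique.
    Returns (leaves, relocated): the resulting leaf or leaves, and the
    pre-existing keys that ended up in a newly created sibling leaf.
    """
    merged = list(leaf)
    bisect.insort(merged, key)          # keep the leaf sorted
    if len(merged) <= capacity:
        return [merged], []             # fits: no sibling leaf is created
    mid = len(merged) // 2
    left, right = merged[:mid], merged[mid:]
    # Everything in the new right-hand leaf except the new key was moved.
    relocated = [k for k in right if k != key]
    return [left, right], relocated

# A leaf with room left: nothing is relocated.
print(insert_record([10, 20, 30], 25))
# A full leaf: the insert forces a split and two old records move.
print(insert_record([10, 20, 30, 40], 25))
```

In this toy model only a full leaf causes relocation, so if something like this is the cause, the extra cost would depend on how full the affected nodes are, not just on the size of the document.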
You didn't have any metadata stuff set up, did you? I imagine even if you did, that they couldn't possibly account for 200K worth of extra stuff.
Ouch, I forgot about the meta data. So here is the result of another test, with all meta data thrown away:

Packed database size, one document (same as during the last test) to be cataloged: 229170221 bytes
Database size after the catalog update run: 229310316 bytes
Size after packing: 229172566 bytes

So, same as before :(

Abel
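For reference, the growth implied by the three sizes above can be worked out as follows. This is only arithmetic on the reported numbers; the variable names are made up for the example.

```python
# Sizes in bytes, as reported in the test above.
packed_before = 229170221   # packed database before cataloging the copy
after_update = 229310316    # after the catalog update run, before packing
packed_after = 229172566    # after packing again

growth_unpacked = after_update - packed_before   # written by the update
growth_packed = packed_after - packed_before     # what remains after packing
reclaimed = after_update - packed_after          # removed by packing

print(growth_unpacked, growth_packed, reclaimed)
# prints: 140095 2345 137750
```

So cataloging the one document added roughly 140 KB before packing, of which only about 2.3 KB survived the pack.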